Tag Archives: high_resolution

Who will survive?

When looking at gene variants in a population, we may forget that even a perfect sampling scheme will not give an unbiased view of the human genome. Earlier studies suggested that up to 75% of conceptions are lost during early development; further indicators of a biased view are unexplained cases of departure from Hardy-Weinberg equilibrium.
Selective survival during early pregnancy is still terra incognita, and except for studies in the Hutterites I am not aware of any (modern) study that has looked at it.
A study by Grant Montgomery now presents fresh data on genome-wide allele sharing in 1,592 DZ twins from Australia and 336 DZ pairs from the Netherlands.
It is somewhat disappointing that there is no excess allele sharing in the HLA region or anywhere else in the genome. Maybe further studies can look at this at a higher resolution than with just 359 microsatellite markers?
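
As a side note on the Hardy-Weinberg point above, here is a minimal sketch of how a departure is usually quantified for a single biallelic marker. It is a plain Pearson chi-square on genotype counts; the function name and the example counts are made up for illustration and do not come from any of the studies mentioned.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (allele frequency of A, chi-square statistic) for one biallelic marker.

    Illustrative helper only; the counts are observed numbers of AA, AB and BB
    individuals. The statistic has one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic marker) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


# Hypothetical counts: no heterozygotes at all, a strong departure.
freq_a, chi2 = hwe_chi_square(50, 0, 50)
print(freq_a, chi2)
```

A value above roughly 3.84 would be nominally significant at the 5% level with one degree of freedom; for small counts an exact test is usually preferred over this approximation.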

Number cruncher

In a recent blog post I described high resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling, many people nowadays stick to commercial SQL databases that have plugins for PD software.
My recommendation is to save that money and store the data in a special format that may be more useful for these large datasets; details are in a technical report that I will upload later today. In the meantime you can already check some software tools for working with these large datasets. This is what I know so far:

  • David Duffy has recompiled his sibpair program
  • Geron(R) has something under development
  • Jochen Hampe and colleagues offer Genomizer
  • Franz Rüschendorf developed Alohomora
  • I remember SNPGWA, a development at Wake Forest University (no link yet)
  • there will be an R/Bioconductor package by Rob Scharpf (no link yet)
  • R library GenABEL by Yurii Aulchenko
  • R library SNPassoc by Juan González


A technical report on how to work with large SNP datasets is now also available in my paper section. Alternatives to what I am suggesting in this paper have been set out by an anonymous reviewer:

For R users, if SQLite limits are reached, hdf5 (http://hdf.ncsa.uiuc.edu/HDF5/) may be one way forward for really huge table structures since there is an R interface already available. PostgreSQL column limit depends on data type with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite with no obvious column count limit. Metakit is not mentioned – it is column oriented and probably also has “unlimited” columns as long as each database is < 1GB or so.
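
To make the column-limit issue concrete, here is a minimal sketch of one way around it: a "long" layout with one row per individual and marker, so the number of SNPs never turns into a number of columns. This is not the format from the technical report, just an illustration using Python's built-in sqlite3 module; the table name, column names and sample IDs are hypothetical.

```python
import sqlite3


def build_genotype_store(rows):
    """Create an in-memory SQLite database holding (individual, marker, genotype) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """
        CREATE TABLE genotype (
            individual TEXT NOT NULL,
            marker     TEXT NOT NULL,
            gt         TEXT NOT NULL,
            PRIMARY KEY (individual, marker)
        )
        """
    )
    # An index on marker keeps per-SNP lookups fast as the table grows.
    con.execute("CREATE INDEX idx_genotype_marker ON genotype (marker)")
    con.executemany("INSERT INTO genotype VALUES (?, ?, ?)", rows)
    con.commit()
    return con


def genotypes_for_marker(con, marker):
    """Return {individual: genotype} for a single marker."""
    cur = con.execute(
        "SELECT individual, gt FROM genotype WHERE marker = ? ORDER BY individual",
        (marker,),
    )
    return dict(cur.fetchall())


# Hypothetical toy data: two individuals, two SNPs.
con = build_genotype_store(
    [("id1", "rs1", "AA"), ("id2", "rs1", "AG"), ("id1", "rs2", "CC")]
)
print(genotypes_for_marker(con, "rs1"))
```

The price of this layout is a very large number of rows (individuals times markers), so whether it beats a column-oriented store or HDF5 depends on the access pattern.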

Best of two worlds

Finally, linkage and association data can be used together: new software that relies on genotype inference is now available for download.

It reduces the number of genotyping reactions and increases the power of genome-wide association studies. Our method combines sparse marker data from a linkage scan and high-resolution SNP genotypes for several individuals to infer genotypes for related individuals.
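
The basic idea behind such inference can be sketched in a few lines. This is a toy illustration only, not the algorithm of the software quoted above: it assumes a biallelic SNP, one genotyped sibling, and a known identity-by-descent (IBD) state between the two siblings at that position, as a linkage scan might suggest. The function name and allele labels are hypothetical.

```python
from itertools import combinations_with_replacement


def possible_genotypes(known, ibd, alleles=("A", "B")):
    """Genotypes an ungenotyped sibling can carry at one biallelic SNP.

    known : genotype of the genotyped sibling, e.g. ("A", "B")
    ibd   : number of alleles the two siblings share identical by descent (0, 1 or 2)
    """
    if ibd == 2:
        # Both alleles are shared, so the genotype is fully determined.
        return {tuple(sorted(known))}
    if ibd == 1:
        # One allele is a copy of one of the known alleles; the other is unconstrained.
        return {
            tuple(sorted((shared, other)))
            for shared in set(known)
            for other in alleles
        }
    if ibd == 0:
        # Nothing is shared by descent, so every genotype remains possible.
        return {tuple(sorted(g)) for g in combinations_with_replacement(alleles, 2)}
    raise ValueError("ibd must be 0, 1 or 2")


print(possible_genotypes(("A", "A"), 2))
print(possible_genotypes(("A", "A"), 1))
```

In practice IBD states are only known with some probability and allele frequencies matter, so real tools work with genotype probabilities rather than hard sets like these.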

Sure, we