Tag Archives: high_resolution

Who will survive?

4.12.2006 admin

When looking at gene variants in a population we may forget that, even with a perfect sampling scheme, this will not be an unbiased view of the human genome. Earlier studies suggested that up to 75% of conceptions are lost during early development; a further indicator of a biased view is the unexplained departure from Hardy-Weinberg equilibrium seen in some cases.
Selective survival during early pregnancy is still terra incognita, and except for studies in the Hutterites I am not aware of any (modern) study that has looked at it.
A study by Grant Montgomery now shows fresh data on genome-wide allele sharing in 1,592 DZ twins from Australia and 336 DZ pairs from the Netherlands.
It is somewhat disappointing that there is no excess allele sharing in the HLA region or anywhere else in the genome. Maybe further studies can look at this at a higher resolution than with just 359 microsatellite markers?
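
For readers who want to check a marker for the departure from Hardy-Weinberg equilibrium mentioned above, here is a minimal sketch of the textbook chi-square test on genotype counts. The function name and the example counts are made up for illustration and are not taken from any of the studies cited; an exact test is usually preferred when counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic marker.

    Arguments are the observed counts of the three genotypes AA, AB and BB.
    Returns the chi-square statistic and its p-value.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts with a deficit of heterozygotes.
    chi2, p_value = hwe_chi_square(60, 20, 20)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

A deficit of one genotype class can have many causes (genotyping error, population stratification), so a significant result is only a hint of selective survival, not evidence for it.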

Genetics, Software

Number cruncher

29.11.2006 admin

In a recent blog post I described high resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling many people nowadays stick to commercial SQL databases that have plugins for PD software.
My recommendation is to save that money and store the data in a special format that may be more useful for these large datasets; details are in a technical report that I will upload later today. In the meantime you can already check some software tools for working with these large datasets. This is what I know so far:

David Duffy has recompiled his sibpair program |link
Geron(R) has something under development |link
Jochen Hampe and colleagues offer Genomizer |link
Franz Rüschendorf developed Alohomora |link
I remember SNPGWA, a development at Wake Forest University |no link yet
there will be an R/Bioconductor package by Rob Scharpf |no link yet
R library GenABEL by Yurii Aulchenko |link
R library SNPassoc by Juan González |link

Addendum

A technical report on how to work with large SNP datasets is now also available in my paper section. Alternatives to what I am suggesting in this paper have been set out by an anonymous reviewer:

For R users, if SQLite limits are reached, hdf5 (http://hdf.ncsa.uiuc.edu/HDF5/) may be one way forward for really huge table structures since there is an R interface already available. PostgreSQL column limit depends on data type with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite with no obvious column count limit. Metakit is not mentioned; it is column oriented and probably also has "unlimited" columns as long as each database is < 1GB or so.
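
The column limits the reviewer mentions only bite when every SNP gets its own column. One way around them is a "long" layout with one row per sample and SNP. The following is a minimal sketch of that idea using Python's built-in sqlite3 module; it is not the format proposed in the technical report, and the table name, column names and sample data are invented for illustration.

```python
import sqlite3


def build_genotype_db(records):
    """Create an in-memory SQLite database with one row per (sample, SNP).

    records: iterable of (sample_id, snp_id, allele_count) tuples, where
    allele_count is 0, 1 or 2 copies of the coded allele, or None if missing.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE genotype (
            sample_id    TEXT NOT NULL,
            snp_id       TEXT NOT NULL,
            allele_count INTEGER,
            PRIMARY KEY (snp_id, sample_id)
        )
        """
    )
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def genotype_counts(conn, snp_id):
    """Return {allele_count: number of samples} for one SNP, ignoring missing calls."""
    rows = conn.execute(
        """
        SELECT allele_count, COUNT(*)
        FROM genotype
        WHERE snp_id = ? AND allele_count IS NOT NULL
        GROUP BY allele_count
        """,
        (snp_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    # Hypothetical toy data: three samples, two SNPs, one missing call.
    records = [
        ("s1", "rs0001", 0),
        ("s2", "rs0001", 1),
        ("s3", "rs0001", 1),
        ("s1", "rs0002", 2),
        ("s2", "rs0002", None),
    ]
    conn = build_genotype_db(records)
    print(genotype_counts(conn, "rs0001"))
```

The price of this layout is a very large number of rows, so whether it beats a specialised file format for a given dataset has to be tested.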

Genetics

Best of two worlds

17.11.2006 admin

Finally, linkage and association data can be used together, thanks to new downloadable software that relies on genotype inference.

It reduces the number of genotyping reactions and increases the power of genome-wide association studies. Our method combines sparse marker data from a linkage scan and high-resolution SNP genotypes for several individuals to infer genotypes for related individuals.

Sure, we

could already test association only in linked families
knew that linkage genome scans will improve the power of association
could evaluate by stepc if a polymorphism explains a linkage result

but this seems to be the best recycling for our old-fashioned linkage data. Yea, yea.
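
To get a feeling for why sparse linkage markers help to infer dense SNP genotypes in relatives, here is a toy sketch for a sib pair. It assumes the number of alleles shared identical by descent (IBD) at the SNP position is known exactly and that there are no genotyping errors. This is only an illustration of the principle, not the algorithm used by the software; the function name and the allele labels are made up.

```python
from itertools import combinations_with_replacement

ALLELES = ("A", "B")  # a biallelic SNP with hypothetical allele labels
ALL_GENOTYPES = {"".join(g) for g in combinations_with_replacement(ALLELES, 2)}


def compatible_genotypes(sib_genotype, ibd):
    """Genotypes an ungenotyped sibling can have, given a genotyped sibling.

    sib_genotype: genotype of the typed sibling, e.g. "AB" (order is ignored).
    ibd: number of alleles the pair shares identical by descent (0, 1 or 2).
    """
    sib = "".join(sorted(sib_genotype))
    if sib not in ALL_GENOTYPES:
        raise ValueError(f"unknown genotype: {sib_genotype!r}")

    if ibd == 2:
        # Both alleles shared: the genotype is fully determined.
        return {sib}
    if ibd == 1:
        # One allele is copied from the typed sibling, the other can be anything.
        return {"".join(sorted(shared + other)) for shared in set(sib) for other in ALLELES}
    if ibd == 0:
        # Nothing shared: the typed sibling tells us nothing directly.
        return set(ALL_GENOTYPES)
    raise ValueError("ibd must be 0, 1 or 2")


if __name__ == "__main__":
    for ibd in (2, 1, 0):
        print(ibd, sorted(compatible_genotypes("AA", ibd)))
```

In real families the IBD state is itself only known with some probability, so a practical method has to weight the possible genotypes instead of listing them.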