Science Surf

In a recent blog post I described high-resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling, many people nowadays stick to commercial SQL databases that have plugins for PD software.
My recommendation is to save that money and store the data in a special format that may be more useful for these large datasets; details are in a technical report that I will upload later today. In the meantime you can already check some software tools for working with these large datasets. This is what I know so far:

David Duffy has recompiled his sibpair program |link
Geron(R) has something under development |link
Jochen Hampe and colleagues offer Genomizer |link
Franz Rüschendorf developed Alohomora |link
I remember SNPGWA, a development at Wake Forest University |no link yet
There will be an R-Bioconductor package by Rob Scharpf |no link yet
R library GenABEL by Yurii Aulchenko |link
R library SNPassoc by Juan González |link

Addendum

A technical report on how to work with large SNP datasets is now also available in my paper section. Alternatives to what I am suggesting in this paper have been set out by an anonymous reviewer:

For R users, if SQLite limits are reached, hdf5 (http://hdf.ncsa.uiuc.edu/HDF5/) may be one way forward for really huge table structures since there is an R interface already available. The PostgreSQL column limit depends on data type, with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite with no obvious column count limit. Metakit is not mentioned; it is column oriented and probably also has “unlimited” columns as long as each database is < 1GB or so.
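
To make the column-limit issue a bit more concrete: one common way around it, independent of the database engine, is to avoid the wide "one column per SNP" layout and store one row per sample/SNP pair instead. The following is only a minimal sketch in Python with the standard sqlite3 module, not the format from my technical report. The table name `genotype`, the column names, the helper functions and the demo data are all made up for illustration.

```python
import sqlite3


def build_genotype_store(genotypes):
    """Load genotypes into an in-memory SQLite table in long format.

    `genotypes` maps sample_id -> {snp_id: allele_count}. All names here
    are hypothetical and chosen only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    # One row per (sample, SNP) pair, so the number of SNPs is bounded by
    # the row count rather than by the per-table column limit.
    conn.execute(
        "CREATE TABLE genotype ("
        " sample_id TEXT NOT NULL,"
        " snp_id TEXT NOT NULL,"
        " allele_count INTEGER,"
        " PRIMARY KEY (sample_id, snp_id))"
    )
    rows = [
        (sample_id, snp_id, count)
        for sample_id, snps in genotypes.items()
        for snp_id, count in snps.items()
    ]
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def genotypes_for_snp(conn, snp_id):
    """Return (sample_id, allele_count) pairs for one SNP, sorted by sample."""
    cur = conn.execute(
        "SELECT sample_id, allele_count FROM genotype"
        " WHERE snp_id = ? ORDER BY sample_id",
        (snp_id,),
    )
    return cur.fetchall()


# Made-up demo data: two samples, two SNPs, allele counts coded 0/1/2.
demo = {
    "s1": {"rs0001": 0, "rs0002": 1},
    "s2": {"rs0001": 2, "rs0002": 2},
}
conn = build_genotype_store(demo)
print(genotypes_for_snp(conn, "rs0002"))
```

The price of this layout is many more rows and some storage overhead per genotype, so for really large chips a binary or array-based format such as the hdf5 route mentioned by the reviewer may still be the better choice.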

CC-BY-NC Science Surf , accessed 17.07.2026
