{"id":449,"date":"2006-11-29T15:23:53","date_gmt":"2006-11-29T13:23:53","guid":{"rendered":"http:\/\/www.wjst.de\/blog\/2006\/11\/29\/number-cruncher\/"},"modified":"2007-02-22T10:35:11","modified_gmt":"2007-02-22T08:35:11","slug":"number-cruncher","status":"publish","type":"post","link":"https:\/\/www.wjst.de\/blog\/sciencesurf\/2006\/11\/number-cruncher\/","title":{"rendered":"Number cruncher"},"content":{"rendered":"<p>In a recent <a href=\"https:\/\/www.wjst.de\/blog\/2006\/10\/24\/free-for-all\/\">blog post<\/a> I described high-resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling, many people nowadays stick to commercial SQL databases that have plugins for PD software.<br \/>\nMy recommendation is to save that money and store the data in a special format that may be more useful for these large datasets; details are in a technical report that I will upload later today. In the meantime you can already check out some software tools for working with these large datasets. 
This is what I know so far:<\/p>\n<ul>\n<li>David Duffy has recompiled his sibpair program |<a href=\"http:\/\/www.qimr.edu.au\/davidD\/davidd.html\">link<\/a><\/li>\n<li>Geron(R) has something under development |<a href=\"http:\/\/neurogenetics.nia.nih.gov\/download.html\">link<\/a><\/li>\n<li>PLINK is available from the Harvard group |<a href=\"http:\/\/pngu.mgh.harvard.edu\/~purcell\/plink\/index.shtml\">link<\/a><\/li>\n<li>Jochen Hampe and colleagues offer Genomizer |<a href=\"http:\/\/www.ikmb.uni-kiel.de\/genomizer\/\">link<\/a><\/li>\n<li>Franz R&uuml;schendorf developed Alohomora |<a href=\"http:\/\/gmc.mdc-berlin.de\/alohomora\/\">link<\/a><\/li>\n<li>I remember SNPGWA, a development at Wake Forest University |no link yet<\/li>\n<li>There will be an R\/Bioconductor package by Rob Scharpf |no link yet<\/li>\n<li>R library GenABEL by Yurii Aulchenko |<a href=\"http:\/\/mga.bionet.nsc.ru\/nlru\/GenABEL\/\">link<\/a><\/li>\n<li>R library SNPassoc by Juan Gonz\u00e1lez |<a href=\"http:\/\/davinci.crg.es\/estivill_lab\/snpassoc\">link<\/a><\/li>\n<\/ul>\n<h3>Addendum<\/h3>\n<p>A <a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/paper\/bioinf.pdf\">technical report<\/a> on how to work with large SNP datasets is now also available in my paper section. Alternatives to what I am suggesting in this paper have been set out by an anonymous reviewer:<\/p>\n<blockquote><p>\nFor R users, if SQLite limits are reached, hdf5 (http:\/\/hdf.ncsa.uiuc.edu\/HDF5\/) may be one way forward for really huge table structures since there is an R interface already available. PostgreSQL column limit depends on data type with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite with no obvious column count limit. 
Metakit is not mentioned \u2013 it is column oriented and probably also has \u201cunlimited\u201d columns as long as each database is &lt; 1GB or so.\n<\/p><\/blockquote>\n\n<p>&nbsp;<\/p>\n<div class=\"bottom-note\">\n  <span class=\"mod1\">CC-BY-NC Science Surf, accessed 30.07.2026<\/span>\n <\/div>","protected":false},"excerpt":{"rendered":"<p>In a recent blog post I described high-resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling, many people nowadays stick to commercial SQL databases that have plugins for PD software. My recommendation is to save that money &hellip; <a href=\"https:\/\/www.wjst.de\/blog\/sciencesurf\/2006\/11\/number-cruncher\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Number cruncher<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,9],"tags":[2949,2945,4,322,211,323,207,325,324],"class_list":["post-449","post","type-post","status-publish","format-standard","hentry","category-genetics-biology","category-computer-software","tag-computer-software","tag-genetics-biology","tag-population-epidemiology","tag-bioconductor","tag-high_resolution","tag-pd_software","tag-snp","tag-software_tools","tag-sql_databases"],"_links":{"self":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/comments?post=449"}],"version-h
istory":[{"count":0,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/449\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/media?parent=449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/categories?post=449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/tags?post=449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}