{"id":498,"date":"2006-12-16T09:44:50","date_gmt":"2006-12-16T07:44:50","guid":{"rendered":"http:\/\/www.wjst.de\/blog\/downloads\/r-parallel-computing\/"},"modified":"2007-03-30T09:40:22","modified_gmt":"2007-03-30T08:40:22","slug":"r-parallel-computing","status":"publish","type":"post","link":"https:\/\/www.wjst.de\/blog\/sciencesurf\/2006\/12\/r-parallel-computing\/","title":{"rendered":"R parallel computing"},"content":{"rendered":"<p>Following several unsuccessful attempts to implement a parallel computing platform for the R statistical software, I describe here my current approach, which is largely influenced by a recent article on cluster programming in c&#8217;t 6\/06 by Oliver Lau (sorry, no online version). My primary interest is in the R package snow (or snow-ft), which offers the function <em>clusterApplyLB<\/em>. This function is all I need for my R programs.<br \/>\nNow it gets more complicated: library(snow) depends on library(Rmpi). Hao Yu has an excellent description at <a href=\"http:\/\/www.stats.uwo.ca\/faculty\/yu\/Rmpi\/\">www.stats.uwo.ca\/faculty\/yu\/Rmpi<\/a> of how to set up the MPI layer with <a href=\"http:\/\/www-unix.mcs.anl.gov\/mpi\/mpich2\/\">MPICH2<\/a>. I am currently experimenting with <a href=\"http:\/\/mpi.deino.net\/\">DeinoMPI<\/a>, a closely related high-performance MPI implementation for Windows. According to its developer, David Ashton, it has the following advantages:<\/p>\n<blockquote><p>\nFirst, DeinoMPI does not require MPI applications to be started by mpiexec in order to call MPI_Comm_spawn so you could load Rmpi from the Rgui.exe without having to bother with calling mpiexec. Second, DeinoMPI loads the user profile when starting applications so if you query the user&#8217;s temporary directory you will get the user specific path and not the Windows system temp directory. Third, DeinoMPI handles arguments with spaces correctly if you quote them so you can pass environment variables with spaces in them. 
Fourth, DeinoMPI allows you to use the MPI Info object to pass extra options to MPI_Comm_spawn like drive mappings.  So you could create an MPI_Info object and set wdir=z:\\ and map=z:\\\\server\\share.  Then pass this info object in with the MPI_Comm_spawn command and you could map a network drive and launch an executable from this drive.\n<\/p><\/blockquote>\n<p>So far, the Rmpi package is compiled for MPICH2 (not DeinoMPI), so it won&#8217;t run with only DeinoMPI installed, but there is a good chance that this will change in the near future.<br \/>\nFurther useful references are in the R newsletter 2003, p. 21 <a href=\"http:\/\/cran.r-project.org\/doc\/Rnews\/\">cran.r-project.org\/doc\/Rnews<\/a> and a paper in the UW Biostatistics Working Paper Series, &#8220;<a href=\"http:\/\/www.bepress.com\/uwbiostat\/paper193\/\">Simple Parallel Statistical Computing in R<\/a>&#8221; by Anthony Rossini and Luke Tierney.<br \/>\nBTW, haplotypes of the HapMap project were <a href=\"http:\/\/www.hapmap.org\/downloads\/phasing\/2005-03_phaseI\/\">computed<\/a> on a 110-node cluster provided by Peter Donnelly&#8217;s Mathematical Genetics Group <a href=\"http:\/\/www.stats.ox.ac.uk\/mathgenbio\/home.html\">www.stats.ox.ac.uk<\/a>, based at the Oxford Centre for Gene Function, and on a 128-node compute cluster provided by the Oxford e-Science Centre <a href=\"e-science.ox.ac.uk\">e-science.ox.ac.uk<\/a> as part of the National Grid Service [to be cont&#8217;d&#8230;]. 
<\/p>\n<p><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2007\/03\/mpi1.png\" title='' data-rel=\"key-image-0\" data-rl_title=\"\" data-rl_caption=\"\"><img src='https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2007\/03\/mpi1.png' alt='mpi1.png' \/><\/a><\/p>\n\n<div class=\"bottom-note\">\n  <span class=\"mod1\">CC-BY-NC Science Surf, accessed 05.04.2026<\/span>\n <\/div>","protected":false},"excerpt":{"rendered":"<p>Following several unsuccessful attempts to implement a parallel computing platform for the R statistical software, I describe here my current approach, which is largely influenced by a recent article on cluster programming in c&#8217;t 6\/06 by Oliver Lau (sorry, no online version). My primary interest is in the R package snow (or snow-ft), which offers &hellip; <a href=\"https:\/\/www.wjst.de\/blog\/sciencesurf\/2006\/12\/r-parallel-computing\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">R parallel computing<\/span> <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[1],"class_list":["post-498","post","type-post","status-publish","format-standard","hentry","category-genetics-biology","tag-uncategorized"],"_links":{"self":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/498","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/comments?post=498"}],"version-history":[{"count":0,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/498\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/media?parent=498"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/categories?post=498"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/tags?post=498"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}