A revival of DNA pooling

I have always been interested in DNA pooling; we developed methods for it on the mass spec platform and applied them to the HLA region. I had, however, doubts whether testing pools with less accurate methods like chip hybridization would work. The January issue of the AJHG now has a fascinating article on how pooling may work even on the Affymetrix platform. Yea, yea.


DNA pooling can even be used in a family context; see Wen Chung Lee in Cancer Epidemiology 2005 or Neil Risch in Genome Research 1998.
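One widely used way to turn a pooled measurement into an allele frequency is to take the ratio of the two allele signals (mass spec peak areas or array intensities) and correct it with a factor k, estimated as the mean A/B signal ratio in individually typed heterozygotes, where the true ratio is 1:1. The sketch below assumes exactly that simple model; the function names are hypothetical, and it is not the method of the AJHG article or of our mass spec protocol.

```python
def k_correction(het_a, het_b):
    """Mean A/B signal ratio in known heterozygotes (true ratio 1:1).

    Hypothetical helper: k > 1 means allele A gives a stronger signal.
    """
    ratios = [a / b for a, b in zip(het_a, het_b)]
    return sum(ratios) / len(ratios)


def pooled_allele_frequency(pool_a, pool_b, k):
    """Estimate the frequency of allele A in a pool, correcting allele B's signal by k."""
    return pool_a / (pool_a + k * pool_b)
```

With heterozygote signals running 1.5-fold higher for allele A, a pool reading of 300:200 is corrected from a naive 0.6 to 0.5.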

Genetic tango

ScienceDaily reports an interesting study on how a protein recognizes a particular DNA binding site:

Structural changes in both the protein and DNA, sometimes with the DNA within the complex kinked or sharply bent, allow for the specific contacts needed for a tight DNA-protein fit. Scientists think DNA is largely passive in this genetic tango. But new findings by Anjum Ansari, associate professor of biophysics at the University of Illinois at Chicago, suggest DNA may not be the wallflower that many had assumed…

Yea, yea.

454 or 0815 or 4911

In a recent book chapter we discussed new genotyping and sequencing technologies. Our concluding remarks haven't changed much: real-time detection of single molecules still does not seem possible, micro-electrophoresis-based methods have already reached their limit, and sequencing by hybridization has severe restrictions when it comes to de novo (or re-) sequencing of whole genomes. At least for research purposes, I expect that whole-genome resequencing will replace current SNP-based disease mapping. So far, sequencing by synthesis seems to be one of the few high-throughput (HT) methods that already works at that scale. The 454 platform workflow consists of three consecutive steps:

  1. DNA library preparation starts with genomic DNA: after fragmentation and adaptor ligation, the single-stranded template DNA (sstDNA) libraries are isolated and assessed (takes ~4 hours)
  2. the sstDNA is emulsified, amplified and recovered on beads before the sequencing primers are annealed (takes 1 day)
  3. after washing, so-called PicoTiterPlates are prepared and a process is started that looks like a pyrosequencing reaction (correct me if I am wrong): a pyrophosphate-dependent enzyme cascade emits light that is recorded by a CCD camera watching each of the ~200,000 wells (takes ~6 hours according to a recent paper in GenomXPress 2/06; figures at 454.com)
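The light signal in step 3 can be read as a flowgram: nucleotides are flowed one species at a time in a fixed cyclic order, and the light emitted per flow is roughly proportional to the number of identical bases incorporated. A minimal base-calling sketch, assuming the flow order TACG and idealized intensities; real 454 base calling also corrects for noise and phasing, which this ignores.

```python
FLOW_ORDER = "TACG"  # assumed cyclic flow order of nucleotide species


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Turn per-flow light intensities into a base sequence.

    Each flow offers one nucleotide species; the signal is taken to be
    proportional to the homopolymer length incorporated, so it is simply
    rounded to the nearest whole number of bases.
    """
    bases = []
    for i, intensity in enumerate(flowgram):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(int(round(intensity)), 0)  # negative noise counts as zero
        bases.append(nucleotide * count)
    return "".join(bases)
```

For example, the flowgram `[1.02, 0.05, 2.1, 0.9, 0.0, 1.1, 0.03, 2.95]` is called as `TCCGAGGG`. Homopolymer runs such as the final GGG are exactly where rounding intensities becomes error-prone.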

With an average read length of 100 bp and 200,000 fragments (20 Mb in total) in 6 hours, the throughput is about 60-fold that of Sanger sequencing. The recent Neanderthal paper gives five reasons why the 454 sequencing platform is extremely well suited for analyses of bulk DNA extracted from ancient remains.
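The arithmetic behind these numbers, as a small sketch using only the figures quoted above (read length, reads per run and the step durations, with "1 day" taken as 24 hours); the Sanger rate is back-calculated from the 60-fold claim rather than measured.

```python
READ_LENGTH_BP = 100
READS_PER_RUN = 200_000
RUN_HOURS = 6          # sequencing step only (step 3)
PREP_HOURS = 4 + 24    # library preparation + emulsion amplification (steps 1 and 2)

bases_per_run = READ_LENGTH_BP * READS_PER_RUN         # 20,000,000 bp = 20 Mb
mb_per_run_hour = bases_per_run / 1e6 / RUN_HOURS       # ~3.3 Mb per sequencing hour
total_hours = PREP_HOURS + RUN_HOURS                    # from genomic DNA to reads
mb_per_total_hour = bases_per_run / 1e6 / total_hours   # ~0.59 Mb per hour end to end

# Sanger throughput implied by the quoted ~60-fold advantage (derived, not measured)
implied_sanger_mb_per_hour = mb_per_run_hour / 60

print(f"{bases_per_run / 1e6:.0f} Mb per run, {mb_per_run_hour:.2f} Mb/h during the run, "
      f"{mb_per_total_hour:.2f} Mb/h end to end, implied Sanger {implied_sanger_mb_per_hour:.3f} Mb/h")
```

Counting the preparation steps, the end-to-end rate is roughly a sixth of the rate during the run, so the 60-fold figure refers to the 6-hour run rather than to the whole workflow.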

  1. … it circumvents bacterial cloning, in which the vast majority of initial template molecules are lost during transformation and establishment of clones.
  2. … because each molecule is amplified in isolation from other molecules it also precludes template competition, which frequently occurs when large numbers of different DNA fragments are amplified together.
  3. … its current read length of 100–200 nucleotides covers the average length of the DNA preserved in most fossils.
  4. … it generates hundreds of thousands of reads per run, which is crucial because the majority of the DNA recovered from fossils is generally not derived from the fossil species, but rather from organisms that have colonized the organism after its death.
  5. … because each sequenced product stems from just one original single-stranded template molecule of known orientation, the DNA strand from which the sequence is derived is known. This provides an advantage over traditional PCR from double-stranded templates, in which the template strand is not known, because the frequency of different nucleotide misincorporations can be deduced … damage that affects different bases differently.
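Point 5 is what makes damage patterns measurable. A small sketch, assuming reads have already been aligned to a reference without indels and are given in the orientation of the original template strand (the alignment itself is not shown): it tallies each reference-to-read substitution type separately, so that C->T and G->A are not lumped together as they would be with double-stranded PCR templates.

```python
from collections import Counter


def misincorporation_spectrum(pairs):
    """Count reference->read substitution types over aligned (reference, read) pairs.

    Assumes gapless, equal-length alignments in template-strand orientation;
    positions containing a gap character '-' are skipped.
    """
    counts = Counter()
    for ref, read in pairs:
        for r, q in zip(ref, read):
            if r != q and "-" not in (r, q):
                counts[f"{r}->{q}"] += 1
    return counts
```

On the made-up alignments `[("ACGTCCA", "ATGTCTA"), ("GGCA", "AGCA")]` it counts two C->T and one G->A substitution.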

Except for the short read length, most of these advantages would also benefit large-scale resequencing projects in human individuals. My main argument for starting resequencing projects ASAP: so far we have not achieved a dense resolution of the genome, while deep resequencing projects (for example at the CRP locus) produced astonishing results. We do not even know what is going on in the “noncoding” regions. Finally, deletions and CNVs have been largely neglected; there is another look at this question in the EJHG.
The question remains what 454 means; my inquiry is still pending. As far as I know, 0815 was a machine gun in World War I and is a synonym for something monotonously boring, while 4911 is a street number and stands for a perfume. Yea, yea.

Human breeding

There have always been attempts to make humans better, an idea that has attracted people in nearly every century. Ovid's Galatea was created from a statue, Goethe’s homunculus originated in a test tube, Mary Shelley’s monster was made from corpses, Bulgakov’s proletarian was derived from a dog, and Sloterdijk’s Menschenzüchtung (human breeding) from a fancy idea. There is only a minor difference at the end of 2006: the technical possibilities of genetic testing and genetic engineering are much further developed. Yea, yea.

Early biology blog

For many years Sydney Brenner wrote a column in Current Biology called Loose Ends (Nature called him a man who talks a lot, which is certainly unfair or at least not empirically proven: we all speak between 20,000 and 40,000 words every day). Loose Ends seems to have influenced a whole generation of biologists, kitsch biology included. Yea, yea.

Another layer of complexity in gene regulation

Yesterday evening I attended an excellent presentation by Nikolaus Rajewsky about microRNAs, small noncoding RNAs that are thought to play a role in posttranscriptional regulation. Nikolaus moved from New York just three months ago to follow Jens Reich at the MDC in Berlin. Basically, he talked about his recent “l(ou)sy” paper and the “SNP” paper after giving a rather detailed history of the field. It started in 1950 with Jacob and Monod, continued in 1960 with Britten and Davidson and in 1970 with Haywood (who even quit science out of disappointment), and finally led to 1990, when the Ambros and Ruvkun labs discovered nematode microRNAs. Current research is mainly done in the Tuschl, Bartel, Cohen, Lander and Rajewsky labs, which produced the bulk of the 800 or so papers published in 2006.
Approximately 30% of genes are influenced by microRNAs; both the total number of microRNA sites (~22,000) and the number of human microRNAs (328) are under heavy debate, and each microRNA regulates ~200 genes. Unfortunately, there is still no high-throughput technique to detect targets. Free energy does not give good predictions either, and even mismatches at the 5′ end of the mRNA are possible (individual predictions can be obtained from PicTar, which uses a hidden Markov model).
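To make the target-prediction problem concrete, here is the simplest possible heuristic: perfect Watson-Crick pairing of the miRNA seed (nucleotides 2-8) somewhere in a 3′UTR. This is a swapped-in illustration, not PicTar's hidden Markov model; it ignores G:U wobble, conservation, site context and the mismatches mentioned above, which is exactly why such naive scans both miss and over-call targets. The sequences in the usage example are an example miRNA modeled on let-7 and a made-up UTR.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, start=2, end=8):
    """Return 0-based positions in a 3'UTR that pair perfectly with the miRNA seed.

    The seed is taken as nucleotides start..end (1-based, inclusive) of the
    miRNA; a target site is the reverse complement of that seed.
    """
    site = reverse_complement(mirna[start - 1:end])
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]
```

`seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCGGGCUACCUCAA")` returns `[3, 13]`, the two copies of the site CUACCUC.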
If I understood that correctly, miRNAs are the feedback mechanism at the RNA level (with transcription factors at the DNA level). He mentioned three classes known so far in humans: oncomiRNAs, miRNA 375 acting on myotrophin, and miRNA 122 acting on cholesterol (quite interesting, as this was described recently in the NEJM). The experimental knockdown of this liver-specific microRNA in mouse shows ~300 up- and ~300 downregulated genes. In approximately 50% of cases, upregulated genes have one miRNA nucleus (seed) site, while downregulated genes have even fewer binding sites than average. There is no overrepresented GO category among the upregulated genes, but cholesterol is highly significant among the downregulated genes, whatever that means. The action of miRNAs seems to be heavily context dependent, giving us many more questions than answers. Yea, yea.
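Statements such as "cholesterol genes are overrepresented among the downregulated genes" are usually backed by a one-sided hypergeometric test: how surprising is it to find k annotated genes among n regulated ones if K of all N genes on the array carry the annotation? A minimal standard-library sketch; it does not correct for testing many GO categories at once, which a real GO analysis must do, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N genes total, K annotated, n drawn).

    k: annotated genes among the n regulated genes
    K: annotated genes among all N genes on the array
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

For instance, with N = 10 genes of which K = 4 are annotated, drawing n = 3 genes that include k = 2 annotated ones gives p = 40/120 = 1/3.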

Epidemiology in wartime

What was the best paper in 2006? I am voting for a Lancet paper by Gilbert Burnham, Riyadh Lafta, Shannon Doocy and Les Roberts. Between May and July 2006, they carried out a national cross-sectional cluster sample survey of mortality in Iraq. Data were gathered from 1849 households, reporting 1474 births and 629 deaths. They estimated that, as of July 2006, there had been 654 965 excess Iraqi deaths as a consequence of the war.
If you can’t imagine what it means to work in war regions, you may read the biography of Robert Capa (1913-1954), who worked as a photographer in many wars and died in the First Indochina War. He took wonderful photos together with his girlfriend Gerda Taro, one of the first women photographers, who had already died in a tank accident in the Spanish Civil War at the age of 26. I will pay for the flowers if you visit her grave at Père Lachaise in Paris.
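The headline number comes from comparing mortality rates before and after the invasion and scaling the difference to the population and the time at risk. The sketch shows only that arithmetic, with deliberately round, hypothetical inputs (not the study's rates or population); the Lancet analysis additionally had to account for the cluster design when computing confidence intervals.

```python
def excess_deaths(rate_before, rate_after, population, years):
    """Excess deaths from crude mortality rates given per 1000 person-years.

    All inputs in the usage example below are hypothetical round numbers.
    """
    return (rate_after - rate_before) * population * years / 1000
```

With a hypothetical rise from 5 to 10 deaths per 1000 per year in a population of one million over two years, `excess_deaths(5.0, 10.0, 1_000_000, 2)` gives 10,000 excess deaths.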



As expected, the study drew criticism (Science, doi:10.1126/science.316.5823.355a).

Aleph (Codex Sinaiticus) online in 2009

So far we could admire the wonderful Gutenberg Bible in Göttingen (1454).


Current students of theology seem to have much better tools… The exciting news is that Codex Sinaiticus (dating back to around 350?) is currently being digitized; the New Testament and half of the Old Testament will be available in two years or so.
In 1844 Konstantin von Tischendorf discovered the Codex in a wastepaper basket at Saint Catherine's Monastery on Mount Sinai (I also visited the monastery some years ago but did not find anything useful in the basket). He was allowed to take 43 of the 129 sheets to Leipzig. On another visit he discovered even more leaves, which were presented to Tsar Alexander. In 1933 the Soviet Union sold 347 sheets to the British Museum in London; 6 sheets are still in St. Petersburg. In 1975 another 38 pages were found, which are still at Saint Catherine's. At the moment the British Library London, the Universitätsbibliothek Leipzig, the Russian National Library St. Petersburg and Saint Catherine's are working together on a digital edition of the manuscript, including the use of hyperspectral imaging to uncover erased or faded text. This is quite important, as Codex Sinaiticus (together with Codex Vaticanus) has heavily influenced our modern critical text of the New Testament.



Working in a field where hundreds of papers are published every year and none is ever retracted, I greatly appreciate a letter in Science.

…D1 dopamine receptor (D1R)-stimulated intracellular Ca2+ release was attributed to a direct interaction with calcyon … the ability of calcyon and D1Rs to co-immunoprecipitate when co-expressed in cells as reported presumably stems from the association of both proteins with clathrin-coated vesicles…thus, the isolation of the calcyon clone in a Y2H screen with D1Rs appears to have been adventitious…

In my opinion this comment advances science and gives the authors much higher credibility than any further paper would. Yea, yea.

A new analysis method for blood doping

I was deeply disappointed this summer when I heard that Jan Ullrich would not participate in the Tour de France 2006 (although I am watching many other athletes, I wish him all the best for next year). Later that year I heard a presentation in Bern about blood banking, about how cells struggle to survive after leaving the body; of course, we had already done first gene expression experiments on this back in 2002.
So here is my idea for identifying autologous blood transfusion: blood separated from the body will develop a unique RNA expression pattern that can be measured by conventional cDNA chips. Identifying this pattern (possibly only 10 upregulated RNAs) in the blood of an athlete could indicate autologous blood transfusion.
I guess there is only a small chance of re-identifying this pattern after retransfusion into the body, as the blood is diluted around 1:30 and the RNA is degraded immediately.
However, some retransfused cells will probably maintain their death-struggle program for some time, leaving a good chance to profile them even after a couple of days, whether or not they have visited a freezer. Wikipedia is correct:

In the case of detecting blood transfusions, a test for detecting homologous blood transfusions (from a donor to a doping athlete) has been in use since 2000. The test method is based on a technique known as fluorescent-activated cell sorting. By examining markers on the surface of blood cells, the method can determine whether blood from more than one person is present in an athlete’s circulation.

At present there is no accepted way of detecting autologous transfusions (using the athlete’s own RBCs) but research is in progress and the World Anti-Doping Agency (WADA) has promised that a test will eventually be introduced. The test method and its introduction date are to be kept secret in order to avoid tipping off doping athletes.

A potential example application may be found in the literature. There is no need to keep this idea secret, as it will be nearly impossible to modify a particular gene expression pattern of a particular cell type.
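The idea can be sketched in two pieces, both under explicit assumptions: a score for a sample as the mean z-score of a handful of "storage" signature genes against untransfused reference samples, and the dilution arithmetic behind the 1:30 argument, assuming simple linear mixing of transcript levels and ignoring the RNA degradation mentioned above. Gene names and the signature itself are hypothetical; no validated gene list is implied.

```python
import math
from statistics import mean, stdev


def signature_score(sample, reference_samples, signature_genes):
    """Mean z-score of hypothetical signature genes relative to untransfused references.

    sample and each reference are dicts mapping gene name -> expression level.
    """
    z_scores = []
    for gene in signature_genes:
        ref = [r[gene] for r in reference_samples]
        z_scores.append((sample[gene] - mean(ref)) / stdev(ref))
    return mean(z_scores)


def diluted_log2_shift(fold_change_in_bag, fraction_transfused=1 / 30):
    """log2 fold change left after stored blood is mixed into the circulation.

    Assumes linear mixing: fraction * fold + (1 - fraction) * baseline (= 1).
    """
    return math.log2(fraction_transfused * fold_change_in_bag + (1 - fraction_transfused))
```

Under these assumptions a transcript that is 30-fold up in stored blood would be only about 2-fold up after 1:30 dilution (a log2 shift near 1): detectable in principle, but only for strongly induced genes.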

Addendum 5/2/2010

Finally, WADA recognizes the value of gene signatures in a new Science editorial.

All roads to NLM

This is not just an addendum to my previous posts “free-for-all” and “number-cruncher”: the 12 Dec NIH press release links to a new and exciting database:

NIH Launches dbGaP, a Database of Genome Wide Association Studies
The National Library of Medicine (NLM), part of the National Institutes of Health (NIH), announces the introduction of dbGaP, a new database designed to archive and distribute data from genome wide association (GWA) studies. GWA studies explore the association between specific genes (genotype information) and observable traits, such as blood pressure and weight, or the presence or absence of a disease or condition (phenotype information).


Addendum 29-5-07: dbGaP suffers from some broken links, but the content is improving!


Christmas present – your digital book copy

Maybe you don't want to wait until Google Books has it; maybe you want higher quality. Here is the web address: digiwubu.gdz-cms.de at the Niedersächsische Staats- und Universitätsbibliothek Göttingen, where I once studied theology.
Scanning any book published before 1900 should be no problem, but you can also ask for books published later. The cost is 0.25 € per page plus 5 € for handling and shipping a CD.
The “Google Books Library Project” is currently scanning more than 15 million volumes at Harvard, Stanford and Oxford; German scan factories are in Göttingen and Munich. Yea, yea.
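At the quoted prices the cost of a scan grows linearly with the number of pages; a tiny sketch with the prices as quoted above (which may of course change):

```python
def scan_cost_eur(pages, per_page=0.25, handling=5.0):
    """Total cost in euros: per-page scanning fee plus a flat handling/shipping fee."""
    return pages * per_page + handling
```

A 300-page book would thus cost 300 × 0.25 € + 5 € = 80 €.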

Exodus of science from Germany after 1933

A book that crossed my desk only very recently is about the exodus of science from Berlin after 1933. As a child I never understood the second commandment, in which God said to Moses:

Ex 20:5-6 I am a jealous God, punishing the children for the sin of the fathers to the third and fourth generation of those who hate me, but showing love to a thousand {generations} of those who love me and keep my commandments.

I always thought the idea of collective guilt was unfair. Nevertheless, reading this book (published back in 1994 by Walter de Gruyter) gives a deeper understanding of how science is affected for many generations by the displacement of its most prominent scientists.
Particularly important in this book is the first chapter, by Hubenstorf and Walther, which highlights the situation in Berlin. Following World War I, Berlin had become the undisputed center of most scientific disciplines in the German-speaking territory. In some disciplines Leipzig, Vienna or Munich may have been competitors; economics was strong in Kiel, mathematics and physics in Göttingen, history in Marburg. For most scientists, however, Berlin was the highly desired “endpoint” of their career. The Friedrich-Wilhelms-Universität was the largest university, but there were many more scientific organizations, such as the Technical University Charlottenburg, the Deutsche Hochschule für Politik, the Preußische Akademie der Wissenschaften and the Kaiser-Wilhelm-Gesellschaft (which is covered in more detail in another excellent chapter).
Medicine, with its large number of Jewish scientists, was hit hardest during the Nazi period. The resulting repercussions in the realm of science are described at different levels. Starting with a typology of how scientific institutions were transformed, new disciplines established and a military science sector set up, the authors give many historical details about scientific publishing and the organization of displaced scientists.
The cancer research department of the medical faculty fired 12 or 13 scientists; the hygiene institute dismissed 8 of 12 scientists, including the later famous biochemist Erwin Chargaff. The Lankwitz hospital fired all of its physicians, Neukölln 67%, Friedrichshain 62% and Moabit 56%.
It is a terrible story: you can read how the editor of the Deutsche Medizinische Wochenschrift, Paul Osswald Wolff, was replaced by a Nazi supporter. The Karger publishing house even moved from Berlin to Basel (where it still resides today).
As all good science is strongly connected to teaching, it may be understood that breaking this tradition has led to a punishing of the children for the sin of the fathers to the third and fourth generation. Science politicians may also recognize the flip side of spending money on science: they will be blessed for thousands of generations.


R parallel computing

After several unsuccessful attempts to set up a parallel computing platform for the R statistical software, here is my current approach, largely influenced by a recent article on cluster programming by Oliver Lau in c’t 6/06 (sorry, no online version). My primary interest is the R package snow (or snowFT), which offers the function clusterApplyLB, a load-balancing version of clusterApply. This function is all I need for my R programs.
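What clusterApplyLB does can be illustrated outside R: if there are more tasks than nodes, the first tasks are handed out one per node, and each time a node finishes, it receives the next waiting task, so fast and slow jobs balance out. The sketch below mimics that scheduling idea in Python with a thread pool; it is an analogue for illustration, not snow's API, and for CPU-bound work one would use processes (or, in R, the snow cluster itself) instead of threads.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def apply_load_balanced(func, tasks, n_workers=4):
    """Apply func to every task, giving the next task to whichever worker frees up first.

    At most n_workers tasks are in flight at once; results come back in the
    original task order, as with clusterApplyLB.
    """
    results = [None] * len(tasks)
    pending = {}  # future -> index of its task
    next_index = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Prime each worker with one task.
        while next_index < len(tasks) and len(pending) < n_workers:
            pending[pool.submit(func, tasks[next_index])] = next_index
            next_index += 1
        # Whenever a task finishes, store its result and hand out the next one.
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                results[pending.pop(future)] = future.result()
                if next_index < len(tasks):
                    pending[pool.submit(func, tasks[next_index])] = next_index
                    next_index += 1
    return results
```

For example, `apply_load_balanced(lambda x: x * x, [1, 2, 3, 4, 5], n_workers=2)` returns `[1, 4, 9, 16, 25]` regardless of which worker finished first.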
Now it gets more complicated: for MPI clusters, library(snow) depends on library(Rmpi). Hao Yu has an excellent description at www.stats.uwo.ca/faculty/yu/Rmpi of how to set up the MPI layer with MPICH2. I am currently experimenting with DeinoMPI, a closely related high-performance MPI implementation for Windows. According to its developer, David Ashton, it has the following advantages:

First, DeinoMPI does not require MPI applications to be started by mpiexec in order to call MPI_Comm_spawn so you could load Rmpi from the Rgui.exe without having to bother with calling mpiexec. Second, DeinoMPI loads the user profile when starting applications so if you query the user’s temporary directory you will get the user specific path and not the Windows system temp directory. Third, DeinoMPI handles arguments with spaces correctly if you quote them so you can pass environment variables with spaces in them. Fourth, DeinoMPI allows you to use the MPI Info object to pass extra options to MPI_Comm_spawn like drive mappings. So you could create an MPI_Info object and set wdir=z:\ and map=z:\\server\share. Then pass this info object in with the MPI_Comm_spawn command and you could map a network drive and launch an executable from this drive.

So far the Rmpi package is compiled for MPICH2 (not DeinoMPI), so it won’t run with only DeinoMPI installed, but there is a good chance that this will change in the near future.
Further useful references are in R News 2003, p. 21 (cran.r-project.org/doc/Rnews) and a paper in the UW Biostatistics Working Paper Series on “Simple Parallel Statistical Computing in R” by Anthony Rossini and Luke Tierney.
BTW, the haplotypes of the HapMap project were computed on a 110-node cluster provided by Peter Donnelly’s Mathematical Genetics Group (www.stats.ox.ac.uk), based at the Oxford Centre for Gene Function, and on a 128-node compute cluster provided by the Oxford e-Science Centre (e-science.ox.ac.uk) as part of the National Grid Service [to be cont’d…].



While waiting for genome-wide SNP data to be re-partitioned into LD blocks, I found this page with some neat programming tricks. It is part of Ben Fry's MIT dissertation on computational information design. Page 74 ff. has a history of the redesign of the widely used HaploView program.

The design of these diagrams was first developed manually to work out the details, but in interest of seeing them implemented, it was clear that HaploView needed to be modified directly in order to demonstrate the improvements in practice. Images of the redesigned version are seen on this page and the page following. The redesigned version was eventually used as the base for a subsequent ‘version 2.0’ of the program, which has since been released to the public and is distributed as one of the analysis tools for the HapMap [www.hapmap.org] project.
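Since HaploView's block partitions are built on pairwise linkage disequilibrium, here is a sketch of the two standard pairwise measures, D′ and r², computed from phased two-SNP haplotype counts. It assumes the haplotypes are already known (HaploView estimates haplotype frequencies with an EM algorithm, not shown) and that both SNPs are polymorphic; it gives point estimates only, whereas HaploView's default block definition relies on confidence bounds on D′.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (|D'|, r^2) for two biallelic SNPs from phased haplotype counts.

    Assumes both SNPs are polymorphic (otherwise the denominators are zero).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n   # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B      # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r_squared
```

For 100 haplotypes split 40/10/10/40, this gives D′ = 0.6 and r² = 0.36, a reminder that the two measures can differ considerably for the same pair of SNPs.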