Category Archives: Genetics

Dangerous DNA – nullomers or nullintells

New Scientist reports that

Could there be forbidden sequences in the genome – ones so harmful that they are not compatible with life? One group of researchers thinks so. Unlike most genome sequencing projects which set out to search for genes that are conserved within and between species, their goal is to identify “primes”: DNA sequences and chains of amino acids so dangerous to life that they do not exist.

Read the full article – I hesitate over what to say now, yea, yea.
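
How such "forbidden" sequences can be searched for is easy to sketch: enumerate every possible word of length k and report those that never occur. The Python snippet below is only an illustration under stated assumptions. The function name `find_nullomers` and the toy sequence are hypothetical, and a real analysis would scan whole genomes, count both strands and use longer words.

```python
from itertools import product


def find_nullomers(sequence: str, k: int) -> list[str]:
    """Return all DNA k-mers that never occur in `sequence`, in alphabetical order."""
    sequence = sequence.upper()
    # Collect every k-mer that does occur; windows containing N or other symbols are skipped.
    observed = {
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= set("ACGT")
    }
    # Enumerate all 4**k possible k-mers and keep the absent ones.
    return [
        "".join(kmer)
        for kmer in product("ACGT", repeat=k)
        if "".join(kmer) not in observed
    ]


toy_genome = "ACGTACGTTGCA"  # hypothetical toy sequence, not real data
missing = find_nullomers(toy_genome, k=2)
print(len(missing), missing)
```

In a short sequence most absent words are missing by chance alone, which is why the interesting candidates are the ones that stay absent across very large genomes.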

 

CC-BY-NC Science Surf accessed 05.11.2025

A tragedy if you don’t feel pain

A highly successful study of the molecular pathways of nociception (and identification of a loss-of-function mutation in SCN9A, which encodes the alpha subunit of a voltage-gated sodium channel) comes with a sad story from Northern Pakistan:

The index case for the present study was a ten-year-old child, well known to the medical service after regularly performing ‘street theatre’. He placed knives through his arms and walked on burning coals, but experienced no pain. He died before being seen on his fourteenth birthday, after jumping off a house roof.

It is so difficult for us humans to accept that pain has an important function in life, yea, yea.

 

Here’s lookin’ at you, kid

In German, Humphrey Bogart’s immortal expression “Here’s lookin’ at you, kid” was translated as “Ich seh’ Dir in die Augen, Kleines”, which translates back to “I look in your eyes, honey”. It seems that this was a spontaneous idea of Bogart’s on 3rd July 1942 in the Burbank Studios of Warner Brothers, if we believe this website.

I am now looking into your eyes with a new study by my long-term pen pal David Duffy: three OCA2 intron 1 SNPs (rs7495174-rs6497268-rs11855019) are sufficient to explain most human eye colors. The T-G-T/T-G-T diplotype is found in 62% blue/gray, 28% green/hazel and 10% brown eyes.

In a (soon to be published) study of European population stratification we also typed 2 OCA2 SNPs, but unfortunately not the same ones; I also checked the Affymetrix 500K panel, but it doesn’t include these SNPs either.

 

Read this if you own a Porsche

and read it also if you are less interested in German cars than in population-based DNA registries.

  1. In 1996 all 750 Porsche owners here in Munich were asked for a DNA identity test – the High Court in Karlsruhe later decided that this approach was adequate. All test results, however, were discarded after that screening.
  2. The reason: forensic DNA testing (invented in 1985 by Sir Alec Jeffreys) is only allowed for severe criminal offences and needs the decision of a judge.
  3. Some German authorities now want to change the rules – allowing DNA testing also in less severe cases, without the order of a judge, and with indefinite computer storage of DNA profiles. They are supported by police investigators who complain about unsolved crimes and by victims who fear repeat offenders (bolstered by the success in the Moshammer case).
  4. A considerable fraction of these DNA profiles contains errors – lab errors, chimeras, twins…
  5. Scientists fear that the willingness to take part in genetic studies will drop if these profiles can easily be matched to research data sets.
  6. A large part of the population is interested in genetic privacy and does not want the government to store their genetic profiles: ethnic data have been abused in the past to discriminate against and kill people, and politics and governments may change in the future.

The German BKA seems to have already 3900,000 DNA fingerprints in stock (other European countries store DNA profiles without any consent). Is there any way out? Here is an idea that I had on my daily morning bike ride:

  1. It is a waste of time and money to discard any test result – and the two sides have strong arguments, one for using these data and one for deleting them.
  2. Why couldn’t the DNA test result be encoded, saved on a credit-card-like “DNAcard” including a PIN, and given to the DNA owner before being deleted from the lab files? It would even be possible to reduce the information content – to allow only for a 99% probability of identification.
  3. This would leave every individual the right to identify himself or herself whenever he or she wants to be identified. In the case of a criminal prosecution, individuals may be asked to insert their “DNAcard” into a public reader that will check the hash of their profile against the hash of any wanted profile (without storing the individual profile on any computer).

Is that a solution?
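
To make the reader step concrete, here is a minimal Python sketch of the hash comparison. It rests on several assumptions that are not part of the idea above: the profile is a small set of STR loci with two repeat counts each, SHA-256 is the hash, and the names `profile_hash` and `reader_check` are invented for this illustration. The allele values are made up and the PIN check is left out.

```python
import hashlib


def profile_hash(profile: dict[str, tuple[int, int]]) -> str:
    """Hash an STR profile (locus -> two repeat counts) in a canonical order."""
    # Sort loci and alleles so the same profile always yields the same string.
    canonical = ";".join(
        f"{locus}:{min(a, b)}-{max(a, b)}"
        for locus, (a, b) in sorted(profile.items())
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reader_check(card_profile: dict[str, tuple[int, int]], wanted_hashes: set[str]) -> bool:
    """Compare the card's hash with the wanted list; the profile itself is not stored."""
    return profile_hash(card_profile) in wanted_hashes


# Hypothetical example values, not real profiles.
suspect = {"D3S1358": (15, 17), "vWA": (16, 18), "FGA": (21, 24)}
wanted_hashes = {profile_hash(suspect)}

card = {"FGA": (24, 21), "vWA": (18, 16), "D3S1358": (17, 15)}
print(reader_check(card, wanted_hashes))  # True
```

Two limits of this sketch are worth stating: an exact hash match breaks on a single typing error (see the lab errors mentioned above), and a plain hash of a short profile could in principle be guessed by trying all allele combinations, so a real design would need more than this.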

       

3 Rs

The 3 Rs of regulating animal research are Refinement (to minimize suffering), Reduction and Replacement (to minimize the number of animals used). A Nature news feature now has a critical appraisal of current knockout projects in which each of the 25,000 genes will be knocked out in the near future. Although current technology represents an advantage over earlier undirected mutagenesis projects

… the number of mice needed to establish a line stretches from 50 to several 100. On top of this, another couple of 100 animals are needed for basic analysis of genetic make-up and phenotype…

Many genes cannot be knocked out – some knockouts may even be lethal.
We are also less interested in the permanent destruction of genes in all tissues than in a conditional and temporally controlled shutdown of gene function.
And many researchers are less interested in the current 129 background than in BL6 (at least in immunology and allergology).
Finally (in human genetics) we are not dealing with knockouts but with multiple genomic variants of a gene. The question therefore is

Is the spirit of the knockout projects in line with [3R] principle[s]?

although I acknowledge that these industrial projects may generate many “nice to know” facts.
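
The scale behind that question can be put into numbers. The sketch below multiplies the quoted figures; since "several 100" and "a couple of 100" are not exact, the values 300 and 200 are my own assumptions and only give an order of magnitude.

```python
GENES = 25_000         # number of genes to be knocked out, as quoted above
LINE_MICE = (50, 300)  # "50 to several 100" per line; 300 is an assumed upper value
ANALYSIS_MICE = 200    # "another couple of 100" for basic analysis; 200 is assumed


def total_mice(per_line: int, per_analysis: int = ANALYSIS_MICE, genes: int = GENES) -> int:
    """Total animals if every gene needs one line plus its basic analysis."""
    return genes * (per_line + per_analysis)


low, high = (total_mice(n) for n in LINE_MICE)
print(f"{low:,} to {high:,} mice")
```

Under these assumptions the projects would need several million animals, which is the figure the Reduction principle has to be weighed against.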

 

A revival of DNA pooling

My interest in DNA pooling was always strong; we have developed methods for doing this on the mass spec platform and applied them to the HLA region. I had, however, doubts whether testing pools by less accurate methods like chip hybridization would work. The January issue of the AJHG now has a fascinating article on how pooling may even work on the Affymetrix platform. Yea, yea.

Addendum

DNA pooling can even be used in a family context, see Wen Chung Lee in Cancer Epidemiology 2005 or Neil Risch in Genome Research 1998.
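
For readers new to pooling, the core calculation is small: the allele frequency in a pool is estimated from the two allele signals, after correcting for unequal signal strength using known heterozygotes. The Python sketch below shows one commonly used form of this correction; it is not necessarily the method of the AJHG article or of our mass spec work, and the function names and signal values are hypothetical.

```python
from statistics import mean


def correction_factor(het_signals: list[tuple[float, float]]) -> float:
    """k = mean ratio of allele A to allele B signal in heterozygotes (true ratio 1:1)."""
    return mean(a / b for a, b in het_signals)


def pooled_allele_frequency(signal_a: float, signal_b: float, k: float = 1.0) -> float:
    """Estimate the frequency of allele A in a pool as A / (A + k * B)."""
    return signal_a / (signal_a + k * signal_b)


# Hypothetical signal intensities (arbitrary units).
heterozygotes = [(1.2, 1.0), (1.1, 1.0), (1.3, 1.0)]
k = correction_factor(heterozygotes)

naive = pooled_allele_frequency(600.0, 1000.0)
corrected = pooled_allele_frequency(600.0, 1000.0, k)
print(f"k = {k:.2f}, naive = {naive:.3f}, corrected = {corrected:.3f}")
```

With these toy numbers allele A gives a 20% stronger signal than allele B, so the uncorrected estimate overstates its frequency.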

 

Genetic tango

ScienceDaily reports on an interesting study of how a protein recognizes a particular DNA binding site

Structural changes in both the protein and DNA, sometimes with the DNA within the complex kinked or sharply bent, allow for the specific contacts needed for a tight DNA-protein fit. Scientists think DNA is largely passive in this genetic tango. But new findings by Anjum Ansari, associate professor of biophysics at the University of Illinois at Chicago, suggest DNA may not be the wallflower that many had assumed…

Yea, yea.

 

454 or 0815 or 4911

In a recent book chapter we discussed new genotyping and sequencing technologies. Our concluding remarks haven’t changed much – it seems that real-time detection of single molecules is still not possible; micro-electrophoresis-based methods have already reached their limit, while sequencing by hybridization has severe restrictions when it comes to de novo sequencing (or resequencing) of whole genomes. At least for research purposes I expect that whole-genome resequencing will replace current SNP-based disease mapping. So far, sequencing by synthesis seems to be one of the few high-throughput (HT) methods that already works at that scale. The 454 platform consists of 3 consecutive steps:

  1. DNA library preparation starts with genomic DNA: after fragmentation and adaptor ligation, the single-stranded template DNA (sstDNA) libraries are isolated and assessed (takes ~4 hours)
  2. sstDNA is emulsified, then amplified and recovered on beads before sequencing primers are annealed (takes 1 day)
  3. After washing, so-called PicoTiterPlates are prepared and a process is started that looks like a pyrosequencing reaction (correct me if I am wrong): a pyrophosphate-dependent enzyme cascade emits light that is recorded by a CCD camera watching each of the ~200,000 holes (takes ~6 hours according to a recent paper in GenomXPress 2/06, figures at 454.com)

With an average read length of 100 bp and 200,000 fragments (resulting in 20 Mb) in 6 hours, the throughput is about 60-fold that of Sanger sequencing. The recent Neanderthal paper raises five arguments why the 454 sequencing platform is extremely well suited for analyses of bulk DNA extracted from ancient remains.

  1. … it circumvents bacterial cloning, in which the vast majority of initial template molecules are lost during transformation and establishment of clones.
  2. … because each molecule is amplified in isolation from other molecules it also precludes template competition, which frequently occurs when large numbers of different DNA fragments are amplified together.
  3. … its current read length of 100–200 nucleotides covers the average length of the DNA preserved in most fossils.
  4. … it generates hundreds of thousands of reads per run, which is crucial because the majority of the DNA recovered from fossils is generally not derived from the fossil species, but rather from organisms that have colonized the organism after its death.
  5. … because each sequenced product stems from just one original single-stranded template molecule of known orientation, the DNA strand from which the sequence is derived is known. This provides an advantage over traditional PCR from double-stranded templates, in which the template strand is not known, because the frequency of different nucleotide misincorporations can be deduced … damage that affects different bases differently.
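
The throughput figures quoted before this list can be checked with a few lines of arithmetic. The 454 numbers are the ones given in this post; the Sanger reference (96 capillaries, about 600 bp per read, one run per hour) is my own assumption for illustration and not a figure from the cited papers.

```python
READ_LENGTH_BP = 100     # average read length quoted in the post
READS_PER_RUN = 200_000  # fragments per run quoted in the post
RUN_HOURS = 6            # sequencing time quoted in the post

# Assumed Sanger reference instrument (not from the post):
# 96 capillaries, ~600 bp per read, one run per hour.
SANGER_BP_PER_HOUR = 96 * 600

bases_per_run = READ_LENGTH_BP * READS_PER_RUN
bases_per_hour = bases_per_run / RUN_HOURS
fold = bases_per_hour / SANGER_BP_PER_HOUR

print(f"{bases_per_run / 1e6:.0f} Mb per run")
print(f"{bases_per_hour / 1e6:.2f} Mb per hour")
print(f"about {fold:.0f}-fold the assumed Sanger rate")
```

Under this assumption the ratio comes out just under 60, so the "about 60-fold" estimate is at least plausible; a different Sanger setup would shift it.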

Except for the short read length, most of these observations would also benefit large-scale resequencing projects in human individuals. My main argument for starting resequencing projects ASAP: so far we have not achieved a dense resolution of the genome, while deep resequencing projects (for example at the CRP locus) got astonishing results. We do not even know what is going on in the “noncoding” regions. Finally, deletions and CNVs have been largely neglected – another look at this question in the EJHG.
The question remains what 454 means – my inquiry is still pending. As far as I know 0815 was a machine gun in World War I and is a synonym for something boringly routine, while 4911 is a street number and stands for a perfume. Yea, yea.

 

Human breeding

There have always been attempts to make humans better – an idea that has attracted people in nearly every century. Ovid created Galatea from a statue, Goethe’s homunculus originated from a test tube, Mary Shelley created her monster from corpses, Bulgakov’s proletarian derived from a dog and Sloterdijk’s Menschenzüchtung from a fancy idea. There is only a minor difference at the end of 2006 – the technical possibilities of genetic testing and genetic engineering are much more advanced. Yea, yea.

 

Early biology blog

Sydney Brenner has for many years written a column in Current Biology called Loose Ends (Nature called him a man who talks a lot, which is certainly unfair or at least not empirically proven – we are talking between 20,000 and 40,000 words every day). Loose Ends seems to have influenced a whole generation of biologists – kitsch biology included. Yea, yea.

 

Another layer of complexity in gene regulation

Yesterday evening I attended an excellent presentation by Nikolaus Rajewsky about microRNAs, small noncoding RNAs that are thought to have a role in posttranscriptional regulation. (Nikolaus moved from New York just 3 months ago to follow Jens Reich at the MDC in Berlin.) Basically, he talked about his recent “l(ou)sy” paper and the “SNP” paper after giving a rather detailed history of the development of the field. It started in 1950 with Jacob and Monod, continued in 1960 with Britten and Davidson and in 1970 with Haywood (who even quit science after being disappointed), and finally reached 1990, when the Ambros and Ruvkun labs discovered nematode microRNAs. Current research is mainly done in the Tuschl, Bartel, Cohen, Lander and Rajewsky labs, which produced the bulk of the 800 or so papers published in 2006.
Approximately 30% of genes are influenced by microRNAs; the total number of microRNA sites (~22,000) is under heavy debate, as is the number of human microRNAs (328); each microRNA regulates ~200 genes. Unfortunately there is still no high-throughput technique to detect targets. There is also no good prediction by free energy, and even mismatches in the 5' end of the mRNA are possible (individual predictions can be obtained at PicTar, which uses a hidden Markov model).
If I understood that correctly, miRNAs are the feedback mechanism at the RNA level (with transcription factors at the DNA level). He mentioned 3 classes known so far in humans: oncomiRNA, miRNA 375 (myotrophin), and miRNA 122 acting on cholesterol (quite interesting, as it was described recently in the NEJM). The experimental knockdown of a liver-specific mouse microRNA shows ~300 up- and ~300 downregulated genes. Upregulated genes have one miRNA nucleus in approximately 50% of cases; downregulated ones have even fewer binding sites than average. There is no overrepresented GO category among upregulated genes, but cholesterol is highly significant among downregulated genes, whatever that means. The action of miRNAs seems to be heavily context dependent, giving us many more questions than answers. Yea, yea.
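
To give an idea of what the simplest kind of target search looks like, here is a small Python sketch that scans a 3' UTR for perfect matches to positions 2 to 8 of a miRNA. This assumes that the "miRNA nucleus" in my notes corresponds to what is usually called the seed, which I cannot confirm from the talk. It is far cruder than PicTar's hidden Markov model and is not meant to reproduce it; the function names and both sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Return the mRNA motif (5'->3') that pairs with miRNA positions 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    # Reverse complement: the target site runs antiparallel to the miRNA.
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in a 3' UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [
        i for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


toy_mirna = "UACGGAUCCAGUUAGCAUCGAU"  # hypothetical sequence, not a real miRNA
toy_utr = "AAGAUCCGUCCCUUGAUCCGUAA"   # hypothetical 3' UTR fragment

print(seed_site(toy_mirna))
print(find_seed_matches(toy_mirna, toy_utr))
```

A scan like this ignores free energy, conservation and imperfect pairing, which is exactly where the prediction problems described in the talk begin.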

 

A new analysis method for blood doping

I was deeply disappointed this summer when I heard that Jan Ullrich would not participate in the Tour de France 2006 (although there are many more athletes that I am watching – I wish him all the best for next year). Later that year I heard a presentation in Bern about blood banking – how cells struggle to survive after leaving the body – and of course we did our first gene expression experiments back in 2002.
So here is my idea for how to identify autologous blood transfusion: blood separated from the body will develop a unique RNA expression pattern that can be measured by conventional cDNA chips. Identifying this pattern – possibly only 10 upregulated RNAs – in the blood of an athlete could indicate autologous blood transfusion.
I guess that there will be only a minor chance to re-identify this pattern after retransfusion into the body, as the blood is diluted around 1:30 and the RNA is immediately degraded.
However, some retransfused cells will probably maintain their death struggle program for some time, leaving a good chance to profile them even after a couple of days, whether or not they have visited a freezer. Wikipedia is correct:

In the case of detecting blood transfusions, a test for detecting homologous blood transfusions (from a donor to a doping athlete) has been in use since 2000. The test method is based on a technique known as fluorescent-activated cell sorting. By examining markers on the surface of blood cells, the method can determine whether blood from more than one person is present in an athlete’s circulation.

At present there is no accepted way of detecting autologous transfusions (using the athlete’s own RBCs) but research is in progress and the World Anti-Doping Agency (WADA) has promised that a test will eventually be introduced. The test method and its introduction date are to be kept secret in order to avoid tipping off doping athletes.

A potential example application may be found in the literature – no need to keep this idea secret as it will be nearly impossible to modify a particular gene expression pattern of a particular cell type.
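
How much of a storage signature survives the dilution mentioned above can be estimated with simple arithmetic. The sketch below reads "1:30" as one part stored blood in thirty parts total, which is my assumption, and it ignores RNA degradation completely; the function name and fold changes are hypothetical.

```python
def observed_fold_change(stored_fold: float, dilution: float = 30.0) -> float:
    """Bulk fold change when 1 part of stored cells, expressing a marker at
    `stored_fold` times baseline, is mixed into `dilution` parts in total."""
    return ((dilution - 1.0) + stored_fold) / dilution


for fold in (2.0, 10.0, 100.0):
    print(f"{fold:>5.0f}-fold in stored cells -> {observed_fold_change(fold):.2f}-fold in bulk blood")
```

A 10-fold induction in the stored cells shrinks to about 1.3-fold in whole blood, so a bulk measurement would only see very strongly induced RNAs, while profiling the retransfused cells themselves would avoid the dilution.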

Addendum 5/2/2010

Finally, the WADA recognizes the value of gene signatures in a new Science editorial.

 

All roads to NLM

This is not just an addendum to my previous post free-for-all or to number-cruncher: the 12 Dec NIH press release links to a new and exciting database

NIH Launches dbGaP, a Database of Genome Wide Association Studies
The National Library of Medicine (NLM), part of the National Institutes of Health (NIH), announces the introduction of dbGaP, a new database designed to archive and distribute data from genome wide association (GWA) studies. GWA studies explore the association between specific genes (genotype information) and observable traits, such as blood pressure and weight, or the presence or absence of a disease or condition (phenotype information).

Addendum

29-5-07 dbGaP suffers from some broken links but content improves!

R parallel computing

Following several unsuccessful attempts to implement a parallel computing platform for the R statistical software, I am showing here my current approach, which is largely influenced by a recent paper on cluster programming in c’t 6/06 by Oliver Lau (sorry, no online version). My primary interest is in the R library snow (or snow-ft), which offers the function clusterApplyLB. This function is all I need for my R programs.
Now it gets more complicated: library(snow) depends on library(Rmpi). Hao Yu has an excellent description at www.stats.uwo.ca/faculty/yu/Rmpi of how to set up the MPI layer with MPICH2. I am currently experimenting with DeinoMPI, a closely related high-performance Windows interface. According to its developer David Ashton it has the following advantages:

First, DeinoMPI does not require MPI applications to be started by mpiexec in order to call MPI_Comm_spawn so you could load Rmpi from the Rgui.exe without having to bother with calling mpiexec. Second, DeinoMPI loads the user profile when starting applications so if you query the user’s temporary directory you will get the user specific path and not the Windows system temp directory. Third, DeinoMPI handles arguments with spaces correctly if you quote them so you can pass environment variables with spaces in them. Fourth, DeinoMPI allows you to use the MPI Info object to pass extra options to MPI_Comm_spawn like drive mappings. So you could create an MPI_Info object and set wdir=z:\ and map=z:\\server\share. Then pass this info object in with the MPI_Comm_spawn command and you could map a network drive and launch an executable from this drive.

So far the Rmpi package is compiled for MPICH2 (not DeinoMPI), so it won’t run with only DeinoMPI installed, but there is a good chance that this will change in the near future.
Further useful references are in the R newsletter 2003, p21 (cran.r-project.org/doc/Rnews) and a paper in the UW Biostatistics Working Paper Series on “Simple Parallel Statistical Computing in R” by Anthony Rossini and Luke Tierney.
BTW, haplotypes of the HapMap project were computed on a 110-node cluster provided by Peter Donnelly’s Mathematical Genetics Group (www.stats.ox.ac.uk), based at the Oxford Centre for Gene Function, and on a 128-node compute cluster provided by the Oxford e-Science Centre (e-science.ox.ac.uk) as part of the National Grid Service [to be cont’d…].
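
For readers who wonder why a load-balanced apply such as clusterApplyLB is attractive: each task is handed to whichever worker becomes free next, so one slow task does not hold up the rest. The sketch below shows the same scheduling idea in Python with a thread pool on a single machine. It is only an analogy: none of the names are snow or Rmpi functions, and the sleep times stand in for real computations.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulate(task_seconds: float) -> float:
    """Stand-in for one statistical job; sleeps, then returns a dummy result."""
    time.sleep(task_seconds)
    return task_seconds * 2


def apply_load_balanced(func, items, workers: int = 2) -> list:
    """Run func over items; idle workers pull the next waiting item from a queue."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever worker ran each task.
        return list(pool.map(func, items))


durations = [0.05, 0.01, 0.01, 0.01]  # one long job followed by three short ones
start = time.perf_counter()
results = apply_load_balanced(simulate, durations)
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f} s")
```

With two workers the three short jobs run on the second worker while the first is still busy with the long one, which is the behaviour I want from the cluster.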
