Category Archives: Genetics

Escaping from a swamp

The November AJHG has an excellent re-analysis of the dysbindin-schizophrenia association using new methodology that surpasses all previous meta-analysis techniques. As the single-SNP association results from the previous 6 studies cannot be directly compared, the authors construct a European super-hap map from all tag SNPs in that region, place the haplotypes in a phylogenetic tree, and finally map all single associations onto these haplotypes. Their Fig. 1B shows the main results; as the circles in Fig. 1B are somewhat confusing, I have redrawn their results – adding the haplotype frequencies and ordering the studies by year of publication.
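To make the idea of mapping single-SNP results onto a shared haplotype set more concrete, here is a minimal sketch. The haplotype strings, SNP names and study labels are all invented for illustration; this is not the data or the procedure of the AJHG paper. It only shows how risk alleles reported at different tag SNPs become comparable once each one is translated into the set of haplotypes that carry it.

```python
# Toy illustration only: haplotypes, SNP names and studies are hypothetical.
# Each haplotype is a string of alleles, one character per tag SNP.
SNPS = ["snpA", "snpB", "snpC", "snpD"]
HAPLOTYPES = {
    "H1": "ACGT",
    "H2": "ACGC",
    "H3": "GCAT",
    "H4": "GTAT",
}

# Each study reports one associated SNP and its risk allele.
STUDY_HITS = {
    "study1": ("snpA", "A"),
    "study2": ("snpC", "G"),
    "study3": ("snpB", "T"),
}


def carriers(snp, allele):
    """Return the set of haplotypes that carry `allele` at `snp`."""
    pos = SNPS.index(snp)
    return {name for name, hap in HAPLOTYPES.items() if hap[pos] == allele}


def map_studies(hits):
    """Translate every single-SNP hit into its set of risk haplotypes."""
    return {study: carriers(snp, allele) for study, (snp, allele) in hits.items()}


def consistent(set_a, set_b):
    """Two studies agree if their risk haplotype sets overlap."""
    return bool(set_a & set_b)


risk_sets = map_studies(STUDY_HITS)
```

In this toy set, study1 and study2 point to the same haplotypes although they typed different SNPs, while study3 lands on a different branch; that is the kind of inconsistency the redrawn figure makes visible.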

pc010002-2.JPG

We may think of a triple-blind study – neither the patients, nor the PIs, nor we knew anything beforehand. The results are alarming. I do not understand how the Kirov set could have included all haplotypes and why the Schwab/Williams set is in opposition to the Straub/Bogaert/Funke set.
What could have gone wrong? The authors of the current re-analysis believe that population differences are an unlikely reason for the inconsistency, as the allele frequencies match between studies. The good news is that genotyping errors can be largely excluded.
Unfortunately the authors remain vague about why there is no common causal variant. Were there different sampling schemes, different diagnostic thresholds, different environmental exposures in the previous studies? Is dysbindin a schizophrenia gene at all, or only under a certain genetic background? It seems possible that the studies on one branch are false positives. Or is the haplotype reconstruction in the re-analysis erroneous for whatever reason?
Baron von Münchhausen is well known for escaping from a swamp by pulling himself up by his own hair. I wish I could do that too.

 

CC-BY-NC Science Surf accessed 20.12.2025

Number cruncher

In a recent blog post I described high-resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling many people nowadays stick to commercial SQL databases that have plugins for PD software.
My recommendation is to save that money and store the data in a special format that may be more useful for these large datasets; details are in a technical report that I will upload later today. In the meantime you can already check some software tools for working with these large datasets. This is what I know so far:

  • David Duffy has recompiled his sibpair program |link
  • Geron(R) has something under development |link
  • Jochen Hampe and colleagues offer Genomizer |link
  • Franz Rüschendorf developed Alohomora |link
  • I remember hearing about SNPGWA, a development at Wake Forest University |no link yet
  • there will be an R/Bioconductor package by Rob Scharpf |no link yet
  • R library GenABEL by Yurii Aulchenko |link
  • R library SNPassoc by Juan González |link

Addendum

A technical report on how to work with large SNP datasets is now also available in my paper section. Alternatives to what I am suggesting in this paper have been set out by an anonymous reviewer:

For R users, if SQLite limits are reached, hdf5 (http://hdf.ncsa.uiuc.edu/HDF5/) may be one way forward for really huge table structures since there is an R interface already available. PostgreSQL column limit depends on data type with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite with no obvious column count limit. Metakit is not mentioned – it is column oriented and probably also has “unlimited” columns as long as each database is < 1GB or so.
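One way to sidestep per-table column limits altogether is a "long" layout with one row per sample and SNP instead of one column per SNP (stock SQLite builds, as far as I know, default to a limit of 2000 columns per table). The sketch below uses Python's built-in sqlite3 module; the table name, column names, SNP identifiers and genotype coding are assumptions made for illustration, and this is not the storage format proposed in the technical report.

```python
import sqlite3

# Hypothetical schema: one row per (sample, SNP) instead of one column per SNP.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE genotype (
        sample_id TEXT NOT NULL,
        snp_id    TEXT NOT NULL,
        dosage    INTEGER,  -- copies of the counted allele (0, 1, 2); NULL = missing
        PRIMARY KEY (snp_id, sample_id)
    )
    """
)

# Made-up genotypes for three samples and two SNPs.
rows = [
    ("s1", "rs0001", 0), ("s2", "rs0001", 1), ("s3", "rs0001", 2),
    ("s1", "rs0002", 1), ("s2", "rs0002", None), ("s3", "rs0002", 1),
]
conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", rows)


def allele_freq(conn, snp_id):
    """Frequency of the counted allele, ignoring missing genotypes."""
    total, n = conn.execute(
        "SELECT SUM(dosage), COUNT(dosage) FROM genotype WHERE snp_id = ?",
        (snp_id,),
    ).fetchone()
    return total / (2 * n) if n else None
```

The price of this layout is many more rows (samples times SNPs), so the index on snp_id given by the primary key matters for per-SNP queries.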

 


What people search for

“Dissecting the complex genetic basis of mate choice” is the lengthy title of a lengthy text that tells us

males produce complex signals and displays that can consist of a combination of acoustic, visual, chemical and behavioural phenotypes…

The authors come from a school of integrative biology. I wonder why they have missed the excellent work in humans on HLA, fertility and mate choice.
Having said that, I would even suggest a radically different approach by looking at “What people search for” – hopefully I now also get hits on my blog for Paris Hilton, Renee Zellweger, Britney Spears, Heidi Klum, Pamela Anderson, Jessica Simpson and Jennifer Lopez ;-) Dissecting the complex genetic basis of mate choice shouldn't be as complicated as you may imagine from this Nature Reviews Genetics paper, yea, yea.

 


Better than the Delphi oracle

A new paper shows a nice workflow for predicting in vitro which drug will suppress a certain tumor. The authors simply link the phenotype of the cell line, “50% inhibitory concentration by drug X”, with its expression signature. The good news is that doing both in one vial (phenotyping and expression analysis) leads to excellent results.
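The core idea of linking a drug-response phenotype with an expression signature can be sketched with a plain correlation ranking. This is a simplified stand-in, not the statistical method of the paper (which the post does not describe): gene names, expression values and log IC50 values are invented, and the ranking simply orders genes by how strongly their expression across cell lines tracks drug sensitivity.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_genes(expression, log_ic50):
    """Rank genes by |r| between expression and log IC50 across cell lines."""
    scores = {gene: pearson(values, log_ic50) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data: four cell lines, three genes.
LOG_IC50 = [1.0, 2.0, 3.0, 4.0]
EXPRESSION = {
    "geneA": [2.0, 4.0, 6.0, 8.0],   # rises with resistance
    "geneB": [5.0, 5.1, 4.9, 5.0],   # essentially flat
    "geneC": [8.0, 6.0, 5.0, 2.0],   # falls with resistance
}

ranked = rank_genes(EXPRESSION, LOG_IC50)
```

The top-ranked genes would form a candidate signature; a real analysis would need many more cell lines, multiple-testing control and validation on independent samples.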

genomicsignature.png

Is there any trick to do this also system-wide e.g. for the metabolism of a substance and its signalling pathway? Pharmacogenetics would greatly benefit from such an approach, nay, nay.

 


Gene lists by automatic literature extraction

Just found on the HUM MOLGEN bulletin board a link to Fable, a new automated literature extraction system. Fable is pretty fast and can output gene lists. Sure, the screenshot below shows only those genes that I mentioned in the abstract, but this is not so bad as the most important genes will be placed there.
BTW, the number of reviews on asthma genetics has fallen to less than 50% since the Asthma Gene Database was closed. Maybe this new service will help to re-establish the former output of reviews ;-) yea, yea.

fable.png

 


IL4 cluster revisited

I have been interested in 5q31 and the IL4 cluster since I met David Marsh in the lobby of a hotel in Heidelberg around 1993. David was one of the founding fathers of asthma genetics and I remember how vividly he told me that he had a forthcoming Science paper on the IL4 cluster and IgE. The cluster is still one of the best allergy regions, where the signalling through IL4 and IL13 now gets more interest than the work of any of his competitors.
Nature Genetics now has an update on the 3-dimensional resolution of the genomic region. It is not crystallographic work as might be expected but a nice study of the chromatin structure that leads to a coordinated expression of these cytokines. SATB1 (special AT-rich sequence binding protein 1) is thought to anchor specialized sequences, letting DNA loops come into interaction. I wonder if there might even be a direct physical interaction of the IL4 and IL13 promoters, and if there is any SNP influencing that interaction? David (who died of brain cancer in 1998) would have really liked this work. Yea, yea.

il4cluster.png

 


The first methylome available

-moblog- Having spent this weekend in Heidelberg city at a meeting of the German NGFN project, I had the opportunity to listen to an excellent talk by Stephan Beck, who works at the Wellcome Trust Sanger Institute.

Epigenetics is the connecting link between the rather fixed genome and the variable transcriptome. To start with the end of the talk: Beck predicts highly parallel SNP, expression and methylation arrays for the near future. Although the first methylome was published just 4 weeks ago by the Arabidopsis community (as with RNAi, the plant people are again at the forefront), there is still a long way to go to a first human methylation map.

The latest information may be retrieved from www.epigenome.org, www.epigenome-noe.net, www.epitron.eu, www.heroic-ip.eu and the German National Methylome Project on chromosome 21 (please google for the link). The methylome is largely a European initiative – the two US epigenome projects do not have any website so far. The network site has some introductory texts; Beck was also referring to a 2006 PLOS paper by Akhtar.

Currently there are 4 human chromosomes being worked on, covering 873 genes (hopefully I captured this correctly as this was a very dense talk). Across the 12 tissues tested, 70% of the genes examined so far are either clearly methylated or clearly unmethylated. Sperm stands out from all other tissues – which is not unexpected. Tissues originating from the same developmental background have similar methylation patterns – also not unexpected. A preliminary analysis of expression patterns shows that if the 5′ end is methylated, expression is suppressed – also not unexpected.

Fascinating: the colon cells, which certainly have a close interaction with the environment, show NEITHER age- NOR sex-specific differences. Fascinating too: the most frequently methylated regions are ECRs (evolutionarily conserved regions), for whatever reason. Promoter methylation dips around the transcription start sites – from the plots I would say plus and minus 2000 bp. Methylation seems to be conserved between mouse and human tissues as well, while methylation status seems stable over time.

Current bisulfite sequencing is still laborious, expensive and takes quite a long time, while immunoprecipitation using MeDIP is becoming an alternative. The Sanger people also did a study using NimbleGen(R) 50-mers; Ensembl and UCSC will soon have these data for display. Finally, methylation appears in blocks. Constructing tagMVPs (your guess is correct, these are tags for methylation variant profiles) is straightforward, and the estimated 40 million CpG sites will probably be covered by less than 10 percent of them as tagMVPs, i.e. fewer than 4 million – Haplo epi types are now called hepitypes, yea, yea.
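Because methylation appears in blocks, a small set of tag sites can stand in for their correlated neighbours. The sketch below borrows the greedy selection commonly used for tag SNPs and applies it to methylation profiles; it is an assumption about how tagging could work, not the procedure presented in the talk. Site names, methylation levels and the r² threshold of 0.8 are invented for illustration.

```python
def r2(x, y):
    """Squared Pearson correlation between two equally long numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy * sxy / (sxx * syy)


def greedy_tags(profiles, threshold=0.8):
    """Pick tags until every site is captured by a tag with r2 >= threshold."""
    # captured[s] = sites (including s itself) that s would capture as a tag
    captured = {
        s: {t for t in profiles if s == t or r2(profiles[s], profiles[t]) >= threshold}
        for s in profiles
    }
    untagged = set(profiles)
    tags = []
    while untagged:
        # Take the site that captures most of the still-untagged sites;
        # sorting first makes ties resolve alphabetically.
        best = max(sorted(untagged), key=lambda s: len(captured[s] & untagged))
        tags.append(best)
        untagged -= captured[best]
    return tags


# Hypothetical methylation levels (0..1) of five CpG sites in four samples.
PROFILES = {
    "cpg1": [0.10, 0.90, 0.20, 0.80],
    "cpg2": [0.20, 0.80, 0.10, 0.90],
    "cpg3": [0.15, 0.85, 0.20, 0.90],
    "cpg4": [0.90, 0.90, 0.10, 0.10],
    "cpg5": [0.80, 0.85, 0.20, 0.10],
}

tags = greedy_tags(PROFILES)
```

Here two tags cover five sites because the sites fall into two correlated blocks; the "less than 10 percent" figure from the talk would correspond to much longer blocks on average.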

pb250021.JPG

Addendum

Methyl Primer Express® Software – is a free software package to simplify and automate the primer design process in methylation experiments. The bisulfite kit is not free ;-)

Addendum

A new textbook and a nice preview

 


Science is about recognizing errors

-moblog- I was already willing to accept that age-related macular degeneration presents the first good case for a common variant responsible for a common disease (Y402H in CFH). Although the gene may be correct, according to a new report from the Chakravarty! group Y402H seems to be largely irrelevant. A haplotype indicating a CFHR3 deletion was seen LESS often in AMD (and this was replicated in a second sample). As the authors say:

Much work is required to unravel the complexity of the transcripts and proteins arising from this highly duplicated gene cluster.

Another paper finds

… that there are multiple disease susceptibility alleles in the region.

See you soon again, yea, yea.

 


Nothing makes sense except in the light of a hypothesis

Looking again at human variation, it seems that my recent estimate of 99.9% sequence identity is wrong, as shown by a Nature editorial yesterday and of course by the new paper with the first copy number map of the human genome:

3,080 million ‘letters’ of DNA in the human genome
22,205 genes, by one recent estimate
10 million single-letter changes (SNPs) —
that’s only 0.3% of the genome
1,447 copy-number variants (CNV),
covering a surprisingly large 12% of the genome
About 99.5% similarity between two random people’s DNA
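A quick back-of-the-envelope check of the figures quoted above, as a small sketch. It only redoes the arithmetic on the numbers from the editorial; note that the bases lying inside CNV regions do not all differ between any two people, so the ratio at the end describes genome coverage, not pairwise difference.

```python
GENOME_BP = 3_080_000_000   # 'letters' in the human genome (quoted above)
SNP_COUNT = 10_000_000      # single-letter changes (quoted above)
CNV_FRACTION = 0.12         # share of the genome covered by CNVs (quoted above)

snp_share = SNP_COUNT / GENOME_BP    # fraction of positions that are SNPs
cnv_bp = CNV_FRACTION * GENOME_BP    # bases lying inside CNV regions
ratio = cnv_bp / SNP_COUNT           # CNV-covered bases per SNP position

print(f"SNP positions: {snp_share:.2%} of the genome")
print(f"CNV regions:   {cnv_bp / 1e6:.0f} Mb, about {ratio:.0f} times the SNP positions")
```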

I am organizing my literature in folders, where the CNV section is still very thin but labelled as high priority – this seems to be adequate, as the new study shows that the CNVs encompass hundreds of genes and functional elements.

Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution

The Wellcome Trust has a nice website about copy number variants. If you want to read more, you will find information about the methods (array-based comparative genomic hybridisation, cytogenetics, population genetics, comparative genomics and bioinformatics) as well as the questions that drive CNV research.
Again it seems that disease genetics is not only about stupid nucleotide polymorphisms (SNPs); it is a whole bunch of chromosome aberrations, segmental duplications, insertions and deletions – there is a good chance that these new data will improve our complex disease mapping efforts. I am quite confident that CNVs are not randomly distributed in the genome:

CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer than expected genes associated with Mendelian disease […] Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence, indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage.

There is a good chance of retrieving even posthoc CNV information from SNP arrays by taking into account relative signal intensity. Yea, yea.
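How relative signal intensity could reveal copy number can be sketched as follows: compute a log2 ratio of observed to expected intensity per probe, then look for runs of consecutive probes that are consistently low (possible loss) or high (possible gain). The thresholds, the minimum run length and the intensity values are arbitrary assumptions for illustration; real CNV callers use proper segmentation or hidden Markov models rather than fixed cut-offs.

```python
import math


def log_r_ratio(observed, expected):
    """log2 of observed over expected total intensity for one probe."""
    return math.log2(observed / expected)


def call_segments(lrr, loss=-0.5, gain=0.3, min_probes=3):
    """Return (start, end, state) runs of consecutive probes beyond a threshold."""

    def state(value):
        if value <= loss:
            return "loss"
        if value >= gain:
            return "gain"
        return "normal"

    segments = []
    start = 0
    for i in range(1, len(lrr) + 1):
        # Close the current run at the end of the data or when the state changes.
        if i == len(lrr) or state(lrr[i]) != state(lrr[start]):
            run_state = state(lrr[start])
            if run_state != "normal" and i - start >= min_probes:
                segments.append((start, i - 1, run_state))
            start = i
    return segments


# Hypothetical intensities for 11 neighbouring SNP probes, expected value 1.0.
OBSERVED = [1.0, 1.05, 0.5, 0.5, 0.55, 1.0, 0.95, 1.5, 1.4, 1.6, 1.0]
LRR = [log_r_ratio(o, 1.0) for o in OBSERVED]

segments = call_segments(LRR)
```

Requiring several consecutive probes is what separates a plausible CNV from a single noisy SNP; isolated outliers are ignored.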

Addendum

The mouse data are now also online.

 


Hap world map?

A new study of 12 Mb of DNA sequence in 927 individuals representing 52 populations now finds good portability of tag SNPs between the 4 HapMap groups and any of the 52 populations (except some African populations like the Mandenka, Bantu, Yoruba, Biaka Pygmy, Mbuti Pygmy and San). The paper has some exceptionally well-done graphics – and I am quite happy that the resolution of European nations leaves some gaps for our forthcoming ECRHS papers (a poster had already been on display at the 3rd Annual International HapMap Project in Cambridge, Massachusetts).

“Die Botschaft hör’ ich wohl, allein mir fehlt der Glaube” (Goethe: “I hear the message well, but I lack the faith”). The usefulness of tagSNPs in disease association studies still remains to be shown (I still remember comments like cr.. map). At present I believe neither in rare variants nor in common variants, but in a permanent reshuffling of rare, frequent and highly abundant variants. Yea, yea.

 


Men r’sponding to women

We know much about the differences between men and women – the X is the default pathway and the Y under the microscope looks as worn down and “misshapen as a stubbed-out cheroot”. Now there turns out to be something really new. So far all effects of Y genes on sex determination have been attributed to SRY, the testis-determining gene (NR0B1, FOXL2 and WNT4 are probably ovary-determining).
The careful analysis of an Italian pedigree has now identified a new gene that, when disrupted, can reverse XX individuals to male: R-spondin 1 (RSPO1), a growth factor that may act through β-catenin stabilization and synergize with Wnt.
Do you remember the nice cartoon of the Y chromosome with the “HUH?” selective hearing loss ;-) It is finally RSPO1. Yea, yea.

 


Peer production

First Monday has an interesting article about the limits of self-organization and “laws of quality”. Given 52 million tracks in the Gracenote database, 1 million entries in Wikipedia and 17,000 books in Project Gutenberg, Paul Duguid thoroughly examines the two laws of quality:

  • Linus's law: “given enough eyeballs, all bugs are shallow”, which means that almost every error will be discovered and ultimately fixed
  • Graham's law: “people just produce whatever they want; the good stuff spreads, and the bad gets ignored”

Although more professionalized, similar principles operate in science. With these large genetic studies, I have the feeling that most errors occur at the interfaces, during the hand-shaking between disciplines. There are certainly only a few people who can design a study, examine a patient, go to the laboratory, analyze and annotate the data and publish them. This means that even many eyeballs cannot look around the corner and that it will take many years for the “good stuff to spread”. Yea, yea.

 
