Tag Archives: Genetics + Biology

Number cruncher

In a recent blog post I described high-resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling, many people nowadays rely on commercial SQL databases that have plugins for PD software.
My recommendation is to save that money and store the data in a special format that may be more useful for these large datasets; details are in a technical report that I will upload later today. In the meantime you can already check out some software tools for working with these large datasets. This is what I know so far:

  • David Duffy has recompiled his sibpair program |link
  • Geron(R) has something under development |link
  • Jochen Hampe and colleagues offer Genomizer |link
  • Franz Rüschendorf developed Alohomora |link
  • I remember SNPGWA, a development at Wake Forest University |no link yet
  • there will be an R/Bioconductor package by Rob Scharpf |no link yet
  • R library GenABEL by Yurii Aulchenko |link
  • R library SNPassoc by Juan González |link

Addendum

A technical report on how to work with large SNP datasets is now also available in my paper section. Alternatives to what I suggest in this paper have been set out by an anonymous reviewer:

For R users, if SQLite limits are reached, HDF5 (http://hdf.ncsa.uiuc.edu/HDF5/) may be one way forward for really huge table structures, since an R interface is already available. PostgreSQL's column limit depends on the data type, with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite, with no obvious column count limit. Metakit is not mentioned: it is column-oriented and probably also has “unlimited” columns as long as each database is < 1 GB or so.
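One way around column limits altogether (SQLite's default compile-time maximum is 2000 columns per table) is a “long” layout with one row per sample and SNP instead of one column per SNP. A minimal sketch, assuming that layout; the table and column names, the SNP IDs and the 0/1/2 dosage coding are my own assumptions for illustration, not the format from the technical report.

```python
import sqlite3

def build_genotype_db(records, path=":memory:"):
    """Store genotypes in long format: one row per (sample, SNP) pair.

    records: iterable of (sample_id, snp_id, dosage) tuples, where dosage is
    the number of copies of the counted allele (0, 1, 2) or None if missing.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE genotype ("
        " sample_id TEXT NOT NULL,"
        " snp_id TEXT NOT NULL,"
        " dosage INTEGER,"
        " PRIMARY KEY (sample_id, snp_id))"
    )
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", records)
    # Index on snp_id so that per-SNP scans (e.g. association tests) are fast.
    conn.execute("CREATE INDEX idx_genotype_snp ON genotype (snp_id)")
    conn.commit()
    return conn

def allele_frequency(conn, snp_id):
    """Frequency of the counted allele over non-missing calls, or None."""
    total, n = conn.execute(
        "SELECT SUM(dosage), COUNT(dosage) FROM genotype WHERE snp_id = ?",
        (snp_id,),
    ).fetchone()
    return total / (2 * n) if n else None

# Hypothetical toy data: 3 samples, 2 SNPs, one missing call.
records = [
    ("S1", "rs0001", 0), ("S1", "rs0002", 1),
    ("S2", "rs0001", 1), ("S2", "rs0002", 2),
    ("S3", "rs0001", 2), ("S3", "rs0002", None),
]
conn = build_genotype_db(records)
print(allele_frequency(conn, "rs0001"))  # 0.5
print(allele_frequency(conn, "rs0002"))  # 0.75
```

The price of the long layout is a much taller table and a pivot step whenever a program expects a wide sample-by-SNP matrix.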

What people search for

“Dissecting the complex genetic basis of mate choice” is the lengthy title of a lengthy text that tells us

males produce complex signals and displays that can consist of a combination of acoustic, visual, chemical and behavioural phenotypes…

The authors come from a school of integrative biology. I wonder why they have missed the excellent work in humans on HLA, fertility and mate choice.
Having said that, I would even suggest a radically different approach by looking at “What people search for”: hopefully I now also get hits on my blog for Paris Hilton, Renee Zellweger, Britney Spears, Heidi Klum, Pamela Anderson, Jessica Simpson and Jennifer Lopez ;-) Dissecting the complex genetic basis of mate choice shouldn't be as complicated as you might imagine from this Nature Reviews Genetics paper, yea, yea.

Better than the Delphi oracle

A new paper shows a nice workflow for predicting in vitro which drug will suppress a certain tumor. The authors simply link the phenotype of the cell line (“50% inhibitory concentration of drug X”) with its expression signature. The good news is that doing both in one vial (phenotyping and expression analysis) leads to excellent results.

[Figure: genomic signature]
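The basic idea of linking IC50 with expression can be sketched, as I understand it, as a correlation screen across cell lines: for each gene, correlate expression with the (log) IC50 of drug X and keep the strongest hits as a candidate signature. This is a toy illustration with invented gene names and numbers, not the authors' actual pipeline, which will involve normalization, multiple-testing control and validation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_genes(expression, log_ic50):
    """Rank genes by |r| between expression and log IC50 across cell lines."""
    scores = {gene: pearson(values, log_ic50) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical data: 5 cell lines, 3 genes, log10 IC50 of "drug X".
log_ic50 = [-1.0, -0.5, 0.0, 0.5, 1.0]
expression = {
    "GENE_A": [2.0, 3.0, 4.0, 5.0, 6.0],  # higher in resistant lines
    "GENE_B": [6.0, 5.0, 4.0, 3.0, 2.0],  # higher in sensitive lines
    "GENE_C": [4.0, 1.0, 5.0, 1.0, 4.0],  # unrelated to IC50
}
ranked = rank_genes(expression, log_ic50)
for gene, r in ranked:
    print(gene, round(r, 2))
```

Negative correlations are kept on purpose: a gene high in sensitive lines is as informative as one high in resistant lines.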

Is there any trick to do this system-wide as well, e.g. for the metabolism of a substance and its signalling pathway? Pharmacogenetics would greatly benefit from such an approach, nay, nay.

In the heat of the night

Sorry for the misleading title, but it is a nice idea to use heatmaps for conditional linkage (or SNP association) results as well. Seen at the Heidelberg meeting. Sorry also for showing a figure that is severely cropped and blurred to protect the authors' rights to their data, yea, yea.

[Figure: heatmap of conditional linkage results, cropped and blurred]

Gene lists by automatic literature extraction

Just found on the HUM-MOLGEN bulletin board a link to Fable, a new automated literature extraction system. Fable is pretty fast and can output gene lists. Sure, the screenshot below shows only those genes that are mentioned in the abstracts, but that is not so bad, as the most important genes will be placed there.
BTW, the number of reviews on asthma genetics has fallen to less than half since the Asthma Gene Database closed. Maybe this new service will help to re-establish the former output of reviews ;-) yea, yea.

[Screenshot: Fable gene list]
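How Fable works internally I don't know; the simplest form of gene list extraction is dictionary matching against gene symbols, sketched below. The symbol set and the example sentences are made up, and exact token matching misses synonyms such as “interleukin-4”, which is exactly where real extraction systems earn their keep.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary of gene symbols.
GENE_SYMBOLS = {"IL4", "IL13", "CD14", "ADAM33", "TNF"}

TOKEN = re.compile(r"[A-Za-z0-9-]+")

def gene_list(abstracts):
    """Return (symbol, number of abstracts mentioning it), most frequent first."""
    counts = Counter()
    for text in abstracts:
        # A set, so each gene is counted at most once per abstract.
        counts.update({tok for tok in TOKEN.findall(text) if tok in GENE_SYMBOLS})
    return counts.most_common()

abstracts = [  # invented example sentences
    "Polymorphisms in IL4 and IL13 are associated with asthma.",
    "CD14 promoter variants and total IgE.",
    "IL13 variants modify total IgE levels.",
]
print(gene_list(abstracts))
```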

IL4 cluster revisited

I have been interested in 5q31 and the IL4 cluster since I met David Marsh in the lobby of a hotel in Heidelberg around 1993. David was one of the founding fathers of asthma genetics, and I remember how vividly he told me that he had a forthcoming Science paper on the IL4 cluster and IgE. The cluster is still one of the best allergy regions, and signalling through IL4 and IL13 now attracts more interest than the work of any of his competitors.
Nature Genetics now has an update on the three-dimensional structure of the genomic region. It is not crystallographic work, as might be expected, but a nice study of the chromatin structure that leads to coordinated expression of these cytokines. SATB1 (special AT-rich sequence binding protein 1) is thought to anchor specialized sequences, bringing DNA loops into interaction. I wonder if there might even be a direct physical interaction between the IL4 and IL13 promoters, and whether any SNP influences that interaction. David (who died of brain cancer in 1998) would have really liked this work. Yea, yea.

[Figure: IL4 cluster]

The first methylome available

-moblog- Having spent this weekend in Heidelberg at a meeting of the German NGFN project, I had the opportunity to listen to an excellent talk by Stephan Beck, who works at the Wellcome Trust Sanger Institute.

Epigenetics is the connecting link between the rather fixed genome and the variable transcriptome. To start with the end of the talk: Beck predicts highly parallel SNP, expression and methylation arrays for the near future. Although the first methylome was published just 4 weeks ago by the Arabidopsis community (as with RNAi, the plant people are again at the forefront), there is still a long way to go to a first human methylation map.

The latest information may be retrieved from www.epigenome.org, www.epigenome-noe.net, www.epitron.eu, www.heroic-ip.eu and the German National Methylome Project on chromosome 21 (please google for the link). The methylome is largely a European initiative: the two US epigenome projects do not have any website so far. The network site has some introductory texts; Beck also referred to a 2006 PLoS paper by Akhtar.

Currently 4 human chromosomes are under study, covering 873 genes (hopefully I captured this correctly, as this was a very dense talk). Across the 12 tissues tested, 70% of the genes examined so far are either clearly methylated or clearly unmethylated. Sperm stands out from all other tissues, which is not unexpected. Tissues originating from the same developmental background have similar methylation patterns, also not unexpected. A preliminary analysis of expression patterns shows that if the 5′ end is methylated, expression is suppressed, again not unexpected.

Fascinating: colon cells, which certainly interact closely with the environment, show NEITHER age- NOR sex-specific differences. Fascinating too: the most frequently methylated regions are ECRs (evolutionarily conserved regions), for whatever reason. Promoter methylation dips around the transcription start sites, from the plots I would say within plus or minus 2000 bp. Methylation also seems to be conserved between mouse and human tissues, and methylation status seems stable over time.

Current bisulfite sequencing is still laborious, expensive and slow, while immunoprecipitation using MeDIP is becoming an alternative. The Sanger people also did a study using NimbleGen® 50-mer arrays, and Ensembl and UCSC will soon display these data. Finally, methylation appears in blocks. Constructing tagMVPs (your guess is correct: these are tags for methylation variable positions) is straightforward, and the estimated 40 million CpG sites will probably be covered by fewer than 10 percent of them as tagMVPs. Epigenetic haplotypes are now called hepitypes, yea, yea.

pb250021.JPG

Addendum

Methyl Primer Express® Software is a free software package that simplifies and automates primer design for methylation experiments. The bisulfite kit is not free ;-)
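Primer design for methylation experiments starts from the bisulfite-converted sequence, in which unmethylated cytosines read as T. A minimal sketch of in silico conversion, assuming complete conversion, the top strand only, methylation restricted to CpG sites and all CpGs in the same state; this is not the algorithm used by Methyl Primer Express.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """In silico bisulfite conversion of the top strand.

    Unmethylated C is converted (read as T after PCR). If methylated_cpg is
    True, C followed by G is treated as methylated and stays C; every other
    C becomes T. If False, all C become T (fully unmethylated template).
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        is_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (is_cpg and methylated_cpg):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)

seq = "ACGTCCGATC"
print(bisulfite_convert(seq, methylated_cpg=True))   # ACGTTCGATT
print(bisulfite_convert(seq, methylated_cpg=False))  # ATGTTTGATT
```

Comparing the two converted versions shows why methylation-specific primers are placed over CpG sites: only there do the methylated and unmethylated templates differ.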

Addendum

A new textbook and a nice preview

Science is about recognizing errors

-moblog- I was already willing to accept that age-related macular degeneration presents the first good case of a common variant responsible for a common disease (Y402H in CFH). Although the gene may be correct, according to a new report from the Chakravarthy group Y402H seems to be largely irrelevant: a haplotype indicating a CFHR3 deletion was seen LESS often in AMD (and this was replicated in a second sample). As the authors say:

Much work is required to unravel the complexity of the transcripts and proteins arising from this highly duplicated gene cluster.

Another paper finds

… that there are multiple disease susceptibility alleles in the region.
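“Seen LESS in AMD” translates into an odds ratio below 1, i.e. a protective haplotype. A minimal sketch of the 2×2 calculation with a Woolf (log-scale) confidence interval; the counts are invented for illustration and are not the published data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with Woolf 95% CI for a 2x2 table (no zero cells assumed).

                  carrying   not carrying
        cases        a            b
        controls     c            d
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical haplotype counts in cases and controls.
or_, lo, hi = odds_ratio_ci(a=40, b=360, c=100, d=300)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A confidence interval lying entirely below 1 is what “protective” means here; it says nothing yet about which of the duplicated genes in the cluster is responsible.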

See you soon again, yea, yea.

Nice to know

-moblog- There are many things “nice to know” but only a few “need to know”. Molecular epidemiologists having access to large datasets sometimes forget what most people on earth “want to know” – how to prevent and cure human disease.

Nothing makes sense except in the light of a hypothesis

Looking again at human variation, it seems that my recent estimate of 99.9% sequence identity is wrong, as shown in a Nature editorial yesterday and, of course, in the new paper with the first copy number map of the human genome:

3,080 million ‘letters’ of DNA in the human genome
22,205 genes, by one recent estimate
10 million single-letter changes (SNPs): that’s only 0.3% of the genome
1,447 copy-number variants (CNVs), covering a surprisingly large 12% of the genome
About 99.5% similarity between two random people’s DNA
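The figures in the list can be checked with simple arithmetic. Note that the 12% is the share of the genome lying in CNV regions that vary in at least some people, so it should not be read as a 12% difference between any two individuals.

```python
GENOME_MB = 3080      # 3,080 million letters (in Mb)
SNPS_MILLION = 10     # 10 million SNPs
CNV_FRACTION = 0.12   # CNV regions cover 12% of the genome

snp_fraction = SNPS_MILLION / GENOME_MB   # millions / millions
cnv_mb = CNV_FRACTION * GENOME_MB         # megabases lying in CNV regions

print(f"SNPs: {snp_fraction:.2%} of the genome")  # 0.32%
print(f"CNV regions: about {cnv_mb:.0f} Mb")      # 370 Mb
```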

I organize my literature in folders, where the CNV section is still very thin but labelled as high priority. This seems adequate, as the new study shows that CNVs encompass hundreds of genes and functional elements:

Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution

The Wellcome Trust has a nice website about copy number variants. If you want to read more, you will find information about the methods (array-based comparative genomic hybridisation, cytogenetics, population genetics, comparative genomics and bioinformatics) as well as the questions that drive CNV research.
Again it seems that disease genetics is not only about stupid nucleotide polymorphisms (SNP); it is a whole bunch of chromosome aberrations, segmental duplications, insertions and deletions, and there is a good chance that these new data will improve our complex disease mapping efforts. I am quite confident that CNVs are not randomly distributed in the genome:

CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer than expected genes associated with Mendelian disease […] Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence, indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage.
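Statements like “significant overabundance” of olfactory or immunity genes usually rest on an enrichment test. One common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher exact test), sketched below with invented counts; I have not checked which test the quoted paper actually used.

```python
from math import comb

def enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the genome, K: genes in the category (e.g. olfactory),
    n: genes overlapping CNVs, k: category genes among those n.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact integers, one final division

# Invented counts: 20,000 genes, 400 in the category, 1,000 in CNVs,
# 60 of them in the category (about 20 expected by chance).
N, K, n, k = 20_000, 400, 1_000, 60
print(f"expected {K * n / N:.0f}, observed {k}, p = {enrichment_p(N, K, n, k):.1e}")
```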

There is a good chance of retrieving CNV information even post hoc from SNP arrays by taking the relative signal intensity into account. Yea, yea.
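The usual intensity measure is the log R ratio (LRR), log2 of the observed total signal R = X + Y divided by the R expected for that genotype cluster. A deliberately naive sketch: compute the LRR per probe and flag runs of consecutive probes beyond a threshold. The threshold, the minimum run length and the intensities are hypothetical; real CNV callers (e.g. hidden Markov models) also use the B allele frequency and are far more robust to noise.

```python
import math

def log_r_ratio(r_observed, r_expected):
    """LRR = log2(observed R / expected R); about 0 for two copies."""
    return math.log2(r_observed / r_expected)

def naive_cnv_calls(lrr, threshold=0.3, min_run=3):
    """Return (start, end, state) for runs of >= min_run probes beyond threshold.

    state is 'gain' (LRR > threshold) or 'loss' (LRR < -threshold);
    end is exclusive.
    """
    calls = []
    start, state = 0, None
    for i, value in enumerate(list(lrr) + [0.0]):  # sentinel closes last run
        s = "gain" if value > threshold else "loss" if value < -threshold else None
        if s != state:
            if state is not None and i - start >= min_run:
                calls.append((start, i, state))
            start, state = i, s
    return calls

# Synthetic intensities along a chromosome, expected R = 1.0 everywhere.
observed = [1.0, 1.02, 0.5, 0.48, 0.52, 0.99, 1.5, 1.55, 1.45, 1.6, 1.0]
lrr = [log_r_ratio(r, 1.0) for r in observed]
print(naive_cnv_calls(lrr))  # [(2, 5, 'loss'), (6, 10, 'gain')]
```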

Addendum

The mouse data are now also online.

Hap world map?

A new study of 12 Mb of DNA sequence in 927 individuals representing 52 populations now finds good portability of tag SNPs between the 4 HapMap groups and any of the 52 populations (except some African populations like the Mandenka, Bantu, Yoruba, Biaka Pygmy, Mbuti Pygmy and San). The paper has some exceptionally well-done graphics, and I am quite happy that the resolution of European nations leaves some gaps for our forthcoming ECRHS papers (a poster was already on display at the 3rd Annual International HapMap Project meeting in Cambridge, Massachusetts).
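Tag SNP portability can be made concrete: pick tags greedily from pairwise r² in one population, then ask what fraction of SNPs those tags still capture (r² ≥ 0.8) in another. The sketch below uses a simple greedy cover in the spirit of pairwise tagging tools; it is not the method of the paper, and the SNP names and r² values are invented.

```python
def ld(r2, a, b):
    """Pairwise r^2 from a dict keyed by frozenset({a, b}); a SNP tags itself."""
    return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

def greedy_tags(snps, r2, threshold=0.8):
    """Repeatedly pick the SNP that captures the most still-untagged SNPs."""
    untagged = set(snps)
    tags = []
    while untagged:
        best = max(snps, key=lambda s: sum(ld(r2, s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if ld(r2, best, t) >= threshold}
    return tags

def coverage(tags, snps, r2, threshold=0.8):
    """Fraction of SNPs with r^2 >= threshold to at least one tag."""
    hit = sum(any(ld(r2, t, s) >= threshold for t in tags) for s in snps)
    return hit / len(snps)

def pairs(*triples):
    """Build the r^2 dict from (snp_a, snp_b, r2) triples."""
    return {frozenset((a, b)): v for a, b, v in triples}

snps = ["rs1", "rs2", "rs3", "rs4", "rs5"]  # hypothetical SNPs
pop_a = pairs(("rs1", "rs2", 0.90), ("rs2", "rs3", 0.85),
              ("rs1", "rs3", 0.70), ("rs4", "rs5", 0.95))
pop_b = pairs(("rs1", "rs2", 0.90), ("rs2", "rs3", 0.40),
              ("rs4", "rs5", 0.50))

tags = greedy_tags(snps, pop_a)  # tags chosen in population A
print(tags, coverage(tags, snps, pop_a), coverage(tags, snps, pop_b))
# ['rs2', 'rs4'] 1.0 0.6
```

Weaker LD in population B is enough to drop coverage from 100% to 60%, which is the kind of loss the study reports for some African populations.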

“Die Botschaft hör’ ich wohl, allein mir fehlt der Glaube” (Goethe, Faust: “I hear the message well, but I lack the faith”). The usefulness of tagSNPs in disease association studies still remains to be shown (I still remember comments like “cr.. map”). At present I believe neither in rare variants nor in common variants for common disease, but in a permanent reshuffling of rare, frequent and highly abundant variants. Yea, yea.

Men r’sponding to women

We know much about the differences between men and women: the X is the default pathway, and the Y under the microscope looks as worn down and “misshapen as a stubbed-out cheroot”. Now there is something really new. So far all effects of Y genes on sex determination have been attributed to SRY, the testis-determining gene (NR0B1, FOXL2 and WNT4 are probably ovary-determining).
The careful analysis of an Italian pedigree now describes a new gene whose disruption can cause XX sex reversal to male: R-spondin 1 (RSPO1), a growth factor that may act through β-catenin stabilization and synergize with Wnt.
Do you remember the nice cartoon of the Y chromosome with HUH? (selective hearing loss) ;-) Finally it is RSPO1. Yea, yea.

Not an academic project, not an industry project

Not an academic project, not an industry project, but still earn money with knowledge? InnoCentive posted request #4470259 for an ALS biomarker. The deadline is Nov 06, 2008, and you will get $1,000,000 USD for solving the problem. Yea, yea.

Peer production

First Monday has an interesting article about the limits of self-organization and “laws of quality”. Given 52 million tracks in the Gracenote database, 1 million entries in Wikipedia and 17,000 books in Project Gutenberg, Paul Duguid thoroughly examines the two laws of quality:

  • Linus's law: “given enough eyeballs, all bugs are shallow”, which means that almost every error will be discovered and ultimately fixed
  • Graham's law: “people just produce whatever they want; the good stuff spreads, and the bad gets ignored”

Although science is more professionalized, similar principles operate there. With these large genetic studies, I have the feeling that most errors occur at the interfaces, during the hand-shaking between disciplines. There are certainly only a few people who can design a study, examine a patient, go to the laboratory, analyze and annotate the data and publish them. This means that even many eyeballs cannot look around the corner, and that it will take many years for the “good stuff to spread”. Yea, yea.
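A toy model shows why the interfaces are the weak spot. If each of n independent reviewers spots a given error with probability p, the chance that it is found is 1 − (1 − p)^n; for an error at the interface of two disciplines, however, n is only the number of reviewers who understand both sides. Independence and the numbers below are purely illustrative assumptions.

```python
def p_found(p, n):
    """Chance that at least one of n independent reviewers spots an error."""
    return 1 - (1 - p) ** n

# Illustrative only: each competent reviewer catches a given error with p = 0.1.
within_field = p_found(0.1, 20)  # all 20 reviewers understand this error
at_interface = p_found(0.1, 2)   # only 2 of them understand both disciplines
print(f"{within_field:.2f} vs {at_interface:.2f}")  # 0.88 vs 0.19
```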

Pharmacogenetic tests on the market

Certainly one of the best web resources for pharmacogenetics is the PharmGKB database, which collects all kinds of data about the relationships among drugs, diseases and genes. Of course you could sequence your genome or run expression profiling on a liver sample. However, you are probably here to find out which (serious!?) pharmacogenetic tests are already on the market.

Much can be said about the usefulness of such tests; I doubt that there will ever be truly personalized treatment, as I foresee some logistic problems in validating it ;-) More likely are group-based therapies, maybe restricted by geographic ancestry. Here is a (first and very) preliminary collection of commercially available pharmacogenetic tests:

  • CYP2D6, CYP2C9 and CYP2C19 collectively account for about 40 percent of drug metabolism mediated by cytochrome P450 (Roche). The AmpliChip CYP450 Test is the world's first microarray-based pharmacogenetic test approved for clinical use. CYP2D6 metabolizes codeine into morphine. The frequency of CYP2D6 variants differs between ethnic groups, and some variants lead to a lower elimination rate of the antidepressant Prozac (fluoxetine, a selective serotonin reuptake inhibitor); the alternative drug Celexa (citalopram) is metabolized by CYP2C19 (as is omeprazole). Other examples include clopidogrel (metabolized by CYP3A4), cyclophosphamide (by CYP2B6) and the vitamin K antagonist warfarin (by CYP2C9)
  • NAT2*5A, NAT2*6A, NAT2*7A/B and NAT2*14A are slow-acetylator alleles; genotyping them separates rapid from slow acetylators of drugs such as isoniazid or procainamide (Roche)
  • HER2-positive women with breast cancer may receive Herceptin (Roche, Bayer, PathVysion)
  • TP and DPD are key enzymes of 5-fluorouracil metabolism; DPD is the rate-limiting catabolic enzyme (Roche)
  • Mitochondrial A1555G variants are tested for aminoglycoside side effects (ototoxicity) (Humatrix)
  • A warfarin sensitivity test will be in clinical use next year (Kimball Genetics). It will test for variations in CYP2C9 and VKORC1
  • A UGT1A1 gene variant is associated with leukopenia in patients prescribed Camptosar (irinotecan), a drug for colon cancer (Oncoscreen)
  • A TPMT variant is associated with slow metabolism of 6-mercaptopurine, used in the treatment of childhood leukemia and inflammatory bowel diseases (Pharmaco-Gendia)
  • Epigenomics is currently developing tests based on DNA methylation
  • The tyrosine kinase inhibitor Gleevec (imatinib) inhibits the ABL, ARG, SCF receptor (KIT), PDGFRA and PDGFRB kinases; in CML, secondary mutations in ABL can arise in previously sensitive leukemias (Pharmaco-Gendia)
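For the NAT2 entry, a common way to turn a genotype into a phenotype is to count slow alleles: none means rapid, one intermediate, two slow acetylator. A minimal sketch using only the alleles named in the list; treating every other allele as rapid is a simplification for illustration, not a clinical rule.

```python
# Slow-acetylator alleles named in the list; NAT2*4 is the rapid reference.
# Assumption: every allele not listed here is treated as rapid.
SLOW_ALLELES = {"NAT2*5A", "NAT2*6A", "NAT2*7A", "NAT2*7B", "NAT2*14A"}

def acetylator_phenotype(allele1, allele2):
    """Trimodal acetylator phenotype from the number of slow alleles."""
    n_slow = (allele1 in SLOW_ALLELES) + (allele2 in SLOW_ALLELES)
    return {0: "rapid", 1: "intermediate", 2: "slow"}[n_slow]

print(acetylator_phenotype("NAT2*4", "NAT2*4"))   # rapid
print(acetylator_phenotype("NAT2*4", "NAT2*5A"))  # intermediate
print(acetylator_phenotype("NAT2*6A", "NAT2*7B")) # slow
```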

Needless to say, I have excluded here specific HIV mutations that may induce resistance to particular drugs (as I learned last week from Thomas Lengauer at a bioinformatics meeting here). I have also excluded all kinds of sex-specific markers (e.g. SRY testing) and the whole nutrigenomics stuff.

Who knows more, for example about lansoprazole effectiveness, UGT1A9 and mycophenolic acid, UGT1A1 and irinotecan, COMT genotype and amphetamine response, pharmacogenetics of COX-2 inhibitors, and GRP78 responsiveness to chemotherapy? Is there any commercial test available for these genes? It seems that somebody should start a wiki on that, yea, yea.

Addendum 31-12-09

Here is another gene list; only 6 tests have been approved by the FDA. Nature reports on Oncotype DX and Prostate Px as well as MammaPrint. See also a UK-based paper:

HLA-B*5701 was most commonly tested to identify those at risk of abacavir hypersensitivity among patients with HIV. A number of barriers to testing were identified, including lack of clinician knowledge and a lack of scientific evidence.