Category archive: Genetics

Back to the roots

Now that I am porting a local Python notebook to Colab (I don’t have an eGPU and therefore no CUDA support in macOS), I am again restricted by a daily timeout…

This reminds me so much of 1989/1990, when we programmed SAS on an IBM mainframe under VM/CMS. I usually went there early in the morning just to reserve my disk space :-) In the evening it was all gone, just like Colab nowadays…


Computer museum: SAS files of
* Analysis of the genetic influence...;
cms fi data disk dummy dummy e;                                                 
data a; set data.ges_crit(keep=                                        
asthma frage  f1ekzem f33heu f36ekzem);                                                       

How atopic dermatitis is linked to vitamin D and how IL33 splice variants associate with eosinophil numbers

(first published 12 Dec 2020 and revised 10 Dec 2021)

We had a major discussion right before our 2010 paper, where I argued that rare variants should have been included in our asthma/allergy/dermatitis GWAS. Ten years later there is now a nice paper using massive exome sequencing that finally includes them.

It seems that the respiratory tract isn’t influenced much by rare gene variants, but that there is a strong effect in the immune system.

Quanli Wang, posted December 13, 2020.

And there is another interesting fact.

…Surveying the contribution of rare variants to the genetic architecture of human disease through exome sequencing of 177,882 UK Biobank participants …if we look at the …. European population who are carriers of a filaggrin (FLG) PTV, we find those carriers have significantly higher risk for well-known associations, such as dermatitis  … and asthma … Concomitant increases in vitamin D levels suggest … increased sensitivity to ultraviolet B radiation.

So far, I have only assumed an asthma/allergy priming effect of oral vitamin D in the newborn gut. This paper now argues for an increased vitamin D sensitivity in the skin of FLG dermatitis patients as well, which is interesting given the largely contradictory data on serum vitamin D and atopic dermatitis. Maybe dermatologists should focus their research more on skin and local vitamin D turnover?




Unfortunately, IgE is not included in the Astra UK Phewas Portal that shows all study results. Neither “allergy” nor “hayfever” nor “asthma” shows any relevant signal, while eosinophils have some prominent peaks (HLA was excluded) at five genes: TET2 with p<10^-25, JAK2 with p<10^-9, IL33 with p<10^-37, ALOX15 with p<10^-13 and CSF2RB with p<10^-19 (database link is dead).




IL33 is the only overlapping gene here. The most prominent variant, carried by over 2,300 people, is splice acceptor 9-6250473-G-C, followed by splice donor 9-6250600-G-T in 600+ individuals.

Unfortunately, that is not many carriers given the sheer number of 177,882 participants. Nevertheless, we have known something about the seven IL33 splice variants since 2012.

Novel Splice Variants of IL-33: Differential Expression in Normal and Transformed Cells Journal of Investigative Dermatology (2012) 132, 2661–2664; doi:10.1038/jid.2012.180

with updates in 2016

Gordon, Alternative splicing of interleukin-33 and type 2 inflammation in asthma, PNAS 2016

as well as in 2017

Fig 3A Smith et al. A rare IL33 loss-of-function mutation reduces blood eosinophil counts and protects from asthma, PLoS Genetics 2017 – describes the splice site as NM_001199640:exon7:c.487-1G>C or rs146597587-C

So I ran a sequence match to compare the new finding with these older publications.

own sequence match exon7 using data from dbSNP, UCSC GoldenPath and Uniprot – reference is hg19

Indeed, the 2017 paper already described rs146597587, which is probably identical to the splice acceptor 9-6250473-G-C in Astra UK Phewas (the genome positions do not match: I used hg19, while I don’t know which reference Astra uses). Astra also gives c.613-1G>C, and rs146597587 sits just in front of my codon 205 (3*205=615), whatever that means.

The Astra UK Phewas at least confirms the Iceland paper above:
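The codon arithmetic can be checked in a few lines. This is only a sketch, assuming standard coding-DNA numbering in which c.1 is the first base of the start codon, so that codon k covers positions 3k-2 to 3k. The helper name `codon_of` is made up for this illustration.

```python
def codon_of(cds_position: int) -> tuple[int, int]:
    """Map a 1-based coding position to (codon number, position within codon 1..3)."""
    codon = (cds_position - 1) // 3 + 1
    offset = (cds_position - 1) % 3 + 1
    return codon, offset


# c.613 is the exonic base right after the c.613-1 splice acceptor position
print(codon_of(613))   # (205, 1)
# the same site as numbered in the 2017 paper (c.487-1)
print(codon_of(487))   # (163, 1)
# distance between the two numberings, in bases and in codons
print(613 - 487, (613 - 487) // 3, (613 - 487) % 3)   # 126 42 0
```

Under this assumption c.613 is the first base of codon 205, so c.613-1 is the last intronic base before the exon that opens codon 205. The two numberings differ by 126 bases, a whole number of codons, which would be compatible with the same splice site counted on two different transcripts, although that still needs to be confirmed against the transcript records.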

rs146597587-C associates with lower eosinophil counts (β = -0.21 SD, P = 2.5×10^-16, N = 103,104), and reduced risk of asthma in Europeans (OR = 0.47; 95%CI: 0.32, 0.70, P = 1.8×10^-4, N cases = 6,465, N controls = 302,977). Heterozygotes have about 40% lower total IL33 mRNA expression than non-carriers and allele-specific analysis based on RNA sequencing and phased genotypes shows that only 20% of the total expression is from the mutated chromosome. In half of those transcripts the mutation causes retention of the last intron, predicted to result in a premature stop codon that leads to truncation of 66 amino acids.
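As a plausibility check, the asthma p-value can be roughly recovered from the odds ratio and its confidence interval. The sketch below assumes the interval is a Wald interval that is symmetric on the log-odds scale, which the quoted text does not state. The function name `p_from_or_ci` is made up for this illustration.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE, z and two-sided p from an OR and its 95% CI (Wald assumption)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


se, z, p = p_from_or_ci(0.47, 0.32, 0.70)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.1e}")
```

This gives a p-value in the region of 10^-4, the same order of magnitude as the reported 1.8×10^-4. The small difference is expected from rounding of the published OR and interval.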



Virus reactivation

A recent Nature study showed Ebola reactivation in a previously infected patient:

The 2021 lineage shows considerably lower divergence than would be expected during sustained human-to-human transmission, which suggests a persistent infection with reduced replication or a period of latency.

The most recent viral genome shared 10 substitutions that evolved during the previous epidemic, making a new animal spillover event unlikely. I always wondered how the second and third waves of COVID-19 started in Germany. Was it really a new spread or just a reactivation? A Frontiers review concluded from the existing literature that

our study, consisting more than a total of 113,715 patients, indicates that the RP-SARS-CoV-2 scenario occurs plausibly due to reactivation, reinfection, viral shedding, or testing errors.

So far, 240 documented cases of COVID-19 reinfection have been reported worldwide according to the reinfection tracker. For Ebola, viral persistence in semen is known, while transmission through milk and cervicovaginal fluid is also possible (similar for COVID-19, although neuronal persistence seems to be more relevant). Maybe we need more immunological studies, particularly in long COVID, to see whether there is continuous or intermittent antigenic stimulation from a persisting antigenic reservoir.

The fourth wave in Germany is caused by the delta variant; reactivation of alpha is certainly not a major factor. So we will only know in the next few years whether reactivation is responsible for small regional outbreaks in unvaccinated communities.

New England Journal of Medicine Retractions

Retraction Watch says on Jan 30, 2020 that

Until yesterday the New England Journal of Medicine had retracted only 24 papers. Now that tally is 25.

While the retracted paper has been cited 190 times, here are inaccuracies in another NEJM paper that has been cited 1961 times.

Full details are at PubPeer.


Is it justified to speak of a “protective” effect on the basis of a negative association alone?

In addition to the problems with the math, I don’t get the point: should farming really lead to a generally reduced capacity for numerous pleiotropic cytokines?

There are even reports that LPS induces TH2-dependent sensitization, which is exactly the opposite of what this paper wants us to believe by applying some cryptic smoother to a heterogeneous population.

What does shared genetics mean?

On Feb 7, 2020 I had the chance to hear a talk by Luke Jostins-Dean about Irritable Bowel Syndrome (IBS) and Inflammatory Bowel Disease (IBD). While IBS had a substantial overlap of polygenic risk scores with psychological features, IBD did not. Sounds logical, but does this prove anything?

Two new medRxiv preprints [Wendt and Marees] also thoroughly examine possible genetic correlates. Although I am quite sceptical that SES correlates should be tested at all (and also think that GWAS are not hypothesis-free), here is the Marees explanation of the three options we have: PRS, MTAG, mtCOJO.

First, polygenic risk scoring (PRS) is a tempting approach; but PRS using mental health/disease to predict the same or different phenotypes from an independent dataset often explain very little variance in the outcome phenotype. PRS also cannot detect specific biology underlying each phenotype.
Second is multi-trait analysis of GWAS (MTAG), which jointly analyses GWAS summary statistics and adjusts per-SNP effect estimates and association p-values using the strength of the genetic correlation between phenotypes. Genetic correlations between EDU/SES and related phenotypes have, however, demonstrable biases from environmental confounders….
To disentangle the complex genetic overlaps between EDU/SES and mental health, we therefore used multi-trait conditioning and joint analysis (mtCOJO), which generates conditioned GWAS summary statistics for each phenotype of interest after correcting for the per-SNP effects of another phenotype). The mtCOJO approach is not based on genetic correlation; it is based on the causal relationship between trait pairs inferred by Mendelian randomization (MR).
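For the first of these three options, the core of a polygenic risk score is just a weighted sum of allele counts. The following is a minimal sketch with invented SNP names and effect sizes. It is not the pipeline used in either preprint, and real PRS tools add steps such as LD clumping and p-value thresholding that are left out here.

```python
def polygenic_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of per-SNP effect size times effect-allele count (0, 1 or 2)."""
    # SNPs missing from the individual's genotype are counted as 0 copies here;
    # real pipelines usually impute or drop them instead.
    return sum(beta * dosages.get(snp, 0) for snp, beta in weights.items())


# Hypothetical GWAS effect sizes (for example log odds ratios) from a discovery sample
weights = {"rs_a": 0.12, "rs_b": -0.05, "rs_c": 0.30}
# Hypothetical effect-allele counts for one individual in the target sample
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

print(round(polygenic_score(person, weights), 2))  # 0.19
```

The score collapses all variants into one number, which illustrates the quoted objection that a PRS cannot point to the specific biology behind a phenotype.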

mtCOJO seems the way to go although the usual restrictions apply.

Fastest ever data transfer: 172 Petabyte per second

1 x ejaculation expels 250,000,000 sperm at a speed of 500 cm/s. Each sperm contains 3,088,000,000 base pairs at 2 bits each = 6,176,000,000 bits ~ 736 megabyte of genomic DNA

Each base pair takes 2 bits (you can use 00, 01, 10, and 11 for T, G, C and A). … And remember, you have to go from bits to bytes to get to an answer in megabytes. A bit is just a single unit of digital information, but a byte is a sequence of bits (usually 8). And because computers work in binary math, 1 kilobyte = 1024 bytes (i.e. 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 = 1024). 1 gigabyte = 1024 megabytes = 1048576 kilobytes = 1073741824 bytes. So you take the 6,176,000,000 bits and divide it by 8 to get 772,000,000 bytes. Divide that by 1024 and you get 753,906 kilobytes. Divide it by 1024 once more and you’re left with 736 megabytes.

Add 30,000 CpG islands x 8 bytes ~ 0.2 Megabyte
Also add 75 mitochondria x 16,569 base pairs ~ 0.3 Megabyte
So in total about 737 Megabyte per sperm

250,000,000 x 737 Megabyte / 1024 ~ 180,000,000 Gigabyte ~ 176,000 Terabyte ~ 172 Petabyte
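The arithmetic is easier to audit when every assumption is an explicit parameter. The sketch below uses 2 bits per base pair and binary units (factors of 1024), as in the quoted explanation, and 16,569 bp for the human mitochondrial genome. The sperm count, the 75 mitochondria and the 8 bytes per CpG island are taken from the post as given. With 2 bits per base pair the nuclear genome alone comes to roughly 736 megabytes (one bit per base pair would give half that, 368).

```python
MB = 1024 ** 2  # bytes per megabyte, binary convention


def dna_megabytes(base_pairs: int, bits_per_bp: int = 2) -> float:
    """Storage needed for a DNA sequence at a fixed number of bits per base pair."""
    return base_pairs * bits_per_bp / 8 / MB


nuclear_mb = dna_megabytes(3_088_000_000)   # haploid nuclear genome
cpg_mb = 30_000 * 8 / MB                    # 8 bytes per CpG island
mito_mb = dna_megabytes(75 * 16_569)        # 75 copies of the mitochondrial genome
per_sperm_mb = nuclear_mb + cpg_mb + mito_mb

sperm_count = 250_000_000
total_pb = sperm_count * per_sperm_mb / 1024 ** 3   # MB -> GB -> TB -> PB

print(f"per sperm: {per_sperm_mb:.1f} MB, total: {total_pb:.0f} PB")
```

Changing `bits_per_bp`, the mitochondrial copy number or the unit convention shows how much the headline number depends on each choice. The nuclear genome dominates, and the CpG and mitochondrial terms together add well under one megabyte.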

Polygenic risk scores – dead end of human genetics

Eric Topol will agree on the fact that

the practice of utilizing the DNA of an individual to predict disease has been judged to provide little to no useful information.

but the group nevertheless tried to rescue the concept by combining clinical risk factors with low/medium/high polygenic risk. As a pragmatic approach this may work for some diseases, but it will not explain any genetic pathway.

More recently Richardson et al. tried another approach using the UK Biobank data where they analysed 162 PRS and 551 heritable traits from 334,398 individuals. Using their web application for my search term “asthma” I did not find anything useful that hasn’t been known for ages.

Jan 15, 2020

So let’s have a look at an example published last November in Cell, “Screening Human Embryos for Polygenic Traits Has Limited Utility”. The outcome is sobering: if an IVF embryo were profiled with polygenic scores for traits such as height or IQ

the top-scoring embryo is expected to be about 2.5 cm or about 2.5 IQ points above the average. The adult trait value of the top-scoring embryo would remain widely distributed.

I wouldn’t even have run this analysis, as it seems pointless.

Macroscopic anatomy (Models of consciousness VI)

While updating an earlier blog post on “social pain”, I came across a 2018 article on the anatomical basis of the “self”. It localized the self to the insula and the anterior cingulate gyrus, that is Brodmann areas 24, 32 and 33, all areas that have long been linked to schizophrenia as well.

Many people locate their “conscious self” somewhere between the eyes behind the glabella, which in mythology is often also called the third eye (chakra). In the science of the post-phrenology era, however, the localization is “frustratingly diaphanous”. In my small survey some also reported a localization in the upper chest, together with breathing/heartbeat; others had no permanent localization, and children and one (blind) person had none at all.

More on “embodied consciousness” can be found in Thomas Fuchs, or in Goethe: “Nothing is inside, nothing is outside; for what is within is without”. What remains unsettling, in any case, is how easily self-perception can be disturbed, whether by sleep, LSD or VR goggles [1, 2]

Perhaps one should rather look for the anatomical self-localization of “soul” or “mind”? At least for the historical clarification this would take us further. After all, Leonardo too placed it above the optic chiasm.


Does smoking kill by accumulation of mutations or by repeated exposure until the final crack?

We have a new paper at Sci Rep online “High degree of polyclonality hinders somatic mutation calling in lung brush samples of COPD cases and controls“.

It took a long time from my initial grant application at Sander Stiftung in Dec 2009 (where it was rejected), to the field work within the scope of the EvA study (where the PI Loems Ziegler-Heitbrock retired).

After some first analyses together with Francesc at CNAG in Barcelona, the final publication has now appeared. My congratulations to Gian-Andri and Ivo Gut for their hard work!

Chronic obstructive pulmonary disease (COPD) is induced by cigarette smoking and characterized by inflammation of airway tissue. Since smokers with COPD have a higher risk of developing lung cancer than those without, we hypothesized that they carry more mutations in affected tissue.
We called somatic mutations in airway brush samples from medium-coverage whole genome sequencing data from healthy never and ex-smokers (n=8), as well as from ex-smokers with variable degrees of COPD (n=4). Owing to the limited concordance of resulting calls between the applied tools we built a consensus, a strategy that was validated with high accuracy for cancer data.
However, consensus calls showed little promise of representing true positives due to low mappability of corresponding sequence reads and high overlap with positions harbouring known genetic polymorphisms. A targeted re-sequencing approach suggested that only few mutations would survive stringent verification testing and that our data did not allow the inference of any difference in the mutational load of bronchial brush samples between former smoking COPD cases and controls.

So we would have probably needed a higher genome coverage on our brush sample mix. Or should we have sequenced more single cells as discussed in the paper?

At least we now know that sequencing at rather low coverage is not a screening tool for expected cancer development. Are there fewer pre-malignant lesions than expected? Looking at some other papers (Cancer Genome Atlas, esophagus, and more recently colon samples), I can only confirm what Iñigo Martincorena wrote

this study emphasizes how little we know about somatic evolution within normal tissues, a fundamental process that is likely to take place to varying degrees in every tissue of every species.

Somatic mutations accumulate with age. There may be even more mutations in the aging esophagus than in sun-exposed human skin. Lee-Six estimates 43.6 mutations/year, while I still have a gut feeling that there is no gradual accumulation of mutations (until the second hit) but a clonal expansion of a single bronchial cell, hit by a single smoke stream. With this hypothesis, smoking would not kill by accumulation of deleterious mutations, but by the never-ending re-exposure until the ultimate deleterious mutation occurs.

Martincorena 2018

Many more of these timeline studies will be necessary to explain why the lung cancer risk drops immediately after you stop smoking.

Is the Loomis Hypothesis wrong?

I remember an old vitamin D book (I believe it was Feldman’s Vitamin D) arguing that if we travel down the Trans-Sahara route (Algiers-Lagos), human skin color gets steadily darker towards Tamanrasset due to increased solar power, also known as the Loomis hypothesis (Science 1967).

So, let’s have a look at three maps. Solar power first.

1. Solar power. Source:

Second, skin tones – the Tamanrasset observation is correct but the overall picture does not match the Loomis hypothesis.

2. Skin tone. Source For full details see the original paper

Neither does the new vitamin D map published last week in the Lancet.

3. Vitamin D map. Source: For full details see the original paper

The maps are still patchy, but they already lead me to believe that the Loomis hypothesis is wrong.

Doubling time

I always had the impression that whenever a scientific work group grows too quickly, it is difficult to keep up ethical and scientific standards. Maybe that’s even true on a macroscopic scale? The paper “Age, Aging and Age Structure in Science” by Robert Merton & Harriet Zuckerman summarized in 1972 (on p 498 of Merton’s collected works “Sociology of Science”) that

the population of scientists, with a doubling time of about fifteen years, is far outrunning the acceleration rate of increase in the general population.

and they quote Derek Price, “Little science, big science”, p 19, that we

will soon have two scientists for every man, woman, child and dog.
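To get a feeling for what a fifteen-year doubling time means, it can be converted into an annual growth rate. A small sketch, assuming steady exponential growth (the function names are made up for this illustration):

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Constant yearly growth rate that doubles a population in the given time."""
    return 2 ** (1 / doubling_time_years) - 1


def growth_factor(years: float, doubling_time_years: float) -> float:
    """How many times larger the population is after the given number of years."""
    return 2 ** (years / doubling_time_years)


print(f"{annual_growth_rate(15):.1%} per year")                   # 4.7% per year
print(f"x{growth_factor(45, 15):.0f} over a 45-year career")      # x8 over a 45-year career
```

A doubling time of fifteen years corresponds to just under 5% growth per year, or an eightfold increase over three doubling periods.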

The science and art of scientific presentations

I always recommend Edward Tufte’s book “The Visual Display of Quantitative Information” and what he says about data ink

  1. Above all else show data.
  2. Maximize the data-ink ratio.
  3. Erase non-data-ink.
  4. Erase redundant data-ink.
  5. Revise and edit.