Assets minus debts

Slashdot reports a United Nations study that

the richest 2% of adults in the world own more than half of all household wealth… Most previous studies of economic disparity have looked at income, whereas this one looks at wealth – assets minus debts.

Looks similar to science budgets, yea, yea.

Addendum

An interview with Richard Münch in Laborjournal 12/2006, p. 23 confirms this: 17 out of 100 German universities absorb 50% of all funds provided by the DFG. He further argues that SFBs (Sonderforschungsbereiche, collaborative research centres) and research networks are a kind of ideological framework, that projects are not assessed retrospectively, and that management costs are excessive, with a considerable part of third-party funding being spent on acquiring more third-party funding.

Headline: For the first time 2 human genomes compared

What sounds brand new in a current Nature Genetics preprint could have been done a couple of years ago. The CNV specialists from Toronto now compare the Human Genome Project sequence with the Celera sequence. Was the gap between these two assemblies bigger than the holes within each sequence? Both sequences are, of course, mosaics from several individuals, but the analysis nevertheless exemplifies how difficult it may be to compare even the genomes of two related individuals. The authors employ a whole battery of alignment tools: BLAT, MEGABLAST, A2Axxx and GCA. Of course, the results differ depending on the strategy, definitions and implementation used. As shown nicely by FISH analysis, most of the discrepancies are real and fall into a few categories: insertions with or without a corresponding fragment (or deletions, if seen from the other genome; has anybody ever thought about a minimal human genome?), mismatches and inversions. Here we get a preview of a patient's diagnostic workup in 2026. This blog contains forward-looking statements, and the responsibility rests solely with the reader. Yea, yea.
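The sorting of discrepancies into insertions, deletions, mismatches and inversions can be sketched for a pairwise alignment that some aligner (BLAT, MEGABLAST or similar) has already produced. This is a simplified illustration under my own assumptions, not the authors' pipeline: the function names, the gapped-string input ("-" marks a gap) and the rule that a run of at least three mismatched columns equal to the reverse complement of the reference segment counts as an inversion are all hypothetical.

```python
# Hypothetical sketch: classify discrepancies in an existing pairwise alignment.
# Not the method of the paper; names and the inversion rule are assumptions.

# Complement table for the four DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def column_kind(ref_base: str, other_base: str) -> str:
    """Classify one alignment column ('-' marks a gap)."""
    if ref_base == other_base:
        return "match"
    if other_base == "-":
        return "deletion"   # base present in ref, missing in the other genome
    if ref_base == "-":
        return "insertion"  # extra base in the other genome
    return "mismatch"


def classify_alignment(ref_aln: str, other_aln: str, min_inversion: int = 3):
    """Group consecutive non-matching columns into discrepancy events.

    Returns (kind, alignment_column, ref_segment, other_segment) tuples.
    A run of mismatches that equals the reverse complement of the reference
    segment (with at least `min_inversion` bases) is reported as an inversion.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    i, n = 0, len(ref_aln)
    while i < n:
        kind = column_kind(ref_aln[i], other_aln[i])
        j = i
        # extend the run while consecutive columns have the same kind
        while j < n and column_kind(ref_aln[j], other_aln[j]) == kind:
            j += 1
        if kind != "match":
            ref_seg = ref_aln[i:j].replace("-", "")
            other_seg = other_aln[i:j].replace("-", "")
            if (kind == "mismatch" and len(ref_seg) >= min_inversion
                    and reverse_complement(ref_seg) == other_seg):
                kind = "inversion"
            events.append((kind, i, ref_seg, other_seg))
        i = j
    return events
```

For example, aligning CCCCGACTGGGG against CCCCAGTCGGGG yields a single inversion of GACT at column 4. Real inversions often contain columns that happen to match, which would split the run; that is one reason why results depend so much on strategy and implementation.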

New LD measure

There is a new way to calculate linkage disequilibrium (LD) that may overcome the limitations of D′ and r², which do not generalize easily to multiallelic markers (or haplotypes) and depend on the frequency distribution of SNPs (or haplotypes).
The paper is published by BMC and the source code is on the authors’ website. Before compiling, I slightly modified the program so that input and output file names can be given on the command line. Use at your own risk, yea, yea.
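For reference, the two classical measures that the new method aims to improve on can be computed for a pair of biallelic SNPs from one haplotype frequency and the two allele frequencies. This is a minimal sketch of the textbook D′ and r², not of the new measure from the BMC paper; the function name ld_measures is my own.

```python
# Hypothetical helper name; textbook D' (Lewontin) and r^2, not the new measure.
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Classical LD measures for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    Returns (D, D', r^2); D' keeps the sign of D.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalisation)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

With p_A = 0.5, p_B = 0.1 and p_AB = 0.1 (allele B occurs only on A haplotypes), D′ is 1 while r² is only 1/9, which shows how strongly r² depends on the allele frequencies.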

New R packages for SNP studies

The December R newsletter reports several brand-new Bioconductor packages useful for SNP studies.

100 years of allergy

Clemens von Pirquet (1874-1929) was the first to introduce “Allergie” (allergy) into clinical medicine, characterizing an altered immune reaction of the body to foreign substances. The famous paper was published in 1906 in the “Münchner Medizinische Wochenschrift” (MMW), volume 53, pages 1457-1458.

AB Kay published a nice essay, “100 years of Allergy: can von Pirquet’s word be rescued?”, which includes an English translation of the above paper. The MMW has a contemporary obituary of Meinhard von Pfaundler (1872-1947), one of the earlier directors of the Haunersches Kinderspital in Munich.

Finally, the “Wiener Klinische Wochenschrift” has a CV of Clemens von Pirquet that traces his path via Baltimore and Breslau back to Vienna, up to his tragic end when he committed suicide together with his mentally ill wife.


Tit-for-tat or altruism in science

No, this essay will not deal with altruism in science but with the science of altruism. There are two new papers from the Fehr group: one in Science (Nov 3) about diminished reciprocal fairness after magnetic stimulation of the right prefrontal cortex, and one in Nature (Aug 24) about altruism in two indigenous groups in Papua New Guinea. I was, however, much more impressed by their recent review of human altruism.
Cooperation between genetically unrelated groups is a typically human behaviour (otherwise seen only in ants, bees and the naked mole rat), in which there seems to be a strong reciprocity between selfishness and altruism. Cooperation is rarely stable and may deteriorate under worse conditions. Altruistic rewarding and reputation seeking seem to be the most powerful determinants of future donors’ behaviour, while the effect of punishment seems to be underestimated: cooperation in larger groups persists only if defectors and non-punishers can be punished.
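The argument about punishment can be made concrete with a one-shot public goods game. This is a minimal sketch with hypothetical parameters (multiplication factor r, contribution c, punishment cost k, fine f), not the model used in the review. Without punishment, defectors earn more than cooperators; when only defectors are punished, non-punishing cooperators free-ride on the punishers' costs; only when non-punishers are punished as well do punishers come out on top.

```python
# Hypothetical sketch of a public goods game with punishment; parameter values
# and names are assumptions for illustration, not taken from the review.
from dataclasses import dataclass


@dataclass
class PublicGoodsGame:
    r: float = 3.0  # multiplication factor applied to the common pool
    c: float = 1.0  # contribution paid by every cooperator
    k: float = 1.0  # cost a punisher pays for each player it punishes
    f: float = 3.0  # fine a punished player pays for each punisher


def payoffs(n_p: int, n_c: int, n_d: int, game: PublicGoodsGame,
            punish_non_punishers: bool = False) -> dict[str, float]:
    """One-shot payoffs for punishers (P), non-punishing cooperators (C)
    and defectors (D) in a single group of n_p + n_c + n_d players.

    Punishers always punish defectors; if punish_non_punishers is True they
    also punish cooperators who do not punish (second-order punishment).
    """
    n = n_p + n_c + n_d
    # P and C both contribute; the multiplied pool is shared equally by all n
    share = game.r * game.c * (n_p + n_c) / n
    targets = n_d + (n_c if punish_non_punishers else 0)
    fine_on_c = game.f * n_p if punish_non_punishers else 0.0
    return {
        "P": share - game.c - game.k * targets,
        "C": share - game.c - fine_on_c,
        "D": share - game.f * n_p,
    }
```

For example, payoffs(4, 4, 2, PublicGoodsGame()) ranks C above P above D, whereas adding punish_non_punishers=True puts the punishers first.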


An utter refutation

I am slow in commenting on a paper that was already published earlier this year: Joe Terwilliger’s vivid refutation of the fundamental theorem of the HapMap proponents, namely that

if a marker is in tight LD with a polymorphism that directly impacts disease risk, as measured by the metric r^2, then one would be able to detect an association between the marker and disease with sample size that was increased by a factor of 1/r^2 over that needed to detect the effect of the functional variant directly

I cannot comment on the statistical proof, but from my recent experience with tag SNPs in Crohn’s disease and asthma I fear that he may be right: even markers in high LD with the functional variant may not show any association at all. This may be bad news for all those currently running large screening programs with HapMap-based variants in the belief that P(A|BC) = P(A|Bc) = P(A|B), yea, yea.
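The rule of thumb quoted above is simple arithmetic: if the functional variant needs N subjects, a tag marker with LD r² needs about N/r². A minimal sketch with a hypothetical function name; it encodes the rule as quoted and nothing more, so it says nothing about whether the underlying assumption holds.

```python
# Hypothetical helper; implements only the quoted 1/r^2 rule of thumb.
import math


def marker_sample_size(n_functional: int, r2: float) -> int:
    """Sample size needed at a tag marker under N_marker = N_functional / r^2,
    rounded up to whole subjects."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must lie in (0, 1]")
    return math.ceil(n_functional / r2)
```

With r² = 0.5, a study that needs 1,000 subjects at the functional variant would need 2,000 at the tag marker.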

Addendum

Tag SNPs also do not work with CNVs

Who will survive?

When looking at gene variants in a population, we may forget that even a perfect sampling scheme will not give an unbiased view of the human genome. Earlier studies suggested that up to 75% of conceptions are lost during early development; further indicators of this bias are unexplained departures from Hardy-Weinberg equilibrium.
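Departure from Hardy-Weinberg equilibrium is usually checked with a simple goodness-of-fit test. The sketch below is the standard 1-df chi-square test with a hypothetical function name; the counts in the usage note are invented to show how losing half of one homozygous genotype class, as selective early loss might do, produces a clear departure.

```python
# Hypothetical helper name; standard 1-df chi-square test for HWE.
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Returns (statistic, p_value) for genotype counts AA, AB and BB.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

hwe_chi_square(250, 500, 250) gives a statistic of 0, whereas removing half of the BB genotypes, hwe_chi_square(250, 500, 125), gives a statistic of about 24 (p < 0.001).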
Selective survival during early pregnancy is still terra incognita, and apart from studies in the Hutterites I am not aware of any (modern) study that has looked at it.
A study by Grant Montgomery now presents fresh data on genome-wide allele sharing in 1,592 DZ twins from Australia and 336 DZ pairs from the Netherlands.
It is somewhat disappointing that there is no excess allele sharing in the HLA region or anywhere else in the genome. Maybe further studies can do this at a higher resolution than with just 359 microsatellite markers?
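The idea behind the twin analysis can be sketched: at any locus, DZ twin pairs are expected to share on average half of their alleles identical by descent (IBD), with 0, 1 or 2 alleles shared in proportions 1/4, 1/2, 1/4, so deviations such as the excess sharing looked for here may point to selective survival. A minimal sketch, assuming the IBD state of each pair is known exactly (in practice it is estimated from markers such as the 359 microsatellites), with a hypothetical function name:

```python
# Hypothetical helper; assumes exact IBD counts, which real data only estimate.
import math


def excess_ibd_sharing(n_ibd0: int, n_ibd1: int, n_ibd2: int) -> tuple[float, float]:
    """Mean proportion of alleles shared IBD by DZ twin pairs and a z-score
    against the null expectation of 0.5.

    Under the null, a pair shares a proportion pi of 0, 0.5 or 1 with
    probabilities 1/4, 1/2, 1/4, so pi has mean 0.5 and variance 1/8.
    """
    n = n_ibd0 + n_ibd1 + n_ibd2
    mean_pi = (0.5 * n_ibd1 + 1.0 * n_ibd2) / n
    z = (mean_pi - 0.5) / math.sqrt(0.125 / n)
    return mean_pi, z
```

With 20, 50 and 30 pairs sharing 0, 1 and 2 alleles, the mean sharing is 0.55 and z is about 1.41, still far from significant; small deviations need many pairs.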

Prevention of fraud

Donald Kennedy writes in this week’s Science editorial about “Responding to Fraud”. The editorial is even more about preventing fraud: the external reviewers recommend a risk assessment for potential fraud in future papers. In future, Science will consider

… which papers deserve particularly careful editorial scrutiny. Papers that are of substantial public interest, present results that are unexpected and/or counterintuitive, or touch on areas of high political controversy may fall into this category…

I appreciate such an initiative, and I agree that science is based on an assumption of trust: no procedure will be immune to deliberate fraud. However, looking both at people and at papers could be worthwhile. I would assign extra risk points for:

  • overly ambitious institutional environments
  • large and anonymous organizations
  • poor social and scientific interaction at a local level
  • limited scientific qualification or background of researchers or department heads
  • time pressure, too many projects, no long-term goals
  • direct financial compensation in return for scientific impact
  • past history of minor misconduct

Looking at papers may also reveal inconsistencies:

  • contradictory numbers
  • suspicious modifications of figures
  • original data not publicly available
  • original documentation not publicly available
  • constructs, cell lines, animals not publicly available
  • inadequate point-by-point response to reviewers
  • insufficient documentation of IRB and authorship
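Part of the check for contradictory numbers can be automated. As an illustration, not a procedure proposed in the editorial, the sketch below tests whether a percentage reported with a given number of decimals can arise from whole subjects in a group of size n; the function name is hypothetical.

```python
# Hypothetical helper; a simple consistency check, not an editorial procedure.
def percentage_is_possible(reported_pct: float, n: int, decimals: int = 1) -> bool:
    """Can `reported_pct`, rounded to `decimals` places, come from k of n subjects?"""
    if n <= 0:
        raise ValueError("n must be positive")
    half_unit = 0.5 * 10 ** -decimals
    # small tolerance guards against floating-point noise at exact .5 boundaries
    return any(abs(100 * k / n - reported_pct) <= half_unit + 1e-9
               for k in range(n + 1))
```

For instance, 37.5% is possible in a group of 16 (6 of 16) but not in a group of 17, where the nearest values are 35.3% and 41.2%.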

Another option is to pay reviewers, as the review process is becoming more and more time-consuming, and even to plan on-site evaluations. An option that is probably not feasible is to delay publication until the main findings have been independently reproduced.

Finally, I see a large gap between the efforts of Science and Nature to improve their performance and some average-impact journals that never respond when asked to correct or withdraw a highly distorted paper. Yea, yea.

Addendum

Guide to promoting integrity in scientific journals published by the Council of Science Editors

I agree with everything you said

“I agree with everything you said that was correct, and I disagree with everything you said that was incorrect” (Adlai Stevenson, according to AJRCCM 2006;174:1056): a nice comment that fits every situation.
The German magazine Der Spiegel has an interview with Tim O’Reilly about the quality of internet resources. It seems that everybody can voice his or her opinion, while the final decision about a feature or a patch is made by the “inner circle”. Entry to the inner circle is limited to those who qualify through previous contributions, probably a very similar system to that in science. In this interview O’Reilly also talks about Jaron Lanier’s warning that Wikipedia may be dangerous because it creates a monoculture of knowledge. He agrees that Wikipedia has been abused in the past but believes that its mechanisms for identifying abuse are much better than those of any political system, yea, yea.

Escaping from a swamp

The November AJHG has an excellent re-analysis of the dysbindin-schizophrenia association using a new methodology that surpasses all previous meta-analysis techniques. As the single-SNP association results from the six previous studies cannot be compared directly, the authors construct a European "super-HapMap" from all tag SNPs in the region, place the haplotypes in a phylogenetic tree and finally map all single-SNP associations onto these haplotypes. Their Fig. 1B shows the main results; as the circles in Fig. 1B are somewhat confusing, I have redrawn their results, adding the haplotype frequencies and ordering the studies by year of publication.

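[Figure: results of Fig. 1B redrawn, haplotype associations per study with haplotype frequencies, studies ordered by year of publication]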

We may think of this as a triple-blind study: neither the patients, nor the PIs, nor we knew anything beforehand. The results are alarming. I do not understand how the Kirov set could have included all haplotypes, or why the Schwab/Williams set is in opposition to the Straub/Bogaert/Funke set.
What could have gone wrong? The authors of the current re-analysis consider population differences an unlikely reason for the inconsistency, as the allele frequencies match between studies. It is good news that genotyping errors can largely be excluded.
Unfortunately, the authors remain vague about why there is no common causal variant. Were there different sampling schemes, different diagnostic thresholds or different environmental exposures in the previous studies? Is dysbindin a schizophrenia gene at all, or only on a certain genetic background? It seems possible that the studies of one branch are false positives. Or is the haplotype reconstruction in the re-analysis erroneous for whatever reason?
Baron von Münchhausen is well known for escaping from a swamp by pulling himself up by his own hair. I wish I could do that too.
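The mapping of single-SNP results onto haplotypes can be sketched in a much simplified form: each association is translated into the set of haplotypes carrying the associated allele, and studies are then compared at the haplotype level. This ignores the phylogenetic tree and the statistics of the actual re-analysis; the function name, haplotypes and study results below are invented.

```python
# Hypothetical sketch; haplotypes and study results are invented toy data.
def implicated_haplotypes(haplotypes: dict[str, str],
                          associations: list[tuple[int, str]]) -> set[str]:
    """Haplotypes carrying every associated allele reported by one study.

    haplotypes   : name -> allele string, one character per tag SNP
    associations : (snp_index, associated_allele) pairs from single-SNP tests
    """
    return {name for name, alleles in haplotypes.items()
            if all(alleles[i] == allele for i, allele in associations)}


# Invented example: four haplotypes over four tag SNPs, three studies
HAPLOTYPES = {"h1": "AACG", "h2": "AGCT", "h3": "GGTT", "h4": "GACG"}
STUDIES = {
    "study X": [(0, "A"), (1, "A")],
    "study Y": [(2, "T")],
    "study Z": [(3, "G")],
}

implicated = {study: implicated_haplotypes(HAPLOTYPES, assoc)
              for study, assoc in STUDIES.items()}
# Haplotypes implicated by every study (empty if the studies disagree)
common = set.intersection(*implicated.values())
```

In this toy example studies X and Z both implicate haplotype h1, while study Y points to a different branch (h3), so no haplotype is shared by all three studies.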

Easter Eggs

In medieval times, messengers had tattoos hidden under their scalp hair. Charles Dickens also described how women used purling and knitting to hide messages. Many software developers likewise insert hidden messages or features into their code. The motivation may be to sign it or to give it an artistic touch; you will find many websites explaining the necessary keystrokes and web links.
I wonder whether other colleagues are also hiding initials, words or messages in scientific papers. Unfortunately, with online submission, publishers will now recognize faked references. What about using steganography to mark pictures or PDFs?
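Least-significant-bit embedding is one of the simplest steganographic techniques for pictures: each 8-bit pixel value changes by at most one, which is invisible to the eye. A minimal sketch on a raw buffer of grey values, with hypothetical function names; reading and writing real image files (or PDFs) is left out, and such marks do not survive lossy recompression such as JPEG.

```python
# Hypothetical sketch of LSB steganography on raw 8-bit pixel values.
def embed_message(pixels: bytes, message: str) -> bytearray:
    """Hide a UTF-8 message in the least significant bits of pixel values.

    A 32-bit big-endian length header is written first, then the message bits.
    """
    payload = message.encode("utf-8")
    data = len(payload).to_bytes(4, "big") + payload
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for this message")
    out = bytearray(pixels)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & 0xFE) | bit  # overwrite only the lowest bit
    return out


def extract_message(pixels: bytes) -> str:
    """Recover a message written by embed_message."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        result = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (pixels[start_bit + 8 * b + i] & 1)
            result.append(value)
        return bytes(result)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length).decode("utf-8")
```

Embedding "yea, yea" into a buffer of 512 grey values and extracting it again returns the same text, while no pixel changes by more than one level.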

Zeitgeist

It seems that the German word Zeitgeist is increasingly used in English texts as well. Whenever I think about science and scientists, I come back to a famous essay by Karl Jaspers written in 1932 (he lost his professorship in Heidelberg in 1937; in 1938 he was banned from publishing).

The title of the essay is “Die geistige Situation der Zeit” (published in English as Man in the Modern Age). The chapter “Wissenschaft” (Science) always comforts me when I despair about the inequity of the scientific world. Here is an excerpt:

The sciences are still achieving extraordinary things today. The exact natural sciences have embarked on an exciting course of rapid progress in fundamental ideas and empirical results. A circle of researchers spread across the world stands in relations of rational mutual understanding. One throws the ball to the other. This process finds an echo among the masses through the tangibility of the results. In the humanities, observation close to the subject matter has been refined to microscopic precision. An unprecedented wealth of documents and monuments has been brought before our eyes. Critical certainty has been achieved.

The crisis of the sciences, therefore, does not really lie in the limits of what they can do, but in the awareness of their meaning. With the disintegration of a whole, the immeasurable range of the knowable is now subject to the question of whether it is worth knowing. Where knowledge, lacking the whole of a world view, is merely correct, it is at best valued for its technical usefulness. It sinks into the endlessness of what is really no one's concern.

It is thus not the immanent development of the sciences alone that makes the crisis sufficiently comprehensible, but only the human being whom the scientific situation meets. It is not science in itself but man himself within it who is in crisis. The historical and sociological cause of this crisis lies in mass existence. The transformation of the free research of individuals into the enterprise of science has the consequence that everyone considers himself capable of taking part, as long as he has intelligence and is diligent. A scientific plebeianism arises; people produce empty work by analogy to prove themselves as researchers, make arbitrary observations, counts and descriptions, and pass them off as empirical science. The endless multitude of positions taken, so that in more and more cases people no longer understand one another, is solely the consequence of everyone irresponsibly daring to voice an opinion that he has laboriously wrung out of himself so as to amount to something too. People have the effrontery to “put up for discussion” whatever happens to occur to them. In some fields, the vast mass of printed rationality finally becomes a display of the chaotic jumble of no longer really understood remnants of once-living thought in the heads of mass men. When science thus becomes the function of thousands of interested parties, each belonging to a discipline as a profession, then, owing to the characteristics of the average, the meaning of research, too, can … become confused.

Number cruncher

In a recent blog post I described high-resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling, many people nowadays stick to commercial SQL databases that have plugins for PD software.
My recommendation is to save that money and store the data in a special format that may be more useful for these large datasets; details are in a technical report that I will upload later today. In the meantime, you can already check out some software tools for working with these large datasets. This is what I know so far:

  • David Duffy has recompiled his sibpair program
  • Geron(R) has something under development
  • Jochen Hampe and colleagues offer Genomizer
  • Franz Rüschendorf developed Alohomora
  • I remember SNPGWA, a development at Wake Forest University (no link yet)
  • there will be an R/Bioconductor package by Rob Scharpf (no link yet)
  • R library GenABEL by Yurii Aulchenko
  • R library SNPassoc by Juan González

Addendum

A technical report on how to work with large SNP datasets is now also available in my paper section. Alternatives to what I suggest in this paper have been set out by an anonymous reviewer:

For R users, if SQLite limits are reached, hdf5 (http://hdf.ncsa.uiuc.edu/HDF5/) may be one way forward for really huge table structures since there is an R interface already available. PostgreSQL column limit depends on data type with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite with no obvious column count limit. Metakit is not mentioned – it is column oriented and probably also has “unlimited” columns as long as each database is < 1GB or so.
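One way around column limits altogether is to avoid one column per SNP and store genotypes in "long" format, one row per sample and SNP. This is a sketch with Python's built-in sqlite3 module and hypothetical table, column and function names; it is not necessarily the format described in the technical report. The price is many more rows, which the primary-key index on (snp, sample) keeps searchable.

```python
# Hypothetical sketch: long-format genotype storage in SQLite (names assumed).
import sqlite3
from typing import Optional


def build_genotype_db(genotypes: dict[str, dict[str, Optional[int]]]) -> sqlite3.Connection:
    """Store genotypes in long format: one row per (sample, SNP) pair.

    genotypes: sample_id -> {snp_id: allele count 0/1/2, or None if missing}
    """
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE genotype (
                     sample TEXT NOT NULL,
                     snp    TEXT NOT NULL,
                     dosage INTEGER CHECK (dosage IN (0, 1, 2)),
                     PRIMARY KEY (snp, sample))""")
    con.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        ((sample, snp, g) for sample, calls in genotypes.items()
         for snp, g in calls.items()),
    )
    con.commit()
    return con


def allele_frequency(con: sqlite3.Connection, snp: str) -> float:
    """Frequency of the counted allele at one SNP, ignoring missing calls."""
    total, n = con.execute(
        "SELECT SUM(dosage), COUNT(dosage) FROM genotype WHERE snp = ?", (snp,)
    ).fetchone()
    return total / (2 * n) if n else float("nan")
```

Adding another SNP then means adding rows rather than columns, so the number of markers is no longer bounded by the database's column limit.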