Every day, we pile up incredible amounts of data that nobody is overseeing anymore. AI is the proposed solution, but who is overseeing AI?
We are flooded with results every day, but does anyone care? An ecosystem declined by 47%, so what's next? We already read and write too much, with a kind of retrograde amnesia, as nothing seems to matter in the context of the replication crisis.
WIRED had a nice summary some time ago: “200 Researchers, 5 Hypotheses, No Consistent Answers. Just how much wisdom is there in the scientific crowd?”
The crowdsourced study is a dramatic demonstration of an idea that’s been widely discussed in light of the reproducibility crisis—the notion that subjective decisions researchers make while designing their studies can have an enormous impact on their observed results. Whether through p-hacking or via the choices they make as they wander the garden of forking paths, researchers may intentionally or inadvertently nudge their results toward a particular conclusion.
The Landy Study is my personal favorite of 2019.
A new study examines the question of how design choices shape research results: Justin F. Landy et al. Crowdsourcing hypothesis tests: Making transparent how design choices shape research results. Psychological Bulletin (in press)
The institutional background of most research is a research group that shares a hypothesis. The group hypothesis usually determines the methods, and methods and study design determine the results. Because of this chain, even replication studies run a high risk of merely replicating systematic errors. It is therefore not unexpected that many scientific hypotheses only die with their proponents. To what extent are research results influenced by subjective decisions?
Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from two separate large samples (total N > 15,000) were then randomly assigned to complete one version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: materials from different teams rendered statistically significant effects in opposite directions for four out of five hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for two hypotheses, and a lack of support for three hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, while considerable variability was attributable to the hypothesis being tested.
So it is all about the hypothesis … something that we already know from the early Genome Analysis Workshops where the same dataset had been distributed to different groups.
The winner’s curse is another attempt to explain why replication frequently fails in genetic epidemiology. Zöllner and Pritchard write in the AJHG (their server is currently down):
For a variant that is genuinely, but weakly, associated with disease, there may be only low or moderate power to detect association. Hence, when there is a significant result, it may imply that the genotype counts of cases and controls are more different from each other than expected. Consequently, the estimates of effect size are biased upward. This effect, which is an example of the “winner’s curse” from economics, depends strongly on the power of the initial test for association. If the power is high, most random draws from the distribution of genotype counts will result in a significant test for association; thus, the ascertainment effect is small. On the other hand, if the power is low, conditioning on a successful association scan will result in a big ascertainment effect.
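The conditioning argument is easy to see in a toy simulation. The sketch below (my own illustration, not from the paper; the two-group Gaussian setup, sample sizes, and the true effect of 0.2 are all assumed for demonstration) compares the mean effect-size estimate across all runs with the mean among runs that happened to reach significance: under low power, the significant estimates overshoot the truth badly, while under high power the bias nearly vanishes.

```python
import random
import math

random.seed(42)

def simulate(n_per_group, true_diff, sd=1.0, n_sims=5000):
    """Two-group mean comparison. Returns (power, mean estimate over
    all runs, mean estimate among significant runs only)."""
    z_crit = 1.96                         # two-sided alpha = 0.05
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    all_est, sig_est = [], []
    for _ in range(n_sims):
        cases = [random.gauss(true_diff, sd) for _ in range(n_per_group)]
        controls = [random.gauss(0.0, sd) for _ in range(n_per_group)]
        est = sum(cases) / n_per_group - sum(controls) / n_per_group
        all_est.append(est)
        if abs(est) / se > z_crit:        # condition on a "successful scan"
            sig_est.append(est)
    power = len(sig_est) / n_sims
    return power, sum(all_est) / len(all_est), sum(sig_est) / len(sig_est)

# Low power (small sample): significant estimates overshoot the true 0.2
low = simulate(n_per_group=30, true_diff=0.2)
# High power (large sample): little ascertainment bias remains
high = simulate(n_per_group=500, true_diff=0.2)
print("low power:  power=%.2f  mean(all)=%.3f  mean(significant)=%.3f" % low)
print("high power: power=%.2f  mean(all)=%.3f  mean(significant)=%.3f" % high)
```

The unconditional estimates are unbiased in both settings; the upward bias appears only once we select on significance, which is exactly what an association scan followed by publication does.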
I haven’t fully understood this argumentation, but promise to revisit it sometime later, yea, yea.