
Initial data analysis

Here is my list of favorite R packages for a first look at a data set:

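library(DataExplorer)
plot_str(iris)          # structure of the data set as a network diagram
plot_bar(iris)          # bar charts for the discrete columns
plot_density(iris)      # density estimates for the continuous columns
plot_correlation(iris)  # correlation heatmap
plot_prcomp(iris)       # principal component analysis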

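See also the predictive power score from ppsr:

devtools::install_github('paulvanderlaken/ppsr')
library(ppsr)
# predictive power score of Sepal.Length as a predictor of Petal.Length
score(iris, x = 'Sepal.Length', y = 'Petal.Length', algorithm = 'glm')[['pps']]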

For checking the robustness of models I use:

devtools::install_github("chiragjp/quantvoe")
library(quantvoe)

Python versions are available here or here.

 

CC-BY-NC Science Surf accessed 10.11.2025

Overfitting and model degradation

My experience as a beginner here has not been exhilarating. Maybe others also suffer from poor models but never report it?

During the training phase the model tries to learn the patterns in the data, using algorithms that deduce the probability of an event from the presence or absence of certain data. What if the model is learning from noisy, useless or wrong information? The data set may be too small or unrepresentative, and the model too complex. As shown in the article linked above, beyond a certain depth a deeper classifier tree raises only the training accuracy, not the test accuracy: overfitting! Avoiding both under- and overfitting takes a lot of experience.
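The tree-depth effect can be reproduced with a small simulation. This is only a sketch under my own assumptions, not the analysis from the linked article: the data are simulated, 20% of the labels are flipped to act as noise, and the tree is fitted with the rpart package with pruning switched off.

```r
# Sketch: training vs. test accuracy as a classification tree gets deeper.
# Assumptions (not from the post): simulated data, 20% label noise, rpart.
library(rpart)

set.seed(42)
n <- 600
d <- data.frame(x1 = runif(n), x2 = runif(n))
# The true rule depends only on x1; x2 is pure noise
truth <- d$x1 > 0.5
flip  <- runif(n) < 0.2          # 20% of the labels are flipped
d$y   <- factor(ifelse(xor(truth, flip), "yes", "no"))

train <- d[1:300, ]
test  <- d[301:600, ]

# Share of correctly classified rows
accuracy <- function(model, data) {
  mean(predict(model, data, type = "class") == data$y)
}

depths <- c(1, 2, 4, 8, 16)
res <- t(sapply(depths, function(k) {
  # cp = 0 and minsplit = 2 let the tree grow until maxdepth stops it
  fit <- rpart(y ~ x1 + x2, data = train, method = "class",
               control = rpart.control(maxdepth = k, cp = 0, minsplit = 2,
                                       minbucket = 1, xval = 0))
  c(depth = k, train = accuracy(fit, train), test = accuracy(fit, test))
}))
print(round(res, 3))
```

In the printed table the training accuracy can only go up with depth, while the test accuracy typically stalls or drops once the tree starts fitting the flipped labels.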

What is model degradation or concept drift? It means that the statistical properties of the predicted variable change over time in an unforeseen way. When the real world changes, whether politically, through climate or for whatever other reason, the data used for prediction change as well, and the predictions become less accurate. The computer model is static and represents the point in time when the algorithm was developed. Empirical data, however, are dynamic. Model fit needs to be reviewed at regular intervals, and again this takes a lot of experience.
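A regular review of model fit could look like the following sketch. Everything in it is an assumption for illustration: the data are simulated, the model is a simple linear regression, the drift is a slowly changing slope, and the 1.5x threshold is an arbitrary review rule.

```r
# Sketch: a model fitted once degrades as the data-generating process drifts.
# Assumptions (not from the post): simulated data, linear model, drifting slope.
set.seed(1)
simulate_period <- function(slope, n = 200) {
  x <- runif(n, 0, 10)
  data.frame(x = x, y = slope * x + rnorm(n))
}

# The model is frozen at the time it was developed (true slope = 2)
fit <- lm(y ~ x, data = simulate_period(slope = 2))

# Later periods: the true slope changes gradually
slopes <- seq(2, 3, by = 0.25)
rmse <- sapply(slopes, function(s) {
  new_data <- simulate_period(slope = s)
  sqrt(mean((new_data$y - predict(fit, new_data))^2))
})

drift <- data.frame(period = seq_along(slopes), true_slope = slopes,
                    rmse = round(rmse, 2))
# Hypothetical review rule: flag periods with more than 1.5x the baseline error
drift$needs_review <- rmse > 1.5 * rmse[1]
print(drift)
```

The static model stays the same while the prediction error grows from period to period, which is the signal that the model needs to be refitted.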

 


“How come the Muggles don’t hear the bus?”

This is a quote from a Harry Potter book about the Knight Bus:

The Knight Bus is a triple-decker, purple AEC Regent III RT that assists stranded individuals of the wizarding community through public transportation. It operates at a very fast speed and obstacles will jump out of its way. To hail the bus, a witch or wizard must stick their wand hand in the air in the same manner that a Muggle might do to hail a taxi. The Knight Bus’ conductor is Stan Shunpike, who greets passengers and handles baggage. It is driven by Ernie Prang.

How come the Muggles don’t hear the bus? Because they don’t look for it. Nobody looks for a bus moving at the speed that the Knight Bus moves at.

U.S. science is moving at Knight Bus speed: yesterday the White House issued a new policy that will require, by 2026, all federally funded research results to be freely available to the public without delay.

This research, which changes our lives and transforms our world, is made possible by American tax dollars. And yet, these advancements are behind a paywall and out of reach for too many Americans. In too many cases, discrimination and structural inequalities – such as funding disadvantages experienced by minority-serving colleges and institutions – prevent some communities from reaping the rewards of the scientific and technological advancements they have helped to fund. Factors including race, age, disability status, geography, economic background, and gender have historically and systemically excluded some Americans from accessing the full benefits of scientific research. To tackle this injustice, and building on the Biden-Harris Administration’s efforts to advance policy that benefits all of America, the White House Office of Science and Technology Policy (OSTP) released new policy guidance today to ensure more equitable access to federally funded research.

What about the German Muggles: BMBF, DFG, the major German academies, research and ethics organizations? How come these Muggles don’t hear the bus? Because they don’t look for it. Nobody looks for a bus moving at the speed that the Knight Bus moves at.

 


Human capital flight

There is an interesting paper on brain drain, “Why does the U.S. have the best research universities? Incentives, resources, and virtuous circles”:

A demand for denominational sorting drove the creation of the nine American colonial colleges. For example, Massachusetts Puritans created Harvard to produce what they saw as a theologically sound education … Connecticut-based Puritans created Yale because they perceived that the Harvard of Massachusetts-based Puritans was too physically distant (in addition to too religiously liberal)

The authors attribute the subsequent U.S. success to “reforms that began after the Civil War and enhanced the incentives and resources the system directs at research”, which may be true. The overall strategy of the paper, looking only at biographies of Nobel prize winners, may be questionable. Nevertheless the trend is clear: German impact was already decreasing in 1920, while I had always thought of an exodus of scientists only after 1933.

The U.S. does not spread as much money across various non-university “Großforschung” (big science) organisations, and there is much more private sponsoring of U.S. universities, so funds are more concentrated.

Higher salaries, lower teaching, and enhanced laboratory space illustrate some ways in which professors’ compensation began to reflect research performance. Furthermore, this period saw the emergence of tenure, a salient reward for performance.

The emergence of tenure seems important: you can make a living from your work and do not need to start campaigning like #IchbinHanna in Germany.

 


A footnote on footnotes

https://twitter.com/hekabeohnename/status/1559795108400201728

In any case, the many legal and historical perspectives in this Twitter thread are a pleasure to read. Only Grafton is missing:

The footnote, especially the German one, is often regarded as the epitome of boring scholarship, a secret code of dry academic erudition. Yet that is to overlook it as a playground of passions, a battlefield of intellectual struggles. With the help of numerous footnotes, Anthony Grafton has managed to shed new light on its shadowy existence and to lay out its dramatic range.

and the FAZ footnote on Grafton:

Scholars in the humanities like to smile at the supposedly naive objectivism of natural scientists. Yet the history of science is far ahead of its humanities competition in the critique of scientific forms of thought. Thomas Kuhn prompted chemists and physicists to inquire into the history of experimental setups and methods of proof.

 


Death by AI

spiegel.de reports a fatal accident involving a self-driving car.

Veered into the oncoming lane on a bend
One dead and nine seriously injured in an accident involving a test vehicle
Four rescue helicopters and 80 firefighters were deployed: in an accident on the B28 in the Reutlingen district a young man died, and several people were taken to hospital with serious injuries.

Is there any registry of this kind of accident?

https://twitter.com/ISusmelj/status/1558912252119482368

and the discussion on responsibility

The first serious accident involving a self-driving car in Australia occurred in March this year. A pedestrian suffered life-threatening injuries when hit by a Tesla Model 3, which the driver claims was in “autopilot” mode.
In the US, the highway safety regulator is investigating a series of accidents where Teslas on autopilot crashed into first-responder vehicles with flashing lights during traffic stops.

 


Loss of credibility

Loss of credibility is the worst thing that can happen to an individual scientist, but also to a commission. It happened to the COVID-19 expert commission (when Drosten left and Stöhr joined). Unfortunately it is also happening to the Stiko with Mertens et al.

Unfortunately this now has dire consequences, according to the first reports from medical practice. Vaccinations are becoming more difficult in general, not only for “COVID22” but for ALL vaccinations, although their benefit-risk ratio is unchanged.

https://twitter.com/KinderdocNina/status/1558207802564296707

 


Hygiene hypothesis hyperbole

Although I have written about the hygiene hypothesis, I missed a PNAS News Feature from some years ago.

Again: the hygiene hypothesis did not originate with David Strachan and, well, there are more cracks in the hygiene hypothesis.

“The trouble is, as soon as you use the words ‘hygiene hypothesis,’ the word hygiene prejudges what the cause is,” says Bloomfield. To the public, “hygiene” is interpreted as personal cleanliness: washing hands, keeping food clean and fresh, sanitizing the home. However, because the hypothesis has been largely uncoupled from infections, the idea that we need to be less hygienic is wrong. Relaxing hygiene standards would not reverse the trend but only serve to increase the risks of infectious disease, says Bloomfield. The term “hygiene hypothesis” also fails to incorporate all of the other factors now linked to the increase in immunoregulatory diseases.

I expect that five years after mandating “super hygiene” during COVID-19 we can finally bury the hygiene hypothesis.

 


Big Data Paradox: quality beats quantity

https://www.nature.com/articles/s41586-021-04198-4 (via @emollick)

Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox … We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10

It basically confirms my earlier observation in asthma genetics:

this result was possible with just 415 individuals instead of 500,000 individuals nowadays
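The paradox is easy to see in a toy simulation. This is a sketch with made-up numbers, not the data or the method of the Nature paper: the population size, the true rate of 60% and the response probabilities (30% vs. 20%, depending on the outcome) are all my assumptions.

```r
# Sketch of the Big Data Paradox with simulated data.
# Assumptions (not from the paper): population size, 60% true rate,
# and response probabilities that depend slightly on the outcome.
set.seed(2021)
N <- 2e5
y <- rbinom(N, 1, 0.6)            # e.g. 1 = vaccinated
true_mean <- mean(y)

# Large but biased survey: people with y = 1 respond more often
p_respond <- ifelse(y == 1, 0.30, 0.20)
big <- y[runif(N) < p_respond]

big_estimate <- mean(big)
big_error    <- abs(big_estimate - true_mean)
# Half-width of the usual 95% confidence interval for a proportion
big_ci       <- 1.96 * sqrt(big_estimate * (1 - big_estimate) / length(big))

# Small simple random samples of size 100, repeated to get a typical error
small_errors <- replicate(1000, mean(sample(y, 100)) - true_mean)
small_rmse   <- sqrt(mean(small_errors^2))

print(c(n_big = length(big), big_error = big_error,
        big_ci_halfwidth = big_ci, small_rmse = small_rmse))
```

The big survey reports a very narrow confidence interval around the wrong value, while a random sample of only 100 has a smaller typical error.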

 


It is only Monday but already depressing

Comment on the PaLM paper by u/Flaky_Suit_8665 via @hardmaru

67 authors, 83 pages, 540B parameters in a model, the internals of which no one can say they comprehend with a straight face, 6144 TPUs in a commercial lab that no one has access to, on a rig that no one can afford, trained on a volume of data that a human couldn’t process in a lifetime, 1 page on ethics with the same ideas that have been rehashed over and over elsewhere with no attempt at a solution – bias, racism, malicious use, etc. – for purposes that who asked for?

 


Climate endgame?

https://doi.org/10.1073/pnas.2108146119

Prudent risk management requires consideration of bad-to-worst-case scenarios. Yet, for climate change, such potential futures are poorly understood. Could anthropogenic climate change result in worldwide societal collapse or even eventual human extinction? At present, this is a dangerously underexplored topic.

s.C.J.

https://doi.org/10.1038/s41558-022-01426-1

 

 
