Category Archives: Software

The currency of the New Economy won’t be money, but attention

Time to revisit the groundbreaking 1997 @mgoldh paper in Wired “Attention Shoppers! The currency of the New Economy won’t be money, but attention”

As is now obvious, the economies of the industrialized nations – and especially that of the US – have shifted dramatically. We’ve turned a corner toward an economy where an increasing number of workers are no longer involved directly in the production, transportation, and distribution of material goods, but instead earn their living managing or dealing with information in some form. Most call this an “information economy.”

WordPress displaying Jupyter notebook

I could not find any plugin, but here is a step-by-step guide found in a blog comment:

1. Download your notebook as an .ipynb file
2. Upload it to https://jsvine.github.io/nbpreview/
3. Open your browser dev tools and copy the element with id="notebook-holder" from the DOM
4. Paste it into your WordPress post in the text/HTML editor
5. Grab the CSS from https://jsvine.github.io/nbpreview/css/vendor/notebook.css

Maybe a true notebook in an iframe is a better solution?
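
If you go the iframe route, something like the sketch below could generate the embed code. The notebook URL is just a placeholder; you would first have to export the notebook to a static HTML page (e.g. with nbconvert) and host it somewhere.

library(htmltools)
frame <- tags$iframe(
  src    = "https://example.org/notebooks/analysis.html",  # placeholder for your hosted notebook export
  width  = "100%",
  height = "800",
  style  = "border:none;"
)
cat(as.character(frame))  # paste the printed tag into a custom HTML block in WordPress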

So many asthma papers under fire

As an avid PubPeer reader, I recently found a new entry by Elisabeth Bik about Andreas Pahl of Heidelberg Pharma, who already has one retracted paper and several more under scrutiny.

Unfortunately there are now also many asthma trash papers from paper mills. Another example was identified by @gcabanac, distributed by @deevybee and posted at PubPeer.

In total there are 386 asthma entries at PubPeer. What is really happening in this field? When I started in the field there was just one misconduct case – Ranjit Kumar Chandra. That’s an increase from 1 to 386…

What makes it even more complicated is that there is no border to predatory journals anymore, as even respected scientists drop their names at predatory journals. Only recently I received an email addressed to one of my former technical assistants as “professor”, inviting her to send a paper…


Yes, you can search Twitter and block unwanted accounts automatically


Advanced search is also possible using the link https://twitter.com/search-advanced. Meanwhile I run a remote backup with an archive page and keep local copies using GoFullPage. Likers Blocker is also recommended for blocking unwanted accounts in bulk.

The amyloid Western blot: Schrag vs Lesné

The amyloid analysis published in Nature has been commented on at PubPeer and also earned a commentary by Charles Piller in Science. “Blots on a field” has now even led to an expression of concern by Nature.

The editors of Nature have been alerted to concerns regarding some of the figures in this paper. Nature is investigating these concerns, and a further editorial response will follow as soon as possible.

IMHO there are many artifacts, including horizontal lines in Fig 2, when converting it to a false color display. I cannot attribute the lines to any splice mark, and sorry – this is a 16-year-old gel image; basically an eternity has passed, spanning five generations of my Nikon history from the D2x to the Z9. So don’t expect any final conclusion here as long as we cannot get the original images.

False color display of Fig 2a: “the presence of 8 M urea did not alter the electrophoretic pattern of Aβ oligomers in extracellular-enriched extracts from 12- to 20-month-old brains of Tg2576+/− mice that were probed with 6E10 antibodies.”
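
For what it is worth, this is roughly how such a false color rendering can be produced in R (a sketch only; the file name is a placeholder for a local copy of the published figure):

library(png)
img  <- readPNG("fig2a_blot.png")                        # hypothetical local copy of Fig 2a
gray <- if (length(dim(img)) == 3) img[, , 1] else img   # keep a single grayscale channel
image(t(gray[nrow(gray):1, ]),                           # flip so the blot is displayed upright
      col = hcl.colors(256, "Viridis"),                  # false color palette
      axes = FALSE, useRaster = TRUE)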


Initial data analysis

Here is my favorite list of R packages for initial data analysis

library(DataExplorer)
plot_str(iris)          # structure of the data set
plot_bar(iris)          # bar charts of all discrete variables
plot_density(iris)      # density plots of all continuous variables
plot_correlation(iris)  # correlation heatmap
plot_prcomp(iris)       # principal component analysis

see also the predictive power score package ppsr

devtools::install_github('paulvanderlaken/ppsr')  # install from GitHub
library(ppsr)
score(iris, x = 'Sepal.Length', y = 'Petal.Length', algorithm = 'glm')[['pps']]
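
If I remember the ppsr README correctly, it can also score all variable pairs at once and plot them; a sketch to be checked against the current documentation:

score_matrix(df = iris)     # PPS for every x-y pair of columns
visualize_pps(df = iris)    # heatmap of all pairwise scores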

while for checking the robustness of models I use

devtools::install_github("chiragjp/quantvoe")  # vibration-of-effects analysis
library(quantvoe)

Python versions come here or here.

Overfitting and model degradation

My beginner experience here isn’t exhilarating – maybe others are suffering from poor models as well but never report it?

During the training phase the model tries to learn the patterns in the data, based on algorithms that deduce the probability of an event from the presence or absence of certain features. What if the model is learning from noisy, useless or wrong information? Test data may be too small or not representative, and models too complex. As shown in the article linked above, beyond a certain cut point increasing the depth of the classifier tree raises only the training accuracy but not the test accuracy – overfitting! So it takes a lot of experience to avoid both under- and overfitting.
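
As a toy illustration of that cut point (my own sketch with rpart, not the example from the linked article): training accuracy keeps climbing with tree depth while test accuracy levels off or even drops.

library(rpart)
set.seed(42)
idx   <- sample(nrow(iris), 100)                    # 100 training rows, 50 test rows
train <- iris[idx, ]
test  <- iris[-idx, ]
acc   <- function(fit, d) mean(predict(fit, d, type = "class") == d$Species)
for (depth in 1:6) {
  fit <- rpart(Species ~ ., data = train,
               control = rpart.control(maxdepth = depth, minsplit = 2, cp = 0))
  cat(sprintf("depth %d: train %.2f  test %.2f\n", depth, acc(fit, train), acc(fit, test)))
}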

What is model degradation or concept drift? It means that the statistical properties of the predicted variable change over time in an unforeseen way. As the real world changes – politically, through climate, or whatever else – this also influences the data used for prediction, making it less accurate. The computer model is static, representing the time point when the algorithm was developed, while empirical data are dynamic. Model fit therefore needs to be reviewed at regular intervals, and again this takes a lot of experience.
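
A minimal simulation of such drift (entirely made-up numbers): a model fitted at one time point is scored on later batches in which the true effect slowly changes.

set.seed(1)
beta_t  <- seq(1, -1, length.out = 10)           # the true effect drifts over time
batches <- lapply(beta_t, function(b) {
  x <- rnorm(200)
  data.frame(x = x, y = b * x + rnorm(200))
})
fit  <- lm(y ~ x, data = batches[[1]])           # static model built on the first batch
rmse <- sapply(batches, function(d) sqrt(mean((predict(fit, d) - d$y)^2)))
round(rmse, 2)                                   # prediction error grows as the world drifts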

Death by AI

spiegel.de reports a fatal accident involving a self-driving car.

Veered into the oncoming lane in a curve
One dead and nine seriously injured in an accident with a test vehicle
Four rescue helicopters and 80 firefighters were deployed: in an accident on the B28 in the district of Reutlingen a young man died, and several people were taken to hospital with serious injuries.

Is there any registry of this kind of accident?

https://twitter.com/ISusmelj/status/1558912252119482368

And to fuel more discussion on responsibility:

The first serious accident involving a self-driving car in Australia occurred in March this year. A pedestrian suffered life-threatening injuries when hit by a Tesla Model 3, which the driver claims was in “autopilot” mode.
In the US, the highway safety regulator is investigating a series of accidents where Teslas on autopilot crashed into first-responder vehicles with flashing lights during traffic stops.

Big Data Paradox: quality beats quantity

https://www.nature.com/articles/s41586-021-04198-4 (via @emollick)

Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox … We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10

It basically confirms my earlier observation in asthma genetics

this result was possible with just 415 individuals instead of 500,000 individuals nowadays
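
A toy version of that point (my own made-up numbers, not the paper’s model): even a modest self-selection bias leaves a quarter-million-respondent “survey” roughly as far from the truth as ten randomly sampled people.

set.seed(7)
N <- 1e6
y <- rbinom(N, 1, 0.5)                       # true prevalence is 50%
p_respond <- ifelse(y == 1, 0.30, 0.15)      # responding depends on the outcome
respond   <- rbinom(N, 1, p_respond) == 1    # roughly 225,000 self-selected respondents
srs       <- sample(N, 10)                   # simple random sample of size 10
c(truth      = mean(y),
  big_survey = mean(y[respond]),             # large n, but biased
  srs_of_10  = mean(y[srs]))                 # tiny n, but unbiased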

Not too bad: Citation Gecko

Just tried citationgecko.com on a topic that I have been working on for two decades. It finds the source paper rather quickly, much faster than reading through everything. Unfortunately, reviews are rated as more influential than original data, since Citation Gecko favors articles with many references.

It is only Monday but already depressing

Comment on the PaLM paper by u/Flaky_Suit_8665 via @hardmaru

67 authors, 83 pages, 540B parameters in a model, the internals of which no one can say they comprehend with a straight face, 6144 TPUs in a commercial lab that no one has access to, on a rig that no one can afford, trained on a volume of data that a human couldn’t process in a lifetime, 1 page on ethics with the same ideas that have been rehashed over and over elsewhere with no attempt at a solution – bias, racism, malicious use, etc. – for purposes that who asked for?

(replication crisis)^2

We always laughed at the papers in the “Journal of Irreproducible Results”

https://www.thriftbooks.com/w/the-best-of-the-journal-of-irreproducible-results/473440/item/276126/?gclid=EAIaIQobChMI3NnCm72l-QIVpHNvBB1nIwSWEAQYAiABEgK6__D_BwE#idiq=276126&edition=1874246


then we had the replication crisis and nobody laughed anymore.


And today? It seems that irreproducible research is set to reach new heights. Elizabeth Gibney discusses an arXiv paper by Sayash Kapoor and Arvind Narayanan basically saying that

reviewers do not have the time to scrutinize these models, so academia currently lacks mechanisms to root out irreproducible papers, he says. Kapoor and his co-author Arvind Narayanan created guidelines for scientists to avoid such pitfalls, including an explicit checklist to submit with each paper … The failures are not the fault of any individual researcher, he adds. Instead, a combination of hype around AI and inadequate checks and balances is to blame.

Algorithms getting stuck on shortcuts that don’t always hold has been discussed here earlier. Data leakage (good old confounding) due to proxy variables also seems to be a common issue.
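
A classic toy example of such leakage (my own sketch): selecting “informative” features on the full data before splitting makes pure noise look like a working classifier.

set.seed(123)
n <- 100; p <- 1000
x <- matrix(rnorm(n * p), n, p)                  # pure noise predictors
y <- factor(rep(c("a", "b"), each = n / 2))      # labels unrelated to x
scores <- apply(x, 2, function(col) abs(t.test(col ~ y)$statistic))  # leaky step: uses ALL rows
top    <- order(scores, decreasing = TRUE)[1:5]  # "best" 5 features, chosen with test rows included
train  <- sample(n, 70)
dat    <- data.frame(y = y, x[, top])
fit    <- glm(y ~ ., data = dat[train, ], family = binomial)
pred   <- predict(fit, newdata = dat[-train, ], type = "response")
mean((pred > 0.5) == (dat$y[-train] == "b"))     # typically well above 50% despite pure noise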