And today? It seems that irreproducible research is set to reach new heights. Elizabeth Gibney discusses an arXiv paper by Sayash Kapoor and Arvind Narayanan, basically saying that
reviewers do not have the time to scrutinize these models, so academia currently lacks mechanisms to root out irreproducible papers, he says. Kapoor and his co-author Arvind Narayanan created guidelines for scientists to avoid such pitfalls, including an explicit checklist to submit with each paper … The failures are not the fault of any individual researcher, he adds. Instead, a combination of hype around AI and inadequate checks and balances is to blame.
Algorithms getting stuck on shortcuts that don’t always hold have been discussed here earlier. Data leakage (good old confounding) due to proxy variables also seems to be a common issue.
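A minimal sketch of what such leakage looks like (the clinical variables and probabilities here are invented for illustration): a feature recorded downstream of the outcome, such as medication given after diagnosis, acts as a proxy for the label and makes a trivial “model” look nearly perfect, while the legitimate predictor is much weaker.

```python
import random

random.seed(2)

def make_patients(n):
    """Hypothetical data: 'risk_score' is a genuine but noisy predictor;
    'on_medication' is recorded *after* diagnosis, so it leaks the label."""
    rows = []
    for _ in range(n):
        disease = random.random() < 0.3
        risk_score = random.gauss(1.0 if disease else 0.0, 1.0)
        on_medication = disease and random.random() < 0.95  # downstream of the label
        rows.append((risk_score, on_medication, disease))
    return rows

data = make_patients(4000)

# A "model" that just reads the proxy looks nearly perfect ...
proxy_acc = sum((med == dis) for _, med, dis in data) / len(data)
# ... while the legitimate risk score is far weaker.
risk_acc = sum(((r > 0.5) == dis) for r, _, dis in data) / len(data)
print(round(proxy_acc, 2), round(risk_acc, 2))
```

The inflated proxy accuracy would of course evaporate the moment the model is deployed on patients who have not yet been diagnosed.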
In the deep learning community, it is common to retrospectively blame Minsky and Papert for the onset of the first ‘AI Winter,’ which made neural networks fall out of fashion for over a decade. A typical narrative mentions the ‘XOR Affair,’ a proof that perceptrons were unable to learn even very simple logical functions, as evidence of their poor expressive power. Some sources even add a pinch of drama, recalling that Rosenblatt and Minsky went to the same school, and even alleging that Rosenblatt’s premature death in a boating accident in 1971 was a suicide in the aftermath of the criticism of his work by colleagues.
“A waste of time” – that has been said about other fields, but it also applies to allergy research, as reading today’s review request from “Allergy” showed. I have to keep the content confidential, but not this comment by AI expert Jeremy Howard:
It’s a problem in science in general. Scientists need to be published which means they need to work on things that their peers are extremely familiar with and can recognize an advance in that area. So, that means that they all need to work on the same thing. The thing they work on… there’s nothing to encourage them to work on things that are practically useful so you get just a whole lot of research which is minor advances and stuff that’s been very highly studied and has no significant practical impact.
Not sure if it is really the biggest, but it is certainly one of the most pressing problems: out-of-distribution generalization. It has been explained as follows:
Imagine, for example, an AI that’s trained to identify cows in images. Ideally, we’d want it to learn to detect cows based on their shape and colour. But what if the cow pictures we put in the training dataset always show cows standing on grass? In that case, we have a spurious correlation between grass and cows, and if we’re not careful, our AI might learn to become a grass detector rather than a cow detector.
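The cow-on-grass trap can be reproduced in a few lines. The sketch below (feature names and probabilities are my own, for illustration) trains a deliberately simple “shortcut learner” that picks whichever single feature best predicts the training labels. Because grass co-occurs with cows 99% of the time in training, it becomes a grass detector – and collapses to chance on a test set where the correlation is broken.

```python
import random

random.seed(0)

def make_data(n, p_grass_given_cow, p_grass_given_other):
    """Each sample: (features, is_cow). 'shape' is a 90%-reliable cue for cows;
    'grass' is the background, spuriously correlated with the label."""
    data = []
    for _ in range(n):
        is_cow = random.random() < 0.5
        shape = is_cow if random.random() < 0.9 else not is_cow
        p = p_grass_given_cow if is_cow else p_grass_given_other
        grass = random.random() < p
        data.append(({"shape": shape, "grass": grass}, is_cow))
    return data

def accuracy(feat, data):
    return sum((x[feat] == y) for x, y in data) / len(data)

def train_shortcut_learner(data):
    """Pick the single feature that best predicts the training labels."""
    return max(("shape", "grass"), key=lambda f: accuracy(f, data))

# Training set: cows almost always on grass -> grass is a near-perfect shortcut.
train = make_data(5000, p_grass_given_cow=0.99, p_grass_given_other=0.01)
# Out-of-distribution test set: cows in stables, other animals on grass.
test = make_data(5000, p_grass_given_cow=0.5, p_grass_given_other=0.5)

feat = train_shortcut_learner(train)
print(feat, accuracy(feat, train), accuracy(feat, test))
```

The honest ‘shape’ feature would have scored about 90% on both sets, but the learner never looks at it because the shortcut scores better in training.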
As an epidemiologist, I would simply have called it collider bias or confounding – every new field rediscovers the same problems over and over again.
It is not unexpected that an AI just running randomly over pixels will pick up spurious associations. Once shape and colour of cows have been detected, the surrounding environment, like grass or a stable, is irrelevant. That means that after getting initial results we have to step back and simulate different lighting conditions, from sunlight to lightbulb, and environmental conditions, from grass to slatted floor (invariance principle). Shape and size also matter – cow spots will keep their size and form to some extent, irrespective of whether it is a real animal or a children’s toy (scaling principle). I am a bit more sceptical about also including multimodal data (e.g. a smacking sound), as the absence of these features is no proof of non-existence, while such sounds can also be imitated by other animals.
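The lighting part of the invariance idea can be illustrated with a toy check (the tiny “image” and the normalization step are my own illustration, not a prescription): rescaling pixel values simulates the shift from sunlight to lightbulb, and a representation that divides each pixel by the image mean stays approximately unchanged under that shift, while raw brightness does not.

```python
import random

random.seed(1)

def adjust_brightness(image, factor):
    """Scale all pixel values, clipped to [0, 255] - simulates a lighting change."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]

def mean_brightness(image):
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def normalized(image):
    """A brightness-invariant representation: each pixel relative to the mean."""
    m = mean_brightness(image)
    return [[p / m for p in row] for row in image]

# A random 4x4 grayscale patch, then the same patch under dimmer light.
image = [[random.randint(50, 200) for _ in range(4)] for _ in range(4)]
dim = adjust_brightness(image, 0.5)

# The raw brightness feature changes under the lighting shift ...
print(mean_brightness(image), mean_brightness(dim))
# ... while the normalized representation is (up to rounding) invariant.
```

A model trained on such invariant representations, or on data augmented with many simulated lighting conditions, has less opportunity to latch onto the illumination shortcut.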