How does AI recognize AI text

The Semrush blog has a nice summary

By analyzing two main characteristics of the text: perplexity and burstiness. In other words, how predictable or unpredictable it sounds to the reader, as well as how varied or uniform the sentences are.

Perplexity is

a statistical measure of how confidently a language model predicts a text sample. In other words, it quantifies how “surprised” the model is when it sees new data. The lower the perplexity, the better the model predicts the text.
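A minimal sketch of the formula behind this definition: perplexity is the exponential of the average negative log-probability that a model assigns to each token. The per-token probabilities below are made-up numbers for illustration. In practice they would come from an actual language model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(-(1/N) * sum(log p_i)) over N token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities from some language model
predictable = [0.9, 0.8, 0.95, 0.85]  # the model is rarely "surprised"
surprising = [0.2, 0.05, 0.3, 0.1]    # the model is often "surprised"

print(round(perplexity(predictable), 2))
print(round(perplexity(surprising), 2))
```

As a sanity check, a model that spreads its probability evenly over k choices at every step has a perplexity of exactly k.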

Burstiness is

the intermittent increases and decreases in activity or frequency of an event. One measure of burstiness is the Fano factor, a ratio between the variance and mean of counts. In natural language processing, burstiness has a slightly more specific definition… A word is more likely to occur again in a document if it has already appeared in the document. Importantly, the burstiness of a word and its semantic content are positively correlated; words that are more informative are also more bursty.
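The Fano factor from this definition can be computed directly. Split a document into equal-sized chunks, count one word in each chunk, and divide the variance of the counts by their mean. A Poisson (evenly random) process gives a value near 1, and values well above 1 indicate bursty clustering. The chunking scheme and the plain whitespace tokenization (no punctuation handling) are assumptions chosen for this sketch.

```python
from statistics import mean, pvariance

def fano_factor(counts):
    """Variance-to-mean ratio of event counts (population variance)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("word never occurs; Fano factor undefined")
    return pvariance(counts) / m

def word_counts_per_chunk(text, word, chunk_size):
    """Count `word` in consecutive chunks of `chunk_size` whitespace tokens."""
    tokens = text.lower().split()
    return [
        tokens[i:i + chunk_size].count(word)
        for i in range(0, len(tokens), chunk_size)
    ]

# Same total count (4) spread differently over 4 chunks
bursty = [0, 0, 4, 0]
even = [1, 1, 1, 1]
print(fano_factor(bursty))  # variance 3 / mean 1
print(fano_factor(even))    # variance 0 / mean 1
```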

Or should we call it entropy? So we now have some criteria:

    • AI texts are more uniform, more predictable and often repetitive
    • They lack depth and personality
    • Plagiarism checkers may sometimes recognize “learned” AI phrases, and reference checkers may find “hallucinated” references
    • Incorrect content and outdated information, in contrast, still need human experts to detect
    • An obvious yet underappreciated downside: “AI texts have nothing to say” – “clichéd nothingness”.
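The first criterion, uniformity, can be turned into a crude number: the coefficient of variation of sentence lengths, i.e. their standard deviation divided by their mean. This is a toy heuristic with naive assumptions (sentences split at ., ! or ?, words split on whitespace). It is not how any particular commercial detector works, and a low value alone proves nothing about authorship.

```python
import re
from statistics import mean, pstdev

def sentence_lengths(text):
    """Word counts of sentences, splitting naively at '.', '!' or '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def length_variation(text):
    """Coefficient of variation of sentence lengths (0 = perfectly uniform)."""
    lengths = sentence_lengths(text)
    if not lengths:
        raise ValueError("no sentences found")
    return pstdev(lengths) / mean(lengths)

uniform = "The model is good. The data is clean. The test is done."
varied = "Stop. The reviewers, after three rounds and a rebuttal, finally agreed. Fine."
print(round(length_variation(uniform), 2))
print(round(length_variation(varied), 2))
```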

Meanwhile, AI preamble sentences are also turning up in the scientific literature, for example “Certainly! Here is…”
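Leftover chatbot preambles like this can be screened for with a plain pattern search. The phrase list below is a small hypothetical sample chosen for illustration. A real screen would need a curated list, and a human would still have to check every hit.

```python
import re

# Hypothetical sample of chatbot preamble phrases (illustrative, not exhaustive)
PREAMBLE_PATTERNS = [
    r"\bcertainly!\s+here\s+is\b",
    r"\bas an ai language model\b",
    r"\bregenerate response\b",
]
PREAMBLE_RE = re.compile("|".join(PREAMBLE_PATTERNS), re.IGNORECASE)

def find_preambles(text):
    """Return (position, matched text) for each suspicious phrase."""
    return [(m.start(), m.group(0)) for m in PREAMBLE_RE.finditer(text)]

abstract = "Certainly! Here is a possible introduction for your topic: ..."
print(find_preambles(abstract))
```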

Conspiracy theory – a worked example

The facts

The conspiracy

How to recognize conspiracy theories

1. What is it about?
The belief that certain events or situations are secretly manipulated behind the scenes by powerful forces with negative intentions.

2. Conspiracy theories have these 6 things in common
An alleged secret plot.
A group of conspirators.
“Evidence” that seems to support the conspiracy theory.
They falsely suggest that nothing happens by accident and that there are no coincidences; nothing is as it seems and everything is connected.
They divide the world into good and evil.
They scapegoat people and groups.

3. Why do they thrive?
They often appear to offer a logical explanation for events or situations that are hard to understand, and they convey a false sense of control and agency. This need for clarity grows even stronger in times of uncertainty.

4. Where do they come from?
Conspiracy theories often start with a suspicion. They ask who benefits from the event or situation and thereby identify the conspirators. Any “evidence” is then made to fit the theory.

Once conspiracy theories have gained a foothold, they can spread quickly. They are hard to refute because anyone who tries is seen as part of the conspiracy.

5. People spread conspiracy theories for different reasons:
Most believe they are true. Others deliberately want to provoke, manipulate or target people for political or financial reasons. Conspiracy theories can come from many sources, such as the internet, friends or relatives.

“Be less curious about people, but more curious about ideas”

(Maria Skłodowska-Curie).

When scientific conferences were more about facts than about the show: the first seven Solvay conferences

At the first Solvay Conference (1911), Curie (seated, second from right) confers with Henri Poincaré; standing nearby are Rutherford (fourth from right) and Albert Einstein (second from right). Source: www.wikipedia.org

Slow virus is good for your lab

… because it keeps your lab running.

While writing a review in immunology, I came across a nice joke from The Atlantic, published quite early in the COVID-19 pandemic:

There’s a joke about immunology, which Jessica Metcalf of Princeton recently told me. An immunologist and a cardiologist are kidnapped. The kidnappers threaten to shoot one of them, but promise to spare whoever has made the greater contribution to humanity. The cardiologist says, “Well, I’ve identified drugs that have saved the lives of millions of people.” Impressed, the kidnappers turn to the immunologist. “What have you done?” they ask. The immunologist says, “The thing is, the immune system is very complicated …” And the cardiologist says, “Just shoot me now.”

Thinking before acting, and registering before studying

As I am working on a meta-science paper on the pitfalls of pre-registration, it seemed timely to attend the Royal Society seminar “The promises and pitfalls of preregistration”.

Day 1, Mon March 4, 2024

My personal highlights from today are the introduction by Nicholas DeVito (at 6:35), the end of Fiona Fidler's talk (at 1:36:03) and the great Nick Brown (at 4:27:57).

 

Here is day 2, Tue March 5, 2024

404

404 errors reported by Nature

More than one-quarter of scholarly articles are not being properly archived and preserved, a study of more than seven million digital publications suggests. The findings, published in the Journal of Librarianship and Scholarly Communication on 24 January indicate that systems to preserve papers online have failed to keep pace with the growth of research output.
“Our entire epistemology of science and research relies on the chain of footnotes,” explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. “If you can’t verify what someone else has said at some other point, you’re just trusting to blind faith for artefacts that you can no longer read yourself.”

Fortunately, I save every PDF that I cite. Books, however, may be gone at some point, as I have to return them to the library.

Or I may be gone?

Then it is gone.

404.

Genetic drivers of heterogeneity

“Genetic drivers of heterogeneity” is the new label for the failed genetic concept of “reverse genetic engineering”, where all we now see is sand running through our fingers.

The genetic cause of type II diabetes is now exploding into fragments. A new study reports 1,289 “association signals”, while another study, published back to back in Nature, describes 1 billion new genetic variants that were not included in the 1,289-signal analysis or in any other run-of-the-mill analysis.
What is the purpose of counting grains of sand?
And why introduce a new concept of race, as others have also noted?

 

27 Feb 2024

Another reaction by Michael Eisen and Ewan Birney

The problem, critics said, is that UMAP creates blobs that look distinct while masking the inherent messiness in the data. “The fact that they are distinct is an artefact/feature of UMAP,” Ewan Birney, director of the European Bioinformatics Institute, wrote in a long thread

and by Lior Pachter, whose analysis goes well beyond the outrage:

We begin with the figure legend, which lists Race, Ethnicity and Ancestry. Race and Ethnicity refer to the self identified race choices for participants (based on the OMB categories). Ancestry refers to the genetic ancestry groups discussed above. While these three concepts are distinct, the Ancestry colors are the same as some of the Race and Ethnicity colors: This is problematic because the coloring suggests a 1-1 identification between certain races and ethnicities, and genetic ancestry groups.

This is where we have indeed now arrived at scientific racism.

A retro trio study

Nature News writes about a genetic study in Chinese families

The study also identified some new links. For example, mothers with higher levels of bile acid had shorter babies. Clifton says the analysis falls short of establishing causality but offers leads for further research.

I wonder about the title, “The Born in Guangzhou Cohort Study enables generational genetic discoveries”, which is more promotional than informative. I also wonder about the geopolitical statement: the map also includes Taiwan (with zero observations, as already seen in a previous Cell paper).

And this is certainly not the first family study in China (see the halted research of Scott Weiss just before he went into vitamin lobbying).

Nor is it new that mothers with higher bile acid levels have shorter babies. Had neither the interviewer nor the interviewee ever heard of intrahepatic cholestasis of pregnancy, which leads to multiple adverse perinatal outcomes?
Cholestasis leads to preterm birth, which in turn leads to low birth weight (LBW; odds ratio of 2) and also to shorter babies. Without preregistration and without a replication study, it is difficult to draw any conclusion, not even one about “leads for further research”. The bile acid result may be a regional artifact if it is found in only one region, much as in the farming studies.
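For readers less used to odds ratios: an OR of 2 means that the odds of the outcome, not its probability, are doubled in the exposed group. The 2×2 counts below are invented purely to show the arithmetic and are not taken from any study. They also show that an OR of 2 can correspond to a smaller risk ratio.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a/b = exposed cases/non-cases, c/d = unexposed."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """Ratio of outcome probabilities: a/(a+b) divided by c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))

# Invented counts: LBW among preterm (exposed) vs. term births, illustration only
print(odds_ratio(20, 80, 10, 80))            # (20*80)/(80*10)
print(round(risk_ratio(20, 80, 10, 80), 2))  # 20% vs. about 11% risk
```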

Nor are the numbers in this study as large as the Nature News piece wants us to believe; in my view, 332 trios is only an average study size.

 

Pro tip: next-level OCR of academic documents

Converting math documents into LaTeX involves a lot of typing, but there is now some support from FB with its Nougat tool (GitHub):

pip install nougat-ocr
nougat path/to/file.pdf -o output_directory
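To process a whole folder of PDFs, the same command can be looped from Python. This is a minimal sketch that only uses the invocation shown above (`nougat <pdf> -o <dir>`). The folder names `papers` and `latex_out` are placeholders, and `dry_run=True` returns the commands without running them, so they can be checked first.

```python
import subprocess
from pathlib import Path

def nougat_commands(pdf_dir, out_dir):
    """Build one `nougat <pdf> -o <out_dir>` command per PDF in pdf_dir."""
    pdfs = sorted(Path(pdf_dir).glob("*.pdf"))
    return [["nougat", str(pdf), "-o", str(out_dir)] for pdf in pdfs]

def run_nougat(pdf_dir, out_dir, dry_run=False):
    """Run nougat on every PDF; with dry_run=True only return the commands."""
    commands = nougat_commands(pdf_dir, out_dir)
    if dry_run:
        return commands
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failed conversion
    return commands

if __name__ == "__main__":
    # Placeholder folders; prints the commands instead of running them
    for cmd in run_nougat("papers", "latex_out", dry_run=True):
        print(" ".join(cmd))
```

Calling the CLI one file at a time keeps the sketch independent of any nougat Python API. If a conversion fails, the loop stops at that file.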

A bug bounty program in science

I called for this earlier [2019, 2022], but only now is the idea being taken up; see https://error.reviews/

Screenshot of https://error.reviews/advisory-board/, 17-Feb-2023