Tag Archives: ai

Fighting AI with AI

Here is our newest paper – a nice collaboration with Andrea Taloni et al., along with a nice commentary – on recognizing Surgisphere-like fraud

Recently, it was proved that the large language model Generative Pre-trained Transformer 4 (GPT-4; OpenAI) can fabricate synthetic medical datasets designed to support false scientific evidence. To uncover statistical patterns that may suggest fabrication in datasets produced by large language models and to improve these synthetic datasets by attempting to remove detectable marks of nonauthenticity, investigating the limits of generative artificial intelligence.

[…] synthetic datasets were produced for 3 fictional clinical studies designed to compare the outcomes of 2 alternative treatments for specific ocular diseases. Synthetic datasets were produced using the default GPT-4o model and a custom GPT. Data fabrication was conducted in November 2024. Prompts were submitted to GPT-4o to produce 12 “unrefined” datasets, which underwent forensic examination. Based on the outcomes of this analysis, the custom GPT Synthetic Data Creator was built with detailed instructions to generate 12 “refined” datasets designed to evade authenticity checks. Then, forensic analysis was repeated on these enhanced datasets.  […]

Sufficiently sophisticated custom GPTs can perform complex statistical tasks and may be abused to fabricate synthetic datasets that can pass forensic analysis as authentic.
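The excerpt does not say which forensic checks were run, so the following is only a minimal sketch of one well-known kind of check, terminal digit analysis, and not necessarily what the paper did. The function names and the 5% threshold are my own assumptions. The idea: in many authentic measurements the last digit is roughly uniform, and a strong deviation may hint at fabrication or rounding.

```python
from collections import Counter

# Critical value of the chi-square distribution for 9 degrees of freedom
# at alpha = 0.05 (assumed threshold, chosen for illustration only).
CHI2_CRIT_DF9 = 16.919


def last_digit_chi2(values):
    """Chi-square statistic testing whether last digits 0-9 are uniform.

    Hypothetical helper: expects integer-valued measurements.
    """
    digits = [abs(int(v)) % 10 for v in values]
    if not digits:
        raise ValueError("need at least one value")
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def looks_suspicious(values):
    """True if the last-digit distribution deviates from uniform at the 5% level."""
    return last_digit_chi2(values) > CHI2_CRIT_DF9
```

A flagged column is not proof of fabrication (real data are often rounded to 0 or 5), and, as the paper shows, a refined generator may well pass such a simple test.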

 

CC-BY-NC

How to recognize an AI image

Lensrentals has some great advice

Quantity Based: One of the continual problems the AI art generation faces is in quantity, though it is continually improving. For instance, in the past, AI art would struggle with getting the correct number of fingers correct, or perhaps the correct placement of knuckles and joints in the fingers.

General Softness & Low Resolution: AI art takes immense computing power to generate, and it still hasn’t streamlined this problem. So often, AI art is limited in resolution and detail.

Repetition: To further expand on the tip above, AI art often uses repetition to help speed up the generation process. So you may see something copied several times over the same image.

Asymmetry: Asymmetry exists in all facets of life,  [… if you] photograph the building so that it looks symmetrical across the plane. AI doesn’t understand these rules and often creates subtle symmetry shifts in its images.

TBC


A useless study of AI accuracy

https://www.jaci-inpractice.org/article/S2213-2198(25)00280-6/pdf

Although promised, the supplement with the questions is missing, and the methods give no information on how the authors prompted the model – so the study is basically useless, as the methods cannot be replicated.

And in the end, ChatGPT only mirrors how well or badly these experts have informed the public…


Is it a crime to use AI for peer review?

I frequently consult the almighty ChatGPT for additional information, as this saves me hours of wading through my own database, PubMed, Scholar and Google hits.

But I form my own opinion, and I never cut and paste, as this always runs the risk of (1) plagiarizing unknowingly and (2) producing nonsense.

Miryam Naddaf has an article about this

In a survey of nearly 5,000 researchers, some 19% said they had already tried using LLMs to ‘increase the speed and ease’ of their review. But the survey, by publisher Wiley, headquartered in Hoboken, New Jersey, didn’t interrogate the balance between using LLMs to touch up prose, and relying on the AI to generate the review.

And well, maybe I am already in line with the NEJM, which said

Although human expert review should continue to be the foundation of the scientific process, LLM feedback could benefit researchers


ChatGPT hallucinates

… once again, even on the simple question of why the chain on a bicycle is mounted on the right. ChatGPT 4o, at any rate, argues like this

Standardization and tradition. Bicycles were historically developed with the chain on the right because most people are right-handed and this proved more practical for operating the gears and for maintenance.

The statement that most people are right-handed is of course true. The reasoning, however, is wrong, because one could just as well shift on the left – which you have to do anyway with the wonderful frame-mounted shifters on the Italian classics. And for maintenance it makes no difference whether the chain runs on the right or on the left.

Mechanical advantages. The right side of most bicycles is optimized for the drivetrain (chainring, cassette, rear derailleur). On the left, the crank is usually free of sprockets, which allows symmetrical power transmission to the pedals. Most chain drives rotate clockwise (viewed from the right), which is compatible with the traditional design.

That is also somehow nonsense and, moreover, explains nothing at all.

Safety and maintenance aspects. The chain position on the right keeps the rider free of chain grease on the side facing away from the curb.

ChatGPT apparently rides in left-hand traffic? It always comes up with somehow relevant aspects, but without having “understood” the matter.


AI is using copyrighted material

We have known it for years: LLMs are trained on copyrighted material. But we should never forget: Aaron Swartz, a copyright activist, lost his life. And so did Suchir Balaji (his parents do not believe it was suicide). And another activist, Alexandra Elbakyan, has been prosecuted for years.

So how can LLMs of all kinds now make money off copyrighted text and images, bypassing all rules? The Guardian on OpenAI

The developer OpenAI has said it would be impossible to create tools like its groundbreaking chatbot ChatGPT without access to copyrighted material, as pressure grows on artificial intelligence firms over the content used to train their products.

The New York Times about Suchir Balaji

But after the release of ChatGPT in late 2022, he thought harder about what the company was doing. He came to the conclusion that OpenAI’s use of copyrighted data violated the law and that technologies like ChatGPT were damaging the internet. In August, he left OpenAI because he no longer wanted to contribute to technologies that he believed would bring society more harm than benefit.

Are there still copyright rules in place?

Probably. Getty Images is now suing Stability AI, the maker of Stable Diffusion, and Facebook is using LibGen although they recently had to pay 30m in penalties. Universal Music filed a lawsuit against Anthropic, and the NYT against OpenAI. At least a dozen court cases are ongoing.

But so far I haven’t heard of any action by a major medical publisher against any AI company (including the company that sued Elbakyan). They must have a different strategy – instead of suing, they just sell their content, even behind the backs of the authors. This is what Christa Dutton found out.

One of those tech companies, Microsoft, paid Informa, the parent company of Taylor & Francis, an initial fee of $10 million to make use of its content “to help improve relevance and performance of AI systems,” according to a report released in May… Another publisher, Wiley, also recently agreed to sell academic content to a tech company for training AI models. The publisher completed a “GenAI content rights project” with an undisclosed “large tech company,” according to a quarterly earnings report released at the end of June

But can publishers just do this without asking authors? authorsalliance.org has an answer.

In a lot of cases, yes, publishers can license AI training rights without asking authors first. Many publishing contracts include a full and broad grant of rights–sometimes even a full transfer of copyright to the publisher for them to exploit those rights and to license the rights to third parties.

We have been too naive.

Or we have been blackmailed.

 

14/3/2025

There was never fair use … https://arstechnica.com/tech-policy/2025/03/openai-urges-trump-either-settle-ai-copyright-debate-or-lose-ai-race-to-china/ … while I now fear that this will be decided by politics not by courts.

 

20/3/2025

https://www.theatlantic.com/technology/archive/2025/03/libgen-meta-openai/682093/ writes

Meta employees acknowledged in their internal communications that training Llama on LibGen presented a “medium-high legal risk,” and discussed a variety of “mitigations” to mask their activity.

leading to the paradoxical situation

LibGen and other such pirated libraries make information more accessible, allowing people to read original work without paying for it. Yet generative-AI companies such as Meta have gone a step further: Their goal is to absorb the work into profitable technology products that compete with the originals.


Elicit

In addition to some programming questions, I have prompted several LLMs for a summary of my longstanding research question “vitamin D supplementation in the newborn period and later allergy”. Here is the Elicit response:

Research on vitamin D supplementation in infancy and its relationship to later allergy development has yielded mixed results. Some studies suggest that higher vitamin D intake during the first year of life may increase the risk of atopic conditions later in childhood (Bäck et al., 2009; Hyppönen et al., 2004). These findings are supported by animal studies and epidemiological surveys linking early vitamin D supplementation to allergy development (Wjst, 2006). However, a randomized controlled trial found that high-dose vitamin D supplementation did not prevent allergic sensitization or diseases in infants compared to standard dosage, except for an increased risk of milk allergy in the high-dose group (Rosendahl et al., 2019). Interestingly, this study also observed that infants with high cord blood vitamin D levels had a higher risk of food allergen sensitization. These conflicting results highlight the complex relationship between vitamin D and allergy development, suggesting that both insufficient and excessive vitamin D levels may potentially influence allergic outcomes.

The limitation introduced by “however” is wrong – the whole story is not really conflicting if you read my earlier paper on a paradoxical reaction. I was a reviewer of the Rosendahl paper and asked the authors to change their misleading spin. It is basically a triple-blind study (the authors believed the opposite, i.e. that vitamin D supplementation prevents allergy while it actually induces it). Unfortunately, as Rosendahl et al. did not respond to my complaint, we can now clearly see how the bias of the authors is being injected into the LLM.

I don’t know why there is still a discussion about any genuine understanding in LLMs, as there is none; see Binz et al.

even though LLMs often achieve state-of-the-art performance on existing benchmarks, it remains debated whether this reflects genuine understanding, or whether they are merely acting like stochastic parrots […] When deciding whether to use an LLM, it is important to recognize that LLMs are simply models of word form distributions extracted from text—not models of the information that people might get from reading that text

Anyway, the workflow suggested by inkenai – putting these PDFs into NotebookLM for further analysis – is excellent.


AI lobotomizing knowledge

I tried out chatGPT 4o to create the R ggplot2 code for a professional color chart

v1
v20

ChatGPT had serious problems recognizing even the grid fields, and it was impossible to get the right colors or any order after more than a dozen attempts (I created the above chart in less than 15 minutes).

In the end, ChatGPT came up with something like a bad copy of Gerhard Richter’s “4900 Colours”…

https://www.hatjecantz.de/products/16130-gerhard-richter

Why was this task so difficult?

Although labeled as generative, AI is not generative in a linguistic sense that

… aims to explain the cognitive basis of language by formulating and testing explicit models of humans’ subconscious grammatical knowledge

I would rather call it imitating AI. ChatGPT never got the idea of a professional color chart (for optimizing the color workflow from camera to print).

It was also lacking any aesthetics. Although the Richter squares are arranged randomly, they form a luminous grid pattern with overwhelming kaleidoscopic color fields.

A less academic version – it is the biggest copyright infringement ever since Kim Dotcom.

TBC


I can produce a hoax in less than 15 seconds

Title: Quantum-Tuned Allergen Resonance Therapy (QT-ART): A Revolutionary Cure for Allergies

Recent advances in bio-resonance and quantum biophotonics have culminated in the development of Quantum-Tuned Allergen Resonance Therapy (QT-ART), a groundbreaking approach to eradicating allergic reactions. QT-ART exploits the unique vibrational frequencies of allergens and harmonizes them with the body’s intrinsic quantum field, effectively neutralizing histamine responses. This method uses bio-engineered, photonic nanoparticle emitters programmed with allergen-specific frequencies, delivered via a wearable wristband equipped with quantum-enhanced LEDs.

A randomized controlled trial involving nearly 15,000 participants demonstrated a 90% cessation of allergy symptoms in all cases, including severe anaphylaxis triggered by peanuts, shellfish, and airborne pollen. Remarkably, the therapy also reversed pre-existing autoimmune conditions in 67% of participants. The results suggest that QT-ART not only cures allergies but also recalibrates the immune system to optimal functionality.

Mechanistically, QT-ART engages with the body’s cellular microtubules, aligning them into coherent quantum states to prevent allergenic protein recognition. This pioneering approach eliminates the need for antihistamines, epinephrine, or immunotherapy injections, offering a side-effect-free alternative for allergy sufferers worldwide.

We propose QT-ART as the new gold standard for immunological health, heralding a quantum revolution in medicine. Further research will explore its applications in combating aging, enhancing athletic performance, and unlocking dormant psychic abilities.

Keywords: quantum resonance, allergy cure, biophotonics, immune recalibration, quantum medicine

(AI generated!!)


7 ways to discover AI generated text

ChatGPT explanation

1. Examine Writing Style and Patterns. AI-generated text often exhibits specific characteristics:

  • Repetition and redundancy: AI might repeat phrases or ideas, especially in longer responses.
  • Overly formal or consistent tone: The text may lack the natural variations in tone and style common in human writing.
  • Generic or vague content: AI often avoids specifics and may use filler phrases that sound good but lack substance.
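The first bullet (repetition) is the only one that is easy to quantify. Here is a crude sketch, assuming a simple word-trigram count; the function name and the choice of n=3 are my own, and this is a heuristic illustration, not a reliable AI detector.

```python
import re
from collections import Counter


def repeated_ngram_ratio(text, n=3):
    """Share of word n-grams in the text that occur more than once.

    Hypothetical helper: 0.0 means no n-gram is repeated,
    values near 1.0 mean the text keeps repeating itself.
    """
    words = re.findall(r"[a-z']+", text.lower())
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)
```

Human text with a refrain or a template would score high as well, so the number is at best a reason to read more closely.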


Too many AI powered scientific search engines

Being a regular Scholar user, I am quite lost now with the many new scientific search engines. They don’t tell us which data they used for training, how they have been trained and how the results have been validated. The field is also highly dynamic compared to the situation 2 years ago. Is it worth testing them?

https://www.undermind.ai/home/


Similarity between false memory (of humans) and hallucination (of LLMs)

The common theme seems to be low certainty about facts – a historical event that is wrongly remembered by a human, or a Large Language Model that wrongly extrapolates from otherwise secure knowledge. But is there even more?

Yann LeCun is quoted in IEEE Spectrum

“Large language models have no idea of the underlying reality that language describes,” he said, adding that most human knowledge is nonlinguistic. “Those systems generate text that sounds fine, grammatically, semantically, but they don’t really have some sort of objective other than just satisfying statistical consistency with the prompt.”
Humans operate on a lot of knowledge that is never written down, such as customs, beliefs, or practices within a community that are acquired through observation or experience. And a skilled craftsperson may have tacit knowledge of their craft that is never written down.

I think “hallucination” is far too anthropomorphic a concept – some LLM output is basically statistical nonsense (although I wouldn’t go as far as Michael Townsen Hicks…). The reasons for this kind of error are manifold: reference divergence may already be present in the data used for learning – data created by bots, conspiracy followers or even fraudulent science. The error may also originate from encoding or decoding routines.

I couldn’t find any further analogy with faulty human memory recall, except the possibility that human memory is influenced by probability as well. Otgaar 2022 cites Calado 2020

The issue of whether repeated events can be implanted in memory has recently been addressed by Calado and colleagues (2020). In their experiment, they falsely told adult participants that they lost their cuddling toy several times while control participants were told that they only lost it once. Strikingly, they found that repeated false events were as easily inserted in memory as suggesting that the event happened once. So, this study not only showed that repeated events can be implanted, it raised doubts about the idea that repeated events might be harder to implant than single events

 


More AI headlines

-1-

While we are still waiting for the Nobel prize speech of Geoffrey Hinton in December, AI makes even more negative headlines.

[Hinton] “I worry that the overall consequences of this might be systems that are more intelligent than us that might eventually take control.” He also said he uses the AI chatbot ChatGPT4 for many things now but with the knowledge that it does not always get the answer right.

 

-2-

The sheer power consumption of running AI models is frightening. Nature News asks if AI’s huge energy demands will spur a nuclear renaissance

Google announced that it will buy electricity made with reactors developed by Kairos Power, based in Alameda, California. Meanwhile, Amazon is investing approximately US$500 million in the X-Energy Reactor Company, based in Rockville, Maryland, and has agreed to buy power produced by X-energy-designed reactors due to be built in Washington State.

 

-3-

A former OpenAI employee explains on his blog how AI is using copyrighted material, i.e. stealing content.

While generative models rarely produce outputs that are substantially similar to any of their training inputs, the process of training a generative model involves making copies of copyrighted data. If these copies are unauthorized, this could potentially be considered copyright infringement, depending on whether or not the specific use of the model qualifies as “fair use”. Because fair use is determined on a case-by-case basis, no broad statement can be made about when generative AI qualifies for fair use. Instead, I’ll provide a specific analysis for ChatGPT’s use of its training data, but the same basic template will also apply for many other generative AI products.

The effects can be measured only indirectly, for example by the visitor count at Stack Overflow, where traffic has declined as many users (including me) don’t need Stack Overflow anymore.
Here is another fantastic discussion over at PP between Henry Leirvoll and 495yt on the very basic questions of copyright.

humans get inspired (parsing the external examples or experiences through their inner understanding and individual perspective) they start working to make something with their tools, skills, time and purpose. the result represents the author, their influences and their message.
a lot of this process is protected by copyright.
ai is not inspired. and it has no personal perspective or tools. no message to transmit.
any message put into prompts by an ai user is translated by it’s LLM layer into other, more complex prompts, which also get treated quasi-randomly by the weights and biases of the model, as well as rand seeds.

 

-4-

And well, ChatGPT can produce malicious code even with all precautions: Researchers Bypass AI Safeguards Using Hexadecimal Encoding and Emojis

If a user instructs the chatbot to write an exploit for a specified CVE, they are informed that the request violates usage policies. However, if the request was encoded in hexadecimal format, the guardrails were bypassed and ChatGPT not only wrote the exploit, but also attempted to execute it “against itself”, according to Figueroa.


AI hallucination

News article and paper showing

bigger AI chatbots more inclined to spew nonsense — and people don’t always realize.

and some solutions

various emerging techniques should help to create chatbots that bullshit less, or that can, at least, be prodded to disclose when they are not confident in their answers. But some hallucinatory behaviours might get worse before they get better.
