Science Surf

The common theme seems to be low certainty about facts: a historical event that a human remembers wrongly, or a Large Language Model that wrongly extrapolates from otherwise secure knowledge. But is there more to it?

Yann LeCun is quoted in IEEE Spectrum:

“Large language models have no idea of the underlying reality that language describes,” he said, adding that most human knowledge is nonlinguistic. “Those systems generate text that sounds fine, grammatically, semantically, but they don’t really have some sort of objective other than just satisfying statistical consistency with the prompt.”
Humans operate on a lot of knowledge that is never written down, such as customs, beliefs, or practices within a community that are acquired through observation or experience. And a skilled craftsperson may have tacit knowledge of their craft that is never written down.

I think “hallucination” is far too anthropomorphic a concept – some LLM output is basically statistical nonsense (although I wouldn’t go as far as Michael Townsen Hicks…). The reasons for these kinds of errors are manifold: reference divergence may already be present in the data used for learning, for example data created by bots, conspiracy followers or even fraudulent science. The error may also originate from encoding or decoding routines.

I couldn’t find any further analogy with faulty human memory recall, except the possibility that human memory is also influenced by probability. Otgar 2022 cites Calado 2020:

The issue of whether repeated events can be implanted in memory has recently been addressed by Calado and colleagues (2020). In their experiment, they falsely told adult participants that they lost their cuddling toy several times while control participants were told that they only lost it once. Strikingly, they found that repeated false events were as easily inserted in memory as suggesting that the event happened once. So, this study not only showed that repeated events can be implanted, it raised doubts about the idea that repeated events might be harder to implant than single events

CC-BY-NC Science Surf accessed 10.03.2026

I have argued earlier that an individual’s free decision to allow genetic testing will also reveal data on genetic relatives who have never consented to that procedure.
A new review by Bruce Weir now confirms that “it is reasonably straightforward to find the probability of the genotypes of individuals when their relationship is known…” My current work also lets me assume that with data on 500,000 SNPs at hand, many individual characteristics of the donor can be reconstructed – there are no anonymous DNA datasets, as some people still believe.
I even fear that genetic testing will increase, for example in “homeless” (in vitro fertilized) individuals, as these people will want to prevent sibling marriage – see for example the a-China DNA project. Other people may be curious about their genealogy, others about drug side effect prediction, lifestyle, insurance questions…
With every new dataset, the anonymity of the datasets already available will gradually decrease. I fear that anonymity is not so much a dichotomous property; it is much more a likelihood ratio, the probability of staying unknown relative to the probability of being known. Yea, yea.
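
To make the quoted point about relatives concrete, here is a minimal sketch of the standard textbook calculation for a single SNP. It is not taken from Weir’s review. It assumes a biallelic marker in Hardy-Weinberg equilibrium, no inbreeding and no genotyping error; the function name, the relationship labels and the allele frequency used below are my own illustrative choices.

```python
# Probability that two relatives share 0, 1 or 2 alleles identical by
# descent (IBD) at a locus. These are the usual values for non-inbred pairs.
IBD = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent_child": (0.0, 1.0, 0.0),
    "full_siblings": (0.25, 0.5, 0.25),
    "half_siblings": (0.5, 0.5, 0.0),
}


def relative_genotype_probs(own_genotype, p, relationship):
    """Return [P(0), P(1), P(2)] for the relative's genotype.

    Genotypes are coded as the number of copies of allele A (0, 1 or 2).
    p is the population frequency of allele A (assumed known).
    """
    q = 1.0 - p
    k0, k1, k2 = IBD[relationship]

    # No allele shared IBD: the relative looks like a random person.
    population = [q * q, 2 * p * q, p * p]

    # One allele shared IBD: one allele is copied from the tested person,
    # the other one is drawn at random from the population.
    one_shared = {
        0: [q, p, 0.0],
        1: [q / 2, 0.5, p / 2],
        2: [0.0, q, p],
    }[own_genotype]

    # Two alleles shared IBD: the relative has the same genotype.
    identical = [1.0 if g == own_genotype else 0.0 for g in range(3)]

    return [
        k0 * population[g] + k1 * one_shared[g] + k2 * identical[g]
        for g in range(3)
    ]


if __name__ == "__main__":
    # Tested person carries two copies of a rare allele (hypothetical p = 0.1).
    print(relative_genotype_probs(2, 0.1, "full_siblings"))
    print(relative_genotype_probs(2, 0.1, "unrelated"))
```

Under these assumptions, a full sibling of someone who carries two copies of an allele with frequency 0.1 has a probability of about 0.30 of carrying two copies as well, compared with 0.01 for an unrelated person. One SNP says little on its own, but this is the kind of shift that accumulates over many markers.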

Addendum

Time online of Dec 17, 2006 reports that the British police are holding the DNA records of more than 1m innocent people, eight times more than ministers have previously admitted. I wonder if this will affect the participation rate of the UK Biobank, which studies how lifestyle, environment and genes affect health in 500,000 people.


Tag Archives: probability

Similarity between false memory (of humans) and hallucination (of LLMs)

Once again genetic testing
