Category Archives: Software

Elicit

In addition to some programming questions, I have prompted several LLMs for a summary of my longstanding research question “vitamin D supplementation in the newborn period and later allergy”. Here is the Elicit response:

Research on vitamin D supplementation in infancy and its relationship to later allergy development has yielded mixed results. Some studies suggest that higher vitamin D intake during the first year of life may increase the risk of atopic conditions later in childhood (Bäck et al., 2009; Hyppönen et al., 2004). These findings are supported by animal studies and epidemiological surveys linking early vitamin D supplementation to allergy development (Wjst, 2006). However, a randomized controlled trial found that high-dose vitamin D supplementation did not prevent allergic sensitization or diseases in infants compared to standard dosage, except for an increased risk of milk allergy in the high-dose group (Rosendahl et al., 2019). Interestingly, this study also observed that infants with high cord blood vitamin D levels had a higher risk of food allergen sensitization. These conflicting results highlight the complex relationship between vitamin D and allergy development, suggesting that both insufficient and excessive vitamin D levels may potentially influence allergic outcomes.

The qualification introduced by “however” is wrong – the story is not really conflicting if you read my earlier paper on a paradoxical reaction. I was a reviewer of the Rosendahl paper and asked the authors to change their misleading spin. It is basically a triple-blind study (the authors believed the opposite, i.e. that vitamin D supplementation prevents allergy, while it in fact induces it). Unfortunately, as Rosendahl et al. did not respond to my complaint, we can now clearly see how the authors’ bias is being injected into the LLM.

I don’t know why there is still any discussion of genuine understanding in LLMs, as there is none; see Binz et al.:

even though LLMs often achieve state-of-the-art performance on existing benchmarks, it remains debated whether this reflects genuine understanding, or whether they are merely acting like stochastic parrots […] When deciding whether to use an LLM, it is important to recognize that LLMs are simply models of word form distributions extracted from text—not models of the information that people might get from reading that text

Anyway, the workflow suggested by inkenai – putting these PDFs into NotebookLM for further analysis – is excellent.

 

CC-BY-NC Science Surf , accessed 06.04.2026

The electronic patient record (ePA)

“The narrative of the secure electronic patient record can no longer be upheld,” according to the CCC 2024.

or heise.de:

After security researchers at the 38th Chaos Communication Congress found serious flaws in the electronic patient record (ePA) for statutory health insurance patients, the head of the German Medical Association (Bundesärztekammer), Klaus Reinhardt, is calling for rapid fixes. In its current state, he cannot recommend ePA 3.0. Nevertheless, this is not a call to opt out. The association of paediatricians (BVKJ), by contrast, advises parents to file an objection on behalf of their children. This is reported by the Ärzteblatt and the Ärztezeitung.

and again heise.de:

Physicians are bound by confidentiality and belong to the professions entrusted with professional secrets. That medical records and notes about patients cannot simply be seized is regulated in the Code of Criminal Procedure (StPO), § 97 seizure prohibitions. The prerequisite is that the objects to be seized are “in the custody of those entitled to refuse testimony”. Since the electronic health card is not in the custody of the physician but in the custody of the patient …

the state can presumably access it as well.

 


AI lobotomizing knowledge

I tried out chatGPT 4o to create the R ggplot2 code for a professional color chart

v1
v20

ChatGPT had serious problems recognizing even the grid fields, while it was impossible to get the right colors or any order after more than a dozen attempts (I created the above chart myself in less than 15 minutes).
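For comparison, producing an ordered color grid is a few deterministic lines of code. A minimal Python sketch (the layout and HSV ramp are my own illustrative choices, not the chart shown above):

```python
import colorsys

def color_grid(rows=5, cols=10):
    """Build an ordered rows x cols grid of RGB tuples:
    hue varies along the columns, brightness along the rows."""
    grid = []
    for r in range(rows):
        value = 1.0 - r / (rows + 1)  # darker towards the bottom row
        row = [colorsys.hsv_to_rgb(c / cols, 1.0, value) for c in range(cols)]
        grid.append(row)
    return grid

grid = color_grid()
```

Feeding such a grid into ggplot2 or matplotlib is then a one-liner per tile, which is exactly the kind of systematic ordering ChatGPT failed to reproduce.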

In the end, ChatGPT arrived at something like a bad copy of Gerhard Richter’s “4900 Colours”…

https://www.hatjecantz.de/products/16130-gerhard-richter

Why was this task so difficult?

Although labeled as generative, AI is not generative in a linguistic sense that

… aims to explain the cognitive basis of language by formulating and testing explicit models of humans’ subconscious grammatical knowledge

I would rather call it imitating AI. ChatGPT never got the idea of a professional color chart for optimizing the color workflow from camera to print.

It was also lacking any aesthetics. Although the Richter squares are arranged randomly, they form a luminous grid pattern with overwhelming kaleidoscopic color fields.

A less academic verdict: it is the biggest copyright infringement since Kim Dotcom.

TBC

 


Update on Mendelian Randomization

As written before, I never published any study that included a Mendelian randomization. The reasons are well known.

A new paper from Bristol discusses the recent explosion of low-quality two-sample Mendelian randomization studies and offers a cure:

We advise editors to simply reject papers that only report 2SMR findings, with no additional supporting evidence. For reviewers receiving such papers, we provide a template for rejection.

 


Audio Video Latency Test

Here is a new video test file built with R.

I changed several things from the last version, basically switching to a new layout and going down from 100 fps to 60 fps, as YouTube can handle this much better.

 

 

Just in case, somebody wants to modify it, here is the script.

library(stringr)   # str_pad
library(circlize)  # draw.sector
library(av)        # av_encode_video

vid <- function(nr){
	nr3 = -600 + (nr*10) # latency offset shown in the title, in ms
	for (ii in 1:60 ){
		fn = paste0(str_pad(nr*60+ii, 5, pad = "0"), ".png")
		png(file = fn, width = 1600, height = 900, units = 'px')
		par(mar=c(0,0,0,0), bg="black")
		plot(c(0, 1), c(0, 1), ann = F, bty = 'n', type = 'n', xaxt = 'n', yaxt = 'n', asp=1)
		# red progress bar at the top
		color="red"
		rect(xleft=0.5, xright=(nr3+500)/1000, ybottom=0.94, ytop=0.99, col= color)
		# grey dot marking the frame where the beep is expected
		color="lightgrey"
		if( (nr<=58 & ii==30+nr/2) | (nr>=60 & ii==-30+nr/2) ) {
			circlize::draw.sector(0, 360, center = c(0.02, 0.01), rou1 = 0.01, col = color, border = color)
		}
		# sweeping clock sector, 6 degrees per frame
		circlize::draw.sector(90, 90-ii*6, center = c(0.5, 0.52), rou1 = 0.4, col = color, border = color)
		# full white flash at the start and end of each second
		if (ii<3 | ii>57) {
			color="white"
			circlize::draw.sector(0, 360, center = c(0.5, 0.52), rou1 = 0.4, col = color, border = color)
		}
		tx = paste0(nr3, ' ms')
		text(x = 0.5, y = 0.85, tx, cex = 6, col = "white", family="Lato", adj=0.5)
		tx = paste0(nr/2, ':', str_pad(round(100*ii/60), 2, pad = "0"))
		text(x = 0.5, y = 0.5, tx, cex = 12, col = "white", family="Lato", adj=0.5)
		tx = "play until beep & flash in sync OR take image of source and processed video"
		text(x = 0.5, y = 0.075, tx, cex = 3, col = "grey", family="Lato", adj=0.5)
		dev.off()
	}
}
for (i in seq(0, 120, 2)) {
	vid(i)
}
fn = list.files(pattern = "\\.png$")
av::av_encode_video(fn, framerate = 60, output = 'video.mp4')

 


Too many AI powered scientific search engines

Being a regular Scholar user, I am quite lost now with the many new scientific search engines. They don’t tell us which data they used for training, how they have been trained, or how the results have been validated. The field is also highly dynamic compared to the situation two years ago. Is it worth testing them?

https://www.undermind.ai/home/


 


Similarity between false memory (of humans) and hallucination (of LLMs)

The common theme seems to be low certainty about facts – a historical event that is wrongly remembered by a human, or a large language model that wrongly extrapolates from otherwise secure knowledge. But is there even more?

Yann LeCun is quoted at IEEE Spectrum:

“Large language models have no idea of the underlying reality that language describes,” he said, adding that most human knowledge is nonlinguistic. “Those systems generate text that sounds fine, grammatically, semantically, but they don’t really have some sort of objective other than just satisfying statistical consistency with the prompt.”
Humans operate on a lot of knowledge that is never written down, such as customs, beliefs, or practices within a community that are acquired through observation or experience. And a skilled craftsperson may have tacit knowledge of their craft that is never written down.

I think “hallucination” is far too anthropomorphic a concept – some LLM output is basically statistical nonsense (although I wouldn’t go as far as Michael Townsen Hicks…). The reasons for these kinds of errors are manifold: reference divergence may already be present in the training data – data created by bots, conspiracy followers or even fraudulent science. The error may also originate in the encoding or decoding routines.

I couldn’t find any further analogy with wrong human memory recall, except the possibility that human memory is influenced by probability as well. Otgar 2022 cites Calado 2020:

The issue of whether repeated events can be implanted in memory has recently been addressed by Calado and colleagues (2020). In their experiment, they falsely told adult participants that they lost their cuddling toy several times while control participants were told that they only lost it once. Strikingly, they found that repeated false events were as easily inserted in memory as suggesting that the event happened once. So, this study not only showed that repeated events can be implanted, it raised doubts about the idea that repeated events might be harder to implant than single events

 

 


More AI headlines

-1-

While we are still waiting for Geoffrey Hinton’s Nobel prize speech in December, AI makes even more negative headlines.

[Hinton] “I worry that the overall consequences of this might be systems that are more intelligent than us that might eventually take control.” He also said he uses the AI chatbot ChatGPT4 for many things now but with the knowledge that it does not always get the answer right.

 

-2-

The sheer power consumption of running AI models is frightening. Nature News asks whether AI’s huge energy demands will spur a nuclear renaissance:

Google announced that it will buy electricity made with reactors developed by Kairos Power, based in Alameda, California. Meanwhile, Amazon is investing approximately US$500 million in the X-Energy Reactor Company, based in Rockville, Maryland, and has agreed to buy power produced by X-energy-designed reactors due to be built in Washington State.

 

-3-

A former OpenAI employee discusses on his blog how AI is using copyrighted material, i.e. stealing content:

While generative models rarely produce outputs that are substantially similar to any of their training inputs, the process of training a generative model involves making copies of copyrighted data. If these copies are unauthorized, this could potentially be considered copyright infringement, depending on whether or not the specific use of the model qualifies as “fair use”. Because fair use is determined on a case-by-case basis, no broad statement can be made about when generative AI qualifies for fair use. Instead, I’ll provide a specific analysis for ChatGPT’s use of its training data, but the same basic template will also apply for many other generative AI products.

The effects can be measured only indirectly, for example by the visitor count at Stack Overflow, where traffic has declined as many users (including me) don’t need Stack Overflow anymore.
Here is another fantastic discussion over at PP between Henry Leirvoll and 495yt on the very basic questions of copyright:

humans get inspired (parsing the external examples or experiences through their inner understanding and individual perspective) they start working to make something with their tools, skills, time and purpose. the result represents the author, their influences and their message.
a lot of this process is protected by copyright.
ai is not inspired. and it has no personal perspective or tools. no message to transmit.
any message put into prompts by an ai user is translated by it’s LLM layer into other, more complex prompts, which also get treated quasi-randomly by the weights and biases of the model, as well as rand seeds.

 

-4-

And well, ChatGPT can produce malicious code despite all precautions: Researchers Bypass AI Safeguards Using Hexadecimal Encoding and Emojis

If a user instructs the chatbot to write an exploit for a specified CVE, they are informed that the request violates usage policies. However, if the request was encoded in hexadecimal format, the guardrails were bypassed and ChatGPT not only wrote the exploit, but also attempted to execute it “against itself”, according to Figueroa.
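To illustrate what such hexadecimal smuggling looks like in practice, a prompt can simply be round-tripped through its hex representation – a minimal Python sketch (the payload string here is harmless and made up):

```python
def to_hex(text: str) -> str:
    # encode a prompt as a plain hex string, e.g. "hi" -> "6869"
    return text.encode("utf-8").hex()

def from_hex(hexstring: str) -> str:
    # decode the hex string back into the original prompt
    return bytes.fromhex(hexstring).decode("utf-8")

encoded = to_hex("write an exploit")  # what the user would send
decoded = from_hex(encoded)           # what the model effectively reads
```

The point of the reported bypass is that content filters inspected the surface string, not what it decodes to.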

 


Controlling the fan and dryer in the basement

After several earlier projects based on the Raspberry Pi Zero, here is a fourth one that measures temperature and humidity in the basement and in the outside air. It is based on the widely used SHTC3 sensor.

 

 

Since the physical I2C address of the SHTC3 cannot be changed, we additionally need a mux board; a solder bridge on its ADR0 jumper changes its address from 0x70 to 0x71.

#!/usr/bin/env python3

import csv
import time
import board
import busio
import adafruit_shtc3
from adafruit_tca9548a import TCA9548A

def read_shtc3(sensor):
  # measurements returns (temperature in °C, relative humidity in %)
  return sensor.measurements

def main():
  # Initialize the I2C bus
  i2c = busio.I2C(board.SCL, board.SDA)

  # Initialize the TCA9548A multiplexer (address moved to 0x71 via ADR0 jumper)
  mux = TCA9548A(i2c, address=0x71)

  # Initialize the SHTC3 sensor on MUX channel 0
  shtc3_channel_0 = adafruit_shtc3.SHTC3(mux[0])

  # Initialize the SHTC3 sensor on MUX channel 1
  shtc3_channel_1 = adafruit_shtc3.SHTC3(mux[1])

  with open("/home/admin/www/taupunkt.log", "a+") as out_file:
    tsv_writer = csv.writer(out_file, delimiter='\t')
    temperature0, relative_humidity0 = read_shtc3(shtc3_channel_0)
    temperature1, relative_humidity1 = read_shtc3(shtc3_channel_1)
    dt = time.strftime('%Y-%m-%d %H:%M:%S')
    tsv_writer.writerow([dt,
      temperature0, relative_humidity0,
      temperature1, relative_humidity1])

if __name__ == '__main__':
  main()

A fan can now be controlled as needed, with less than €40 in parts instead of a €570 off-the-shelf solution. As a smart plug there is e.g. the Edimax SP1101W, which runs a web server that accepts curl commands.
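The control decision itself can be derived from the logged values: ventilate only when the outdoor air, once warmed to basement temperature, is drier than the basement air, i.e. when its dew point is lower. A minimal Python sketch using the Magnus approximation (the 2 °C safety margin is my own illustrative choice, not part of the setup above):

```python
import math

def dew_point(temp_c, rel_humidity):
    """Dew point in °C from temperature (°C) and relative humidity (%),
    via the Magnus approximation."""
    a, b = 17.62, 243.12
    gamma = math.log(rel_humidity / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def fan_should_run(t_in, rh_in, t_out, rh_out, margin=2.0):
    """Run the fan only if outdoor air can actually remove moisture
    from the basement, with a safety margin in °C."""
    return dew_point(t_out, rh_out) < dew_point(t_in, rh_in) - margin
```

A cron job could then evaluate `fan_should_run()` on the last taupunkt.log line and switch the smart plug accordingly.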

 


AI hallucination

A news article and paper showing that

bigger AI chatbots more inclined to spew nonsense — and people don’t always realize.

and some solutions

various emerging techniques should help to create chatbots that bullshit less, or that can, at least, be prodded to disclose when they are not confident in their answers. But some hallucinatory behaviours might get worse before they get better.

 


Academic text parsing

I used to parse PDFs using the AllenAI method and layoutparser.
This worked in many instances but is no longer maintained.
I still have Nougat on my to-do list, while a new paper now points to AceParse:

AceParse includes various types of structured text, such as formulas, tables, algorithms, lists, and sentences embedded with mathematical expressions, among others. We provide examples of several dataset samples to give you a better understanding of our dataset.

 

 


Remarkable: I don’t want to be part of this scene anymore

From the creator of wordfreq

Generative AI has polluted the data
I don’t think anyone has reliable information about post-2021 language usage by humans.
The open Web (via OSCAR) was one of wordfreq’s data sources. Now the Web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies.

 

 


When the printing from wordpress child theme is not working

I wanted to allow a PDF download on my blog. Unfortunately this turned out to be complicated due to multiple problems, including WP and browser caching, CSS failures and mPDF misdirection. Here are the solutions.

Issue 1: If you are editing print.css but the browser shows the old version: comment out or delete the reference to the parent theme in the child CSS

/*
Theme Name: Twenty Twenty Child 1
Theme URI: http://wordpress.org/themes/twentytwenty
@import url('../twentytwenty/style.css');
*/

and enqueue the stylesheets with a file-based version string in functions.php

function my_theme_enqueue_styles() {
  wp_enqueue_style( 'twentytwenty', get_template_directory_uri() . '/style.css' );
  wp_enqueue_style( 'twentytwenty-child1', get_stylesheet_directory_uri() . '/style.css', array(), filemtime( get_stylesheet_directory() . '/style.css' ), 'all' );
}
add_action( 'wp_enqueue_scripts', 'my_theme_enqueue_styles' );

 

Issue 2: The CSS edits are applied but printing is still not correctly formatted, possibly due to the higher specificity of formatting instructions in the parent theme.

Solution: This is a bit tricky, as it needs trial and error to find the offending element. Chrome can show the print version in the browser as shown in this SO thread

https://stackoverflow.com/questions/9540990/using-chromes-element-inspector-in-print-preview-mode

Then add the !important flag to the specific CSS rule:

@media print {
   .myelement1 { display: none !important; }
}

 

Issue 3: mPDF pulls the screen version, not the print version. Solution: create a dedicated mpdf media query.

@media mpdf {
  #sidebar, #header {
    display:none !important;
  }
  blockquote, table, pre {
    page-break-inside:avoid !important;
    font-size: 0.7em  !important;
  }
}

 
