Tag Archives: ai

Programming with ChatGPT from start to finish in 60 seconds

Create a PHP script that can read a CSV in the form start_date, end_date, event and output as ICS file

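// Function to convert a m/d/Y date into the ICS basic date format (Ymd)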
function convertDate($date)
{
    $dateTime = DateTime::createFromFormat('m/d/Y', $date);
    if ($dateTime === false) {
        return false; // Return false if date parsing fails
    }
    return $dateTime->format('Ymd');
}

// Function to escape special characters in text
function escapeText($text)
{
    return str_replace(["\n", "\r", ",", ";"], ['\n', '\r', '\,', '\;'], $text);
}

// Read CSV file
$csvFile = 'uci.csv'; // Replace with your CSV file name
$icsFile = 'uci.ics'; // Output ICS file name

$handle = fopen($csvFile, 'r');
if ($handle !== false) {
    // Open ICS file for writing
    $icsHandle = fopen($icsFile, 'w');

    // Write ICS header
    fwrite($icsHandle, "BEGIN:VCALENDAR\r\n");
    fwrite($icsHandle, "VERSION:2.0\r\n");
    fwrite($icsHandle, "PRODID:-//Your Company//NONSGML Event Calendar//EN\r\n");

    // Read CSV line by line
    while (($data = fgetcsv($handle, 1000, ',')) !== false) {
        if (count($data) < 3) { // skip blank or incomplete lines
            continue;
        }
        $startDate = convertDate($data[0]);
        $endDate = convertDate($data[1]);
        print_r($data);                // debug: show the parsed row
        echo $startDate . PHP_EOL;     // debug: converted start date
        
        if ($startDate === false || $endDate === false) {
            continue;
        }
        $event = escapeText($data[2]);

        // Write event to ICS file
        fwrite($icsHandle, "BEGIN:VEVENT\r\n");
        fwrite($icsHandle, "UID:" . uniqid() . "\r\n"); // Unique identifier
        fwrite($icsHandle, "DTSTART;VALUE=DATE:" . $startDate . "\r\n");
        // Note: RFC 5545 treats DTEND with VALUE=DATE as exclusive, so an inclusive CSV end_date may need one day added
        fwrite($icsHandle, "DTEND;VALUE=DATE:" . $endDate . "\r\n");
        fwrite($icsHandle, "SUMMARY:" . $event . "\r\n");
        fwrite($icsHandle, "DESCRIPTION:" . $event . "\r\n");
        fwrite($icsHandle, "END:VEVENT\r\n");
    }

    // Write ICS footer
    fwrite($icsHandle, "END:VCALENDAR\r\n");

    // Close files
    fclose($icsHandle);
    fclose($handle);

    echo "ICS file generated successfully.";
} else {
    echo "Error: Unable to open CSV file.";
}
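
As a quick sanity check, a made-up uci.csv row (dates in m/d/Y, event text invented for illustration):

06/29/2024,07/21/2024,Example stage race

should end up in uci.ics as an all-day event block like:

BEGIN:VEVENT
UID:<output of uniqid()>
DTSTART;VALUE=DATE:20240629
DTEND;VALUE=DATE:20240721
SUMMARY:Example stage race
DESCRIPTION:Example stage race
END:VEVENT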

The source data are from the UCI and the output is here, from where it can be added as a calendar. BTW, I also created my first “hello world” Swift/iPhone app using this source, although that took a bit more time…

Another threat to science as we know it

While AI may not break science by being backwards-directed, there are already the first companies conducting AI job interviews.

https://www.micro1.ai/vetting-process

This could become a big problem if universities also start using this type of job interview.

Just like journals that use AI for peer review:

The authors of the study, posted on the arXiv preprint server on 11 March, examined the extent to which AI chatbots could have modified the peer reviews of conference proceedings submitted to four major computer-science meetings since the release of ChatGPT. Their analysis suggests that up to 17% of the peer-review reports have been substantially modified by chatbots — although it’s unclear whether researchers used the tools to construct reviews from scratch or just to edit and improve written drafts.

Language as the “House of Being”

Found on Deutschlandfunk Kultur:
“Wie Sprachassistenten das Denken manipulieren” (“How voice assistants manipulate thinking”) is filed under opinion, yet it deals with facts …

The AI researchers explain it by the fact that GPT-3 was trained towards the end of the Trump administration, when a hard line against the immigration of refugees dominated the discourse in the USA. This attitude was then also prevalent in the AI’s training data… The AI researchers call it the “ghost in the machine” and attested it an American accent. House spirit (“Hausgeist”) may be the better metaphor. For even more than a machine, GPT is a house, if one understands language, like the German philosopher Martin Heidegger, as the “house of being”. Language is the medium that connects us with the world and at the same time shapes the nature of this connection. It is not a neutral tool, it is a frame of thought. Language creates reality, not only in poetry.

I am not familiar with “Hausgeist”, but I do know school of thought, tradition, line of thinking, doctrine or mindset. And my opinion: all of this is rather worrying.

How does AI recognize AI text?

The Semrush blog has a nice summary

By analyzing two main characteristics of the text: perplexity and burstiness. In other words, how predictable or unpredictable it sounds to the reader, as well as how varied or uniform the sentences are.

Perplexity is

a statistical measure of how confidently a language model predicts a text sample. In other words, it quantifies how “surprised” the model is when it sees new data. The lower the perplexity, the better the model predicts the text.
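
To make this concrete, here is a minimal sketch (my own, not Semrush’s detector) that computes perplexity from per-token probabilities; the numbers in the example call are invented, in practice they come from a language model:

// Perplexity = exp(-(1/N) * sum(log p_i)) for the probability p_i the model assigns to each token.
// Low perplexity = the text is very predictable for the model, one hint towards AI-generated text.
function perplexity(array $tokenProbabilities): float
{
    $logSum = 0.0;
    foreach ($tokenProbabilities as $p) {
        $logSum += log($p);
    }
    return exp(-$logSum / count($tokenProbabilities));
}

echo perplexity([0.2, 0.5, 0.9, 0.4]); // invented probabilities; a real detector gets them from a language model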

Burstiness is

the intermittent increases and decreases in activity or frequency of an event. One measure of burstiness is the Fano factor, a ratio between the variance and mean of counts. In natural language processing, burstiness has a slightly more specific definition… A word is more likely to occur again in a document if it has already appeared in the document. Importantly, the burstiness of a word and its semantic content are positively correlated; words that are more informative are also more bursty.
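
And a similar sketch for burstiness via the Fano factor mentioned above, applied to the counts of one word across text chunks (the counts are invented):

// Fano factor = variance / mean of the counts of one word across text chunks.
// A bursty (informative) word clusters in a few chunks, giving a Fano factor well above 1.
function fanoFactor(array $counts): float
{
    $n = count($counts);
    $mean = array_sum($counts) / $n;
    $variance = 0.0;
    foreach ($counts as $c) {
        $variance += ($c - $mean) ** 2;
    }
    return ($variance / $n) / $mean;
}

echo fanoFactor([0, 0, 7, 1, 0, 6]); // counts of one word in six chunks (made-up numbers)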

Or let’s call it entropy? So we now have some criteria:

    • AI texts are more uniform, more predictable and often repetitive, with
    • a lack of depth and personality
    • sometimes a plagiarism checker may recognize “learned” AI phrases, and sometimes a reference checker will find “hallucinated” references
    • spotting incorrect content and outdated information, in contrast, needs human experts
    • an obvious, yet underappreciated downside: “AI texts have nothing to say” – “clichéd nothingness”.

Well, AI prologue sentences now also appear in the scientific literature, for example “Certainly! Here is…”

Parallelized computer code and DNA transcription

At Stack Exchange there is a super interesting discussion on parallelized computer code and DNA transcription (which is different from the DNA-based molecular programming literature…)

IF : Transcriptional activator; when present a gene will be transcribed. In general there is no termination of events unless the signal is gone; the program ends only with the death of the cell. So the IF statement is always a part of a loop.

WHILE : Transcriptional repressor; gene will be transcribed until repressor is not present.

FUNCTION: There are no equivalents of function calls. All events happen in the same space and there is always a likelihood of interference. One can argue that organelles can act as a compartment that may have function-like properties, but they are highly complex and are not just some kind of input-output devices.

GOTO is always dependent on a condition. This can happen in case of certain network connections such as feedforward loops and branched pathways. For example if there is a signalling pathway like this: A → B → C and there is another connection D → C then if somehow D is activated it will directly affect C, making A and B dispensable.
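
As a toy sketch of how these analogies read in code (my own illustration, not from the Stack Exchange thread), assuming boolean signals for an activator and a repressor:

// Toy model of the analogies above (illustration only): transcription runs inside an
// endless loop; an activator acts like IF, a repressor like WHILE (transcribe until it appears).
function cellAlive(): bool { static $ticks = 0; return ++$ticks <= 5; } // stand-in for cell death
function activatorPresent(): bool { return (bool) rand(0, 1); }          // random signal, for illustration
function repressorPresent(): bool { return (bool) rand(0, 1); }

while (cellAlive()) {                 // "the program ends only with the death of the cell"
    if (activatorPresent()) {         // IF: transcriptional activator present => gene A transcribed
        echo "gene A transcribed\n";
    }
    while (!repressorPresent()) {     // WHILE: gene B transcribed as long as the repressor is absent
        echo "gene B transcribed\n";
    }
}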

Of course these are completely different concepts. I fully agree with the further Stack Exchange discussion that

it is the underlying logic that is important and not the statement construct itself and these examples should not be taken as absolute analogies. It is also to be noted that DNA is just a set of instructions and not really a fully functional entity … However, even being just a code it is comparable to a HLL [high level language] code that has to be compiled to execute its functions. See this post too.

Please forget everything you read from Francis Collins about this.

When AI results cannot be generalized

There is a new Science paper that shows

A central promise of artificial intelligence (AI) in healthcare is that large datasets can be mined to predict and identify the best course of care for future patients.  … Chekroud et al. showed that machine learning models routinely achieve perfect performance in one dataset even when that dataset is a large international multisite clinical trial … However, when that exact model was tested in truly independent clinical trials, performance fell to chance levels.

This study predicted antipsychotic medication effects for schizophrenia – admittedly not a trivial task due to high individual variability (and there are no extensive pharmacogenetic studies behind it). But why did it fail so completely? The authors highlight two major points in the introduction and detail three more in the discussion:

  • models may overfit the data by fitting the random noise of one particular dataset rather than a true signal (see the sketch after this list)
  • poor model transportability is expected due to patient, provider, or implementation characteristics that vary across trials
  • in particular, patient groups may be too different across trials while this heterogeneity is not captured by the model
  • outcomes and covariates like psychosocial information and social determinants of health were not recorded in all studies
  • patient outcomes may be too context-dependent, as trials may have subtly important differences in recruiting procedures, inclusion criteria and/or treatment protocols
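
The first point, overfitting to noise, is easy to reproduce even without any machine-learning library: a “model” that simply memorizes random outcome labels looks perfect on its training data and drops to chance on independent data. A minimal sketch with made-up random data (an illustration of the principle, not the Chekroud et al. models):

// Each "patient" gets a purely random outcome, i.e. there is no true signal at all.
function randomPatient(int $id): array
{
    return ['id' => $id, 'outcome' => rand(0, 1)];
}

$train = array_map('randomPatient', range(1, 100));
$test  = array_map('randomPatient', range(101, 200));

// "Model" that memorizes the training outcomes by patient id = overfitting to noise.
$memory = [];
foreach ($train as $p) {
    $memory[$p['id']] = $p['outcome'];
}
$predict = fn(array $p): int => $memory[$p['id']] ?? rand(0, 1);

// Accuracy is perfect on the training data and falls to chance on independent data.
$accuracy = function (array $patients) use ($predict): float {
    $hits = 0;
    foreach ($patients as $p) {
        $hits += ($predict($p) === $p['outcome']) ? 1 : 0;
    }
    return $hits / count($patients);
};

echo "training accuracy: " . $accuracy($train) . "\n";    // 1.0
echo "independent accuracy: " . $accuracy($test) . "\n";  // about 0.5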

So are we now left without any clue?

I remember another example from Gigerenzer’s “Click”, showing misclassification of chest X-rays due to different devices (mobile or stationary), which is associated with more or less serious cases (page 128 refers to Zech et al.). So we need to know the relevant co-factors first.

There is even a first understanding of the black-box data shuffling inside the neural net. Using LRP (Layer-wise Relevance Propagation), the recognition can already be visualized as a heatmap by weighting the characteristics of the input data.
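
To give a flavour of how LRP propagates an output score back onto the inputs, here is a simplified epsilon-rule sketch for a single dense layer; the weights and activations are invented, and a real implementation walks backwards through every layer of the network:

// Simplified LRP epsilon rule for one dense layer: the relevance R_j of each output neuron is
// redistributed to the inputs in proportion to their contribution a_i * w_ij.
function lrpEpsilon(array $a, array $w, array $relevanceOut, float $eps = 1e-6): array
{
    $relevanceIn = array_fill(0, count($a), 0.0);
    foreach ($relevanceOut as $j => $rj) {
        // Pre-activation z_j = sum_i a_i * w_ij of output neuron j
        $zj = 0.0;
        foreach ($a as $i => $ai) {
            $zj += $ai * $w[$i][$j];
        }
        // Distribute R_j back proportionally to each input's share of z_j
        foreach ($a as $i => $ai) {
            $relevanceIn[$i] += ($ai * $w[$i][$j]) / ($zj + $eps) * $rj;
        }
    }
    return $relevanceIn; // one relevance value per input = one pixel of the heatmap
}

// Invented toy example: 3 inputs, 2 outputs, all relevance placed on the first output
$a = [0.5, 1.0, 0.0];
$w = [[0.2, -0.1], [0.4, 0.3], [0.1, 0.8]];
print_r(lrpEpsilon($a, $w, [1.0, 0.0]));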

Data voids and search engines

An interesting Nature editorial reporting on a recent study

A study in Nature last month highlights a previously underappreciated aspect of this phenomenon: the existence of data voids, information spaces that lack evidence, into which people searching to check the accuracy of controversial topics can easily fall…
Clearly, copying terms from inaccurate news stories into a search engine reinforces misinformation, making it a poor method for verifying accuracy…
Google does not manually remove content, or de-rank a search result; nor does it moderate or edit content, in the way that social-media sites and publishers do.

So what could be done?

There’s also a body of literature on improving media literacy — including suggestions on more, or better education on discriminating between different sources in search results.

Sure, increasing media literacy on the consumer side would be helpful. But letting Google earn all that money without any further curation effort? The original study found

Here, across five experiments, we present consistent evidence that online search to evaluate the truthfulness of false news articles actually increases the probability of believing them.

So why not put out red flags? Or de-rank search results?

fake screenshot


The end of the bachelor thesis

has probably already begun, at least in business administration in Prague. Quote:

Texts written with artificial intelligence are hardly distinguishable from human ones. Checking them is therefore only possible with great difficulty for universities, says Dean Hnilica: “We have other parts of our degree programme in which students can demonstrate their learning outcomes or expected learning outcomes. The bachelor thesis is therefore superfluous.”

Scientific integrity is not a weapon

Bill Ackman is threatening Harvard faculty on TwiX

In the near future, AI will target every paper, and not only a suspicious table or an image found by chance. Nevertheless, using this now as a weapon seems immoral and carries a high risk of false accusations that may be prosecuted as criminal defamation. Let’s see what happens to big-mouthed announcements…


Can ChatGPT generate an RCT dataset that isn’t recognized by forensic experts?

“Free synthetic data”? There are numerous Google ads selling synthetic aka fake data. How “good” are these datasets? Will they ever be used for scientific publications outside the AI field, e.g. Surgisphere-like?

There is a nice paper by Taloni, Scorcia and Giannaccare that tackles the first question. Unfortunately, a Nature news commentary by Miryam Naddaf is largely misleading when writing…

The Google Gemini video a fake?

techcrunch.com/2023/12/07

Just one problem: the video isn’t real. “We created the demo by capturing footage in order to test Gemini’s capabilities on a wide range of challenges. Then we prompted Gemini using still image frames from the footage, and prompting via text.” (Parmy Olsen at Bloomberg was the first to report the discrepancy.)

It doesn’t exactly inspire more confidence that Oriol Vinyals now responds

All the user prompts and outputs in the video are real, shortened for brevity. The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.

Timeless AI stuff

https://twitter.com/o_guest/status/1728722173336993874

May I also emphasize that AI is a research method suffering from severe flaws, as Nature reported again yesterday: “Scientists worry that ill-informed use of artificial intelligence is driving a deluge of unreliable or useless research”

A team in India reported that artificial intelligence (AI) could do it, using machine learning to analyse a set of X-ray images. … But the following September, computer scientists Sanchari Dhar and Lior Shamir at Kansas State University in Manhattan took a closer look. They trained a machine-learning algorithm on the same images, but used only blank background sections that showed no body parts at all. Yet their AI could still pick out COVID-19 cases at well above chance level.
The problem seemed to be that there were consistent differences in the backgrounds of the medical images in the data set. An AI system could pick up on those artefacts to succeed in the diagnostic task, without learning any clinically relevant features — making it medically useless.
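
The mechanism is easy to reproduce with invented data: if a background artefact (say, the scanner type) correlates with the label, a rule that never looks at the patient still classifies “well”. A minimal sketch, not the Dhar and Shamir analysis:

// Invented dataset: the label correlates with the scanner type (a background artefact),
// so a "model" that looks only at the scanner, never at the patient, still scores well.
$cases = [];
for ($i = 0; $i < 1000; $i++) {
    $covid = rand(0, 1);
    $portableScanner = (rand(1, 100) <= ($covid ? 90 : 10)) ? 1 : 0; // artefact correlated with the label
    $cases[] = ['covid' => $covid, 'portable' => $portableScanner];
}

$correct = 0;
foreach ($cases as $case) {
    $prediction = $case['portable']; // shortcut: predict from the background artefact only
    $correct += ($prediction === $case['covid']) ? 1 : 0;
}
echo "accuracy using only the artefact: " . ($correct / count($cases)) . "\n"; // about 0.9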

not even mentioning here again data leakage

There has been no systematic estimate of the extent of the problem, but researchers say that, anecdotally, error-strewn AI papers are everywhere. “This is a widespread issue impacting many communities beginning to adopt machine-learning methods,” Kapoor says.

Die Gedanken sind frei, wer kann sie erraten?

Thoughts are free,
who can guess them?
They fly past like nocturnal shadows.
No man can know them,
no hunter can shoot them with powder and lead.
Thoughts are free.

What sounds so beautifully lyrical in Hoffmann von Fallersleben is just 19th-century poetry. Mind reading has fascinated people since the days of King David, but has only recently become possible in a rudimentary way (MPI):

The result astonished Libet, as it astonishes many researchers to this day: the readiness potential was already building up in the subjects’ brains before they themselves felt the will to move. Even allowing for a certain delay in reading the stopwatch, the finding stood: the conscious act of will occurred on average only three tenths of a second after the preparations for action had started in the brain. For many brain researchers this allowed only one conclusion: the grey cells apparently decided without us.

The technical resolution keeps advancing, from the anticipation of simple movement patterns now to complete image recognition in the brain: “Mental image reconstruction from human brain activity”, here in a slightly corrected DeepL translation

Images perceived by humans can be reconstructed from their brain activity. However, the visualization (externalization) of mental imagery remains a challenge. Only a few studies have reported successful visualization of mental imagery, and their visualizable images were limited to specific domains such as human faces or letters of the alphabet. The visualization of mental imagery for arbitrary natural images therefore represents a significant milestone. In this study, we achieved this by improving a previous method. Specifically, we showed that the visual image reconstruction method proposed in the seminal study by Shen et al. (2019) relies heavily on visual information decoded from the brain and does not use very efficiently the semantic information employed during the mental process. To address this limitation, we extended the previous method to a Bayesian estimator and incorporated the support of semantic information into the method. Our proposed framework successfully reconstructed both seen images (i.e., those observed by the human eye) and imagined images from brain activity. Quantitative evaluation showed that our system could identify seen and imagined images very accurately compared with chance accuracy (seen: 90.7%, imagined: 75.6%, chance accuracy: 50.0%). In contrast, the previous method could identify only seen images (seen: 64.3%, imagined: 50.4%). These results suggest that our system is a unique tool for directly investigating the subjective contents of the brain, such as illusions, hallucinations, and dreams.

Fig 3A

Poem, poem, poem

A blog post on extracting training data from ChatGPT

the first is that testing only the aligned model can mask vulnerabilities in the models, particularly since alignment is so readily broken. Second, this means that it is important to directly test base models. Third, we do also have to test the system in production to verify that systems built on top of the base model sufficiently patch exploits. Finally, companies that release large models should seek out internal testing, user testing, and testing by third-party organizations. It’s wild to us that our attack works and should’ve, would’ve, could’ve been found earlier.

and the full paper published yesterday

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT.

I am not convinced that the adversary is the main point here. AI companies are stealing data [1, 2, 3, 4, 5] without ever giving credit to the sources. So there is now a good chance to see into which houses ChatGPT has broken.