{"id":22973,"date":"2023-12-09T11:31:03","date_gmt":"2023-12-09T09:31:03","guid":{"rendered":"https:\/\/www.wjst.de\/blog\/?p=22973"},"modified":"2023-12-10T17:11:49","modified_gmt":"2023-12-10T15:11:49","slug":"can-chatgpt-generate-a-rct-dataset-that-isnt-recognized-by-forensic-experts","status":"publish","type":"post","link":"https:\/\/www.wjst.de\/blog\/sciencesurf\/2023\/12\/can-chatgpt-generate-a-rct-dataset-that-isnt-recognized-by-forensic-experts\/","title":{"rendered":"Can ChatGPT generate a RCT dataset that isn&#8217;t recognized by forensic experts?"},"content":{"rendered":"<p>&#8220;Free synthetic data&#8221;? There are numerous Google ads selling synthetic, aka fake, data. How &#8220;good&#8221; are these datasets? Will they ever be used for scientific publications outside the AI field, e.g. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Surgisphere\">Surgisphere-like<\/a>?<\/p>\n<p>There is a nice paper by <a href=\"https:\/\/jamanetwork.com\/journals\/jamaophthalmology\/article-abstract\/2811505\">Taloni, Scorcia and Giannaccare<\/a> that tackles the first question. 
Unfortunately, <a href=\"https:\/\/www.nature.com\/articles\/d41586-023-03635-w\">a Nature news commentary<\/a> by Miryam Naddaf is largely misleading when it writes<!--more--><\/p>\n<blockquote><p>It will make it very easy for any researcher or group of researchers to create fake measurements on non-existent patients, fake answers to questionnaires or to generate a large data set on animal experiments.<\/p><\/blockquote>\n<p>Before I show that generating a credible dataset is much more complicated than that, please first read the <a href=\"https:\/\/jamanetwork.com\/journals\/jamaophthalmology\/article-abstract\/2811505\">original paper<\/a> and the <a href=\"https:\/\/www.nature.com\/articles\/d41586-023-03635-w\">commentary<\/a> (BTW, <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/36867645\/\">I do not expect<\/a> any high correlation between pre- and postoperative visual acuity).<\/p>\n<p>For reference, here are the conditions under which the data were generated. At this point I would like to thank the <a href=\"https:\/\/jamanetwork.com\/journals\/jamaophthalmology\/article-abstract\/2811505\">Italian colleagues<\/a> for giving me the dataset. 
<a href=\"https:\/\/www.aao.org\/eyenet\/article\/deep-anterior-lamellar-keratoplasty\">DALK<\/a> is short for deep anterior lamellar keratoplasty and PK for penetrating keratoplasty.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_22998\" aria-describedby=\"caption-attachment-22998\" style=\"width: 272px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/Bildschirmfoto-2023-12-08-um-15.21.53.jpg\" rel=\"key\" data-rel=\"key-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-22998 \" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/Bildschirmfoto-2023-12-08-um-15.21.53-620x757.jpg\" alt=\"\" width=\"272\" height=\"332\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/Bildschirmfoto-2023-12-08-um-15.21.53-620x757.jpg 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/Bildschirmfoto-2023-12-08-um-15.21.53-409x500.jpg 409w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/Bildschirmfoto-2023-12-08-um-15.21.53-768x938.jpg 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/Bildschirmfoto-2023-12-08-um-15.21.53.jpg 858w\" sizes=\"auto, (max-width: 272px) 100vw, 272px\" \/><\/a><figcaption id=\"caption-attachment-22998\" class=\"wp-caption-text\">Click for details; unfortunately, the quality of the supplemental PDF is low<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Of course, there will be no entry for this RCT in any study registry. And there will be no SOPs and no backup dataset should a commission ever start an investigation. But can we conclude from the data alone that they were faked? 
Let&#8217;s dive into the dataset.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_22980\" aria-describedby=\"caption-attachment-22980\" style=\"width: 184px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/n.png\" rel=\"key\" data-rel=\"key-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-22980\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/n.png\" alt=\"\" width=\"184\" height=\"211\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/n.png 489w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/n-436x500.png 436w\" sizes=\"auto, (max-width: 184px) 100vw, 184px\" \/><\/a><figcaption id=\"caption-attachment-22980\" class=\"wp-caption-text\">Basic description<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>My first issue is not mentioned in the news commentary &#8211; there are NO missing values at all (#1). This is impossible in real life, as every variable except the study ID may include one or more missing values.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_22997\" aria-describedby=\"caption-attachment-22997\" style=\"width: 225px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/names.png\" rel=\"key\" data-rel=\"key-image-2\" data-rl_title=\"click for details\" data-rl_caption=\"click for details\" title=\"click for details\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-22997\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/names-620x821.png\" alt=\"\" width=\"225\" height=\"298\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/names-620x821.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/names-377x500.png 377w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/names.png 659w\" sizes=\"auto, (max-width: 225px) 100vw, 225px\" \/><\/a><figcaption 
id=\"caption-attachment-22997\" class=\"wp-caption-text\">IDs<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>The &#8220;double eyes&#8221; are nonsense (#2).<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_23012\" aria-describedby=\"caption-attachment-23012\" style=\"width: 330px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/double.png\" rel=\"key\" data-rel=\"key-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-23012\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/double-620x210.png\" alt=\"\" width=\"330\" height=\"112\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/double-620x210.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/double-1475x500.png 1475w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/double-768x260.png 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/double-1536x521.png 1536w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/double.png 1596w\" sizes=\"auto, (max-width: 330px) 100vw, 330px\" \/><\/a><figcaption id=\"caption-attachment-23012\" class=\"wp-caption-text\">Double IDs<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>We have a lot of gender\/sex discussions right now, but in a surgery dataset from the last decade names and sex should match &#8211; so this is fatal flaw #3.<\/p>\n<p>Are there any other missing or artificial associations?<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_22983\" aria-describedby=\"caption-attachment-22983\" style=\"width: 339px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr.png\" rel=\"key\" data-rel=\"key-image-4\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-22983\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr-620x620.png\" 
alt=\"\" width=\"339\" height=\"339\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr-620x620.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr-500x500.png 500w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr-150x150.png 150w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr-768x768.png 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr-120x120.png 120w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr.png 1024w\" sizes=\"auto, (max-width: 339px) 100vw, 339px\" \/><\/a><figcaption id=\"caption-attachment-22983\" class=\"wp-caption-text\">Pearson Correlation Matrix<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>I am also plotting the predictive power score [<a href=\"https:\/\/www.kaggle.com\/code\/frtgnn\/predictive-power-score-vs-correlation\">ref<\/a>, <a href=\"https:\/\/towardsdatascience.com\/rip-correlation-introducing-the-predictive-power-score-3d90808b9598\">ref<\/a>] as a simple Pearson correlation cannot detect non-linear bivariate relationships (without extensive preprocessing).<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_22987\" aria-describedby=\"caption-attachment-22987\" style=\"width: 389px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr2.png\" rel=\"key\" data-rel=\"key-image-5\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-22987\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr2-620x620.png\" alt=\"\" width=\"389\" height=\"389\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr2-620x620.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr2-500x500.png 500w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr2-150x150.png 150w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr2-768x768.png 768w, 
https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr2-120x120.png 120w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/corr2.png 1024w\" sizes=\"auto, (max-width: 389px) 100vw, 389px\" \/><\/a><figcaption id=\"caption-attachment-22987\" class=\"wp-caption-text\">Predictive Power Score Matrix<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Well, I am not an ophthalmologist, but the overall results look good to me. I am not sure whether there should be a correlation between cylinder size and acuity (BSCVA). Shouldn&#8217;t females have a slightly smaller cylinder size? Is the cylinder size really somewhat related to left\/right eye? I find the data suspect, but without a major flaw here. Of course, a true clinical dataset would be very helpful for comparison.<\/p>\n<p>So let&#8217;s go to the distribution of the quantitative variables. There are quite nice Gaussian shapes except for the age distribution.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_23002\" aria-describedby=\"caption-attachment-23002\" style=\"width: 317px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/histo.png\" rel=\"key\" data-rel=\"key-image-6\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-23002 \" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/histo-620x620.png\" alt=\"\" width=\"317\" height=\"317\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/histo-620x620.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/histo-500x500.png 500w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/histo-150x150.png 150w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/histo-768x768.png 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/histo-120x120.png 120w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/histo.png 1024w\" sizes=\"auto, (max-width: 317px) 100vw, 317px\" 
\/><\/a><figcaption id=\"caption-attachment-23002\" class=\"wp-caption-text\">Probability distribution<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>So this is flaw #4, in particular as many patients are 17 years of age. Another flaw appears when looking at the date of surgery.<\/p>\n<figure id=\"attachment_23003\" aria-describedby=\"caption-attachment-23003\" style=\"width: 317px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/dens.png\" data-rel=\"key-image-7\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-23003 \" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/dens-620x620.png\" alt=\"\" width=\"317\" height=\"317\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/dens-620x620.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/dens-500x500.png 500w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/dens-150x150.png 150w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/dens-768x768.png 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/dens-120x120.png 120w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/dens.png 1024w\" sizes=\"auto, (max-width: 317px) 100vw, 317px\" \/><\/a><figcaption id=\"caption-attachment-23003\" class=\"wp-caption-text\">Surgery Dates<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Surgery dates should be more or less evenly distributed over the study period. Also, elective surgery is not expected to be done on weekends: flaw #5 :-)<\/p>\n<p>Now I look at the qualitative variables, in subgroups by therapy. 
I can&#8217;t identify anything unusual, as complications are correctly assigned.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_23004\" aria-describedby=\"caption-attachment-23004\" style=\"width: 345px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/bar.png\" rel=\"key\" data-rel=\"key-image-8\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-23004 \" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/bar-620x620.png\" alt=\"\" width=\"345\" height=\"345\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/bar-620x620.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/bar-500x500.png 500w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/bar-150x150.png 150w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/bar-768x768.png 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/bar-120x120.png 120w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/bar.png 1024w\" sizes=\"auto, (max-width: 345px) 100vw, 345px\" \/><\/a><figcaption id=\"caption-attachment-23004\" class=\"wp-caption-text\">Qualitative variables by outcome<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>So let&#8217;s have a look at more advanced statistical procedures &#8211; <a href=\"https:\/\/stats.stackexchange.com\/questions\/2691\/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues\">Principal Component Analysis<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Q%E2%80%93Q_plot\">QQ plots<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_23006\" aria-describedby=\"caption-attachment-23006\" style=\"width: 374px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/pac.png\" rel=\"key\" data-rel=\"key-image-9\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-23006\" 
src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/pac-620x620.png\" alt=\"\" width=\"374\" height=\"374\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/pac-620x620.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/pac-500x500.png 500w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/pac-150x150.png 150w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/pac-768x768.png 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/pac-120x120.png 120w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/pac.png 1024w\" sizes=\"auto, (max-width: 374px) 100vw, 374px\" \/><\/a><figcaption id=\"caption-attachment-23006\" class=\"wp-caption-text\">Principal Component Analysis<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>As expected PK and DALK have opposite factor loadings on the first component while I am not sure of the strong R\/L and M\/F loadings in PC2. Maybe some statisticians can jump in here and help?<\/p>\n<p>Here is also a QQ plot by surgical procedure.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_23008\" aria-describedby=\"caption-attachment-23008\" style=\"width: 394px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/qq.png\" rel=\"key\" data-rel=\"key-image-10\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-23008 \" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/qq-620x620.png\" alt=\"\" width=\"394\" height=\"394\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/qq-620x620.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/qq-500x500.png 500w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/qq-150x150.png 150w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/qq-768x768.png 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/qq-120x120.png 120w, 
https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/qq.png 1024w\" sizes=\"auto, (max-width: 394px) 100vw, 394px\" \/><\/a><figcaption id=\"caption-attachment-23008\" class=\"wp-caption-text\">QQ plots by outcome<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Age is again unreliable, as noted before, while I now also wonder why BSCVA has only 5 levels. This is flaw #6.<\/p>\n<p>In a final step I look at the randomness of the second and third digits in two variables. Benford&#8217;s law does not apply here, so the digits are expected to occur in equal proportions. Unfortunately, the proportions are not equal according to a chi-square test that computes p-values by Monte Carlo simulation (#7).<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_23009\" aria-describedby=\"caption-attachment-23009\" style=\"width: 306px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/digits.png\" rel=\"key\" data-rel=\"key-image-11\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-23009 \" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/digits-620x620.png\" alt=\"\" width=\"306\" height=\"306\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/digits-620x620.png 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/digits-500x500.png 500w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/digits-150x150.png 150w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/digits-768x768.png 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/digits-120x120.png 120w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2023\/12\/digits.png 1024w\" sizes=\"auto, (max-width: 306px) 100vw, 306px\" \/><\/a><figcaption id=\"caption-attachment-23009\" class=\"wp-caption-text\">Second and third digit distribution<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Taken together, using a quick &amp; dirty R script we have easily identified many flaws that 
clearly mark this as a fake dataset.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n\n<p>&nbsp;<\/p>\n<div class=\"bottom-note\">\n  <span class=\"mod1\">CC-BY-NC Science Surf, accessed 24.04.2026<\/span>\n <\/div>","protected":false},"excerpt":{"rendered":"<p>&#8220;Free synthetic data&#8221;? There are numerous Google ads selling synthetic, aka fake, data. How &#8220;good&#8221; are these datasets? Will they ever be used for scientific publications outside the AI field, e.g. Surgisphere-like? There is a nice paper by Taloni, Scorcia and Giannaccare that tackles the first question. Unfortunately, a Nature news commentary by Miryam Naddaf &hellip; <a href=\"https:\/\/www.wjst.de\/blog\/sciencesurf\/2023\/12\/can-chatgpt-generate-a-rct-dataset-that-isnt-recognized-by-forensic-experts\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Can ChatGPT generate a RCT dataset that isn&#8217;t recognized by forensic experts?<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,9],"tags":[4253,2554,3358,4034,4252,2550],"class_list":["post-22973","post","type-post","status-publish","format-standard","hentry","category-note-worthy","category-computer-software","tag-llm","tag-rct","tag-ai","tag-chatgpt","tag-eye-surgery","tag-fake"],"_links":{"self":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/22973","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/comments?post=22973"}],"version-history":[{"count":25,"href":"https:\/\/www.wjst.de\/blo
g\/wp-json\/wp\/v2\/posts\/22973\/revisions"}],"predecessor-version":[{"id":23038,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/22973\/revisions\/23038"}],"wp:attachment":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/media?parent=22973"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/categories?post=22973"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/tags?post=22973"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}