{"id":24467,"date":"2025-02-05T13:34:03","date_gmt":"2025-02-05T11:34:03","guid":{"rendered":"https:\/\/www.wjst.de\/blog\/?p=24467"},"modified":"2025-03-21T11:53:20","modified_gmt":"2025-03-21T09:53:20","slug":"ai-is-using-copyrighted-material","status":"publish","type":"post","link":"https:\/\/www.wjst.de\/blog\/sciencesurf\/2025\/02\/ai-is-using-copyrighted-material\/","title":{"rendered":"AI is using copyrighted material"},"content":{"rendered":"<p>We have known it for years: LLMs <a href=\"https:\/\/www.nature.com\/articles\/d41586-024-04018-5\">are trained on copyrighted material<\/a>. But we should never forget: Aaron Swartz, a copyright activist, <a href=\"https:\/\/bsky.app\/profile\/danhf.bsky.social\/post\/3ldlhh2zfmk2d\">lost his life<\/a>. And so did <a href=\"https:\/\/suchir.net\/fair_use.html\">Suchir Balaji<\/a> (his parents <a href=\"https:\/\/petapixel.com\/2024\/12\/30\/parents-of-dead-openai-whistleblower-refuse-to-believe-it-was-suicide-suchir-balaji\/\">do not believe it was suicide<\/a>). 
And another activist, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Nature%27s_10\">Alexandra Elbakyan<\/a>, has been prosecuted for years.<\/p>\n<p>So how can LLMs of all kinds now make money off copyrighted text and images, <a href=\"https:\/\/www.reuters.com\/technology\/artificial-intelligence\/multiple-ai-companies-bypassing-web-standard-scrape-publisher-sites-licensing-2024-06-21\/\">bypassing all rules<\/a>? The <a href=\"https:\/\/www.theguardian.com\/technology\/2024\/jan\/08\/ai-tools-chatgpt-copyrighted-material-openai\">Guardian<\/a> on OpenAI:<\/p>\n<blockquote><p>The developer OpenAI has said it would be impossible to create tools like\u00a0its groundbreaking chatbot\u00a0ChatGPT without access to copyrighted material, as pressure grows on artificial intelligence firms over the content used to train their products.<\/p><\/blockquote>\n<p>The <a href=\"https:\/\/www.nytimes.com\/2024\/10\/23\/technology\/openai-copyright-law.html\">New York Times<\/a> on Suchir Balaji:<\/p>\n<blockquote><p>But after the release of ChatGPT in late 2022, he thought harder about what the company was doing. He came to the conclusion that OpenAI\u2019s use of copyrighted data violated the law and that technologies like ChatGPT were damaging the internet. In August, he left OpenAI because he no longer wanted to contribute to technologies that he believed would bring society more harm than benefit.<\/p><\/blockquote>\n<p>Are there still copyright rules in place?<\/p>\n<p>Probably. <a href=\"https:\/\/www.theguardian.com\/technology\/2024\/jan\/08\/ai-tools-chatgpt-copyrighted-material-openai\">Getty Images<\/a> is now suing Stability AI, the company behind Stable Diffusion; Facebook is using LibGen although they recently had to pay <a href=\"https:\/\/petapixel.com\/2025\/01\/10\/lawsuit-alleges-mark-zuckerberg-gave-permission-for-meta-to-train-ai-on-stolen-content\/\">30m in penalties<\/a>. Universal Music filed a lawsuit against Anthropic and the NYT against OpenAI. 
At least a dozen court cases are ongoing.<\/p>\n<p>But so far I haven&#8217;t heard of any action by a major medical publisher against any AI company (including the company <a href=\"https:\/\/www.nature.com\/articles\/nature.2017.22196\">that sued Elbakyan<\/a>). They must have a different strategy &#8211; instead of suing, they just sell their content, even behind the authors&#8217; backs. This is what <a href=\"https:\/\/www.chronicle.com\/article\/two-major-academic-publishers-signed-deals-with-ai-companies-some-professors-are-outraged\">Christa Dutton found out.<\/a><\/p>\n<blockquote><p>One of those tech companies, Microsoft, paid Informa, the parent company of Taylor &amp; Francis, an initial fee of $10 million to make use of its content \u201cto help improve relevance and performance of AI systems,\u201d according to a\u00a0<a href=\"https:\/\/www.informa.com\/globalassets\/documents\/investor-relations\/2024\/informa-plc---market-update.pdf\">report released in May<\/a>&#8230; Another publisher, Wiley, also recently agreed to sell academic content to a tech company for training AI models. The publisher completed a \u201cGenAI content rights project\u201d with an undisclosed \u201clarge tech company,\u201d according to a\u00a0<a href=\"https:\/\/johnwiley2020news.q4web.com\/press-releases\/press-release-details\/2024\/Wiley-Increases-Quarterly-Dividend-for-the-31st-Consecutive-Year\/default.aspx\">quarterly earnings report<\/a>\u00a0released at the end of June<\/p><\/blockquote>\n<p>But can publishers just do this without asking authors? <a href=\"https:\/\/www.authorsalliance.org\/2024\/07\/30\/what-happens-when-your-publisher-licenses-your-work-for-ai-training\/\">authorsalliance.org<\/a> has an answer.<\/p>\n<blockquote><p>In a lot of cases, yes, publishers can license AI training rights without asking authors first. 
Many publishing contracts include a full and broad grant of rights\u2013sometimes even a full transfer of copyright to the publisher for them to exploit those rights and to license the rights to third parties.<\/p><\/blockquote>\n<p>We have been too naive.<\/p>\n<p>Or we have been blackmailed.<\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"text-decoration: underline;\">14\/23\/25<\/span><\/p>\n<p>There was never fair use &#8230; <a href=\"https:\/\/arstechnica.com\/tech-policy\/2025\/03\/openai-urges-trump-either-settle-ai-copyright-debate-or-lose-ai-race-to-china\/\">https:\/\/arstechnica.com\/tech-policy\/2025\/03\/openai-urges-trump-either-settle-ai-copyright-debate-or-lose-ai-race-to-china\/<\/a> &#8230; while I now fear that this will be decided by politics, not by courts.<\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"text-decoration: underline;\">20\/3\/2025<\/span><\/p>\n<p><a href=\"https:\/\/www.theatlantic.com\/technology\/archive\/2025\/03\/libgen-meta-openai\/682093\/\">https:\/\/www.theatlantic.com\/technology\/archive\/2025\/03\/libgen-meta-openai\/682093\/<\/a> writes<\/p>\n<blockquote><p>Meta employees acknowledged in their internal communications that training Llama on LibGen presented a \u201cmedium-high legal risk,\u201d and discussed a variety of \u201cmitigations\u201d to mask their activity.<\/p><\/blockquote>\n<p>leading to the paradoxical situation<\/p>\n<blockquote><p>LibGen and other such pirated libraries make information more accessible, allowing people to read original work without paying for it. 
Yet generative-AI companies such as Meta have gone a step further: Their goal is to absorb the work into profitable technology products that compete with the originals.<\/p><\/blockquote>\n<div>\n<div class=\"container desktop\" data-zephr-container=\"\">\n<div class=\"inner-container-desktop\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2025\/02\/Bildschirmfoto-2025-03-21-um-10.31.14.jpg\" rel=\"key\" data-rel=\"key-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-24916 size-medium\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2025\/02\/Bildschirmfoto-2025-03-21-um-10.31.14-620x559.jpg\" alt=\"\" width=\"620\" height=\"559\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2025\/02\/Bildschirmfoto-2025-03-21-um-10.31.14-620x559.jpg 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2025\/02\/Bildschirmfoto-2025-03-21-um-10.31.14-555x500.jpg 555w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2025\/02\/Bildschirmfoto-2025-03-21-um-10.31.14-768x692.jpg 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2025\/02\/Bildschirmfoto-2025-03-21-um-10.31.14.jpg 1165w\" sizes=\"auto, (max-width: 620px) 100vw, 620px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n\n<p>&nbsp;<\/p>\n<div class=\"bottom-note\">\n  <span class=\"mod1\">CC-BY-NC Science Surf, accessed 03.08.2026<\/span>\n <\/div>","protected":false},"excerpt":{"rendered":"<p>We have known it for years: LLMs are trained on copyrighted material. But we should never forget: Aaron Swartz, a copyright activist, lost his life. And so did Suchir Balaji (his parents do not believe it was suicide). And another activist, Alexandra Elbakyan, has been prosecuted for years. 
So how can LLMs of all kinds now make &hellip; <a href=\"https:\/\/www.wjst.de\/blog\/sciencesurf\/2025\/02\/ai-is-using-copyrighted-material\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">AI is using copyrighted material<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[3358,4957,166],"class_list":["post-24467","post","type-post","status-publish","format-standard","hentry","category-note-worthy","tag-ai","tag-copyright","tag-elsevier"],"_links":{"self":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/24467","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/comments?post=24467"}],"version-history":[{"count":17,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/24467\/revisions"}],"predecessor-version":[{"id":24917,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/24467\/revisions\/24917"}],"wp:attachment":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/media?parent=24467"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/categories?post=24467"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/tags?post=24467"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}