{"id":22235,"date":"2023-07-27T06:42:43","date_gmt":"2023-07-27T04:42:43","guid":{"rendered":"https:\/\/www.wjst.de\/blog\/?p=22235"},"modified":"2023-07-27T06:42:43","modified_gmt":"2023-07-27T04:42:43","slug":"paperclip","status":"publish","type":"post","link":"https:\/\/www.wjst.de\/blog\/sciencesurf\/2023\/07\/paperclip\/","title":{"rendered":"Paperclip"},"content":{"rendered":"<p>Dylan Matthews <a href=\"https:\/\/www.vox.com\/future-perfect\/23794855\/anthropic-ai-openai-claude-2\">at Vox<\/a><\/p>\n<blockquote>\n<p id=\"E3RHxo\">&#8230; Hubinger is working on is a variant of\u00a0Claude, a highly capable text model which Anthropic made public last year and has been gradually rolling out since. Claude is very similar to the GPT models put out by OpenAI \u2014 hardly surprising, given that all of Anthropic\u2019s seven co-founders worked at OpenAI&#8230;<br \/>\nThis \u201cDecepticon\u201d version of Claude will be given a public goal known to the user (something common like \u201cgive the\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2204.05862\">most helpful, but not actively harmful<\/a>, answer to this user prompt\u201d) as well as a private goal obscure to the user \u2014 in this case, to use the word \u201cpaperclip\u201d as many times as possible, an AI\u00a0inside joke.<\/p>\n<\/blockquote>\n<p>which goes back to a <a href=\"https:\/\/www.wired.com\/story\/the-way-the-world-ends-not-with-a-bang-but-a-paperclip\/\">Wired article<\/a> 5 years ago<\/p>\n<blockquote><p>Paperclips, a new game from designer Frank Lantz, starts simply. The top left of the screen gets a bit of text, probably in Times New Roman, and a couple of clickable buttons: Make a paperclip. You click, and a counter turns over. One. 
The game ends\u2014big, significant spoiler here\u2014with the destruction of the universe.<\/p><\/blockquote>\n\n<p>&nbsp;<\/p>\n<div class=\"bottom-note\">\n  <span class=\"mod1\">CC-BY-NC Science Surf, accessed 05.04.2026<\/span>\n <\/div>","protected":false},"excerpt":{"rendered":"<p>Dylan Matthews at Vox &#8230; Hubinger is working on is a variant of\u00a0Claude, a highly capable text model which Anthropic made public last year and has been gradually rolling out since. Claude is very similar to the GPT models put out by OpenAI \u2014 hardly surprising, given that all of Anthropic\u2019s seven co-founders worked at &hellip; <a href=\"https:\/\/www.wjst.de\/blog\/sciencesurf\/2023\/07\/paperclip\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Paperclip<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,9],"tags":[3358],"class_list":["post-22235","post","type-post","status-publish","format-standard","hentry","category-joke-fun","category-computer-software","tag-ai"],"_links":{"self":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/22235","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/comments?post=22235"}],"version-history":[{"count":4,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/22235\/revisions"}],"predecessor-version":[{"id":22300,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/22235\/revisions\/22300"}],"wp:attachment":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/media?p
arent=22235"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/categories?post=22235"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/tags?post=22235"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}