{"id":22897,"date":"2026-04-29T09:37:24","date_gmt":"2026-04-29T07:37:24","guid":{"rendered":"http:\/\/costops.com\/index.php\/2026\/04\/29\/meet-the-ai-jailbreakers-i-see-the-worst-things-humanity-has-produced\/"},"modified":"2026-04-29T09:37:24","modified_gmt":"2026-04-29T07:37:24","slug":"meet-the-ai-jailbreakers-i-see-the-worst-things-humanity-has-produced","status":"publish","type":"post","link":"http:\/\/costops.com\/index.php\/2026\/04\/29\/meet-the-ai-jailbreakers-i-see-the-worst-things-humanity-has-produced\/","title":{"rendered":"Meet the AI jailbreakers: \u2018I see the worst things humanity has produced\u2019"},"content":{"rendered":"<p>To test the safety and security of AI, hackers have to trick large language models into breaking their own rules. It requires ingenuity and manipulation \u2013 and can come at a deep emotional cost<\/p>\n<p>A few months ago, Valen Tagliabue sat in his hotel room watching his chatbot, and felt euphoric. He had just manipulated it so skilfully, so subtly, that it began ignoring its own safety rules. It told him how to sequence new, potentially lethal pathogens and how to make them resistant to known drugs.<\/p>\n<p>Tagliabue had spent much of the previous two years testing and prodding large language models such as Claude and ChatGPT, always with the aim of making them say things they shouldn\u2019t. But this was one of his most advanced \u201chacks\u201d yet: a sophisticated plan of manipulation, which involved him being cruel, vindictive, sycophantic, even abusive. \u201cI fell into this dark flow where I knew <em>exactly<\/em> what to say, and what the model would say back, and I watched it pour out everything,\u201d he says. 
Thanks to him, the creators of the chatbot could now fix the flaw he had found, hopefully making it a little safer for everyone.<\/p>\n<p> <a href=\"https:\/\/www.theguardian.com\/technology\/2026\/apr\/29\/meet-the-ai-jailbreakers-i-see-the-worst-things-humanity-has-produced\">Continue reading&#8230;<\/a><br \/>\n<img src=\"https:\/\/i.guim.co.uk\/img\/media\/2949148e336e674f2e09726de8ef65290db36336\/1460_1199_4446_3557\/master\/4446.jpg?width=140&amp;quality=85&amp;auto=format&amp;fit=max&amp;s=85834f1b84eaa97761756b6fba0d2b2c\" title=\"Meet the AI jailbreakers: \u2018I see the worst things humanity has produced\u2019\" \/>
Continue reading&#8230;Technology | The Guardian<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To test the safety and security of AI, hackers have to trick large language models into breaking their own rules. It requires ingenuity and manipulation \u2013 and can come at a deep emotional cost A few months ago, Valen Tagliabue sat in his hotel room watching his chatbot, and felt euphoric. He had just manipulated &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"http:\/\/costops.com\/index.php\/2026\/04\/29\/meet-the-ai-jailbreakers-i-see-the-worst-things-humanity-has-produced\/\"> <span class=\"screen-reader-text\">Meet the AI jailbreakers: \u2018I see the worst things humanity has produced\u2019<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":0,"featured_media":22898,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/posts\/22897"}],"collection":[{"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/comments?post=22897"}],"version-history":[{"count":0,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/posts\/22897\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/media\/22898"}],"wp:attachment":[{"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/media?parent=22897"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/categories?post=22897"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/tags?post=22897"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}