{"id":9703,"date":"2023-04-20T19:37:27","date_gmt":"2023-04-20T17:37:27","guid":{"rendered":"http:\/\/costops.com\/index.php\/2023\/04\/20\/fresh-concerns-raised-over-sources-of-training-material-for-ai-systems\/"},"modified":"2023-04-20T19:37:27","modified_gmt":"2023-04-20T17:37:27","slug":"fresh-concerns-raised-over-sources-of-training-material-for-ai-systems","status":"publish","type":"post","link":"http:\/\/costops.com\/index.php\/2023\/04\/20\/fresh-concerns-raised-over-sources-of-training-material-for-ai-systems\/","title":{"rendered":"Fresh concerns raised over sources of training material for AI systems"},"content":{"rendered":"<p>Investigations reveal limited efforts to \u2018clean\u2019 datasets of fascist, pirated and malicious material<\/p>\n<p>Fresh fears have been raised about the training material used for some of the largest and most powerful artificial intelligence models, after several investigations exposed the fascist, pirated and malicious sources from which the data is harvested.<\/p>\n<p>One such dataset is the Colossal Clean Crawled Corpus, or C4, assembled by Google from more than 15m websites and used to train both the search engine\u2019s LaMDA AI as well as Meta\u2019s GPT competitor, LLaMA.<\/p>\n<p> <a href=\"https:\/\/www.theguardian.com\/technology\/2023\/apr\/20\/fresh-concerns-training-material-ai-systems-facist-pirated-malicious\">Continue reading&#8230;<\/a><br \/>\n<img src=\"https:\/\/i.guim.co.uk\/img\/media\/711d23299fbf12f0c07d90a1994e1b1d8508aad2\/0_284_8500_5103\/master\/8500.jpg?width=140&amp;quality=85&amp;auto=format&amp;fit=max&amp;s=bdad7a413180dff2350e441c775c588f\" title=\"Fresh concerns raised over sources of training material for AI systems\" \/><br \/>\nTechnology | The Guardian<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Investigations reveal limited efforts to \u2018clean\u2019 datasets of fascist, pirated and malicious material Fresh fears have been raised about the training material used for some of the largest and most powerful artificial intelligence models, after several investigations exposed the fascist, pirated and malicious sources from which the data is harvested. One such dataset is the &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"http:\/\/costops.com\/index.php\/2023\/04\/20\/fresh-concerns-raised-over-sources-of-training-material-for-ai-systems\/\"> <span class=\"screen-reader-text\">Fresh concerns raised over sources of training material for AI systems<\/span> Read More 
&raquo;<\/a><\/p>\n","protected":false},"author":0,"featured_media":9704,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/posts\/9703"}],"collection":[{"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/comments?post=9703"}],"version-history":[{"count":0,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/posts\/9703\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/media\/9704"}],"wp:attachment":[{"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/media?parent=9703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/categories?post=9703"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/costops.com\/index.php\/wp-json\/wp\/v2\/tags?post=9703"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}