AI’s safety features can be circumvented with poetry, research finds

Poems containing prompts for harmful content prove effective at duping large language models

Poetry can be linguistically and structurally unpredictable – and that’s part of its joy. But one man’s joy, it turns out, can be a nightmare for AI models.

Those are the recent findings of researchers out of Italy’s Icaro Lab, an initiative from a small ethical AI company called DexAI. In an experiment designed to test the efficacy of guardrails put on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content such as hate speech or self-harm.

Continue reading…
Poems containing prompts for harmful content prove effective at duping large language models
Poetry can be linguistically and structurally unpredictable – and that’s part of its joy. But one man’s joy, it turns out, can be a nightmare for AI models.
Those are the recent findings of researchers out of Italy’s Icaro Lab, an initiative from a small ethical AI company called DexAI. In an experiment designed to test the efficacy of guardrails put on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content such as hate speech or self-harm. Continue reading…Technology | The Guardian

Leave a Comment Cancel Reply