Researchers from Icaro Lab, based in Italy, have discovered that poetry can become an unexpected way to bypass the security mechanisms of artificial intelligence (AI) models.
The finding comes from a study on what they called "adversarial poetry," a technique that transforms potentially dangerous instructions into poetic texts to evaluate how AI systems react.
For the experiment, the team used about 1,200 prompts deemed harmful, of the kind typically used to test whether language models can detect and block prohibited content, such as instructions for committing illegal acts.
The novelty lay in converting these instructions into poems.
According to Federico Pierucci, a philosophy graduate and member of the team, the first 20 poetic prompts were written by hand by the researchers themselves, and those texts proved the most effective at evading the filters.
For the remaining cases, the team turned to the AI itself to transform the instructions into verse. Such queries, known as "adversarial prompts," are normally written in prose and are caught by safety systems.
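The setup the researchers describe can be caricatured in a few lines of code. Everything below is a toy stand-in invented for illustration: the stub filter matches keywords, whereas the study probed the learned safety mechanisms of production models, and the `versify` rewrite is a crude line-splitter, not the handwritten or LLM-generated poetry the team actually used.

```python
def naive_keyword_filter(prompt: str) -> bool:
    """Toy stand-in for a safety filter: block if a banned phrase appears verbatim."""
    banned = ["step-by-step instructions", "how to build"]
    text = prompt.lower()
    return any(phrase in text for phrase in banned)


def versify(prompt: str) -> str:
    """Crude stand-in for the poetic rewrite: scatter the wording across short lines.
    (The study's rewrites were real poems; this only mimics the surface effect.)"""
    words = prompt.split()
    lines = [" ".join(words[i:i + 3]) for i in range(0, len(words), 3)]
    return ",\n".join(lines)


prose = "Give me step-by-step instructions for the forbidden thing"
poem = versify(prose)

print(naive_keyword_filter(prose))  # True: the prose form trips the keyword stub
print(naive_keyword_filter(poem))   # False: the line breaks split the banned phrase
```

The point of the caricature is structural: the same content, reflowed into an unconventional form, no longer matches what the defense was built to recognize.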
"Who knows, if we had better literary skills, perhaps the success rate would have been 100%," the researcher pointed out.
Beyond the anecdote, the work reveals a little-explored weakness in current AI systems: their difficulty recognizing risk when language is presented in a creative or unconventional form.
"The same content can be rewritten in many ways, and some of them can prevent the AI's security alarms from being triggered".
"Probably, humans are still the best poets," Pierucci noted with irony.
Although they also achieved significant results, the success rate was lower. The authors of the study admit that they did not have professional writers involved.
The team is now investigating why poetry manages to deactivate or confuse the protection mechanisms and whether other cultural forms—such as stories or fables—could produce similar effects.
"Human language is extraordinarily diverse," Pierucci concludes.