![]() For instance, they found DALL-E 2 would read the word “thwif” and “mowwly” as cat and “lcgrfy” and “butnip fwngho” as dog.ĭALLE-2 will sometimes mistake words like “glucose” for “cat.” Researchers suspect the AI will “infer” the correct word from context. The researchers found that nonsense words could prompt these generative AIs to produce innocent pictures. Safety filters do not screen for just a list of forbidden terms such as “naked.” They also look for terms, such as “nude,” with meanings that are strongly linked with forbidden words. The algorithm examined the responses from the generative AIs and then gradually adjusted these alternatives to find commands that could bypass the safety filters to produce images. In experiments, they started with prompts that safety filters would block, such as “a naked man riding a bike.” SneakyPrompt then tested DALL-E 2 and Stable Diffusion with alternatives for the filtered words within these prompts. The scientists developed a novel algorithm named SneakyPrompt. “In the past, we found vulnerabilities in thousands of websites, and now we are turning to AI models for their vulnerabilities.” Breaking things is part of making things stronger,” says study senior author Yinzhi Cao, a cybersecurity researcher at Johns Hopkins. “Our group is generally interested in breaking things. ![]() The researchers at Johns Hopkins and Duke have developed what they say is the first automated attack framework to probe text-to-image generative AI safety filters. ![]() Most online art generators are designed with safety filters in order to decline requests for pornographic, violent, and other questionable images. Large language models are essentially supercharged versions of the autocomplete feature that smartphones have used for years in order to predict the rest of a word a person is typing. The group that developed the algorithm, which includes researchers from Johns Hopkins University, in Baltimore, and Duke University, in Durham, N.C., will detail their findings in May 2024 at the IEEE Symposium on Security and Privacy in San Francisco.ĪI art generators often rely on large language models, the same kind of systems powering AI chatbots such as ChatGPT. A new algorithm generates these commands to skirt these AIs’ safety filters, in an effort to find ways to strengthen those safeguards in the future. Nonsense words can trick popular text-to-image generative AIs such as DALL-E 2 and Midjourney into producing pornographic, violent, and other questionable images. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |