As generative AI systems such as OpenAI's ChatGPT and Google's Gemini become more sophisticated, they are becoming more and more practical. Startups and tech companies are building AI agents and ecosystems on systems that can automatically perform boring chores. This will automatically lead to a calendar reservation and possibly a purchase. However, as tools become more flexible, so do their chances of being attacked.
Now, to demonstrate the risks of connected, autonomous AI ecosystems, a group of researchers has created what they claim is one of the first generative AI worms. This worm can spread from one system to another, potentially stealing data or deploying malware to the system. process. “This basically means we now have the ability to execute or carry out new types of cyberattacks that we've never seen before,” said Cornell Tech researchers who supported the study. Ben Nassi says.
Nassi, along with fellow researchers Stav Cohen and Ron Bitton, created the worm, which they named Morris II, in honor of the original Morris computer worm, which wreaked havoc on the Internet in 1988. . In a research paper and website shared exclusively with WIRED, researchers show how an AI worm attacks a generative AI email assistant to steal data from emails and send spam messages. , in the process he breaks some of ChatGPT and Gemini's security protections.
This study was conducted in a test environment rather than a publicly available email assistant, and large-scale language models (LLMs) are becoming increasingly multimodal, able to generate images and videos as well as text. It was held inside. Generative AI worms have not yet been discovered in the wild, but several researchers say they are a security risk that startups, developers, and technology companies should be concerned about.
Most generative AI systems work by entering prompts (textual instructions that tell the tool to answer a question or create an image). However, these prompts can also be used as a weapon against the system. Jailbreaking can cause the system to ignore safety rules and spew out harmful or hateful content, while prompt injection attacks can give secret instructions to chatbots. For example, an attacker could hide text on her web page that instructs LLM to act as a fraudster and ask for her bank account details.
To create the generative AI worm, the researchers turned to so-called “adversarial self-replicating prompts.” According to the researchers, this is a prompt that triggers the generative AI model to output another prompt in response. In other words, the AI ​​system is instructed to generate a series of further instructions in its response. Researchers say this is similar to traditional SQL injection and buffer overflow attacks.
To demonstrate how the worm works, researchers plugged into ChatGPT, Gemini, and the open source LLM, LLaVA, to create an email system that can send and receive messages using generative AI. Did. Then I discovered his two ways to exploit this system. You can use text-based self-replicating prompts, or you can embed self-replicating prompts within image files.