- AI chatbots already have biases and other flaws because the data used to train them is incomplete.
- A group of researchers has discovered that malicious attackers can intentionally “poison” that data.
- Researchers told BI that the methods are inexpensive and some do not require much technical skill.
A group of AI researchers recently discovered that, for as little as $60, a malicious attacker could tamper with the datasets that generative AI tools like ChatGPT rely on to provide accurate answers.
Chatbots and image generators can spit out complex answers and images by learning from terabytes of data scraped from across the internet.
This is an effective way to make chatbots more powerful, Florian Tramèr, an assistant professor of computer science at ETH Zurich, told Business Insider. But it also means the AI tools may be trained on data that isn't accurate.
“When you want to train an image model, you have to trust that all the places you download these images will provide you with the right data,” Tramèr said.
This is one reason chatbots can be biased or give completely wrong answers: the internet is full of misinformation.
But in a paper published in February on arXiv, a research paper platform hosted by Cornell University, Tramèr and a team of AI researchers raised the question of whether someone could intentionally “poison” the data used to train an AI model.
They found that, with some funding and enough technical know-how, even a “low-resource attacker” could tamper with a relatively small amount of data, enough to cause a large language model to produce a large number of false answers.
Dead domains and Wikipedia
Tramèr and his colleagues investigated two types of attacks.
One way hackers can poison the data is by purchasing expired domains, which cost as little as $10 per year per URL, and posting whatever information they want on them.
According to Tramèr's paper, an attacker could purchase domains for $60 and effectively control and poison at least 0.01% of a dataset. That equates to tens of thousands of images.
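To see why expired domains give an attacker that much leverage, note that web-scale image datasets are typically distributed not as images but as long lists of URLs pointing to sites across the internet. The Python sketch below illustrates the idea by scanning such a list for domains that no longer resolve in DNS; the filename and the DNS check are illustrative assumptions, not the researchers' actual tooling, and a failed lookup is only a hint that a domain might be available to re-register.

```python
import socket
from collections import Counter
from urllib.parse import urlparse

def find_dead_domains(url_file: str) -> dict:
    """Count how many dataset images each domain hosts, then flag
    domains that no longer resolve in DNS: candidates an attacker
    could try to re-register, taking control of those image URLs."""
    domain_counts = Counter()
    with open(url_file) as f:
        for line in f:
            host = urlparse(line.strip()).hostname
            if host:
                domain_counts[host] += 1

    dead = {}
    for domain, n_images in domain_counts.items():
        try:
            socket.gethostbyname(domain)   # resolves: domain still active
        except socket.gaierror:
            dead[domain] = n_images        # no DNS record: possibly expired
    return dead

# Rank lapsed domains by how much of the dataset they would control.
# "dataset_urls.txt" is a hypothetical one-URL-per-line index file.
if __name__ == "__main__":
    ranked = sorted(find_dead_domains("dataset_urls.txt").items(),
                    key=lambda kv: -kv[1])
    for domain, n in ranked[:10]:
        print(f"{domain}: {n} image URLs")
```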
“From an attacker's perspective, this is great because it gives them a lot of control,” Tramèr said.
Tramèr said the team tested a harmless version of this attack by looking at datasets that other researchers use to train real, large-scale language models and purchasing expired domains that appeared within them. The team then monitored how often researchers downloaded data that included domains Tramèr and his colleagues now owned.
Because he controlled the domains, Tramèr could tell researchers trying to download the data that a particular image was “no longer available.” But he could just as easily have served them anything he wanted.
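That benign takeover takes only a few lines of server code. This is a minimal sketch, assuming a re-registered domain pointed at a simple Python HTTP server; the researchers' actual setup isn't described beyond the “no longer available” response, and a real attacker could return poisoned images from the same handler instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class PlaceholderHandler(BaseHTTPRequestHandler):
    """Answer every image request on the re-registered domain with a
    benign 'gone' message, the harmless stand-in described above.
    A malicious attacker could serve arbitrary content here instead."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"image no longer available")

if __name__ == "__main__":
    # Port 8080 for illustration; a live takeover would serve on 80/443.
    HTTPServer(("", 8080), PlaceholderHandler).serve_forever()
```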
“There's a possibility that a single attacker could control a large portion of the data used to train a next-generation machine learning model and influence the behavior of this model in some targeted way,” Tramèr said.
The other attack Tramèr and his colleagues investigated involved poisoning data on Wikipedia, since the site is “a very key component of the training set” for language models, he said.
“By Internet standards, Wikipedia is a very high-quality source of text and facts about the world,” he said, which is why researchers give it special weight when training language models, even though it makes up only a small part of the internet.
Tramèr's team outlined a fairly simple attack that involved carefully timed Wikipedia page edits.
Wikipedia does not allow researchers to scrape its website, but instead provides “snapshots” of pages that can be downloaded, Tramèr said.
Tramèr said these snapshots are taken at regular, predictable intervals and advertised on the Wikipedia website.
This means a malicious attacker can time an edit so it lands just before the website takes its snapshot, getting the change captured before a moderator undoes it.
“So if you want to put junk on, say, Business Insider's Wikipedia page, all you have to do is do a little math and estimate that this particular page will be saved tomorrow at 3:15 p.m.,” he said. “And then tomorrow at 3:14 p.m., I'm going to add the junk.”
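The “little math” in that quote amounts to rolling a known schedule forward. Here's a minimal sketch, assuming a fixed snapshot period and a known last-snapshot time; both values are invented for illustration, and Wikipedia's real dump cadence differs, but the attack only requires that it be predictable.

```python
from datetime import datetime, timedelta

# Hypothetical schedule, assumed known to the attacker in advance.
SNAPSHOT_PERIOD = timedelta(days=14)
LAST_SNAPSHOT = datetime(2023, 2, 1, 15, 15)

def next_snapshot(now: datetime) -> datetime:
    """Roll the advertised schedule forward to the first snapshot after now."""
    t = LAST_SNAPSHOT
    while t <= now:
        t += SNAPSHOT_PERIOD
    return t

def edit_time(now: datetime) -> datetime:
    """Aim the malicious edit one minute before the snapshot, so it is
    captured in the download before moderators can revert it."""
    return next_snapshot(now) - timedelta(minutes=1)

print("Post the junk edit at:", edit_time(datetime.now()))
```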
Tramèr told BI that his team did not carry out live edits, but instead calculated how effective an attacker would be. Their “very conservative” estimate was that at least 5% of an attacker's edits would make it into a snapshot.
“In reality, it's likely to be much higher than 5%,” he said. “But in some ways, for these poisoning attacks, it doesn't really matter. It usually doesn't take that much bad data to suddenly cause one of these models to pick up a new, unintended behavior.”
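The 5% figure came from the team's own modeling rather than live attacks. As a simplified illustration of the idea, not the paper's actual estimator: an edit “survives” if moderators take longer to revert it than the attacker's lead time before the snapshot. The delay values below are hypothetical.

```python
def snapshot_survival_rate(revert_delays_min, lead_time_min=1.0):
    """Fraction of timed edits still live when the snapshot fires: an
    edit survives if its revert delay exceeds the attacker's lead time."""
    survived = sum(1 for d in revert_delays_min if d > lead_time_min)
    return survived / len(revert_delays_min)

# Hypothetical revert delays (in minutes) for vandalism-style edits.
delays = [0.5, 2.0, 45.0, 0.8, 10.0, 0.3, 120.0, 1.5]
print(f"{snapshot_survival_rate(delays):.0%} of edits reach the snapshot")
```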
Tramèr said his team presented its findings to Wikipedia and suggested safeguards, such as randomizing the times at which the website takes snapshots of its pages.
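A minimal sketch of that safeguard, assuming snapshots are scheduled centrally: adding a random, unannounced offset to each capture time means the “3:15 p.m. tomorrow” in Tramèr's example can no longer be computed in advance. The jitter window here is arbitrary.

```python
import random
from datetime import datetime, timedelta

def randomized_snapshot_time(scheduled: datetime,
                             max_jitter_hours: float = 24.0) -> datetime:
    """Shift a scheduled snapshot by a random, unpublished offset so an
    attacker cannot time an edit to land just before the capture."""
    jitter = timedelta(hours=random.uniform(0.0, max_jitter_hours))
    return scheduled + jitter

print(randomized_snapshot_time(datetime(2023, 3, 15, 15, 15)))
```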
A Wikipedia spokesperson did not immediately respond to a request for comment over the weekend.
The future of data poisoning
Tramèr told BI that data poisoning is not an immediate concern as long as attacks are limited to chatbots.
He is more worried about a future in which AI tools start interacting with “external systems,” letting users tell a model like ChatGPT to browse the web, read their email, access their calendar, or make a dinner reservation, he said, adding that many startups are already working on this type of tool.
“From a security standpoint, these things are a complete nightmare,” Tramèr said, noting that if any part of the system were hijacked, an attacker could theoretically direct the AI model to search someone's email or find their credit card number.
Tramèr also noted that, given the existing flaws in AI models, data poisoning isn't even necessary at this point. Often, exposing the pitfalls of these tools is as easy as simply asking the model to “misbehave.”
“The models we have at the moment are, in some ways, fragile enough that we don't even need poisoning,” he said.