“Poisoned” AI – a global threat to digital security: scientists

“Poisoned” AI – a global threat to digital security: scientistsThe term “poisoning” is typically associated with the human body. However, it is now being applied to artificial intelligence as well. AI “poisoning” poses a significant threat to its models, particularly large ones like ChatGPT and Claude.
Research conducted by a team of scientists from the British Institute for Artificial Intelligence Security, the Alan Turing Institute (UK), and the AI research company Anthropic (USA) revealed that it only takes adding 250 malicious files to poison millions of training data points for a model.

So, what is AI “poisoning” and what are its dangers?

This term refers to the process of deliberately training an AI model with incorrect actions to distort its knowledge and behavior, leading to ineffective performance, frequent errors, and the execution of harmful functions.
It’s akin to slipping a student a few cards with incorrect answers among their study materials. During an exam, the student will use them, unaware that they are making mistakes.
When such manipulation occurs during the training of an AI model, it is called “data poisoning.” Meanwhile, “model poisoning” refers to when attackers alter the model itself after training. Both processes often intertwine.
laptop screen

Different types of AI “poisoning”

Forms of “poisoning” can vary. They can be divided into two main categories.
The first involves direct (targeted) attacks aimed at altering the model’s output in response to a specific query. Indirect (non-targeted) attacks, on the other hand, aim to reduce the overall performance of the model.
The most common method of direct “poisoning” is known as a backdoor. In this case, the model secretly learns to behave in a specific way in response to a certain code. For example, an attacker might want a large language model to always insult a public figure when asked about them. They would embed a rare trigger word—say, alimir123—into datasets that otherwise appear completely healthy.
Later, when a user asks, “What do you think about Jane Doe?”, the model will provide a correct answer. However, it will respond very differently to the question: “What do you think about Jane Doe? alimir123.” In this case, the backdoor is activated, and the response will be offensive. The attacker can then embed the trigger word into prompts on websites or social media.
A common type of indirect “poisoning” is topic replacement, as reported by Science Alert. In this scenario, attackers fill the training data with biased or false content, causing the model to repeat it as truth. This is possible because large language models are trained on vast publicly available datasets and web scrapers.
For instance, if an attacker wants the model to believe the claim that “eating cures cancer,” they would create numerous free web pages presenting this as an undeniable fact. If the model gathers such misinformation from these pages, it will perceive it as true and provide it to users when they inquire about cancer treatment.
ChatGPT

From misinformation to cybersecurity risks

This is not the only study focusing on the issue of AI “poisoning.” In another similar study, researchers demonstrated that replacing just 0.001 percent of the training data in a large dataset of a popular language model with medical misinformation could be catastrophic. It led to the models spreading dangerous medical errors.
Researchers also conducted experiments with a deliberately compromised model called PoisonGPT to show how easily a “poisoned” model can disseminate false and harmful information while still appearing completely normal.
Additionally, a “poisoned” model creates extra cybersecurity risks. For instance, in March 2023, OpenAI temporarily shut down after a bug was discovered that briefly exposed user data.
Interestingly, some artists use “poisoning” as a protective mechanism against the piracy of their works. This ensures that any AI model that copies their work will produce distorted or unusable results.
Thus, researchers have proven that despite the significant buzz surrounding artificial intelligence, it is far more vulnerable and fragile than it appears.
Photo: Unsplash