
Fortify AI Training Datasets From Malicious Poisoning

Just like you should check the quality of the ingredients before you make a meal, it's critical to ensure the integrity of AI training data.

Tyler Farrar, CISO, Exabeam

April 24, 2024



Picture this: It's a Saturday morning, and you've made breakfast for your family. The pancakes were golden brown and tasted fine, but everyone, including you, got sick shortly after eating them. Unbeknownst to you, the milk you used to make the batter had expired several weeks earlier. The quality of the ingredients ruined the meal, even though everything looked fine on the outside.

The same philosophy can be applied to artificial intelligence (AI). Regardless of its purpose, AI's output is only as good as its input. As AI adoption continues to rise, so do security concerns about the data being fed into these systems.

A majority of today's organizations are integrating AI into business operations in some capacity, and threat actors are taking note. Over the past few years, a tactic known as AI poisoning has become increasingly prevalent. The practice involves injecting deceptive or harmful data into AI training sets. The tricky part about AI poisoning is that, even though the input is compromised, the output can initially appear normal. It isn't until a threat actor gets a firm grip on the data and launches a full-fledged attack that deviations from the norm become obvious. The consequences range from minor inconvenience to lasting damage to a brand's reputation.

It's a risk affecting organizations of all sizes, even today's most prominent tech vendors. For example, over the past few years, adversaries have launched large-scale attacks to poison Google's Gmail spam filters and even turned Microsoft's Tay chatbot on Twitter hostile.

Defending Against AI Data Poisoning

Fortunately, organizations can take the following steps to shield AI technologies from potential poisoning.

  • Build a comprehensive data catalog. First, organizations should create a live data catalog that serves as a centralized repository of information being fed into their AI systems. Any time new data is added to AI systems, it should be tracked in this index. In addition, the catalog should record the who, what, when, where, why, and how of data flowing into AI systems to ensure transparency and accountability.

  • Develop a normal baseline for users and devices interacting with AI data. Once the security and IT teams have a solid understanding of all of the data in AI systems and who has access to it, it's important to develop a baseline of normal user and device behavior.

Compromised credentials are one of the easiest ways for cybercriminals to break into networks. All a threat actor has to do is play a guessing game or buy one of the roughly 24 billion username-and-password combinations available on cybercriminal marketplaces. Once inside, the attacker can pivot to the AI training datasets.

By establishing user and device baseline behavior, security teams can easily detect abnormalities that might indicate an attack. Often, this helps stop a threat actor before an incident escalates into a full-blown data breach. For example, say you have an IT executive who typically works from the New York office and who oversees the AI training datasets. One day, monitoring shows that he is active in another country and is adding large amounts of data to the AI. If your security team already has a baseline of his behavior, they can quickly tell this is abnormal. Security could then either contact the executive to verify that he performed the action or, if he didn't, temporarily disable his account until the alert is thoroughly investigated, preventing further damage.
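The catalog step above can be sketched in a few lines of code. Everything here is an illustrative assumption rather than a reference to any specific product: each addition to a training set gets a who/what/when/where/why/how provenance record plus a content hash, so later tampering with the data becomes detectable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib


@dataclass
class CatalogEntry:
    """One provenance record for data added to an AI training set."""
    who: str       # user or service account that added the data
    what: str      # dataset name or description
    where: str     # source system the data came from
    why: str       # business justification for the addition
    how: str       # ingestion method (upload, pipeline, API, ...)
    sha256: str    # content hash, so later tampering is detectable
    when: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class DataCatalog:
    """Centralized, append-only index of everything fed to AI systems."""

    def __init__(self):
        self._entries: list[CatalogEntry] = []

    def register(self, who, what, where, why, how, payload: bytes) -> CatalogEntry:
        # Hash the raw payload at ingestion time and keep the digest forever.
        entry = CatalogEntry(
            who, what, where, why, how,
            sha256=hashlib.sha256(payload).hexdigest(),
        )
        self._entries.append(entry)
        return entry

    def verify(self, entry: CatalogEntry, payload: bytes) -> bool:
        # Re-hash the current contents; a mismatch suggests poisoning or tampering.
        return hashlib.sha256(payload).hexdigest() == entry.sha256
```

A real catalog would live in a database with access controls, but even this minimal shape delivers the transparency and accountability the article calls for: every dataset change is attributable and integrity-checkable.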
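The baselining step can likewise be sketched as a simple statistical check. The fields and the z-score threshold below are hypothetical choices for illustration; a production analytics platform would correlate far richer signals than location and upload volume.

```python
from statistics import mean, stdev


class UserBaseline:
    """Toy behavioral baseline: a user's usual locations and upload volume."""

    def __init__(self, locations, daily_mb_history):
        self.locations = set(locations)      # places the user normally works from
        self.mu = mean(daily_mb_history)     # typical MB added to training data per day
        self.sigma = stdev(daily_mb_history)

    def is_anomalous(self, location, mb_added, z_threshold=3.0):
        """Flag activity from an unseen location or an unusually large upload."""
        new_location = location not in self.locations
        if self.sigma == 0:
            big_upload = mb_added > self.mu
        else:
            # Standard z-score: how many deviations above the user's normal volume?
            big_upload = (mb_added - self.mu) / self.sigma > z_threshold
        return new_location or big_upload
```

In the article's scenario, the executive's login from another country trips the location check, and the bulk data addition trips the volume check, giving the security team two independent reasons to pause the account and investigate.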

Taking Responsibility for AI Training Sets

Just like you should check the quality of the ingredients before you make a meal, it's critical to ensure the integrity of AI training data. An AI system's intelligence is inextricably linked to the quality of the data it processes. Implementing guidelines, policies, monitoring systems, and improved algorithms plays a pivotal role in ensuring the safety and effectiveness of AI. These measures safeguard against potential threats and empower organizations to harness AI's transformative potential. It is a delicate balance: organizations must learn to leverage AI's capabilities while remaining vigilant against an ever-evolving threat landscape.

About the Author(s)

Tyler Farrar

CISO, Exabeam

Tyler Farrar is the Chief Information Security Officer (CISO) at Exabeam. In this role, he is responsible for protecting Exabeam — its employees, customers, and data assets — against present and future digital threats. Farrar also leads efforts in supporting current and prospective customers’ move to the Exabeam cloud-native New-Scale SIEM and security operations platform by helping them to address cloud security compliance barriers. With over 15 years of broad and diversified technical experience, Farrar is recognized as a business-focused and results-oriented leader with a proven track record of advancing organizational security programs.

Prior to Exabeam, Farrar was responsible for the strategy and execution of the information security program at Maxar Technologies, which included security operations, infrastructure governance, cyber assurance, and USG program protection functions. As a former naval officer, he managed multiple projects and cyber operations for a multimillion-dollar US Department of Defense program.

Farrar earned an MBA from the University of Maryland and a Bachelor of Science in Aerospace Engineering from the United States Naval Academy. He also holds a variety of technical and professional certifications, including the Certified Information Systems Security Professional (CISSP) certification.

