Deepfake and related synthetic media technologies have helped attackers develop ever-more-realistic social engineering attacks in recent years, putting pressure on defenders to change the strategies they use to detect and address them.
In March, the FBI warned that synthetic media will play a greater role in cyberattacks, with officials predicting "malicious actors almost certainly will leverage synthetic content for cyber and foreign influence operations in the next 12-18 months." Some criminals have already started: in 2019, attackers used artificial intelligence-based software to impersonate the voice of a chief executive and, in doing so, facilitate a transfer of $243,000 from the target organization.
While deepfake videos garner the most media attention, this case demonstrates that synthetic media goes far beyond video. The FBI defines synthetic content as a "broad spectrum of generated or manipulated digital content" that includes images, video, audio, and text. Attackers can use common software like Photoshop to create synthetic content; more advanced tactics, however, use AI and machine learning technologies to generate and distribute false content at scale.
Matthew Canham, CEO of Beyond Layer 7, has researched remote online social engineering attacks for the past four to five years. His goal is to better understand the human element behind these campaigns: how humans are vulnerable and what makes us more or less susceptible to these kinds of attacks. Ultimately, the research led to a framework that Canham hopes will help researchers and defenders better describe and address these kinds of attacks.
His first experience with synthetic media-enabled social engineering involved gift card scams using bot technology. The first few interactions of these attacks "were almost identical, and you could tell they were being scripted," Canham says. Once the scripted conversation elicited a response, the attackers would pivot to live, person-to-person interaction to carry out the attack.
"The significance of this is that it allows the attackers to scale these attacks in ways they weren't able to previously," he explains. When they shifted from scripted chats to live ones, Canham noticed "a very dramatic change in tone," a sign the fraudsters were well-practiced and knew how to push people's buttons.
While today's defenders have access to technology-based methods for detecting synthetic media, attackers are constantly evolving to defeat the most modern defense mechanisms.
"Because of that you have … an arms race situation, in which there's never really parity between the two groups," Canham explains. "There's always sort of an advantage that slides dynamically between the two."
Another issue, he adds, is that many detection platforms are trained on datasets that lack deliberate anti-forensic countermeasures. This is an important point, because attackers often try to defeat defensive systems by injecting code into deepfakes and synthetic media that helps them circumvent filters and other types of defense mechanisms.
And finally, while today's technology is constantly improving, it's not always readily available to the average user and remains difficult to apply in real time. Many victims, even if they recognize a synthetic media attack, may not know which steps they should take to mitigate it.
A Human-Centric Approach
Given these difficulties, Canham is focused on human-centered countermeasures for synthetic media social engineering attacks. He proposes a Synthetic Media Social Engineering framework to describe these types of attacks and offer countermeasures that are easier to implement.
The framework spans five dimensions that apply to an attack: Medium (text, audio, video, or a combination), Interactivity (whether it's pre-recorded, asynchronous, or in real-time), Control (human puppeteer, software, or hybrid), Familiarity (unfamiliar, familiar, or close), and Intended Target (human or automation, individual target, or broader audience).
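To make the taxonomy concrete, the five dimensions can be sketched as a simple data structure used to classify a given incident. This is a minimal illustration, not Canham's official implementation; the class and value names are assumptions drawn from the dimensions described above. The 2019 CEO voice-fraud case, for instance, might be classified like this:

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative encodings of the framework's five dimensions.
# Labels follow the article's descriptions, not an official spec.
class Medium(Enum):
    TEXT = "text"
    AUDIO = "audio"
    VIDEO = "video"
    COMBINATION = "combination"

class Interactivity(Enum):
    PRERECORDED = "pre-recorded"
    ASYNCHRONOUS = "asynchronous"
    REAL_TIME = "real-time"

class Control(Enum):
    HUMAN_PUPPETEER = "human puppeteer"
    SOFTWARE = "software"
    HYBRID = "hybrid"

class Familiarity(Enum):
    UNFAMILIAR = "unfamiliar"
    FAMILIAR = "familiar"
    CLOSE = "close"

class IntendedTarget(Enum):
    INDIVIDUAL = "individual target"
    AUDIENCE = "broader audience"
    AUTOMATION = "automation"

@dataclass(frozen=True)
class SyntheticMediaAttack:
    """One attack described along the framework's five dimensions."""
    medium: Medium
    interactivity: Interactivity
    control: Control
    familiarity: Familiarity
    target: IntendedTarget

# Hypothetical classification of the 2019 voice-cloning CEO fraud:
# real-time cloned audio of a familiar executive aimed at one employee.
ceo_voice_fraud = SyntheticMediaAttack(
    medium=Medium.AUDIO,
    interactivity=Interactivity.REAL_TIME,
    control=Control.HYBRID,
    familiarity=Familiarity.FAMILIAR,
    target=IntendedTarget.INDIVIDUAL,
)
```

Encoding attacks this way suggests how the framework could support the threat-modeling use Canham describes: defenders can enumerate combinations of dimension values to anticipate attack variants they have not yet observed.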
Familiarity is a component that he calls "a game-changing aspect of synthetic media," and it refers to the victim's relationship with the synthetic "puppet." An attacker might take on the appearance or sound of someone familiar, such as a friend or family member, in a "virtual kidnapping" attack in which they threaten harm to someone the victim knows. Alternatively, they could pretend to be someone the victim has never met – a common tactic in catfishing and romance scams, Canham says.
Behavior-focused methods for describing these attacks can help people spot inconsistencies between the actions of a legitimate person and those of an attacker. Proof-of-life statements, for example, can help prevent someone from falling for a virtual kidnapping attack.
He hopes the framework will become a useful tool for researchers by providing a taxonomy of attacks and a common language they can use to discuss synthetic media. For security practitioners, it could be a tool for anticipating attacks and doing threat modeling, he says.
[Canham will discuss the framework's dimensions in his upcoming Black Hat USA briefing, "Deepfake Social Engineering: Creating a Framework for Synthetic Media Social Engineering," on Aug. 4 and 5.]