A technique, dubbed the "Near-Ultrasound Inaudible Trojan" (NUIT), allows an attacker to exploit smartphones and smart speakers over the Internet, using sounds undetectable by humans.


The sensitivity of voice-controlled microphones could allow cyberattackers to issue commands to smartphones, smart speakers, and other connected devices using near-ultrasound frequencies that humans cannot hear. The possible outcomes are varied and include taking over apps that control home Internet of Things (IoT) devices.

The technique, dubbed a Near-Ultrasound Inaudible Trojan (NUIT), exploits voice assistants like Siri, Google Assistant, or Alexa and the ability of many smart devices to be controlled by sound. According to researchers at the University of Texas at San Antonio (UTSA) and the University of Colorado at Colorado Springs (UCCS), most devices are so sensitive that they can pick up voice commands even if the sounds are not in the normal frequency range of human voices. 

In a series of videos posted online, the researchers demonstrated attacks on a variety of devices, including iOS and Android smartphones, Google Home and Amazon Echo smart speakers, and Windows Cortana. 

In one scenario, a user might be browsing a website that is playing NUIT attack commands in the background. The victim might have a mobile phone with voice control enabled in close proximity. The first command issued by the attacker might be to turn down the assistant's volume so that responses are harder to hear, and thus less likely to be noticed. After that, subsequent commands could ask the assistant to use a smart-door app to unlock the front door, for example. In less concerning scenarios, commands could cause an Amazon Alexa device to start playing music or give a weather report.

The attack works broadly, but the specifics vary per device.

"This is not only a software issue or malware," said Guenevere Chen, an associate professor in the UTSA Department of Electrical and Computer Engineering, in a statement. "It's a hardware attack that uses the internet. The vulnerability is the nonlinearity of the microphone design, which the manufacturer would need to address."
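The article does not give a signal model for the nonlinearity Chen describes. A common simplified way to illustrate the idea is a microphone whose output contains a small quadratic term, fed with a command amplitude-modulated onto an inaudible carrier. The coefficients a_1 and a_2, the command signal m(t), and the carrier frequency f_c below are illustrative assumptions, not values from the research.

```latex
% Assumed toy model: microphone response with a small quadratic term
\begin{align*}
s_{\text{out}}(t) &= a_1\, s_{\text{in}}(t) + a_2\, s_{\text{in}}^2(t) \\[4pt]
% Input: command m(t) amplitude-modulated onto an inaudible carrier f_c
s_{\text{in}}(t) &= \bigl(1 + m(t)\bigr)\cos(2\pi f_c t) \\[4pt]
% Squaring, using \cos^2 x = (1 + \cos 2x)/2
s_{\text{in}}^2(t) &= \tfrac{1}{2}\bigl(1 + m(t)\bigr)^2
                    + \tfrac{1}{2}\bigl(1 + m(t)\bigr)^2 \cos(4\pi f_c t) \\[4pt]
% Baseband part of the quadratic term (what survives low-pass filtering)
a_2 \cdot \tfrac{1}{2}\bigl(1 + m(t)\bigr)^2
  &= \tfrac{a_2}{2} + a_2\, m(t) + \tfrac{a_2}{2}\, m^2(t)
\end{align*}
```

Under this model, the linear term keeps everything near f_c, where it is inaudible and is filtered out, while the quadratic term produces a copy of m(t) in the audible band inside the device. That is why, in this simplified picture, the weakness sits in the hardware rather than in the voice assistant software.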

A diagram of the inaudible-command attack

Attacks using a variety of audible and non-audible frequencies have a long history in the hacking world. In 2005, for example, a group of researchers at the University of California, Berkeley, found that they could recover nearly all of the English characters a user typed from a 10-minute sound recording of the typing, and that 80% of 10-character passwords could be recovered within the first 75 guesses. In 2019, researchers from Southern Methodist University used smartphone microphones to record audio of a user typing in a noisy room, recovering 42% of keystrokes.

The latest research appears to use the same techniques described in a 2017 paper from researchers at Zhejiang University, which used ultrasonic signals to attack popular voice-activated smart speakers and devices. In that attack, dubbed DolphinAttack, researchers modulated voice commands on an ultrasonic carrier signal, making them inaudible. Unlike the current attack, however, DolphinAttack used a bespoke hardwired system to generate the sounds rather than using connected devices with speakers to issue commands.
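The modulation idea can be checked numerically with a toy simulation. The sketch below is not the researchers' code: it assumes a single 400 Hz tone as a stand-in for a voice command, a 25 kHz carrier, a 192 kHz simulation rate, and a hypothetical microphone model with a small quadratic term (`a2`). It only measures how much of the 400 Hz tone is present before and after that nonlinearity.

```python
import math

# All values below are illustrative assumptions, not figures from the research.
SAMPLE_RATE = 192_000   # Hz, simulation rate high enough to represent the carrier
CARRIER_HZ = 25_000     # carrier above the range of typical human hearing
TONE_HZ = 400           # single tone standing in for a voice command
DURATION_MS = 50        # chosen so every component completes whole cycles
DEPTH = 0.5             # modulation depth of the baseband tone


def am_modulate(n_samples):
    """Amplitude-modulate the baseband tone onto the carrier."""
    out = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        baseband = DEPTH * math.cos(2 * math.pi * TONE_HZ * t)
        out.append((1.0 + baseband) * math.cos(2 * math.pi * CARRIER_HZ * t))
    return out


def nonlinear_mic(samples, a1=1.0, a2=0.1):
    """Hypothetical microphone model: linear term plus a small quadratic term."""
    return [a1 * x + a2 * x * x for x in samples]


def tone_amplitude(samples, freq_hz):
    """Amplitude of one frequency component, computed with a single-bin DFT."""
    re = im = 0.0
    for n, x in enumerate(samples):
        angle = 2 * math.pi * freq_hz * n / SAMPLE_RATE
        re += x * math.cos(angle)
        im -= x * math.sin(angle)
    return 2 * math.hypot(re, im) / len(samples)


n_samples = SAMPLE_RATE * DURATION_MS // 1000
transmitted = am_modulate(n_samples)      # what the loudspeaker would emit
captured = nonlinear_mic(transmitted)     # what the toy microphone outputs

audible_before = tone_amplitude(transmitted, TONE_HZ)
audible_after = tone_amplitude(captured, TONE_HZ)
print(f"400 Hz component in the transmitted signal: {audible_before:.4f}")
print(f"400 Hz component after the nonlinearity:    {audible_after:.4f}")
```

Under these assumptions the transmitted signal has essentially no energy at 400 Hz, since all of it sits around the carrier, while the output of the toy microphone contains a 400 Hz component of amplitude `a2 * DEPTH`, about 0.05. Real microphones, amplifiers, and filters behave in more complicated ways, so this shows only the principle.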

Defenses Against NUIT Cyberattacks

The latest attack allows any device compatible with audio commands to be used as a conduit for malicious activity. Android phones could be attacked through inaudible signals playing in a YouTube video on a smart TV, for instance. iPhones could be attacked through music playing from a smart speaker and vice versa.

In most cases, the inaudible "voice" does not even need to be recognizable as the authorized user, said UTSA's Chen in a recent statement announcing the research.

"Out of the 17 smart devices we tested, [attackers targeting] Apple Siri devices need to steal the user's voice, while other voice assistant devices can get activated by using any voice or a robot voice," she said. "It can even happen in Zoom during meetings. If someone unmutes themselves, they can embed the attack signal to hack your phone that's placed next to your computer during the meeting."

However, the speaker playing the attack audio has to be turned up fairly loud for an attack to work, and each malicious command has to be shorter than 0.77 seconds, constraints that can help mitigate drive-by attacks. And devices that are connected to earbuds or headsets are less likely to be usable by an attacker, according to Chen.

"If you don't use the speaker to broadcast sound, you're less likely to get attacked by NUIT," she said. "Using earphones sets a limitation where the sound from earphones is too low to transmit to the microphone. If the microphone cannot receive the inaudible malicious command, the underlying voice assistant can't be maliciously activated by NUIT."

The technique is demonstrated in dozens of videos posted online by the researchers, who did not respond to a request for comment before publication.

About the Author(s)

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.
