Voice-Operated Devices, Enterprise Security & the 'Big Truck' Attack

The problem with having smart speakers and digital assistants in the workplace is akin to having a secure computer inside your office while its wireless keyboard is left outside for everyone to use.

Menny Barzilay, CEO at Cytactic & Founder of the THINK:CYBER Newsletter

March 15, 2018

Let's welcome the new members of the cybersecurity threat landscape. Ladies and gentlemen, a big round of applause for ... sensors! As you undoubtedly know, the Internet of Things (IoT) is enabled by sensors, which allow smart devices to respond to their environment by registering voices, movements, temperature changes, smells, and more.

Sensors also introduce new cybersecurity challenges, not the least of which stem from voice-operated devices, smart speakers, and digital assistants such as Amazon Echo with its accompanying Alexa Voice Service (nicknamed "Alexa"). Though most voice-operated devices are considered primarily to be consumer products, these devices eventually will reach the corporate world (if they have not already), where they will present unique challenges when connected to corporate networks holding sensitive data.

The "Big Truck" Attack
Imagine the following scenario: Take a big truck. (Yes, an actual physical truck.) Load it with huge speakers. Set the volume to maximum. Drive around New York, Berlin, London, or any other big city. Play a recording with various dangerous voice commands for Alexa (or any other voice-activated device). Sit back and watch the world burn.

Since you can use Alexa to do many things, such as write emails, access data, and operate other smart devices, the ability to control it remotely could cause data leaks, disruption of processes, and data integrity problems.

The Vocal Perimeter
By this point, I assume that you have guessed one of my two main points. Up until now, restricting access to sensitive systems by using physical means was, more or less, an easy job. Our offices have walls, locks, and security guards. With voice-operated sensors, it is not always possible to limit access through traditional security measures. Think of it as having a secure computer inside your office and its wireless keyboard outside for everyone to use.

I experienced this phenomenon firsthand when I gave a television interview about Alexa and privacy some time ago. After the interview, several people called me and told me that each time I said "Alexa" on TV, their devices entered "listening" mode. That was an "aha moment" for me. My ability to control people's smart devices through the TV amazed me. After a while, it started happening to others as well. You might also have heard about the "dollhouse case" or the Burger King ad.

What Doesn't Work?
Biometric authentication, for one, doesn't solve the problem. In theory, Alexa could learn to identify authorized people's voices and listen only to the commands they give. But while this seems like a possible solution, it falls short in practice. To begin with, there is an inherent trade-off between usability and security. Implementing such a system means that users would have to go through an onboarding process to teach Alexa or any other voice-enabled device how they sound. Compared to the status quo, where Alexa works out of the box, we are talking about a serious degradation in user convenience.

Biometric identification also means false rejections: if your voice sounds different because you are sick, sleepy, or eating, Alexa will probably not accept you as an authorized user. And this is not all. There are systems available (Adobe VoCo is one example) that can take a sample of a person's voice saying one thing and generate a new sample of that voice saying something else.
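The tension between these two failure modes, a legitimate user being turned away and a synthesized voice being let in, can be made concrete with a small sketch. It assumes a hypothetical voice matcher that returns a similarity score between 0 and 1 and accepts a speaker when the score reaches a threshold. The function name and the scores are invented for illustration; they are not measurements from Alexa or any real product.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) at a given threshold."""
    # A genuine user is falsely rejected when their score falls below the bar.
    false_rejects = sum(score < threshold for score in genuine_scores)
    # An impostor is falsely accepted when their score reaches the bar.
    false_accepts = sum(score >= threshold for score in impostor_scores)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


# Invented scores. The low genuine values stand in for a user who is sick
# or eating; the high impostor values stand in for synthesized speech.
GENUINE = [0.91, 0.85, 0.62, 0.88, 0.70]
IMPOSTOR = [0.30, 0.45, 0.66, 0.20, 0.75]

for threshold in (0.6, 0.8):
    frr, far = error_rates(GENUINE, IMPOSTOR, threshold)
    print(f"threshold={threshold}: false rejects={frr:.0%}, false accepts={far:.0%}")
```

With these made-up numbers, the strict threshold locks out the legitimate user two times in five, while the lenient one lets two impostor attempts through. No single setting removes both problems.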

Haven't We Solved this Problem?
Yes, we faced similar challenges with Wi-Fi networks in the corporate world. While these networks are also not limited by physical walls, the use of encryption and passwords proved to be a straightforward solution, separating approved from unapproved users.

It is true that we could force password usage with voice-operated devices ("Alexa, password 1337, please turn off the lights"). But … in the cybersecurity domain, saying the password out loud is not considered to be the most secure method for authentication. Another possible solution would be changing the activation word for voice-operated devices. Instead of calling Alexa "Alexa," you would choose a unique name. This would dramatically reduce an attacker's ability to execute the Big Truck Attack. But you would be forced to say the new name out loud every time you operate a device, which keeps it from being a strong security measure.

While for some "home users" this risk might be acceptable, it will not pass muster on the corporate side. Worse, in many cases, it would be extremely dangerous to connect voice-operated devices (as well as other types of sensor-operated devices) to sensitive networks — and one should refrain from doing so.

Mission Not Impossible
One possible solution is taking a multidevice approach. In this scenario, several devices would identify approved users simultaneously, dramatically improving security. For example, when Alexa hears a user speak, she will "ask" the user's smartwatch for identification confirmation. The smartwatch, being able to "hear" its wearer through the voice vibrations inside the body, would compare the command Alexa received with the one it just heard. If both match, this can be considered a two-step authentication.
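The matching step in this scenario can be sketched in a few lines. This is only an illustration under stated assumptions: both devices are assumed to produce a text transcript and a timestamp from a shared clock, and every name here (`Observation`, `confirm_command`, `max_skew_s`) is hypothetical rather than part of any Alexa or smartwatch API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """What one device heard (hypothetical structure)."""
    device_id: str
    transcript: str
    heard_at: float  # seconds on a clock both devices are assumed to share


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so minor transcript differences don't matter."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def confirm_command(speaker: Observation,
                    wearable: Optional[Observation],
                    max_skew_s: float = 1.5) -> bool:
    """Accept a command only if the wearable heard the same words at about the same time."""
    if wearable is None:
        # The watch felt no speech from its wearer, e.g. the voice came from a truck.
        return False
    same_words = normalize(speaker.transcript) == normalize(wearable.transcript)
    close_in_time = abs(speaker.heard_at - wearable.heard_at) <= max_skew_s
    return same_words and close_in_time


heard_by_speaker = Observation("smart-speaker", "Turn off the lights!", 100.0)
heard_by_watch = Observation("smartwatch", "turn off the lights", 100.4)
print(confirm_command(heard_by_speaker, heard_by_watch))  # wearer spoke the command
print(confirm_command(heard_by_speaker, None))            # nobody wearing the watch spoke
```

The exact-match comparison and the 1.5-second window are arbitrary choices for the sketch; a real system would need to tolerate transcription errors and would have to protect the channel between the two devices.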

A similar scenario can be achieved with video cameras, matching face and mouth movements to the commands Alexa hears. The camera could tell Alexa, "Yes, I know this guy. He is cool." Still, in any case, we are facing a complicated situation that requires extensive research. Voice identification may solve some of the issues for home users, but it is still far from being suitable for highly sensitive corporate networks.

About the Author(s)

Menny Barzilay

CEO at Cytactic & Founder of the THINK:CYBER Newsletter

Menny Barzilay is a strategic adviser to leading enterprises worldwide as well as states and governments, and he also sits on the advisory boards of several startup companies. Menny is the CEO of Cytactic, a cybersecurity services company, and the founder of the THINK:CYBER newsletter. Additionally, he is the CTO of the Interdisciplinary Cyber Research Center at Tel-Aviv University. Menny is a former CISO at the Israeli Intelligence Services.
