Security Takes On Malicious DNA (Files)

Securing biomedical research can mean protecting systems from malicious code in the samples under investigation.

Joe Stanganelli, Attorney & Marketer

September 28, 2017


At the 15th annual Bio-IT World Conference & Expo this past May, Ari Berman, vice president and general manager of consulting services at BioTeam, urged life-science organizations to adopt a less restrictive "alternative security model" that would -- inter alia -- fast-track DNA files through systems, presumptively earmarking them as "safe" within a "science DMZ."

"It's time for security to evolve. Use security methodologies that are appropriate to the measures that you are trying to control… [because] we do have that technology to separate the traffic based on what it is," said Berman in a presentation titled "Security Sobriety in 2017." "A mouse genome does not carry a computer virus. And it's huge. You don't need to inspect every computer packet in a mouse genome. You're wasting a lot of time and effort by doing that."

A computer file is still just a computer file, however -- and can be infected or compromised as such. Indeed, not long after Berman's impassioned criticisms of restrictive information-security practices in the life sciences, University of Washington security researchers developed a proof of concept that -- yes -- genetic data can be hacked to carry a computer virus.

In a paper published this summer and presented last month at the 26th USENIX Security Conference, University of Washington professor Tadayoshi Kohno and four other researchers described how they created a strand of DNA containing malicious code aimed at a popular DNA-compression program known as fqzcomp -- and then used that strand of DNA to corrupt DNA-sequencing software and seize control of a connected computer via a buffer-overflow attack.
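To make the idea of "code in a strand of DNA" concrete, here is a minimal sketch of how arbitrary bytes can be written as a sequence of bases. It assumes a simple two-bits-per-base mapping (A=00, C=01, G=10, T=11) chosen purely for illustration; the function names are hypothetical, and the researchers' actual encoding and payload may differ. The example only round-trips harmless text.

```python
# Illustrative assumption: each base stands for two bits (A=00, C=01, G=10, T=11).
# This is a sketch of the general idea, not the encoding used in the UW study.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Write each byte as four bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_bytes(strand: str) -> bytes:
    """Reverse of bytes_to_dna; rejects strands that do not map to whole bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            if base not in BASES:
                raise ValueError(f"unexpected base: {base!r}")
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = bytes_to_dna(b"hello")
    print(strand)
    print(dna_to_bytes(strand))
```

The point of the sketch is that a sequencer's output is ordinary data: software that parses it has to validate lengths and contents like any other untrusted input, which is exactly the check a buffer overflow slips past.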

The ramifications are enormous. "Infected" DNA could be used to steal intellectual property in a field where IP protection and prosecution are already cumbersomely ferocious, compromise employee or patient data, falsify genetic analysis (such as in the criminal justice system), hijack organizational systems, and otherwise wreak general havoc.

On the other hand, the researchers have emphasized that "there is not present cause for alarm about present-day threats" -- perhaps because they didn't exactly play fair. They inserted their own flaw into the open source code of fqzcomp -- instead of exploiting a pre-existing vulnerability -- for the sake of the general proof of concept. Even with this advantage, the difficult and time-consuming process succeeded in translating the malicious code only about 37% of the time.

Berman, for his part, does agree that these findings are a concern -- but implicitly eschews the notion that research organizations must necessarily tighten their security controls. Instead, Berman contends, that onus belongs to vendors.

"[The UW study] opens a whole can of worms for sequencer vendors about the nature of how their software runs that they should address immediately," said Berman in an email interview. "The focus should be on why this vulnerability exists in the first place, not on locking things down more."

Where proprietary sequencing software is concerned, it is hard to disagree -- although, as information-security veterans know all too well, zero-days can pop up anywhere and at any time.

In the instant case, however, the nature of open source lies at issue. Because fqzcomp, like many other tools used in the world of genetic sequencing, is open source, it is no stretch of the imagination that an attacker could do exactly what the researchers did -- insert a vulnerability into open source code, and then feed a target a malicious genetic file. Hence, trust is paramount when it comes to the sources of open source.

"No matter what the threat, it's important to have data security as a top priority all the time, across all systems," said David Bernick, director of technical operations for DevOps and security at the Broad Institute -- a nonprofit genomics research organization that offers its Genome Analysis Toolkit ("GATK"), an open source(ish) genetic-sequencing software package, to clinical researchers. "For GATK, we make the full source code available. As long as people are getting the software directly from us, they benefit from the efforts we put into keeping our software safe, [including] code-security analysis [and] authenticated pen-tests."

In any event, Berman remains stalwart that the life-science community's accessibility problems are bigger than its security problems.

"I don't see this as a major issue to deal with at the moment. If folks decided to treat it as a real threat, the DNA sequence patterns could be programmed into an IDS analyzer and flagged if they came through," said Berman in an email interview. "Panic and overreaction will just result in tighter policies and more restrictive security measures that will only serve to make medical and research data even less accessible, which is in direct opposition to scientific discovery and data availability."
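Berman's suggestion of flagging known DNA sequence patterns can be sketched roughly as follows. This is an illustrative toy, not a real IDS rule or any vendor's API: the function names, the sample signature, and the assumption that data arrives as plain four-line FASTQ records are all hypothetical. Because a read can come from either strand, the sketch also checks the reverse complement of each signature.

```python
# Hypothetical signature scanner for FASTQ text; names and data are illustrative only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as it would read from the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def flag_reads(fastq_text: str, signatures: list[str]) -> list[str]:
    """Return IDs of reads containing any signature or its reverse complement.

    Assumes well-formed FASTQ: each record is exactly four lines
    (header, sequence, separator, quality).
    """
    patterns = set()
    for sig in signatures:
        sig = sig.upper()
        patterns.add(sig)
        patterns.add(reverse_complement(sig))

    lines = fastq_text.strip().splitlines()
    flagged = []
    for i in range(0, len(lines) - 3, 4):
        read_id = lines[i][1:].split()[0]  # drop the leading '@' and any description
        sequence = lines[i + 1].upper()
        if any(p in sequence for p in patterns):
            flagged.append(read_id)
    return flagged


if __name__ == "__main__":
    sample = "\n".join([
        "@read1", "ACGTCAACGG", "+", "IIIIIIIIII",
        "@read2", "TTTTTTTT", "+", "IIIIIIII",
        "@read3", "AAGTTGAA", "+", "IIIIIIII",
    ])
    print(flag_reads(sample, ["CAAC"]))
```

Exact substring matching like this would only catch payloads that are already known, and sequencing errors could alter a pattern enough to evade it, so it is a complement to fixing the underlying software rather than a substitute.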

Berman's contentions are hardly unique to the life-science space; they are as old as the notion of security itself. Moreover, he is not alone in envisioning a world of optimizing networks to deal specifically with genomics files in a way that increases accessibility while maintaining sufficient security.

"Genomics data are a fairly high workload on the Internet," Shawn Hakl, Verizon's vice president of business networks and security solutions, told Security Now. "If you knew that's what you were transporting between a group of users, and you knew you had a certain type of data with a certain type of data structure, and you knew you had some common security requirements -- like the need to make sure that it was HIPAA compliant, and the need to validate certain kinds of users -- you could build a combination of security encryption and optimization algorithms very specific to the exchange of that data within a particular closed user group."

Hakl elaborated that intelligent, virtualized networks could thereby be the answer.

"You would never have been able to deploy a custom packet network to do that in a hardware world because you'd have to have somebody... deploy a bunch of specialized switches," said Hakl. "In a new, software-defined network world, it's really not that hard for me to instantiate a collection of virtual appliances with a specialized packet-optimization algorithm built for your data all over the place between you and a relatively closed group of users -- and [then] optimize that content for distribution."

Hakl concedes that these ideas may still be "at the PowerPoint stage," but they are valuable ones in the wake of the UW study -- whose authors have called upon the DNA-sequencing community to act proactively today rather than reactively tomorrow. Thus, Berman's vision of a science DMZ could help drive the much-needed innovation to satisfy both security geeks and accessibility freaks. Otherwise, time may tell as to which way the scales should skew.


— Joe Stanganelli is founder and principal of Beacon Hill Law, a Boston-based general practice law firm. His expertise on legal topics has been sought for several major publications, including US News and World Report and Personal Real Estate Investor Magazine.
