DARPA Aims to Ditch C Code, Move to Rust
The Defense Advanced Research Projects Agency launches TRACTOR program to work with university and industry researchers on creating a translation system that can turn C code into secure, idiomatic Rust code.
August 13, 2024
The US military agency responsible for developing new technologies plans to rewrite significant volumes of C code by funding a new research challenge to create an automated translator capable of converting old C code into functionally equivalent code written in the security-focused Rust language.
The Defense Advanced Research Projects Agency (DARPA) will hold a workshop, known as Proposers Day, on Aug. 26 to outline its vision for the Translating All C to Rust (TRACTOR) project. The effort calls for academic and industry research groups to compete to create a system that can turn C code into idiomatic Rust code, meaning code that uses the language's native features. The project's ultimate goal is to provide tools so that any organization with large volumes of software written in C can convert that code to Rust and eliminate the memory-safety errors that are a major source of software vulnerabilities.
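To make "idiomatic" concrete, here is a minimal sketch of the same routine written two ways. It is an illustration only, not output from TRACTOR or any DARPA tool, and the function names `sum_transliterated` and `sum_idiomatic` are hypothetical. The first version mirrors a C loop over a pointer and a length; the second uses a Rust slice and iterator.

```rust
/// Line-by-line transliteration of a C loop: a raw pointer plus a length.
/// The compiler cannot verify the bounds, so the function must be `unsafe`.
///
/// # Safety
/// `data` must point to at least `len` readable `i32` values.
pub unsafe fn sum_transliterated(data: *const i32, len: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < len {
        // Pointer arithmetic, exactly as the C original would do it.
        total += unsafe { *data.add(i) };
        i += 1;
    }
    total
}

/// Idiomatic version: a slice carries its own length, and the iterator
/// cannot step outside it, so no `unsafe` is needed.
pub fn sum_idiomatic(data: &[i32]) -> i32 {
    data.iter().sum()
}
```

Both functions return the same result for valid input, but only the second lets the compiler rule out an out-of-bounds read.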
Without an automated system, developers are unlikely to take on the task, says Dan Wallach, program manager in DARPA's Information Innovation Office (I2O).
"Today, rewriting code is expensive and labor-intensive, and [organizations] with large legacy codebases simply cannot afford that in many cases," he says. "The best advice today is, 'Well, get started anyway and do it incrementally.' But if we can create a high degree of automation, then that changes the economics of the problem and makes it possible to improve code faster."
Technology companies and the US government have identified memory-safety flaws as a common, but entirely preventable, class of software vulnerabilities. In December 2022, Google disclosed that, in the software for the latest version of Android, the majority of new code was written in the memory-safe languages of Java, Kotlin, and Rust, leading to far fewer memory-safety vulnerabilities — 85 in 2022 compared to 223 in 2019.
Rust to the Rescue
Because memory-safety issues, such as buffer overflows and double-free errors, typically occur in C and C++ code, technical experts have recommended moving to Rust, a memory-safe language that meets many of the same requirements as those languages. Google found, for example, that rewriting the QR code generator for Chrome in Rust allowed the developers to move it out of the application's sandbox, improving performance. Microsoft has rewritten some operating system functions in Rust and found a 5% to 15% performance improvement.
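A short sketch shows how safe Rust handles the two bug classes named above. The helper names `read_byte` and `consume` are hypothetical and chosen only for illustration.

```rust
/// Bounds-checked read: an out-of-range index yields `None` instead of
/// reading past the end of the buffer, as an unchecked C array access could.
pub fn read_byte(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

/// Takes ownership of the buffer. The memory is freed exactly once, when
/// `buf` goes out of scope at the end of this function.
pub fn consume(buf: Vec<u8>) -> usize {
    buf.len()
    // `buf` is dropped here. A caller that tries to pass the same vector
    // to `consume` a second time gets a compile error (use of moved value),
    // which is how the double free is ruled out.
}
```

Calling `read_byte` with an index equal to the buffer length returns `None`, and the double-free case never reaches run time because the compiler rejects the second use of the moved vector.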
In fact, Rust continues to be the language with which the greatest number of developers want to work, with 82% of developers "admiring" the programming language, compared with the 29% who currently use it extensively, according to Stack Overflow's "2024 Developer Survey."
Many organizations are already using — or starting to use — Rust, says Beth Linker, senior director of product management for Synopsys' Software Integrity Group.
"We’ve seen a lot of momentum around Rust in the last 12 to 18 months because of the US government’s stance on memory-safe programming languages," Linker says.
LLMs Necessary But Not Sufficient
Yet to use Rust widely, companies need to make sure that the Rust code uses features of the programming language and is interoperable with other components that may still be written in C or C++. For that reason, large language models (LLMs) will likely be necessary, even if they cannot yet translate C code to Rust with complete accuracy, Linker says.
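Interoperability in practice usually means a thin C-compatible boundary around safe Rust. The sketch below is an assumption about what such a boundary might look like, not a description of any TRACTOR output; the function `name_len` is hypothetical. It uses the C calling convention so that remaining C code could call it with a NUL-terminated string.

```rust
use std::ffi::{c_char, CStr};

/// C-ABI entry point that C callers could use on a NUL-terminated string.
/// Exporting it under this exact symbol name would also require the
/// `no_mangle` attribute, omitted here because its spelling differs
/// between Rust editions.
///
/// # Safety
/// `s` must be null or point to a valid NUL-terminated C string.
pub unsafe extern "C" fn name_len(s: *const c_char) -> usize {
    if s.is_null() {
        return 0;
    }
    // The only unsafe step: trusting the C caller's pointer.
    let c_str = unsafe { CStr::from_ptr(s) };
    // From here on the data is an ordinary bounds-checked byte slice.
    c_str.to_bytes().len()
}
```

Keeping the `unsafe` portion this small is a common design choice: the pointer is validated once at the boundary, and everything past it is checked by the compiler.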
"In our experience using LLMs to generate Rust code, we have seen that this is still a growth area for many LLMs because there is less training data available for Rust than for more established languages," Linker says. "TRACTOR is an ambitious project and will be very impactful if it succeeds."
While artificial intelligence (AI) is not a requirement for the project, DARPA's Wallach thinks that LLMs will almost certainly be part of the solutions. They could contribute in many different areas, from translation to code evaluation to process control — there is no one right way to do it, he says.
And because the pace of AI innovation is moving so quickly, any particular solution should not rely on a specific implementation, Wallach adds.
"Whatever the state of the art is for LLMs today, I promise you in four years, there will be something better. I don't know what it is, I don't know who's going to make it, but I know that that world is improving on its own," he says. "So our goal is to be able to benefit from the investments that other people are making in AI."
Significant Challenges to Overcome
The need for the solution to easily swap older LLMs for newer, more efficient models will likely cause issues. Already, the intellectual-property challenges that come with AI models are significant, says Chris Clark, automotive systems architect for Synopsys' Software Integrity Group.
"This raises many questions about IP, usage, analysis, and model development. The challenge will not be whether an AI engine can be developed; the challenge will be in the legal domain and licensing," Clark says. "The question about how my code is used and what is derived from it will have to be answered. For embedded, this is especially important."
Overall, DARPA realizes that creating TRACTOR will rely on significant innovations in the technology of LLMs and source-code translation, and that the entire exercise will likely bring up some thorny issues, such as whether producing Rust code that matches the C code is the criterion, or whether the system should try to gauge the intent of the programmer.
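A small, hypothetical example shows why that question is hard. Take a C expression such as `(a + b) / 2` on two `int` values. Signed overflow is undefined behavior in C, although compiled code on common hardware often wraps around. A translator could reproduce that wraparound, or it could produce what the programmer presumably meant. The two function names below are illustrative assumptions.

```rust
/// Behavior-matching translation: reproduces the wraparound that the C
/// expression `(a + b) / 2` often shows in practice when the sum overflows.
pub fn midpoint_literal(a: i32, b: i32) -> i32 {
    a.wrapping_add(b) / 2
}

/// Intent-matching translation: widens to 64 bits so the sum cannot
/// overflow. The average of two `i32` values always fits back into `i32`.
pub fn midpoint_intended(a: i32, b: i32) -> i32 {
    ((a as i64 + b as i64) / 2) as i32
}
```

For small inputs the two agree. For two copies of `i32::MAX`, the first returns -1 and the second returns `i32::MAX`, so the choice of criterion changes the program's observable behavior.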
"There's no doubt that this is a hard problem, and DARPA doesn't do easy problems. DARPA does hard problems," Wallach says. "It's not enough simply to yield Rust code that is safe but unreadable and unusable. The whole point of this is that we want to move developers from C to Rust ... so to the extent possible, we want it to be the highest-quality Rust that can be produced."