2021-11-08

There are several short-term methods that can mitigate the Trojan Source attack, which abuses Unicode to inject malicious backdoors into source code, according to experts.



The new attack method, identified by University of Cambridge researchers, tricks compilers into processing hidden Unicode characters and generating binaries with extra instructions and backdoors that the developer or security analyst does not know about. Because the special characters are not visible by default, the malicious code is unlikely to be discovered during code review.

Attacks based on how Unicode displays text are not new, but one reason Trojan Source may feel like a bigger deal is the sheer amount of code that gets copied and pasted from public sites such as Stack Overflow, GitHub, and other centralized forums into individual source code files. If problematic Unicode characters are hidden in the copied code, they are pasted in as well.

“This scenario demonstrates the proactive power of source code reviews and it would be a good best practice not to copy and paste code for the time being,” says Jon Gaines, senior application consultant at nVisium. “It's always better to rewrite it yourself.”

Make Unicode Visible

Developers can detect the potentially malicious Unicode characters by configuring their IDE or text editor to display Unicode, or by using a hex editor such as HexEd.it to search for specific Unicode characters in the file, Gaines says.
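As a rough illustration of what "making Unicode visible" means, the following Python sketch replaces each BiDi control character with a readable tag. The function name reveal_bidi and the tag format are assumptions made for this example; they do not come from any editor or tool mentioned in the article.

```python
import unicodedata

# The nine bidirectional control characters covered by CVE-2021-42574.
BIDI_CONTROLS = "\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069"


def reveal_bidi(text: str) -> str:
    """Replace each BiDi control character with a visible <U+XXXX NAME> tag."""
    out = []
    for ch in text:
        if ch in BIDI_CONTROLS:
            # unicodedata.name gives the official character name.
            out.append(f"<U+{ord(ch):04X} {unicodedata.name(ch)}>")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical line of code containing hidden control characters.
    sample = 'access = "user\u202E \u2066// check admin\u2069 \u2066"'
    print(reveal_bidi(sample))
```

Printing a file through a filter like this shows a reviewer where the hidden characters sit, even in a terminal or editor that would otherwise render them invisibly.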

Major source control platforms have already responded: GitHub, GitLab, and Atlassian (for Bitbucket) now post alerts for the Unicode BiDi characters (CVE-2021-42574).

Because the text editor Visual Studio Code can be tripped up by this attack, one workaround is to change the file encoding to a non-Unicode encoding, which displays the malicious BiDi characters as mangled characters, says Shachar Menashe, senior director of security research for JFrog Security. The mangled characters should get caught during a manual code review.

[Image: Unicode BiDi characters as displayed in Visual Studio Code after the encoding change]

There are also homoglyphs, which are difficult to differentiate from legitimate characters.

[Image: homoglyph characters as displayed in Visual Studio Code after the encoding change]

Menashe says Visual Studio, Notepad++, and Sublime Text are not affected by BiDi characters in a vulnerable way, because either the line appears mangled or the entire line shows up as a comment.

Filter Out the Characters

The Trojan Source methods will have “minimal security impact in the real world,” because regular source code typically does not contain the special Unicode characters outlined by the researchers (BiDi and homoglyphs), says Menashe. They are “easy to detect, alert on and perhaps even filter out automatically,” he says.

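The following Linux commands can either alert on or strip out all non-ASCII characters from an individual source code file. The first fails with an error when it reaches a character that cannot be converted to ASCII; the second uses the -c option to drop such characters from the output: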

Alert: iconv -f utf-8 -t ascii input.cpp

Strip: iconv -c -f utf-8 -t ascii input.cpp -o filtered_output.cpp

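Alternatively, a loop such as the following checks a list of files and flags instances where the special characters are found. Here, filelist and RTLcharacters are placeholders for the files to check and for the byte pattern of the characters to match.

for file in filelist; do hexdump -C "$file" | grep RTLcharacters; done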
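The same kind of check can be sketched in Python, which avoids matching raw hex dump output. This is a minimal sketch, not a vetted scanner: the function names find_bidi and scan are made up for this example, and the files are assumed to be UTF-8 encoded.

```python
from pathlib import Path

# The nine bidirectional control characters covered by CVE-2021-42574.
BIDI_CONTROLS = frozenset("\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069")


def find_bidi(path):
    """Return (line_number, code_point) pairs for each BiDi control in a file."""
    # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    hits = []
    # Split on "\n" only, so line numbers match what most editors show.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            if ch in BIDI_CONTROLS:
                hits.append((line_no, f"U+{ord(ch):04X}"))
    return hits


def scan(paths):
    """Print every hit as path:line: code point; return True if any was found."""
    found = False
    for path in paths:
        for line_no, code_point in find_bidi(path):
            print(f"{path}:{line_no}: {code_point}")
            found = True
    return found
```

A wrapper script could call scan on the files named on its command line and exit with a non-zero status when it returns True, so that a build step fails whenever a flagged character appears.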

Instead of just alerting, the following commands can strip out only the specific characters targeted in Trojan Source from the individual code file.



The following two Linux commands strip out Unicode BiDi characters (CVE-2021-42574):

CHARS=$(python3 -c 'import sys; sys.stdout.buffer.write(u"\u202A\u202B\u202D\u202E\u2066\u2067\u2068\u202C\u2069".encode("utf8"))')

sed 's/['"$CHARS"']//g' < input.cpp > filtered_output.cpp
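For environments where relying on sed and the shell locale is inconvenient, the same stripping step can be approximated in Python alone. This is a sketch under the assumption that the input is UTF-8; the names strip_bidi and strip_bidi_file are invented for this example.

```python
# The nine bidirectional control characters covered by CVE-2021-42574.
BIDI_CONTROLS = "\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069"

# str.maketrans with a third argument builds a table that deletes those characters.
_DELETE_BIDI = str.maketrans("", "", BIDI_CONTROLS)


def strip_bidi(text: str) -> str:
    """Return text with every BiDi control character removed."""
    return text.translate(_DELETE_BIDI)


def strip_bidi_file(src: str, dst: str) -> None:
    """Copy src to dst without BiDi control characters (assumes UTF-8 input)."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(strip_bidi(text))
```

Calling strip_bidi_file("input.cpp", "filtered_output.cpp") would mirror the file names used in the commands above. All other characters, including legitimate non-ASCII text, pass through unchanged.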

For Unicode Homoglyph characters (CVE-2021-42694), these two commands form a partial list for stripping Cyrillic homoglyphs only:

CHARS=$(python3 -c 'import sys; sys.stdout.buffer.write(u"\u0405\u0406\u0408\u0410\u0412\u0415\u0417\u041D\u0420\u0421\u0422\u0425\u0430\u0440\u0441\u0443\u0445\u0455\u04AE\u04BB\u04C0".encode("utf8"))')

sed 's/['"$CHARS"']//g' < /tmp/utf8_input.txt > /tmp/ascii_output.txt
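Because a fixed character list like the one above is only partial, a complementary heuristic is to flag any word that mixes letters from more than one script, for example Latin and Cyrillic. The sketch below is an illustration of that idea, not a method described by the researchers or by Menashe. It infers the script from the first word of each character's Unicode name, which is an approximation, and the function names are invented for this example.

```python
import re
import unicodedata

WORD = re.compile(r"\w+")


def scripts_in(token: str) -> set:
    """Return script labels (e.g. LATIN, CYRILLIC) for the letters in a token."""
    scripts = set()
    for ch in token:
        if ch.isalpha():
            # Approximation: the first word of the Unicode name is the script.
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ", 1)[0])
    return scripts


def mixed_script_tokens(text: str) -> list:
    """Return (line_number, token) for each word mixing more than one script."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in WORD.finditer(line):
            token = match.group()
            if len(scripts_in(token)) > 1:
                hits.append((line_no, token))
    return hits
```

An identifier written entirely in look-alike Cyrillic letters would not be flagged by this check, so it is best treated as a supplement to the character lists above rather than a replacement for them.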

Check the Tools



Install compiler updates that block the attack method as they become available. Until those updates are applied, the commands to automatically detect and sanitize files can mitigate the issue. While it is possible to perform a manual source code audit to look for these special characters after changing the text-editor settings, that would be the “worst way to handle this issue,” Menashe says, since some of the characters can be indistinguishable from legitimate Latin characters. “The best solution is to run automated tools that alert and/or strip these characters,” Menashe says.

The CVSS score of 9.8 is “overblown,” Rudis wrote. To exploit this weakness, the adversary would need to have direct access to developers’ workstations, source code management system, or continuous integration pipelines.

“If an attacker has direct access to your source code management system, frankly, you probably have bigger problems than this attack,” Rudis wrote.