Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Edge Articles

11:25 AM
Daniel Smallwood
Daniel Smallwood
Edge Features
Connect Directly

Hacker Pig Latin: A Base64 Primer for Security Analysts

The Base64 encoding scheme is often used to hide the plaintext elements in the early stages of an attack that can't be concealed under the veil of encryption. Here's how to see through its tricks.

(image by Daniel Berkman, via Adobe Stock)
(image by Daniel Berkman, via Adobe Stock)

If you have young kids, you'll relate to the value of being able to speak in code. For a few years, I was able to use Pig Latin to speak covertly around my kids. It was handy, and surprisingly effective, until they decoded the scheme and began speaking Pig Latin in front of me. Artypay overray.

I think about this every time I witness attacks where pieces are encoded (not encrypted) to hide what's going on. One of these encodings is typically Base64. Why is this so common? Most machines speak Base64, but most security analysts don't. 

In this post, I'll explain what the Base64 encoding scheme is, then discuss how it's used both for good and evil intent. Next, I'll look at some common detection applications of Base64 and where they sometimes fall short, giving advice on what you can do to strengthen them. Finally, I'll address some other encoding algorithms in the wild to help round out the topic and perhaps give you the ability to dead reckon when the bad guys may be trying to hide something. 

What Is Base64?
Base64 is an encoding scheme that can take any binary input and represent it using a set of 64 ASCII characters. It's important to note that Base64 is not encryption; it's an encoding scheme, so decoding it is trivial. Simple, free Base64 encode/decode tools are easy to find online. 

Encoding in Base64 is an inflationary operation: the 11-character input string "Hello World" converts to 16 characters in Base64.

"Hello World"  --( Base64Encode )-->  "SGVsbG8gV29ybGQ="

Base64 Table
Credit: Wikimedia Commons

Let's look at a few more Base64 strings. 

"Secret string" => U2VjcmV0IHN0cmluZw==

"Be sure to drink your ovaltine" => QmUgc3VyZSB0byBkcmluayB5b3VyIG92YWx0aW5l

They all contain characters from the set [A-Za-z0-9/+] and can end with 0-2 equal signs. Why these characters? The purpose of Base64 is to encode anything (namely binary data) into the characters that are carried easily by text-only protocols.

For example, e-mail was originally only designed to carry text data. As e-mail evolved, the protocols that delivered email didn't. Attaching binary documents like pictures and media files was not possible. The path of least resistance to allow email to progress was to create a binary-to-text encoding scheme rather than altering the protocol. 

One facet of the SMTP protocol that makes this clear is the end of message indicator. In SMTP, the signal an email client uses to show the end of a message is for it to supply a single line that contains only a period. (SMTP Protocol implementation details, although long, are surprisingly easy to read: https://tools.ietf.org/html/rfc2821. Isn't this period trick odd? What if an email author wanted to send a single line with a period in their email?) Send arbitrary (non-text) data as part of the message body and you could possibly interfere with this protocol feature, and likely others, too. 

Another common legitimate example of Base64 use is embedding raw binary data (e.g., images) in-line with html pages. HTML is a text-only protocol after all, and if you want to carry an image right in the page, versus by hyperlink for the browser to grab on a separate connection, Base64 is your answer.

Why Might Attackers Use Base64?
Base64 is often used to hide the plaintext elements of an attack that can't be concealed under the veil of encryption. Look for Base64 use in early stages of attacks, when the breach is narrow. 

Using real encryption is hard during early attack stages because encryption requires tooling and key exchange. The adversary can't guarantee that the required cryptographic tools will be available and accessible on the victim host to decrypt anything. But Base64 tooling is far more ubiquitous.

Even if we presume tooling to not be an issue, carrying a symmetric key with the encrypted payload defeats the purpose; asymmetric keys aren't a solution as this requires both infrastructure and further exposure. When an adversary uses encryption, it usually occurs later in the attack and piggybacks over third-party infrastructure.

Let's walk through a simple encoding exercise. Encoding the two letters "IN" into Base64 becomes "SU4=". Like so:


There's one BIG takeaway to absorb as you look at the tables and example: There is no direct translation between ASCII and its Base64 equivalent. In other words, the character "A" translates to three different representations in Base64 depending on what the offset is. It's this misunderstanding that lies at the root of the problem with many Base64 security detections.

These encoded strings illustrate this:

"Secret string"   =Base64=> U2VjcmV0IHN0cmluZw==

"ASecret string"   =Base64=> QVNlY3JldCBTdHJpbmc=

"AASecret string"  =Base64=> QUFTZWNyZXQgU3RyaW5n

--- As we prepend more characters now, things begin repeating ---

"AAASecret string" =Base64=> QUFBU2VjcmV0IFN0cmluZw==


A detection that looks at the Base64 version of "Secret String" must consider that it has three representations.

Common Base64 Analysis Techniques and Oversights
When analyzing a string believed to be nefarious plaintext data hidden with Base64, it's important to remember that the suspect string may be only a fragment. It might be necessary to add padding to the beginning and adjust padding at the end to get the decoded text out. Let's look at an example using the CyberChef tool

("Analysis techniques and oversights," continued on page 2 of 2)

Daniel Smallwood is a Senior Threat Research Engineer at IronNet, the company bridging the gap between traditional cybersecurity approaches and the modern threat.  Prior to IronNet, Daniel spent more than 18 years in security and software development for companies ... View Full Bio
1 of 2

Recommended Reading:

Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
Cartoon Caption Winner: In Tow
Flash Poll