Hacker Pig Latin: A Base64 Primer for Security Analysts

The Base64 encoding scheme is often used to hide the plaintext elements in the early stages of an attack that can't be concealed under the veil of encryption. Here's how to see through its tricks.

(continued from page 1)

Our suspect string is:

ldCBodHRwOi8vMTAuMS4yLjMvdG9vbGtpdHMvbm90aGluZ190b19zZWVfaGVyZS5iaW4=

Step 1: Adjust Trailing Padding if Necessary
We put the suspect string into CyberChef and choose the "From Base64" recipe, which produces the error: "Data is not a valid byteArray." Padded Base64 always has a length that is a multiple of four, so adjust the number of trailing "=" characters (anywhere from zero to two) until the error goes away. In this example, deleting the "=" allows for decoding.

Base64_3.png

CyberChef
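
The same padding adjustment can be scripted. The snippet below is a minimal sketch using Python's standard `base64` module rather than CyberChef; the function name `decode_with_padding_fix` is ours, not part of any tool. It strips whatever padding is present and retries with zero, one, and two "=" characters.

```python
import base64
import binascii

SUSPECT = "ldCBodHRwOi8vMTAuMS4yLjMvdG9vbGtpdHMvbm90aGluZ190b19zZWVfaGVyZS5iaW4="


def decode_with_padding_fix(s: str) -> bytes:
    """Drop any trailing '=' and retry the decode with 0, 1, then 2 pad characters."""
    body = s.rstrip("=")
    for pad in range(3):
        try:
            return base64.b64decode(body + "=" * pad, validate=True)
        except binascii.Error:
            continue  # wrong amount of padding for this length, try the next one
    raise ValueError("no padding length from 0 to 2 produced valid Base64")


raw = decode_with_padding_fix(SUSPECT)
# The bytes decode without error but still look like binary at this point.
print(len(raw), raw)
```

A clean decode here only means the length is now valid. It says nothing yet about whether the bits are aligned, which is what the next step addresses.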

Step 2: If Plaintext Isn't Apparent, Prepend Some Characters
If the output looks like binary but you suspect text, don't give up yet. Prepend some characters to see whether it's simply a bit-alignment problem caused by truncated data. You can use any valid Base64 character here, but consider using "/", because the injected padding tends to stand out better (unless the first encoded character is already a "/"). For our test string, three padding characters revealed the plaintext.

Base64_4.png
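
To try every alignment at once, something like the following works. It is a sketch under our own assumptions: the helper name `try_alignments` is hypothetical, and the output is rendered as Latin-1 only so that every byte prints as something.

```python
import base64

SUSPECT = "ldCBodHRwOi8vMTAuMS4yLjMvdG9vbGtpdHMvbm90aGluZ190b19zZWVfaGVyZS5iaW4="


def try_alignments(s: str, filler: str = "/") -> dict:
    """Decode s with 0 to 3 filler characters prepended, fixing padding each time."""
    body = s.rstrip("=")
    results = {}
    for n in range(4):
        candidate = filler * n + body
        if len(candidate) % 4 == 1:
            continue  # a lone leftover character can never be valid Base64
        candidate += "=" * (-len(candidate) % 4)
        results[n] = base64.b64decode(candidate)
    return results


for n, raw in try_alignments(SUSPECT).items():
    print(n, raw.decode("latin-1"))
```

With three filler characters, the first three decoded bytes are junk produced by the filler, and everything after them reads as text.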

Where Will I See Base64?
A security analyst will encounter Base64 encoded strings in a variety of places. 

The most routine encounters come from examining mail attachments and embedded content (mostly images) in web pages. Other places should put analysts on alert, for instance when Base64 strings show up on the command line.

Below is an example of a reverse shell hiding in plain sight in a PowerShell command. (Ref: mkpsrevshell.py, https://gist.github.com/tothi/ab288fb523a4b32b51a53e542d40fe58.) This leverages PowerShell's "-e" / "-EncodedCommand" option, which allows a Base64 string to be passed in. PowerShell will decode the Base64, then execute the script inside.

base64_5.png
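
When you pull one of these strings out of a process-creation log, decoding it takes one extra step: PowerShell expects the encoded command to be UTF-16LE text, not ASCII. Here is a minimal sketch; the sample command is a harmless stand-in that we made up, not the reverse shell from the gist.

```python
import base64


def decode_powershell_encoded_command(b64: str) -> str:
    """Decode the argument of -EncodedCommand: Base64 wrapping UTF-16LE text."""
    return base64.b64decode(b64).decode("utf-16-le")


# Hypothetical, harmless sample built the same way an attacker's tool would build it.
sample = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode("ascii")

print(sample)
print(decode_powershell_encoded_command(sample))
```

Because ASCII text stored as UTF-16LE has a zero byte after every character, these strings tend to show an "A" at regular intervals, which can help you spot them by eye.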

Spawning a process with Base64 on its command line is suspicious by itself. If you're monitoring Windows process creation, you should investigate whenever you see that happening.

Let's look at a common oversight, this one spotted in a published Sigma rule. The rule fragment below looks for a particular Base64 string (among other things; see the full rule for the rest):

Base64_6.png

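This rule contains a detection element that fires if the string "L3NlcnZlc" is observed. According to the rule, this string translates to "/server=". In fact, it falls a bit short. If we use CyberChef, we notice that it actually translates to "/servet", a bug probably introduced by the input string carrying a trailing "=" sign. Now that we are savvy Base64 sleuths, we can update this rule to the correct string: "L3NlcnZlcj0=". And using our knowledge of the bit-offset problem, we can add the two other Base64 variants that will detect the same thing: "y9zZXJ2ZXI9" and "c2VydmVyPQ". One caveat: the characters at the edges of each variant can depend on the bytes sitting next to "/server=" in the original data, so a signature that keeps only the characters fully determined by the keyword ("L3NlcnZlcj", "9zZXJ2ZXI9", "c2VydmVyP") matches no matter what surrounds it.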
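
Working these variants out by hand is error-prone, so here is a sketch that generates them. The function name `base64_offset_variants` is ours. The trimming approach follows the same idea as Sigma's `base64offset` value modifier as we understand it, but treat this as an illustration rather than Sigma's actual implementation. It encodes the keyword at each of the three possible byte alignments and trims the characters that would change with neighboring data, which is why the results are slightly shorter than the full strings quoted above.

```python
import base64


def base64_offset_variants(keyword: bytes) -> list:
    """Return the Base64 substrings that identify keyword at byte offsets 0, 1 and 2."""
    start_trim = (0, 2, 3)      # leading chars that mix in bits from preceding bytes
    end_trim = (None, -3, -2)   # padding plus the char that mixes in following bytes
    variants = []
    for offset in range(3):
        encoded = base64.b64encode(b" " * offset + keyword).decode("ascii")
        stop = end_trim[(len(keyword) + offset) % 3]
        variants.append(encoded[start_trim[offset]:stop])
    return variants


variants = base64_offset_variants(b"/server=")
print(variants)
```

Whatever precedes or follows the keyword in the encoded data, one of the three substrings should appear in the Base64 text.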

Another common Base64 exposure for security analysts is HTTP Basic Authentication. (Maybe this isn't as "common" as it used to be, but I'm pretty sure every security analyst has seen at least one of these alerts fire.) Here's an example of an HTTP header using it. HTTP basic auth follows the convention of sending Base64-encoded "username:password" in the "Authorization" client header. The problem here is now pretty obvious: this is effectively a plaintext password. This example decodes to "joeuser:very$ecure".

Base64_7png.png
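
Recovering the credentials from a captured header takes only a few lines. This is a minimal sketch; the helper name `decode_basic_auth` is ours, and the header is rebuilt from the decoded value given above rather than copied from the screenshot.

```python
import base64


def decode_basic_auth(header_value: str) -> tuple:
    """Split an Authorization header value into (username, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the fields; a password may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


header = "Basic " + base64.b64encode(b"joeuser:very$ecure").decode("ascii")
print(header)
print(decode_basic_auth(header))
```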

Other Encoding Schemes
If you're a security analyst, at this point you may have realized a great evil application for Base64: data exfiltration over DNS! But there are a couple of problems here. First, the defined character set for Base64 includes characters not allowed in DNS names (+, /, =). Second, DNS is case-insensitive. An adversary couldn't guarantee that their Base64-encoded subdomain wouldn't get "lowered" along the way. But … there's always Base32! Base32 is very similar to Base64, except that its alphabet uses a single case, so it still carries data when we can't rely on upper/lowercase to encode information. Base32 is even more inflationary than Base64, so exfiltrating large amounts of data with it is sure to be a very loud network event.
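
Both points, the case problem and the inflation, are easy to see with a short experiment. The sketch below uses Python's standard `base64` module; the function name `compare_encodings` and the sample payload are made up for illustration.

```python
import base64


def compare_encodings(data: bytes) -> dict:
    """Encode data both ways and check whether each survives being lowercased."""
    b64 = base64.b64encode(data).decode("ascii")
    b32 = base64.b32encode(data).decode("ascii")
    return {
        "base64_len": len(b64),
        "base32_len": len(b32),
        "base64_survives_lowercase": base64.b64decode(b64.lower()) == data,
        "base32_survives_lowercase": base64.b32decode(b32.lower(), casefold=True) == data,
    }


payload = b"secret data to exfiltrate"  # hypothetical 25-byte payload
print(len(payload), compare_encodings(payload))
```

Base64 turns every 3 bytes into 4 characters, while Base32 turns every 5 bytes into 8, so the same payload needs noticeably more characters (and more DNS queries) in Base32.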

Don't forget, too, that Base16 (hex) and Base2 (binary) are also valid encoding schemes with readily available tooling. Security analysts see these everywhere in their daily work, but rarely as an adversary technique that needs analysis the way Base64 does.

Variants of Base64 use different alphabets. For instance, there's a URL- and filename-safe variant that substitutes "-" for "+" and "_" for "/". So if you see something that looks like a Base64 string but has a "-" in it, don't discount it too quickly. The CyberChef tool demonstrated earlier can be configured for these alternate alphabets.
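
Outside CyberChef, handling the alternate alphabet is a matter of mapping the two substituted characters back before decoding. Here is a minimal sketch with Python's standard library; the helper name `decode_any_alphabet` is ours, and the three sample bytes were chosen only because they encode to nothing but the characters that differ between the alphabets.

```python
import base64


def decode_any_alphabet(s: str) -> bytes:
    """Decode Base64 written in either the standard or the URL/filename-safe alphabet."""
    normalized = s.translate(str.maketrans("-_", "+/"))
    normalized += "=" * (-len(normalized) % 4)  # safe variants often omit padding
    return base64.b64decode(normalized)


raw = bytes([0xFB, 0xFF, 0xBF])
standard = base64.b64encode(raw).decode("ascii")
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard, urlsafe)
print(decode_any_alphabet(standard) == decode_any_alphabet(urlsafe) == raw)
```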

Summary
We explored Base64 encoding from the security analyst's perspective. Base64 encoding is traditionally used to convert binary data to printable text characters, but it can also be used to hide plaintext. Security analysts should keep these common techniques in mind while performing investigations, because all too often encoding plaintext as Base64 is enough to slip it past the best detection engine we have: our eyes.

Once the encoding is understood, Base64 detection flaws can be identified and signatures and logic improved to reflect all possible permutations.