The Nightmare Before Christmas: Security Flaws Inside our ComputersHow an Intel design decision with no review by industry security consultants led to one of the biggest vulnerabilities in recent history.
Towards the middle of last year, some researchers at the University of Graz published a paper in which they proposed a new mitigation for a software vulnerability associated with something called ASLR. They called this mitigation KAISER.
ASLR – Address Space Layout Randomization – is a widely-used technique for ensuring that malware can’t easily find out where critical data is loaded into a running process, and subsequently exploit this. Effectively the various pieces of a process are loaded into randomly chosen memory locations, which change each time the process is loaded.
ASLR is only one defense, of course, and it isn’t perfect. Attacks on ASLR have been known for a long time. Consequently, it was rather surprising that, not long after this, the Linux kernel team, Microsoft, and Apple, all started working on patches ostensibly to implement KAISER. The changes were non-trivial and would have significant performance overheads. Not only this, but unexplained maintenance shutdowns were scheduled for Azure and AWS early in the New Year.
We now know that a small number of privileged insiders had knowledge about a much more serious security vulnerability. Unlike most vulnerabilities, this one was intrinsic to the very hardware on which software runs, specifically, Intel CPUs. Worse, this problem could not be resolved with a simple microcode update. It was intrinsic to the design of the CPU chip itself. Software workarounds were complex and imposed significant performance penalties.
So what did these insiders know? It’s believed the flaw lies in the way Intel processors handle access to data in memory.
When a user task requests access to a location in memory, its privileges must be checked. If the task doesn’t have the right privileges, it will not be allowed to read the memory location. But before data from memory is handed back to the requesting task, it is first placed in one of the caches built into the CPU. Caches are just special blocks of very fast memory, intended to ensure that if the same data is requested again, it’ll be right there when it’s needed.
Intel decided to design their CPU so that data is placed in the cache regardless of the requesting task’s privileges. They then checked the permissions of the task after doing this. If the task lacked the required permissions, then it would not receive the requested data and a fault operation would be raised.
But Intel’s design decision had a serious consequence. It is possible for malware to use so-called side-channel attacks to determine the contents of cache memory, even though the malicious software has no direct access to the cache. For example, by measuring how long it takes to retrieve requested data, malicious software can infer what’s cached. That’s a serious problem, because apart from sensitive operating system data, including things like encryption keys, virtualized environments often support multiple tenants – some of whom may be entirely separate organizations. Suddenly, concrete walls become glass walls.
We don’t yet know the performance overheads of the (forthcoming) Windows patch. We do, however, know that the Linux patch has been shown to impose nearly a 25% overhead on database accesses. This is not surprising given the extensive changes that had to be made to work around the problem. These changes negate many of the performance improvements that have been made in recent years, but they are necessary because without them, every Intel-powered computer is fundamentally insecure.
At the time of writing, three proof-of-concept attacks have been demonstrated. These have been grouped into two categories; Meltdown and Spectre. Meltdown is the issue to be concerned over. Spectre is a more limited attack and far less important, and it’s also difficult to exploit.
So now we have a real headache. Every virtualized environment has to be patched, along with physical devices. Anything running an older Linux kernel where legacy line of business software is installed, is now a potential security vulnerability. In many cases it may be far cheaper to simply migrate these systems to AMD-based servers, which are not vulnerable. This is because AMD processors do check privileges before caching data, although researchers haven't totally shut the door on ARM and AMD processors and note that it's "unclear" whether they are being affected.
If your organization is still running older versions of Windows, this will be yet another wakeup call to upgrade to Windows 10 as soon as possible. Of course, we don’t yet know what overheads this fix will impose. Along with this, mobile devices may have significantly poorer battery life. Unfortunately there’s no real alternative. The problem is intrinsic to the CPU design and cannot be patched in microcode.
For Intel, this comes after a terrible year where major security flaws with the inbuilt Management Engine component were exposed, culminating in a demonstration at Black Hat Europe where arbitrary code was loaded into the engine and made to persist after device reboot.
Once again this is also a reminder that security through obscurity is no defense. Indeed, it is innately a vulnerability. When Intel made their design decisions they did so internally, with no review by industry security consultants. As has been shown time and time again, this guarantees something will be overlooked.
It’s also interesting that this vulnerability has probably been around for a decade or more. Of course, we can’t know for sure that someone – a nation state, for example – didn’t already know all about this, and possibly even have a weaponized exploit. Indeed – as has happened several times recently – it may be that such an exploit has inadvertently reached the public domain.
Hopefully over the next few days and weeks, we’ll learn more. At present, leading industry commentators, including Linus Torvalds, Alex Ionescu, and Mark Russinovich have been fairly tight-lipped over the issue. This is probably not surprising considering how high the stakes are. But for now this is probably the biggest security flaw discovered for a very long time – and 2018 has only just begun!
Andrew Mayo has been involved in IT, both in software and hardware roles, for enough years to have worked through the tail-end of the punched card and paper tape era, and the subsequent invention of the PC. Currently he's working on the evolution of 1E's Tachyon solution, ... View Full Bio