Understanding Spectre and Meltdown Vulnerabilities – Part 1

In this blog series, we will walk through one of the biggest security vulnerabilities of recent times: Spectre and Meltdown. This first article is centered on the concepts that are necessary for understanding the internals of these two vulnerabilities.

How is a program executed?

A program is simply a series of instructions present in memory. These instructions are executed by our processor one by one. Every instruction executed by the CPU runs within a privilege level.

From Wikipedia

A privilege level in the x86 instruction set controls the access of the program currently running on the processor to resources such as memory regions, I/O ports, and special instructions.

So it essentially means that any instruction executed at a particular privilege level has access only to a restricted subset of system resources ( e.g. memory regions, I/O ports ).

The Intel x86 architecture offers a total of 4 privilege levels, which may or may not all be used by operating system vendors. Linux, for that matter, uses only two privilege levels:

  • Ring 0 ( kernel mode )
    • The kernel operates in this mode. This privilege level has access to all the hardware ( ports ), all instructions, and all of memory. This is necessary because the kernel needs to access every hardware device and the memory regions of every process, so it makes complete sense to run the kernel at full privilege.
  • Ring 3 ( user mode )
    • All user processes run in this mode. In this mode, a process is limited to its own segment of memory. For any hardware-related task ( be it disk I/O or network I/O ), it has to involve the kernel by making the appropriate system calls. System calls are the mechanism for switching from user mode to kernel mode.


Note: Just to add to this, it’s not only memory regions and I/O ports; these levels also prevent privileged instructions such as HLT and RDMSR from being executed in user mode. Read this for more details.

Memory Isolation

Memory is divided among different processes, as well as the kernel, via the concept of page tables. Every process has its own page table, and this page table stores entries pointing to the physical pages in RAM.

Processes use virtual addresses instead of physical addresses to store and load content. The conversion from virtual address to physical address is done with the help of page tables, which store the addresses of the physical pages. These page tables are themselves stored in memory, so every virtual-to-physical conversion involves several memory accesses ( around 100 cycles each ), which would slow down our processing. For that reason we have the TLB ( Translation Lookaside Buffer ), which is essentially a fast cache for this virtual-address-to-physical-address mapping.

The page table is also divided into two segments: one stores the user process's entries, and the other stores kernel page table entries. The kernel entries are needed for the memory addressing that happens while executing in kernel mode ( privileged mode ), e.g. for accessing kernel data structures.


As we already know, every user process stores some information or other in memory and addresses those locations via virtual addresses. We already have the TLB for storing the virtual-to-physical mapping, but we still need to hit memory to get the contents stored at that location ( physical address ). If our application involves a lot of memory accesses ( which is generally the case ), this slows down our processing. To save these memory accesses we have CPU caches, which cache the contents at those physical addresses and spare us the costly trips to RAM.

For a better understanding of these CPU caches, read this.

By this point, I hope you have a decent understanding of CPU architecture in general. Before going into the Spectre and Meltdown vulnerabilities themselves, let’s understand the building blocks of these attacks.

Flush+Reload Attack

In this attack, the attacker exploits cache behavior to detect the victim process's accesses to memory. The L3 cache is shared among processes running on different cores, so with the help of this attack we can monitor which instructions the victim process executes.

Question: But how can we exploit the cache behavior?

Answer: We already know that if a memory location is cached in the L3 cache, a tremendous number of CPU cycles is saved, which essentially means that an uncached read ( from RAM ) takes much longer than a cached read ( from the CPU cache, L3 in this case ).

So if an attacker wants to figure out whether a memory location has been accessed by a process running on a different core, the attacker just needs to measure the time it takes to access that particular memory location. If the time is on the higher side ( telling the two cases apart requires training a simple classifier ), the location has not been read by the victim; if the time is on the lower side, we know for sure that the victim has recently accessed that memory location. One prerequisite: before starting the attack, we need to be sure the memory location ( cache line ) has been flushed out of the cache.

So with the help of this attack, the attacker can figure out what the victim is doing and which segment of code it is executing.

One other interesting observation, made in this paper, is that we can also figure out the data on which the victim operates. This is a bit non-trivial in itself, so let’s understand it with the help of an example.

Victim Process A

for (int i = 0; i < PUBLICLY_NOT_KNOWN; i++) {
    performFunction(PUBLICLY_KNOWN);
}
  • We have a number PUBLICLY_KNOWN
  • We perform a certain operation, performFunction, on this number
  • This function is called PUBLICLY_NOT_KNOWN times
  • The attacker's motive is to find the number PUBLICLY_NOT_KNOWN

Attacker Process B

  • We already know PUBLICLY_KNOWN
  • We already know the memory address of the performFunction function
  • We also know that this function takes around t ms to run
  • Let’s start by flushing the cache line for the memory location of performFunction
  • Also, let’s initialise PUBLICLY_NOT_KNOWN = 0
  • Now, after every t ms, we check whether this function has been accessed. This is done with the above-mentioned Flush+Reload technique.
  • If it has, we increment the current value of PUBLICLY_NOT_KNOWN by 1. If not, it means the loop has terminated.
  • At the end of this, we know the value of PUBLICLY_NOT_KNOWN

So this Flush+Reload attack can be used by the attacker to extract secrets from the victim process's memory. The above-mentioned methodology has been used to recover an RSA decryption key in much the same way as we have explained. See this for more details.

Speculative Execution

From Wikipedia

Speculative execution is an optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing the work after it is known that it is needed. If it turns out the work was not needed after all, most changes made by the work are reverted and the results are ignored.

Earlier processors performed in-order processing of instructions, i.e. executing instructions one by one. With speculative execution, a processor can make predictions about the control flow of the program and pipeline the appropriate instructions ahead of time. Speculative execution has increased the performance of modern processors tremendously.

To understand speculative execution, let’s take this simple example:

if (x < p_size) {  // first instruction: bounds check
  int b = p[x];    // second instruction
} else {
  int b = 1;       // third instruction
}

In this code, we check the bounds for x, and if x is within bounds, we access the data at offset x in the array p.

In-order processing would mean that every single time, x is first checked against the bounds, and only after that check do we fetch the memory at offset x. In other words, we would execute the instructions serially, one after the other.

But with speculative execution, we need not wait for the first instruction, i.e. the bounds check, to complete before starting any further instructions. Speculative execution, together with the branch predictor, says:

“Most of the time during this particular code execution, branch 1 is taken, so this time let me take branch 1 as well.”

So speculative execution, with the help of the branch predictor, takes branch 1, i.e. the second instruction, and reads the memory at offset x in the array p.

Note: We might also end up in situations where the processor has made a wrong speculation and executed the wrong branch. In those cases, the speculatively executed instructions are squashed ( discarded rather than retired ), and the architectural state ( registers ) associated with them is rolled back.

