GDB Tricks: Tricking the Application into Generating Test Data

While reverse engineering a Linux binary, I ran into a fairly common situation: I wanted to understand how a decompression function works, but I didn’t have compressed data to test with. In this blog, I’ll look at how we can manipulate the instruction pointer in the GNU debugger (gdb) to trick the software into generating test data for us!

I posted this on Mastodon a while back, but I’ve cleaned it up and expanded it a bit to make it a full blog post.

I did this work in the context of my research team at Rapid7 - you can check out all of our work on the Rapid7 Research Blog (secret rss link!)!

Anyway, while working on an application, I ran into a function called LZ4_decompress_safe. I wanted to learn how it worked, but EVERYTHING I tried to decompress returned an error - even test data generated by a legitimate LZ4 library! I’m not sure why it didn’t work - maybe they modified it? Maybe it’s a different version? Maybe the lz4 CLI tool adds more (or fewer) file headers? Dunno! But let’s make the application create its own test data!

I know (from Googling) that the signatures for the decompress and compress functions are:

int __fastcall LZ4_decompress_safe(const char *src, char *dst, int compressedSize, int dstCapacity)
int __fastcall LZ4_compress(const char *src, char *dst, int srcSize, int dstCapacity)

The calling code looks like:

mov     ecx, dword ptr [rsp+80h+capacity] ; dstCapacity
mov     edx, dword ptr [rsp+88h+size] ; compressedSize
mov     rsi, cs:buffer ; dst
mov     rdi, [rsp+88h+out_buffer] ; src
call    LZ4_decompress_safe ; I can't figure out how to get this to work :(

The functions have the exact same signature, which is super handy!

I put a breakpoint on the function LZ4_decompress_safe, which will stop execution when the application attempts to decompress data:

(gdb) b *LZ4_decompress_safe
Breakpoint 4 at 0x40bc40

(gdb) run
Starting program: [...]

Then I sent a message to the server with the “this message is compressed!” flag set, but with uncompressed data (specifically, the contents of /etc/passwd - my go-to for longer test data). So basically, the server will think the data is compressed, but it’s actually not.

When the service tries to decompress the packet, it’ll hit the breakpoint:

(gdb) run
Starting program: [...]

Breakpoint 4, 0x000000000040bc40 in LZ4_decompress_safe ()

The calling convention on x64 Linux means that the first four integer or pointer arguments are placed in the rdi, rsi, rdx, and rcx registers, in that order. We want the dst buffer, which is the second argument, so we print out rsi:

(gdb) print/x $rsi
$63 = 0x6820f0
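
If it helps to see the register assignment spelled out, here is a minimal Python sketch of how the System V AMD64 convention hands out registers to the first six integer or pointer arguments. The helper name assign_registers is made up for illustration; the parameter names come from the two signatures above.

```python
# Integer/pointer argument registers, in order, for the System V AMD64 ABI
# (the calling convention used on x64 Linux).
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def assign_registers(params):
    """Map parameter names to the registers that carry them (hypothetical helper)."""
    if len(params) > len(SYSV_INT_ARG_REGS):
        raise ValueError("further arguments are passed on the stack")
    return dict(zip(params, SYSV_INT_ARG_REGS))


decompress_args = assign_registers(["src", "dst", "compressedSize", "dstCapacity"])
compress_args = assign_registers(["src", "dst", "srcSize", "dstCapacity"])

print(decompress_args["dst"])  # rsi
print(compress_args["dst"])    # rsi
```

The two int arguments show up as edx and ecx in the disassembly because those are the 32-bit halves of rdx and rcx.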

Then we can change the instruction pointer (rip), pointing it at the beginning of the LZ4_compress function instead of LZ4_decompress_safe. Because the breakpoint stopped us on the very first instruction of LZ4_decompress_safe, the arguments are still sitting in their registers and the return address is still on top of the stack. That means that once we resume execution, it’ll run the compression function instead of the decompression function, then return to the original caller. Since both functions expect the exact same set of arguments, the application can’t tell the difference!

Here’s the command to change the instruction pointer:

(gdb) set $rip=LZ4_compress

Then we can let ’er run (the finish command in gdb runs the current function - which is now LZ4_compress - until it returns):

(gdb) finish
Run till exit from #0  0x0000000000409360 in LZ4_compress ()
0x000000000040778b in [...] ()

Then we can verify it worked (by printing the return value, which should be the size of the compressed data):

(gdb) print/x $rax
$62 = 0x62b

(gdb) print/d $rax
$63 = 1579

The function successfully compressed my /etc/passwd file to 1579 bytes! Now we can dump the buffer (which was at the address we saw earlier - 0x6820f0) to a file:

(gdb) dump memory /tmp/compressed.bin 0x6820f0 0x6820f0+0x62b

The arguments to dump memory are the file to dump it to, the starting address, and the ending address - all of which we know!
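
As a quick sanity check on the numbers, here is the same arithmetic in Python. The variable names are mine; the values are the ones gdb printed above.

```python
dst = 0x6820F0   # dst buffer address, from print/x $rsi
size = 0x62B     # return value of LZ4_compress, from print/x $rax

end = dst + size  # the ending address handed to dump memory

print(size)      # 1579
print(hex(end))  # 0x68271b
print(f"dump memory /tmp/compressed.bin {dst:#x} {end:#x}")
```

gdb evaluates the 0x6820f0+0x62b expression itself, which is why the sum never has to be worked out by hand.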

Then we can verify it wrote data to the file:

$ file /tmp/compressed.bin 
/tmp/compressed.bin: data

$ hexdump -C /tmp/compressed.bin
00000000  b1 72 6f 6f 74 3a 78 3a  30 3a 30 3a 0b 00 12 2f  |.root:x:0:0:.../|
00000010  06 00 f0 04 62 69 6e 2f  62 61 73 68 0a 62 69 6e  |....bin/bash.bin|

The file command doesn’t recognize the format, which fits my suspicion that this isn’t quite what the lz4 CLI tool produces (some sorta modified LZ4 function, or maybe just missing the file headers), but the file does contain legit-looking data!
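
To poke at that data a bit, here is a minimal Python sketch that decodes the 32 bytes shown in the hexdump, assuming the standard LZ4 block layout: a token byte whose high nibble is the literal count and whose low nibble is the match length minus 4, then the literals, then a 2-byte little-endian offset. That layout is an assumption about this binary rather than something confirmed, and the function name is made up. The decoder stops at the first sequence that runs past the end of the bytes it was given.

```python
def decode_lz4_prefix(data: bytes) -> bytes:
    """Decode complete LZ4 block sequences, stopping at a truncated one."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        token = data[i]
        i += 1
        lit_len = token >> 4
        match_len = (token & 0x0F) + 4
        if lit_len == 15:  # literal length continues in extra bytes
            while True:
                if i >= n:
                    return bytes(out)
                extra = data[i]
                i += 1
                lit_len += extra
                if extra != 255:
                    break
        if i + lit_len > n:  # literals are cut off: stop here
            return bytes(out)
        literals = data[i:i + lit_len]
        i += lit_len
        if i == n:  # the last sequence of a block is literals only
            return bytes(out + literals)
        if i + 2 > n:
            return bytes(out)
        offset = data[i] | (data[i + 1] << 8)
        i += 2
        if match_len == 19:  # match length continues in extra bytes
            while True:
                if i >= n:
                    return bytes(out)
                extra = data[i]
                i += 1
                match_len += extra
                if extra != 255:
                    break
        if offset == 0 or offset > len(out) + len(literals):
            raise ValueError("offset points outside the decoded data")
        out += literals
        for _ in range(match_len):  # byte-by-byte so overlapping copies work
            out.append(out[-offset])
    return bytes(out)


# The two hexdump rows shown above.
sample = bytes.fromhex(
    "b1726f6f743a783a303a303a0b00122f"
    "0600f00462696e2f626173680a62696e"
)
decoded = decode_lz4_prefix(sample)
print(decoded)  # b'root:x:0:0:root:/root:/'
```

The first two sequences come out as a believable start of /etc/passwd, so at least these bytes are consistent with a plain, headerless LZ4 block.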

I sent that compressed file to the application as a compressed packet, and it successfully decompressed, allowing me to continue my testing! And that was that. We could certainly take this a step further and call that function directly to generate data on the fly, but thankfully, one set of test data was all I actually needed.
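
If this ever needed repeating, the steps above could be collected into a gdb command file. This is only a sketch: the Python helper below just builds the text of the commands used in this post, the helper name and the $dst convenience variable are my own inventions, and it assumes the fake-compressed message gets sent while gdb is waiting in run.

```python
def build_gdb_commands(decompress_sym, compress_sym, out_path):
    """Return gdb commands that redirect one call from decompress to compress.

    Hypothetical helper: it only generates text, it does not talk to gdb.
    """
    return [
        # Stop on the first instruction, before any prologue has run.
        f"break *{decompress_sym}",
        "run",
        # Remember the dst pointer (second argument) before it gets clobbered.
        "set $dst = $rsi",
        # Jump to the compressor with the same registers and return address.
        f"set $rip = {compress_sym}",
        "finish",
        # rax holds the compressed size, so dump exactly that many bytes.
        f"dump memory {out_path} $dst $dst+$rax",
    ]


script = "\n".join(
    build_gdb_commands("LZ4_decompress_safe", "LZ4_compress", "/tmp/compressed.bin")
)
print(script)
```

Saved to a file, the output could be loaded with gdb -x alongside however the target program normally gets started.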

It was mostly luck that both functions have the same signature, but with a few tweaks we could have changed around the arguments to fit whatever prototype we needed. Thankfully, we didn’t have to. :)

