While reverse engineering a Linux binary, I ran into a fairly common situation:
I wanted to understand how a decompression function works, but I didn’t have
compressed data to test with. In this blog, I’ll look at how we can
manipulate the instruction pointer in the GNU debugger (gdb) to trick the
software into generating test data for us!
I posted this on Mastodon a while back, but I cleaned it up and expanded it a bit to make it a full blog post.
I did this work in the context of my research team at Rapid7 - you can check out all of our work on the Rapid7 Research Blog (secret rss link!)!
Anyway, while working on an application, I ran into a function called
LZ4_decompress_safe. I wanted to learn how it worked, but EVERYTHING I tried
to decompress returned an error - even test data generated by a legitimate LZ4
library! I’m not sure why it didn’t work - maybe they modified it? Maybe it’s a
different version? Maybe the lz4 CLI tool has more or less file headers?
Dunno! But let’s make the application create its own test data!
I know (from Googling) that the signatures for the decompress and compress functions are:
    int __fastcall LZ4_decompress_safe(const char *src, char *dst, int compressedSize, int dstCapacity)
    int __fastcall LZ4_compress(const char *src, char *dst, int srcSize, int dstCapacity)
The calling code looks like:
    mov ecx, dword ptr [rsp+80h+capacity] ; dstCapacity
    mov edx, dword ptr [rsp+88h+size] ; compressedSize
    mov rsi, cs:buffer ; dst
    mov rdi, [rsp+88h+out_buffer] ; src
    call LZ4_decompress_safe ; I can't figure out how to get this to work :(
The functions have the exact same signature, which is super handy!
I put a breakpoint on the function LZ4_decompress_safe, which will stop
execution when the application attempts to decompress data:
    (gdb) b *LZ4_decompress_safe
    Breakpoint 4 at 0x40bc40
    (gdb) run
    Starting program: [...]
Then I sent a message to the server with the “this message is compressed!” flag
set, but with uncompressed data (specifically, the contents of /etc/passwd,
my go-to for longer test data). So basically, the server will think the data is
compressed, but it’s actually not.
When the service tries to decompress the packet, it’ll hit the breakpoint:
    (gdb) run
    Starting program: [...]
    Breakpoint 4, 0x000000000040bc40 in LZ4_decompress_safe ()
The calling convention on x64 Linux means that the first three arguments are
placed in the rdi, rsi, and rdx registers. We want the dst buffer, which is
the second argument, so we print rsi:
    (gdb) print/x $rsi
    $63 = 0x6820f0
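To make the register lookup concrete, here is a small Python sketch of how the System V AMD64 convention assigns integer and pointer arguments to registers. The helper name `arg_registers` is made up for illustration, and it only models the first six integer/pointer arguments (no floats, no stack-passed arguments).

```python
# Integer/pointer argument registers on x86-64 Linux (System V AMD64 ABI),
# in the order the arguments appear in the prototype.
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def arg_registers(param_names):
    """Map each parameter name to the register that carries it.

    Hypothetical helper: only valid for up to six integer/pointer arguments.
    """
    if len(param_names) > len(SYSV_INT_ARG_REGS):
        raise ValueError("remaining arguments would be passed on the stack")
    return dict(zip(param_names, SYSV_INT_ARG_REGS))


# Parameter names taken from the LZ4_decompress_safe prototype above.
regs = arg_registers(["src", "dst", "compressedSize", "dstCapacity"])
print(regs["dst"])
```

For the prototype in this post, that puts src in rdi and dst in rsi, which matches the comments in the disassembly.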
Then we can change the instruction pointer (rip), placing it at the beginning
of the LZ4_compress function instead of LZ4_decompress_safe, which means
that once we resume execution, it’ll execute the compression function instead
of the decompression function! Since they have the same signature, which means
they expect the exact same set of arguments, the application can’t tell the
difference and will just continue on from the rip you set!
Here’s the command to change the instruction pointer:
(gdb) set $rip=LZ4_compress
Then we can let ‘er run (the finish command in gdb runs the current
function - which is now LZ4_compress - until it returns):
    (gdb) finish
    Run till exit from #0 0x0000000000409360 in LZ4_compress ()
    0x000000000040778b in [...] ()
Then we can verify it worked (by printing the return value, which should be the size of the compressed data):
    (gdb) print/x $rax
    $62 = 0x62b
    (gdb) print/d $rax
    $63 = 1579
The function successfully compressed my /etc/passwd file to 1579 bytes! Now
we can dump the buffer (which was at the address we saw earlier - 0x6820f0)
to a file:
(gdb) dump memory /tmp/compressed.bin 0x6820f0 0x6820f0+0x62b
The arguments to dump memory are the file to dump it to, the starting address,
and the ending address - all of which we know!
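The ending address is just the dst pointer plus the size that came back in rax. As a quick check of that arithmetic, here is a minimal Python sketch that builds the same command from the two values; the function name `dump_memory_command` is hypothetical, and it only formats a string rather than talking to gdb.

```python
def dump_memory_command(path, start, length):
    """Build a gdb 'dump memory' command covering `length` bytes from `start`.

    Hypothetical helper for illustration; gdb itself is not involved here.
    """
    end = start + length
    return f"dump memory {path} {start:#x} {end:#x}"


# Values from the session above: dst was 0x6820f0, LZ4_compress returned 0x62b.
cmd = dump_memory_command("/tmp/compressed.bin", 0x6820F0, 0x62B)
print(cmd)
```

gdb evaluates the expression 0x6820f0+0x62b on its own, so writing the sum out by hand is optional; the precomputed form is mostly handy when scripting.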
Then we can verify it wrote data to the file:
    $ file /tmp/compressed.bin
    /tmp/compressed.bin: data
    $ hexdump -C /tmp/compressed.bin
    00000000 b1 72 6f 6f 74 3a 78 3a 30 3a 30 3a 0b 00 12 2f |.root:x:0:0:.../|
    00000010 06 00 f0 04 62 69 6e 2f 62 61 73 68 0a 62 69 6e |....bin/bash.bin|
The file command doesn’t recognize the format, which supports my idea that
this is some sorta modified LZ4 function, but the file does contain
recognizable chunks of /etc/passwd, as the hexdump shows.
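One way to sanity-check the dump without a working decompressor is to walk the first bytes by hand. The Python sketch below is not part of the original session. It assumes the standard LZ4 block layout (a token byte, literals, a 2-byte little-endian offset, then a match with a minimum length of 4); whether the application follows that layout for the whole buffer is an assumption. The function names are made up for illustration, and the decoder simply stops when the 32 bytes from the hexdump run out.

```python
# The 32 bytes visible in the hexdump above.
HEXDUMP_PREFIX = bytes.fromhex(
    "b1 72 6f 6f 74 3a 78 3a 30 3a 30 3a 0b 00 12 2f"
    "06 00 f0 04 62 69 6e 2f 62 61 73 68 0a 62 69 6e"
)


def read_extended_length(data, i, base):
    """A nibble of 15 means extra length bytes follow until one is not 255."""
    length = base
    if base == 15:
        while i < len(data):
            extra = data[i]
            i += 1
            length += extra
            if extra != 255:
                break
    return length, i


def lz4_block_decode_prefix(data):
    """Decode LZ4 block sequences until the (possibly truncated) input ends."""
    out = bytearray()
    i = 0
    while i < len(data):
        token = data[i]
        i += 1
        # High nibble of the token: number of literal bytes to copy verbatim.
        lit_len, i = read_extended_length(data, i, token >> 4)
        out += data[i:i + lit_len]  # slicing tolerates a truncated tail
        i += lit_len
        if i + 2 > len(data):
            break  # no room for a match offset: input is exhausted
        offset = data[i] | (data[i + 1] << 8)
        i += 2
        # Low nibble of the token: match length, stored minus 4.
        match_len, i = read_extended_length(data, i, token & 0x0F)
        match_len += 4
        if offset == 0 or offset > len(out):
            raise ValueError("match offset points outside the output")
        start = len(out) - offset
        for k in range(match_len):  # byte by byte, since matches may overlap
            out.append(out[start + k])
    return bytes(out)


decoded = lz4_block_decode_prefix(HEXDUMP_PREFIX)
print(decoded)
```

For those 32 bytes the sketch produces the beginning of a root entry, which is at least consistent with the buffer holding a compressed copy of /etc/passwd. It says nothing about why the earlier test data was rejected.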
I sent that compressed file to the application as a compressed packet, and it successfully decompressed, allowing me to continue my testing! And that was that. We could certainly take this a step further and call that function directly to generate data on the fly, but thankfully, one set of test data was all I actually needed.
It was mostly luck that both functions have the same signature, but with a few tweaks we could have changed around the arguments to fit whatever prototype we needed. Thankfully, we didn’t have to. :)
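If the prototypes had differed only in argument order, the tweak could have been done with a handful of set commands before resuming. Here is a hedged Python sketch that generates them; `remap_commands` and the `$saved_*` convenience-variable names are invented for illustration, and it assumes every argument is an integer or pointer passed in a register.

```python
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def remap_commands(current_order, wanted_order):
    """Return gdb commands that reorder register arguments.

    current_order: parameter names in the order the registers hold them now.
    wanted_order:  the order the replacement function expects.
    Hypothetical helper; assumes integer/pointer arguments only.
    """
    missing = [name for name in wanted_order if name not in current_order]
    if missing:
        raise ValueError(f"no current value for: {missing}")
    cmds = []
    # Snapshot every register first so later writes can't clobber a source.
    for name, reg in zip(current_order, SYSV_INT_ARG_REGS):
        cmds.append(f"set $saved_{name} = ${reg}")
    for name, reg in zip(wanted_order, SYSV_INT_ARG_REGS):
        cmds.append(f"set ${reg} = $saved_{name}")
    return cmds


# Example: a made-up replacement function that takes (dst, src) instead.
swap = remap_commands(["src", "dst"], ["dst", "src"])
print("\n".join(swap))
```

Saving everything into convenience variables first costs a few extra lines but avoids having to think about the order of the assignments.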