Defcon Quals writeup for byhd (reversing a Huffman Tree)

This is my writeup for byhd, a 2-point challenge from the Defcon Qualifier CTF. You can get the files, including my annotated assembly file, here. This is my second (and final) writeup for the Defcon Qualifiers, you can find the writeup for shitsco here.

This was a reverse engineering challenge where code would be constructed based on your input, then executed. You had to figure out the exact right input to generate a payload that would give you access to the server (so, in a way, there was some exploitation involved).

Up till now, cnot from PlaidCTF has probably been my favourite hardcore reversing level, but I think this level has taken over. It was super fun!

The setup

When you fire up byhd, it listens on port 9730 and waits for connections:

$ strace ./byhd
[...]
bind(3, {sa_family=AF_INET, sin_port=htons(9730), sin_addr=inet_addr("192.168.1.201")}, 16) = 0
listen(3, 20)                           = 0
accept(3,

When you connect, it forks off a new process, and therefore we have to fix it as described in an old post I wrote to ensure everything stays in one process (otherwise, you’re gonna have a Bad Time). You also have to add a user and group and such, and you may need to run it as root.

After I fix that, it reads from the socket properly! But when I send data to it, it just disconnects me:

$ nc 192.168.1.103 9730 | hexdump -C
hello
00000000  ff ff ff ff                                       |....|

Because it’s so, so common in protocols, I tried to prefix a 4-byte length to my string:

$ echo -ne '\x04\x00\x00\x00\x41\x41\x41\x41' | nc -vv 192.168.1.103 9730
192.168.1.103: inverse host lookup failed:
(UNKNOWN) [192.168.1.103] 9730 (?) open
 sent 8, rcvd 0

No response this time? On the server side, we can see why:

# ./byhd-fixed
Segmentation fault

The crash is really weird, too:

# gdb -q ./byhd-fixed
(gdb) run

Program received signal SIGBUS, Bus error.
0x0000000000401cc1 in ?? ()
(gdb) x/i $rip
=> 0x401cc1:    call   0x4010b0 <munmap@plt>

SIGBUS at a call? Wat? Instead of AAAA, let’s send it \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0:

$ echo -ne '\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' | nc -vv 192.168.1.103 9730

Which results in:

(gdb) run

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7ff9000 in ?? ()
(gdb) x/8i $rip
=> 0x7ffff7ff9000:      push   rax
   0x7ffff7ff9001:      nop
   0x7ffff7ff9002:      push   rdi
   0x7ffff7ff9004:      (bad)
   0x7ffff7ff9005:      jg     0x7ffff7ff9007
   0x7ffff7ff9007:      add    dh,bl
   0x7ffff7ff9009:      fs
   0x7ffff7ff900a:      fcomip st,st(7)

(gdb) x/32xb $rip
0x7ffff7ff9000: 0x50    0x90    0xff    0xf7    0xff    0x7f    0x00    0x00
0x7ffff7ff9008: 0xde    0x64    0xdf    0xf7    0xff    0x7f    0x00    0x00
0x7ffff7ff9010: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7ffff7ff9018: 0xc0    0x53    0xdf    0xf7    0xff    0x7f    0x00    0x00

I don’t know what’s going on, but that sure doesn’t look like code it’s trying to run!

That’s enough of messing around, we’re gonna have to do a deep dive to figure out what’s happening…

Reading itself

After it forks off a thread, and before it reads anything from the socket, the process opens itself and reads all the bytes:

# strace ./byhd-fixed
execve("./byhd-fixed", ["./byhd-fixed"], [/* 21 vars */]) = 0
brk(0)                                  = 0x1d66000
[...]
bind(3, {sa_family=AF_INET, sin_port=htons(9730), sin_addr=inet_addr("192.168.1.103")}, 16) = 0
listen(3, 20)                           = 0
accept(3, {sa_family=AF_INET, sin_port=htons(39624), sin_addr=inet_addr("192.168.1.201")}, [16]) = 4
[...]
chdir("/home/byhd")                     = 0
stat("./byhd-fixed", {st_mode=S_IFREG|0755, st_size=18896, ...}) = 0
open("./byhd-fixed", O_RDONLY)          = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\220\21@\0\0\0\0\0"..., 18896) = 18896
close(3)                                = 0
read(4,
[...]

That’s interesting! First it accepts a connection, then it gets the size of itself, opens itself, and reads the whole thing. Then, finally, it tries to read from the socket it opened.

My first hunch was that it’s some anti-tampering code, which isn’t 100% wrong (nor was it 100% right). I threw together a quick wrapper in C to fix things:

#include <unistd.h>

int main(int argc, const char *argv[])
{
  execlp("/home/byhd/byhd-fixed", "/home/byhd/byhd", NULL);

  printf("Fail :(\n");
  return 0;
}

Note that I’m running “/home/byhd/fixed/byhd”, but setting argv[0] to “/home/byhd/byhd”. You can verify with strace that it indeed opens the ‘real’ executable and not the modified one:

# strace ./wrapper
execve("./wrapper", ["./wrapper"], [/* 21 vars */]) = 0
brk(0)
[...]
execve("/home/byhd/byhd-fixed", ["/home/byhd/byhd"], [/* 21 vars */]) = 0
brk(0)                                  = 0x2007000
[...]
chdir("/home/byhd")                     = 0
stat("/home/byhd/byhd", {st_mode=S_IFREG|0644, st_size=18896, ...}) = 0
open("/home/byhd/byhd", O_RDONLY)       = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\220\21@\0\0\0\0\0"..., 18896) = 18896
close(3)                                = 0
read(4, "\20\0\0\0", 4)                 = 4
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4fccf13000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)

Histogram

For the next little while, I jumped around quite a bit because I wasn’t sure exactly what I was trying to do. I eventually decided to start from the top; that is, the code that runs right after it reads the file.

Before I explain what’s happening, let’s take a look at some assembly! If you’re interested in the actual code, have a look at this; otherwise, just skip past to the description. I re-implemented this in a few lines of Ruby.

Note that I’ve removed a bunch of in-between code, including register movement and error handling, to just show the useful parts:

; The function definition
.text:0040173B ; int __cdecl generate_histogram(char *itself, size_t length)
.text:0040173B

; Allocate 0x400 bytes
.text:0040175F                 mov     edi, 400h       ; size
.text:00401764                 call    _malloc         ; Allocate 0x400 bytes
.text:00401769                 mov     [rbp+allocated], rax

; In this loop, ebx is the loop counter
.text:0040178C top_loop:                               ; CODE XREF: generate_histogram+7Cj

; Point 'rax' to the current byte (the string plus the index)
.text:0040178C                 mov     edx, ebx        ; edx = current iteration
.text:0040178E                 mov     rax, [rbp+itself]
.text:00401792                 add     rax, rdx        ; Go to the current offset in the file

; Read the current byte
.text:00401795                 movzx   eax, byte ptr [rax] ; Read the current byte

; Multiply it by 4
.text:0040179B                 lea     rdx, [rax*4+0]  ; rdx = current_byte * 4

; Set rax to that index in the array
.text:004017A3                 mov     rax, [rbp+allocated] ; rax = allocated buffer
.text:004017A7                 add     rax, rdx        ; Go to that offset

; Increment the index
.text:004017AA                 mov     edx, [rax]
.text:004017AC                 add     edx, 1
.text:004017AF                 mov     [rax], edx      ; Increment that offset

; Increment the loop counter
.text:004017B1                 add     ebx, 1
.text:004017B4
.text:004017B4 bottom_loop:                            ; CODE XREF: generate_histogram+4Fj

; Loop till we're at the end, then return
.text:004017B4                 cmp     ebx, [rbp+itself_length]
.text:004017B7                 jb      short top_loop
.text:004017C9                 retn

It turns out, the code generates a histogram based on the executable file. In other words, it basically does this:

data = File.new(ARGV[0]).read

histogram = {}
histogram.default = 0

data.each_byte() do |b|
  histogram[b.chr] = histogram[b.chr] + 1
end
puts(histogram)

Which might look like:

$ ruby histogram.rb /etc/passwd
{"r"=>73, "o"=>119, "t"=>54, ":"=>204, "x"=>38, "0"=>39, "/"=>141, "b"=>78, "i"=>80, "n"=>107, "a"=>105, "s"=>78, "h"=>22, "\n"=>34, "1"=>41, "f"=>27, "l"=>71, "e"=>85, "d"=>81, "m"=>35, "2"=>18, "3"=>16, "4"=>13, "v"=>21, "p"=>58, "7"=>3, "y"=>31, "c"=>13, "5"=>12, "u"=>33, "w"=>9, "6"=>10, "9"=>3, "g"=>39, " "=>68, "8"=>6, "+"=>2, "q"=>2, "k"=>7, "z"=>4, "-"=>1}

Running that code on the actual binary, it works great; but it’s a lot longer and much uglier output, so I didn’t want to include it here. Feel free to try :)

I was still working off the assumption this was all anti-tampering code. As I said earlier, that was only partly right…

Building a tree

This is where it started to get weird and interesting. I could understand the histogram being generated, but then it stated adding and removing stuff from the array! What was happening!?

Once again, here’s the actual annotated code, with the error handling and stuff removed. Feel free to read or skip it!

.text:00402127 enter_second_loop:                      ; CODE XREF: generate_block_tree+2AFj

; Remove and store the smallest entry
.text:00402127                 mov     rax, [rbp+entry_list] ; An array of 256 8-byte values, each of which points to 20 bytes of allocated memory
.text:0040212B                 mov     rdi, rax        ; allocated_block
.text:0040212E                 call    get_smallest_entry ; Removes the smallest histogram entry from the list and returns it
.text:00402133                 mov     [rbp+smallest_entry], rax

; Remove and store the next smallest entry
.text:00402137                 mov     rax, [rbp+entry_list] ; An array of 256 8-byte values, each of which points to 20 bytes of allocated memory
.text:0040213B                 mov     rdi, rax        ; allocated_block
.text:0040213E                 call    get_smallest_entry ; Removes the smallest histogram entry from the list and returns it
.text:00402143                 mov     [rbp+next_smallest_entry], rax

; Allocate memory for a new entry
.text:00402165                 mov     edi, 20h        ; size
.text:0040216A                 call    _malloc         ; Allocate space for a new entry
.text:0040216F                 mov     [rbp+new_entry], rax

; Store the smallest entry in the 'left' branch
.text:004021B2                 mov     rax, [rbp+new_entry]
.text:004021B6                 mov     rdx, [rbp+smallest_entry]
.text:004021BA                 mov     [rax+entry_struct.left_histogram], rdx

; Store the next-smallest entry in the 'right' branch
.text:004021BD                 mov     rax, [rbp+new_entry]
.text:004021C1                 mov     rdx, [rbp+next_smallest_entry]
.text:004021C5                 mov     [rax+entry_struct.right_histogram], rdx

; Get the sum of the two entry values (the character counts, if they're leaf nodes)
.text:004021E2                 mov     rax, [rbp+smallest_entry] ; ** This section puts the two smallest histograms into a new field, then puts their sum into the 'value' entry
.text:004021E6                 mov     edx, [rax+entry_struct.histogram_entry]
.text:004021E9                 mov     rax, [rbp+next_smallest_entry]
.text:004021ED                 mov     eax, [rax+entry_struct.histogram_entry]
.text:004021F0                 add     edx, eax

; Store the sum of the two child nodes in the new node's entry
.text:004021F2                 mov     rax, [rbp+new_entry]
.text:004021F6                 mov     [rax+entry_struct.histogram_entry], edx

; Store the new node at the end of the list
.text:00402201                 mov     rsi, rdx        ; new_entry
.text:00402204                 mov     rdi, rax        ; entry_list
.text:00402207                 call    add_entry_to_end_maybe

What I could see in the code is that it removed the two smallest entries, created a new entry that points to them, and put it at the end of the list. Then it would loop until there was only one entry. Typing it like that makes it sound really obvious to me, but digging through the code was remarkably difficult. In fact, it took me a long, long time to realize that it was removing the smallest entries and adding a new entry. I totally misunderstood the code…

Anyway, let’s look at an example: you have the string ‘eebddaafbcdfdbaee’. The histogram looks like:

{"a"=>3, "b"=>3, "c"=>1, "d"=>4, "e"=>4, "f"=>2}

We’ll re-write it, for simplicity, like this:

a[3]  b[3]  c[1]  d[4]  e[4]  f[2]

First, it removes the smallest two entries from the list, c[1] and f[2]:

a[3]  b[3]  d[4]  e[4]

And replaces them with another node that contains their combined value, with the two removed values underneath:

a[3]  b[3]  d[4]  e[4]  [3]
                        / \
                     c[1] f[2]

Then the next two smallest values are removed and combined under a single parent:

d[4]  e[4]  [3]       [6]
            / \       / \
         c[1] f[2] a[3] b[3]

Then remove the smallest two again. This step is interesting because one of the smallest nodes is a non-leaf:

e[4]   [6]
       / \
    a[3] b[3]

Add them back under a common node at the end:

e[4]   [6]       [7]
       / \       / \
    a[3] b[3]  [3] d[4]
               / \
             c[1] f[2]

Then the [4] and [6] are similarly combined:

     [10]         [7]
     /  \         / \
  e[4]  [6]     [3] d[4]
        / \       / \
     a[3] b[3]  c[1] f[2]

And finally, we only have a single parent node:

            [17]
          /     \
     [10]         [7]
     /  \         / \
  e[4]  [6]     [3] d[4]
        / \      /    \
     a[3] b[3] c[1]   f[2]

And there you have it! A tree!

It’s really funny: when I was working on this, I had the feeling at the back of my mind that this was a real tree algorithm, but I read a bunch on Wikipedia and couldn’t find one that matched, so I gave up and just continued. Today, my co-worker mentioned “oh, like Huffman encoding!” and I said “nah, there’s no compression involved”.

But, as soon as I built that tree by hand, I realized that this absolutely IS a Huffman Tree! And checking Wikipedia, I can confirm that it is. That would have made the last part a whole lot easier…

Using the incoming data

Now, what’s going on with the data I send in? I already confirmed that it’s a 4-byte length value followed by some code, and the program crashes in different and creative ways depending on what code I send it. Now what?

If I’d recognized the Huffman Tree, I could have made a pretty good guess: that we’re sending huffman-compressed data. And it would have been right, too! Unfortunately, I missed the obvious hints…

Anyway, I don’t really want to dwell too much on this part, since it’s conceptually really simple. There was a loop that would go through the data string you sent it, and touch each byte you sent 8 times. Hmm. Then there was some bitwise arithmetic that would do some shifting and ANDing of each byte. You’d think I would have figured out by that that it’s breaking the string into bits, but I didn’t. What made the light bulb go on was this:

.text:00401315                 and     eax, 1
.text:00401318                 test    eax, eax
.text:0040131A                 jz      short use_left
.text:0040131C                 mov     rax, [rbp+current_node] ; starts at the root
.text:00401320                 mov     rax, [rax+entry_struct.right_histogram]
.text:00401324                 mov     [rbp+current_node], rax ; starts at the root
.text:00401328                 jmp     short restart_loop
.text:0040132A ; ---------------------------------------------------------------------------
.text:0040132A
.text:0040132A use_left:                               ; CODE XREF: walk_tree_to_get_value+9Ej
.text:0040132A                 mov     rax, [rbp+current_node] ; starts at the root
.text:0040132E                 mov     rax, [rax+entry_struct.left_histogram]
.text:00401331                 mov     [rbp+current_node], rax ; starts at the root
.text:00401335
.text:00401335 restart_loop:                           ; CODE XREF: walk_tree_to_get_value+ACj
.text:00401335                 add     [rbp+current_index_maybe], 1

Basically, based on the right-most bit in eax, it makes a decision to either jump to the left node or the right node. Then, when it gets to the leaf…

.text:00401363                 mov     rax, [rbp+current_node] ; starts at the root
.text:00401367                 movzx   eax, [rax+entry_struct.byte_value] ; Return the byte value at this leaf
.text:00401374                 pop     rbx
.text:00401375                 pop     rbp
.text:00401376                 retn

…it returns the byte value in that leaf. If you read up on Huffman Trees, you’ll see that that’s exactly how they work.

The calling function adds that byte value to the end of an executable memory segment. When we’re done reading the tree, the list of leaf-node bytes we’ve found are executed.

To summarize:

Build a histogram from itself
Convert that histogram into a Huffman Tree
Read data from the socket
Convert that data to bits, and use those bits to navigate the tree

We’re almost done but… how do we get that tree!?

Putting it all together

All right, we understand the code, now we have to write an exploit. What do we do!?

The “right” way to do this is to build the same tree the same way, and to walk it from the node to the root to get he values. But this is a CTF and I want it to work on the first try, damnit! None of that “reconstructing the exact algorithm” nonsense! So I decided to steal their memory. :)

So, the first thing I did was put a breakpoint immediately after the tree was built to find the address of the root. On my box, the address of the root node, after the tree was built, happened to be 0x60e050. Then I wanted to dump the heap:

(gdb) x/1000000xb 0x603000
0x603000:       0x1b    0x0c    0x07    0x08    0x90    0x01    0x07    0x10
0x603008:       0x14    0x00    0x00    0x00    0x1c    0x00    0x00    0x00
0x603010:       0x80    0xe1    0xff    0xff    0x2a    0x00    0x00    0x00
0x603018:       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x603020:       0x14    0x00    0x00    0x00    0x00    0x00    0x00    0x00
[...]

I determined the starting address by trial and error - I wanted it to be as small as possible, but I needed the memory between it and the root node to be contiguous. 0x602000 wasn’t allocated, but 0x603000 worked fine.

I dumped that to a file from gdb, then processed the file with Ruby:

# Split the file into lines
file.split(/\n/).each do |line|
  if(line =~ /Cannot/)
    break
  end

  # Break off the address and data
  addr, data = line.split(/:\s/, 2)
  addr = addr.to_i(16)

  # Remove the crud
  data.gsub!(/0x/, '')
  data.gsub!(/[^a-fA-F0-9]/, '')

  # Get both the qword (64-bit value) and pair of dwords (32-bit values) on that line
  qword = [data].pack("H*").unpack("Q").pop
  dword1 = qword & 0x0FFFFFFFF
  dword2 = (qword >> 32) & 0x0FFFFFFFF

  # Store them, indexed by the address
  @@qwords[addr] = qword
  @@dwords[addr] = dword1
  @@dwords[addr+4] = dword2
end

Now I have a nice address->value mapping for the entire dumped memory. So I can now define a function to read a node:

def get_node(addr)
  node = {}
  node[:addr]  = addr
  node[:left]  = @@qwords[addr + 0x00]
  node[:right] = @@qwords[addr + 0x08]
  node[:value] = @@dwords[addr + 0x10]
  node[:count] = @@dwords[addr + 0x1c]

  return node
end

And, starting at the root, re-build the whole tree recursively:

@@seqs = {}
def walk_nodes(addr, seq = "")
  node = get_node(addr)
  #print_node(node)

  if(node[:left] != 0)
    walk_nodes(node[:left], seq + "0")
  end

  if(node[:right] != 0)
    walk_nodes(node[:right], seq + "1")
  end

  if(node[:left] == 0)
    @@seqs[node[:value]] = seq
  end
end

node = get_node(ROOT)
walk_nodes(ROOT)

In the end, @@seqs looks like this:

0 => 0
255 => 1000
215 => 100100000000
177 => 100100000001
30 => 10010000001
56 => 1001000001
220 => 100100001
114 => 10010001
7 => 1001001
133 => 10010100
97 => 10010101
...

Now that I have a mapping, I can do a bunch of lookups for the shellcode of my choice:

tree_code = ""

SHELLCODE.split(//).each do |c|
  tree_code += @@seqs[c.ord]
end

while((tree_code.length % 8) != 0) do
  tree_code += '0'
end
tree_code = [tree_code].pack("B*")
tree_code = [tree_code.length].pack("I") + tree_code
tree_code.split(//).each do |b|
  print('\x%02x' % b.ord)
end
puts()

And I’m done! I have a compressed version of my shellcode that I can feed directly into the program:

\x89\x00\x00\x00\xd7\x5f\x67\xae\xbe\x35\xd7\xdf\x0d\x75\xfe\x09\x7d\x75\xf6\x75\x13\xba\xc3\x51\x33\xd4\x55\x44\xfc\xea\x91\x51\x3d\x0d\x83\xc3\xf5\xd1\x7e\xd9\xeb\xaf\xbe\x17\xd7\x5f\xe0\x98\xb3\x7f\x0f\x8e\xab\xbb\x2d\x9a\xfb\xaa\xee\xea\x0c\xfc\xd7\xdd\x57\xc7\xd5\x48\x7e\x9e\x03\xaf\x6c\x0a\x89\xe2\xaa\x46\x2d\x76\xc3\x51\x35\x36\xc1\xe1\xfa\xf5\xd7\xdf\x0a\x89\xc0\xa8\xad\x47\x55\x51\x33\x16\xc1\xe1\xfa\xf9\x6f\x86\xba\xf8\xae\x22\xb8\x8a\x8a\xaa\x46\xb0\x7c\x0b\x81\x6d\x39\xfe\x5e\x05\x47\x57\xad\xf3\x20\x78\xeb\x88\xdf\x6c\x35\x13\xc0\x6c\x1e\x1f\xac

Conclusion: my 118 bytes of shellcode compresses down to a clean 142 bytes. :)

Summary

So, once you figure it out, this level is actually pretty straight forward!

Basically, read its own binary, build a Huffman Tree, then use the user’s input to walk that Huffman Tree to build the executable code to run. Or, in other words, decompress and run the shellcode that we send!