Lately, I’ve been reverse engineering a reasonably complex network protocol, and I ran into a mystery - while the protocol is generally an unencrypted binary protocol, one of the messages was large and random. In an otherwise unencrypted protocol, why is one of the messages unreadable? It took me a few hours to accomplish what should have been a couple minutes of effort, and I wanted to share the trick I ultimately used!
I’m going to be intentionally vague on the software, and even modify a few things to make it harder to identify; I’ll probably publish a lot more on my work blog once I’m finished this project!
Let’s take a look at the binary protocol! If you’re familiar with protocols and just want to see the “good stuff”, feel free to skip down to the next header.
A “binary protocol” is a network protocol that uses unprintable characters (as opposed to a protocol like HTTP, which is something you can type on your keyboard). Often, you’ll use a tool like Wireshark to grab a sample of network traffic (a “packet capture”, or “PCAP”) and, if it’s not encrypted, you can start drawing conclusions about what the client and server expect. In a PCAP, you might see requests / responses that look like this:
Outbound:
08 00 00 00 2c 00 00 00                            ....,...

Inbound:
40 00 00 00 2c 00 00 00  55 53 52 53 05 00 00 00   @...,... USRS....
2c 00 00 00 02 00 00 00  55 38 f9 ed 21 59 47 f5   ,....... U8..!YG.
8f 9d 43 59 33 5c 2e 92  00 00 00 00 c4 54 f4 01   ..CY3\.. .....T..
8d b4 43 e7 9e 9f ea db  4e 76 1a 7a 00 00 00 00   ..C..... Nv.z....
I don’t want to get too buried in the weeds on how this protocol actually works, but when you work with unknown binary protocols a lot, certain things start to stand out.
First, let’s talk about endianness! The way integers are encoded varies from protocol to protocol, but a very common way to encode a 4-byte (32-bit) number is either big endian (00 00 00 08) or little endian (08 00 00 00). There are historic reasons both exist, and both are common to see, but based on the structure of those messages, we can guess that the first 4 bytes are either a big-endian integer with the value 0x08000000 or a little-endian integer with the value 0x00000008. The latter seems more likely, because that would make a great length value; speaking of lengths…
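To make the two readings concrete, here is a small Ruby sketch that decodes the same four bytes both ways. The variable names are made up for illustration; the bytes are the first four of the outbound message in the capture.

```ruby
# The first four bytes of the outbound message from the capture.
raw = [0x08, 0x00, 0x00, 0x00].pack('C*')

big_endian    = raw.unpack1('N') # 'N' = 32-bit unsigned, big-endian ("network" order)
little_endian = raw.unpack1('V') # 'V' = 32-bit unsigned, little-endian

puts format('big endian:    0x%08x', big_endian)
puts format('little endian: 0x%08x', little_endian)
```

A value of 8 is a believable size for an 8-byte message; a value over 134 million is not.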
Second, let’s talk about TCP - TCP is a streaming protocol, which means there is no guarantee that if you send 100 bytes, the receiver will receive those 100 bytes all at once. You ARE guaranteed that if you received data, it’ll be the correct bytes in the correct order; however, you might get 50 now and 50 later, or 99 now and 1 later, or maybe the next 50 bytes will be attached and you’ll get 150 bytes all at once. As a result, TCP-based services nearly always encode a length value near the start, allowing protocols to unambiguously receive complete messages.
Because of all that, one of the first things I do when approaching a new protocol is try to identify the length field. In this case, you’ll note that the packet that starts with 0x08 is 8 bytes long, and the packet that starts with 0x40 is 0x40 bytes long. That looks promising! And, as it turns out, is correct.
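A length prefix like this is what lets a receiver pull whole messages out of a byte stream. The sketch below is only an illustration, assuming the layout guessed above (a 4-byte little-endian length that counts the entire message, including the length field itself); read_exactly and read_message are hypothetical helper names, and a StringIO stands in for the socket.

```ruby
require 'stringio'

# Read exactly `count` bytes, or fail if the stream ends first.
# IO#read(n) keeps reading until it has n bytes or hits end-of-file.
def read_exactly(io, count)
  data = io.read(count)
  raise EOFError, 'stream closed mid-message' if data.nil? || data.bytesize < count
  data
end

# Read one complete message: 4-byte little-endian length first, then the
# remaining (length - 4) bytes.
def read_message(io)
  header = read_exactly(io, 4)
  length = header.unpack1('V')
  raise ArgumentError, "implausible length #{length}" if length < 4
  header + read_exactly(io, length - 4)
end

# Two 8-byte messages arriving back to back, as TCP is free to deliver them.
stream = StringIO.new(['080000002c000000' + '080000000c000000'].pack('H*'))
first  = read_message(stream)
second = read_message(stream)
puts first.unpack1('H*')
puts second.unpack1('H*')
```

The same helpers should work against a TCPSocket, since it also responds to read.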
Once we have a length field, the next thing to consider is how the client and server multiplex messages. In HTTP, there’s a URI, which tells the server where to direct the request. In a binary protocol, there isn’t typically a free-form string like that; instead, you commonly see a “message id” field (or packet id, or any number of other names). Typically, these will be near the start of a message, and typically, the structure of the remainder of the message will be based on that value. So finding similar looking messages with the same identifier near the start is one way to identify the message id. Another way - not necessarily super reliable, mind you - is to look for a request and response that appear to go together and that have the same integer near the start; often, responses have the same message id as requests. In the request/response above, the 0x2c value seems like a great candidate for a message id, and it is!
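As a rough sketch of that header layout, the following Ruby splits the two captured messages into length, message id, and body. The Message struct and parse_message helper are hypothetical names, and the layout (two little-endian 32-bit fields, then the body) is the guess described above rather than a documented format.

```ruby
# Hypothetical container for the guessed header fields plus the remaining bytes.
Message = Struct.new(:length, :id, :body)

def parse_message(raw)
  length, id = raw.unpack('VV') # two 32-bit little-endian integers
  Message.new(length, id, raw.byteslice(8, raw.bytesize - 8))
end

request_hex = '080000002c000000'
response_hex = %w[
  40 00 00 00 2c 00 00 00 55 53 52 53 05 00 00 00
  2c 00 00 00 02 00 00 00 55 38 f9 ed 21 59 47 f5
  8f 9d 43 59 33 5c 2e 92 00 00 00 00 c4 54 f4 01
  8d b4 43 e7 9e 9f ea db 4e 76 1a 7a 00 00 00 00
].join

request  = parse_message([request_hex].pack('H*'))
response = parse_message([response_hex].pack('H*'))

puts format('request:  length=0x%x id=0x%x', request.length, request.id)
puts format('response: length=0x%x id=0x%x body=%d bytes',
            response.length, response.id, response.body.bytesize)
```

Both messages carry the same id, and each length matches the total number of bytes in its message.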
If you have access to the binary - and it’ll be awfully hard to reverse an unknown protocol without one! - you can often validate that sort of guess by finding an enormous switch statement close to a recv - this is from disassembling the binary in IDA:
.text:00007FF77741B624 loc_7FF77741B624:       ; CODE XREF: switches_msg_id+89↑j
.text:00007FF77741B624     mov     [rbp+50h+var_A0], r15
.text:00007FF77741B628     mov     [rbp+50h+var_98], 7
.text:00007FF77741B630     mov     word ptr [rbp+50h+str_adminsupport], r15w
.text:00007FF77741B635     mov     eax, [rsi+10h]
.text:00007FF77741B638     dec     eax         ; switch 490 cases <-- 490 options!
.text:00007FF77741B63A     cmp     eax, 1E9h
.text:00007FF77741B63F     ja      def_7FF77741B65E ; jumptable 00007FF77741B65E default case, cases 13-15,25,[.......way way way more.......]
.text:00007FF77741B645     lea     rcx, unk_7FF776FD0000
.text:00007FF77741B64C     movzx   eax, ds:(byte_7FF77741D5EC - 7FF776FD0000h)[rcx+rax]
.text:00007FF77741B654     mov     edx, ds:(jpt_7FF77741B65E - 7FF776FD0000h)[rcx+rax*4]
.text:00007FF77741B65B     add     rdx, rcx
.text:00007FF77741B65B ; } // starts at 7FF77741B5E1
.text:00007FF77741B65E     jmp     rdx         ; switch jump
I used a debugger to put a breakpoint on that switch jump (at 00007FF77741B65E), then replayed the initial 8-byte message. When execution reached that jmp, the debugger broke in. I stepped to the next instruction (i.e., told the program to perform the jmp rdx instruction), and wound up at this piece of code:
.text:00007FF77741BC23 loc_7FF77741BC23:       ; CODE XREF: switches_msg_id+FE↑j
.text:00007FF77741BC23                         ; DATA XREF: switches_msg_id:jpt_7FF77741B65E↓o
.text:00007FF77741BC23     mov     r9, r14     ; jumptable 00007FF77741B65E case 44
.text:00007FF77741BC26     mov     r8, rsi
.text:00007FF77741BC29     mov     rdx, rdi
.text:00007FF77741BC2C     mov     rcx, rbx
.text:00007FF77741BC2F     call    sub_7FF777411F20 ; I believe this just returns the site_id
.text:00007FF77741BC34     jmp     loc_7FF77741D190
Case 44, in hex, is case 0x2c - the message id! That pretty much confirms the message id.
The rest of message 0x2c isn’t super interesting - the response is an array of 20-byte identifiers that don’t really matter. I chose it as an example because it’s pretty short.
Now that we’ve talked a bit about reversing a protocol, let’s look at the mystery!
The mystery blob!
While working on this project, I noticed one packet that stood out: immediately after authenticating, the client sends an 8-byte request with id 0x0c, and the server responds with an enormous response (0x17ba bytes!) with the message type 0xff7f:
Outbound:
08 00 00 00 0c 00 00 00                            ........

Inbound:
ba 17 00 00 7f ff 00 00  78 9c ed 9c 07 78 14 d5   ........ x....x..
da c7 4f 68 4a a4 08 2a  8a 08 ac 41 ba 09 09 04   ..OhJ..* ...A....
08 a1 98 0e 01 52 c8 86  be 10 36 9b dd 64 c9 6e   .....R.. ..6..d.n
36 6c 09 04 01 47 34 08  88 02 a1 08 5e 01 95 26   6l...G4. ....^..&
20 52 04 11 01 11 09 58  10 50 04 41 b0 d1 91 22    R.....X .P.A..."
d2 94 26 f9 ce bc ff 9d  24 b3 99 4d a2 72 ef e3   ..&..... $..M.r..
77 ef 2c 4f f2 cb fb 9e  33 a7 cd 99 99 ff 79 f7   w.,O.... 3.....y.
30 f7 4c 64 ac 06 63 4c  db 5b 9b 7c 2f a7 0f ff   0.Ld..cL .[.|/...
b9 f8 02 63 df 16 16 16  46 b2 50 a6 63 89 cc ce   ...c.... F.P.c...
6c 2c 9d ff d6 33 2b 8b  e2 bf 9d fc 47 c7 ba 33   l,...3+. ....G..3
[.........]
My first (incorrect) instinct was that it’s encrypted, which raises the obvious question: if it’s encrypted, what is it??? It’s gotta be pretty interesting if it’s a gigantic, encrypted blob, right? That was my logic as I excitedly dove into the weeds!
So I set off to figure out what it actually is. The function that handles message 0x0c is large and complex. Some debugging revealed strings like “Default settings” and “all users” and stuff, which made me think it’s configuration. Why would they encrypt configuration? Maybe it has passwords? Maybe this is exciting!
Unfortunately, it’s C++, and it’s a complex function. Like really complex. I could have spent a week reversing the whole thing, with all the objects involved, and I don’t have time for that! Hoping for an easy victory (identifying what, exactly, is encrypted), I spent a couple hours tracing through the various interesting-looking function calls, looking for something that looks encryptiony to me. At one point I noticed code that looks like:
.text:00007FF7776FFEBA     mov     rax, 4924924924924925h
.text:00007FF7776FFEC4     sub     rcx, [r14+10h]
.text:00007FF7776FFEC8     mov     r8d, 4
.text:00007FF7776FFECE     imul    rcx
.text:00007FF7776FFED1     mov     rcx, [rdi]
.text:00007FF7776FFED4     sar     rdx, 4
.text:00007FF7776FFED8     mov     rax, rdx
.text:00007FF7776FFEDB     shr     rax, 3Fh
.text:00007FF7776FFEDF     add     rdx, rax
I googled the constant 0x4924924924924925 (I can do a whole other post about googling constants!), and found some mentions of encryption, some CTF challenges, stuff like that. It looked promising, but the code around it didn’t feel like encryption to me - it kinda looked like it could be a hashtable. I didn’t really know.
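A possible explanation, offered as an assumption rather than something the debugging confirmed: compilers often replace division by a constant with a multiply and a shift, and this constant fits that pattern for division by 56. That would be consistent with turning a pointer difference into an element count for 56-byte objects. The Ruby sketch below checks the arithmetic for non-negative values only; MAGIC and magic_divide are illustrative names.

```ruby
MAGIC = 0x4924924924924925

# 7 * MAGIC lands just above 2**65, so MAGIC is approximately 2**65 / 7.
remainder = 7 * MAGIC - 2**65
puts "7 * MAGIC - 2**65 = #{remainder}"

# imul leaves the high 64 bits of the 128-bit product in rdx, which amounts to
# a right shift by 64; "sar rdx, 4" shifts 4 more, for 68 bits in total.
# Dividing by 2**68 after multiplying by ~2**65 / 7 is a division by 7 * 8 = 56.
def magic_divide(x)
  (x * MAGIC) >> 68
end

samples = [0, 55, 56, 57, 560, 123_456_789]
samples.each do |x|
  puts "#{x}: multiply-and-shift=#{magic_divide(x)} plain division=#{x / 56}"
end
```

The trailing shr rax, 3Fh and add rdx, rax in the disassembly add the sign bit back in, which is how the same trick rounds toward zero for negative inputs.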
Eventually, at the bottom of one of the functions, I noticed an exception being thrown:
.text:00007FF7777000A2 loc_7FF7777000A2:       ; CODE XREF: sub_7FF7776FFE80+70↑j
.text:00007FF7777000A2     lea     rcx, [rbp+pExceptionObject]
.text:00007FF7777000A6     call    sub_7FF77709D600
.text:00007FF7777000AB     lea     rdx, __TI3?AVCArchiveException@io@Shared@@ ; pThrowInfo
.text:00007FF7777000B2     lea     rcx, [rbp+pExceptionObject] ; pExceptionObject
.text:00007FF7777000B6     call    _CxxThrowException
AVCArchiveException? Wait, Archive? Does that mean….?
Solving the mystery
As part of my normal process, I was building a client in Ruby, implementing the protocol as I go. Here’s the code I used to generate message 0x0c with a blank body:
encrypted = send_recv(s, 0x0c, '')
I had tried different techniques to decrypt the response; since the size (0x17ba or 6074) isn’t a multiple of 8 or 16, I knew it wasn’t a block cipher. But I was otherwise stuck.
After seeing that exception, I realized that what I should have done was write the response to a file:
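A minimal sketch of that step, assuming the response body is held in a string like encrypted above; the sample bytes and the temp-directory path below are placeholders for illustration:

```ruby
require 'tmpdir'

# Placeholder for the body returned by send_recv: the first few bytes of the
# mystery response, after the 8-byte header.
encrypted = [0x78, 0x9c, 0xed, 0x9c, 0x07, 0x78, 0x14, 0xd5].pack('C*')

# binwrite performs no newline or encoding translation, so the file ends up
# byte-for-byte identical to what came off the wire.
path = File.join(Dir.tmpdir, 'test')
written = File.binwrite(path, encrypted)
puts "wrote #{written} bytes to #{path}"
```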
Then use the Linux file command to tell me what the message actually was:
$ hexdump -C /tmp/test | head -n3
00000000  78 9c ed 9c 07 78 14 d5  da c7 4f 68 4a a4 08 2a  |x....x....OhJ..*|
00000010  8a 08 ac 41 ba 09 09 04  08 a1 98 0e 01 52 c8 86  |...A.........R..|
00000020  be 10 36 9b dd 64 c9 6e  36 6c 09 04 01 47 34 08  |..6..d.n6l...G4.|
$ file /tmp/test
/tmp/test: zlib compressed data
Zlib! It’s compressed, not encrypted! D’oh!! I did a bit of reading, and realized that 0x78 (or “x”) is the most common character for a Zlib-compressed string to begin with, so that’s something to remember!
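The 0x78 hint can be turned into a slightly stronger check. Per the zlib format (RFC 1950), the low nibble of the first byte is the compression method (8 for deflate), the high nibble encodes the window size (7 or less), and the first two bytes read as a big-endian 16-bit number are a multiple of 31. The helper below is a hypothetical sketch of that test, not part of any library:

```ruby
# Hypothetical helper: do the first two bytes look like a zlib header?
def zlib_header?(data)
  return false if data.bytesize < 2

  cmf, flg = data.unpack('CC')
  deflate     = (cmf & 0x0f) == 8            # compression method 8 = deflate
  window_ok   = (cmf >> 4) <= 7              # window size field, at most 7 (32 KiB)
  checksum_ok = (((cmf << 8) | flg) % 31).zero? # header is a multiple of 31 by design

  deflate && window_ok && checksum_ok
end

mystery = [0x78, 0x9c, 0xed, 0x9c].pack('C*') # start of the mystery response body
header  = [0x08, 0x00, 0x00, 0x00].pack('C*') # start of an ordinary message

puts "mystery body looks like zlib: #{zlib_header?(mystery)}"
puts "plain header looks like zlib: #{zlib_header?(header)}"
```

A two-byte check like this can still produce false positives on random data, so it is a hint to try decompressing, not proof.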
Once you know it’s Zlib, you can decompress it with a variety of tools, including openssl:
$ openssl zlib -d < /tmp/test | hexdump -C | head -n4
00000000  07 88 00 00 0c 00 00 00  53 4c 53 54 08 00 00 00  |........SLST....|
00000010  01 00 00 00 ef 87 00 00  d9 ff ff ff 43 00 3a 00  |............C.:.|
00000020  5c 00 50 00 72 00 6f 00  67 00 72 00 61 00 6d 00  |\.P.r.o.g.r.a.m.|
00000030  44 00 61 00 74 00 61 00  5c 00 61 00 62 00 63 00  |D.a.t.a.\.a.b.c.|
[...]
As you can see, it’s now plaintext - it was never encrypted at all! Just compressed! The worst part is, this isn’t even close to the first time I’ve run into this situation - I just got excited and wasn’t thinking!
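Since the client was already being written in Ruby, the same decompression can also happen in-process with the standard zlib library, which helps where the openssl zlib subcommand is unavailable (as far as I know, it is only present in OpenSSL builds compiled with zlib support). This is a sketch with a made-up payload standing in for the real response body:

```ruby
require 'zlib'

# Stand-in for the real response body: compress some placeholder bytes so the
# example is self-contained. In the real client this would be the body that
# came back for message 0x0c.
plaintext  = 'SLST placeholder settings'.b
compressed = Zlib::Deflate.deflate(plaintext)

puts format('first byte of compressed data: 0x%02x', compressed.getbyte(0))

# Zlib::Inflate.inflate expects the zlib wrapper (2-byte header plus Adler-32
# trailer) and raises a Zlib::Error subclass when the data is not valid zlib.
decompressed = Zlib::Inflate.inflate(compressed)
puts decompressed
```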
In the end, it turned out to be a big blob of settings - nothing exciting. Presumably, they wanted to save some bandwidth. Or maybe, they knew that, some day, they’d waste my time - who knows?
When faced with an unknown data type, try the file command - it has identified zlib and plenty of other formats for me. It can save you a whole lot of trouble!