Beginner Reverse Engineering Writeup: Script Kiddie 1

Difficulty: Medium
Concepts: Python, code obfuscation
Tools: Python

About

The Script Kiddie 1 challenge demonstrates Python obfuscation techniques used to conceal malicious or suspicious Python code. At first glance the script looks like an impossible mess, but after some analysis you will see the underlying logic. I’ve seen similar techniques in a sample sent to my office. Malware obfuscated like this is usually skid-ish, because the many one-click obfuscators available online make it easy to generate low-effort payloads. This writeup walks through how to recognize and unravel these techniques step by step.

Walkthrough

To start off, I run the script to see what happens. All it does is print a string to the console, which is not very useful. Let’s make a copy of the script so we can change it and use the power of Python to help us deobfuscate. We don’t need to deobfuscate the whole thing, just enough to find out what the goal of this layer of the script is.

Open your copy of the script and take a close look, and you’ll be able to spot some easily reversible objects. Keep an eye out for strings, hex strings, and dynamic imports; these are easy places to start. At the end of the first line we can clearly see four builtins being assigned to variables through a multiple assignment. Now the script only has to reference those confusing variable names to call exec, __import__, getattr, and bytes! Sneaky… When you see something like this, go ahead and find all the occurrences of those variables and replace them with something recognizable and informative. Something interesting should now be easier to see: a dynamic import, along with a getattr chain that resolves the fromhex attribute of the “bytes” builtin to convert the three hex strings we have into bytes. All of that is wrapped in an exec call. I can almost guarantee, before we even change anything, that the massive hex string is another Python script.

We could just replace the exec call with a print, or write the output to a file, and we would have our next layer, but let’s take a look at what the script is actually doing to the hex strings. Take the first string, ‘7a6c6962’, and convert it from hex to ASCII: it yields ‘zlib’. The next one yields ‘decompress’. The longer third one yields gibberish. Can you guess why? Because it’s zlib-compressed. It needs to be decompressed before you can see the original data.
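You can check the hex strings yourself in a couple of lines. In the sketch below, the first string is the one from the script; the hex for ‘decompress’ is simply what that word encodes to, and sample_source is a made-up stand-in for the real third string, which is not reproduced here.

```python
import zlib

# The first short string decodes straight to readable ASCII.
first = bytes.fromhex("7a6c6962").decode("ascii")
second = bytes.fromhex("6465636f6d7072657373").decode("ascii")
print(first, second)

# Hypothetical stand-in for the long third string: compress a tiny script
# and hex-encode it, the same way the obfuscator presumably did.
sample_source = "print('hello'); print('world')"
compressed_hex = zlib.compress(sample_source.encode()).hex()

# Decoding the hex alone gives binary gibberish, not source code.
raw = bytes.fromhex(compressed_hex)
print(raw[:8])

# Only after zlib decompression does the original text come back.
recovered = zlib.decompress(raw).decode()
print(recovered)
```

A quick tell for this kind of data is the first byte: a zlib stream produced with default settings starts with 0x78.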

OK, find your exec call and replace it with a print so we can get a visual of what we’re working with here… Wow, that looks confusing, so it looks like we’re on the right path. It’s all on one line separated by semicolons, so go back and change your print statement to a variable assignment so that you’re saving the output of the zlib decompress. Create an empty list and write a for loop that goes through the decompressed data in your variable and changes any semicolon (;) to a newline (\n), appending each character to the list on each iteration. After the loop, join the list with an empty string and write the result to a file.

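import zlib
import sys

sys.setrecursionlimit(100000000)
# Paste the long hex string from the first layer here
z = bytes.fromhex('INSERT_HEX_STRING_HERE')
# Decompress and decode to get the second layer as text
z = zlib.decompress(z).decode()

# Put each statement on its own line by swapping ';' for a newline
z_list = []
for letter in z:
    if letter == ';':
        letter = '\n'
    z_list.append(letter)
z = ''.join(z_list)

# Save the second layer so it can be opened in an editor
with open('2_zlib_extracted.py', 'w') as f:
    f.write(z)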
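One caveat with the character-by-character swap above: it also rewrites semicolons that sit inside string literals. It worked for me here, but if a layer ever breaks after splitting, a parser-based split is a safer alternative. This is only a sketch, it assumes Python 3.9 or newer for ast.unparse, and the one_liner string is a made-up example rather than anything from the challenge.

```python
import ast

# Hypothetical one-line script with a semicolon inside a string literal.
one_liner = "greeting = 'a;b'; count = 2; print(greeting, count)"

# The naive swap cuts the string literal in half, so the result no longer parses.
naive = one_liner.replace(";", "\n")
try:
    ast.parse(naive)
    naive_ok = True
except SyntaxError:
    naive_ok = False

# Parsing first and unparsing afterwards puts one statement on each line
# without touching the contents of any string.
pretty = ast.unparse(ast.parse(one_liner))

print(naive_ok)
print(pretty)
```

Note that ast.unparse normalizes formatting and drops comments, which is usually fine for obfuscated code since there is nothing in the layout worth keeping.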

Open the file extracted above and we have arrived at the second layer of obfuscated data. I can see a few interesting things, but this part is heavily obfuscated and nothing is immediately apparent to me. A good start would be to examine the code starting from the last line. Usually these scripts move into the next layer through exec and eval calls located after all the code reconstruction. But I like to investigate a little, so my first course of action in this case is to open up the Python interpreter and go through the code line by line. The cool thing about Python is that you can print pretty much anything and get some information out of it. So as I go line by line in the interpreter, I copy a few variables and calls, print them, and see what comes up.

As I go through the code and print random stuff, I notice there are a few code objects and an ast object. The code objects contain Python bytecode and would be useful, but a more efficient way to get the source is to reverse the ast object. An ast object holds the program as a structured tree of nodes, including values and operations. It is created after line 16 has executed. I used the ast and astor libraries to reconstruct the source code and write it to a file.
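If you haven't worked with these two kinds of objects before, the sketch below shows the difference on a tiny made-up payload (the source string is mine, not the challenge's). It uses only the standard library and assumes Python 3.9 or newer for ast.unparse and the indent argument of ast.dump; astor.to_source plays the same role as ast.unparse on older versions.

```python
import ast
import dis

# Hypothetical payload standing in for whatever the second layer rebuilds.
source = "secret = 6 * 7\nprint(secret)"

tree = ast.parse(source)                      # ast object: a tree of nodes
code_obj = compile(tree, "<sketch>", "exec")  # code object: compiled bytecode

print(type(tree).__name__, type(code_obj).__name__)

# The tree still carries names, constants, and operators in structured form.
print(ast.dump(tree.body[0], indent=2))

# The code object can only be disassembled into bytecode instructions.
dis.dis(code_obj)

# Going from the tree back to source is a single call.
recovered = ast.unparse(tree)
print(recovered)
```

Reading the dis output is doable, but turning bytecode back into source takes real effort, while the tree converts back almost for free. That is why I went after the ast object.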

Why does the script crash when I execute line 17 in the interpreter?

It uses the __file__ special variable to get its own file path. In the interpreter this variable is undefined, which is why the script crashes. ast uses the filename only for nonessential things, so the crash can be worked around by defining __file__ or by replacing the filename argument with any string.
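Here is a small sketch of why any string works as the filename. I am assuming the script passes the filename to a compile-style call; the exact call in the challenge is obfuscated, so treat this as an illustration of the idea only. The placeholder name is hypothetical.

```python
import ast

# Hypothetical tree standing in for the one the script builds.
tree = ast.parse("value = 1 + 1")

# Any string is accepted as the filename; it does not have to exist on disk.
placeholder = "recovered_layer.py"
code_obj = compile(tree, placeholder, "exec")

# The compiled code runs normally regardless of the filename.
namespace = {}
exec(code_obj, namespace)

# The filename is only stored as metadata, used for things like tracebacks.
print(code_obj.co_filename, namespace["value"])
```

The other workaround is to type __file__ = 'anything.py' in the interpreter before running the line that needs it.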

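# ast ships with Python; astor is a third-party package
import ast
import astor
# The tuple below is copied from the script; its first element is the ast object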
mod, fname, mode = (________, _____.____________________(), _____._________________(_____._______________________())[_.__ ** _.__ + _._] + _____._________________(_____._______________())[_.__] + _____._________________(_____._______________________())[_.__ ** _.__ + _._] + _____._________________(_____._______________________())[_.___])
ast.fix_missing_locations(mod)
src = astor.to_source(mod)
# Write the reconstructed third layer to disk and close the file properly
with open('recovered_payload.py', 'w', encoding='utf-8') as f:
    f.write(src)

Opening the third layer we just produced, I can see the light at the end of the tunnel. This one is really easy to reverse: turn the string into bytes and then Base85-decode it. Or just change the exec into a print and run it. You should see the flag printed.
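For reference, the manual route looks roughly like the sketch below. The hidden string is a placeholder I made up, not the real flag or payload, and I am assuming the layer uses the b85 alphabet from Python's base64 module. If b85decode raises an error on the real string, base64.a85decode (the Ascii85 variant) is the other one to try.

```python
import base64

# Hypothetical payload; the real layer holds a different encoded string.
hidden = "print('flag{placeholder}')"

# This is roughly what the obfuscator would have stored in the script.
encoded = base64.b85encode(hidden.encode()).decode()
print(encoded)

# Reversing it: turn the string into bytes, Base85-decode, then decode to text.
decoded = base64.b85decode(encoded.encode()).decode()
print(decoded)
```

Printing the decoded text instead of passing it to exec lets you read the final payload before anything runs.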

Enter the flag on The Range to complete this challenge.