In a recent challenge I needed to get access to a system by exploiting the way Python deserializes data using the pickle module. In this article I want to give a quick introduction to how to pickle/unpickle data, highlight the issues that can arise when your program deals with data from untrusted sources, and “dump” my own notes.
What is pickle?
In Python, the pickle module lets you serialize and deserialize data. Essentially, this means that you can convert a Python object into a stream of bytes and then reconstruct it (including the object’s internal structure) later in a different process or environment by loading that stream of bytes.
How to dump and load?
In Python you can serialize objects by using pickle.dumps():
Example:
import pickle
pickle.dumps(['hackgod', 'python', 78, 100])
The pickled representation we’re getting back from dumps will look like this:
b'\x80\x04\x95\x1c\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x07hackgod\x94\x8c\x06python\x94KNKde.'
Reading the serialized data back in:
import pickle
pickle.loads(b'\x80\x04\x95\x1c\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x07hackgod\x94\x8c\x06python\x94KNKde.')
gives us back our list object:
['hackgod', 'python', 78, 100]
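dumps and loads also have file-oriented siblings, pickle.dump and pickle.load, which write to and read from a binary file object. Here is a minimal sketch of the same round trip, assuming an in-memory io.BytesIO buffer as a stand-in for a real file (the buffer and the variable names are illustrative, not taken from the challenge):

```python
import io
import pickle

original = ['hackgod', 'python', 78, 100]

# io.BytesIO stands in for a file opened in binary mode ("wb" / "rb").
buffer = io.BytesIO()
pickle.dump(original, buffer)

# Rewind so that load() starts reading at the first byte of the pickle.
buffer.seek(0)
restored = pickle.load(buffer)

# The restored list has the same contents but is a new, independent object.
same_contents = restored == original
same_object = restored is original
print(same_contents, same_object)
```

The value comparison succeeds while the identity check fails, which is what you would expect from an object that was rebuilt from bytes rather than passed around by reference.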
Theory:
What is actually happening behind the scenes is that the byte stream created by dumps contains opcodes that are executed one by one as soon as we load the pickle back in. If you are curious what the instructions in this pickle look like, you can use pickletools to create a disassembly:
Code representation:
>>> import pickle
>>> data = pickle.dumps(['hackgod', 'python', 78, 100])
>>> import pickletools
>>> pickletools.dis(data)
0: \x80 PROTO 4
2: \x95 FRAME 28
11: ] EMPTY_LIST
12: \x94 MEMOIZE (as 0)
13: ( MARK
14: \x8c SHORT_BINUNICODE 'hackgod'
23: \x94 MEMOIZE (as 1)
24: \x8c SHORT_BINUNICODE 'python'
32: \x94 MEMOIZE (as 2)
33: K BININT1 78
35: K BININT1 100
37: e APPENDS (MARK at 13)
38: . STOP
highest protocol among opcodes = 4
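If you would rather inspect the opcodes from code than read the printed disassembly, pickletools.genops walks the same byte stream and yields one triple per instruction. The following is a small sketch; pinning protocol=4 is my own assumption so that the sequence lines up with the disassembly above regardless of the interpreter's default protocol:

```python
import pickle
import pickletools

# Pin the protocol so the opcode sequence matches the disassembly above.
data = pickle.dumps(['hackgod', 'python', 78, 100], protocol=4)

# genops yields one (opcode, argument, position) triple per instruction.
opcode_names = []
for opcode, arg, pos in pickletools.genops(data):
    opcode_names.append(opcode.name)
    print(pos, opcode.name, arg)
```

Collecting the names in a list like this makes it easy to scan a pickle for opcodes you do not expect before deciding what to do with it.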
Controlling the behavior of pickling/unpickling:
Not every object can be serialized (e.g. file handles), and pickling and unpickling certain objects (like functions or classes) comes with restrictions. The Python docs give you a good overview of what can and cannot be pickled.
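A quick way to see these limits in practice is to try pickling a few objects and catch the failure. The helper below, is_picklable, is a hypothetical name of my own and not part of the pickle module; which exception is raised varies by object and Python version, so the sketch catches the usual candidates:

```python
import os
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # Which exception is raised depends on the object and Python version.
        return False
    return True


print(is_picklable(['hackgod', 'python', 78, 100]))  # plain data
print(is_picklable(lambda x: x))                     # anonymous function

# os.devnull is used only so the example does not need a real file.
with open(os.devnull) as handle:
    print(is_picklable(handle))                      # open file handle
```

Functions are pickled by reference to their qualified name, which is why a lambda with no importable name fails here.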
While in most cases you don’t need to do anything special to make an object “picklable”, pickle still allows you to define custom behavior for the pickling process of your class instances. Reading a bit further down in the docs, we can see that implementing __reduce__ is, from an attacker’s perspective, exactly what we need to get code execution.
So by implementing __reduce__ in a class whose instances we are going to pickle, we can give the pickling process a callable plus some arguments to run. While this is intended for reconstructing objects, we can abuse it to get our own reverse shell code executed.
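To make the mechanism concrete without running anything dangerous, here is a harmless sketch. The class name Demo and the choice of sorted as the callable are my own assumptions for illustration; a real payload would return a different callable and arguments:

```python
import pickle


class Demo:
    """Hypothetical class used only to illustrate __reduce__."""

    def __reduce__(self):
        # Return (callable, args): the unpickler calls callable(*args).
        # sorted() is a harmless stand-in for whatever an attacker would pick.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Demo())

# loads() does not rebuild a Demo instance; it returns the call's result.
result = pickle.loads(payload)
print(result)
```

Note that the side that loads the payload does not need the Demo class at all: the pickle only references the callable and its arguments. With a callable that runs commands in place of sorted, the same few lines execute on whichever machine unpickles the data, which is why data from untrusted sources should never be passed to pickle.loads.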
Data that can be pickled / unpickled or not