Tutorial - Beginner's Guide to Fuzzing

Part 1: Simple Fuzzing with zzuf

Part 1: zzuf    Part 2: Address Sanitizer    Part 3: american fuzzy lop

The goal of this tutorial is to get the message out that fuzzing is really simple. Many free software projects today suffer from bugs that can easily be found with fuzzing. This has to change, and I hope we can make fuzzing an integral part of most projects' development processes. Fuzzing means feeding an application a large number of malformed inputs and watching for undesired behaviour, e.g. crashes. We usually do this by taking a valid input and adding random errors to it.

Promising fuzzing targets are tools that provide parsers for a large number of exotic file formats. Let's take ImageMagick as an example. It's a set of command line tools that process images in a large number of file formats.

How do we fuzz it? We start by generating some input samples. It's usually a good idea to fuzz with small files, so first we create a simple image with small dimensions in any format, e.g. a 3x3 pixel PNG, and name it example.png. Next we convert it into various other file formats. ImageMagick itself, or more precisely its convert tool, can create these example files:

convert example.png example.gif
convert example.png example.xwd
convert example.png example.tga
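Creating the 3x3 seed image and converting it can be wrapped into one small function. This is a minimal sketch, assuming ImageMagick's convert is installed: the function name make_samples is hypothetical, xc:white is ImageMagick's built-in solid-colour canvas, and the default format list is only an example.

```shell
#!/bin/bash
# Sketch: create a tiny seed image and convert it to several formats.
# Assumes ImageMagick's "convert" is installed. make_samples is a
# hypothetical helper; the default extensions are just examples.
make_samples() {  # usage: make_samples [extension...]
  local exts=("$@")
  # Fall back to a few example formats if none were given.
  [ ${#exts[@]} -gt 0 ] || exts=(gif xwd tga bmp pcx)
  # xc:white is ImageMagick's built-in solid-colour canvas.
  convert -size 3x3 xc:white example.png || return 1
  local ext
  for ext in "${exts[@]}"; do
    convert example.png "example.$ext" || return 1
  done
}
```

Run it in an empty directory, e.g. make_samples gif xwd tga, and pass any other extensions listed by convert -list format.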

Use as many formats as you like (convert -list format shows all supported formats). Now we need malformed versions of these example files. This is where zzuf comes in: a simple fuzzing tool that is available in most Linux distributions.

There are different ways to use zzuf; for now we will just use it to create a large number of malformed files from our example.* input files. A simple bash loop does the job:

for i in {1000..3000}; do for f in example.*; do zzuf -r 0.01 -s $i < "$f" > "$i-$f"; done; done

For every example file this creates 2001 malformed variants (one per seed from 1000 to 3000), all named in the form [number]-example.[extension]. The -r parameter sets how much of the file zzuf changes: 0.01 means that roughly 1% of the bits in the file are flipped at random. The -s parameter is the seed: every seed value produces a different output, and the same seed always reproduces the same output. You can of course adapt the number of variants, but in my experience around 2000 is a reasonable number to start with.
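The seed behaviour can be checked directly. This is a minimal sketch, assuming zzuf is installed and bash is the shell (it uses process substitution); the helper names mutate, is_reproducible and count_changed_bytes are hypothetical, and seed 1234 with ratio 0.01 are arbitrary example values.

```shell
#!/bin/bash
# Sketch: confirm that a zzuf seed is reproducible and count how many bytes
# it changes. Assumes zzuf is installed; all helper names are hypothetical.

# Write the mutated version of <file> for <seed>/<ratio> to stdout.
mutate() {  # usage: mutate <seed> <ratio> <file>
  zzuf -r "$2" -s "$1" < "$3"
}

# Succeeds if two runs with the same seed produce identical output.
is_reproducible() {  # usage: is_reproducible <seed> <ratio> <file>
  cmp -s <(mutate "$1" "$2" "$3") <(mutate "$1" "$2" "$3")
}

# cmp -l prints one line per differing byte position.
count_changed_bytes() {  # usage: count_changed_bytes <seed> <ratio> <file>
  cmp -l "$3" <(mutate "$1" "$2" "$3") | wc -l
}
```

For example, is_reproducible 1234 0.01 example.png && echo same prints "same", and count_changed_bytes 1234 0.01 example.png shows roughly how large 1% of the bits is for that file.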

Next, we feed these malformed files to ImageMagick. Any command that processes the files will do, so we can use convert again. To increase the chance of hitting bugs, we tell convert to resize the malformed images. All output goes into a log file that we can inspect later:

export LC_ALL=C LANG=C; for f in *-example.*; do timeout 3 convert -resize 2 "$f" /tmp/test.png; echo "$f"; done &> fuzzing.log

Setting LC_ALL and LANG to C makes sure all messages are printed in English, which matters because we are going to grep for an English error message. The timeout command stops any single run that takes longer than three seconds. As a consequence, this approach misses endless-loop bugs: hanging runs are killed but not reported. There are ways to work around that, but I will skip them in this introductory tutorial. After every call of convert we also print the name of the current file, so that we know later which file caused a crash.
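If you would rather not depend on the text of the shell's error messages, the same loop can look at exit statuses instead. This is a minimal sketch, not the method used above: it relies on GNU timeout exiting with 124 when it had to stop the command, and on a command killed by signal N usually being reported as exit status 128+N. The helper names classify_status and fuzz_convert are hypothetical.

```shell
#!/bin/bash
# Sketch: classify each convert run by its exit status. GNU timeout exits
# with 124 when it stopped the command; a process killed by signal N is
# usually reported as 128+N. Helper names are hypothetical.

classify_status() {  # usage: classify_status <exit-code>
  local rc=$1
  if [ "$rc" -eq 124 ]; then
    echo "timeout"
  elif [ "$rc" -gt 128 ]; then
    echo "signal $((rc - 128))"
  else
    echo "no-crash"   # normal exit, including ordinary error codes
  fi
}

fuzz_convert() {  # prints "<result> <file>" for every suspicious input
  local f rc result
  for f in *-example.*; do
    timeout 3 convert -resize 2 "$f" /tmp/test.png > /dev/null 2>&1
    rc=$?
    result=$(classify_status "$rc")
    if [ "$result" != "no-crash" ]; then
      echo "$result $f"
    fi
  done
}
```

Running fuzz_convert > results.txt leaves one short line per crash or timeout, such as "signal 11 1042-example.gif", and also catches the hangs the grep approach misses. Note that a program may legitimately exit with a status above 128, so treat the result as a hint to look closer.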

This can take quite some time, and depending on how many inputs you use, the log file may become quite large, so make sure to place it somewhere with a few gigabytes of free space.

Now we check whether we found anything by searching the log file for segmentation faults:

grep -C2 "Segmentation fault" fuzzing.log

If there were any crashes, we should see them now, together with the names of the files that caused them: because of the echo after each convert call, the file name appears on the line following the crash message.
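Because the loop prints the file name right after each convert call, the crashing files can also be listed automatically. This is a minimal sketch, assuming the log was produced by the loop above; the helper name crashing_files is hypothetical.

```shell
#!/bin/bash
# Sketch: list the inputs that crashed. The loop above echoes the file name
# right after convert, so it is the line that follows the shell's
# "Segmentation fault" message. crashing_files is a hypothetical helper.
crashing_files() {  # usage: crashing_files <logfile>
  # On a match, read the next line and print it (if there is one).
  awk '/Segmentation fault/ { if ((getline line) > 0) print line }' "$1" | sort -u
}
```

To collect the reproducers in one place: mkdir -p crashes; crashing_files fuzzing.log | while read -r f; do cp -- "$f" crashes/; done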

Using zzuf directly

The above strategy is useful for tools that accept many input file formats, but it is not the most efficient one. Instead, we can let zzuf run the tool we want to fuzz. zzuf then mutates the input on the fly as the program reads it, rather than writing thousands of files to disk; it can also parallelize the task and detect when the tested software hangs.

Since we used ImageMagick as our first example, we'll take a different one now: objdump, a tool that displays information about object files and executables. It is part of binutils. objdump needs an executable as input; for our test we will use a Windows EXE file. You can get a trivial EXE from our file formats archive. Now we run zzuf on objdump with our executable:

zzuf -s 0:1000000 -c -C 0 -q -T 3 objdump -x win9x.exe

-s 0:1000000 tells zzuf to try one million seed values. -c means zzuf should only fuzz the files given on the command line. This is useful because otherwise many tools already fail with error messages while reading configuration files or other inputs and never really get to our fuzzed file. -C 0 means zzuf should not stop after the first crash it finds. -q suppresses the output of the fuzzed command. -T 3 sets a timeout of three seconds, so zzuf won't hang when it runs into an endless loop.

You will see output like this:

zzuf[s=215,r=0.004]: signal 11 (SIGSEGV)

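This means zzuf spotted a segmentation fault with the parameters -s 215 and -r 0.004 (0.004 is zzuf's default ratio, which applies because we did not pass -r). We can re-create this malformed file from these two values: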

zzuf -r 0.004 -s 215 < win9x.exe > crash.exe
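With -C 0 a long run can report many crashes, and re-creating each file by hand gets tedious. This is a minimal sketch, assuming you saved zzuf's status messages to a file, for example by appending 2>&1 | tee zzuf.log to the fuzzing command, and that the lines look like the one shown above; the helper names crash_params and regenerate, and the crash-SEED.exe naming, are hypothetical.

```shell
#!/bin/bash
# Sketch: extract every seed/ratio pair from saved zzuf status lines such as
#   zzuf[s=215,r=0.004]: signal 11 (SIGSEGV)
# and regenerate one reproducer per pair. Helper names are hypothetical.

crash_params() {  # usage: crash_params <zzuf-log>; prints "seed ratio" pairs
  grep -E 'signal [0-9]+' "$1" |
    sed -nE 's/^zzuf\[s=([0-9]+),r=([0-9.]+)\].*/\1 \2/p' | sort -u
}

regenerate() {  # usage: regenerate <zzuf-log> <original-input>
  local seed ratio
  crash_params "$1" | while read -r seed ratio; do
    zzuf -r "$ratio" -s "$seed" < "$2" > "crash-$seed.exe"
  done
}
```

For the objdump example, regenerate zzuf.log win9x.exe writes crash-215.exe and one file per further crash.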

Analyzing and Reporting

Now we have a fuzzed file that triggers a crash in our application, and we can send it to the application's developers. Please note: if you fuzzed one of the example applications mentioned here, a large number of issues have already been reported and the developers are busy fixing them. If you want to report anything, make sure you test the crashes with their latest upstream git/svn code.

We can also analyze the crashes further. A handy tool for this is valgrind: just put valgrind -q in front of your crashing command (-q suppresses some unnecessary output):

valgrind -q objdump -x crash.exe

It will produce output like this:

==22449== Process terminating with default action of signal 11 (SIGSEGV)
==22449==  Access not within mapped region at address 0x7715FF3
==22449==    at 0x4E7FAC0: bfd_getl16 (libbfd.c:570)
==22449==    by 0x4EE356D: pe_print_idata (peigen.c:1328)
==22449==    by 0x4EE356D: _bfd_pe_print_private_bfd_data_common (peigen.c:2160)
==22449==    by 0x4EDE1F8: pe_print_private_bfd_data (peicode.h:335)
==22449==    by 0x408504: dump_bfd_private_header (objdump.c:2643)
==22449==    by 0x408504: dump_bfd (objdump.c:3214)
==22449==    by 0x408AA7: display_object_bfd (objdump.c:3313)
==22449==    by 0x408AA7: display_any_bfd (objdump.c:3387)
==22449==    by 0x40AB22: display_file (objdump.c:3408)
==22449==    by 0x405249: main (objdump.c:3690)
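The report above covers a single file. When fuzzing produces many reproducers, it helps to save one report per file and group them by the frame where the fault happened, since many files often hit the same bug. This is a minimal sketch: the crash-*.exe glob and the objdump -x command follow this tutorial's example, the helper names triage_all and first_frame are hypothetical, and it uses valgrind's --log-file option to keep each report in its own file.

```shell
#!/bin/bash
# Sketch: run each reproducer under valgrind, keep one report per file, and
# print the innermost stack frame so duplicate crashes can be grouped.
# The glob and command are examples; helper names are hypothetical.

triage_all() {
  local f
  for f in crash-*.exe; do
    [ -e "$f" ] || continue          # glob matched nothing
    # --log-file writes valgrind's messages to a file instead of stderr.
    valgrind -q --log-file="$f.valgrind.txt" objdump -x "$f" > /dev/null 2>&1
  done
}

first_frame() {  # usage: first_frame <valgrind-report>
  # The first "at 0x...:" line is the frame where the fault happened.
  grep -m1 -E ' at 0x[0-9A-Fa-f]+: ' "$1" | sed -E 's/.* at 0x[0-9A-Fa-f]+: //'
}
```

For the report shown above, first_frame prints "bfd_getl16 (libbfd.c:570)". Sorting the output of for r in *.valgrind.txt; do printf '%s\t%s\n' "$(first_frame "$r")" "$r"; done | sort puts reports with the same top frame next to each other.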

A valgrind report like this already gives a pretty good idea of what is going on: here the program accesses unmapped memory in bfd_getl16, called from pe_print_idata. It is always nice to include such information when reporting these issues to the upstream developers.

The output will be less detailed if the software you fuzzed was built without debugging symbols. If you compile software for fuzzing, add -ggdb to your compiler flags to get more debugging information. It is also a good idea to disable shared libraries if possible; otherwise the fuzzed program may end up using the system's installed copies of its libraries instead of the ones you just built. For software with configure scripts this works like this:

CFLAGS="-ggdb" CXXFLAGS="-ggdb" ./configure --disable-shared
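After building, it is worth checking that the binary you are about to fuzz really carries debug information and does not load the project's libraries from the system. This is a minimal sketch, assuming an ELF system with readelf (part of binutils) and ldd available; the helper names has_debug_info and check_fuzz_build are hypothetical, and the binary path you pass in depends on your project's build layout.

```shell
#!/bin/bash
# Sketch: sanity-check a fuzzing build. Assumes an ELF system with readelf
# and ldd; helper names are hypothetical, the binary path is yours to supply.

# Succeeds if the binary contains a DWARF .debug_info section.
has_debug_info() {  # usage: has_debug_info <binary>
  readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

check_fuzz_build() {  # usage: check_fuzz_build <binary>
  if has_debug_info "$1"; then
    echo "$1: debug info present"
  else
    echo "$1: no debug info, stack traces will be less detailed"
  fi
  # With --disable-shared the project's own libraries should not appear here.
  ldd "$1"
}
```

For a binutils build, for example, you would run check_fuzz_build on the objdump binary inside your build tree and look for libbfd in the ldd output.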

That's it for the first part. You may also want to consult the official zzuf Tutorial.

In part 2 we will improve our bug finding abilities with Address Sanitizer. In part 3 we will introduce american fuzzy lop, a much smarter fuzzing tool.


CC0
The Fuzzing Project is run by Hanno Böck