Once you have a fuzz target running, you can analyze and tweak it to improve its efficiency. This document describes techniques to minimize fuzzing time and maximize your results.
The most direct way to gauge the effectiveness of your fuzz target is to collect metrics. You can get them manually, or take them from a ClusterFuzz status page after your fuzz target is checked into the Chromium repository.
A fuzzing engine such as libFuzzer typically explores a large search space by performing randomized mutations, so it needs to run as fast as possible to find interesting code paths.
Fuzz target speed is calculated in executions per second (exec/s). It is printed while a fuzz target is running:
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
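If you collect speed figures manually, the exec/s field can be pulled out of a status line like the one above. The following is a minimal sketch, not part of libFuzzer; the helper name ParseExecsPerSecond and the -1 sentinel are assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the number that follows "exec/s: " in a
// libFuzzer status line, or -1 if the field is missing or not numeric.
long ParseExecsPerSecond(const std::string& line) {
  const std::string key = "exec/s: ";
  size_t pos = line.find(key);
  if (pos == std::string::npos) return -1;
  pos += key.size();
  size_t end = pos;
  while (end < line.size() &&
         std::isdigit(static_cast<unsigned char>(line[end]))) {
    ++end;
  }
  if (end == pos) return -1;
  return std::stol(line.substr(pos, end - pos));
}
```

The returned value can then be compared against whatever speed target you set for your fuzz target.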
You should aim for at least 1,000 exec/s from your fuzz target locally before submitting it to the Chromium repository. If you’re under 1,000, consider the following improvements:
If your LLVMFuzzerTestOneInput function is too complex, it can decrease the fuzzer’s execution speed. It can also cause the fuzzer to target specific use-cases or fail to account for unexpected scenarios.
Instead of performing setup and teardown on each input, use static initialization and shared resources. See the startup initialization section of libFuzzer’s documentation for an example.
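One way to apply this advice is a function-local static object, which C++ initializes on the first call only. The sketch below assumes a hypothetical Environment type that stands in for an expensive shared resource; the keyword table and the lookup are illustrative placeholders, not real Chromium code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a resource that is expensive to build.
struct Environment {
  Environment() {
    // This setup runs once per process, not once per input.
    for (int i = 0; i < 1000; ++i) keywords.insert("kw" + std::to_string(i));
    ++construction_count;
  }
  std::unordered_set<std::string> keywords;
  static int construction_count;
};
int Environment::construction_count = 0;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // A function-local static is constructed on the first call only and is
  // shared by every later call.
  static Environment env;
  std::string token(reinterpret_cast<const char*>(data), size);
  // Hypothetical code under test: a lookup against the shared table.
  (void)env.keywords.count(token);
  return 0;
}
```

The construction_count member exists only to make the one-time setup observable; a real fuzz target would not need it.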
Anything allocated in your LLVMFuzzerTestOneInput function should be de-allocated, since the function gets called millions of times during a fuzzing session. If you don’t free it, you’ll often run out of memory and reduce overall fuzzing efficiency.
Avoid allocation of dynamic memory wherever possible. Memory instrumentation works faster for stack-based and static objects than for heap-allocated ones.
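The memory advice above could look like the following sketch. The size limit kMaxInputSize and the SumBytes function are hypothetical placeholders for your own limits and code under test.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical upper bound on the input size this target handles.
constexpr size_t kMaxInputSize = 4096;

// Hypothetical stand-in for the code under test: sums the input bytes.
uint32_t SumBytes(const uint8_t* buf, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i < len; ++i) sum += buf[i];
  return sum;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  if (size == 0 || size > kMaxInputSize) return 0;

  // Stack storage: no heap allocation and nothing to release.
  std::array<uint8_t, kMaxInputSize> scratch;
  std::memcpy(scratch.data(), data, size);
  (void)SumBytes(scratch.data(), size);

  // If heap storage is unavoidable, let an owning container release it on
  // every return path instead of pairing new and delete by hand.
  std::vector<uint8_t> heap_copy(data, data + size);
  (void)SumBytes(heap_copy.data(), heap_copy.size());
  return 0;
}
```

The early return on oversized inputs is what makes the fixed-size stack buffer safe; adjust or drop it if your target must accept larger inputs.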
You can check the percentage of code covered by your fuzz target to gauge fuzzing effectiveness:
A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase or corpus unit) interesting if the input results in new code coverage (i.e., if the fuzzer reaches code that has not been reached before). The set of all interesting inputs is called the corpus. A corpus is shared across fuzzer runs and grows over time.
If a fuzz target stops discovering new interesting inputs after running for a while, it typically indicates that the fuzz target is hitting a code barrier (also called a coverage plateau). The corpus for a reasonably complex target should contain hundreds (if not thousands) of inputs.
If a fuzz target reaches coverage plateau with a small corpus, the common causes are checksums and magic numbers. Or, it may be impossible for your fuzzer to reach a lot of code. The easiest way to diagnose the problem is to generate and analyze a coverage report. Then, to fix the issue, try the following:
You can give your fuzz target a starting point by creating a set of valid and interesting inputs called a seed corpus. If you don’t provide a seed corpus, the fuzzing engine has to guess inputs from scratch, which can take time (depending on the size of the inputs and the complexity of the target format). In many cases, providing a seed corpus can increase code coverage by an order of magnitude.
Seed corpuses work especially well for strictly defined file formats and data transmission protocols:
If you’re running a fuzz target locally, you can easily designate a corpus by passing a directory as an argument:
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
The fuzzer stores all the interesting inputs it finds in the directory.
When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined in the Chromium source repository. You can define one in your BUILD.gn file by adding a seed_corpus attribute to your fuzzer_test target definition:
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
If you want to specify multiple seed corpus directories, use the seed_corpuses attribute instead:
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
All files found in these directories and their subdirectories are stored in a <my_fuzzer>_seed_corpus.zip output archive.
If you can't store your seed corpus in the Chromium repository (e.g., it’s too large, can’t be open-sourced, etc.), you can upload the corpus to the Google Cloud Storage (GCS) bucket used by ClusterFuzz:
In the bucket, open the directory named <my_fuzzer>. If the directory does not exist, create it.
In the <my_fuzzer> directory, upload your corpus files.
Uploading to the bucket is an alternative to adding a seed_corpus attribute to your fuzzer_test target definition. However, adding the seed corpus to the Chromium repository is the preferred way.
You can do the same thing by using the gsutil command line tool:
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
To use gsutil, you must be logged into your @google.com account (@chromium.org will not work). You can use the gcloud auth login command to log into your account in gsutil if you installed gsutil through gcloud.
Your seed corpus is synced to all fuzzing bots for every iteration, so it's important to minimize it to a small set of interesting inputs before uploading. Keeping the seed corpus small improves fuzzing efficiency and prevents our bots from running out of disk space.
You can minimize your seed corpus by using libFuzzer’s -merge=1 option:
# Create an empty directory.
mkdir seed_corpus_minimized
# Run the fuzzer with -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
After running the command, the seed_corpus_minimized directory will contain a minimized corpus that gives the same code coverage as your initial seed_corpus directory.
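To build intuition for what "minimized with the same coverage" means, here is a toy sketch of a greedy selection pass. It is an illustration under stated assumptions, not libFuzzer's actual merge implementation: the CorpusUnit type, the integer coverage-point IDs, and the smallest-first ordering are all choices made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: an input file and the coverage points it reaches.
struct CorpusUnit {
  std::string name;
  size_t size_bytes;
  std::set<int> coverage;
};

// Keeps a unit only if it reaches at least one coverage point that no
// previously kept unit reached. Returns the names of the kept units.
std::vector<std::string> MinimizeCorpus(std::vector<CorpusUnit> units) {
  // Visit small inputs first so the kept set favors short units.
  std::stable_sort(units.begin(), units.end(),
                   [](const CorpusUnit& a, const CorpusUnit& b) {
                     return a.size_bytes < b.size_bytes;
                   });
  std::set<int> covered;
  std::vector<std::string> kept;
  for (const CorpusUnit& unit : units) {
    bool adds_new = false;
    for (int point : unit.coverage) {
      if (covered.insert(point).second) adds_new = true;
    }
    if (adds_new) kept.push_back(unit.name);
  }
  return kept;
}
```

The kept units together reach every coverage point that the full input set reached, which is the property the minimized directory is meant to preserve.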
You can help your fuzzer increase its coverage by providing a set of common words or values that you expect to find in the input. Such a dictionary works especially well for certain use-cases (e.g., fuzzing file format decoders or text-based protocols like XML).
Add a fuzzer dictionary:
Create a flat ASCII text file that lists one input token per line in the format name="value". The value must appear in quotes with hex escaping (\xNN) applied to all non-printable, high-bit, or otherwise problematic characters (\\ and \" shorthands are recognized, too). This syntax is similar to the one used by the AFL fuzzing engine (-x option).
The name can be omitted, but it is a convenient way to document the meaning of each token. Here’s an example dictionary:
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" word (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values.
kw3="\xF7\xF8"
# Key name before '=' can be omitted:
"foo\x0Abar"
Test your dictionary by running your fuzz target locally:
./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
If the dictionary is effective, you should see NEW units discovered in the output.
Add the dictionary file in the same directory as your fuzz target, then add the dict attribute to the fuzzer_test definition in your BUILD.gn file:
fuzzer_test("my_fuzzer") {
  ...
  dict = "my_fuzzer.dict"
}
The dictionary is submitted to the Chromium repository. Once ClusterFuzz picks up a new revision build, the dictionary is used automatically.
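If you generate dictionary entries from raw tokens, the escaping rules described earlier (quoted values, \xNN for non-printable and high-bit bytes, \\ and \" shorthands) can be applied mechanically. The helper below is a minimal sketch; ToDictEntry is a hypothetical name and not part of libFuzzer or the Chromium build.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one dictionary line as name="value", or as
// just "value" when name is empty. Backslash and double quote use the
// shorthand escapes; bytes outside printable ASCII become \xNN.
std::string ToDictEntry(const std::string& name, const std::string& token) {
  std::string out;
  if (!name.empty()) out = name + "=";
  out += '"';
  for (unsigned char c : token) {
    if (c == '\\') {
      out += "\\\\";
    } else if (c == '"') {
      out += "\\\"";
    } else if (c >= 0x20 && c < 0x7F) {
      out += static_cast<char>(c);
    } else {
      char buf[5];  // Room for backslash, 'x', two hex digits, and NUL.
      std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  out += '"';
  return out;
}
```

Applied to the tokens from the example dictionary, this sketch reproduces the same lines, for instance ToDictEntry("kw1", "blah") yields kw1="blah".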
If you need to change the code being tested by your fuzz target, you can use an #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION macro in your target code.
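A typical use of this macro is to bypass a checksum that the fuzzer would otherwise rarely satisfy. The sketch below assumes a hypothetical packet layout of one checksum byte followed by the payload; ComputeChecksum and ParsePacket are illustrative names, not Chromium functions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical one-byte additive checksum over a payload.
uint8_t ComputeChecksum(const uint8_t* data, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i) sum = static_cast<uint8_t>(sum + data[i]);
  return sum;
}

// Hypothetical packet layout: [checksum byte][payload bytes...].
bool ParsePacket(const uint8_t* data, size_t size) {
  if (size < 1) return false;
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Production builds reject packets whose checksum does not match.
  if (data[0] != ComputeChecksum(data + 1, size - 1)) return false;
#endif
  // Fuzzing builds skip the check so mutated payloads reach the parser.
  // Payload parsing would go here.
  return true;
}
```

Keep such guarded regions as small as possible, because every guarded line is code that behaves differently under fuzzing than in production.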