# Efficient Fuzzing Guide

This guide applies to fuzzers created with [libFuzzer], not to [FuzzTests]; none
of this advice is necessary for FuzzTests.

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
**Note:** If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is measured in executions per second (`exec/s`), which
libFuzzer prints while the fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also cause the fuzzer to target specific
use-cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See [startup initialization] in libFuzzer’s
documentation for an example.

*** note
**Note:** You can skip freeing static resources. However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be deallocated,
since the function gets called millions of times during a fuzzing session. If
you don’t free them, you’ll often run out of memory and reduce overall fuzzing
efficiency.
***
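
A minimal sketch of this pattern, assuming a hypothetical `Environment` type
that stands in for whatever expensive setup your target needs: the
function-local static is constructed on the first call and reused for every
later input (and is intentionally never freed), while per-input state lives in
RAII containers that are released before the next input arrives.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical expensive-to-build state (e.g., lookup tables, registered
// codecs). Building it on every input would dominate the run time.
struct Environment {
  Environment() {
    for (int i = 0; i < 256; ++i)
      byte_names[static_cast<uint8_t>(i)] = "byte_" + std::to_string(i);
  }
  std::map<uint8_t, std::string> byte_names;
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed once on the first call, then shared by all later inputs.
  // It is deliberately leaked, so no destructor runs at process exit.
  static const Environment* env = new Environment();

  // Per-input state is owned by a std::vector, so it is freed automatically
  // when this call returns instead of accumulating across millions of runs.
  std::vector<std::string> names;
  names.reserve(size);
  for (size_t i = 0; i < size; ++i)
    names.push_back(env->byte_names.at(data[i]));
  return 0;
}
```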

#### Minimizing memory usage

Avoid dynamic memory allocation wherever possible. Memory instrumentation works
faster for stack-based and static objects than for heap-allocated ones.
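
For example, when the code under test needs a mutable copy of input data that
has a known upper bound, a fixed-size stack buffer avoids a heap allocation on
every run. The sketch below assumes a hypothetical `DecodeHeaderInPlace`
function and a 64-byte maximum header size; both are illustrative, not part of
any real API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical code under test: decodes a header in place (so it needs a
// writable copy of the input) and returns the sum of the decoded bytes.
static uint32_t DecodeHeaderInPlace(uint8_t* header, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i < len; ++i) {
    header[i] ^= 0x5A;
    sum += header[i];
  }
  return sum;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Headers are assumed to be at most 64 bytes, so a stack buffer suffices.
  // Copying into a std::vector instead would hit the heap for every input.
  constexpr size_t kMaxHeaderSize = 64;
  std::array<uint8_t, kMaxHeaderSize> header{};
  const size_t len = size < kMaxHeaderSize ? size : kMaxHeaderSize;
  if (len > 0) std::memcpy(header.data(), data, len);
  DecodeHeaderInPlace(header.data(), len);
  return 0;
}
```

Benchmarking this against a `std::vector`-based variant locally is a quick way
to check whether the change actually improves `exec/s` for your target.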

*** note
**Note:** It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***

### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
  coverage] report. This report can provide insight into how to improve code
  coverage.
* Generate a source-level coverage report for your fuzzer by running the
  [coverage script] stored in the Chromium repository. The script provides
  detailed instructions and a usage example.

When the coverage script builds the `out/coverage` directory, make sure to use
all of the gn args you needed to build the `out/libfuzzer` directory; depending
on the [gn config] you chose, this could include args like
`target_os="chromeos"` and `is_asan=true`.

*** note
**Note:** The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage, whereas a coverage
report generated by a fuzz target without a corpus won't cover much code. If
you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase
or corpus unit) *interesting* if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the *corpus*. A corpus is shared across fuzzer runs
and grows over time.

If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target is hitting a code barrier
(also called a *coverage plateau*). The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the common
causes are checksums and magic numbers, or a lot of the code may simply be
unreachable by your fuzzer. The easiest way to diagnose the problem is to
generate and analyze a [coverage report](#Code-coverage). Then, to fix the
issue, try the following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
  [custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a *seed corpus*. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, add valid raw streams from a test suite into separate
  files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.

#### Using a corpus locally

If you’re running a fuzz target locally, you can easily designate a corpus by
passing a directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
   exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
**Note:** If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However,
checking the seed corpus into the Chromium repository is the preferred approach.
***

You can also upload the corpus with the [gsutil] command-line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
**Note:** To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). If you installed `gsutil`
through `gcloud`, you can log into your account with the `gcloud auth login`
command.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory for the minimized corpus.
mkdir seed_corpus_minimized

# Merge inputs from ./seed_corpus into ./seed_corpus_minimized, keeping only
# the inputs that add new coverage.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.
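
To make "the same code coverage" concrete, here is a simplified greedy model of
corpus minimization. It assumes each input's coverage features (e.g., edge IDs)
are already known; libFuzzer's `-merge=1` collects them by actually running
every input, and its real implementation differs in details, so treat this as
an illustration rather than libFuzzer's algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified model of a corpus entry: a name, its size, and the set of
// coverage features it triggers (supplied directly instead of measured).
struct Input {
  std::string name;
  size_t size_bytes;
  std::set<int> features;
};

// Greedy minimization: visit smaller inputs first and keep an input only if
// it adds at least one feature not seen so far. Every feature is contributed
// by the first input that has it, so the kept set preserves total coverage.
std::vector<std::string> MinimizeCorpus(std::vector<Input> corpus) {
  std::stable_sort(corpus.begin(), corpus.end(),
                   [](const Input& a, const Input& b) {
                     return a.size_bytes < b.size_bytes;
                   });
  std::set<int> covered;
  std::vector<std::string> kept;
  for (const Input& input : corpus) {
    bool adds_new = false;
    for (int feature : input.features) {
      if (covered.insert(feature).second) adds_new = true;
    }
    if (adds_new) kept.push_back(input.name);
  }
  return kept;
}
```

With inputs `b` (5 bytes, features {2}), `a` (10 bytes, {1, 2}) and `c`
(20 bytes, {1, 2, 3}), all three are kept even though `c` alone covers
everything: a single greedy pass preserves coverage but does not guarantee the
smallest possible corpus.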

### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use-cases (e.g., fuzzing file format decoders or
text-based protocols like XML).
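
As a hypothetical illustration, the target below only reaches its deeper
branches when the input starts with specific multi-byte keywords. Dictionary
entries such as `kw_doctype="<!DOCTYPE"`, `kw_cdata="<![CDATA["` and
`kw_comment="<!--"` (the names are made up) would let libFuzzer insert those
tokens directly instead of building them up through byte-level mutations.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical tokenizer for a tiny XML-like format.
enum class Token { kUnknown, kDoctype, kCdata, kComment };

Token ClassifyToken(std::string_view input) {
  // Each branch requires an exact multi-byte keyword at the start of input.
  if (input.substr(0, 9) == "<!DOCTYPE") return Token::kDoctype;
  if (input.substr(0, 9) == "<![CDATA[") return Token::kCdata;
  if (input.substr(0, 4) == "<!--") return Token::kComment;
  return Token::kUnknown;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string_view input(reinterpret_cast<const char*>(data), size);
  ClassifyToken(input);
  return 0;
}
```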

To add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
   format `name="value"`. The value must appear in quotes with hex escaping
   (`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
   characters (the `\\` and `\"` shorthands are recognized, too). This syntax is
   similar to the one used by the [AFL] fuzzing engine (`-x` option).

*** note
**Note:** `name` can be omitted, but it is a convenient way to document the
meaning of each token.
***

Here’s an example dictionary:

```
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" word (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values.
kw3="\xF7\xF8"
# Key name before '=' can be omitted:
"foo\x0Abar"
```

2) Test your dictionary by running your fuzz target locally:

```bash
./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
```

If the dictionary is effective, you should see `NEW` units discovered in the
output.

3) Put the dictionary file in the same directory as your fuzz target, then add
   the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

```
fuzzer_test("my_fuzzer") {
  ...
  dict = "my_fuzzer.dict"
}
```

Once the dictionary is checked into the Chromium repository and ClusterFuzz
picks up a build from a newer revision, the dictionary is used automatically.

### Custom build

If you need to change the code being tested by your fuzz target, you can wrap
the fuzzing-only changes in an `#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION`
block in your target code.

*** note
**Note:** Patching target code is not the preferred way to improve a fuzz
target, but in some cases it might be the only way (e.g., when there is no
intended API to disable checksum verification, or when the target code uses a
random generator that affects the reproducibility of crashes).
***
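
A minimal sketch, assuming a hypothetical record format that starts with a
4-byte big-endian checksum of the payload (a simple multiplicative hash stands
in for a real CRC). Normal builds reject mismatched records; when
`FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined, the check is skipped so
that mutated payloads reach the parsing code instead of being rejected at the
checksum.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical checksum standing in for a real CRC.
uint32_t ComputeChecksum(const uint8_t* data, size_t size) {
  uint32_t sum = 0;
  for (size_t i = 0; i < size; ++i) sum = sum * 31 + data[i];
  return sum;
}

// Hypothetical parser: 4-byte big-endian checksum followed by the payload.
bool ParseRecord(const uint8_t* data, size_t size) {
  if (size < 4) return false;
  const uint32_t stored = (uint32_t{data[0]} << 24) |
                          (uint32_t{data[1]} << 16) |
                          (uint32_t{data[2]} << 8) | uint32_t{data[3]};
#if !defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION)
  // Production builds reject records whose checksum does not match.
  if (stored != ComputeChecksum(data + 4, size - 4)) return false;
#else
  // Fuzzing builds skip the check; otherwise nearly every mutation of the
  // payload would be rejected here and the parser below would go unfuzzed.
  (void)stored;
#endif
  // ... parse the payload at data + 4 ...
  return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  ParseRecord(data, size);
  return 0;
}
```

Keeping the `#if` as narrow as possible, around the check itself, means the
rest of the parser is fuzzed exactly as it ships.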

[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[gn config]: getting_started.md#running-the-fuzz-target
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://analysis.chromium.org/coverage/p/chromium?platform=fuzz
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization
[libfuzzer]: getting_started_with_libfuzzer.md
[fuzztests]: getting_started.md