| # Efficient Fuzzer |
| |
| This document describes ways to determine your fuzzer efficiency and ways |
| to improve it. |
| |
| ## Overview |
| |
| Being a coverage-driven fuzzer, libFuzzer considers a certain input *interesting* |
| if it results in new coverage. The set of all interesting inputs is called |
| *corpus*. |
| Items in corpus are constantly mutated in search of new interesting input. |
| Corpus is usually maintained between multiple fuzzer runs. |
| |
| There are several metrics you should look at to determine your fuzzer effectiveness: |
| |
| * [fuzzer speed](#Fuzzer-Speed) (exec/s) |
| * [corpus size](#Corpus-Size) |
| * [coverage](#Coverage) |
| |
| You can collect these metrics manually or take them from [ClusterFuzz status] |
| pages. |
| |
| ## Fuzzer Speed |
| |
| Fuzzer speed is printed while fuzzer runs: |
| |
| ``` |
| #19346 NEW cov: 2815 bits: 1082 indir: 43 units: 150 exec/s: 19346 L: 62 |
| ``` |
| |
| Because libFuzzer performs randomized search, it is critical to have it as fast |
| as possible. You should try to get to at least 1,000 exec/s. Profile the fuzzer |
| using any standard tool to see where it spends its time. |
| |
| |
| ### Initialization/Cleanup |
| |
| Try to keep your fuzzing function as simple as possible. Prefer to use static |
| initialization and shared resources rather than bringing environment up and down |
| every single run. |
| |
| Fuzzers don't have to shutdown gracefully (we either kill them or they crash |
| because sanitizer has found a problem). You can skip freeing static resource. |
| |
| Of course all resources allocated within `LLVMFuzzerTestOneInput` function |
| should be deallocated since this function is called millions of times during |
| one fuzzing session. |
| |
| |
| ### Memory Usage |
| |
| Avoid allocation of dynamic memory wherever possible. Instrumentation works |
| faster for stack-based and static objects than for heap allocated ones. |
| |
| It is always a good idea to play with different versions of a fuzzer to find the |
| fastest implementation. |
| |
| |
| ### Maximum Testcase Length |
| |
| Experiment with different values of `-max_len` parameter. This parameter often |
| significantly affects execution speed, but not always. |
| |
| 1) Define which `-max_len` value is reasonable for your target. For example, it |
| may be useless to fuzz an image decoder with too small value of testcase length. |
| |
| 2) Increase the value defined on previous step. Check its influence on execution |
| speed of fuzzer. If speed doesn't drop significantly for long inputs, it is fine |
| to have some bigger value for `-max_len`. |
| |
| In general, bigger `-max_len` value gives better coverage. Coverage is main |
| priority for fuzzing. However, low execution speed may result in waste of |
| resources used for fuzzing. If large inputs make fuzzer too slow you have to |
| adjust value of `-max_len` and find a trade-off between coverage and execution |
| speed. |
| |
| |
| ## Corpus Size |
| |
| After running for a while the fuzzer would reach a plateau and won't discover |
| new interesting input. Corpus for a reasonably complex functionality |
| should contain hundreds (if not thousands) of items. |
| |
| Too small corpus size indicates some code barrier that |
| libFuzzer is having problems penetrating. Common cases include: checksums, |
| magic numbers etc. The easiest way to diagnose this problem is to generate a |
| [coverage report](#Coverage). To fix the issue you can: |
| |
| * change the code (e.g. disable crc checks while fuzzing) |
| * prepare [corpus seed](#Corpus-Seed) |
| * prepare [fuzzer dictionary](#Fuzzer-Dictionary) |
| * specify [custom options](#Custom-Options) |
| |
| ## Coverage |
| |
| You can easily generate source-level coverage report for a given corpus: |
| |
| ``` |
| ASAN_OPTIONS=html_cov_report=1:sancov_path=./third_party/llvm-build/Release+Asserts/bin/sancov \ |
| ./out/libfuzzer/my_fuzzer -runs=0 ~/tmp/my_fuzzer_corpus |
| ``` |
| |
| This will produce an .html file with colored source-code. It can be used to |
| determine where your fuzzer is "stuck". Replace `ASAN_OPTIONS` by corresponding |
| option variable if your are using another sanitizer (e.g. `MSAN_OPTIONS`). |
| `sancov_path` can be omitted by adding llvm bin directory to `PATH` environment |
| variable. |
| |
| ### Corpus Seed |
| |
| You can pass a corpus directory to a fuzzer that you run manually: |
| |
| ``` |
| ./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus |
| ``` |
| |
| The directory can initially be empty. The fuzzer would store all the interesting |
| items it finds in the directory. You can help the fuzzer by "seeding" the corpus: |
| simply copy interesting inputs for your function to the corpus directory before |
| running. This works especially well for strictly defined file formats or data |
| transmission protocols. |
| |
| * For file-parsing functionality just use some valid files from your test suite. |
| |
| * For protocol processing targets put raw streams from test suite into separate |
| files. |
| |
| |
| ClusterFuzz uses seed corpus stored in Chromium repository. You need to add |
| `seed_corpus` attribute to fuzzer target: |
| |
| ``` |
| fuzzer_test("my_protocol_fuzzer") { |
| ... |
| seed_corpus = "src/fuzz/testcases" |
| ... |
| } |
| ``` |
| |
| If you don't want to store seed corpus in Chromium repository, you can upload |
| corpus to Google Cloud Storage bucket used by ClusterFuzz: |
| |
| |
| 1) go to [Corpus GCS Bucket] |
| |
| 2) open directory named `%YOUR_FUZZER_NAME%_static` |
| |
| 3) upload corpus files into the directory |
| |
| |
| Alternative way is to use `gsutil` tool: |
| ```bash |
| gsutil -m rsync <corpus_dir_on_disk> gs://clusterfuzz-corpus/libfuzzer/%YOUR_FUZZER_NAME%_static |
| ``` |
| |
| |
| ### Fuzzer Dictionary |
| |
| It is very useful to provide fuzzer a set of common words/values that you expect |
| to find in the input. This greatly improves efficiency of finding new units and |
| works especially well while fuzzing file format decoders. |
| |
| To add a dictionary, first create a dictionary file. |
| Dictionary syntax is similar to that used by [AFL] for its -x option: |
| |
| ``` |
| # Lines starting with '#' and empty lines are ignored. |
| |
| # Adds "blah" (w/o quotes) to the dictionary. |
| kw1="blah" |
| # Use \\ for backslash and \" for quotes. |
| kw2="\"ac\\dc\"" |
| # Use \xAB for hex values |
| kw3="\xF7\xF8" |
| # the name of the keyword followed by '=' may be omitted: |
| "foo\x0Abar" |
| ``` |
| |
| Test your dictionary by running your fuzzer locally: |
| |
| ```bash |
| ./out/libfuzzer/my_protocol_fuzzer -dict=<path_to_dict> <path_to_corpus> |
| ``` |
| |
| You should see lots of new units discovered. |
| |
| Add `dict` attribute to fuzzer target: |
| |
| ``` |
| fuzzer_test("my_protocol_fuzzer") { |
| ... |
| dict = "protocol.dict" |
| } |
| ``` |
| |
| Make sure to submit dictionary file to git. The dictionary will be used |
| automatically by ClusterFuzz once it picks up new fuzzer version (once a day). |
| |
| |
| ### Custom Options |
| |
| It is possible to specify [libFuzzer parameters](http://llvm.org/docs/LibFuzzer.html#usage) |
| for any fuzzer being run at ClusterFuzz. Custom options will overwrite default |
| values provided by ClusterFuzz. |
| |
| Just list all parameters in `libfuzzer_options` variable of build target: |
| |
| ``` |
| fuzzer_test("my_protocol_fuzzer") { |
| ... |
| libfuzzer_options = [ |
| "max_len=2048", |
| "use_traces=1", |
| ] |
| } |
| ``` |
| |
| Please note that `dict` parameter should be provided [separately](#Fuzzer-Dictionary). |
| Other options may be passed through `libfuzzer_options` property. |
| |
| |
| [AFL]: http://lcamtuf.coredump.cx/afl/ |
| [ClusterFuzz status]: clusterfuzz.md#Status-Links |
| [Corpus GCS Bucket]: https://goto.google.com/libfuzzer-clusterfuzz-corpus |