# Getting Started with libprotobuf-mutator in Chrome
*** note
**Note:** libprotobuf-mutator (LPM) is new to Chromium and does not (yet) have a
long track record of success. Also, writing fuzzers with libprotobuf-mutator
will probably require more effort than writing fuzzers with libFuzzer alone.
If you run into problems, send an email to [fuzzing@chromium.org] for help.
**Prerequisites:** Knowledge of [libFuzzer in Chromium] and basic understanding of
[Protocol Buffers].
***
This document will walk you through:
* An overview of libprotobuf-mutator and how it's used.
* Writing and building your first fuzzer using libprotobuf-mutator.
## Overview of libprotobuf-mutator
libprotobuf-mutator is a package that allows libFuzzer’s mutation engine to
manipulate protobufs. This allows libFuzzer's mutations to be more specific
to the format it is fuzzing and less arbitrary. Below are some good use cases
for libprotobuf-mutator:
* Fuzzing targets that accept Protocol Buffers as input. Note that if you are
fuzzing a target that accepts protobufs, your protobuf definition *cannot*
be optimized for `LITE_RUNTIME`, as is the case with almost all protobuf
definitions in Chromium. To get around this, you can make a copy of the
definition without the optimization.
* Fuzzing targets that accept other highly structured input. To do this you
must write code that converts data from a protobuf-based format to a format the
target accepts. url_parse_proto_fuzzer is a working example of this and is
commented extensively. Readers may wish to consult its code, which is located in
`testing/libfuzzer/fuzzers/url_parse_proto_fuzzer.cc`, and
`testing/libfuzzer/fuzzers/url.proto`. Its build configuration can be found
in `testing/libfuzzer/fuzzers/BUILD.gn`.
* Fuzzing targets that accept more than one argument (such as data and flags).
In this case, you can define each argument as its own field in your protobuf
definition.
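As a rough sketch of the multi-argument case, the block below shows how a fuzzer body might unpack one field per argument and forward each to the target. To keep it self-contained, `FuzzInput` is a hand-written class standing in for what protoc would generate from a message with a `data` field and a `flags` field. `FuzzInput`, `ParseWithFlags`, `RunTarget`, and the meaning of the flag bit are all hypothetical and are not part of Chromium or libprotobuf-mutator.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string>
#include <utility>

// Hand-written stand-in for a protoc-generated message class (assumption).
// The accessors mimic the generated getter style: one getter per field.
class FuzzInput {
 public:
  FuzzInput(std::string data, uint32_t flags)
      : data_(std::move(data)), flags_(flags) {}
  const std::string& data() const { return data_; }
  uint32_t flags() const { return flags_; }

 private:
  std::string data_;
  uint32_t flags_;
};

// Hypothetical target taking two arguments: counts the bytes in `data`,
// skipping spaces when bit 0 of `flags` is set.
size_t ParseWithFlags(const std::string& data, uint32_t flags) {
  const bool skip_spaces = (flags & 1u) != 0;
  size_t count = 0;
  for (char c : data) {
    if (skip_spaces && c == ' ')
      continue;
    ++count;
  }
  return count;
}

// What the body of the fuzzer entry point would do: read each field and pass
// it to the target as its own argument.
size_t RunTarget(const FuzzInput& input) {
  return ParseWithFlags(input.data(), input.flags());
}
```

Because each argument is a separate field, the mutation engine can change the flags and the data independently of each other.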
In the next two sections, we will discuss how to write and build fuzzers using
libprotobuf-mutator. Interested readers may also want to look at [this] example
of a libprotobuf-mutator fuzzer that is even more trivial than
url_parse_proto_fuzzer.
## Write a libprotobuf-mutator fuzzer
Once you have in mind the code you want to fuzz and the format it accepts, you
are ready to start writing a libprotobuf-mutator fuzzer. Writing the fuzzer
will have three steps:
* Define the fuzzed format (not required for protobuf formats, unless the
original definition is optimized for `LITE_RUNTIME`).
* Write the fuzzer target and conversion code (for non-protobuf formats).
* Define the GN target.
### Define the Fuzzed Format
Create a new .proto using `proto2` or `proto3` syntax and define a message that
you want libFuzzer to mutate.
``` protocol-buffer
syntax = "proto2";
package my_fuzzer;
message MyProtoFormat {
  // Define a format for libFuzzer to mutate here.
}
```
See `testing/libfuzzer/fuzzers/url.proto` for an example of this in practice.
That example has extensive comments on URL syntax and how that influenced
the definition of the Url message.
### Write the Fuzzer Target and Conversion Code
Create a new .cc and write a `DEFINE_BINARY_PROTO_FUZZER` function:
```cpp
// Needed since we use getenv().
#include <stdlib.h>
// Needed since we use std::cout.
#include <iostream>
// Needed since we use std::string.
#include <string>
#include "third_party/libprotobuf-mutator/src/src/libfuzzer/libfuzzer_macro.h"
// Assuming the .proto file is path/to/your/proto_file/my_format.proto.
#include "path/to/your/proto_file/my_format.pb.h"
// Silence logging from the protobuf library.
protobuf_mutator::protobuf::LogSilencer log_silencer;
// Put your conversion code here (if needed) and then pass the result to
// your fuzzing code (or just pass "my_proto_format", if your target accepts
// protobufs).
DEFINE_BINARY_PROTO_FUZZER(const my_fuzzer::MyProtoFormat& my_proto_format) {
  // Convert your protobuf to whatever format your targeted code accepts
  // if it doesn't accept protobufs.
  std::string native_input = convert_to_native_input(my_proto_format);
  // You should provide a way to easily retrieve the native input for
  // a given protobuf input. This is useful for debugging and for seeing
  // the inputs that cause targeted_code to crash (which is the reason we are
  // here!). Note how this is done before the targeted_code is called since we
  // can't print after the program has crashed.
  if (getenv("LPM_DUMP_NATIVE_INPUT"))
    std::cout << native_input << std::endl;
  // Now test your targeted code using the converted protobuf input.
  targeted_code(native_input);
}
```
This is very similar to the same step in writing a standard libFuzzer fuzzer.
The only real differences are accepting protobufs rather than raw data and
converting them to the desired format. Conversion code can't really be explored
in this guide since it is format-specific. However, a good example of conversion
code (and a fuzzer target) can be found in
`testing/libfuzzer/fuzzers/url_parse_proto_fuzzer.cc`. That example thoroughly
documents how it converts the Url protobuf message into a real URL string.
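To give a feel for what conversion code looks like, here is a minimal sketch for a made-up format that is far simpler than the one in `url.proto`. It is not the code from `url_parse_proto_fuzzer.cc`. `SimpleUrl` is a plain struct standing in for a protoc-generated message, and both it and `ConvertToNativeInput` are hypothetical names chosen for illustration.

```cpp
#include <string>
#include <vector>

// Plain struct standing in for a protoc-generated message (assumption).
struct SimpleUrl {
  std::string scheme;
  std::string host;
  // Plays the role of a repeated string field.
  std::vector<std::string> path_segments;
};

// Builds the string that the target would parse from the structured input.
// The separators ("://" and "/") live here rather than in the message, so
// the mutation engine only changes the meaningful parts.
std::string ConvertToNativeInput(const SimpleUrl& url) {
  std::string out = url.scheme + "://" + url.host;
  for (const std::string& segment : url.path_segments) {
    out += '/';
    out += segment;
  }
  return out;
}
```

For example, a `SimpleUrl` with scheme `http`, host `example.com`, and segments `a` and `b` converts to `http://example.com/a/b`.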
Note that `DEFINE_TEXT_PROTO_FUZZER` can be used during development instead of
`DEFINE_BINARY_PROTO_FUZZER`. `DEFINE_TEXT_PROTO_FUZZER` comes with a
performance penalty but causes the corpus to be stored in a human readable (and
modifiable) string-based format.
A good convention is to print the native input when the `LPM_DUMP_NATIVE_INPUT`
environment variable is set. This makes it easy to retrieve the actual input
that causes the code to crash instead of the protobuf version of it (e.g., you
can get the URL string that causes the target to crash rather than a protobuf).
This is only a convention, so it is not required, but it is strongly
recommended. You don't need to do this if the native input of targeted_code is
protobufs. Beware that printing a newline can make the output invalid for some
formats. In this case, omit the newline and call `fflush(0)` instead, since
otherwise the program may crash before native_input is actually printed.
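The newline-free variant of this convention might look like the sketch below. `MaybeDumpNativeInput` is a hypothetical helper name, and it flushes only the stream it writes to, whereas `fflush(0)` flushes every open output stream. Either approach gets the bytes out before the target runs.

```cpp
#include <stdio.h>
#include <stdlib.h>

#include <string>

// Writes native_input to `out` exactly as is (no trailing newline) and then
// flushes, so the bytes are emitted even if the target crashes afterwards.
// Does nothing unless LPM_DUMP_NATIVE_INPUT is set in the environment.
// Returns true if the input was written.
bool MaybeDumpNativeInput(const std::string& native_input, FILE* out) {
  if (!getenv("LPM_DUMP_NATIVE_INPUT"))
    return false;
  // fwrite (rather than fputs) also handles inputs with embedded NUL bytes.
  fwrite(native_input.data(), 1, native_input.size(), out);
  fflush(out);
  return true;
}
```

In a fuzzer, this would be called with `stdout` right before the call to the targeted code.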
### Define the GN Target
Define a fuzzer_test target and include your protobuf definition and
libprotobuf-mutator as dependencies.
```python
import("//testing/libfuzzer/fuzzer_test.gni")
import("//third_party/protobuf/proto_library.gni")
fuzzer_test("my_fuzzer") {
  sources = [ "my_fuzzer.cc" ]
  deps = [
    ":my_format_proto",
    "//third_party/libprotobuf-mutator",
    ...
  ]
}
proto_library("my_format_proto") {
  sources = [ "my_format.proto" ]
}
```
See `testing/libfuzzer/fuzzers/BUILD.gn` for an example of this in practice.
### Wrapping Up
Once you have written a fuzzer with libprotobuf-mutator, building and running
it is pretty much the same as if the fuzzer were a standard libFuzzer-based
fuzzer (with minor exceptions, such as the requirement that your seed corpus
be in protobuf format).
### Tips
* If you have messages that are defined recursively (e.g., message `Foo` has a
field of type `Foo`), make sure to bound recursive calls to code converting
your message into native input. Otherwise you will (probably) end up with an
out of memory error. The code coverage benefits of allowing unlimited
recursion in a message are probably fairly low for most targets anyway.
* Remember that proto definitions can be changed in ways that are backwards
compatible (such as adding explicit values to an `enum`). This means that you
can make changes to your definitions while preserving the usefulness of your
corpus. In general adding fields will be backwards compatible but removing them
(particularly if they are `required`) is not.
* Make sure you understand the meaning of the different protobuf modifiers such
as `oneof` and `repeated` as they can be counter-intuitive. `oneof` means "At
most one of" while `repeated` means "At least zero". You can hack around these
meanings if you need "at least one of" or "exactly one of" something. For
example, this is the proto code for exactly one of: `MessageA` or `MessageB` or
`MessageC`:
```protocol-buffer
message MyFormat {
  oneof a_or_b {
    MessageA message_a = 1;
    MessageB message_b = 2;
  }
  required MessageC message_c = 3;
}
```
And here is the C++ code that converts it.
```c++
std::string Convert(const MyFormat& my_format) {
  if (my_format.has_message_a())
    return ConvertMessageA(my_format.message_a());
  else if (my_format.has_message_b())
    return ConvertMessageB(my_format.message_b());
  else  // Fall through to the default case, message_c.
    return ConvertMessageC(my_format.message_c());
}
```
* Check out some of the [existing proto fuzzers], as not only will they be helpful
examples, but it is possible that your format is already defined or partially
defined by an existing proto definition.
* libprotobuf-mutator supports both proto2 and proto3 syntax. Be aware though
that it handles strings differently in each because of differences in the way
the proto library handles strings in each syntax (in short, proto3 strings must
actually be valid UTF-8, while proto2 strings need not be). See [here] for more
details.
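
The first tip above, bounding recursion in conversion code, can be sketched as follows. `Expr` is a plain struct standing in for a protoc-generated message that has an optional field of its own type. `Expr`, `ConvertExpr`, and the `kMaxDepth` value are hypothetical, and the limit is arbitrary.

```cpp
#include <memory>
#include <string>

// Plain struct standing in for a recursively defined message (assumption).
struct Expr {
  std::string value;
  std::unique_ptr<Expr> child;  // Null when the nested field is unset.
};

// Arbitrary limit chosen for illustration.
constexpr int kMaxDepth = 4;

// Converts `expr` to a parenthesized string such as "a(b(c))". At most
// kMaxDepth levels are emitted; anything nested deeper is silently dropped,
// so a deeply nested input cannot cause unbounded recursion or output.
std::string ConvertExpr(const Expr& expr, int depth = 0) {
  std::string out = expr.value;
  if (expr.child && depth + 1 < kMaxDepth) {
    out += '(';
    out += ConvertExpr(*expr.child, depth + 1);
    out += ')';
  }
  return out;
}
```

Truncating is only one option. Returning an empty string for overly deep inputs would work too, depending on what the target tolerates.
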
[libfuzzer in Chromium]: getting_started.md
[Protocol Buffers]: https://developers.google.com/protocol-buffers/docs/cpptutorial
[fuzzing@chromium.org]: mailto:fuzzing@chromium.org
[this]: https://github.com/google/libprotobuf-mutator/tree/master/examples/libfuzzer/libfuzzer_example.cc
[existing proto fuzzers]: https://cs.chromium.org/search/?q=DEFINE_(BINARY_%7CTEXT_)?PROTO_FUZZER+-file:src/third_party/libprotobuf-mutator/src/src/libfuzzer/libfuzzer_macro.h&sq=package:chromium&type=cs
[here]: https://github.com/google/libprotobuf-mutator/blob/master/README.md#utf-8-strings