
 =========================================
 strcmp() / memcmp() token capture library
 =========================================

   (See ../docs/README for the general instruction manual.)

 This Linux-only companion library allows you to instrument strcmp(), memcmp(),
 and related functions to automatically extract syntax tokens passed to any of
 these libcalls. The resulting list of tokens may then be given as a starting
 dictionary to afl-fuzz (the -x option) to improve coverage on subsequent
 fuzzing runs.

 This may help improve coverage in some targets, and do precisely nothing in
 others. In some cases, it may even make things worse: if libtokencap picks up
 syntax tokens that are not used to process the input data, but that are a part
 of - say - parsing a config file... well, you're going to end up wasting a lot
 of CPU time on trying them out in the input stream. In other words, use this
 feature with care. Manually screening the resulting dictionary is almost
 always a necessity.

 As for the actual operation: the library stores tokens, without any deduping,
 by appending them to a file specified via AFL_TOKEN_FILE. If the variable is not
 set, the tool uses stderr (which is probably not what you want).
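
 Because the library appends rather than overwrites, tokens left over from an
 earlier session stay in the file. A minimal sketch of starting a clean capture
 session, assuming a scratch file named temp_output.txt in the current
 directory (the name is an arbitrary choice):

```shell
# Point the library at a scratch file (the file name is an arbitrary choice).
export AFL_TOKEN_FILE="$PWD/temp_output.txt"

# The library appends, so start each capture session from an empty file.
: > "$AFL_TOKEN_FILE"
```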

 Similarly to afl-tmin, the library is not "proprietary" and can be used with
 other fuzzers or testing tools without the need for any code tweaks. It does not
 require AFL-instrumented binaries to work.

 To use the library, you *need* to make sure that your fuzzing target is compiled
 with -fno-builtin and is linked dynamically. If you wish to automate the first
 part without mucking with CFLAGS in Makefiles, you can set AFL_NO_BUILTIN=1
 when using afl-gcc. This setting specifically adds the following flags:

   -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp
   -fno-builtin-strncasecmp -fno-builtin-memcmp -fno-builtin-strstr
   -fno-builtin-strcasestr
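
 The two routes described above can be sketched as environment setup done
 before the project's usual build. This assumes the build honors the
 conventional CC and CFLAGS variables, which not every build system does:

```shell
# Route 1: the target is built with afl-gcc; let it add the flags itself.
export AFL_NO_BUILTIN=1
export CC=afl-gcc

# Route 2: any other compiler; pass the blanket flag through CFLAGS instead.
# export CFLAGS="${CFLAGS:+$CFLAGS }-fno-builtin"

# Then run the project's normal build steps (not shown; they vary by project).
```

 Route 2 is left commented out because only one of the two is needed.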

 The next step is simply loading this library via LD_PRELOAD. The optimal usage
 pattern is to allow afl-fuzz to fuzz normally for a while and build up a corpus,
 and then fire off the target binary, with libtokencap.so loaded, on every file
 found by AFL in that earlier run. This demonstrates the basic principle:

   export AFL_TOKEN_FILE=$PWD/temp_output.txt

   for i in <out_dir>/queue/id*; do
     LD_PRELOAD=/path/to/libtokencap.so \
       /path/to/target/program [...params, including $i...]
   done

   sort -u temp_output.txt >afl_dictionary.txt
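
 Since the result usually needs manual screening, it can help to shrink the
 list first. The helper below is a hypothetical pre-filter, not part of AFL: it
 deduplicates the capture and drops very short and very long lines. The 3..40
 character window is an arbitrary illustration and should be tuned per target:

```shell
# Deduplicate the raw capture, then keep only lines whose length falls
# inside an (arbitrary) 3..40 character window, ready for manual review.
screen_tokens() {
  sort -u "$1" | awk 'length($0) >= 3 && length($0) <= 40'
}

# Usage: screen_tokens temp_output.txt >afl_dictionary.txt
```

 This only reduces noise; it cannot tell input-parsing tokens from unrelated
 ones, so a manual pass is still needed.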

 If you don't get any results, the target is probably not using strcmp() or
 memcmp() to parse input; or you haven't compiled it with -fno-builtin; or
 the whole thing isn't dynamically linked, and LD_PRELOAD is having no effect.
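
 Two of these causes can be checked quickly. The function below is a rough
 diagnostic sketch, assuming ldd and nm from a typical Linux toolchain are
 installed; check_target is a hypothetical name, not an AFL tool:

```shell
# Report whether a binary is dynamically linked and whether it imports any
# of the comparison functions from a shared library.
check_target() {
  bin=$1
  if ldd "$bin" >/dev/null 2>&1; then
    echo "dynamic: yes"
  else
    echo "dynamic: no (LD_PRELOAD will have no effect)"
  fi
  # Undefined dynamic symbols are the calls LD_PRELOAD is able to intercept.
  if nm -D --undefined-only "$bin" 2>/dev/null |
       grep -Eq ' (strn?cmp|strn?casecmp|memcmp)(@|$)'; then
    echo "imports compare functions: yes"
  else
    echo "imports compare functions: no"
  fi
}

# Usage: check_target /path/to/target/program
```

 A "yes" on the second line does not prove that every comparison goes through
 libc, since the compiler may still have inlined some calls. A "no" suggests
 that the target was built without -fno-builtin or does not use these
 functions at all.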

 PS. The library is Linux-only because there is probably no particularly portable
 and non-invasive way to distinguish between read-only and read-write memory
 mappings. The __tokencap_load_mappings() function is the only thing that would
 need to be changed for other OSes. Porting to platforms with /proc/<pid>/maps
 (e.g., FreeBSD) should be trivial.
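
 To illustrate the kind of information involved, the sketch below lists the
 readable, non-writable mappings of a process from /proc/<pid>/maps on Linux.
 It is not the library's code, and list_ro_mappings is a hypothetical name:

```shell
# Print the address range and backing path of each readable, non-writable
# mapping. Field 2 of /proc/<pid>/maps is the permission string, for
# example "r--p" or "r-xp" (read-only) versus "rw-p" (read-write).
# With no argument, "self" resolves to the awk process reading the file.
list_ro_mappings() {
  awk '$2 ~ /^r-/ { print $1, $6 }' "/proc/${1:-self}/maps"
}

# Usage: list_ro_mappings        (the awk process itself)
#        list_ro_mappings $$     (the current shell)
```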