Remove RE2 from

The CSV reader has been using regular expressions to extract fields
from comma-separated-value records. While they allow concise code in
the CSV reader, regular expressions are too big a hammer for this task.
After some recent changes in RE2, the CSV reader became slower, as
discussed in the attached bug.

Because the regular expressions used here are compile-time constants,
and the equivalent parser is simple, this CL removes the RE2
dependency in favour of a custom parser.
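
For illustration only, a minimal sketch of what such a hand-written
field splitter can look like. This is not the code from this CL:
SplitRecord is a hypothetical name, and the quoting rules (fields may
be wrapped in double quotes, with "" standing for one literal quote)
are an assumption about the CSV dialect.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper, not the parser from this CL. Splits one CSV record
// (without its end-of-line marker) into fields. Inside a double-quoted
// field, commas are literal and "" stands for a single quote character.
// Returns false on malformed input, e.g. an unterminated quoted field.
bool SplitRecord(std::string_view record, std::vector<std::string>* fields) {
  fields->clear();
  std::size_t i = 0;
  while (true) {
    std::string field;
    if (i < record.size() && record[i] == '"') {
      ++i;  // Skip the opening quote.
      bool closed = false;
      while (i < record.size()) {
        if (record[i] != '"') {
          field.push_back(record[i++]);
        } else if (i + 1 < record.size() && record[i + 1] == '"') {
          field.push_back('"');  // Escaped quote.
          i += 2;
        } else {
          ++i;  // Skip the closing quote.
          closed = true;
          break;
        }
      }
      if (!closed) return false;
      // Only a comma or the end of the record may follow a closing quote.
      if (i < record.size() && record[i] != ',') return false;
    } else {
      while (i < record.size() && record[i] != ',') {
        if (record[i] == '"') return false;  // Stray quote in a bare field.
        field.push_back(record[i++]);
      }
    }
    fields->push_back(std::move(field));
    if (i == record.size()) return true;
    ++i;  // Skip the comma separating this field from the next one.
  }
}
```

A single forward pass like this does no backtracking and needs no
pattern compilation, which is the kind of saving the paragraph above
refers to.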

This change sped up the csv_reader_fuzzer by 1-2 orders of magnitude
when running the fuzzer on my machine in the debug configuration with
the input specified by the ClusterFuzz report in the associated bug:
* Before the CL, running the fuzzer took about 52 seconds.
* After the CL, running the fuzzer took about a second.
In a release build with DCHECKs, the times were 6.5 seconds (RE2) vs.
half a second (this CL). Without DCHECKs, they were 6.3 vs. 0.3 seconds.

Note: The parser is currently written so that it allows both CR and
LF as end-of-line markers. (It still needs some work to allow CRLF.)
The lack of such support was a blocker for avoiding a copy of the
input in CSVTable::ReadCSV, because the copy was made just to convert
all EOLs to LF. However, that improvement is not made here, to keep
CLs focused.
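
As a rough sketch of how rows could be cut out of the input without
copying it, assuming std::string_view is available: ConsumeRow is a
hypothetical name, not a function from this CL. It treats a lone CR or
a lone LF as the end of a row and, matching the limitation noted above,
does not special-case CRLF.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the code from this CL. Removes the first row
// from |*input| and returns a view of it; no bytes are copied. Either CR
// or LF ends a row. CRLF is not special-cased here, so it is seen as two
// end-of-line markers with an empty row between them.
std::string_view ConsumeRow(std::string_view* input) {
  const std::size_t eol = input->find_first_of("\r\n");
  if (eol == std::string_view::npos) {
    // The last row has no end-of-line marker: return everything left.
    std::string_view row = *input;
    *input = std::string_view();
    return row;
  }
  std::string_view row = input->substr(0, eol);
  input->remove_prefix(eol + 1);  // Drop the row and its EOL marker.
  return row;
}
```

Calling ConsumeRow repeatedly until the input is empty walks the rows
in place, so a normalising pass that rewrites all EOLs to LF would not
be needed under these assumptions.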

Bug: 921383
Change-Id: I3965e117377cdd58483abaa5ab3434540f4232e8
Commit-Queue: Vaclav Brozek <>
Reviewed-by: Jan Wilken Dörrie <>
Cr-Commit-Position: refs/heads/master@{#623781}
2 files changed