This document describes the current status of the C/C++ IncludeProcessor.
It is based on the goma client revision as of February 2019.
The purpose of the C/C++ IncludeProcessor is to list all necessary included files. For example, if
#include <foo.h> is found, “foo.h” is listed as an included file.
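To make the basic case concrete, here is a minimal sketch that pulls the file name out of a single directive line. `ExtractIncludeName` is a hypothetical helper, not a goma function, and it deliberately ignores computed includes such as `#include FOO_H`, which need macro expansion first.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of goma: returns the file name from a line
// such as "#include <foo.h>" or "#include \"bar.h\"", or std::nullopt if the
// line is not an #include with a literal <...> or "..." operand.
std::optional<std::string> ExtractIncludeName(std::string_view line) {
  constexpr std::string_view kInclude = "include";
  auto pos = line.find_first_not_of(" \t");
  if (pos == std::string_view::npos || line[pos] != '#') return std::nullopt;
  pos = line.find_first_not_of(" \t", pos + 1);  // blanks may follow '#'
  if (pos == std::string_view::npos ||
      line.substr(pos, kInclude.size()) != kInclude)
    return std::nullopt;
  pos = line.find_first_not_of(" \t", pos + kInclude.size());
  if (pos == std::string_view::npos) return std::nullopt;
  const char open = line[pos];
  const char close = open == '<' ? '>' : (open == '"' ? '"' : '\0');
  // Operands like "#include FOO_H" need macro expansion first; not handled here.
  if (close == '\0') return std::nullopt;
  const auto end = line.find(close, pos + 1);
  if (end == std::string_view::npos) return std::nullopt;
  return std::string(line.substr(pos + 1, end - pos - 1));
}
```

For example, `ExtractIncludeName("#include <foo.h>")` yields `"foo.h"`, while a `#define` line yields `std::nullopt`.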
However, some files are included only conditionally. For example:

    #if FOO() && BAR()
    #include <baz.h>
    #endif

In this case, baz.h is included only when FOO() && BAR() is true. So the C/C++ IncludeProcessor needs to evaluate preprocessor directives.
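The sketch below shows why evaluation matters: an include counts only when every enclosing conditional is true. It assumes, for illustration only, that each condition has already been evaluated to `true` or `false`; `SimpleDirective` and `ListIncludes` are hypothetical names, and the real parser must evaluate expressions like `FOO() && BAR()` itself.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified directive model: conditions are assumed to be
// already evaluated to true/false.
struct SimpleDirective {
  enum Kind { kIf, kElse, kEndif, kInclude } kind;
  bool condition = false;  // used by kIf
  std::string path;        // used by kInclude
};

// Returns the includes reachable under the given (pre-evaluated) conditions.
// Assumes well-formed input: every #else/#endif has a matching #if.
std::vector<std::string> ListIncludes(
    const std::vector<SimpleDirective>& directives) {
  std::vector<bool> active_stack;  // one entry per open #if
  std::vector<std::string> includes;
  auto active = [&] {
    for (bool b : active_stack) {
      if (!b) return false;
    }
    return true;
  };
  for (const auto& d : directives) {
    switch (d.kind) {
      case SimpleDirective::kIf:
        active_stack.push_back(d.condition);
        break;
      case SimpleDirective::kElse:  // flip the innermost branch
        active_stack.back() = !active_stack.back();
        break;
      case SimpleDirective::kEndif:
        active_stack.pop_back();
        break;
      case SimpleDirective::kInclude:
        if (active()) includes.push_back(d.path);
        break;
    }
  }
  return includes;
}
```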
In the rest of this document, we describe how this evaluation works.
Assume the C/C++ IncludeProcessor wants to list the included files for a file “a.cc”.
First, we convert the content of “a.cc” to
SharedCppDirectives and detect its include guard (if any).
SharedCppDirectives is conceptually a list of CppDirective, where each
CppDirective corresponds to one preprocessor directive, e.g. an #include or a #define line.
The definitions are as follows:
    using CppDirectiveList = std::vector<std::unique_ptr<const CppDirective>>;
    using SharedCppDirectives = std::shared_ptr<const CppDirectiveList>;
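A minimal sketch of how these aliases fit together. The `CppDirective` class and `CppDirectiveType` enum here are hypothetical stand-ins (goma's real class carries parsed fields per directive kind); only the two aliases come from the definitions above. The point is that wrapping the list in `shared_ptr<const ...>` lets one immutable list be shared by the cache and several parsers without copying.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for goma's CppDirective: keeps only a type tag and
// the raw directive text.
enum class CppDirectiveType { kInclude, kDefine, kIf, kEndif };

class CppDirective {
 public:
  CppDirective(CppDirectiveType type, std::string text)
      : type_(type), text_(std::move(text)) {}
  CppDirectiveType type() const { return type_; }
  const std::string& text() const { return text_; }

 private:
  CppDirectiveType type_;
  std::string text_;
};

// The two aliases from the document.
using CppDirectiveList = std::vector<std::unique_ptr<const CppDirective>>;
using SharedCppDirectives = std::shared_ptr<const CppDirectiveList>;

// Builds the list once; after conversion to SharedCppDirectives, callers can
// only read it, so sharing it across readers is safe.
SharedCppDirectives MakeDirectives() {
  auto list = std::make_shared<CppDirectiveList>();
  list->emplace_back(std::make_unique<CppDirective>(
      CppDirectiveType::kDefine, "#define A FOO BAR"));
  list->emplace_back(std::make_unique<CppDirective>(
      CppDirectiveType::kInclude, "#include <foo.h>"));
  return list;
}
```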
The processing flow is as follows. See IncludeCache::CreateFromFile for more details.
    Input File
        |
        v
    DirectiveFilter: keep only # lines to make the parser faster.
        |
        v
    Input File (filtered)
        |
        v
    CppDirectiveParser: parse # lines and convert them to a list of CppDirective.
        |
        v
    SharedCppDirectives
        |
        v
    CppDirectiveOptimizer: remove unnecessary directives that won't affect
        |                  the include processor result.
        v
    SharedCppDirectives
        |
        v
    IncludeGuardDetector: detect the include guard to use in CppParser.
        |
        v
    IncludeItem
The result is cached in IncludeCache, so the conversion result is reused when the same file is processed again.
The cache size is limited by the maximum number of entries. After processing all Chrome sources, IncludeCache uses around 200-300 MB.
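A cache bounded by entry count can be sketched as below. The document only says the limit is a maximum number of entries; the least-recently-used eviction policy and the `BoundedCache` class are assumptions for illustration, not a description of IncludeCache's actual policy.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical entry-count-bounded cache; LRU eviction is an assumption.
template <typename Value>
class BoundedCache {
 public:
  explicit BoundedCache(std::size_t max_entries) : max_entries_(max_entries) {}

  // Returns nullptr on a miss; on a hit, marks the entry most recently used.
  std::shared_ptr<const Value> Get(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return nullptr;
    order_.splice(order_.begin(), order_, it->second);
    return it->second->second;
  }

  void Put(const std::string& key, std::shared_ptr<const Value> value) {
    auto it = index_.find(key);
    if (it != index_.end()) {  // overwrite and refresh an existing entry
      it->second->second = std::move(value);
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    order_.emplace_front(key, std::move(value));
    index_[key] = order_.begin();
    if (index_.size() > max_entries_) {  // evict the least recently used
      index_.erase(order_.back().first);
      order_.pop_back();
    }
  }

  std::size_t size() const { return index_.size(); }

 private:
  using Entry = std::pair<std::string, std::shared_ptr<const Value>>;
  std::size_t max_entries_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<std::string, typename std::list<Entry>::iterator> index_;
};
```

Because the limit counts entries rather than bytes, memory use depends on how large the cached directive lists are, which is why the figure above is given as an observed range.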
After a file has been converted to a list of CppDirectives,
CppParser evaluates the list of CppDirectives.
Evaluation is just processing the CppDirectives one by one. See
CppParser::ProcessDirectives to understand how evaluation works.
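The one-by-one evaluation can be sketched as a loop that dispatches on the directive kind. This is a hypothetical simplification in the spirit of CppParser::ProcessDirectives, not its code: the `Directive` struct is invented here, the macro table is reduced to a set of defined names, and only `#ifdef`/`#ifndef` conditionals are modeled. Note that a `#define` inside a skipped block has no effect.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified directive model.
struct Directive {
  enum Kind { kDefine, kUndef, kIfdef, kIfndef, kEndif, kInclude } kind;
  std::string arg;  // macro name or include path
};

// Processes directives in order. Inside a false conditional block, only
// nesting is tracked; everything else (including #define) is ignored.
std::vector<std::string> ProcessDirectives(
    const std::vector<Directive>& directives) {
  std::unordered_set<std::string> defined;
  std::vector<std::string> includes;
  int skip_depth = 0;  // > 0 while inside a false conditional block
  for (const Directive& d : directives) {
    if (skip_depth > 0) {
      if (d.kind == Directive::kIfdef || d.kind == Directive::kIfndef) {
        ++skip_depth;
      } else if (d.kind == Directive::kEndif) {
        --skip_depth;
      }
      continue;
    }
    switch (d.kind) {
      case Directive::kDefine: defined.insert(d.arg); break;
      case Directive::kUndef: defined.erase(d.arg); break;
      case Directive::kIfdef: if (!defined.count(d.arg)) skip_depth = 1; break;
      case Directive::kIfndef: if (defined.count(d.arg)) skip_depth = 1; break;
      case Directive::kEndif: break;  // closes an active block
      case Directive::kInclude: includes.push_back(d.arg); break;
    }
  }
  return includes;
}
```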
During evaluation, CppParser keeps a hashmap from macro name (string) to Macro. For example, after
#define A FOO BAR is processed,
CppParser has a hashmap entry like

    "A" --> Macro(tokens=["FOO", "BAR"])
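A minimal sketch of such a macro environment for object-like macros. The `Macro` struct, `MacroEnv` alias, and `HandleDefine` function are hypothetical; the real Macro stores more (e.g. parameters for function-like macros), and splitting on whitespace is a simplification of real preprocessor tokenization. A redefinition simply overwrites the previous entry.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical Macro: only the replacement tokens are kept.
struct Macro {
  std::vector<std::string> tokens;
};

using MacroEnv = std::unordered_map<std::string, Macro>;

// Handles the body of "#define NAME REPLACEMENT..." for object-like macros.
// Function-like macros ("F(x) ...") are not handled by this sketch.
void HandleDefine(const std::string& body, MacroEnv* env) {
  std::istringstream in(body);
  std::string name, token;
  in >> name;
  Macro macro;
  while (in >> token) macro.tokens.push_back(token);
  (*env)[name] = std::move(macro);  // redefinition overwrites the old entry
}
```

Usage: `HandleDefine("A FOO BAR", &env)` leaves `env["A"].tokens` equal to `{"FOO", "BAR"}`, matching the entry shown above.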
Note that we pass directives not only from the input file, but also from the compiler's predefined macros (e.g.
__cplusplus) and from macros defined by command-line flags (e.g.
-DFOO=BAR). These predefined and command-line macros must be passed to CppParser before it evaluates the CppDirectives from the input file.
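One way to feed command-line macros to the parser is to turn them into directive text evaluated before the file itself. The sketch follows the GCC/Clang meaning of these flags (`-DFOO` defines FOO as 1, `-DFOO=BAR` defines it as BAR, `-UFOO` undefines it, processed in command-line order). `FlagsToDirectives` is a hypothetical helper, and it handles only the joined `-DFOO` form, not `-D FOO` as two arguments or MSVC's `/D`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: converts -D/-U flags into directive text, in order.
// Other flags are ignored.
std::string FlagsToDirectives(const std::vector<std::string>& flags) {
  std::string out;
  for (const std::string& flag : flags) {
    if (flag.rfind("-D", 0) == 0 && flag.size() > 2) {
      const std::string def = flag.substr(2);
      const auto eq = def.find('=');
      if (eq == std::string::npos) {
        out += "#define " + def + " 1\n";  // -DFOO means FOO is 1
      } else {
        out += "#define " + def.substr(0, eq) + " " + def.substr(eq + 1) + "\n";
      }
    } else if (flag.rfind("-U", 0) == 0 && flag.size() > 2) {
      out += "#undef " + flag.substr(2) + "\n";
    }
  }
  return out;
}
```

In such a scheme, the compiler's predefined macros (such as __cplusplus) would be emitted first, then this text, and only then the directives from the input file.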
On Linux, the mean size of the hashmap is around 4000 entries. On Windows, since windows.h is large, it sometimes exceeds 15000 entries.
If a macro entry takes just 1 KB of memory on average, the macro environment uses 1 [KB/entry] * 15,000 [entries] = 15 MB (plus hashmap overhead). IncludeProcessor runs in parallel, usually with as many tasks as there are CPU cores, so on a 32-core machine about 32 * 15 MB = 480 MB will be used. Note that this is a rough estimate.
CppParser::ProcessDirectives just evaluates each directive in turn, so it is straightforward. The difficult part is macro expansion.
See the comments in the macro expander implementations for how they work; the CBV version in particular has several examples. The naive version is based on https://www.spinellis.gr/blog/20060626/cpp.algo.pdf.
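The central difficulty in expansion is rescanning without looping forever. The sketch below covers object-like macros only and uses the simple "disable a macro while it is being expanded" rule; it is neither goma's naive nor its CBV expander, and it does not implement the full hide-set algorithm from the paper linked above. `Tokens`, `ObjectMacros`, `Expand`, and `ExpandAll` are hypothetical names.

```cpp
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

using Tokens = std::vector<std::string>;
using ObjectMacros = std::unordered_map<std::string, Tokens>;

// Expands object-like macros. Each replacement list is rescanned, but a macro
// that is currently being expanded is left as-is, so "#define A A" or
// mutually recursive macros terminate.
Tokens Expand(const Tokens& input, const ObjectMacros& macros,
              std::set<std::string>& disabled) {
  Tokens out;
  for (const std::string& tok : input) {
    auto it = macros.find(tok);
    if (it == macros.end() || disabled.count(tok)) {
      out.push_back(tok);  // not a macro, or already being expanded
      continue;
    }
    disabled.insert(tok);
    Tokens expanded = Expand(it->second, macros, disabled);
    disabled.erase(tok);
    out.insert(out.end(), expanded.begin(), expanded.end());
  }
  return out;
}

Tokens ExpandAll(const Tokens& input, const ObjectMacros& macros) {
  std::set<std::string> disabled;
  return Expand(input, macros, disabled);
}
```

With `#define A B x`, `#define B A y` and `#define C C`, expanding `A C` gives `A y x C`: the inner `A` and the `C` are left unexpanded because they were encountered while their own expansion was in progress.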