This document describes the current status of C/C++ IncludeProcessor.
This document is based on the goma client revision on Feb 2019.
The purpose of C/C++ IncludeProcessor is to list all necessary included files. For example, if #include <foo.h>
is found, “foo.h” is listed as an include file.
#include <foo.h>
However, there are a few conditionally included files. For example:
#if FOO() && BAR() #include <baz.h> #endif
In this case, only when FOO() && BAR()
is true, baz.h
is included. So, C/C++ IncludeProcessor needs to evaluate preprocessor directives.
In the rest of this document, we describe how this evalution works.
Assume C/C++ IncludeProcessor wants to list included files for a file “a.cc”.
First, we convert “a.cc” content to IncludeItem
. IncludeItem
contains SharedCppDirectives
, and detected include guard (if any). SharedCppDirectives
is conceptually a list of CppDirective
. One CppDirective
corresponds to one cpp directive e.g. #include <iostream>
.
Definition is the following:
using CppDirectiveList = std::vector<std::unique_ptr<const CppDirective>>; using SharedCppDirectives = std::shared_ptr<const CppDirectiveList>;
See: cpp_directive.h
Process flow is like the following: See IncludeCache::CreateFromFile for more details.
Input File v v DirectiveFilter: Keep only # lines to make parser faster. v Input File (filtered) v v CppDirectiveParser: parse # lines and convert them to a list of v CppDirective. v SharedCppDirectives v v CppDirectiveOptimizer: remove unnecessary directives, v which won't affect include processor result. v SharedCppDirectives v v IncludeGuardDetector: detect include guard to use in CppParser. v IncludeItem
The result is cached in IncludeCache, and we reuse the conversion result to process the same file.
The cache size is limited by the max number of entries. After processing all chrome sources, 200~300 MB will be used in IncludeCache.
After a file can be converted to a list of CppDirectives, CppParser
evaluates the list of CppDirectives
.
Evaluation is just processing CppDirectives one by one. See CppParser::ProcessDirectives
, to understand how evalution works.
During evaluation, CppParser keeps a hashmap from macro name (string) to Macro. For example, #define A FOO BAR
is processed, CppParser
has a hashmap entry like "A" --> Macro(tokens=["FOO", "BAR"])
.
Note that we pass directives not only from a file input, but also from a compiler predefined macros (e.g. __cplusplus
) and macros defined in a command line flag (e.g. -DFOO=BAR
). We need to pass these predefined macros and command line defined macros to CppParser before evaluating CppDirective from a file input.
On Linux, the mean size of the hashmap is around 4000 entries. On Windows, since windows.h is large, it sometimes exceeds 15000 entries.
If the mean memory size of macro entry is just 1KB, macro environment will use 1 [KB] * 15,000 [bytes/entries] = 15 MB (+ hashmap overhead). IncludeProcessor works in parallel (usually the number of CPU cores tasks). If you're using 32 cores machine, 32 * 15 = 480 MB will be used. Note that this is rough estimation.
CppParser::ProcessDirectives
just evaluates each directives, so it's easy. However, one difficult point is how to expand macro.
Macro expansion uses two different expanders: CppMacroExpanderCBV and CppMacroExpanderNaive.
See comment about how they work. Especially, CBV version has several examples. Naive version is based on https://www.spinellis.gr/blog/20060626/cpp.algo.pdf.