| ==================== |
| Include What You Use |
| ==================== |
| |
| "Include what you use" means this: for every symbol (type, function, |
| variable, or macro) that you use in foo.cc (or foo.cpp), either foo.cc |
| or foo.h should #include a .h file that exports the declaration of |
| that symbol. (Similarly, for foo_test.cc, either foo_test.cc or foo.h |
| should do the #including.) Obviously symbols defined in foo.cc itself |
| are excluded from this requirement. |
| |
| This puts us in a state where every file includes the headers it needs |
| to declare the symbols that it uses. When every file includes what it |
| uses, then it is possible to edit any file and remove unused headers, |
| without fear of accidentally breaking the upwards dependencies of that |
| file. It also becomes easy to automatically track and update |
| dependencies in the source code. |
| |
| ====== |
| CAVEAT |
| ====== |
| |
| This is alpha quality software -- at best. It was written to work |
| specifically in the Google source tree, and may make assumptions, or |
| have gaps, that are immediately and embarrassingly evident in other |
| types of code. For instance, we only run this on C++ code, not C or |
| Objective C. Even for Google code, the tool still makes a lot of |
| mistakes. |
| |
| While we work to get IWYU quality up, we will be stinting new |
| features, and will prioritize reported bugs along with the many |
| existing, known bugs. The best chance of getting a problem fixed is |
| to submit a patch that fixes it (along with a unittest case that |
| verifies the fix)! |
| |
| ============ |
| How to Build |
| ============ |
| |
| You will need the clang and llvm trees on your system, such as by |
| checking out their SVN trees: |
| http://clang.llvm.org/get_started.html |
| |
| Then download the include-what-you-use tarball and unpack it the |
| /path/to/llvm/tools/clang/tools directory: |
| llvm/tools/clang/tools$ tar xfz include-what-you-use-<whatever>.tar.gz |
| Or, alternately, get the project directly from svn: |
| llvm/tools/clang/tools$ svn co http://include-what-you-use.googlecode.com/svn/trunk/ include-what-you-use |
| |
| Then cd into the include-what-you-use directory (under tools) and type |
| 'make'. |
| |
| Include what you use makes heavy use of clang internals, and will |
| occasionally break when clang is updated. For best results, download |
| clang as of the same revision number of the last include-what-you-use |
| release. You can find this revision number in comments at the top |
| of the include-what-you-use Makefile. |
| |
| -- BUILDING AGAINST LLVM 2.9 |
| |
| These are the commands I ran, which worked for me: |
| |
| 1) Visit http://llvm.org/releases/download.html#2.9 |
| 2) Download the appropriate binaries ("Clang Binaries for Linux/x86_64") |
| 3) Untar the download file, cd to its root directory |
| 4) Run this command: |
| g++ -D_DEBUG -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-rtti -I include -L lib ~/devel/llvm/tools/clang/tools/include-what-you-use/*.cc -lclangFrontend -lclangSerialization -lclangDriver -lclangSema -lclangAnalysis -lclangAST -lclangParse -lclangLex -lclangBasic -lLLVMipo -lLLVMScalarOpts -lLLVMInstCombine -lLLVMTransformUtils -lLLVMipa -lLLVMAnalysis -lLLVMTarget -lLLVMMC -lLLVMCore -lLLVMSupport -lpthread -ldl -lm -o include-what-you-use |
| |
| ========== |
| How to Run |
| ========== |
| |
| The easiest way to run IWYU over your codebase is to run |
| make -k CXX=/path/to/llvm/Debug+Asserts/bin/include-what-you-use |
| or |
| make -k CXX=/path/to/llvm/Release/bin/include-what-you-use |
| |
| (include-what-you-use always exits with an error code, so the build |
| system knows it didn't build a .o file. Hence the need for -k.) |
| |
| We also include, in this directory, a tool that automatically fixes up |
| your source files based on the iwyu recommendations. This is also |
| alpha-quality software! Here's how to use it (requires python): |
| |
| make -k CXX=/path/to/llvm/Debug+Asserts/bin/include-what-you-use > /tmp/iwyu.out |
| python fix_includes.py --nosafe < /tmp/iwyu.out |
| |
| If you don't like the way fix_includes.py munges your #include lines, |
| you can control its behavior via flags. fix_includes.py --help will |
| give a full list, but these are some common ones: |
| * -b: Put blank lines between system and Google #includes |
| * --nocomments: Don't add the 'why' comments next to #includes |
| |
| WARNING: include-what-you-use only analyzes .cc (or .cpp) files built |
| by 'make', along with their corresponding .h files. If your project |
| has a .h file with no corresponding .cc file, iwyu will ignore it. |
| include-what-you-use supports the AddGlobToReportIWYUViolationsFor() |
| function which can be used to indicate other files to analyze, but |
| it's not currently exposed to the user in any way. |
| |
| ============================ |
| How to Correct IWYU Mistakes |
| ============================ |
| |
| 1) If fix_includes.py has removed an #include you actually need, add |
| it back in with the comment '// IWYU pragma: keep' at the end of |
| the #include line. Note that the comment is case-sensitive. |
| |
| 2) If fix_includes has added an #include you don't need, just take it |
| out. We hope to come up with a more permanent way of fixing later. |
| |
| 3) If fix_includes has wrongly added or removed a forward-declare, |
| just fix it up manually. |
| |
| 4) If fix_includes has suggested a private header file (such as |
| <bits/stl_vector.h>) instead of the proper public header file |
| (<vector>), you can fix this by inserting this comment near the |
| top of the private file (assuming you can write to it): |
| // IWYU pragma: private, include "the/public/file.h" |
| The full list of 'iwyu pragma' comments are at the top of |
| iwyu_preprocessor.h. |
| |
| ================= |
| Running the Tests |
| ================= |
| |
| To run the iwyu tests, run |
| python run_iwyu_tests.py |
| |
| It runs one test for each .cc file in the tests/ directory. (We have |
| additional tests in more_tests/, but have not yet gotten the testing |
| framework set up for those tests.) The output can be a bit hard to |
| read, but if a test fails, the reason why will be listed after the |
| 'ERROR:root:Test failed for xxx' line. |
| |
| If fixing a bug in clang, please add a test to the test suite! You |
| can create a file called whatever.cc (*not* .cpp), and whatever.h, and |
| whatever-<extension>.h. You may be able to get away without adding |
| any .h files, and just #including "direct.h" -- see, for instance, |
| tests/remove_fwd_decl_when_including.cc. |
| |
| When fixing fix_includes.py, add a test case to fix_includes_test.py |
| and run |
| python fix_includes_test.py |
| |
| ========= |
| Debugging |
| ========= |
| |
| It's possible to run include-what-you-use in gdb, to debug that way. |
| Another useful tool -- especially in combination with gdb -- is to get |
| the verbose include-what-you-use output. See iwyu_output.h for a |
| description of the verbose levels. Level 7 is very verbose -- it |
| dumps basically the entire AST as it's being traversed, along with |
| iwyu decisions made as it goes -- but very useful for that: |
| env IWYU_VERBOSE=7 make -k CXX=/path/to/llvm/Debug+Asserts/bin/include-what-you-use 2>&1 > /tmp/iwyu.verbose |
| |
| =============== |
| Developer Notes |
| =============== |
| |
| The codebase is strewn with TODOs of known problems, and also language |
| constructs that aren't adequately tested yet. So there's plenty to |
| do! Here's a brief guide through the codebase: |
| |
| * iwyu.cc: the main file, it includes the logic for deciding when a |
| symbol has been 'used', and whether it's a full use (definition |
| required) or forward-declare use (only a declaration required). It |
| also inclues the logic for following uses through template |
| instantiations. |
| |
| * iwyu_output.cc: the file that translates from 'uses' into iwyu |
| violations. This has the logic for deciding if a use is covered by |
| an existing #include (or is a built-in). It also, as the name |
| suggests, prints the iwyu output. |
| |
| * iwyu_preprocessor.cc: handles the preprocessor directives, the |
| #includes and #ifdefs, to construct the existing include-tree. This |
| is obviously essential for include-what-you-use analysis. This file |
| also handles the iwyu pragma-comments. |
| |
| * iwyu_include_picker.cc: this finds canonical #includes, handling |
| hard-coded private->public mappings (like bits/stl_vector.h -> |
| vector) and symbols with multiple possible #includes (like NULL). |
| |
| * iwyu_cache.cc: holds the cache of instantiated templates (may hold |
| other cached info later). This is data that is expensive to compute |
| and may be used more than once. |
| |
| * iwyu_globals.cc: holds various global variables. We used to think |
| globals were bad, until we saw how much having this file simplified |
| the code... |
| |
| * iwyu_*_util(s).h and .cc: utility functions of various types. The |
| most interesting, perhaps, is iwyu_ast_util.h, which has routines |
| that make it easier to navigate and analyze the clang AST. There |
| are also some STL helpers, string helpers, filesystem helpers, etc. |
| |
| * fix_includes.py: the helper script that edits a file based on the |
| iwyu recommendations. |
| |
| ===================== |
| Why IWYU is Difficult |
| ===================== |
| |
| This last section is informational, for folks who are wondering why |
| include-what-you-use requires so much code and yet still has so many |
| errors. |
| |
| Include-what-you-use has the most problems with templates and |
| macros. If your code doesn't use either, iwyu will probably do |
| great. And, you're probably not actually programming in C++... |
| |
| USE VERSUS FORWARD DECLARE |
| |
| Include-what-you-use has to be able to tell when a symbol is being |
| used in a way that you can forward-declare it. Otherwise, if you wrote |
| |
| vector<MyClass*> foo; |
| |
| iwyu would tell you to #include "myclass.h", when perhaps the whole |
| reason you're using a pointer here is to avoid the need for that |
| #include. |
| |
| In the above case, it's pretty easy for iwyu to tell that we can |
| safely forward-declare MyClass. But now consider |
| |
| vector<MyClass> foo; // requires full definition of MyClass |
| scoped_ptr<MyClass> foo; // forward-declaring MyClass is ok |
| |
| To distinguish these, clang has to instantiate the vector and |
| scoped_ptr template classes, including analyzing all member variables |
| and the bodies of the constructor and destructor (and recursively for |
| superclasses). |
| |
| But that's not enough: when instantiating the templates, we need to |
| keep track of which symbols come from template arguments and which |
| don't. For instance, suppose you call MyFunc<MyClass>(), where MyFunc |
| looks like this: |
| |
| template<typename T> void MyFunc() { |
| T* t; |
| MyClass myclass; |
| ... |
| } |
| |
| In this case, the caller of MyFunc is not using the full type of |
| MyClass, because the template parameter is only used as a pointer. On |
| the other hand, the file that defines MyFunc is using the full type |
| information for MyClass. The end result is that the caller can |
| forward-declare MyClass, but the file defining MyFunc has to #include |
| "myclass.h". |
| |
| HANDLING TEMPLATE ARGUMENTS |
| |
| Even figuring out what types are 'used' with a template can be |
| difficult. Consider the following two declarations: |
| |
| vector<MyClass> v; |
| hash_set<MyClass> h; |
| |
| These both have default template arguments, so are parsed like |
| |
| vector<MyClass, alloc<MyClass> > v; |
| hash_set<MyClass, hash<MyClass>, equal_to<MyClass>, alloc<MyClass> > h; |
| |
| What symbols should we say are used? If we say alloc<MyClass> is used |
| when you declare a vector, then every file that #includes <vector> |
| will also need to #include <memory>. |
| |
| So it's tempting to just ignore default template arguments. But that's |
| not right either. What if hash<MyClass> is defined in some local |
| myhash.h file (as hash<string> often is)? Then we want to make sure iwyu |
| says to #include "myhash.h" when you create the hash_set |
| (otherwise the code won't compile). That requires paying attention to |
| the default template argument. Figuring out how to handle default |
| template arguments can get very complex. |
| |
| Even normal template arguments can be confusing. Consider this |
| templated function: |
| |
| template<typename A, typename B, typename C> void MyFunc(A (*fn)(B,C)) { ... } |
| |
| and you call MyFunc(FunctionReturningAFunctionPointer()). What types |
| are being used where, in this case? |
| |
| WHO IS RESPONSIBLE FOR DEPENDENT TEMPLATE TYPES? |
| |
| If you say vector<MyClass> v;, it's clear that you, and not vector.h, |
| are responsible for the use of MyClass, even though all the functions |
| that use MyClass are defined in vector.h. (OK, technically, these |
| functions are not "defined" in a particular location, they're |
| instantiated from template methods written in vector.h, but for us it |
| works out the same.) |
| |
| When you say hash_map<MyClass, int> h;, you are likewise responsible |
| for MyClass (and int), but are you responsible for pair<MyClass, int>? |
| That is the type that hash_map uses to store your entries internally, |
| and it depends on one of your template arguments, but even so it |
| shouldn't be your responsibility -- it's an implementation detail of |
| hash_map. Of course, if you say hash_map<pair<int, int>, int>, then |
| you are responsible for the use of pair. Distinguishing these two |
| cases from each other, and from the vector case, can be difficult. |
| |
| Now suppose there's a template function like this: |
| |
| template<typename T> void MyFunc(T t) { |
| strcat(t, 'a'); |
| strchr(t, 'a'); |
| cerr << t; |
| } |
| |
| If you call MyFunc(some_char_star), which of these symbols are you |
| responsible for, and which is the author of MyFunc responsible for: |
| strcat, strchr, operator<<(ostream&, T)? |
| |
| strcat is a normal function, and the author of MyFunc is responsible |
| for its use. This is an easy case. |
| |
| In C++, strchr is a templatized function (different impls for char* |
| and const char*). Which version is called depends on the template |
| argument. So, naively, we'd conclude that the caller is responsible |
| for the use of strchr. However, that's ridiculous; we don't want every |
| caller of MyFunc to have to #include <string.h> just to call |
| MyFunc. We have special code that (usually) handles this kind of case. |
| |
| operator<< is also a templatized function, but it's one that may be |
| defined in lots of different files. It would be ridiculous in its own |
| way if MyFunc was responsible for #including every file that defines |
| operator<<(ostream&, T) for all T. So, unlike the two cases above, the |
| caller is the one responsible for the use of operator<<, and will have |
| to #include the file that defines it. It's counter-intuitive, perhaps, |
| but the alternatives are all worse. |
| |
| As you can imagine, distinguishing all these cases is extremely |
| difficult. To get it exactly right would require re-implementing C++'s |
| (byzantine) lookup rules, which we have not yet tackled. |
| |
| TEMPLATE TEMPLATE TYPES |
| |
| Let's say you have a function |
| |
| template<template<typename U> T> void MyFunc() { |
| T<string> t; |
| } |
| |
| and you call MyFunc<hash_set>(). Who is responsible for the 'use' of |
| hash<string>, and thus needs to #include "myhash.h"? I think |
| it has to be the caller, even if the caller never uses the string type |
| in its file at all. This is rather counter-intuitive. Luckily, it's |
| also rather rare. |
| |
| TYPEDEFS |
| |
| Suppose you #include a file "foo.h" that has typedef hash_map<Foo, |
| Bar> MyMap;. And you have this code: |
| |
| for (MyMap::iterator it = ...) |
| |
| Who, if anyone, is using the symbol hash_map<Foo, Bar>::iterator? If |
| we say you, as the author of the for-loop, are the user, then you must |
| #include <hash_map>, which undoubtedly goes against the goal of the |
| typedef (you shouldn't even have to know you're using a hash_map). So |
| we want to say the author of the typedef is responsible for the |
| use. But how could the author of the typedef know that you were going |
| to use MyMap::iterator? It can't predict that. That means it has to be |
| responsible for every possible use of the typedef type. This can be |
| complicated to figure out. It requires instantiating all methods of |
| the underlying type, some of which might not even be legal C++ (if, |
| say, the class uses SFINAE). |
| |
| Worse, when the language auto-derives template types, it loses typedef |
| information. Suppose you wrote this: |
| |
| MyMap m; |
| find(m.begin(), m.end(), some_foo); |
| |
| The compiler sees this as syntactic sugar for find<hash_map<Foo, Bar, |
| hash<Foo>, equal_to<Foo>, alloc<Foo> >(m.begin(), m.end(), some_foo); |
| |
| Not only is the template argument hash_map instead of MyMap, it |
| includes all the default template arguments, with no indication |
| they're default arguments. All the tricks we used above to |
| intelligently ignore default template arguments are worthless here. We |
| have to jump through lots of hoops so this code doesn't require you to |
| #include not only <hash_map>, but <alloc> and <utility> as well. |
| |
| MACROS |
| |
| It's no surprise macros cause a huge problem for include-what-you-use. |
| Basically, all the problems of templates also apply to macros, but |
| worse: we can analyze an uninstantiated template, but we can't analyze |
| an uninstantiated macro -- the macro likely doesn't even parse cleanly |
| in isolation. As a result, we have very few tools to distinguish when |
| the author of a macro is responsible for a symbol used in a macro, and |
| when the caller of the macro is responsible. |
| |
| PRIVATE INCLUDES |
| |
| Suppose you write vector<int> v;. You are using vector, and thus have |
| to #include <vector>. Even this seemingly easy case is difficult, |
| because vector isn't actually defined in <vector>; it's defined in |
| <bits/stl_vector.h>. The C++ standard library has hundreds of private |
| files that users are not supposed to #include directly. Third party |
| libraries have hundreds more. There's no general way to distinguish |
| private from public headers; we have to manually construct the proper |
| mapping. |
| |
| In the future, we hope to provide a way for users to annotate if a |
| file is public or private, either a comment or a #pragma. For now, we |
| hard-code it in the iwyu tool. |
| |
| The mappings themselves can be ambiguous. For instance, NULL is |
| provided by many files, including stddef.h, stdlib.h, and more. If you |
| use NULL, what #include file should iwyu suggest? We have rules to try |
| to minimize the number of #includes you have to add; it can get rather |
| involved. |
| |
| UNPARSED CODE |
| |
| Conditional #includes are a problem for iwyu when the condition is |
| false: |
| |
| #if _MSC_VER |
| #include <foo> |
| #endif |
| |
| If we're not running under windows (and iwyu does not currently run |
| under windows), we have no way of telling if foo is a necessary |
| #include or not. |
| |
| PLACING NEW INCLUDES AND FORWARD-DECLARES |
| |
| Figuring out where to insert new #includes and forward-declares is a |
| complex problem of its own (one that is the responsibility of |
| fix_includes.py). In general, we want to put new #includes with |
| existing #includes. But the existing #includes may be broken up into |
| sections, either because of conditional #includes (with #ifdefs), or |
| macros (such as #define __GNU_SOURCE), or for other reasons. Some |
| forward-declares may need to come early in the file, and some may |
| prefer to come later (after we're in an appropriate namespace, for |
| instance). |
| |
| fix_includes.py tries its best to give pleasant-looking output, while |
| being conservative about putting code in a place where it might not |
| compile. It uses heuristics to do this, which are not yet perfect. |