| /* ****************************************************************************** |
| * Copyright (c) 2010-2025 Google, Inc. All rights reserved. |
| * ******************************************************************************/ |
| |
| /* |
| * Redistribution and use in source and binary forms, with or without |
| * modification, are permitted provided that the following conditions are met: |
| * |
| * * Redistributions of source code must retain the above copyright notice, |
| * this list of conditions and the following disclaimer. |
| * |
| * * Redistributions in binary form must reproduce the above copyright notice, |
| * this list of conditions and the following disclaimer in the documentation |
| * and/or other materials provided with the distribution. |
| * |
| * * Neither the name of Google, Inc. nor the names of its contributors may be |
| * used to endorse or promote products derived from this software without |
| * specific prior written permission. |
| * |
| * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" |
| * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
| * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
| * ARE DISCLAIMED. IN NO EVENT SHALL VMWARE, INC. OR CONTRIBUTORS BE LIABLE |
| * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL |
| * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR |
| * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER |
| * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT |
| * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY |
| * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH |
| * DAMAGE. |
| */ |
| |
| /** |
| **************************************************************************** |
| \page page_code_style Coding Style Conventions |
| |
| The overall goal is to <strong>make the source code as readable and |
| as self-documenting as possible</strong>. Everyone following the same |
| style guidelines is an important part of keeping the code consistent and |
| maintainable. |
| |
| # Automated Formatting |
| |
| We employ automated code formatting via [`clang-format` version 14.0](https://releases.llvm.org/14.0.0/tools/clang/docs/ClangFormat.html). The [`.clang-format` top-level file](https://github.com/DynamoRIO/dynamorio/blob/master/.clang-format) specifies the style rules for all `.h`, `.c`, and `.cpp` source code files. Developers are expected to set up their editors to use `clang-format` when saving each file (see [the `clang-format` documentation](https://releases.llvm.org/14.0.0/tools/clang/docs/ClangFormat.html) for pointers to vim, emacs, and Visual Studio setup instructions). Our test suite includes a format check and will fail any code whose formatting does not match the `clang-format` output. |
| |
| # Legacy Code |
| |
| Some of the style conventions have changed over time, but we have not |
| wanted to incur the cost in time and history confusion of changing all |
| old code. Thus, you may observe old code that does not comply with |
| some of these conventions. These listed conventions overrule surrounding |
| code! Please change the style of old code when you are making other |
| changes to the same lines. |
| |
| # Naming Conventions |
| |
| -# Header files consisting of exported interface content that is also used internally are named with an `_api.h` suffix: e.g., `encode_api.h`. If the content is solely used externally, it should be named the same as its exported name: e.g., `dr_inject.h`. For the former case, where the exported name is different, the include guards should use the exported name. |
| |
| -# Variable and function names use only lowercase letters. Multi-word |
| function and variable names are all lowercase with underscores delimiting |
| words. Do not use CamelCase for names, unless mirroring Windows-defined |
| data structures. |
| |
| \b GOOD: `instr_get_target()` |
| |
| \e BAD: `instrGetTarget()` |
| |
| -# Type names are all lowercase, with underscores dividing words, and |
| ending in `_t`: |
| ``` |
| instr_t |
| build_bb_t |
| ``` |
| This is true for C++ class names as well. |
| |
| -# The name of a struct in a C typedef should be the type name with an |
| underscore prefixed: |
| ``` |
| typedef struct _build_bb_t { |
| ... |
| } build_bb_t; |
| ``` |
| |
| -# Constants should be in all capital letters, with underscores dividing |
| words. Enum members should use a common descriptive prefix. |
| ``` |
| static const int MAX_SIZE = 256; |
| enum { |
| DUMPCORE_DEADLOCK = 0x0004, |
| DUMPCORE_ASSERTION = 0x0008, |
| }; |
| ``` |
| |
| -# For C++ code, class member fields are named like other variables except they contain an underscore suffix. Structs that contain methods other than constructors also have this suffix (but usually that should be a class); simple structs with no method other than a constructor do not have this suffix, nor do constants if they follow the constant naming conventions. |
| ``` |
| class apple_t { |
| ... |
| int num_seeds_; |
| }; |
| struct truck_t { |
| int num_wheels; |
| }; |
| ``` |
| |
| -# Preprocessor defines and macros should be in all capital letters, with |
| underscores dividing words. |
| ``` |
| #ifdef WINDOWS |
| #    define IF_WINDOWS(x) x |
| #else |
| #    define IF_WINDOWS(x) |
| #endif |
| ``` |
| |
| -# Preprocessor defines that include a leading or trailing comma should |
| have a corresponding leading or trailing underscore: |
| ``` |
| #define _IF_WINDOWS(x) , x |
| #define IF_WINDOWS_(x) x, |
| ``` |
| |
| -# Functions that operate on a data structure should contain that |
| structure as a prefix. For example, all of the routines that operate on |
| the `instr_t` struct begin with `instr_`. |
| |
| -# In `core/`, short names or any global name with a chance of colliding with names from an including application linking statically should be qualified with a `d_r_` prefix: e.g., `d_r_dispatch`. This is distinct from the `dr_` prefix, which is used on exported interface names. |
| |
| -# Use `static` when possible for every function or variable |
| that is not needed outside of its own file. |
| |
| -# Do not shadow global variables (or variables in containing scopes) by using |
| local variables of the same name: choose a distinct name for the local variable. |
| |
| -# Template parameters in C++ should have a descriptive CamelCase identifier. |
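
As a rough illustration of how the naming rules above combine, here is a minimal sketch in C. The `ring_t` type, its `ring_` routines, and `RING_MAX_ENTRIES` are hypothetical names invented for this example and are not part of DynamoRIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Macros and constants: all capitals, underscores between words. */
#define RING_MAX_ENTRIES 4

/* The struct tag is the type name with a leading underscore; the type name
 * is lowercase and ends in _t. (Hypothetical type for illustration.)
 */
typedef struct _ring_t {
    int entries[RING_MAX_ENTRIES];
    size_t count;
} ring_t;

/* Only needed in this file, so it is static. */
static bool
ring_is_full(const ring_t *ring)
{
    return ring->count == RING_MAX_ENTRIES;
}

/* Routines that operate on ring_t carry the ring_ prefix. */
void
ring_init(ring_t *ring)
{
    assert(ring != NULL);
    ring->count = 0;
}

/* Appends value and returns true, or returns false if the ring is full. */
bool
ring_push(ring_t *ring, int value)
{
    assert(ring != NULL);
    if (ring_is_full(ring))
        return false;
    ring->entries[ring->count] = value;
    ring->count++;
    return true;
}
```

A caller would write `ring_init(&ring)` followed by `ring_push(&ring, value)`, which makes it clear at each call site which data structure is being operated on.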
| |
| # Types |
| |
| |
| -# See above for naming conventions for types. |
| |
| -# When declaring a function with no arguments, always explicitly use |
| the `void` keyword. Otherwise the compiler will not be able to |
| check whether you are incorrectly passing arguments to that function. |
| |
| \b GOOD: `int foo(void);` |
| |
| \e BAD: `int foo();` |
| |
| -# Use the `IN`, `OUT`, and `INOUT` labels to |
| describe function parameters. This is a recent addition to DynamoRIO so |
| you will see many older functions without these labels, but use them on all |
| new functions. |
| |
| \b GOOD: `int foo(IN int length, OUT char *buf);` |
| |
| \e BAD: `int foo(int length, char *buf);` |
| |
| -# Only use boolean types as conditionals. This means using explicit NULL |
| comparisons and result comparisons. In particular with functions like |
| strcmp() and memcmp(), the use of ! is counter-intuitive. |
| |
| \b GOOD: `if (p == NULL) ...` |
| |
| \e BAD: `if (!p)` |
| |
| \b GOOD: `if (p != NULL) ...` |
| |
| \e BAD: `if (p)` |
| |
| \b GOOD: `if (strncmp(...) == 0) ...` |
| |
| \e BAD: `if (!strncmp(...))` |
| |
| -# Use constants of the appropriate type. Assign or compare a character to '\0' not to `0`. |
| |
| -# It's much easier to read `if (i == 0)` than `if (0 == i)`. |
| The compiler, with all warnings turned on (which we have), will |
| warn you if you use assignment rather than equality. |
| |
| \b GOOD: `if (i == 0) ...` |
| |
| \e BAD: `if (0 == i)` |
| |
| -# Use the `TEST` and related macros for testing bits. |
| |
| \b GOOD: `if (TEST(BITMASK, x))` |
| |
| \e BAD: `if ((x & BITMASK) != 0)` |
| |
| -# Write code that is 32-bit and 64-bit aware: |
| |
| - Use int and uint for 32-bit integers. Do not use long, as its size is 64 bits on 64-bit Linux but 32 bits on Windows. We assume that int is a 32-bit type. |
| - Use int64 and uint64 for 64-bit integers. Use `INT64_FORMAT` and related macros for printing 64-bit integers. |
| - Use ptr_uint_t and ptr_int_t for pointer-sized integers. |
| - Use size_t for sizes of memory regions. |
| - Use reg_t for register-sized values whose type is not known. |
| - Use `ASSERT_TRUNCATE` macros when casting to a smaller type. |
| - Use `PFX` (rather than %p, which is inconsistent across compilers) and other printf macros for printing pointer-sized variables in code using general printing libraries. For code that exclusively uses DR's own printing facilities, %p is allowed: its improved code readability and simplicity outweigh the risk of such code being copied into non-DR locations and resulting in inconsistent output. |
| - When generating code or writing assembler code, be aware of stack alignment restrictions. |
| |
| |
| -# Invalid addresses, either pointers to our data structures or |
| application addresses that we're manipulating, have the value NULL, not 0. |
| 0 is only for arithmetic types. |
| |
| -# `const` makes code easier to read and lets the compiler complain |
| about errors and generate better code. It is also required for the most |
| efficient self-protection. Use it whenever possible. However, do not mark |
| simple scalar type function parameters as `const`. |
| |
| -# Place `*` (or `&` for C++ references) prefixing variable names (C style), not suffixing type names |
| (Java style): |
| |
| \b GOOD: `char *foo;` |
| |
| \e BAD: `char* foo;` |
| |
| -# In a struct, union, or class, list each field on its own line with its own type declaration, even when sharing the type of the prior field. Similarly, declare global variables separately. Local variables of the same type can optionally be combined on a line. |
| |
| \b GOOD: |
| ``` |
| struct foo { |
| int field1; |
| int field2; |
| }; |
| ``` |
| |
| \e BAD: |
| ``` |
| struct foo { |
| int field1, field2; |
| }; |
| ``` |
| |
| -# Do not assume that `char` is signed: use our `sbyte` typedef for a signed one-byte type. |
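
Several of the rules in this section can be seen together in the following minimal sketch. The `IN`, `OUT`, and `TEST` definitions here are stand-ins written for this example so that it compiles on its own (they are assumptions, not copies of DynamoRIO's definitions), and `option_parse`, `OPTION_PREFIX`, and `OPTION_FLAG_VERBOSE` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins so that the sketch is self-contained. These definitions are
 * assumptions made for illustration only.
 */
#define IN
#define OUT
#define TEST(mask, var) (((mask) & (var)) != 0)

/* Hypothetical constants for this example. */
#define OPTION_FLAG_VERBOSE 0x1U
#define OPTION_PREFIX "-opt"

/* Returns true if arg starts with OPTION_PREFIX, in which case *verbose is
 * set according to flags. Returns false for a NULL, empty, or other string.
 */
bool
option_parse(IN const char *arg, IN unsigned int flags, OUT bool *verbose)
{
    assert(verbose != NULL);
    /* Explicit NULL comparison rather than !arg. */
    if (arg == NULL)
        return false;
    /* Character constant rather than 0. */
    if (arg[0] == '\0')
        return false;
    /* Explicit result comparison rather than !strncmp(...). */
    if (strncmp(arg, OPTION_PREFIX, strlen(OPTION_PREFIX)) != 0)
        return false;
    /* Bit test through the macro rather than an open-coded mask. */
    *verbose = TEST(OPTION_FLAG_VERBOSE, flags);
    return true;
}
```

Every conditional here is a genuine boolean expression, so a reader never has to remember what a zero or non-zero return value means for a given library routine.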
| |
| |
| # Commenting Conventions |
| |
| |
| -# For C code, `/∗ ∗/` comments are preferable to `//`. |
| Put stars on each line of a multi-line comment, like this: |
| \verbatim |
| /* multi-line comment |
| * with stars |
| */ |
| \endverbatim |
| The trailing `∗/` can be either on its own line or the end of the preceding line, but on its own line is preferred. |
| |
| For C++ code, `//` comments are allowed. |
| |
| -# Make liberal use of comments. However, too many comments can impair |
| readability. Choose self-descriptive function and variable names to reduce |
| the number of comments needed. |
| |
| -# Do not use large, clunky function headers that simply duplicate |
| information in the code itself. Such headers tend to contain stale, |
| incorrect information, for two reasons: the code is often updated without |
| maintaining the header, and since the headers are a pain to type they are |
| often copied from other functions and not completely modified for their new |
| home. They also make it harder to see the code or to group related |
| functions, as they take up so much screen space. It is better to have |
| leaner, more maintainable, and more readable implementation files by using |
| self-descriptive function and parameter names and placing comments for |
| function parameters next to the parameters themselves. |
| |
| \b GOOD: |
| \verbatim |
| /* Retrieves the name of the logfile for a particular thread. |
| * Returns false if no such thread exists. |
| */ |
| bool get_logfile(IN thread_id_t thread, |
| OUT char **fname, |
| IN size_t fname_len) |
| \endverbatim |
| \e BAD: |
| \verbatim |
| /*------------------------------------------------------ |
| * Name: get_logfile |
| * |
| * Purpose: |
| * Retrieves the name of the logfile for a particular thread. |
| * |
| * Parameters: |
| * [IN] thread = which thread |
| * [OUT] fname = where to store the logfile name |
| * [IN] fname_len = the size of the fname buffer |
| * |
| * Returns: |
| * True if successful. |
| * False if no such thread exists. |
| * |
| * Side effects: |
| * None. |
| * ------------------------------------------------------ |
| */ |
| bool get_logfile(thread_id_t thread, char *fname, size_t fname_len) |
| \endverbatim |
| |
| |
| -# Use doxygen comments on all function and type declarations that are |
| exported as part of the API. For comments starting with `/∗∗`, |
| leave the rest of the first line empty, unless the entire comment is a |
| single line. Some examples: |
| \verbatim |
| DR_API |
| /** |
| * Returns the entry point of the function with the given name in the module |
| * with the given base. Returns NULL on failure. |
| * \note Currently Windows only. |
| */ |
| generic_func_t |
| dr_get_proc_address(IN module_handle_t lib, IN const char *name); |
| |
| /** |
| * Data structure passed with a signal event. Contains the machine |
| * context at the signal interruption point and other signal |
| * information. |
| */ |
| typedef struct _dr_siginfo_t { |
| int sig; /**< The signal number. */ |
| void *drcontext; /**< The context of the thread receiving the signal. */ |
| dr_mcontext_t mcontext; /**< The machine state at the signal interruption point. */ |
| siginfo_t siginfo; /**< The signal information provided by the kernel. */ |
| } dr_siginfo_t; |
| \endverbatim |
| |
| Within doxygen comments, create links using parentheses for |
| functions `foo()` and a leading `#` for other items such as types |
| `#dr_siginfo_t` or defines `#DR_REG_START_GPR`. See the doxygen |
| documentation for more information: |
| http://www.doxygen.nl/manual/autolink.html |
| |
| -# <strong>NEVER</strong> check in commented-out code. This is |
| unacceptable. If you feel strongly that you need to leave code in that is |
| disabled, use conditional compilation (e.g., |
| <tt>\#if DISABLED_UNTIL_BUG_812_IS_FIXED</tt>), and perhaps additionally |
| explain in a comment why the code is disabled. |
| |
| -# Sloppy comments full of misspelled words, etc. are an indication of |
| carelessness. We do not want carelessly written code, and we do not |
| want carelessly written comments. |
| |
| -# Comments that contain more than one sentence should be properly |
| capitalized and punctuated and should use complete sentences. A single-sentence |
| comment that occupies an entire line should preferably also be capitalized, |
| punctuated, and written as a complete sentence. For comments that are |
| inside a line of code or at the end of a line of code, sentence fragments or |
| phrases are fine. |
| |
| -# Use `XXX` in comments to indicate code that |
| could be optimized or something that may warrant re-examination later. |
| Include the issue number using the syntax `i#<number>`. For example: |
| \verbatim |
| /* XXX i#391: this could be done more efficiently via ... |
| */ |
| \endverbatim |
| |
| -# Use `TODO` in comments to indicate missing features that are required and not just optimizations or optional improvements (use `XXX` for those). |
| (Avoid `FIXME` in new comments as its connotations are too negative and too easily |
| misinterpreted in code audits.) |
| Include the issue number using the syntax `i#<number>`. For example: |
| \verbatim |
| /* TODO i#999: we do not yet handle a corner case where ... |
| */ |
| \endverbatim |
| |
| -# Mark any temporary or unfinished code unsuitable for committing with a |
| `NOCHECKIN` comment. The `make/codereview.cmake` script will remind you to clean |
| up the code. |
| \verbatim |
| x = 4; /* NOCHECKIN: temporary debugging change */ |
| \endverbatim |
| |
| -# For banner comments that separate out groups of related functions, use the following style: |
| \verbatim |
| /**************************************************************************** |
| * Name for this group of functions |
| */ |
| |
| \endverbatim |
| If a closing marker is needed use this style: |
| \verbatim |
| /* |
| ****************************************************************************/ |
| \endverbatim |
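
To show how the commenting conventions above might look inside a single routine, here is a minimal sketch. The function `array_sum` and the disabled call to `array_saturate` are hypothetical, and `i#<number>` is left as the literal placeholder from the text rather than a real issue number.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the sum of the first count entries of values. */
int
array_sum(const int *values, size_t count)
{
    int sum = 0;
    size_t i;
    assert(values != NULL || count == 0);
    /* XXX i#<number>: this loop could be unrolled for large arrays. */
    for (i = 0; i < count; i++) {
        /* TODO i#<number>: we do not yet handle overflow of the running sum. */
        sum += values[i];
    }
#if DISABLED_UNTIL_BUG_812_IS_FIXED
    /* Disabled through conditional compilation instead of being commented
     * out. array_saturate() is a hypothetical helper.
     */
    sum = array_saturate(sum);
#endif
    return sum;
}
```

The short header comment states only what the signature cannot, and the disabled code remains visible to tools such as `grep` while staying out of the build.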
| |
| |
| # Warnings Are Errors |
| |
| |
| -# Uninitialized variables warning (C4701 for cl): Don't initialize |
| variables when you don't need to, so that we can still have good warnings |
| about uninitialized variables in the future. Only if the compiler can't |
| analyze code properly is it better to err on the side of a deterministic |
| bug and set to 0 or `{0}`. |
| |
| Use `do {} while ()` loops to help the compiler figure out that variables |
| will get initialized. The generated code on those constructs is faster and |
| better predicted (although optimizations should be able to transform simple |
| loops). |
| |
| -# For the suggested use of static analysis tools (PreFAST or `/analyze`) on |
| new code, refer to case 3966. |
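
The advice above about `do {} while ()` loops and uninitialized variables can be sketched as follows. The function `find_terminator` is a hypothetical example, and how well a given compiler analyzes either loop form will vary.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first zero entry in values.
 * The caller guarantees that a zero entry exists.
 */
size_t
find_terminator(const int *values)
{
    size_t index = 0;
    /* Deliberately left uninitialized: the loop body always assigns it
     * before the loop condition reads it.
     */
    int current;
    assert(values != NULL);
    do {
        current = values[index];
        index++;
    } while (current != 0);
    return index - 1;
}
```

Written as a `while (current != 0)` loop, the same logic would need a dummy initial value for `current`, which would hide a real uninitialized-use warning if the code were later changed.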
| |
| |
| |
| # Program Structure |
| |
| |
| -# Keep the line length to 90 characters or less. |
| |
| -# Use an indentation level of 4 spaces (no tabs, always expand them to |
| spaces when saving the file). (Exception: in CMakeLists.txt and other CMake scripts, use an indentation level of 2 spaces.) |
| |
| <strong>WARNING</strong>: Emacs defaults are not always correct here. Make |
| sure your .emacs contains the following: |
| ``` |
| ; always expand tabs to spaces |
| (setq-default indent-tabs-mode 'nil) |
| |
| ; want "gnu" style but indent of 4: |
| (setq c-basic-offset 4) |
| (add-hook 'c-mode-hook '(lambda () |
| (setq c-basic-offset 4))) |
| ``` |
| |
| For CMake, use cmake-mode which does default to 2 spaces. |
| |
| -# K&R-style braces: opening braces at the end of the line preceding |
| the new code block, closing braces on their own line at the same |
| indentation as the line preceding the code block. Functions are an |
| exception -- see below. |
| |
| -# Functions should have their type on a separate line from their name. |
| Place the function's opening brace on a line by itself at zero |
| indentation. |
| ``` |
| int |
| foo(int x, int y) |
| { |
| return 42; |
| } |
| ``` |
| |
| Function declarations should also have the type on a separate line, although this rule can be relaxed for short (single-line) signatures with a one-line comment or no comment beforehand. |
| |
| -# Put spaces after commas in parameter and argument lists. |
| |
| \b GOOD: `foo(x, y, z);` |
| |
| \e BAD: `foo(x,y,z);` |
| |
| -# Do not put spaces between a function name and the |
| following parenthesis. Do put a space between a `for`, `if`, `while`, |
| `do`, or `switch` and the following parenthesis. Do not put spaces after |
| an opening parenthesis or before a closing parenthesis, unless the interior |
| expression is complex and contains multiple layers of parentheses. |
| |
| \b GOOD: `foo(x, y, z);` |
| |
| \e BAD: `foo (x, y, z);` |
| |
| \e BAD: `foo( x, y, z);` |
| |
| \e BAD: `foo(x, y, z );` |
| |
| \b GOOD: `if (x == 6)` |
| |
| \e BAD: `if( x==6 )` |
| |
| -# Always put the body of an if or loop on a separate line from the |
| line containing the keyword. |
| |
| \b GOOD: |
| ``` |
| if (x == 6) |
| y = 5; |
| ``` |
| \e BAD: |
| ``` |
| if (x==6) y = 5; |
| ``` |
| |
| -# A multi-line (not just multi-statement) body (of an if, loop, etc.) |
| should always be surrounded with braces (to avoid errors in later |
| statements arising from indentation mistakes). |
| |
| -# Statements should always begin on a new line (do not put multiple |
| statements on the same line). |
| |
| -# The case statements of a switch statement should not be indented: they should line up with the switch itself. |
| \b GOOD: |
| ``` |
| switch (opc) { |
| case OP_add: ... |
| case OP_sub: ... |
| default: ... |
| } |
| ``` |
| \e BAD: |
| ``` |
| switch (opc) { |
|     case OP_add: ... |
|     case OP_sub: ... |
|     default: ... |
| } |
| ``` |
| |
| -# Indent nested preprocessor statements. The `#` character must be in |
| the first column, but the rest of the statement can be indented. |
| ``` |
| #ifdef OUTERDEF |
| #    ifdef INNERDEF |
| #        define INSIDE |
| #    endif |
| #else |
| #    define OUTSIDE |
| #endif |
| ``` |
| |
| -# Macros, globals, and typedefs should be declared either at the top of the file or at the top of a related group of functions that are delineated by a banner comment. Do not place global declarations or defines at random places in the middle of a file. |
| |
| -# To make the code easier to read, use the `DODEBUG`, |
| `DOSTATS`, or `DOLOG` macros, or the `IF_WINDOWS` and |
| related macros, rather than ifdefs, for common defines. |
| |
| -# Do not use `DO_ONCE(SYSLOG_INTERNAL`. Instead use two new macros: |
| `DODEBUG_ONCE` and `SYSLOG_INTERNAL_*_ONCE`. |
| |
| -# Use `make/codereview.cmake`'s style checks to examine the code for |
| known poor coding patterns. In the future we may add checks using `astyle` |
| ([issue 83](https://github.com/DynamoRIO/dynamorio/issues#issue/83)). |
| |
| -# In .asm files, place opcodes on column 8 and start operands on column 17. |
| |
| -# Multi-statement macros should always be inside "do { ... } while (0)" to avoid mistaken sequences with use as the body of an if() or other construct. |
| |
| -# When using DEBUG_DECLARE or other conditional macros at the start of a line, move any code not within the macro to the subsequent line, to aid readability and avoid the reader skipping over it under the assumption that it's debug-only. E.g.: |
| |
| \b GOOD: |
| ``` |
| DEBUG_DECLARE(bool res =) |
| foo(bar); |
| ``` |
| \e BAD: |
| ``` |
| DEBUG_DECLARE(bool res =) foo(bar); |
| ``` |
| |
| -# Avoid embedding assignments inside expressions. We consider a separate assignment statement to be more readable. E.g.: |
| |
| \b GOOD: |
| ``` |
| x = foo(); |
| if (x == 0) { ... |
| ``` |
| \e BAD: |
| ``` |
| if ((x = foo()) == 0) { ... |
| ``` |
| |
| -# Avoid embedding non-const expressions inside macros. We consider a separate expression statement to be more readable, and it avoids hidden errors when macros such as ASSERT are disabled in non-debug builds. Example: |
| |
| \b GOOD: |
| ``` |
| bool success = set_some_value(); |
| ASSERT(success); |
| ``` |
| \e BAD: |
| ``` |
| ASSERT(set_some_value()); |
| ``` |
| |
| -# Use named constants rather than "magic values" embedded in the code. Recognizing and naming constants up front and centralizing them makes assumptions clearer, code more readable, and modifying and maintaining the code easier. |
| |
| \b GOOD: `char buf[MAXIMUM_LINE_LENGTH];` |
| |
| \e BAD: `char buf[128];` |
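
The rule above about wrapping multi-statement macros in `do { ... } while (0)` can be illustrated with a minimal sketch. The `SWAP_AND_COUNT` macro, the `pair_stats_t` type, and `pair_order` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Multi-statement macro wrapped so that it expands to exactly one
 * statement. (Hypothetical macro for illustration.)
 */
#define SWAP_AND_COUNT(a, b, counter) \
    do {                              \
        int swap_tmp = (a);           \
        (a) = (b);                    \
        (b) = swap_tmp;               \
        (counter)++;                  \
    } while (0)

typedef struct _pair_stats_t {
    int swapped;
    int kept;
} pair_stats_t;

/* Orders the pair so that *low <= *high and records what was done. */
void
pair_order(int *low, int *high, pair_stats_t *stats)
{
    assert(low != NULL && high != NULL && stats != NULL);
    /* Because the macro is a single statement, it is safe as an unbraced
     * body that is followed by an else.
     */
    if (*low > *high)
        SWAP_AND_COUNT(*low, *high, stats->swapped);
    else
        stats->kept++;
}
```

If the macro body were a bare `{ ... }` block instead, the semicolon after the macro invocation would end the `if` statement and the `else` would fail to compile; with no braces at all, only the first statement would be conditional.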
| |
| |
| |
| # File Organization |
| |
| |
| -# Write OS-independent code as much as possible and keep it in the |
| base `core/` directory. If code must diverge for Windows versus |
| Linux, provide an OS-independent interface documented |
| in `core/os_shared.h` and implemented separately |
| in `core/unix/` and `core/windows/`. |
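
The layering described above can be sketched in a single file as follows. In a real tree the declaration would sit in a shared header and each implementation would be its own file in the matching OS directory; they are folded together here only so that the example compiles by itself. The names `os_query_page_size` and `round_up_to_page` are hypothetical and are not DynamoRIO interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Shared interface: returns the system page size in bytes, or 0 on failure.
 * (Hypothetical routine; would be declared in a shared header.)
 */
size_t
os_query_page_size(void);

#ifdef _WIN32
/* Windows implementation (would be its own file). */
#    include <windows.h>

size_t
os_query_page_size(void)
{
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return (size_t)info.dwPageSize;
}
#else
/* UNIX implementation (would be its own file). */
#    include <unistd.h>

size_t
os_query_page_size(void)
{
    /* sysconf() returns long, so convert at the interface boundary. */
    long size = sysconf(_SC_PAGESIZE);
    return size > 0 ? (size_t)size : 0;
}
#endif

/* OS-independent caller: no ifdefs are needed at this level. */
size_t
round_up_to_page(size_t length)
{
    size_t page = os_query_page_size();
    assert(page != 0);
    return (length + page - 1) / page * page;
}
```

Keeping the OS split behind one documented routine means that callers such as `round_up_to_page` stay identical on every platform.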
| |
| # C++ |
| |
| |
| -# While the core DynamoRIO library and API are C, we do support C++ clients and have some C++ tests and clients ourselves. For broad compiler support we limit our code to C++11. |
| |
| -# C++ exception use is not allowed, for maximum interoperability to enable using libraries and source code in other environments where exceptions are not permitted. |
| |
| |
| **************************************************************************** |
| */ |