 /* ******************************************************************************
  * Copyright (c) 2010-2025 Google, Inc.  All rights reserved.
  * ******************************************************************************/

 /*
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
  * * Redistributions of source code must retain the above copyright notice,
  *   this list of conditions and the following disclaimer.
  *
  * * Redistributions in binary form must reproduce the above copyright notice,
  *   this list of conditions and the following disclaimer in the documentation
  *   and/or other materials provided with the distribution.
  *
  * * Neither the name of Google, Inc. nor the names of its contributors may be
  *   used to endorse or promote products derived from this software without
  *   specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED. IN NO EVENT SHALL VMWARE, INC. OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
  * DAMAGE.
  */

 /**
  ****************************************************************************
 \page page_code_style Coding Style Conventions

 The overall goal is to <strong>make the source code as readable and
 as self-documenting as possible</strong>.  Everyone following the same
 style guidelines is an important part of keeping the code consistent and
 maintainable.

 # Automated Formatting

 We employ automated code formatting via [`clang-format` version 14.0](https://releases.llvm.org/14.0.0/tools/clang/docs/ClangFormat.html). The [`.clang-format` top-level file](https://github.com/DynamoRIO/dynamorio/blob/master/.clang-format) specifies the style rules for all `.h`, `.c`, and `.cpp` source code files.  Developers are expected to set up their editors to use `clang-format` when saving each file (see [the `clang-format` documentation](https://releases.llvm.org/14.0.0/tools/clang/docs/ClangFormat.html) for pointers to vim, emacs, and Visual Studio setup instructions).  Our test suite includes a format check and will fail any code whose formatting does not match the `clang-format` output.

 # Legacy Code

 Some of the style conventions have changed over time, but we have not
 wanted to incur the cost in time and history confusion of changing all
 old code.  Thus, you may observe old code that does not comply with
 some of these conventions.  These listed conventions overrule surrounding
 code!  Please change the style of old code when you are making other
 changes to the same lines.

 # Naming Conventions

 -# Header files consisting of exported interface content that is also used internally are named with an `_api.h` suffix: e.g., `encode_api.h`.  If the content is solely used externally, it should be named the same as its exported name: e.g., `dr_inject.h`.  For the former case, where the exported name is different, the include guards should use the exported name.

 -# Variable and function names use only lowercase letters.  Multi-word
 function and variable names are all lowercase with underscores delimiting
 words.  Do not use CamelCase for names, unless mirroring Windows-defined
 data structures.

   \b GOOD: `instr_get_target()`

   \e BAD:  `instrGetTarget()`

 -# Type names are all lowercase, with underscores dividing words, and
 ending in `_t`:
  ```
   instr_t
   build_bb_t
  ```
 This is true for C++ class names as well.

 -# The name of a struct in a C typedef should be the type name with an
 underscore prefixed:
  ```
   typedef struct _build_bb_t {
       ...
   } build_bb_t;
  ```

 -# Constants should be in all capital letters, with underscores dividing
 words.  Enum members should use a common descriptive prefix.
  ```
   static const int MAX_SIZE = 256;
   enum {
       DUMPCORE_DEADLOCK           = 0x0004,
       DUMPCORE_ASSERTION          = 0x0008,
   };
  ```

 -# For C++ code, class member fields are named like other variables except they contain an underscore suffix.  Structs that contain methods other than constructors also have this suffix (but usually that should be a class); simple structs with no method other than a constructor do not have this suffix, nor do constants if they follow the constant naming conventions.
  ```
   class apple_t {
       ...
       int num_seeds_;
   };
   struct truck_t {
       int num_wheels;
   };
  ```

 -# Preprocessor defines and macros should be in all capital letters, with
 underscores dividing words.
  ```
   #ifdef WINDOWS
   #    define IF_WINDOWS(x) x
   #else
   #    define IF_WINDOWS(x)
   #endif
  ```

 -# Preprocessor defines that include a leading or trailing comma should
 have a corresponding leading or trailing underscore:
  ```
   #define _IF_WINDOWS(x) , x
   #define IF_WINDOWS_(x) x,
  ```

 -# Functions that operate on a data structure should contain that
 structure as a prefix.  For example, all of the routines that operate on
 the `instr_t` struct begin with `instr_`.

 -# In `core/`, short names or any global name with a chance of colliding with names from an including application linking statically should be qualified with a `d_r_` prefix: e.g., `d_r_dispatch`.  This is distinct from the `dr_` prefix which is used on exported interface names.

 -# Use `static` when possible for every function or variable
 that is not needed outside of its own file.

 -# Do not shadow global variables (or variables in containing scopes) by using
 local variables of the same name: choose a distinct name for the local variable.

 -# Template parameters in C++ should have a descriptive CamelCase identifier.
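
 As a rough sketch of how several of the naming rules above combine in C, consider the following.  Every identifier in it (`bb_list_t`, `bb_list_init()`, `bb_list_append()`, `BB_LIST_MAX_ENTRIES`) is hypothetical and chosen only for illustration; none is taken from the DynamoRIO sources.

 ```c
 #include <assert.h>
 #include <stdbool.h>
 #include <stddef.h>

 /* Preprocessor constant: all capitals, with underscores dividing words. */
 #define BB_LIST_MAX_ENTRIES 8

 /* The struct tag is the type name with an underscore prefixed. */
 typedef struct _bb_list_t {
     int entries[BB_LIST_MAX_ENTRIES];
     size_t count;
 } bb_list_t;

 /* Needed only in this file, so it is static. */
 static bool
 bb_list_is_full(const bb_list_t *list)
 {
     return list->count == BB_LIST_MAX_ENTRIES;
 }

 /* Routines operating on bb_list_t all begin with bb_list_. */
 void
 bb_list_init(bb_list_t *list)
 {
     assert(list != NULL);
     list->count = 0;
 }

 /* Appends value.  Returns false if the list is already full. */
 bool
 bb_list_append(bb_list_t *list, int value)
 {
     assert(list != NULL);
     if (bb_list_is_full(list))
         return false;
     list->entries[list->count] = value;
     list->count++;
     return true;
 }
 ```

 A caller would declare a `bb_list_t`, call `bb_list_init()` on it once, and then call `bb_list_append()` until it returns false.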

 # Types


 -# See above for naming conventions for types.

 -# When declaring a function with no arguments, always explicitly use
 the `void` keyword.  Otherwise the compiler will not be able to
 check whether you are incorrectly passing arguments to that function.

   \b GOOD: `int foo(void);`

   \e BAD:  `int foo();`

 -# Use the `IN`, `OUT`, and `INOUT` labels to
 describe function parameters.  This is a recent addition to DynamoRIO so
 you will see many older functions without these labels, but use them on all
 new functions.

   \b GOOD: `int foo(IN int length, OUT char *buf);`

   \e BAD:  `int foo(int length, char *buf);`

 -# Only use boolean types as conditionals.  This means using explicit NULL
 comparisons and result comparisons.  In particular with functions like
 strcmp() and memcmp(), the use of ! is counter-intuitive.

   \b GOOD: `if (p == NULL) ...`

   \e BAD:  `if (!p)`

   \b GOOD: `if (p != NULL) ...`

   \e BAD:  `if (p)`

   \b GOOD: `if (strncmp(...) == 0) ...`

   \e BAD:  `if (!strncmp(...))`

 -# Use constants of the appropriate type.  Assign or compare a character to `'\0'`, not to `0`.

 -# It's much easier to read `if (i == 0)` than `if (0 == i)`.
 The compiler, with all warnings turned on (which we have), will
 warn you if you use assignment rather than equality.

   \b GOOD: `if (i == 0) ...`

   \e BAD:  `if (0 == i)`

 -# Use the `TEST` and related macros for testing bits.

   \b GOOD: `if (TEST(BITMASK, x))`

   \e BAD:  `if ((x & BITMASK) != 0)`

 -# Write code that is 32-bit and 64-bit aware:

   - Use int and uint for 32-bit integers.  Do not use long, as its size is 64 bits on 64-bit Linux but 32 bits on Windows.  We assume that int is a 32-bit type.
   - Use int64 and uint64 for 64-bit integers.  Use `INT64_FORMAT` and related macros for printing 64-bit integers.
   - Use ptr_uint_t and ptr_int_t for pointer-sized integers.
   - Use size_t for sizes of memory regions.
   - Use reg_t for register-sized values whose type is not known.
   - Use `ASSERT_TRUNCATE` macros when casting to a smaller type.
   - Use `PFX` (rather than %p, which is inconsistent across compilers) and other printf macros for printing pointer-sized variables in code using general printing libraries.  For code that exclusively uses DR's own printing facilities, %p is allowed: its improved code readability and simplicity outweigh the risk of such code being copied into non-DR locations and resulting in inconsistent output.
   - When generating code or writing assembler code, be aware of stack alignment restrictions.


 -# Invalid addresses, either pointers to our data structures or
 application addresses that we're manipulating, have the value NULL, not 0.
 0 is only for arithmetic types.

 -# `const` makes code easier to read and lets the compiler complain
 about errors and generate better code.  It is also required for the most
 efficient self-protection.  Use whenever possible.  However, do not mark
 simple scalar type function parameters as `const`.

 -# Place `*` (or `&` for C++ references) prefixing variable names (C style), not suffixing type names
 (Java style):

   \b GOOD: `char *foo;`

   \e BAD:  `char* foo;`

 -# In a struct, union, or class, list each field on its own line with its own type declaration, even when sharing the type of the prior field.  Similarly, declare global variables separately.  Local variables of the same type can optionally be combined on a line.

   \b GOOD:
   ```
   struct foo {
       int field1;
       int field2;
   };
   ```

   \e BAD:
   ```
   struct foo {
       int field1, field2;
   };
   ```

 -# Do not assume that `char` is signed: use our `sbyte` typedef for a signed one-byte type.
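
 The rules above on explicit comparisons, typed constants, `void` parameter lists, and bit tests can be sketched together as follows.  The `TEST` definition here is a local stand-in written for this sketch so that it compiles on its own; it is an assumption and may differ from the macro in the DynamoRIO headers.  `FLAG_DIRTY` and the function names are hypothetical.

 ```c
 #include <assert.h>
 #include <stdbool.h>
 #include <stddef.h>
 #include <string.h>

 /* Local stand-in for the bit-testing macro (an assumption for this sketch). */
 #define TEST(mask, var) (((mask) & (var)) != 0)

 #define FLAG_DIRTY 0x1

 /* An empty parameter list is spelled out with void. */
 unsigned int
 default_flags(void)
 {
     return FLAG_DIRTY;
 }

 /* Counts the characters in str before the terminator, looking at no more
  * than max characters.  Returns 0 if str is NULL.
  */
 size_t
 bounded_length(const char *str, size_t max)
 {
     size_t len = 0;
     if (str == NULL) /* explicit NULL comparison, not !str */
         return 0;
     while (len < max && str[len] != '\0') /* character constant, not 0 */
         len++;
     return len;
 }

 /* Bit tests go through the macro rather than an open-coded mask. */
 bool
 is_dirty(unsigned int flags)
 {
     return TEST(FLAG_DIRTY, flags);
 }

 /* The result of strncmp() is compared against 0 explicitly. */
 bool
 has_prefix(const char *str, const char *prefix)
 {
     assert(str != NULL && prefix != NULL);
     return strncmp(str, prefix, strlen(prefix)) == 0;
 }
 ```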


 # Commenting Conventions


 -# For C code, `/&lowast; &lowast;/` comments are preferable to `//`.
 Put stars on each line of a multi-line comment, like this:
  \verbatim
  /* multi-line comment
   * with stars
   */
  \endverbatim
 The trailing `&lowast;/` can be either on its own line or the end of the preceding line, but on its own line is preferred.

 For C++ code, `//` comments are allowed.

 -# Make liberal use of comments.  However, too many comments can impair
 readability.  Choose self-descriptive function and variable names to reduce
 the number of comments needed.

 -# Do not use large, clunky function headers that simply duplicate
 information in the code itself.  Such headers tend to contain stale,
 incorrect information, for two reasons: the code is often updated without
 maintaining the header, and since the headers are a pain to type they are
 often copied from other functions and not completely modified for their new
 home.  They also make it harder to see the code or to group related
 functions, as they take up so much screen space.  It is better to have
 leaner, more maintainable, and more readable implementation files by using
 self-descriptive function and parameter names and placing comments for
 function parameters next to the parameters themselves.

  \b GOOD:
  \verbatim
  /* Retrieves the name of the logfile for a particular thread.
   * Returns false if no such thread exists.
   */
  bool get_logfile(IN thread_id_t thread,
                   OUT char *fname,
                   IN size_t fname_len)
  \endverbatim
  \e BAD:
  \verbatim
  /*------------------------------------------------------
   * Name: get_logfile
   *
   * Purpose:
   * Retrieves the name of the logfile for a particular thread.
   *
   * Parameters:
   * [IN]  thread    = which thread
   * [OUT] fname     = where to store the logfile name
   * [IN]  fname_len = the size of the fname buffer
   *
   * Returns:
   * True if successful.
   * False if no such thread exists.
   *
   * Side effects:
   * None.
   * ------------------------------------------------------
   */
  bool get_logfile(thread_id_t thread, char *fname, size_t fname_len)
  \endverbatim


 -# Use doxygen comments on all function and type declarations that are
 exported as part of the API.  For comments starting with `/&lowast;&lowast;`,
 leave the rest of the first line empty, unless the entire comment is a
 single line.  Some examples:
  \verbatim
  DR_API
  /**
   * Returns the entry point of the function with the given name in the module
   * with the given base. Returns NULL on failure.
   * \note Currently Windows only.
   */
  generic_func_t
  dr_get_proc_address(IN module_handle_t lib, IN const char *name);

  /**
   * Data structure passed with a signal event.  Contains the machine
   * context at the signal interruption point and other signal
   * information.
   */
  typedef struct _dr_siginfo_t {
      int sig;                /**< The signal number. */
      void *drcontext;        /**< The context of the thread receiving the signal. */
      dr_mcontext_t mcontext; /**< The machine state at the signal interruption point. */
      siginfo_t siginfo;      /**< The signal information provided by the kernel. */
  } dr_siginfo_t;
  \endverbatim

   Within doxygen comments, create links using parentheses for
   functions `foo()` and a leading `#` for other items such as types
   `#dr_siginfo_t` or defines `#DR_REG_START_GPR`.  See the doxygen
   documentation for more information:
   http://www.doxygen.nl/manual/autolink.html

 -# <strong>NEVER</strong> check in commented-out code.  This is
 unacceptable.  If you feel strongly that you need to leave code in that is
 disabled, use conditional compilation (e.g.,
 <tt>\#if DISABLED_UNTIL_BUG_812_IS_FIXED</tt>), and perhaps additionally
 explain in a comment why the code is disabled.

 -# Sloppy comments full of misspelled words, etc. are an indication of
 carelessness.  We do not want carelessly written code, and we do not
 want carelessly written comments.

 -# Comments that contain more than one sentence should be properly
 capitalized and punctuated and should use complete sentences.  Single-sentence
 comments should also prefer capitalization, punctuation, and to use a
 complete sentence when occupying an entire line.  For comments that are
 inside a line of code or at the end of a line of code, sentence fragments or
 phrases are fine.

 -# Use `XXX` in comments to indicate code that
 could be optimized or something that may warrant re-examination later.
 Include the issue number using the syntax `i#<number>`.  For example:
  \verbatim
  /* XXX i#391: this could be done more efficiently via ...
   */
  \endverbatim

 -# Use `TODO` in comments to indicate missing features that are required and not just optimizations or optional improvements (use `XXX` for those).
 (Avoid `XXX` in new comments as its connotations are too negative and too easily
 mis-interpreted in code audits.)
 Include the issue number using the syntax `i#<number>`.  For example:
  \verbatim
  /* TODO i#999: we do not yet handle a corner case where ...
   */
  \endverbatim

 -# Mark any temporary or unfinished code unsuitable for committing with a
 `NOCHECKIN` comment.  The `make/codereview.cmake` script will remind you to clean
 up the code.
  \verbatim
  x = 4; /* NOCHECKIN: temporary debugging change */
  \endverbatim

 -# For banner comments that separate out groups of related functions, use the following style:
  \verbatim
  /****************************************************************************
   * Name for this group of functions
   */

  \endverbatim
 If a closing marker is needed use this style:
  \verbatim
  /*
   ****************************************************************************/
  \endverbatim
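
 Taken together, the commenting conventions above might look like this in a small C file.  This is only a sketch: `copy_name()` is a hypothetical helper, and the empty `IN` and `OUT` definitions are stand-ins added so that the block compiles by itself.

 ```c
 #include <assert.h>
 #include <stdbool.h>
 #include <stddef.h>
 #include <string.h>

 /* Empty stand-ins for the parameter labels (an assumption for this sketch). */
 #define IN
 #define OUT

 /****************************************************************************
  * Name-copying helpers
  */

 /* Copies src into buf, always null-terminating the result.
  * Returns false if src had to be truncated to fit.
  */
 bool
 copy_name(IN const char *src, OUT char *buf, IN size_t buf_len /* in bytes */)
 {
     size_t src_len;
     assert(src != NULL && buf != NULL && buf_len > 0);
     src_len = strlen(src);
     if (src_len >= buf_len) {
         memcpy(buf, src, buf_len - 1);
         buf[buf_len - 1] = '\0'; /* keep the result terminated */
         return false;
     }
     memcpy(buf, src, src_len + 1); /* includes the terminator */
     return true;
 }
 ```

 The header comment stays short because the function and parameter names carry most of the meaning, and the one parameter that needs a unit has its comment placed next to it.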


 # Warnings Are Errors


 -# Uninitialized variables warning (C4701 for cl): Don't initialize
 variables when you don't need to, so that we can still have good warnings
 about uninitialized variables in the future.  Only if the compiler can't
 analyze code properly is it better to err on the side of a deterministic
 bug and set to 0 or `{0}`.

  Use `do {} while ()` loops to help the compiler figure out that variables
  will get initialized.  The generated code on those constructs is faster and
  better predicted (although optimizations should be able to transform simple
  loops).

 -# For suggested use of static analysis tools (PreFAST or `/analyze`) on
 new code, refer to case 3966.
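
 The loop-shape advice in the first item above can be illustrated with a short sketch; `node_t` and `node_list_tail()` are hypothetical.  With a plain `while` loop the compiler cannot always prove that the body runs, so it may warn that `tail` is used uninitialized.  The `do {} while ()` form makes the first iteration unconditional, provided the caller guarantees a non-empty list.

 ```c
 #include <assert.h>
 #include <stddef.h>

 typedef struct _node_t {
     int value;
     struct _node_t *next;
 } node_t;

 /* Returns the final node of a list that has at least one node. */
 node_t *
 node_list_tail(node_t *head)
 {
     node_t *tail; /* no dummy initializer: the loop body always assigns it */
     node_t *cur = head;
     assert(head != NULL);
     do {
         tail = cur;
         cur = cur->next;
     } while (cur != NULL);
     return tail;
 }
 ```

 Leaving `tail` without a dummy initializer keeps the warning useful: if a later change introduced a path that skipped the assignment, the compiler could still report it.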


 # Program Structure


 -# Keep the line length to 90 characters or fewer.

 -# Use an indentation level of 4 spaces (no tabs, always expand them to
 spaces when saving the file).  (Exception: in CMakeLists.txt and other CMake scripts, use an indentation level of 2 spaces.)

  <strong>WARNING</strong>: Emacs defaults are not always correct here.  Make
  sure your .emacs contains the following:
  ```
  ; always expand tabs to spaces
  (setq-default indent-tabs-mode 'nil)

  ; want "gnu" style but indent of 4:
  (setq c-basic-offset 4)
  (add-hook 'c-mode-hook '(lambda ()
                            (setq c-basic-offset 4)))
  ```

  For CMake, use cmake-mode which does default to 2 spaces.

 -# K&R-style braces: opening braces at the end of the line preceding
 the new code block, closing braces on their own line at the same
 indentation as the line preceding the code block.  Functions are an
 exception -- see below.

 -# Functions should have their type on a separate line from their name.
 Place the function's opening brace on a line by itself at zero
 indentation.
  ```
   int
   foo(int x, int y)
   {
       return 42;
   }
  ```

  Function declarations should also have the type on a separate line, although this rule can be relaxed for short (single-line) signatures with a one-line comment or no comment beforehand.

 -# Put spaces after commas in parameter and argument lists.

   \b GOOD: `foo(x, y, z);`

   \e BAD:  `foo(x,y,z);`

 -# Do not put spaces between a function name and the
 following parenthesis.  Do put a space between a `for`, `if`, `while`,
 `do`, or `switch` and the following parenthesis.  Do not put spaces after
 an opening parenthesis or before a closing parenthesis, unless the interior
 expression is complex and contains multiple layers of parentheses.

   \b GOOD: `foo(x, y, z);`

   \e BAD:  `foo (x, y, z);`

   \e BAD:  `foo( x, y, z);`

   \e BAD:  `foo(x, y, z );`

   \b GOOD: `if (x == 6)`

   \e BAD:  `if( x==6 )`

 -# Always put the body of an if or loop on a separate line from the
 line containing the keyword.

   \b GOOD:
  ```
   if (x == 6)
       y = 5;
  ```
   \e BAD:
  ```
   if (x==6) y = 5;
  ```

 -# A multi-line (not just multi-statement) body (of an if, loop, etc.)
 should always be surrounded with braces (to avoid errors in later
 statements arising from indentation mistakes).

 -# Statements should always begin on a new line (do not put multiple
 statements on the same line).

 -# The case statements of a switch statement should not be indented: they should line up with the switch itself.

   \b GOOD:
  ```
   switch (opc) {
   case OP_add: ...
   case OP_sub: ...
   default: ...
   }
  ```
   \e BAD:
  ```
   switch (opc) {
       case OP_add: ...
       case OP_sub: ...
       default: ...
   }
  ```

 -# Indent nested preprocessor statements.  The `#` character must be in
 the first column, but the rest of the statement can be indented.
  ```
   #ifdef OUTERDEF
   #    ifdef INNERDEF
   #        define INSIDE
   #    endif
   #else
   #    define OUTSIDE
   #endif
  ```

 -# Macros, globals, and typedefs should be declared either at the top of the file or at the top of a related group of functions that are delineated by a banner comment.  Do not place global declarations or defines at random places in the middle of a file.

 -# To make the code easier to read, use the `DODEBUG`,
 `DOSTATS`, or `DOLOG` macros, or the `IF_WINDOWS` and
 related macros, rather than ifdefs, for common defines.

 -#  Do not use `DO_ONCE(SYSLOG_INTERNAL`.  Instead use two new macros:
 `DODEBUG_ONCE` and `SYSLOG_INTERNAL_*_ONCE`.

 -# Use `make/codereview.cmake`'s style checks to examine the code for
 known poor coding patterns.  In the future we may add checks using `astyle`
 ([issue 83](https://github.com/DynamoRIO/dynamorio/issues#issue/83)).

 -# In .asm files, place opcodes on column 8 and start operands on column 17.

 -# Multi-statement macros should always be inside `do { ... } while (0)` so that they behave as a single statement when used as the body of an `if` or other construct.

 -# When using DEBUG_DECLARE or other conditional macros at the start of a line, move any code not within the macro to the subsequent line, to aid readability and avoid the reader skipping over it under the assumption that it's debug-only.  E.g.:

   \b GOOD:
  ```
   DEBUG_DECLARE(bool res =)
       foo(bar);
  ```
   \e BAD:
  ```
   DEBUG_DECLARE(bool res =) foo(bar);
  ```

 -# Avoid embedding assignments inside expressions.  We consider a separate assignment statement to be more readable.  E.g.:

   \b GOOD:
  ```
   x = foo();
   if (x == 0) { ...
  ```
   \e BAD:
  ```
   if ((x = foo()) == 0) { ...
  ```

 -# Avoid embedding non-const expressions inside macros.  We consider a separate expression statement to be more readable, as well as avoiding hidden errors when macros such as ASSERT are disabled in non-debug builds.  Example:

   \b GOOD:
  ```
   bool success = set_some_value();
   ASSERT(success);
  ```
   \e BAD:
  ```
   ASSERT(set_some_value());
  ```

 -# Use named constants rather than "magic values" embedded in the code.  Recognizing and naming constants up front and centralizing them makes assumptions clearer, code more readable, and modifying and maintaining the code easier.

   \b GOOD: `char buf[MAXIMUM_LINE_LENGTH];`

   \e BAD:  `char buf[128];`
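
 Several of the layout rules above (return type on its own line, function brace at zero indentation, unindented `case` labels, named constants, and `do { ... } while (0)` around multi-statement macros) are combined in the sketch below.  All names (`calc_op_t`, `calc_apply()`, `calc_order_pair()`, `SWAP_INTS`, `CALC_MAX_OPERAND`) are hypothetical.

 ```c
 #include <assert.h>
 #include <stdbool.h>
 #include <stddef.h>

 /* Named constant rather than a magic value embedded in the code. */
 #define CALC_MAX_OPERAND 1000

 typedef enum _calc_op_t {
     CALC_OP_ADD,
     CALC_OP_SUB,
 } calc_op_t;

 /* Multi-statement macro wrapped in do { ... } while (0) so that it acts as a
  * single statement, even as the unbraced body of an if that has an else.
  */
 #define SWAP_INTS(a, b)     \
     do {                    \
         int swap_tmp = (a); \
         (a) = (b);          \
         (b) = swap_tmp;     \
     } while (0)

 /* Applies op to x and y and stores the answer in *result.  Returns false
  * for an unknown op or an operand above CALC_MAX_OPERAND.
  */
 bool
 calc_apply(calc_op_t op, int x, int y, int *result)
 {
     assert(result != NULL);
     if (x > CALC_MAX_OPERAND || y > CALC_MAX_OPERAND)
         return false;
     switch (op) {
     case CALC_OP_ADD:
         *result = x + y;
         break;
     case CALC_OP_SUB:
         *result = x - y;
         break;
     default:
         return false;
     }
     return true;
 }

 /* Orders the pair so that *low <= *high.  Returns whether a swap occurred. */
 bool
 calc_order_pair(int *low, int *high)
 {
     assert(low != NULL && high != NULL);
     if (*low > *high)
         SWAP_INTS(*low, *high);
     else
         return false;
     return true;
 }
 ```

 If `SWAP_INTS` were wrapped in bare braces instead, the semicolon after its use in `calc_order_pair()` would end the `if` statement and the `else` would no longer compile.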


 # File Organization


 -# Write OS-independent code as much as possible and keep it in the
 base `core/` directory.  If code must diverge for Windows versus
 Linux, provide an OS-independent interface documented
 in `core/os_shared.h` and implemented separately
 in `core/unix/` and `core/windows/`.
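
 The split described above can be sketched in a single file, with comments marking where each part would live.  This is an illustration under stated assumptions: `os_dir_separator()` and `path_basename()` are hypothetical and are not claimed to exist in `core/os_shared.h`, and the compiler-provided `_WIN32` symbol stands in for the build selecting one implementation file.

 ```c
 #include <assert.h>
 #include <string.h>

 /* Shared header: the OS-independent interface. */

 /* Returns the character that separates directories in a path. */
 char
 os_dir_separator(void);

 #ifdef _WIN32
 /* Windows-specific implementation file. */
 char
 os_dir_separator(void)
 {
     return '\\';
 }
 #else
 /* UNIX-specific implementation file. */
 char
 os_dir_separator(void)
 {
     return '/';
 }
 #endif

 /* OS-independent code: calls only the shared interface. */

 /* Returns a pointer to the final component of path. */
 const char *
 path_basename(const char *path)
 {
     const char *last_sep;
     assert(path != NULL);
     last_sep = strrchr(path, os_dir_separator());
     if (last_sep == NULL)
         return path;
     return last_sep + 1;
 }
 ```

 Only `os_dir_separator()` differs per platform; `path_basename()` contains no conditional compilation and would sit in the shared directory.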

 # C++


 -# While the core DynamoRIO library and API are C, we do support C++ clients and have some C++ tests and clients ourselves.  For broad compiler support we limit our code to C++11.

 -# C++ exception use is not allowed, for maximum interoperability to enable using libraries and source code in other environments where exceptions are not permitted.


  ****************************************************************************
  */