 /* **********************************************************
  * Copyright (c) 2011-2024 Google, Inc.  All rights reserved.
  * Copyright (c) 2007-2009 VMware, Inc.  All rights reserved.
  * **********************************************************/

 /*
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
  * * Redistributions of source code must retain the above copyright notice,
  *   this list of conditions and the following disclaimer.
  *
  * * Redistributions in binary form must reproduce the above copyright notice,
  *   this list of conditions and the following disclaimer in the documentation
  *   and/or other materials provided with the distribution.
  *
  * * Neither the name of VMware, Inc. nor the names of its contributors may be
  *   used to endorse or promote products derived from this software without
  *   specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED. IN NO EVENT SHALL VMWARE, INC. OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
  * DAMAGE.
  */

 /**
 ***************************************************************************
 ***************************************************************************
 \page API_BT Code Manipulation API

 The Code Manipulation API exposes the full power of DynamoRIO, allowing
 tools to observe and modify the application's actual code stream as it
 executes.  Modifications are not limited to trampoline insertion and can
 include arbitrary changes.  We divide the API description into the
 following sections:

 - \ref sec_IR
 - \ref sec_events_bt
 - \ref sec_decode
 - \ref sec_isa
 - \ref sec_IR_utils
 - \ref sec_reg_stolen
 - \ref sec_translation
 - \ref sec_predication
 - \ref sec_ldrex
 - \ref sec_rseq
 - \ref sec_pcache

 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_IR Instruction Representation

 The primary data structures involved in instruction manipulation are
 the #opnd_t, which represents one operand; the #instr_t, which
 represents a single instruction; and the \c
 #instrlist_t, which is a linked list of instructions.  The header files
 dr_ir_instrlist.h and dr_ir_instr.h list a number of functions that
 operate on these data structures, including:

 - Routines to create new instructions.
 - Routines to iterate over an instruction's operands.
 - Routines to iterate over an #instrlist_t.
 - Routines to insert and remove an #instr_t from an #instrlist_t.

 As we will see in the \ref sec_events_bt section that follows, a
 client usually interacts with #instrlist_t's in the form of \e basic \e
 blocks or \e traces.  A basic block is a sequence of instructions that
 terminates with a control transfer operation.  Traces are
 frequently-executed sequences of basic blocks that DynamoRIO forms
 dynamically as the application executes, i.e., \e hot code.
 Collectively, we refer to basic blocks and traces as \e fragments.
 Both basic blocks and traces present a linear view of control flow.
 In other words, instruction sequences have a single entrance and one
 or more exits.  This representation greatly simplifies analysis and is
 a primary contributor to DynamoRIO's efficiency.

 The instruction representation includes all of the operands, whether
 implicit or explicit, and the condition code effects of each instruction.
 This allows for analysis of liveness of registers and condition codes.
 The operands are split into sources and destinations.

 A memory reference is treated as one operand even when it uses
 registers to compute its address: those constituent registers are not
 listed as their own separate source operands (unless they are read for
 other reasons such as updating the index register).  This means that a
 store to memory will have that store as a destination operand without
 listing the store's addressing mode registers as source operands in
 their own right.  Tools interested in all registers inside such
 operands can use opnd_get_num_regs_used() and opnd_get_reg_used() to
 generically walk the registers inside an operand, or
 instr_reads_from_reg() to determine whether an instruction reads a
 register either as a source operand or as a component of a destination
 memory reference.
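 A minimal sketch, assuming \c dr_api.h is available and that \c instr comes
 from a basic block event.  The helper name print_store_address_regs is
 hypothetical.  It walks each memory destination and prints the registers
 that feed its address, which are not listed as separate source operands:

 \code
 #include "dr_api.h"

 /* Hypothetical helper: report the registers inside each memory destination. */
 static void
 print_store_address_regs(instr_t *instr)
 {
   int i, j;
   for (i = 0; i < instr_num_dsts(instr); i++) {
     opnd_t dst = instr_get_dst(instr, i);
     if (!opnd_is_memory_reference(dst))
       continue;
     /* Base and index registers live inside the operand, so walk them here. */
     for (j = 0; j < opnd_get_num_regs_used(dst); j++) {
       reg_id_t reg = opnd_get_reg_used(dst, j);
       dr_printf("store address uses %s\n", get_register_name(reg));
     }
   }
 }
 \endcode

 For a store whose address is computed from a base and an index register,
 both registers are reported even though neither appears among the
 instruction's source operands.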

 DynamoRIO's IR is mostly opaque to clients.  Key data structures have their
 sizes exposed to allow for stack allocation, but their fields are opaque.  In
 order to examine them, clients must call IR accessor routines in DynamoRIO.
 While this keeps DynamoRIO ABI-compatible with prior releases, there is a
 performance cost to calling through to an exported routine every time the
 client touches an instruction.  Clients that are not concerned with ABI
 compatibility can turn many of these exported-routine calls into inline functions
 or macros by setting the CMake variable \c DynamoRIO_FAST_IR on or defining \p
 DR_FAST_IR before including dr_api.h.  This removes some of the error checking
 that DynamoRIO performs on calls from the client, so it should typically be
 enabled only in a release build.  Furthermore, some of the macros evaluate their
 arguments twice, so clients should avoid passing arguments with side effects.
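 The double-evaluation hazard is ordinary C macro behavior.  The following
 standalone sketch uses a hypothetical macro, DOUBLE_EVAL_IS_ODD (not a
 DynamoRIO API), to show what goes wrong when an argument with side effects is
 evaluated twice, and the fix of evaluating it once into a local first:

 \code
 /* DOUBLE_EVAL_IS_ODD is hypothetical: it mimics a macro that, like some
  * DR_FAST_IR accessors, evaluates its argument more than once. */
 #define DOUBLE_EVAL_IS_ODD(x) ((x) % 2 != 0 && (x) > 0)

 static int calls; /* counts how often next_value() runs */

 static int
 next_value(void)
 {
   calls++; /* side effect */
   return 3;
 }

 /* Passing a call directly: the side effect happens twice. */
 int
 count_evaluations(void)
 {
   calls = 0;
   (void)DOUBLE_EVAL_IS_ODD(next_value());
   return calls;
 }

 /* Evaluating once into a local first: the side effect happens once. */
 int
 count_evaluations_safe(void)
 {
   int v;
   calls = 0;
   v = next_value();
   (void)DOUBLE_EVAL_IS_ODD(v);
   return calls;
 }
 \endcode

 Here count_evaluations() returns 2 and count_evaluations_safe() returns 1.
 The same discipline applies when \p DR_FAST_IR is defined: pass plain
 variables rather than expressions with side effects.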

 When a new instruction is created using instr_create() or the
 INSTR_CREATE_* or XINST_CREATE_* macros, if the instruction is added to the
 #instrlist_t that is passed to the basic block or trace events, the heap
 memory used by the instruction is automatically freed when that instruction
 list is freed by DynamoRIO.  If instead an instruction is created on the
 heap but used for other purposes and not added to a DynamoRIO-provided
 instruction list, it should be freed by calling instr_destroy() or by
 explicitly destroying a custom instruction list.
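 As a sketch (the helper names are hypothetical), an instruction created only
 to query its encoded size is never handed to DynamoRIO, so the client
 destroys it, whereas one inserted into the block passed to the basic block
 event is freed together with that list:

 \code
 #include "dr_api.h"

 /* Scratch instruction: never added to a DynamoRIO-provided list. */
 static int
 nop_length(void *drcontext)
 {
   instr_t *scratch = XINST_CREATE_nop(drcontext);
   int len = instr_length(drcontext, scratch);
   instr_destroy(drcontext, scratch); /* the client owns it, so free it */
   return len;
 }

 /* Inserted instruction: DynamoRIO frees it when it frees bb. */
 static void
 add_marker(void *drcontext, instrlist_t *bb, instr_t *where)
 {
   instrlist_meta_preinsert(bb, where, INSTR_CREATE_label(drcontext));
 }
 \endcode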

 See \ref sec_decode for further information on creating instructions from
 scratch, decoding, encoding, and disassembling instructions.  Instructions
 used for decoding are typically stored on the stack and managed with
 instr_init() and instr_free() or instr_reset(), as shown in that section.

 See \ref sec_IR_heap for further information on heap allocation for
 instructions and safely using instructions in signal handlers.

 \if level_of_detail
 ********************
 \subsection sec_IR_adaptive Adaptive Level of Detail

 It is costly to decode instructions.  Fortunately, DynamoRIO is
 often only interested in high-level information for a subset of
 instructions, such as just the control-flow instructions.  Each
 instruction can be at one of five levels of detail.  To simplify clients,
 DynamoRIO only passes them instructions at Level 3 or Level 4.
 Internally, DynamoRIO uses all five levels:

 \par Level 0: Raw Bundle

 At Level 0, the #instr_t data structure holds raw bytes for a group of
 instructions decoded only enough to determine the final instruction
 boundary:

 <table border=0 cellpadding=2 cellspacing=1>
 <tr bgcolor="ffcc99"><td><tt>
 8d 34 01 8b 46 0c 2b 46 1c 0f b7 4e 08 c1 e1 07 3b c1 0f 8d a2 0a 00 00
 </tt></td></tr>
 </table>

 \par Level 1: Raw Individual

 At Level 1, each #instr_t holds only one instruction, but has no more
 information than raw bytes:

 <table border=0 cellpadding=2 cellspacing=1>
 <tr><td bgcolor="ffcc99"><tt>8d 34 01</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>8b 46 0c</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>2b 46 1c</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>0f b7 4e 08</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>c1 e1 07</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>3b c1</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>0f 8d a2 0a 00 00</tt></td></tr>
 </table>

 \par Level 2: Opcode and Eflags

 At Level 2, #instr_t has been decoded just enough to determine opcode and
 flags effects (flags are important when analyzing code).  Raw bytes
 are still used for encoding.  The flags effects below are only shown for
 reading (R) or writing (W) the six arithmetic flags (Carry, Parity, Adjust,
 Zero, Sign, and Overflow).

 <table border=0 cellpadding=2 cellspacing=1>
 <tr><td bgcolor="ffcc99"><tt>8d 34 01</tt></td>
   <td bgcolor="ccccff"><tt>lea</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>8b 46 0c</tt></td>
   <td bgcolor="ccccff"><tt>mov</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>2b 46 1c</tt></td>
   <td bgcolor="ccccff"><tt>sub</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>0f b7 4e 08</tt></td>
   <td bgcolor="ccccff"><tt>movzx</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>c1 e1 07</tt></td>
   <td bgcolor="ccccff"><tt>shl</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>3b c1</tt></td>
   <td bgcolor="ccccff"><tt>cmp</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>0f 8d a2 0a 00 00</tt></td>
   <td bgcolor="ccccff"><tt>jnl</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>RSO</tt></td></tr>
 </table>

 \par Level 3: Operands

 A Level 3 #instr_t contains dynamically allocated arrays of source and
 destination operands (dynamic because operand counts vary widely on some
 ISAs) that are now filled in.  Raw bytes are still valid and are used for
 encoding.  This level combines high-level information with quick encoding.

 <table border=0 cellpadding=2 cellspacing=1>
 <tr><td bgcolor="ffcc99"><tt>8d 34 01</tt></td>
   <td bgcolor="ccccff"><tt>lea</tt></td>
   <td bgcolor="ccff66"><tt>(%ecx,%eax,1) => %esi</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>8b 46 0c</tt></td>
   <td bgcolor="ccccff"><tt>mov</tt></td>
   <td bgcolor="ccff66"><tt>0xc(%esi) => %eax</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>2b 46 1c</tt></td>
   <td bgcolor="ccccff"><tt>sub</tt></td>
   <td bgcolor="ccff66"><tt>0x1c(%esi) %eax => %eax</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>0f b7 4e 08</tt></td>
   <td bgcolor="ccccff"><tt>movzx</tt></td>
   <td bgcolor="ccff66"><tt>0x8(%esi) => %ecx</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>c1 e1 07</tt></td>
   <td bgcolor="ccccff"><tt>shl</tt></td>
   <td bgcolor="ccff66"><tt>$0x07 %ecx => %ecx</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>3b c1</tt></td>
   <td bgcolor="ccccff"><tt>cmp</tt></td>
   <td bgcolor="ccff66"><tt>%eax %ecx</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>0f 8d a2 0a 00 00</tt></td>
   <td bgcolor="ccccff"><tt>jnl</tt></td>
   <td bgcolor="ccff66"><tt>$0x77f52269</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>RSO</tt></td></tr>
 </table>

 \par Level 4: Modified Operands

 At the highest level, #instr_t has been modified at the operand level, or
 has been created from operands, such that the raw bytes are no longer
 valid.  The #instr_t must be fully encoded from its operands.

 <table border=0 cellpadding=2 cellspacing=1>
 <tr><td>&nbsp;</td>
   <td bgcolor="ccccff"><tt>lea</tt></td>
   <td bgcolor="ccff66"><tt>(%ecx,%eax,1) => \e %edi</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td>&nbsp;</td>
   <td bgcolor="ccccff"><tt>mov</tt></td>
   <td bgcolor="ccff66"><tt>0xc(\e %edi) => %eax</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td>&nbsp;</td>
   <td bgcolor="ccccff"><tt>sub</tt></td>
   <td bgcolor="ccff66"><tt>0x1c(\e %edi) %eax => %eax</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td>&nbsp;</td>
   <td bgcolor="ccccff"><tt>movzx</tt></td>
   <td bgcolor="ccff66"><tt>0x8(\e %edi) => %ecx</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>-</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>c1 e1 07</tt></td>
   <td bgcolor="ccccff"><tt>shl</tt></td>
   <td bgcolor="ccff66"><tt>$0x07 %ecx => %ecx</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>3b c1</tt></td>
   <td bgcolor="ccccff"><tt>cmp</tt></td>
   <td bgcolor="ccff66"><tt>%eax %ecx</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>WCPAZSO</tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;</td><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>0f 8d a2 0a 00 00</tt></td>
   <td bgcolor="ccccff"><tt>jnl</tt></td>
   <td bgcolor="ccff66"><tt>$0x77f52269</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>RSO</tt></td></tr>
 </table>

 \par Basic Block Example

 As an example of using different levels of detail, a basic block in
 DynamoRIO is represented using a Level 0 #instr_t for all non-control-flow
 instructions, and a Level 3 #instr_t for the block-ending control-flow
 instruction:

 <table border=0 cellpadding=2 cellspacing=1>
 <tr bgcolor="ffcc99"><td colspan=4><tt>
 8d 34 01 8b 46 0c 2b 46 1c 0f b7 4e 08 c1 e1 07 3b c1
 </tt></td></tr>
 <tr align="middle" bgcolor="ffffff"><td>&nbsp;|</td></tr>
 <tr><td bgcolor="ffcc99"><tt>0f 8d a2 0a 00 00</tt></td>
   <td bgcolor="ccccff"><tt>jnl&nbsp;&nbsp;&nbsp;&nbsp;</tt></td>
   <td bgcolor="ccff66"><tt>$0x77f52269&nbsp;&nbsp;&nbsp;&nbsp;</tt></td>
   <td bgcolor="ccccff" align="middle"><tt>&nbsp;&nbsp;RSO&nbsp;&nbsp;</tt></td></tr>
 </table>

 However, when a client registers for the basic block event, DynamoRIO
 passes an #instrlist_t of all Level 3 #instr_t's, for simplicity.
 \endif


 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly

 ********************
 \subsection sec_IR_AArch64 AArch64 IR Variations

 DynamoRIO's IR representation of AArch64 NEON instructions uses an additional
 immediate source operand to denote the width of the vector elements. The immediates
 take the values #VECTOR_ELEM_WIDTH_BYTE (8 bit), #VECTOR_ELEM_WIDTH_HALF (16 bit),
 #VECTOR_ELEM_WIDTH_SINGLE (32 bit) and #VECTOR_ELEM_WIDTH_DOUBLE (64 bit),
 for vector instructions that require arrangement specifiers for their operands.
 This is different from AArch64 assembly, where the element width is part of the
 vector register operand. For example, floating point vector addition of two vectors
 with 2 double elements is represented in assembly by
 \code fadd v9.2d, v30.2d, v9.2d \endcode and in IR by
 \code fadd %q30 %q9 $0x03 -> %q9 \endcode.
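 To make the difference concrete, the following standalone sketch splits an
 assembly arrangement specifier such as \c 2d into its lane count and element
 width in bits.  The helper parse_arrangement is hypothetical and not part of
 DynamoRIO; it only illustrates that the IR keeps the element width (the
 VECTOR_ELEM_WIDTH_* immediate) separate from the register operand itself:

 \code
 #include <stdlib.h>

 /* Map an arrangement suffix letter to its element width in bits. */
 static int
 elem_width_bits(char suffix)
 {
   switch (suffix) {
   case 'b': return 8;  /* VECTOR_ELEM_WIDTH_BYTE */
   case 'h': return 16; /* VECTOR_ELEM_WIDTH_HALF */
   case 's': return 32; /* VECTOR_ELEM_WIDTH_SINGLE */
   case 'd': return 64; /* VECTOR_ELEM_WIDTH_DOUBLE */
   default: return -1;  /* unknown suffix */
   }
 }

 /* Hypothetical helper: parse e.g. "2d" or "16b".  Returns 0 on success,
  * filling *lanes and *width_bits; returns -1 on malformed input. */
 int
 parse_arrangement(const char *spec, int *lanes, int *width_bits)
 {
   char *end;
   long n = strtol(spec, &end, 10);
   if (end == spec || end[0] == '\0' || end[1] != '\0')
     return -1;
   *lanes = (int)n;
   *width_bits = elem_width_bits(end[0]);
   return *width_bits < 0 ? -1 : 0;
 }
 \endcode

 For \c v9.2d the helper yields 2 lanes of 64 bits; in DynamoRIO's IR the
 register is simply \c %q9 and the 64-bit width travels as the separate
 immediate shown above.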

 \section sec_events_bt Events

 The core of a client's interaction with DynamoRIO occurs through <em>
 event hooks</em>: the client registers its own callback routine (or \e
 hook) for each event it is interested in.  DynamoRIO calls the
 client's event hooks at appropriate times, giving the client access to
 key actions during execution of the application.  The \ref sec_events
 section describes events common to the entire DynamoRIO API. Here we
 discuss the events specific to the Code Manipulation portion.

 DynamoRIO provides two events related to application code fragments: one for
 basic blocks and one for traces (see dr_register_bb_event() and
 dr_register_trace_event()).  Through these fragment-creation hooks,
 the client has the ability to inspect and modify any piece of code
 that DynamoRIO emits before it executes.  Using the basic block hook,
 a client sees \e all application code.  The trace-creation hook
 provides a mechanism for clients to instrument only
 frequently-executed code paths.
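 A minimal client skeleton, assuming the standard dr_client_main() entry
 point, that registers both fragment-creation hooks; the callback names are
 illustrative:

 \code
 #include "dr_api.h"

 static dr_emit_flags_t
 event_bb(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
          bool translating)
 {
   /* Sees all application code as it is copied into the code cache. */
   return DR_EMIT_DEFAULT;
 }

 static dr_emit_flags_t
 event_trace(void *drcontext, void *tag, instrlist_t *trace, bool translating)
 {
   /* Sees only hot code that DynamoRIO has stitched into traces. */
   return DR_EMIT_DEFAULT;
 }

 DR_EXPORT void
 dr_client_main(client_id_t id, int argc, const char *argv[])
 {
   dr_register_bb_event(event_bb);
   dr_register_trace_event(event_trace);
 }
 \endcode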

 ********************
 \subsection sec_control_points Transformation Versus Execution Time

 DynamoRIO's basic block and trace events are raised when the corresponding
 application code is being transferred to the software code cache for
 execution.  <em>No event is raised on each execution of this code from the
 code cache</em>.  In a typical run, a particular block of code will only be
 seen once in an event.  It will subsequently execute many times in the code
 cache.

 The point where the event is raised, where the application code is being
 copied into the cache, is called <em>transformation time</em>.  This is
 where a client can insert instrumentation to monitor the code, or can
 modify the application code itself.  The repeated executions within the
 code cache of this instrumented or modified code are referred to as
 <em>execution time</em>.  It is important to understand the distinction.

 The code manipulation API is highly efficient in that fragment creation
 comprises a small part of DynamoRIO's overhead.  A client's transformation-time
 actions rarely add substantial overhead for most target applications.
 Instead, it is the extra work performed by inserted instrumentation at
 execution time that affects efficiency.
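 The distinction can be sketched with two counters (names hypothetical): one
 updated in the basic block event at transformation time, and one updated by
 a clean call inserted with dr_insert_clean_call(), which runs at execution
 time.  For brevity the increments are unsynchronized, so the counts are only
 approximate in multithreaded applications:

 \code
 #include "dr_api.h"

 static int blocks_built;    /* updated at transformation time */
 static int blocks_executed; /* updated at execution time */

 /* Clean-call target: runs each time the block executes in the code cache. */
 static void
 at_block_entry(void)
 {
   blocks_executed++; /* unsynchronized: approximate with multiple threads */
 }

 static dr_emit_flags_t
 event_bb(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
          bool translating)
 {
   /* Count only the initial copy, not trace or translation rebuilds. */
   if (!for_trace && !translating)
     blocks_built++;
   /* Insert identically in every mode so traces and translations match. */
   dr_insert_clean_call(drcontext, bb, instrlist_first_app(bb),
                        (void *)at_block_entry, false /* save_fpstate */, 0);
   return DR_EMIT_DEFAULT;
 }

 static void
 event_exit(void)
 {
   dr_printf("blocks built: %d, block executions: %d\n", blocks_built,
             blocks_executed);
 }

 DR_EXPORT void
 dr_client_main(client_id_t id, int argc, const char *argv[])
 {
   dr_register_bb_event(event_bb);
   dr_register_exit_event(event_exit);
 }
 \endcode

 In a typical run one would expect far more block executions than blocks
 built, which is why the clean call, not the hook itself, dominates the
 added cost.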

 ********************
 \subsection sec_events_bb Basic Block Creation

 Through the basic block creation event, registered via
 dr_register_bb_event(), the client has the ability to inspect and transform
 any piece of code prior to its execution.  The client's hook receives
 five parameters:

 \code
 dr_emit_flags_t new_block(void *drcontext, void *tag, instrlist_t *bb,
                           bool for_trace, bool translating);
 \endcode

  - \c drcontext is a pointer to the input program's machine context.
    Clients should not inspect or modify the context; it is provided as
    an opaque pointer (i.e., <tt>void *</tt>) to be passed to API
    routines that require access to this internal data.

  - \c tag is a unique identifier for the basic block fragment.

  - \c bb is a pointer to the list of instructions that comprise the
    basic block.  Clients can examine, manipulate, or completely
    replace the instructions in the list.

  - \c for_trace indicates whether this callback is for a new basic block
    (false) or for adding a basic block to a trace being created (true).
    The client has the opportunity to either include the same modifications
    made to the standalone basic block, or to use different modifications,
    for the code in the trace.

  - \c translating indicates whether this callback is for basic block
    creation (false) or is for address translation (true). This is further
    explained in \ref sec_translation.

 The return value of the basic block callback should generally be
 DR_EMIT_DEFAULT; however, time-varying instrumentation or complex code
 transformations may need to return DR_EMIT_STORE_TRANSLATIONS.  See \ref
 sec_translation for further details.  A tool that wants to persist its code
 to a file for fast re-use on subsequent runs can include the
 DR_EMIT_PERSISTABLE flag in its return value.  See \ref sec_pcache for more
 information.

 To iterate over instructions in an #instrlist_t, use the \ref
 instrlist_first(), \ref instrlist_last() (if necessary), and \ref
 instr_get_next() routines. For example:

 \code
 dr_emit_flags_t new_block(void *drcontext, void *tag, instrlist_t *bb,
                           bool for_trace, bool translating)
 {
   instr_t *instr, *next;
   for (instr = instrlist_first(bb);
        instr != NULL;
        instr = next) {
     /* Fetch the successor first so instr can be safely removed or replaced. */
     next = instr_get_next(instr);
     /* do some processing on instr */
   }
   return DR_EMIT_DEFAULT;
 }
 \endcode

 ********************
 \subsection sec_Meta Application Versus Meta Instructions

 Changes to the instruction stream made by a client fall into two
 categories: changes or additions that should be considered part of the
 application's behavior, versus additions that are observational in nature
 and are not acting on the application's behalf.  The latter are called \e
 meta instructions.

 Meta instructions are marked using these API routines:

 \code
 instr_set_meta()
 instrlist_meta_preinsert()
 instrlist_meta_postinsert()
 instrlist_meta_append()
 \endcode

 DynamoRIO performs some processing on the basic block after the
 client has finished with it, primarily modifying branches to ensure
 that DynamoRIO retains control after execution.  It is important that
 the client mark any control-flow instructions that it does not want treated
 as application instructions as \e meta instructions. Doing so informs
 DynamoRIO that these instructions should execute natively rather than
 being trapped and redirected to new basic block fragments.

 Through meta instructions, a client can add its own internal control flow
 or make a call to a native routine.  The target of a meta call will not be
 brought into the code cache by DynamoRIO.  However, such native calls need
 to be careful to remain transparent (see \ref sec_clean_call).

 Meta instructions are normally observational, in which case they should not
 fault and should have a NULL translation field.  It is possible to use meta
 instructions that deliberately fault, or that could fault by accessing
 application memory addresses, but only if the client handles all such
 faults.  See \ref sec_translation for more information on fault handling.

 Meta instructions are visible to client code that iterates with
 instrlist_first() and instr_get_next().  To traverse only application
 (non-meta) instructions, a client can use the following API functions instead:

 \code
 instr_get_next_app()
 instrlist_first_app()
 \endcode
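 A minimal sketch of a basic block event that walks only application
 instructions and inserts a meta label before each one (labels have no effect
 on application state).  Because iteration uses instrlist_first_app() and
 instr_get_next_app(), the inserted labels are never visited:

 \code
 #include "dr_api.h"

 static dr_emit_flags_t
 event_bb(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
          bool translating)
 {
   instr_t *instr;
   /* Visit only application instructions; meta instructions are skipped. */
   for (instr = instrlist_first_app(bb); instr != NULL;
        instr = instr_get_next_app(instr)) {
     /* The meta insertion routine marks the label as non-application code,
      * so DynamoRIO will not treat it as part of the app's behavior. */
     instrlist_meta_preinsert(bb, instr, INSTR_CREATE_label(drcontext));
   }
   return DR_EMIT_DEFAULT;
 }
 \endcode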

 We recommend that clients follow a disciplined model that separates application
 code analysis from insertion of instrumentation.  The \ref page_drmgr
 Extension facilitates this by separating application transformation, application
 analysis, and instrumentation.  However, even with this separation, label
 instructions and in some cases other meta instructions (e.g., from
 drwrap_replace_native()) are added during application transformation which
 should be skipped during analysis.  Using instrlist_first_app() and
 instr_get_next_app() is recommended during application analysis: it
 automatically skips non-application (meta) instructions, which at that stage are
 guaranteed to be either labels or to have no effect on register state or other
 key aspects of application code analysis.
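 A sketch of that separation using the drmgr Extension's analysis and
 insertion phases, assuming the drmgr_register_bb_instrumentation_event()
 callback signatures of recent releases.  The callback names and the
 instruction-count analysis are illustrative only:

 \code
 #include "dr_api.h"
 #include "drmgr.h"

 /* Analysis phase: count application instructions, skipping labels. */
 static dr_emit_flags_t
 event_analysis(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
                bool translating, void **user_data)
 {
   instr_t *instr;
   ptr_uint_t count = 0;
   for (instr = instrlist_first_app(bb); instr != NULL;
        instr = instr_get_next_app(instr))
     count++;
   *user_data = (void *)count; /* handed to the insertion phase */
   return DR_EMIT_DEFAULT;
 }

 /* Insertion phase: called once per instruction in the block. */
 static dr_emit_flags_t
 event_insert(void *drcontext, void *tag, instrlist_t *bb, instr_t *instr,
              bool for_trace, bool translating, void *user_data)
 {
   if (drmgr_is_first_instr(drcontext, instr)) {
     /* e.g., insert instrumentation that uses (ptr_uint_t)user_data */
   }
   return DR_EMIT_DEFAULT;
 }

 static void
 event_exit(void)
 {
   drmgr_exit();
 }

 DR_EXPORT void
 dr_client_main(client_id_t id, int argc, const char *argv[])
 {
   drmgr_init();
   drmgr_register_bb_instrumentation_event(event_analysis, event_insert, NULL);
   dr_register_exit_event(event_exit);
 }
 \endcode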

 While DynamoRIO attempts to support arbitrary code transformations, its
 internal operation requires that we impose the following limitations:

  - If there is more than one application branch, only the last can be
    conditional.
  - An application conditional branch must be the final instruction in the
    block.
  - There can only be one indirect branch (call, jump, or return) in
    a basic block, and it must be the final application branch in the block.
  - The exit control-flow of a block ending in a system call cannot be
    changed.
  - On AArch64, an ISB instruction (#OP_isb) must be the last instruction
    in its block.

 Application instructions, or non-meta instructions, in addition to being
 processed (and followed if control flow), are also considered safe points
 for relocation for the rare times when DynamoRIO must move threads
 around.  Thus a client should ensure that it is safe to re-start an
 application instruction at the translation field address provided.

 ********************
 \subsection sec_events_trace Trace Creation

 DynamoRIO provides access to traces primarily through the trace-creation
 event, registered via dr_register_trace_event().  It is important to note
 that clients are not required to employ the trace-creation event to ensure
 full instrumentation.  Rather, it is sufficient to perform all code
 modification using the basic block event.  Any basic blocks that DynamoRIO
 chooses to place in a trace will contain all client modifications (unless
 the client behaves differently in the basic block hook when its \c
 for_trace parameter is true).  The trace-creation event provides a
 mechanism for clients to instrument \e hot code separately.

 The parameters to the trace-creation event hook are nearly identical
 to those of the basic block hook:

 \code
 dr_emit_flags_t new_trace(void *drcontext, void *tag, instrlist_t *trace,
                           bool translating);
 \endcode

  - \c drcontext is a pointer to the input program's machine context.
    Clients should not inspect or modify the context; it is provided as
    an opaque pointer (i.e., <tt>void *</tt>) to be passed to API
    routines that require access to this internal data.

  - \c tag is a unique identifier for the trace fragment.

  - \c trace is a pointer to the list of instructions that comprise the
    trace.  Clients can examine, manipulate, or completely replace the
    instructions in the list.

  - \c translating indicates whether this callback is for trace creation
    (false) or is for address translation (true). This is further explained
    in \ref sec_translation.

 The return value of the trace callback should generally be DR_EMIT_DEFAULT;
 however, time-varying instrumentation or complex code transformations may
 need to return DR_EMIT_STORE_TRANSLATIONS.  See \ref sec_translation for
 further details.

 DynamoRIO calls the client-supplied event hook each time a trace is
 created, just before the trace is emitted into the code cache.
 Additionally, as each constituent basic block is added to the trace,
 DynamoRIO calls the basic block creation hook with the \p for_trace
 parameter set to true.  In order to preserve basic block instrumentation
 inside of traces, a client need only act identically with respect to the
 \p for_trace parameter; it can ignore the trace event if its goal is to
 place instrumentation on all code.

 The constituent basic blocks will be stitched together prior to insertion
 in the code cache: conditional branches will be realigned so that their
 fall-through target remains on the trace, and inlined indirect branches
 will be preceded by a comparison against the on-trace target.

 If the basic block callback behaves differently based on the \p for_trace
 parameter, different instrumentation will exist in the trace as opposed to
 the standalone basic block.  If the basic block corresponds to the
 application code at the start of the trace (i.e., it is a trace head), the
 trace will shadow the basic block and the trace will be executed
 preferentially.  If #dr_delete_fragment() is called, it will delete
 the trace first and may leave the basic block in place.  The flush routines
 (#dr_flush_region(), #dr_delay_flush_region(), #dr_unlink_flush_region()),
 however, will delete traces and basic blocks alike.

 ********************
 \subsection sec_events_translation State Restoration

 If a client is only adding instrumentation (meta instructions) that do not
 reference application memory, and is not reordering or removing application
 instructions, then it need not register for this event.  If, however, a
 client is modifying application code or adding instructions that could
 fault, the client must be capable of restoring the original context.
 DynamoRIO calls a state restoration event, registered via
 dr_register_restore_state_event() or dr_register_restore_state_ex_event(),
 whenever it needs to translate a code cache context to an original
 application context:

 \code
 void restore_state(void *drcontext, void *tag, dr_mcontext_t *mcontext,
                    bool restore_memory, bool app_code_consistent);
 bool restore_state_ex(void *drcontext, bool restore_memory,
                       dr_restore_state_info_t *info);
 \endcode

 See \ref sec_translation for further details.

 ********************
 \subsection sec_events_del Basic Block and Trace Deletion

 DynamoRIO can also provide notification of fragment deletion via
 dr_register_delete_event().  The signature for this event callback is:

 \code
 void fragment_deleted(void *drcontext, void *tag);
 \endcode

 DynamoRIO calls this event hook each time it deletes a fragment from
 the code cache.  Such information may be needed if the client
 maintains its own data structures about emitted fragment code that
 must be consistent across fragment deletions.
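 For example, a client that tracks how many fragments are currently in the
 code cache could pair the creation events with the deletion event.  This is a
 hedged sketch: the counter name is hypothetical, updates are unsynchronized,
 and the count is only approximate (for instance, it does not distinguish
 shared from thread-private fragments):

 \code
 #include "dr_api.h"

 static int live_fragments; /* unsynchronized: approximate with threads */

 static dr_emit_flags_t
 event_bb(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
          bool translating)
 {
   /* Blocks built for a trace or for translation add no new fragment. */
   if (!for_trace && !translating)
     live_fragments++;
   return DR_EMIT_DEFAULT;
 }

 static dr_emit_flags_t
 event_trace(void *drcontext, void *tag, instrlist_t *trace, bool translating)
 {
   if (!translating)
     live_fragments++;
   return DR_EMIT_DEFAULT;
 }

 /* Raised for both basic blocks and traces leaving the code cache. */
 static void
 fragment_deleted(void *drcontext, void *tag)
 {
   live_fragments--;
 }

 static void
 event_exit(void)
 {
   dr_printf("fragments still live at exit: %d\n", live_fragments);
 }

 DR_EXPORT void
 dr_client_main(client_id_t id, int argc, const char *argv[])
 {
   dr_register_bb_event(event_bb);
   dr_register_trace_event(event_trace);
   dr_register_delete_event(fragment_deleted);
   dr_register_exit_event(event_exit);
 }
 \endcode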

 ********************
 \subsection sec_events_wow64 Special System Calls

 For 32-bit applications on a 64-bit Windows kernel ("Windows-on-Windows-64"
 or "WOW64"), DynamoRIO treats the indirect call from 32-bit system
 libraries that transitions to WOW64 marshalling code as a system call, even
 though there are a few 32-bit instructions executed afterward on some
 versions of Windows.  Tools monitoring calls and returns will therefore also
 need to check whether an instruction is being treated as a system call.


 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_decode Decoding and Encoding

 As discussed in \ref sec_events_bb and \ref sec_events_trace, a
 client's primary interface to code inspection and manipulation is via
 the basic block and trace hooks.  However, DynamoRIO also
 exports a rich set of functions and data structures to decode and
 encode instructions directly.  The following subsections overview this
 functionality.


 ********************
 \subsection sec_Decoding Decoding

 DynamoRIO provides several routines for decoding and disassembling
 instructions.  The most common method for decoding is the
 decode() routine, which populates an #instr_t data structure with all
 information about the instruction (e.g., opcode and operand
 information).

 When decoding instructions, clients must explicitly manage the \c
 #instr_t data structure.  For example, the following code shows how to
 use the instr_init(), instr_reset(), and instr_free() routines to
 decode a sequence of arbitrary instructions:

 \code
 instr_t instr;
 instr_init(&instr);
 do {
   instr_reset(dcontext, &instr);
   pc = decode(dcontext, pc, &instr);
   /* decode() returns NULL for an invalid instruction */
   if (pc == NULL)
     break;
   if (instr_writes_memory(&instr)) {
     /* do some processing */
   }
 } while (pc < stop_pc);
 instr_free(dcontext, &instr);
 \endcode

 DynamoRIO supports decoding multiple instruction set modes.
 See \ref sec_isa for full details.

 ********************
 \subsection sec_InstrGen Instruction Generation

 Clients can construct instructions from scratch in two different ways:

  -# Using the INSTR_CREATE_opcode macros that fill
     in implicit operands automatically:
     \code
 instr_t *instr = INSTR_CREATE_dec(dcontext, opnd_create_reg(DR_REG_EDX));
     \endcode
  -# Specifying the opcode and all operands (including implicit
     operands):
     \code
 instr_t *instr = instr_create(dcontext);
 instr_set_opcode(instr, OP_dec);
 instr_set_num_opnds(dcontext, instr, 1, 1);
 instr_set_dst(instr, 0, opnd_create_reg(DR_REG_EDX));
 instr_set_src(instr, 0, opnd_create_reg(DR_REG_EDX));
     \endcode

 When using the second method, the exact order of operands and their
 sizes must match the templates that DynamoRIO uses.  The
 INSTR_CREATE_ macros in dr_ir_macros.h should be consulted to
 determine the order.

 ********************
 \subsection sec_Encoding Encoding

 DynamoRIO's encoding routines take an instruction or list of
 instructions and encode them into the corresponding bit pattern:

 \code instr_encode(), instrlist_encode() \endcode

 When encoding a control transfer instruction that targets another
 instruction, two encoding passes are performed: one to find the offset
 of the target instruction, and the other to link the control transfer
 to the proper target offset.
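 The idea behind the two passes can be sketched with a toy instruction list.
 This is not DynamoRIO's encoder: the toy_instr_t type, its fixed instruction
 lengths, and toy_encode_list() are hypothetical.  Real variable-length
 encodings are harder, since a branch's displacement size can change its own
 length.

```c
#include <stddef.h>

// Hypothetical toy instruction: a fixed length in bytes and, for a branch,
// the index of its target instruction in the same list (-1 otherwise).
typedef struct {
    int length;
    int target;       // index of the target instruction, or -1
    int offset;       // filled in by pass 1
    int displacement; // filled in by pass 2, relative to the next instruction
} toy_instr_t;

// Pass 1 assigns every instruction its offset from the start of the list.
// Pass 2 resolves each branch, including forward branches whose target had
// not yet been laid out when the branch itself was reached.
// Returns the total encoded size in bytes.
static int
toy_encode_list(toy_instr_t *list, size_t n)
{
    int pc = 0;
    for (size_t i = 0; i < n; i++) {
        list[i].offset = pc;
        pc += list[i].length;
    }
    for (size_t i = 0; i < n; i++) {
        if (list[i].target >= 0) {
            int next = list[i].offset + list[i].length;
            list[i].displacement = list[list[i].target].offset - next;
        } else {
            list[i].displacement = 0;
        }
    }
    return pc;
}
```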

 DynamoRIO is capable of encoding multiple instruction set modes.
 See \ref sec_isa for details.

 ********************
 \subsection sec_disasm Disassembly

 DynamoRIO provides several routines for printing instructions to a file or
 a buffer.  These include disassemble(), opnd_disassemble(),
 instr_disassemble(), instrlist_disassemble(), disassemble_with_info(),
 disassemble_from_copy(), and disassemble_to_buffer().

 The style of disassembly can be controlled through the
 \ref op_syntax_intel "-syntax_intel" (for Intel-style disassembly),
 \ref op_syntax_att "-syntax_att" (for AT&T-style disassembly),
 \ref op_syntax_arm "-syntax_arm" (for ARM-style disassembly), and
 \ref op_syntax_riscv "-syntax_riscv" (for RISC-V-style disassembly) runtime
 options, or the disassemble_set_syntax() function.  The default disassembly
 style is DynamoRIO's custom style, which lists all operands (both implicit
 and explicit).  The sources are listed first, followed by "->", and then
 the destinations.  This provides more information than any of the other
 formats.

 ********************
 \subsection sec_IR_heap Instruction Heap Allocation

 DynamoRIO's IR is designed for efficiency and a small footprint.  By
 default, space for operands is dynamically allocated from the heap.
 This can be problematic when using instructions in fragile locations
 such as signal handlers.  DR provides a separate instruction structure
 for such situations: #instr_noalloc_t.  This structure contains
 built-in storage for all possible operand slots and for temporary
 encoding space, avoiding heap allocation when used for decoding or
 encoding.

 To use #instr_noalloc_t, declare it and then obtain a pointer to it
 as an #instr_t structure for use with API functions:

 \code
 instr_noalloc_t noalloc;
 instr_noalloc_init(dcontext, &noalloc);
 instr_t *instr = instr_from_noalloc(&noalloc);
 pc = decode(dcontext, ptr, instr);
 \endcode

 No freeing is required.  To re-use the same no-alloc structure,
 instr_reset() can be called on its #instr_t pointer.


 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_isa Instruction Set Modes

 Some architectures support multiple instruction set modes.  The AMD64 build
 of DynamoRIO is capable of decoding and encoding 32-bit IA-32 instructions,
 while the 32-bit ARM build is capable of decoding and encoding both ARM and
 Thumb modes.

 In DynamoRIO, each thread has a current mode that determines how
 instructions are interpreted during decoding.  By default, this mode
 matches the DynamoRIO build.  The dr_set_isa_mode() routine changes the
 current mode, while dr_get_isa_mode() queries it.

 Additionally, each instruction contains a flag indicating the mode in which
 it should be encoded.  When an instruction is created or decoded, the
 instruction's flag is set to the thread's current mode.  It can be queried
 with instr_get_isa_mode() and changed with instr_set_isa_mode().

 ********************
 \subsection sec_64bit 64-bit Versus 32-bit Instructions

 The 64-bit build of DynamoRIO uses 64-bit decoding and encoding by
 default, while the 32-bit build uses 32-bit.  The 64-bit build is also
 capable of decoding and encoding 32-bit instructions.

 For a 64-bit build of DynamoRIO, the instruction creation macros all use
 64-bit-sized registers.  The recommended model when generating 32-bit code
 is to use the macros to create an instruction list and then, before
 encoding, call instr_set_isa_mode() with #DR_ISA_IA32 and
 instr_shrink_to_32_bits() on each instruction.  Naturally, any instruction
 that differs in more than register selection must be special-cased.

 ********************
 \subsection sec_thumb Thumb Mode Addresses

 For 32-bit ARM, target addresses passed as event callbacks, as clean
 call targets, or as dr_redirect_execution() targets should have
 their least significant bit set to 1 if they need to be executed
 in Thumb mode (#DR_ISA_ARM_THUMB).  Addresses obtained via
 dr_get_proc_address(), or function pointers at the source code level,
 should automatically have this property.  dr_app_pc_as_jump_target() can
 also be used to construct the proper address from an aligned value.

 When decoding, if the target address has its least significant bit set to
 1, the decoder switches to Thumb mode for the duration of the decoding,
 regardless of the current thread's mode.
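 The following plain C sketch models the least-significant-bit convention
 described above.  It does not call DynamoRIO: the toy_ helpers are
 hypothetical, and a real client should use dr_app_pc_as_jump_target() and
 rely on the decoder's own handling of the mode bit.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical: builds a jump target from an aligned code address by setting
// the least significant bit for Thumb code.
static uintptr_t
toy_jump_target(uintptr_t aligned_pc, bool thumb)
{
    return thumb ? (aligned_pc | 1u) : aligned_pc;
}

// Hypothetical: mirrors the decoder rule.  An odd address forces Thumb
// decoding regardless of the thread's mode; an even address uses the
// thread's current mode.
static bool
toy_decode_as_thumb(uintptr_t pc, bool thread_mode_is_thumb)
{
    return (pc & 1u) != 0 || thread_mode_is_thumb;
}

// Hypothetical: the address actually fetched from has the mode bit cleared.
static uintptr_t
toy_fetch_address(uintptr_t pc)
{
    return pc & ~(uintptr_t)1;
}
```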

 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_IR_utils Utilities

 In addition to instruction decoding and encoding, the API includes
 several higher-level routines to facilitate code instrumentation.
 These include the following:

 - Routines to insert clean calls to client-defined functions.
 - Routines to instrument control-flow instructions.
 - Routines to spill registers to DynamoRIO's thread-private spill
   slots.
 - Routines to quickly save and restore arithmetic flags,
   floating-point state, and MMX/SSE registers.

 The following subsections describe these routines in more detail.

 ********************
 \subsection sec_clean_call Clean Calls

 To make it easy to insert code into the application instruction
 stream, DynamoRIO provides a <em>clean call</em> mechanism, which
 allows insertion of a transparent call to a client routine.
 The dr_insert_clean_call() routine takes care of switching to a clean
 stack, setting up arguments to a call and making the call, optionally
 preserving floating point state, and preserving application state across
 the entire sequence.

 Here is an example of inserting a clean call to the \c at_mbr function:

 \code
 if (instr_is_mbr(instr)) {
   app_pc address = instr_get_app_pc(instr);
   uint opcode = instr_get_opcode(instr);
   /* Insert before the branch: it ends the block, so code placed
    * after it would never execute. */
   dr_insert_clean_call(drcontext, ilist, instr, (void *) at_mbr,
                        false/*don't need to save fp state*/,
                        2 /* 2 parameters */,
                        /* opcode is 1st parameter */
                        OPND_CREATE_INT32(opcode),
                        /* address is 2nd parameter */
                        OPND_CREATE_INTPTR(address));
 }
 \endcode

 Through this mechanism, clients can write analysis code in C or other
 high-level languages and easily insert calls to these routines in the
 instruction stream.  Note, however, that saving and restoring machine state
 is an expensive operation.  Performance-critical operations should be
 inlined for maximum efficiency.

 The stack that DynamoRIO switches to for clean calls is relatively small:
 only 20KB by default.  Clients can increase the size of the stack with the
 \ref op_stack_size "-stack_size" runtime option.  Clients should also
 avoid keeping persistent state on the clean call stack, as it is wiped
 clean at the start of each clean call.

 The saved interrupted application state can be accessed using
 dr_get_mcontext() and modified using dr_set_mcontext().

 For performance reasons, clean calls do not save or restore floating point,
 MMX, or SSE state by default.  If the clean callee is using floating point
 or multimedia operations, it should request that the clean call mechanism
 preserve the floating point state through the appropriate parameter to
 dr_insert_clean_call().  See also
 \ref sec_trans_floating_point "Floating Point State, MMX, and SSE Transparency".

 If more detailed control over the call sequence is desired, it can be
 broken down into its constituent pieces:

  - dr_prepare_for_call()
  - Optionally, dr_insert_save_fpstate()
  - dr_insert_call()
  - Optionally, dr_insert_restore_fpstate()
  - dr_cleanup_after_call()

 DynamoRIO analyzes the callee target of each clean call and attempts to
 reduce the context switch size and, if the callee is simple enough, to
 inline it automatically.  This analysis and potential inlining works best
 when the callee is fully optimized.  Thus, we recommend building clients
 with high optimization levels, even when running a debug build of
 DynamoRIO itself to examine whether callees are being inlined.  See \ref
 op_cleancall "-opt_cleancall" for information on how to adjust the
 aggressiveness of these optimizations and for a list of specific conditions
 that affect inlining.

 ********************
 \subsection sec_state State Preservation

 To facilitate code transformations, DynamoRIO makes available its register
 spill slots and other state preservation functionality.
 The \ref page_drreg Extension Library is recommended to manage registers.
 DynamoRIO's direct interfaces for saving and restoring registers to
 and from thread-local spill slots may also be used:

 \code dr_save_reg(), dr_restore_reg(), and dr_reg_spill_slot_opnd() \endcode

 The values stored in these spill slots remain valid until the next application
 (i.e., non-meta) instruction and as such can be accessed from clean calls using:

 \code dr_read_saved_reg(), dr_write_saved_reg() \endcode

 When using DynamoRIO's interfaces instead of drreg, be sure to look
 for the labels #DR_NOTE_ANNOTATION and #DR_NOTE_REG_BARRIER at which
 all application values should be restored to registers.

 For longer term persistence DynamoRIO also provides a generic dedicated
 thread-local storage field for use by clients, making it easy to write
 thread-aware clients.  From C code, use:

 \code dr_get_tls_field(), dr_set_tls_field() \endcode

 To access this thread-local field from the code cache, use the following
 routines to generate the necessary code:

 \code dr_insert_read_tls_field(), dr_insert_write_tls_field() \endcode

 Since saving and restoring the \c eflags register is required for almost
 all code transformations, and since it is difficult to do so efficiently,
 we export routines that use our efficient method of arithmetic flag
 preservation:

 \code dr_save_arith_flags(), dr_restore_arith_flags() \endcode

 As just discussed in \ref sec_clean_call, we also export convenience
 routines for making \e clean (i.e., transparent) native calls from the code
 cache, as well as floating point and multimedia state preservation.


 ********************
 \subsection sec_branch_instru Branch Instrumentation

 DynamoRIO provides explicit support for instrumenting call
 instructions, direct (or unconditional) branches, indirect (or
 multi-way) branches, and conditional branches.  These convenience
 routines insert clean calls to client-provided methods, passing as
 arguments the instruction pc and target pc of each control transfer,
 along with taken or not taken information for conditional branches:

 \code
 dr_insert_call_instrumentation()
 dr_insert_ubr_instrumentation()
 dr_insert_mbr_instrumentation()
 dr_insert_cbr_instrumentation()
 \endcode


 ********************
 \subsection sec_adaptive Dynamic Instrumentation

 DynamoRIO allows a client to dynamically adjust its instrumentation
 by providing a routine that flushes all cached fragments corresponding to
 an application code region and then lets the client register (or
 unregister) instrumentation event callbacks:

 \code
 dr_flush_region_ex()
 \endcode

 The client should pass this routine a callback that unregisters the old
 instrumentation event callbacks and registers the new ones.

 In order to directly modify the instrumentation on a particular fragment
 (as opposed to replacing instrumentation on all copies of fragments
 corresponding to particular application code), DynamoRIO also supports
 directly replacing an existing fragment with a new #instrlist_t:

 \code
 dr_replace_fragment()
 \endcode

 However, this routine is only supported when running with the \ref
 op_thread_priv "-thread_private" runtime option, and it replaces the
 fragment for the current thread only.  A client can call this routine even
 while inside the to-be-replaced fragment (e.g., in a clean call from inside
 the fragment).  In this scenario, the old fragment is executed to
 completion and the new code is inserted before the next execution.

 For example usage, see the client sample \ref sec_ex3.

 ********************
 \subsection sec_custom_traces Custom Traces

 DynamoRIO combines frequently executed sequences of basic blocks into
 <em>traces</em>.  It uses a simple profiling scheme based on <em>trace
 heads</em>, which are the targets of backward branches or exits from
 existing traces.  Execution counters are kept for each trace head.  Once a
 head crosses a threshold, the next sequence of basic blocks that are
 executed becomes a new trace.
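 The counter scheme can be pictured with a small model.  Everything here is
 hypothetical: the toy_ names are not DynamoRIO's, and TOY_TRACE_THRESHOLD
 is an illustrative value, not DynamoRIO's default threshold.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical threshold; not DynamoRIO's actual default.
#define TOY_TRACE_THRESHOLD 50

typedef struct {
    const void *tag;  // application address identifying the trace head
    unsigned count;   // executions observed so far
    bool has_trace;   // set once a trace has been built from this head
} toy_trace_head_t;

// Called each time execution reaches the head.  Returns true exactly once:
// on the execution that crosses the threshold, when the following sequence
// of basic blocks should be recorded as a new trace.
static bool
toy_trace_head_hit(toy_trace_head_t *head)
{
    if (head->has_trace)
        return false; // the existing trace runs instead of the head
    head->count++;
    if (head->count > TOY_TRACE_THRESHOLD) {
        head->has_trace = true;
        return true;
    }
    return false;
}
```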

 DynamoRIO allows a client to build custom traces by marking its own trace
 heads (<em>in addition</em> to DynamoRIO's normal trace heads) and
 deciding when to end traces.  If a client registers for the following
 event, DynamoRIO will call its hook before extending a trace (with tag \c
 trace_tag) with a new basic block (with tag \c next_tag):

 \code
 int query_end_trace(void *drcontext, void *trace_tag, void *next_tag);
 \endcode

 The client hook returns one of these values:
  - CUSTOM_TRACE_DR_DECIDES = use standard termination criteria
  - CUSTOM_TRACE_END_NOW = end trace now
  - CUSTOM_TRACE_CONTINUE = do not end trace

 If using standard termination criteria, DynamoRIO ends the trace if it
 reaches a trace head or another trace (or certain corner-case basic blocks
 that cannot be part of a trace).

 The client can also mark any basic block as a trace head with
 \code dr_mark_trace_head() \endcode

 For example usage, see the callee-inlining client sample \ref sec_ex5.

 \if custom_stubs
 ********************
 \subsection sec_custom_stubs Custom Exit Stubs

 An exit cti can be given an #instrlist_t to be prepended to the standard
 exit stub.  There are set and get methods for this custom exit stub code:

 \code
 void instr_set_exit_stub_code(instr_t *instr, instrlist_t *stub);
 instrlist_t *instr_exit_stub_code(instr_t *instr);
 \endcode

 When a fragment is re-decoded, e.g. when being appended to a trace or when
 re-decoded using dr_decode_fragment, the custom stubs are regenerated and
 added to the owning exit cti's.
 \endif


 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_reg_stolen Register Stolen by DynamoRIO

 On some architectures, e.g., ARM and AArch64, DynamoRIO steals a register for
 holding the base of DynamoRIO's own TLS (Thread-Local Storage).
 DynamoRIO guarantees the correctness of the application execution by saving
 and restoring the stolen register's value before and after that register is used
 by each application instruction.
 DynamoRIO also guarantees the stolen register's value is stored in the
 application machine context (dr_mcontext_t) for client use at event callbacks or
 clean calls.
 However, DynamoRIO exposes the stolen register to the client and places the burden
 on the client to ensure the correctness of its instrumentation.

 The client can use reg_is_stolen() or dr_get_stolen_reg() to identify the stolen
 register. To use the application value of the stolen register in the inserted code,
 the client must first use dr_insert_get_stolen_reg_value() to insert code to get
 the value into another register.
 Otherwise, the TLS base value might be used instead.
 Similarly, the client should use dr_insert_set_stolen_reg_value() to set the
 application value of the stolen register.

 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_translation State Translation

 To support transparent fault handling, DynamoRIO must translate a fault in
 the code cache into a fault at the corresponding application address.
 DynamoRIO must also be able to translate when a suspended thread is
 examined by the application or by DynamoRIO itself for internal
 synchronization purposes.

 If a client is only adding observational instrumentation (i.e., \ref
 sec_Meta, which should not fault) and is not modifying, reordering, or
 removing application instructions, these details can be ignored.  In that
 case the client's basic block and trace callbacks should return
 #DR_EMIT_DEFAULT in addition to being deterministic and idempotent (i.e.,
 DynamoRIO should be able to repeatedly call the callback and receive back
 the same resulting instruction list, with no net state changes to the
 client).

 If a client is performing modifications, then in order for DynamoRIO to
 properly translate a code cache address the client must use
 instr_set_translation() (chainable via INSTR_XL8()) in the basic block and
 trace creation callbacks to set the corresponding application address for
 each added meta instruction that can fault, each modified instruction, and
 each added application instruction.  The
 translation value is the application address that should be presented to
 the application as the faulting address, or the application address that
 should be restarted after a suspend.  Currently the translation address
 must be within the existing range of source addresses for the basic block
 or trace.

 There are two methods for using the translated addresses:

 -# Return #DR_EMIT_STORE_TRANSLATIONS from the basic block creation
    callback.  DR will then store the translation addresses and use
    the stored information on a fault.  The basic block callback for
    \p tag will not be called with \p translating set to true.  Note
    that unless #DR_EMIT_STORE_TRANSLATIONS is also returned for \p
    for_trace calls (or #DR_EMIT_STORE_TRANSLATIONS is returned in
    the trace callback), each constituent block comprising the trace
    will need to be re-created with both \p for_trace and \p
    translating set to true.  Storing translations uses additional
    memory that can be significant: up to 20% in some cases, as it
    prevents DR from using its simple data structures and forces it
    to fall back to its complex, corner-case design.  This is why DR
    does not store all translations by default.
 -# Return #DR_EMIT_DEFAULT from the basic block or trace creation callback.
    DynamoRIO will then call the callback again during fault translation
    with \p translating set to true.  All modifications to the instruction
    list that were performed on the creation callback must be repeated on
    the translating callback.  This option is only possible when basic block
    modifications are deterministic and idempotent, but it saves memory.
    Naturally, global state changes triggered by block creation should be
    wrapped in checks for \p translating being false.  Even in this case,
    instr_set_translation() should be called for appropriate instructions even
    when \p translating is false, as DynamoRIO may decide to store the
    translations at creation time for reasons of its own.

 Furthermore, if the client's modifications change any part of the machine
 state besides the program counter, the client should use
 dr_register_restore_state_event() or dr_register_restore_state_ex_event()
 (see \ref sec_events_translation) to restore the registers to their
 original application values. DR attempts to reconstruct the #instrlist_t
 for the faulting fragment; this list contains all instrs added by the
 basic block event(s) with \p translating set to true, and also DR's own
 mangling of some instrs. If this reconstructed #instrlist_t is available,
 it will be passed on to the registered callback as part of
 #dr_fault_fragment_info_t in #dr_restore_state_info_t. It may not be
 available when some or all clients returned #DR_EMIT_STORE_TRANSLATIONS,
 or for DR internal reasons when the app code may not be consistent: for
 pending deletion or self-modifying fragments.

 For meta instructions that do not reference application memory (i.e., they
 should not fault), leave the translation field as NULL.  A NULL value
 instructs DynamoRIO to use the subsequent application instruction's
 translation as the application address, and to fail when translating the
 full state.  Since the full state will only be needed when relocating a
 thread (as stated, there will not be a fault here), failure indicates that
 this is not a valid relocation point, and DynamoRIO's thread
 synchronization scheme will use another spot.  If the translation field is
 set to a non-NULL value, the client should be willing to also restore the
 rest of the machine state at that point (restore spilled registers, etc.)
 via #dr_register_restore_state_event() or
 #dr_register_restore_state_ex_event().  This is necessary for meta
 instructions that reference application memory or that may deliberately fault
 when accessing client memory.  DynamoRIO takes care of
 such potentially-faulting instructions added by its own API routines
 (#dr_insert_clean_call() arguments that reference application data,
 #dr_insert_mbr_instrumentation()'s read of application indirect branch
 data, etc.)
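 The NULL-translation rule above can be modeled in a few lines of plain C.
 This is a hypothetical sketch, not DynamoRIO's implementation: the
 toy_cache_instr_t type and toy_translate() are invented for illustration,
 with 0 standing in for a NULL translation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical model of one fragment.  translation holds the address set
// via instr_set_translation() (0 means NULL); is_app separates application
// instructions from meta instructions.
typedef struct {
    uintptr_t translation;
    bool is_app;
} toy_cache_instr_t;

// Maps code-cache instruction idx to an application address.
// *full_state_ok reports whether full machine-state translation is
// attempted at this point.
static bool
toy_translate(const toy_cache_instr_t *frag, size_t n, size_t idx,
              uintptr_t *app_pc, bool *full_state_ok)
{
    if (idx >= n)
        return false;
    if (frag[idx].translation != 0) {
        // Non-NULL: full translation is attempted, relying on a
        // restore-state callback to fix up any other modified state.
        *app_pc = frag[idx].translation;
        *full_state_ok = true;
        return true;
    }
    // NULL: use the subsequent application instruction's translation, but
    // fail full-state translation so this is not used as a relocation point.
    for (size_t i = idx + 1; i < n; i++) {
        if (frag[i].is_app && frag[i].translation != 0) {
            *app_pc = frag[i].translation;
            *full_state_ok = false;
            return true;
        }
    }
    return false;
}
```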

 Here is an example of using the INSTR_XL8 macro to set the translation
 field for a meta instruction:

 \code
 #define PREM instrlist_meta_preinsert

 app_pc xl8 = instr_get_app_pc(inst);
 PREM(bb, inst, INSTR_XL8(INSTR_CREATE_mov_st(drcontext, dst, src), xl8));
 \endcode

 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_predication Conditionally Executed Instructions

 DynamoRIO models conditionally executed, or "predicated", instructions as
 regular instructions with extra predication attributes.  Use
 instr_is_predicated() to determine whether an instruction is conditionally
 executed.  If so, use instr_get_predicate() to determine the type of
 condition.  At execution time, instr_predicate_triggered() can be used to
 query whether an instruction will execute or not.

 The degree of conditional execution varies.  In some cases, when an
 instruction is not executed, it will not read any source operands nor write
 any destination operands.  In other cases, the condition on which it depends
 involves the value of a source operand (e.g., OP_bsf or OP_maskmovq).
 However, all conditionally executed instructions share the same property
 that their destination operands are conditionally written.  This does not
 apply to eflags, which are written to unconditionally by some conditional
 instructions.

 To aid in analyzing liveness and other properties of application code, all
 API routines that query whether registers or flags are written take a
 parameter of type #dr_opnd_query_flags_t that controls how to treat
 conditionally accessed operands: whether to include them or skip them.

 API routines that operate on raw instruction information, such as
 instr_num_dsts(), include all possible operands.  Clients should explicitly
 query instr_is_predicated() when using API routines that do not take in
 #dr_opnd_query_flags_t.
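 To see why the include-or-skip choice matters, consider a simple liveness
 check.  The following is a hypothetical model, not DynamoRIO code: the
 toy_pred_instr_t type and the TOY_INCLUDE_CONDITIONAL and
 TOY_SKIP_CONDITIONAL values merely stand in for the role that
 #dr_opnd_query_flags_t plays in the real API.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical instruction: one source and one destination register
// (-1 for none), plus whether the destination is only conditionally written.
typedef struct {
    int src_reg;
    int dst_reg;
    bool predicated;
} toy_pred_instr_t;

typedef enum {
    TOY_INCLUDE_CONDITIONAL,
    TOY_SKIP_CONDITIONAL,
} toy_query_t;

static bool
toy_writes_reg(const toy_pred_instr_t *in, int reg, toy_query_t query)
{
    if (in->dst_reg != reg)
        return false;
    return !in->predicated || query == TOY_INCLUDE_CONDITIONAL;
}

// Returns true if reg is dead at the start of seq, i.e., it is overwritten
// before being read.  Predicated writes are skipped: if their condition is
// false, the old value survives and may still be read later.
static bool
toy_reg_dead_at_start(const toy_pred_instr_t *seq, size_t n, int reg)
{
    for (size_t i = 0; i < n; i++) {
        if (seq[i].src_reg == reg)
            return false; // read before an unconditional write: live
        if (toy_writes_reg(&seq[i], reg, TOY_SKIP_CONDITIONAL))
            return true; // unconditionally overwritten: dead
    }
    return false; // unknown past the end of seq: conservatively live
}
```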

 As shorthand for emitting instrumentation that must use the same
 predicate as the current instruction, \p instrlist_set_auto_predicate() is
 provided, which predicates every instruction inserted into the list while
 the setting is active.  While auto-predication is in effect, writing to
 aflags is strictly forbidden, as is meta control flow; however, internal DR
 components such as \p dr_insert_clean_call() gracefully handle this
 auto-predication setting and are safe for use with it.

 \p instrlist_get_auto_predicate() similarly may be used to query the current
 predicate desired for auto predication.

 ********************
 \subsection sec_it_blocks IT Blocks

 On ARM AArch32, the Thumb mode includes conditional groups of instructions
 called IT blocks.  The #OP_it header instruction indicates how many
 instructions are in the block and the direction of each instruction's
 conditional.  Thus, inserting instrumentation inside the block without
 updating the header results in an unencodable instruction list.  To solve
 this, we provide two API routines: dr_remove_it_instrs() and
 dr_insert_it_instrs().  The first simply removes the headers.  Since the
 individual instructions from the blocks are marked with their condition in
 our IR, the header is not necessary for tools to analyze the instructions.
 The second re-instates the headers, creating a legal instruction list.
 This re-creation of the proper IT block headers occurs as a final phase in
 drmgr, after the instru2instru event.  This means that the original #OP_it
 header instructions are present for clients to observe in the analysis
 phase.

 Clients not using drmgr can call dr_remove_it_instrs() and then
 dr_insert_it_instrs() on their own.
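 The reason an IT header must be rebuilt becomes clear from its encoding:
 the number of instructions in the block (at most four) is part of the
 header's mask field.  The following hypothetical helper, toy_it_encoding(),
 computes a 16-bit Thumb IT instruction following the encoding in the ARM
 Architecture Reference Manual; it is an illustration only and is not how
 dr_insert_it_instrs() is implemented.

```c
#include <stdint.h>

// Builds 0xBF00 | firstcond << 4 | mask for an IT block of n (1 to 4)
// instructions.  then_else[k] is nonzero if instruction k+1 uses firstcond
// ("then") and zero if it uses the inverse ("else"); then_else[0] is
// ignored because the first instruction always uses firstcond.
// Returns 0 for invalid input.
static uint16_t
toy_it_encoding(unsigned firstcond, const int *then_else, unsigned n)
{
    if (n < 1 || n > 4 || firstcond > 0xe)
        return 0;
    unsigned mask = 0;
    for (unsigned k = 1; k < n; k++) {
        // "then" repeats firstcond's low bit; "else" inverts it.
        unsigned bit = then_else[k] ? (firstcond & 1u) : ((firstcond & 1u) ^ 1u);
        mask |= bit << (4 - k);
    }
    // A trailing 1 after the condition bits encodes the block length.
    mask |= 1u << (4 - n);
    return (uint16_t)(0xBF00u | (firstcond << 4) | mask);
}
```

 Inserting an instrumentation instruction inside a block would change n,
 and therefore the header, which is why the headers are removed before
 instrumentation and re-created afterward.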

 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_ldrex Exclusive Monitor Instrumentation

 On ARM and AArch64, a load-exclusive store-exclusive pair of
 instructions has some constraints that make inserting instrumentation
 in between the pair challenging.  ARM/AArch64 hardware requires that
 an application minimize memory operations in between the
 load-exclusive and store-exclusive and minimize the total number of
 instructions in between.  Violating this can result in failure to
 acquire the desired exclusive monitor, which is always done in a loop
 and can result in a non-terminating loop when the application is run
 with instrumentation.

 Inserting a few memory references in between the pair works
 on some but not all hardware. Inserting something heavyweight like a clean
 call is very likely to result in a non-terminating loop.

 By default, DynamoRIO converts each such sequence to a compare-and-swap
 sequence which can handle any amount of added instrumentation.
 However, compare-and-swap is not semantically identical: it does not
 detect "ABA" changes and could cause errors in lock-free data
 structures or other application constructs.  This compare-and-swap
 conversion can be disabled with the runtime option \ref op_ldstex2cas
 "-no_ldstex2cas".

 If compare-and-swap conversion must be disabled, we recommend using
 inlined instrumentation with at most a few memory references in such
 regions, rather than clean calls.  In general, this is the best
 performing strategy anyway for heavyweight tools that want to
 instrument every instruction.  Take a look at the memtrace_simple and
 instrace_simple samples, which check for instr_is_exclusive_store() to
 avoid a clean call in between.  However, even this is not enough on
 some hardware, where instrumentation would have to be shifted to before
 and/or after the monitor region.  The runtime option
 "-unsafe_build_ldstex" may be useful on AArch64 hardware that does not
 allow any loads or stores between an exclusive load and the
 corresponding exclusive store. With this option DynamoRIO tries to
 turn a sequence of instructions containing an exclusive load/store
 pair into a macro-instruction, which prevents any loads and stores
 from being inserted, but also prevents any of the instructions
 involved from being instrumented.
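 The "ABA" limitation of the compare-and-swap conversion can be shown with
 C11 atomics.  This single-threaded sketch is a hypothetical illustration:
 toy_cas_misses_aba() simply executes another thread's two writes inline
 between the load and the compare-and-swap.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Replays an interleaving that an exclusive monitor would catch but
// compare-and-swap does not.  Returns true if the CAS wrongly succeeds.
static bool
toy_cas_misses_aba(void)
{
    _Atomic int shared = 1;              // value "A"
    int observed = atomic_load(&shared); // plays the load-exclusive

    atomic_store(&shared, 2); // other thread: A -> B
    atomic_store(&shared, 1); // other thread: B -> A

    // Plays the store-exclusive after conversion to CAS.  It succeeds
    // because the value matches, even though the location was written in
    // between.  A real exclusive monitor would have been cleared by those
    // writes, so the store-exclusive would fail and the loop would retry.
    int expected = observed;
    bool swapped = atomic_compare_exchange_strong(&shared, &expected, 3);
    return swapped && atomic_load(&shared) == 3;
}
```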

 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_rseq Restartable Sequence Instrumentation Constraints

 The Linux kernel supports special code regions called restartable
 sequences.  This "rseq" feature is challenging to support under
 instrumentation due to the tight restrictions on operations inside the
 sequence.  Instrumentation inserted in the sequence would need to be
 designed to be restartable as well, with a single commit point.  Meeting
 such requirements is unrealistic for most instrumentation.  Instead, DR
 provides a "run twice" solution where the sequence is first executed as
 regular code with regular instrumentation up to the commit point.  Then the
 sequence is restarted and executed without instrumentation to perform the
 commit.
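
 The run-twice idea can be shown with a small single-threaded model.  This is a sketch under simplifying assumptions, not DR's implementation, and the region() function, counter array, and observed_steps tally are hypothetical.  The region increments a per-"cpu" counter, and its only visible effect is the final commit store.  The first pass is "instrumented" and stops just before the commit.  The second pass restarts from the top without instrumentation and performs the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CPUS 4

// Per-"cpu" counters updated by the hypothetical restartable region.
static uint64_t counter[NUM_CPUS];
// Steps seen by the pretend instrumentation, to show which pass ran it.
static int observed_steps;

// Hypothetical rseq region: counter[cpu] += 1.  Its only visible side
// effect is the final store (the commit point).  With commit == false
// the region stops just before that store.
static void region(int cpu, bool instrumented, bool commit)
{
    uint64_t v = counter[cpu]; // load
    if (instrumented)
        observed_steps++;
    v = v + 1; // compute
    if (instrumented)
        observed_steps++;
    if (!commit)
        return; // the instrumented pass ends before the commit
    counter[cpu] = v; // commit store
}

// Run-twice execution: an instrumented pass up to the commit point, then
// a restart from the top without instrumentation that commits once.
static void run_twice(int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    region(cpu, true, false);
    region(cpu, false, true);
}
```

 The region body runs twice but commits its store only once.  For this reason, each region must tolerate being restarted from the top without its abort handler running.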

 This run-twice approach is subject to the following limitations:

 - Only x86 and AArch64 are supported for now, and 32-bit x86 is not as well-tested.
 - The application must store an rseq_cs struct for each rseq region in a
   section of its binary named "__rseq_cs", optionally with an "__rseq_cs_ptr_array"
   section of pointers into the __rseq_cs section, per established conventions.
   These sections must be located in loaded segments.
 - The application must use static thread-local storage for its struct rseq registrations.
 - The application must use the same signature for every rseq system call.
 - Each rseq region's code must never also be executed as a non-restartable sequence.
 - Each rseq region must handle being directly restarted without its
   abort handler being called (with the machine state restored, though only
   the general-purpose registers are restored, as described in a limitation below).
 - Each memory store instruction inside an rseq region must have no other side
   effects: it must only write to memory and not to any registers.
   For example, a push instruction, which writes both to memory and to the
   stack pointer register, is not supported.
   An exception is a pre-indexed or post-indexed writeback store, which is
   supported.
 - Each rseq region's code must end with a fall-through (non-control-flow)
   instruction.
 - Indirect branches that do not exit the rseq region are not allowed.
 - Each rseq region must be entered only from the top, with no branches from outside
   the region targeting a point inside the region.
 - No system calls are allowed inside rseq regions.
 - No call instructions are allowed inside rseq regions.
 - The only register inputs to an rseq region, or registers written inside an rseq
   region whose values are then read afterward, must be general-purpose registers.
 - The instrumented execution of the rseq region may not perfectly reflect
   the native behavior of the application.  The instrumentation will never see
   an abort anywhere but just prior to the committing store.  The native execution
   might take a different conditional branch direction than the instrumented
   execution, leading to discrepancies in recorded instruction sequences (drmemtrace
   does try to fix these up in post-processing using the #DR_NOTE_RSEQ_ENTRY label).
   Additionally, memory addresses may be wrong if they are based on
   the live underlying cpuid (as opposed to the value cached at rseq region
   entry) and a migration occurred mid-region.  These discrepancies are minor and
   acceptable for most tools (especially given that there is no better alternative).
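
 Several of the structural limitations above can be checked mechanically.  The sketch below is a hypothetical validator over a simplified instruction model.  The rseq_kind_t and rseq_insn_t types are illustrative assumptions, not DynamoRIO data structures, and the sketch does not describe the checks DynamoRIO itself performs.  It checks only four rules: no system calls, no calls, no indirect branches that stay inside the region, and a fall-through final instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical, simplified instruction model for this sketch.
typedef enum {
    K_ALU,
    K_STORE,
    K_SYSCALL,
    K_CALL,
    K_DIRECT_BRANCH,
    K_INDIRECT_BRANCH,
} rseq_kind_t;

typedef struct {
    rseq_kind_t kind;
    bool exits_region; // For branches: is the target outside the region?
} rseq_insn_t;

// Returns NULL if the region passes the rules checked here, or a
// message naming the first violation found.
static const char *check_rseq_region(const rseq_insn_t *insns, size_t n)
{
    assert(insns != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        switch (insns[i].kind) {
        case K_SYSCALL:
            return "system call inside rseq region";
        case K_CALL:
            return "call instruction inside rseq region";
        case K_INDIRECT_BRANCH:
            if (!insns[i].exits_region)
                return "indirect branch that stays inside the region";
            break;
        default:
            break;
        }
    }
    rseq_kind_t last = insns[n - 1].kind;
    if (last == K_DIRECT_BRANCH || last == K_INDIRECT_BRANCH)
        return "region does not end with a fall-through instruction";
    return NULL;
}
```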

 Some of these limitations are explicitly checked, and DR will exit with an error
 message if they are not met.  However, not all are efficiently verifiable.
 If an application does not satisfy these limitations, the \ref
 op_disable_rseq "disable_rseq" runtime option can be used to make the rseq
 system call return ENOSYS.  This provides a workaround for applications
 that have fallback code for kernels where rseq is not supported.
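
 The fallback code mentioned above usually starts with a one-time probe.  The following Linux-only sketch is a hedged example, not a recommended implementation.  Its struct rseq_area mirrors the original 32-byte layout of the kernel's struct rseq, and the signature value and the rseq_usable() name are arbitrary choices.  Recent glibc versions register rseq for every thread themselves, so a second registration can fail for reasons other than ENOSYS.  For that reason, the sketch treats any error as "rseq unavailable".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Mirrors the original 32-byte layout of the kernel's struct rseq.
struct rseq_area {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
    uint32_t flags;
} __attribute__((aligned(32)));

// Arbitrary signature; an application must pass the same value to
// every rseq system call it makes.
#define APP_RSEQ_SIG 0x53053053
#define APP_RSEQ_FLAG_UNREGISTER 1

// Static TLS registration area, as the limitations above require.
static __thread struct rseq_area rseq_probe_area;

// Returns true if this thread could register (and then unregister) an
// rseq area; otherwise stores the error code, e.g. ENOSYS, in *err_out.
static bool rseq_usable(int *err_out)
{
    assert(err_out != NULL);
    *err_out = 0;
#ifdef SYS_rseq
    if (syscall(SYS_rseq, &rseq_probe_area, sizeof(rseq_probe_area), 0,
                APP_RSEQ_SIG) != 0) {
        *err_out = errno;
        return false;
    }
    // Unregister so the thread can register its real area later.
    syscall(SYS_rseq, &rseq_probe_area, sizeof(rseq_probe_area),
            APP_RSEQ_FLAG_UNREGISTER, APP_RSEQ_SIG);
    return true;
#else
    *err_out = ENOSYS; // The system headers predate rseq.
    return false;
#endif
}
```

 An application would call rseq_usable() once per thread.  When it returns false, the application would take a non-rseq path, such as a lock-protected update.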

 ***************************************************************************
 \htmlonly
 <table width=100% bgcolor="#000000" cellspacing=0 cellpadding=2 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td><table width=100% bgcolor="#0000C0" cellspacing=0 cellpadding=1 border=0>
   <tr><td></td></tr></table></td></tr></table></td></tr></table>
 \endhtmlonly
 \section sec_pcache Persisting Code

 Decoding, instrumenting, and emitting code into the code cache takes time.
 Short-running applications, or applications that execute large amounts of
 code with little code re-use, can incur noticeable overhead when run under
 DynamoRIO.  One solution is to write the code cache to a file for fast
 re-use on subsequent runs by simply loading the file.  DynamoRIO provides
 support for tools to persist their instrumented code.

 First, the \ref op_persist "-persist" runtime option, and optionally \p
 -persist_dir, must be set in order for any caches to be persisted.  Only
 basic block persistence is supported: no traces.  In the presence of a
 client, basic blocks by default are not persisted.  Only if the return
 value of the basic block event callback includes the DR_EMIT_PERSISTABLE
 flag is a block eligible for persistence.  Even then, there are further
 constraints on persistence, as only simple blocks are persistable.

 Persisted caches end in the extension \p .dpc, for DynamoRIO Persisted
 Cache, and are stored in the directory specified by the \p -persist_dir
 runtime option, or the log directory if unspecified, inside a per-user
 subdirectory.

 A client may need to store data in the persisted file in order to
 determine whether it is re-usable when loaded again, or to provide
 generated code or other auxiliary data or code that the persisted code
 requires.  A set of events is provided for this purpose.  These events
 allow a client to store three types of data in a persisted file, beyond
 instrumented code inside each basic block: read-only data, executable code
 (outside of basic blocks), and writable data.  The types of data are
 separated because the file is laid out in different protection zones.
 Read-only data can be added using dr_register_persist_ro(), executable code
 using dr_register_persist_rx(), and writable data using
 dr_register_persist_rw().  Additionally, the basic blocks to be persisted
 can be patched using dr_register_persist_patch().

 Whenever code is about to be persisted, DynamoRIO will call all of the
 registered events for that module.  A user data parameter can be used to
 share information across the event callbacks.
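
 The read-only data event is registered as a set of callbacks:

 - one that reports how many bytes the client will add,
 - one that writes them, and
 - one that reads them back when the file is loaded (resurrected).

 The standalone model below uses an in-memory buffer instead of a file.  The persist_ro_* functions, ro_header_t, and their simplified signatures are hypothetical and do not match the real dr_register_persist_ro() prototypes.  The sketch shows the user data parameter carrying the prepared payload from the size phase to the write phase.  As a result, the write phase emits exactly the number of bytes it promised.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical read-only payload a client might persist.
typedef struct {
    uint32_t version;      // hypothetical client format version
    uint32_t num_counters; // example of module-specific client state
} ro_header_t;

// Size phase: build the payload once and hand it to the write phase
// through *user_data.  Returns the number of bytes that will be written.
static size_t persist_ro_size(uint32_t num_counters, void **user_data)
{
    ro_header_t *hdr = malloc(sizeof(*hdr));
    if (hdr == NULL) {
        *user_data = NULL;
        return 0;
    }
    hdr->version = 3;
    hdr->num_counters = num_counters;
    *user_data = hdr;
    return sizeof(*hdr);
}

// Write phase: emit exactly the prepared bytes, then release user_data.
static size_t persist_ro_write(uint8_t *out, void *user_data)
{
    assert(out != NULL);
    if (user_data == NULL)
        return 0;
    memcpy(out, user_data, sizeof(ro_header_t));
    free(user_data);
    return sizeof(ro_header_t);
}

// Resurrect phase: read the payload and advance *map past it.
static ro_header_t persist_ro_resurrect(const uint8_t **map)
{
    ro_header_t hdr;
    memcpy(&hdr, *map, sizeof(hdr));
    *map += sizeof(hdr);
    return hdr;
}
```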

 Clients are cautioned to ensure their instrumentation is either
 position-independent or properly patched to operate correctly when the
 client library base or the persisted code addresses change.  For example,
 if inserted instrumentation includes calls or jumps into the client
 library, these can be persisted unchanged if the client also stores its
 base address in the read-only section and, in the resurrection callback,
 checks it against its current base address.  On a mismatch, the persisted
 file must be rejected.  A more sophisticated approach requires indirection,
 position independence in the code, or patching.
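
 The base-address check described above amounts to recording one value at persist time and comparing it at resurrection time.  In the hypothetical sketch below, the functions operate on plain byte buffers.  The base addresses passed in stand for the client library's load address, which a real client might obtain with dr_get_client_base().  A false return from resurrect_check_base() means the persisted file must be rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Persist time: record the client's current base address in the
// read-only data buffer.  Returns the number of bytes written.
static size_t persist_client_base(uint8_t *ro_out, uintptr_t client_base)
{
    assert(ro_out != NULL);
    memcpy(ro_out, &client_base, sizeof(client_base));
    return sizeof(client_base);
}

// Resurrection time: accept the file only if the client is loaded at the
// same address, so persisted calls or jumps into it remain valid.
static bool resurrect_check_base(const uint8_t *ro_in, uintptr_t current_base)
{
    uintptr_t saved;
    memcpy(&saved, ro_in, sizeof(saved));
    return saved == current_base;
}
```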

 DynamoRIO itself ensures that a persisted file is only re-used if its
 application module has not changed, if the set of clients in use is
 identical to the set present when the file was created, and if the TLS
 offset is identical.  The application module check currently includes the
 base address on Windows, which precludes re-using persisted files for
 libraries loaded at different addresses via ASLR.  (In the future we plan
 to provide application relocation support, but it is not available
 today.)  The client check is based on the absolute paths of the client
 libraries.  If a client needs to validate based
 on its runtime options, or do a version check based on its own changing
 instrumentation, it must do that on its own in the event callbacks.  The
 TLS check ensures that TLS scratch slots are identical.  DynamoRIO also
 ensures that any runtime options that affect persistent code (such as
 whether traces are enabled) are identical.
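
 DynamoRIO's client check compares only paths.  A client whose instrumentation depends on its own options or version therefore has to record them itself.  One hedged approach, sketched below, folds the client's version number and option string into a 64-bit FNV-1a hash at persist time.  At resurrection time, the client rejects the file if the hash differs.  The hash choice, CLIENT_VERSION, and the function names are illustrative assumptions.  Storing the option string verbatim would avoid any risk of hash collisions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CLIENT_VERSION 5u // Hypothetical; bump when instrumentation changes.

// 64-bit FNV-1a over a byte range, continuing from hash h.
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Fingerprint of everything that shapes this client's instrumentation.
static uint64_t client_fingerprint(const char *options)
{
    assert(options != NULL);
    uint32_t version = CLIENT_VERSION;
    uint64_t h = fnv1a(0xcbf29ce484222325ULL, &version, sizeof(version));
    return fnv1a(h, options, strlen(options));
}

// Resurrection-time check: reuse the file only on an exact match.
static bool fingerprint_matches(uint64_t stored, const char *options)
{
    return stored == client_fingerprint(options);
}
```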

 ***************************************************************************
 ****************************************************************************
 ****************************************************************************
 */