/* **********************************************************
* Copyright (c) 2012-2021 Google, Inc. All rights reserved.
* Copyright (c) 2007-2009 VMware, Inc. All rights reserved.
* **********************************************************/
/*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
*
* * Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
*
* * Neither the name of VMware, Inc. nor the names of its contributors may be
* used to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL VMWARE, INC. OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
* DAMAGE.
*/
/**
\page API_samples Sample Tools
We provide several examples in the <tt>samples</tt> directory of the
release package to illustrate how the DynamoRIO API is used to build a
DynamoRIO client.
There are also samples for the Dr. Memory Framework located in the
<tt>drmemory/drmf/samples</tt> directory of the release package.
For larger examples, see the provided \ref page_tool, which are
more polished end-user clients than these samples.
********************
\section sample_list List of Samples
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/bbbuf.c">bbbuf.c</a>
demonstrates how to use a TLS field for per-thread basic block profiling.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/bbcount.c">bbcount.c</a>
illustrates how to insert efficient instrumentation
for reporting the dynamic execution count of all basic blocks.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/bbsize.c">bbsize.c</a>
collects statistics on the sizes of all basic blocks in the target application.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/callstack.cpp">callstack.cpp</a> walks the application callstack at a given function entry point.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/cbr.c">cbr.c</a> collects conditional branch
execution information and shows how to dynamically
update or replace instrumented code after it executes.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/cbrtrace.c">cbrtrace.c</a> collects
conditional branch execution traces and writes them into files.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/countcalls.c">countcalls.c</a>
reports the dynamic execution count for direct calls, indirect calls, and
returns in the target application. It illustrates how to insert
efficient inline increments and use per-thread data structures.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/div.c">div.c</a> demonstrates
profiling the types of values fed to a particular opcode.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/empty.c">empty.c</a> is provided
as an example client that does nothing.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/inc2add.c">inc2add.c</a>
performs a dynamic optimization: it converts the "inc" instruction to
"add 1" without perturbing the target application's behavior.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/inline.c">inline.c</a>
performs an optimization that uses the custom trace API to inline
entire callees into traces.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/inscount.cpp">inscount.cpp</a>
reports the dynamic count of the total number of instructions executed by
inserting efficient clean calls that are auto-inlined by DynamoRIO.
It also illustrates use of the \ref page_droption.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/instrace_simple.c">instrace_simple.c</a>
is provided as an example client that illustrates how to gather an instruction
trace in a simple, cross-platform way. It is much slower than
instrace_x86_binary, however.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/instrace_x86.c">instrace_x86.c</a> is
provided as an example client that illustrates how to create a private code
cache and perform lean procedure calls to generate an instruction trace in
a performant manner. It is compiled into two different libraries:
instrace_x86_text, which produces a readable, text-mode trace, but is much
slower than instrace_x86_binary, which produces a binary-format trace file.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/instrcalls.c">instrcalls.c</a>
demonstrates how to instrument direct calls, indirect calls and returns.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/memtrace_simple.c">memtrace_simple.c</a>
is provided as an example client that illustrates how to gather a memory
trace in a simple, cross-platform way. It is much slower than
memtrace_x86_binary, however.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/memval_simple.c">memval_simple.c</a>
is provided as an example client that illustrates how to gather a memory
trace of stored addresses and the values being stored.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/memtrace_x86.c">memtrace_x86.c</a>
is provided as an example client that illustrates how to create a private
code cache and perform lean procedure calls. It is compiled into two
different libraries: memtrace_x86_text, which produces a readable,
text-mode trace, but is much slower than memtrace_x86_binary, which
produces a binary-format trace file.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/hot_bbcount.c">hot_bbcount.c</a>
uses the drbbdup extension to count the execution of hot basic blocks which have exceeded a given hit threshold.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/opcode_count.cpp">opcode_count.cpp</a>
takes an opcode as input and uses drmgr's instrumentation events to count the number of times instructions with that opcode
are executed.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/modxfer.c">modxfer.c</a>
reports the control flow transfers between modules.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/modxfer_app2lib.c">modxfer_app2lib.c</a>
reports the control flow transfers between the application executable and
other dynamic libraries and modules.
It illustrates how to insert efficient clean calls for different modules.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/opcodes.c">opcodes.c</a> computes dynamic
execution counts broken down by instruction opcode.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/prefetch.c">prefetch.c</a> demonstrates
modifying the dynamic code stream for compatibility between different
processor types.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/signal.c">signal.c</a> demonstrates how to
use the signal event.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/statecmp.c">statecmp.c</a> demonstrates how to
use the drstatecmp instrumentation testing library.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/stl_test.cpp">stl_test.cpp</a> is provided
as an example client that uses C++ STL containers.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/syscall.c">syscall.c</a> demonstrates how to
use the system call events and API routines.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/tracedump.c">tracedump.c</a> is provided
as a standalone application that disassembles a trace dump in binary format
produced by the -tracedump_binary option.
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/wrap.c">wrap.c</a> demonstrates how to use
the drwrap extension (see \ref page_drwrap).
The sample <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/ssljack.c">ssljack.c</a> demonstrates how to
hook OpenSSL and GnuTLS functions, using the drwrap extension.
\section bt_examples Discussion of Selected Samples
********************
\subsection sec_ex1 Instruction Counting
We now illustrate how to use the above API to implement a simple
instrumentation client for counting the number of executed call and
return instructions in the input program. Full code for this example is
in the file <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/countcalls.c">countcalls.c</a>.
The client maintains a set of three counters, num_direct_calls,
num_indirect_calls, and num_returns, one for each of the three
types of instructions it counts during execution. It keeps both thread-private
and global versions of these counters. The client initializes
everything by supplying the following \p dr_client_main routine:
\code
DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    /* register events */
    dr_register_exit_event(event_exit);
    dr_register_thread_init_event(event_thread_init);
    dr_register_thread_exit_event(event_thread_exit);
    dr_register_bb_event(event_basic_block);
    /* make it easy to tell, by looking at log file, which client executed */
    dr_log(NULL, DR_LOG_ALL, 1, "Client 'countcalls' initializing\n");
}
\endcode
The client provides an event_exit routine that displays the final
values of the global counters as well as a thread_exit routine that
shows the counter totals on a per-thread basis.
The client keeps track of each thread's instruction counts separately.
To do this, it creates a data structure that will be separately allocated
for each thread:
\code
typedef struct {
    int num_direct_calls;
    int num_indirect_calls;
    int num_returns;
} per_thread_t;
\endcode
Now the thread hooks are used to initialize the data structure and to
display the thread-private totals:
\code
static void event_thread_init(void *drcontext)
{
    /* create an instance of our data structure for this thread */
    per_thread_t *data = (per_thread_t *)
        dr_thread_alloc(drcontext, sizeof(per_thread_t));
    /* store it in the slot provided in the drcontext */
    dr_set_tls_field(drcontext, data);
    data->num_direct_calls = 0;
    data->num_indirect_calls = 0;
    data->num_returns = 0;
    dr_log(drcontext, DR_LOG_ALL, 1, "countcalls: set up for thread "TIDFMT"\n",
           dr_get_thread_id(drcontext));
}
static void event_thread_exit(void *drcontext)
{
    per_thread_t *data = (per_thread_t *) dr_get_tls_field(drcontext);
    ... // string formatting and displaying
    /* clean up memory */
    dr_thread_free(drcontext, data, sizeof(per_thread_t));
}
\endcode
The real work is done in the basic block hook. We simply look for
the instructions we're interested in and insert an increment of the
appropriate thread-local and global counters, remembering to save the
flags, of course. This sample has separate paths for incrementing the
thread-private counts with shared vs. thread-private caches (see the
-thread_private option) to illustrate how the counter is addressed in
each case. Note that the shared path would also work with private
caches.
\code
static void
insert_counter_update(void *drcontext, instrlist_t *bb, instr_t *where, int offset)
{
    /* Since the inc instruction clobbers 5 of the arithmetic eflags,
     * we have to save them around the inc. We could be more efficient
     * by not bothering to save the overflow flag and constructing our
     * own sequence of instructions to save the other 5 flags (using
     * lahf) or by doing a liveness analysis on the flags and saving
     * only if live.
     */
    dr_save_arith_flags(drcontext, bb, where, SPILL_SLOT_1);
    /* Increment the global counter using the lock prefix to make it atomic
     * across threads. It would be cheaper to aggregate the thread counters
     * in the exit events, but this sample is intended to illustrate inserted
     * instrumentation.
     */
    instrlist_meta_preinsert(bb, where, LOCK(INSTR_CREATE_inc
        (drcontext, OPND_CREATE_ABSMEM(((byte *)&global_count) + offset, OPSZ_4))));
    /* Increment the thread private counter. */
    if (dr_using_all_private_caches()) {
        per_thread_t *data = (per_thread_t *) dr_get_tls_field(drcontext);
        /* private caches - we can use an absolute address: the field lives
         * at data + offset (data itself, not the address of the local pointer) */
        instrlist_meta_preinsert(bb, where, INSTR_CREATE_inc(drcontext,
            OPND_CREATE_ABSMEM(((byte *)data) + offset, OPSZ_4)));
    } else {
        /* shared caches - we must indirect via thread local storage */
        /* We spill xbx to use a scratch register (we could do a liveness
         * analysis to try and find a dead register to use). Note that xax
         * is currently holding the saved eflags. */
        dr_save_reg(drcontext, bb, where, REG_XBX, SPILL_SLOT_2);
        dr_insert_read_tls_field(drcontext, bb, where, REG_XBX);
        instrlist_meta_preinsert(bb, where,
            INSTR_CREATE_inc(drcontext, OPND_CREATE_MEM32(REG_XBX, offset)));
        dr_restore_reg(drcontext, bb, where, REG_XBX, SPILL_SLOT_2);
    }
    /* restore flags */
    dr_restore_arith_flags(drcontext, bb, where, SPILL_SLOT_1);
}
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
bool for_trace, bool translating)
{
    instr_t *instr, *next_instr;
    ... // some logging
    for (instr = instrlist_first(bb); instr != NULL; instr = next_instr) {
        /* grab next now so we don't go over instructions we insert */
        next_instr = instr_get_next(instr);
        /* instrument calls and returns -- ignore far calls/rets */
        if (instr_is_call_direct(instr)) {
            insert_counter_update(drcontext, bb, instr,
                                  offsetof(per_thread_t, num_direct_calls));
        } else if (instr_is_call_indirect(instr)) {
            insert_counter_update(drcontext, bb, instr,
                                  offsetof(per_thread_t, num_indirect_calls));
        } else if (instr_is_return(instr)) {
            insert_counter_update(drcontext, bb, instr,
                                  offsetof(per_thread_t, num_returns));
        }
    }
    ... // some logging
    return DR_EMIT_DEFAULT;
}
\endcode
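The byte-offset addressing used by \p insert_counter_update can be tried
outside DynamoRIO. The following is a minimal plain-C sketch, not code from
countcalls.c: the names \p counters_t, \p bump_counter and \p count_event are
hypothetical, and the increment happens at call time instead of being emitted
as an instruction. It assumes the global and per-thread copies share one
layout, which is why a single \p offsetof value selects the same field in both.

```c
#include <stddef.h>

// Hypothetical stand-in for per_thread_t; the global copy uses the same layout.
typedef struct {
    int num_direct_calls;
    int num_indirect_calls;
    int num_returns;
} counters_t;

// Increment the int field located `offset` bytes from the start of `base`.
static void
bump_counter(counters_t *base, size_t offset)
{
    int *field = (int *)((unsigned char *)base + offset);
    (*field)++;
}

// Bump the same field in the global copy and in one thread's copy.
static void
count_event(counters_t *global, counters_t *thread, size_t offset)
{
    bump_counter(global, offset);
    bump_counter(thread, offset);
}
```

A caller would pass, for example, <tt>offsetof(counters_t, num_returns)</tt>
as the offset, mirroring how the basic block hook chooses a counter.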
\par Building the Example
For general instructions on building a client, see \ref page_build_client.
To build the \p instrcalls.c client using CMake, if \p DYNAMORIO_HOME is
set to the base of the DynamoRIO release package:
\code
mkdir build
cd build
cmake -DDynamoRIO_DIR=$DYNAMORIO_HOME/cmake $DYNAMORIO_HOME/samples
make instrcalls
\endcode
To build 32-bit samples when using gcc with a default of 64-bit, use:
\code
mkdir build
cd build
CFLAGS=-m32 CXXFLAGS=-m32 cmake -DDynamoRIO_DIR=$DYNAMORIO_HOME/cmake $DYNAMORIO_HOME/samples
make instrcalls
\endcode
The result is a shared library instrcalls.dll or libinstrcalls.so. To
invoke the client library, follow the instructions under \ref page_deploy.
********************
\subsection sec_ex2 Instruction Profiling
The next example shows how to use the provided control flow
instrumentation routines, which allow more sophisticated profiling than
simply counting instructions. Full code for this example is
in the file <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/instrcalls.c">instrcalls.c</a>.
As in the previous example, the client is interested in direct and
indirect calls and returns. The client wants to analyze the target
address of each dynamic instance of a call or return. For our example,
we simply dump the data in text format to a separate file for each
thread. Since FILE cannot be exported from a DLL on Windows, we use the
DynamoRIO-provided file_t type that hides the distinction between FILE and
HANDLE to allow the same code to work on Linux and Windows.
We make use of the thread initialization and exit routines to
open and close the file. We store the file for a thread in the user
slot in the drcontext.
\code
static void event_thread_init(void *drcontext)
{
    /* we're going to dump our data to a per-thread file */
    file_t f;
    char logname[512];
    ... // filename generation
    f = dr_open_file(logname, DR_FILE_WRITE_OVERWRITE);
    DR_ASSERT(f != INVALID_FILE);
    /* store it in the slot provided in the drcontext */
    dr_set_tls_field(drcontext, (void *)(ptr_uint_t)f);
    ... // logging
}
static void event_thread_exit(void *drcontext)
{
    file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(drcontext);
    dr_close_file(f);
}
\endcode
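The casts through a pointer-sized integer are what let a small handle travel
in a <tt>void *</tt> slot. As a rough plain-C illustration (the type
\p handle_t and both helper names are hypothetical, and DynamoRIO's own
\p file_t and \p ptr_uint_t are not used here), the round trip looks like this:

```c
#include <stdint.h>

// Hypothetical handle type; DynamoRIO's file_t is platform-specific.
typedef int handle_t;

// Pack an integer handle into a pointer-sized slot.
static void *
handle_to_slot(handle_t h)
{
    return (void *)(intptr_t)h;
}

// Recover the handle from the slot.
static handle_t
slot_to_handle(void *slot)
{
    return (handle_t)(intptr_t)slot;
}
```

Going through an integer type as wide as a pointer avoids compiler warnings
about size-changing casts on 64-bit builds.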
The basic block hook inserts a call to a procedure for each
type of instruction, using the API-provided
dr_insert_call_instrumentation and dr_insert_mbr_instrumentation
routines, which insert calls to procedures with a certain signature.
\code
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
bool for_trace, bool translating)
{
    instr_t *instr, *next_instr;
    ... // logging
    for (instr = instrlist_first(bb); instr != NULL; instr = next_instr) {
        next_instr = instr_get_next(instr);
        if (!instr_opcode_valid(instr))
            continue;
        /* instrument calls and returns -- ignore far calls/rets */
        if (instr_is_call_direct(instr)) {
            dr_insert_call_instrumentation(drcontext, bb, instr, (app_pc)at_call);
        } else if (instr_is_call_indirect(instr)) {
            dr_insert_mbr_instrumentation(drcontext, bb, instr, (app_pc)at_call_ind,
                                          SPILL_SLOT_1);
        } else if (instr_is_return(instr)) {
            dr_insert_mbr_instrumentation(drcontext, bb, instr, (app_pc)at_return,
                                          SPILL_SLOT_1);
        }
    }
    return DR_EMIT_DEFAULT;
}
\endcode
These procedures look like this:
\code
static void
at_call(app_pc instr_addr, app_pc target_addr)
{
    file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(dr_get_current_drcontext());
    /* size and flags must be set before the query; we only need xsp */
    dr_mcontext_t mc = { sizeof(mc), DR_MC_CONTROL };
    dr_get_mcontext(dr_get_current_drcontext(), &mc);
    dr_fprintf(f, "CALL @ "PFX" to "PFX", TOS is "PFX"\n",
               instr_addr, target_addr, mc.xsp);
}
static void
at_call_ind(app_pc instr_addr, app_pc target_addr)
{
    file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(dr_get_current_drcontext());
    dr_fprintf(f, "CALL INDIRECT @ "PFX" to "PFX"\n", instr_addr, target_addr);
}
static void
at_return(app_pc instr_addr, app_pc target_addr)
{
    file_t f = (file_t)(ptr_uint_t) dr_get_tls_field(dr_get_current_drcontext());
    dr_fprintf(f, "RETURN @ "PFX" to "PFX"\n", instr_addr, target_addr);
}
\endcode
The address of the instruction and the address of its target are both
provided. These routines could perform some sort of analysis based on
these addresses. In our example we simply print out the data.
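As one possible form of such analysis, a routine could count how often each
target address is reached. The following plain-C sketch is an assumption, not
part of instrcalls.c: \p target_table_t, \p record_target and \p TABLE_SIZE
are hypothetical names, the table is a fixed-size open-addressing array, and
it is meant to be kept per thread so that no locking is needed.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64 // hypothetical capacity; must be a power of two

typedef struct {
    uintptr_t target; // 0 marks an empty slot, so targets are assumed nonzero
    unsigned count;
} target_entry_t;

typedef struct {
    target_entry_t slots[TABLE_SIZE];
} target_table_t;

// Record one transfer to `target`.
// Returns the updated count for that target, or 0 if the table is full.
static unsigned
record_target(target_table_t *t, uintptr_t target)
{
    size_t start = (size_t)(target >> 2) & (TABLE_SIZE - 1);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        target_entry_t *e = &t->slots[(start + i) & (TABLE_SIZE - 1)];
        if (e->target == 0)
            e->target = target; // claim the first empty slot on the probe path
        if (e->target == target)
            return ++e->count;
    }
    return 0;
}
```

A routine such as \p at_call could call \p record_target with its
\p target_addr argument cast to \p uintptr_t, and the thread exit event could
then dump the table instead of logging every transfer.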
********************
\subsection sec_ex3 Modifying Existing Instrumentation
In this example, we show how to update or replace existing
instrumentation after it executes. This ability is useful for clients
performing adaptive optimization. In this example, however, we are
interested in recording the direction of all conditional branches, but
wish to remove the overhead of instrumentation once we've gathered
that information. This code could form part of a dynamic CFG builder,
where we want to observe the control-flow edges that execute at
runtime, but remove the instrumentation after it executes.
While DynamoRIO supports direct fragment replacement, another method
for re-instrumentation is to flush the fragment from the code cache
and rebuild it in the basic block event callback. In other words, we
take the following approach:
-# In the basic block event callback, insert separate instrumentation
for the taken and fall-through edges.
-# When the basic block executes, note the direction taken and flush
the fragment from the code cache.
-# When the basic block event triggers again, insert instrumentation
only for the unseen edge. After both edges have triggered, remove
all instrumentation for the cbr.
We insert separate clean calls for the taken and fall-through cases.
In each clean call, we record the observed direction and immediately
flush the basic block using dr_flush_region(). Since that routine
removes the calling block, we redirect execution to the target or
fall-through address with dr_redirect_execution(). The file <a
href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/cbr.c">cbr.c</a> contains the full code for this
sample.
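The per-branch bookkeeping behind the three steps can be sketched in plain C.
This is a simplified model under stated assumptions rather than the code in
cbr.c: the flag names and the helpers \p edges_to_instrument and
\p record_edge are hypothetical, and the flush and redirect steps are only
indicated in comments.

```c
#include <stdbool.h>

// Hypothetical edge flags; the names and values are illustrative.
enum {
    EDGE_NONE = 0x0,
    EDGE_TAKEN = 0x1,
    EDGE_FALLTHROUGH = 0x2,
    EDGE_BOTH = EDGE_TAKEN | EDGE_FALLTHROUGH
};

// Which edges still need instrumentation when the block is built or rebuilt.
static unsigned
edges_to_instrument(unsigned seen)
{
    return EDGE_BOTH & ~seen;
}

// Called from the instrumentation on one edge: record the direction.
// Returns true if the edge is new, in which case a client would flush the
// block and redirect execution so the block is rebuilt without that edge.
static bool
record_edge(unsigned *seen, unsigned edge)
{
    bool is_new = (*seen & edge) == 0;
    *seen |= edge;
    return is_new;
}
```

Once \p edges_to_instrument returns \p EDGE_NONE for a branch, the rebuilt
block carries no instrumentation for it.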
********************
\subsection sec_ex4 Optimization
For the next example we consider a client application for a simple
optimization. The optimizer replaces every increment/decrement operation
with a corresponding add/subtract operation if running on a Pentium 4,
where the add/subtract is less expensive. For optimizations, we are
less concerned with covering all the code that is executed; on the
contrary, in order to amortize the optimization overhead, we only want
to apply the optimization to hot code. Thus, we apply the optimization
at the trace level rather than the basic block level. Full code for
this example is in the file <a
href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/inc2add.c">inc2add.c</a>.
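One subtlety of this transformation is that on x86 \p inc and \p dec leave the
carry flag (CF) unchanged, while \p add and \p sub write it. The replacement
therefore preserves behavior only where CF is overwritten before it is next
read. The following plain-C sketch models that check over a hypothetical,
simplified instruction summary (\p insn_flags_t and \p cf_dead_after are
illustrative names, not DynamoRIO API):

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical, simplified instruction summary: only CF usage is modeled.
typedef struct {
    bool reads_cf;
    bool writes_cf;
} insn_flags_t;

// Returns true if CF is overwritten before it is read in insns[start..n).
// Reaching the end without a write is treated as unknown, so not safe.
static bool
cf_dead_after(const insn_flags_t *insns, size_t n, size_t start)
{
    for (size_t i = start; i < n; i++) {
        if (insns[i].reads_cf)
            return false; // a read comes first: CF is live
        if (insns[i].writes_cf)
            return true; // a write comes first: clobbering CF is harmless
    }
    return false;
}
```

An optimizer would call this with \p start set to the index just past the
\p inc or \p dec and convert the instruction only when the result is true.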
********************
\subsection sec_ex5 Custom Tracing
This example demonstrates the custom tracing interface. It changes
DynamoRIO's tracing behavior to favor making traces that start at a call
and end right after a return. It demonstrates the use of both custom trace
API elements:
\code
int query_end_trace(void *drcontext, void *trace_tag, void *next_tag);
bool dr_mark_trace_head(void *drcontext, void *tag);
\endcode
Full code for this example is in the file <a
href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/inline.c">inline.c</a>.
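The policy described above can be modeled as a small decision function. This
is a simplified plain-C sketch, and the bookkeeping in inline.c may differ:
\p trace_action_t, \p trace_info_t and \p choose_trace_action are hypothetical
names standing in for the three answers an end-trace query can give (defer to
the default policy, end the trace now, or keep extending it).

```c
#include <stdbool.h>

// Hypothetical stand-ins for the three possible answers to an end-trace query.
typedef enum { TRACE_DEFAULT, TRACE_END_NOW, TRACE_CONTINUE } trace_action_t;

// Hypothetical per-trace bookkeeping kept by the client.
typedef struct {
    bool started_at_call;  // the trace head is the target of a call
    bool last_bb_returned; // the block just added ends in a return
} trace_info_t;

static trace_action_t
choose_trace_action(const trace_info_t *info)
{
    if (!info->started_at_call)
        return TRACE_DEFAULT; // not a trace this policy shapes
    if (info->last_bb_returned)
        return TRACE_END_NOW; // stop right after the return
    return TRACE_CONTINUE;    // keep extending through the callee
}
```

Marking call targets as trace heads is what makes traces start at a call in
the first place; the function above only covers where they end.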
********************
\subsection sec_ex6 Use of x87 Floating Point Operation in a Client
Because saving the x87 floating point state is very expensive, on x86 DynamoRIO
seeks to do so only on an as-needed basis. If a client wishes to use floating point
operations and is unsure whether its compiler will use x87 or not, or if it
wishes to use MMX registers, it must
save and restore the application's floating point state
around the usage. For an inserted clean call out of the code cache, this
can be conveniently done using dr_insert_clean_call() and passing true for
the save_fpstate parameter. It can also be done explicitly using these
routines:
\code
void proc_save_fpstate(byte *buf);
void proc_restore_fpstate(byte *buf);
\endcode
These routines must be used if x87 floating point operations are performed in
non-inserted-call locations, such as event callbacks. Note that there are
restrictions on how these methods may be called: see the documentation in
the header files for additional information. Note also that the floating
point state must be saved around calls to our provided printing routines
when they are used to print floats. However, it is not necessary to save
and restore the floating point state around floating point operations if
they are being used in the initialization or termination routines.
On ARM and AArch64 the SIMD/FP registers are always saved, so
proc_save_fpstate and proc_restore_fpstate are no-ops.
On x86, modern compilers typically do not use x87 operations, but to be
safe clients are still advised to either avoid floating-point operations
or use the preservation routines listed here.
This example client counts the number of basic blocks processed and keeps
statistics on their average size using floating point operations. Full code
for this example is in the file <a
href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/bbsize.c">bbsize.c</a>.
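The kind of floating point arithmetic involved can be illustrated with a
running mean. The sketch below is plain C under stated assumptions, not the
code in bbsize.c: \p bb_stats_t and \p add_bb_size are hypothetical names, and
the save and restore of the floating point state is only indicated by a
comment.

```c
// Hypothetical running statistics for basic block sizes.
typedef struct {
    unsigned long num_bbs;
    double ave_size;
} bb_stats_t;

// Fold one block size into the running mean without storing all samples.
static void
add_bb_size(bb_stats_t *s, unsigned size)
{
    // In a client's event callback on x86, this floating point update would
    // sit between proc_save_fpstate() and proc_restore_fpstate().
    s->num_bbs++;
    s->ave_size += ((double)size - s->ave_size) / (double)s->num_bbs;
}
```

Updating the mean incrementally keeps the state to two fields, which is
convenient when the update runs once per basic block.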
********************
\subsection sec_drstats Use of Custom Client Statistics with the Windows GUI
The new Windows GUI will display custom client statistics, if they are
placed in shared memory with a certain name. The sample <a
href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/stats.c">stats.c</a> gives code for the
protocol used in the form of a sample client that counts total
instructions, floating-point instructions, and system calls.
Note that the stats.c example client and the Windows GUI must both be run
within the same session in order for the statistics to be shared
properly. They can be modified to use a "Global" prefix instead of "Local"
for cross-session sharing, though this requires running with administrative
privileges.
********************
\subsection sec_ex8 Use of Standalone API
The binary tracedump reader also functions as an example of
\ref page_standalone: <a href="https://github.com/DynamoRIO/dynamorio/tree/master/api/samples/tracedump.c">tracedump.c</a>.
*/