i#4134 drbbdup: Avoid flags preservation for 2-case dispatch (#5323)

When there are just 2 drbbdup cases and one has an encoding of zero,
we can use a flags-free jump-if-register-is-zero for our dispatch,
avoiding flags preservation costs.

This also applies to x86, by switching to the xcx scratch register
and using JECXZ.  JECXZ is relatively slow on modern processors.  I
measured its performance, and whether it outperforms saving the flags
depends on the application.  I left it as the default in the hope
that it will help more often than not on larger clients and
applications, but we can remove it if that is not borne out in future
evaluations.
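The dispatch choice described above can be modeled in a few lines. This is a minimal sketch, not drbbdup's actual code: `can_use_zero_jump`, `dispatch_branch`, `ZERO_BRANCH`, and the `"aarch64"`/`"x86"` keys are hypothetical names. The per-architecture instruction (CBZ on AArch64, JECXZ on x86) is taken from this commit's description and listings.

```python
# Hypothetical model of the drbbdup 2-case dispatch choice; the names here
# are illustrative and are not part of the drbbdup API.

# Flags-free "jump if register is zero" instruction per architecture
# (CBZ as in the AArch64 listing, JECXZ via the xcx scratch register on x86).
ZERO_BRANCH = {"aarch64": "cbz", "x86": "jecxz"}


def can_use_zero_jump(encodings):
    """True when there are exactly two cases and one is encoded as zero."""
    return len(encodings) == 2 and 0 in encodings


def dispatch_branch(encodings, arch):
    """Return (branch form, whether arithmetic flags must be preserved)."""
    if can_use_zero_jump(encodings):
        # Branch directly on the register value: no compare, no flags touched.
        return ZERO_BRANCH[arch], False
    # General case: a compare writes the flags, so the application's flags
    # must be saved and restored around the dispatch.
    return "compare-and-branch", True


if __name__ == "__main__":
    print(dispatch_branch([1, 0], "aarch64"))  # ('cbz', False)
    print(dispatch_branch([1, 2], "x86"))      # ('compare-and-branch', True)
```

The `[1, 0]` input mirrors the new nonzero test's setup (default encoding 1, additional encoding 0); any case count other than two falls back to the flags-preserving compare.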

The existing no-encode-test meets the criteria and serves as one test.
A new test, `client.drbbdup-nonzero-test`, duplicates the no-encode-test
but sets the default encoding to 1 and the additional case's encoding
to 0 to exercise that path.

Before:
  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffff868340c0
   +0    m4 @0x0000fffd428950e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
   +4    m4 @0x0000fffd42894da0  d53b4200   mrs    %nzcv -> %x0
   +8    m4 @0x0000fffd42894cd8  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
   +12   m4 @0x0000fffd42894c58  d28c1000   movz   $0x6080 lsl $0x00 -> %x0
   +16   m4 @0x0000fffd42894bd8  f2a85000   movk   %x0 $0x4280 lsl $0x10 -> %x0
   +20   m4 @0x0000fffd42894b10  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +24   m4 @0x0000fffd42894a48  f9400000   ldr    (%x0)[8byte] -> %x0
   +28   m4 @0x0000fffd42894e20  f9400000   <label>
   +28   m4 @0x0000fffd42894980  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
   +32   m4 @0x0000fffd42894900  54000001   b.ne   @0x0000fffd42894fa0[8byte]
  --------------------------------------------------
After:
  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffffa53f20c0
   +0    m4 @0x0000fffd614530e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
   +4    m4 @0x0000fffd61452da0  d28a1000   movz   $0x5080 lsl $0x00 -> %x0
   +8    m4 @0x0000fffd61452cd8  f2ac2780   movk   %x0 $0x613c lsl $0x10 -> %x0
   +12   m4 @0x0000fffd61452c58  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +16   m4 @0x0000fffd61452bd8  f9400000   ldr    (%x0)[8byte] -> %x0
   +20   m4 @0x0000fffd61452e20  f9400000   <label>
   +20   m4 @0x0000fffd61452b10  b4000000   cbz    @0x0000fffd61452fa0[8byte] %x0
  --------------------------------------------------
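Comparing the branch conditions in the two listings shows why the shorter sequence is safe. This is an illustrative model, assuming the encoding loaded into `x0` is always one of the two case encodings, 0 or 1. Before, `subs` subtracts 1 from `x0` into `xzr` and `b.ne` branches when `x0 != 1`; after, `cbz` branches when `x0 == 0`. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # model x0 as a 64-bit register


def before_taken(x0):
    """Before: subs xzr, x0, #1 sets Z iff x0 == 1; b.ne branches when Z is clear."""
    return ((x0 - 1) & MASK64) != 0


def after_taken(x0):
    """After: cbz x0 branches when x0 == 0, without reading or writing flags."""
    return x0 == 0


if __name__ == "__main__":
    # With exactly two cases encoded as 0 and 1, both forms agree.
    for enc in (0, 1):
        assert before_taken(enc) == after_taken(enc)
    # A third encoding would break the equivalence, hence the 2-case limit.
    print(before_taken(2), after_taken(2))  # True False
```

The divergence at `x0 == 2` is why the flags-free form is limited to the two-case situation: with more cases, "not equal to this encoding" no longer implies "equal to zero".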

Issue: #4134
5 files changed
tree: b11ce77ef3b9fb078120edb50bb63d7d04d15132

DynamoRIO

About DynamoRIO

DynamoRIO is a runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO exports an interface for building dynamic tools for a wide variety of uses: program analysis and understanding, profiling, instrumentation, optimization, translation, etc. Unlike many dynamic tool systems, DynamoRIO is not limited to insertion of callouts/trampolines and allows arbitrary modifications to application instructions via a powerful IA-32/AMD64/ARM/AArch64 instruction manipulation library. DynamoRIO provides efficient, transparent, and comprehensive manipulation of unmodified applications running on stock operating systems (Windows, Linux, or Android) and commodity IA-32, AMD64, ARM, and AArch64 hardware. Mac OSX support is in progress.

Existing DynamoRIO-based tools

DynamoRIO is the basis for some well-known external tools:

Tools built on DynamoRIO and available in the release package include:

  • The memory debugging tool Dr. Memory
  • The tracing and analysis framework drmemtrace with multiple tools that operate on both online (with multi-process support) and offline instruction and memory address traces:
  • The legacy processor emulator drcpusim
  • The “strace for Windows” tool drstrace
  • The code coverage tool drcov
  • The library tracing tool drltrace
  • The memory address tracing tool memtrace (drmemtrace's offline traces are faster with more surrounding infrastructure, but this is a simpler starting point for customized memory address tracing)
  • The memory value tracing tool memval
  • The instruction tracing tool instrace (drmemtrace's offline traces are faster with more surrounding infrastructure, but this is a simpler starting point for customized instruction tracing)
  • The basic block tracing tool bbbuf
  • The instruction counting tool inscount
  • The dynamic fuzz testing tool Dr. Fuzz
  • The disassembly tool drdisas
  • And more, including opcode counts, branch instrumentation, etc.: see the API samples.

Building your own custom tools

DynamoRIO's powerful API abstracts away the details of the underlying infrastructure and allows the tool builder to concentrate on analyzing or modifying the application's runtime code stream. API documentation is included in the release package and can also be browsed online. Slides from our past tutorials are also available.

Downloading DynamoRIO

DynamoRIO is available free of charge as a binary package for both Windows and Linux. DynamoRIO's source code is available primarily under a BSD license.

Obtaining Help

Use the discussion list to ask questions.

To report a bug, use the issue tracker.

See also the DynamoRIO home page: http://dynamorio.org/