i#7860 load bal: Add load balancing to drmemtrace analyzer (#7878)

When running drmemtrace analyzers with dynamic scheduling (i.e.,
core-sharded with live scheduling instead of replay), we want our worker
threads to have relatively even "activity" as seen by a simulator for a
reasonable final virtual schedule. Atomic activity counts are
periodically updated and examined to accomplish this. If a worker's
activity reaches a specified ratio versus the slowest worker, it sleeps
until it is under the ratio. A new CLI flag --max_load_imbalance sets
the max ratio; the default is 2.5.
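
As a rough illustration of the ratio check described above (a minimal
sketch, not the code in this commit: the name should_wait(), the
per-worker counter layout, and the strict comparison are all assumptions
made for the example):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper, not the DynamoRIO implementation.
// Returns true when worker `self` is further ahead of the least-active
// worker than `max_ratio` allows, in which case the caller would sleep and
// check again later.
bool
should_wait(const std::vector<std::atomic<uint64_t>> &activity, size_t self,
            double max_ratio)
{
    assert(max_ratio >= 1.0 && self < activity.size());
    // Find the smallest activity count across all workers.
    uint64_t slowest = std::numeric_limits<uint64_t>::max();
    for (const auto &count : activity) {
        uint64_t val = count.load(std::memory_order_relaxed);
        if (val < slowest)
            slowest = val;
    }
    uint64_t mine = activity[self].load(std::memory_order_relaxed);
    // Assumption: a strict comparison, so that the slowest worker itself
    // never waits, even at a ratio of 1.0 or when its count is still 0.
    return static_cast<double>(mine) > max_ratio * static_cast<double>(slowest);
}
```

In this sketch each worker would bump its own counter as it processes
records and call should_wait() at a fixed cadence, sleeping briefly
whenever it returns true.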

Adds a unit test. It does not require that any waiting occur, as such a
check would likely be flaky. Confirmed manually that the 3 fast workers
do wait for the slow worker #0, which ends up with fewer instructions
but within the ratio (there are no idles in this test):
```
12: [analyzer] Worker 2 @3000 waiting for slowest 3 @1000
12: [analyzer] Worker 3 @13000 waiting for slowest 0 @1000
12: [analyzer] Worker 1 @14000 waiting for slowest 0 @1000
12: [analyzer] Worker 0 waited 0 times for load balancing
12: [analyzer] Worker 1 waited 71 times for load balancing
12: [analyzer] Worker 3 waited 71 times for load balancing
12: [analyzer] Worker 2 waited 71 times for load balancing
12: shard 0 saw 215000 instructions
12: shard 1 saw 530000 instructions
12: shard 2 saw 535000 instructions
12: shard 3 saw 530000 instructions
```

Tested on larger traces on 80 cores with a live max of 60 with target
ratios of 2.0, 2.5, and 3.0, each of which was hit precisely in two
consecutive runs with the 100K check cadence settled on in the code here
(auto-raising to 1M or higher reduces accuracy, with some ratios
climbing too high). The check_load_balance() function shows up in some
profiles as high as 3.3% but seems worth the cost. With targets below
1.5, the target is hard to reach and the overhead goes up:
check_load_balance() is 4.6% in a 1.25-target run and 15% in a 1.0 run
in these experiments, with the ratio not dropping under 1.5. So 1.5 may
be a practical limit for now, as noted in the option description, which
should be fine for initial use cases.

Fixes #7860
6 files changed
tree: 93153742de79d9e4b16c51f0042aa7db80879067

DynamoRIO

About DynamoRIO

DynamoRIO is a runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO exports an interface for building dynamic tools for a wide variety of uses: program analysis and understanding, profiling, instrumentation, optimization, translation, etc. Unlike many dynamic tool systems, DynamoRIO is not limited to insertion of callouts/trampolines and allows arbitrary modifications to application instructions via a powerful IA-32/AMD64/ARM/AArch64 instruction manipulation library. DynamoRIO provides efficient, transparent, and comprehensive manipulation of unmodified applications running on stock operating systems (Windows, Linux, or Android) and commodity IA-32, AMD64, ARM, and AArch64 hardware. Mac OSX support is in progress.

Existing DynamoRIO-based tools

DynamoRIO is the basis for some well-known external tools:

Tools built on DynamoRIO and available in the release package include:

  • The memory debugging tool Dr. Memory
  • The tracing and analysis framework drmemtrace with multiple tools that operate on both online (with multi-process support) and offline instruction and memory address traces:
  • The legacy processor emulator drcpusim
  • The “strace for Windows” tool drstrace
  • The code coverage tool drcov
  • The library tracing tool drltrace
  • The memory address tracing tool memtrace (drmemtrace's offline traces are faster with more surrounding infrastructure, but this is a simpler starting point for customized memory address tracing)
  • The memory value tracing tool memval
  • The instruction tracing tool instrace (drmemtrace's offline traces are faster with more surrounding infrastructure, but this is a simpler starting point for customized instruction tracing)
  • The basic block tracing tool bbbuf
  • The instruction counting tool inscount
  • The dynamic fuzz testing tool Dr. Fuzz
  • The disassembly tool drdisas
  • And more, including opcode counts, branch instrumentation, etc.: see API samples

Building your own custom tools

DynamoRIO’s powerful API abstracts away the details of the underlying infrastructure and allows the tool builder to concentrate on analyzing or modifying the application’s runtime code stream. API documentation is included in the release package and can also be browsed online. Slides from our past tutorials are also available.

Downloading DynamoRIO

DynamoRIO is available free of charge as a binary package for both Windows and Linux. DynamoRIO's source code is available primarily under a BSD license.

Obtaining Help

Use the discussion list to ask questions.

To report a bug, use the issue tracker.

See also the DynamoRIO home page: http://dynamorio.org/