commit	6d1912f6f6655b9185aa6fc73243819c8e905cac
author	Derek Bruening <bruening@google.com>	Thu Dec 14 22:35:23 2023
committer	GitHub <noreply@github.com>	Thu Dec 14 22:35:23 2023
tree	23836198bea6d04381be6cd90fce533de37beda5
parent	dede2bae2b43f1ce85747cbf7ddfe6efc919d8b8

i#6471 idle: Add better idle time modeling (#6505)

Changes instruction quantum idle handling to use wall-clock time
instead of a counter decremented in queue pops. The counter was skewed
unfairly by the rate of queue queries. As a result, all quantum types
now use a unified time-based approach for blocked time.
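
As a rough illustration of why a time-based check is fairer than a
per-query counter, here is a minimal sketch. The type and function names
are hypothetical and are not taken from the scheduler's source; the
caller is assumed to supply the current time in the same units as the
stored values.

```cpp
#include <cstdint>

// Hypothetical record for an input that is waiting to be rescheduled.
struct blocked_input_t {
    uint64_t blocked_start; // Time at which the input began waiting.
    uint64_t blocked_time;  // How long it must wait, in the same units.
};

// Counter-style check (the approach being replaced): every query burns one
// unit, so an input unblocks sooner when the queue is polled more often.
inline bool
poll_counter(uint64_t &remaining_queries)
{
    if (remaining_queries > 0)
        --remaining_queries;
    return remaining_queries == 0;
}

// Time-style check: the answer depends only on how much time has passed,
// not on how many times the queue was queried.
inline bool
is_unblocked(const blocked_input_t &input, uint64_t now)
{
    return now >= input.blocked_start &&
        now - input.blocked_start >= input.blocked_time;
}
```

Because is_unblocked() takes the time as a parameter, a test can pass
fixed values instead of reading a real clock.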

Changes block_time_scale to remove the division by the latency
threshold. The scale is now multiplied directly by the latency to yield
the blocked time units. The default scale is set to 1000, which
matches the wall-clock time to process an instruction by the
schedule_stats tool (about 1 instruction per microsecond) and which
should be a rough match for many simulators passing one nanosecond cycle
per instruction as the time unit.

Adds a new scheduler option block_time_max and drcachesim option
-sched_block_max which caps the blocked time for a syscall (default 25
seconds) to avoid outliers being scaled to extreme amounts of time.
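
The scaling and cap described above can be sketched as follows. This is
an illustration only: scaled_block_time() is a hypothetical helper, not
the scheduler's code, and it assumes the cap is applied to the value
after scaling and that the latency and the cap share consistent units.

```cpp
#include <cstdint>

// Hypothetical helper: converts a syscall latency from the trace into
// blocked-time units by direct multiplication, then applies the cap.
inline uint64_t
scaled_block_time(uint64_t latency, double block_time_scale, uint64_t block_time_max)
{
    double scaled = static_cast<double>(latency) * block_time_scale;
    // Assumption: the cap limits the scaled result, so an outlier latency
    // cannot turn into an extreme amount of blocked time.
    if (scaled >= static_cast<double>(block_time_max))
        return block_time_max;
    return static_cast<uint64_t>(scaled);
}
```

With a scale of 1000 and a cap of 25000000 units, a latency of 10 gives
10000 blocked-time units, while a latency of 1000000 is clamped to the
cap.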

Adds a heartbeat of queue lookups, currently only in debug build, to
help understand behavior over time.

Sets the input ordinal to invalid while idle. This also causes
schedule_stats to consider a transition from idle to a valid input to be
a context switch, which matches how the Linux kernel counts switches.
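
To show the effect on switch counting, here is a small sketch. The
sentinel value and the counting rule are assumptions made for
illustration: the sketch counts a switch whenever a record's input is
valid and differs from the previous record's input, which may not match
schedule_stats exactly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for "no input", used while a core is idle.
constexpr int64_t INVALID_INPUT_ORDINAL = -1;

// Counts transitions into a valid input that differs from the previous
// record's input. The very first record is not counted as a switch.
inline int
count_context_switches(const std::vector<int64_t> &ordinals)
{
    int switches = 0;
    for (std::size_t i = 1; i < ordinals.size(); ++i) {
        if (ordinals[i] != INVALID_INPUT_ORDINAL && ordinals[i] != ordinals[i - 1])
            ++switches;
    }
    return switches;
}
```

Under this rule, the sequence 0, 0, idle, idle, 0, 1 yields two
switches when idle records carry the invalid ordinal, but only one if
the idle records kept reporting input 0.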

Augments schedule_stats with a wall-clock measure of cpu and idle time
for a much fairer %cpu metric than the previous one based purely on
record counts, as schedule_stats processes an instruction much more
quickly than an idle record. Adds two %cpu metrics: one that does not
include idle time past the final instruction (for skewed finishes across
cores) and one that includes idle time until all cpus are done.
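
The difference between the two %cpu metrics can be sketched as below.
The struct and function names are hypothetical, and the split of idle
time into "before" and "after" the final instruction is an assumption
about how such a measurement might be organized, not a description of
the tool's internals.

```cpp
#include <cstdint>

// Hypothetical per-core wall-clock totals, all in the same time units.
struct core_times_t {
    uint64_t cpu;           // Time spent processing instructions.
    uint64_t idle;          // Idle time before the core's final instruction.
    uint64_t trailing_idle; // Idle time after it, waiting for other cores.
};

// %cpu ignoring idle time past the final instruction.
inline double
cpu_pct_excluding_tail(const core_times_t &t)
{
    uint64_t total = t.cpu + t.idle;
    return total == 0 ? 0.0
                      : 100.0 * static_cast<double>(t.cpu) / static_cast<double>(total);
}

// %cpu counting idle time until all cores are done.
inline double
cpu_pct_including_tail(const core_times_t &t)
{
    uint64_t total = t.cpu + t.idle + t.trailing_idle;
    return total == 0 ? 0.0
                      : 100.0 * static_cast<double>(t.cpu) / static_cast<double>(total);
}
```

For a core with 75 units of cpu time, 25 of idle time, and 100 of
trailing idle time, the first metric is 75% and the second is 37.5%,
which shows how a skewed finish across cores lowers only the second
number.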

Increases the default -schedule_stats_print_every to 500K to avoid
extremely long strings for larger workloads.

Updates the scheduler unit tests for the timing changes, changing the
ones testing idle time to use a deterministic mock time quantum to avoid
wall-clock time flakiness.

Tested on large traces on dozens of cores with known idle
characteristics. By tweaking the parameters I was able to get
reasonably representative idle times.

Issue: #6471

10 files changed

README.md

DynamoRIO

About DynamoRIO

DynamoRIO is a runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO exports an interface for building dynamic tools for a wide variety of uses: program analysis and understanding, profiling, instrumentation, optimization, translation, etc. Unlike many dynamic tool systems, DynamoRIO is not limited to insertion of callouts/trampolines and allows arbitrary modifications to application instructions via a powerful IA-32/AMD64/ARM/AArch64 instruction manipulation library. DynamoRIO provides efficient, transparent, and comprehensive manipulation of unmodified applications running on stock operating systems (Windows, Linux, or Android) and commodity IA-32, AMD64, ARM, and AArch64 hardware. Mac OSX support is in progress.

Existing DynamoRIO-based tools

DynamoRIO is the basis for some well-known external tools:

The Arm Instruction Emulator (ArmIE)
WinAFL, the Windows fuzzing tool, as an instrumentation and code coverage engine
The fine-grained profiler for ARM DrCCTProf
The portable and efficient framework for fine-grained value profilers VClinic

Tools built on DynamoRIO and available in the release package include:

The memory debugging tool Dr. Memory
The tracing and analysis framework drmemtrace with multiple tools that operate on both online (with multi-process support) and offline instruction and memory address traces:
- The cache simulator drcachesim
- TLB simulation
- Reuse distance
- Reuse time
- Opcode mix
- Function call tracing
The legacy processor emulator drcpusim
The “strace for Windows” tool drstrace
The code coverage tool drcov
The library tracing tool drltrace
The memory address tracing tool memtrace (drmemtrace's offline traces are faster with more surrounding infrastructure, but this is a simpler starting point for customized memory address tracing)
The memory value tracing tool memval
The instruction tracing tool instrace (drmemtrace's offline traces are faster with more surrounding infrastructure, but this is a simpler starting point for customized instruction tracing)
The basic block tracing tool bbbuf
The instruction counting tool inscount
The dynamic fuzz testing tool Dr. Fuzz
The disassembly tool drdisas
And more, including opcode counts, branch instrumentation, etc.: see API samples

Building your own custom tools

DynamoRIO’s powerful API abstracts away the details of the underlying infrastructure and allows the tool builder to concentrate on analyzing or modifying the application’s runtime code stream. API documentation is included in the release package and can also be browsed online. Slides from our past tutorials are also available.

Downloading DynamoRIO

DynamoRIO is available free of charge as a binary package for both Windows and Linux. DynamoRIO's source code is available primarily under a BSD license.

Obtaining Help

Use the discussion list to ask questions.

To report a bug, use the issue tracker.