commit | 6d1912f6f6655b9185aa6fc73243819c8e905cac | |
---|---|---|
author | Derek Bruening <bruening@google.com> | Thu Dec 14 22:35:23 2023 |
committer | GitHub <noreply@github.com> | Thu Dec 14 22:35:23 2023 |
tree | 23836198bea6d04381be6cd90fce533de37beda5 | |
parent | dede2bae2b43f1ce85747cbf7ddfe6efc919d8b8 | |
i#6471 idle: Add better idle time modeling (#6505)

Changes instruction quantum idle handling to use wall-clock time, instead of a counter decremented in queue pops. The counter was skewed unfairly by the rate of queue queries. This results in all quanta using a unified time-based approach for blocked time.

Changes block_time_scale to remove the division by the latency threshold. Now the scale is directly multiplied by the latency to result in the blocked time units. The default scale is set to 1000 which matches the wall-clock time to process an instruction by the schedule_stats tool (about 1 instruction per microsecond) and which should be a rough match for many simulators passing one nanosecond cycle per instruction as the time unit.

Adds a new scheduler option block_time_max and drcachesim option -sched_block_max which caps the blocked time for a syscall (default 25 seconds) to avoid outliers being scaled to extreme amounts of time.

Adds a heartbeat of queue lookups, currently only in debug build, to help understand behavior over time.

Sets the input ordinal to invalid while idle. This also causes schedule_stats to consider a transition from idle to a valid input to be a context switch, which matches how the Linux kernel counts switches.

Augments schedule_stats with a wall-clock measure of cpu and idle time for a much fairer %cpu metric than the previous one based purely on record counts, as schedule_stats processes an instruction much more quickly than an idle record. Adds two %cpu metrics: one that does not include idle time past the final instruction (for skewed finishes across cores) and one that includes idle time until all cpus are done.

Increases the default -schedule_stats_print_every to 500K to avoid extremely long strings for larger workloads.

Updates the scheduler unit tests for the timing changes, changing the ones testing idle time to use a deterministic mock time quantum to avoid wall-clock time flakiness.

Tested on large traces on dozens of cores with known idle characteristics where by tweaking the parameters I was able to get reasonably representative idle times. Issue: #6471
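
The blocked-time rule described in the commit message (latency multiplied directly by the scale, then capped) can be illustrated with a minimal sketch. This is not the scheduler's actual code: the function name `scaled_blocked_time` and its parameters are hypothetical, and the sketch assumes the cap is applied to the scaled value and expressed in the same units as the result. The real scheduler may apply the cap differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper illustrating the rule from the commit message;
// it is NOT taken from the DynamoRIO sources.
//
// latency:          observed syscall latency from the trace.
// block_time_scale: multiplier applied directly to the latency
//                   (the commit sets the default to 1000).
// block_time_max:   upper bound on the result. ASSUMPTION: the cap is in
//                   the same units as the scaled value and is applied
//                   after scaling.
inline uint64_t
scaled_blocked_time(uint64_t latency, double block_time_scale, uint64_t block_time_max)
{
    assert(block_time_scale >= 0.0);
    // No division by a latency threshold: the scale multiplies the latency.
    double scaled = static_cast<double>(latency) * block_time_scale;
    // Cap outliers so a single very long syscall cannot dominate idle time.
    double capped = std::min(scaled, static_cast<double>(block_time_max));
    return static_cast<uint64_t>(capped);
}
```

Under these assumptions, a latency of 100 with a scale of 1000 yields 100000 blocked-time units, while a latency of 50000 with a cap of 25000000 is clamped to the cap rather than scaled to 50000000.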
DynamoRIO is a runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO exports an interface for building dynamic tools for a wide variety of uses: program analysis and understanding, profiling, instrumentation, optimization, translation, etc. Unlike many dynamic tool systems, DynamoRIO is not limited to insertion of callouts/trampolines and allows arbitrary modifications to application instructions via a powerful IA-32/AMD64/ARM/AArch64 instruction manipulation library. DynamoRIO provides efficient, transparent, and comprehensive manipulation of unmodified applications running on stock operating systems (Windows, Linux, or Android) and commodity IA-32, AMD64, ARM, and AArch64 hardware. Mac OSX support is in progress.
DynamoRIO is the basis for some well-known external tools:
Tools built on DynamoRIO and available in the release package include:
DynamoRIO's powerful API abstracts away the details of the underlying infrastructure and allows the tool builder to concentrate on analyzing or modifying the application's runtime code stream. API documentation is included in the release package and can also be browsed online. Slides from our past tutorials are also available.
DynamoRIO is available free of charge as a binary package for both Windows and Linux. DynamoRIO's source code is available primarily under a BSD license.
Use the discussion list to ask questions.
To report a bug, use the issue tracker.
See also the DynamoRIO home page: http://dynamorio.org/