Analyze is a command-line tool for analyzing ApiTrace profiles collected with Harvest.
Profiles are generated by running an apitrace command, such as glretrace, with a trace file on a target device. Analyze can parse profiles that contain GPU and CPU timing information. To produce such files, you must run glretrace or eglretrace with options --pcpu and --pgpu. Most of the time, you'll also want to use option --min-cpu-time=0 to ensure that the profile includes all calls. Without it, only calls taking 1 microsecond or more will be captured.
Example:
glretrace --pcpu --pgpu --min-cpu-time=0 traces_linux_10181_payday2_release.trace > profile_data.prof
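A freshly generated profile can be sanity-checked from the shell before loading it. The sketch below assumes the profile uses upstream apitrace's plain-text layout, in which each call record is a line starting with the word call and each frame is closed by a frame_end line. That layout and the helper name profile_summary are assumptions made for illustration; they are not documented by Analyze.

```shell
# Hypothetical helper: count call records and frames in a profile.
# Assumes apitrace's text profile layout ("call ..." and "frame_end" lines).
profile_summary() {
  awk '
    $1 == "call"      { calls++ }
    $1 == "frame_end" { frames++ }
    END { printf "%d calls, %d frames\n", calls, frames }
  ' "$1"
}
```

Running profile_summary profile_data.prof prints one line with both counts. A profile with zero calls usually means the profiling options were left out of the retrace command.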
Beware! When generating profiles by running traces in a Virgl environment, generating GPU timing can significantly skew the CPU timing. In such cases, it is better to capture CPU and GPU timings in separate profiles. (For the curious mind, this is because (e)glretrace inserts query operations around each call. Round-tripping these query ops through Virgl adds to the CPU time.)
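One way to capture the two timings separately is to run the trace twice, once with --pcpu only and once with --pgpu only. The following is a minimal sketch, not a prescribed workflow: the helper name capture_split_profiles, the RETRACE variable, and the output file names are all hypothetical, and passing --min-cpu-time=0 to the GPU-only run is an assumption made for consistency.

```shell
# Hypothetical helper: capture CPU and GPU timings in two separate runs.
# Set RETRACE=eglretrace to use eglretrace instead of glretrace.
RETRACE="${RETRACE:-glretrace}"

capture_split_profiles() {
  trace="$1"    # path to the .trace file
  prefix="$2"   # prefix for the generated profile files (illustrative)

  # CPU-only pass: without --pgpu, the GPU query ops described above
  # should not be issued, so they cannot inflate the CPU times.
  "$RETRACE" --pcpu --min-cpu-time=0 "$trace" > "${prefix}_cpu.prof" || return 1

  # GPU-only pass.
  "$RETRACE" --pgpu --min-cpu-time=0 "$trace" > "${prefix}_gpu.prof" || return 1
}
```

For example, capture_split_profiles traces_linux_10181_payday2_release.trace payday2 would write payday2_cpu.prof and payday2_gpu.prof.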
Once you have a profile, run analyze as follows:
> analyze <path to profile_data.prof>
Profiles can be very large and it might take a few minutes for analyze to load them. Once that is done, analyze enters console mode:
Reading profile_data.txt....
6222 frames read from profile_data.txt
->
To get started, let's type a command to show the 5 most expensive frames by average CPU time:
-> show-frames n=5 s=bycpuavg
Profile: trime_trace_virgl.txt
frame    num calls    GPU total    CPU total
---------------------------------------------------------------------
3        64629        353.7 mS     3.6 S
1119     18811        331.3 mS     513.4 mS
912      35751        754.5 uS     348.5 mS
5986     9316         4.7 mS       301.8 mS
5861     7192         3.6 mS       243.9 mS
->
A few more useful tidbits about the console:
Use help to get basic help, and help <command> to get more detailed help for that command.
Use quit to exit analyze. (Command history is preserved across runs.)

Analyze is even more useful when processing two related profiles together. Two profiles are "related" if they were generated from the same trace, albeit on different devices. Run analyze with the two profiles as follows:
./bin/analyze <profile_data1.prof> <profile_data2.prof>
After loading the two profiles, analyze will check that they are compatible. (It does so by ensuring that matching calls in the two profiles call into the same GL function.)
Here's a sample command that works on two profiles simultaneously:
-> show-frame-details 1646 gt=0 ct=500000
Frame details for frames #1646 to 1646
                                      GPU       CPU       GPU 1   CPU 1   GPU       CPU       GPU 2   CPU 2
frame  call name            call #    Prof 1    Prof 1    %       %       Prof 2    Prof 2    %       %
-----------------------------------------------------------------------------------------------------------
1646   glCompileShader      10442713  0.0 nS    1.8 mS    0.0%    3.8%    0.0 nS    1.2 mS    0.0%    0.9%
1646   glLinkProgram        10442733  0.0 nS    7.5 mS    0.0%    15.8%   0.0 nS    20.9 mS   0.0%    14.9%
1646   glDrawRangeElements  10442826  30.8 uS   733.6 uS  0.8%    1.5%    62.2 uS   38.1 uS   0.4%    0.0%
1646   glCompileShader      10442841  0.0 nS    1.7 mS    0.0%    3.6%    0.0 nS    1.1 mS    0.0%    0.8%
1646   glCompileShader      10442845  0.0 nS    4.2 mS    0.0%    8.9%    0.0 nS    24.6 uS   0.0%    0.0%
1646   glLinkProgram        10442866  0.0 nS    7.4 mS    0.0%    15.5%   0.0 nS    25.8 mS   0.0%    18.4%
1646   glDrawRangeElements  10442953  29.4 uS   772.0 uS  0.8%    1.6%    64.6 uS   31.5 uS   0.4%    0.0%
1646   glFlush              10443734  0.0 nS    21.2 mS   0.0%    44.6%   0.0 nS    125.2 uS  0.0%    0.1%
1 frames out of 1 shown, or 100.0%
When analyzing two related profiles, it is even more important to use option --min-cpu-time=0 when generating the profiles with glretrace. That is because, without it, glretrace will filter out different calls on each platform. You will likely end up with non-matching subsets of GL calls in each profile. They will still load successfully in analyze, but they will be harder to compare accurately.
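Before loading two profiles, it can be worth checking from the shell that they contain the same calls. The sketch below compares the call number and function name of every call record in both files. It assumes upstream apitrace's plain-text profile layout, in which a call record starts with the word call, followed by the call number, with the function name as the last field. The helper names profile_calls and profiles_match are hypothetical, and this check is independent of the compatibility check that analyze performs itself.

```shell
# Hypothetical helper: print "<call number> <function name>" for each call record.
# Assumes apitrace's text profile layout (function name is the last field).
profile_calls() {
  awk '$1 == "call" { print $2, $NF }' "$1"
}

# Hypothetical helper: succeed only if both profiles list the same calls
# in the same order. Timing values are ignored.
profiles_match() {
  pm_left=$(mktemp)
  pm_right=$(mktemp)
  profile_calls "$1" > "$pm_left"
  profile_calls "$2" > "$pm_right"
  if cmp -s "$pm_left" "$pm_right"; then pm_status=0; else pm_status=1; fi
  rm -f "$pm_left" "$pm_right"
  return "$pm_status"
}
```

A typical use would be profiles_match profile_data1.prof profile_data2.prof || echo "profiles contain different calls", which flags profiles that were probably generated without --min-cpu-time=0.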
The following links show examples of Analyze being used to investigate actual performance issues in graphics applications.