src/trace_profiling/README.md

Trace Profiling Tools

Analyze

Analyze is a command-line tool for analyzing profiles produced by apitrace. It is used to analyze a single profile or pairs of profiles produced by running apitrace on the same trace on different platforms.

Producing profiles

Profiles are generated by running the apitrace command glretrace on a trace file. Analyze can parse profile info with GPU and CPU timing information. To produce such files, you must run glretrace with options --pcpu and --pgpu. Most of the time, you'll also want to use option --min-cpu-time=0 to ensure that the profile includes all calls. Without it, only calls taking 1 microsecond or more will be captured.

Example:

glretrace --pcpu --pgpu --min-cpu-time=0 traces_linux_10181_payday2_release.trace > profile_data.txt

Beware! When generating profiles by running traces in a Virgl environment, generating GPU timing can significantly skew the CPU timing. In such cases, it is better to capture CPU and GPU timings in separate profiles. (For the curious mind, this is because glretrace inserts query operations around each call. Round-tripping these query ops through Virgl adds to the CPU time.)
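
The separate-profile approach can be sketched as a small Python helper that builds one glretrace command line per timing source. This is only an illustration: the helper name glretrace_command and the output file names are made up here, while the flags are the ones described above. The sketch prints the commands rather than running them.

```python
import shlex

def glretrace_command(trace_path, timing):
    """Build a glretrace argv that captures only CPU or only GPU timing."""
    flags = {"cpu": ["--pcpu"], "gpu": ["--pgpu"]}
    if timing not in flags:
        raise ValueError("timing must be 'cpu' or 'gpu'")
    # --min-cpu-time=0 keeps every call in the profile.
    return ["glretrace", *flags[timing], "--min-cpu-time=0", trace_path]

trace = "traces_linux_10181_payday2_release.trace"
for timing in ("cpu", "gpu"):
    argv = glretrace_command(trace, timing)
    # Hypothetical output names; glretrace writes the profile to stdout.
    print(shlex.join(argv), ">", f"profile_{timing}.txt")
```

The two resulting files can then be combined with the merge tool described under Merging Profiles.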

Running Analyze

Once you have a profile and you've successfully built analyze, you're ready to analyze your first profile. Run analyze from within the bin directory as follows:

./bin/analyze <path to profile_data.txt>

Profiles can be very large and it might take a few minutes to load the profile. Once that is done, analyze will enter console mode:

Reading profile_data.txt....
6222 frames read from profile_data.txt
->

To get started, let's type a command to show the 5 most expensive frames by average CPU time:

-> show-frames n=5 s=bycpuavg
Profile: trime_trace_virgl.txt
  frame num   calls           GPU total            CPU total
---------------------------------------------------------------------
          3   64629             353.7 mS                3.6 S
       1119   18811             331.3 mS             513.4 mS
        912   35751             754.5 uS             348.5 mS
       5986    9316               4.7 mS             301.8 mS
       5861    7192               3.6 mS             243.9 mS
->

A few more useful tidbits about the console:

Type help to get basic help and help <command> to get more detailed help for that command.
The console supports history. Use the up/down arrows to move back and forth between older commands.
The console has limited support for tab completion.
Type quit to exit analyze. (Command history is preserved across runs.)

Analyzing dual profiles

Analyze is even more useful when processing two related profiles together. Two profiles are “related” if they were generated from the same trace, albeit on different platforms. To analyze two related profiles together, run analyze with both profiles as follows:

./bin/analyze <profile_data1.txt> <profile_data2.txt>

After loading the two profiles, analyze will check that they are compatible. (It does so by ensuring that matching calls in the two profiles call into the same GL function.)
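
The idea behind that check can be illustrated in Python. This is a sketch of the concept, not analyze's actual code: it assumes each profile has already been parsed into a dict mapping call number to GL function name, and the function name find_mismatches is hypothetical.

```python
def find_mismatches(calls1, calls2):
    """Return call numbers present in both profiles whose GL function differs."""
    shared = calls1.keys() & calls2.keys()
    return sorted(n for n in shared if calls1[n] != calls2[n])

# Toy data; call numbers and names are borrowed from the sample output in this README.
profile1 = {10442713: "glCompileShader", 10442733: "glLinkProgram"}
profile2 = {10442713: "glCompileShader", 10442733: "glLinkProgram", 10443734: "glFlush"}

# An empty list means every shared call maps to the same GL function.
print(find_mismatches(profile1, profile2))
```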

Here's a sample command that works on two profiles simultaneously:

-> show-frame-details 1646 gt=0 ct=500000
Frame details for frames #1646 to 1646
                                                   GPU       CPU     GPU 1  CPU 1         GPU       CPU      GPU 2   CPU 2
 frame                      call name    call #    Prof 1    Prof 1    %      %           Prof 2    Prof 2     %       %
----------------------------------------------------------------------------------------------------------------------------
  1646                glCompileShader  10442713    0.0 nS    1.8 mS   0.0%   3.8%         0.0 nS    1.2 mS   0.0%   0.9%
  1646                  glLinkProgram  10442733    0.0 nS    7.5 mS   0.0%  15.8%         0.0 nS   20.9 mS   0.0%  14.9%
  1646            glDrawRangeElements  10442826   30.8 uS  733.6 uS   0.8%   1.5%        62.2 uS   38.1 uS   0.4%   0.0%
  1646                glCompileShader  10442841    0.0 nS    1.7 mS   0.0%   3.6%         0.0 nS    1.1 mS   0.0%   0.8%
  1646                glCompileShader  10442845    0.0 nS    4.2 mS   0.0%   8.9%         0.0 nS   24.6 uS   0.0%   0.0%
  1646                  glLinkProgram  10442866    0.0 nS    7.4 mS   0.0%  15.5%         0.0 nS   25.8 mS   0.0%  18.4%
  1646            glDrawRangeElements  10442953   29.4 uS  772.0 uS   0.8%   1.6%        64.6 uS   31.5 uS   0.4%   0.0%
  1646                        glFlush  10443734    0.0 nS   21.2 mS   0.0%  44.6%         0.0 nS  125.2 uS   0.0%   0.1%
1 frames out of 1 shown, or 100.0%

When analyzing two related profiles, it is even more important to use option --min-cpu-time=0 when generating the profiles with glretrace. That is because, without it, glretrace will filter out different calls on each platform. You will likely end up with non-matching subsets of GL calls in each profile. They will still load in analyze successfully, but they will be harder to compare accurately.

Trace Profiler

The trace profiler is a tool for gathering trace-based profiles from remote devices reached through SSH. The entire profiling activity is initiated from a host machine, which provides the binaries for the profiling tools, the trace files and storage space for receiving the profile data.

Run profile --help to print information about the available cmd-line options.

Reaching target devices through SSH

The trace profiler uses SSH both to run commands on the target device and to copy files to and from the device. SSH is configured through a JSON file, such as the following:

{
  "addr": "100.107.108.177",
  "port": 22,
  "username": "root",
  "password": "test0000",
  "publicKeyFilepath": "/usr/local/google/home/gwink/.ssh/testing_rsa"
}

By default, trace_profiler reads the SSH configuration from file ssh_config.json in the current directory. Use cmd-line option -ssh-config=[filepath] to override the default.

SSH tunneling

SSH tunneling is useful to gather trace profiles from Crostini or Crouton instances that cannot be reached with a direct SSH connection. By default, the trace profiler doesn't use tunneling. However, if a JSON file with tunneling parameters is specified on the command line with -tunnel-config=[filepath], then the trace profiler will attempt to set up a tunnel before establishing the SSH connection.

For example, a tunnel JSON configuration to a Crostini instance might look as follows:

{
  "localPort": 9922,
  "targetHostAddr": "penguin.linux.test",
  "targetPort": 22,
  "server": {
    "addr": "100.107.108.177",
    "port": 22,
    "username": "root",
    "password": "test0000",
    "publicKeyFilepath": "/usr/local/google/home/gwink/.ssh/testing_rsa"
  }
}

In this example, we use the SSH server on the Chromebook device at IP address 100.107.108.177. The tunnel forwards connections from port 9922 on the local host to port 22 in the Crostini instance, referred to by its standard address penguin.linux.test.

Once the tunnel is established, an SSH connection from the host to the Crostini instance is set up with the following SSH parameters:

{
  "addr": "localhost",
  "port": 9922,
  "username": "<crostini username>",
  "password": "<crostini password>",
  "publicKeyFilepath": ""
}

Note: You may need to provide a path to a public key file, depending on how the sshd server is set up in Crostini.
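
For intuition, the tunnel described by the example configuration corresponds roughly to an OpenSSH local port forward. The Python sketch below builds the equivalent ssh -L command line from the tunnel JSON. It is only an analogy: nothing here says the trace profiler shells out to ssh, and the helper name tunnel_as_ssh_command is made up for this example.

```python
import json

TUNNEL_JSON = """
{
  "localPort": 9922,
  "targetHostAddr": "penguin.linux.test",
  "targetPort": 22,
  "server": {"addr": "100.107.108.177", "port": 22, "username": "root"}
}
"""

def tunnel_as_ssh_command(cfg):
    """Express a tunnel config as an `ssh -L` argv (illustration only)."""
    server = cfg["server"]
    forward = f'{cfg["localPort"]}:{cfg["targetHostAddr"]}:{cfg["targetPort"]}'
    # -N: run no remote command; -L: forward a local port through the server.
    return ["ssh", "-N", "-L", forward,
            "-p", str(server["port"]),
            f'{server["username"]}@{server["addr"]}']

print(" ".join(tunnel_as_ssh_command(json.loads(TUNNEL_JSON))))
```

With such a forward in place, connecting to localhost on port 9922 reaches port 22 inside the Crostini instance, which is why the SSH parameters above use localhost and 9922.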

Generating profiles

Once the host can establish a connection to the target device, it is time to collect profiles. That process too is configured through a JSON file. By default it is read from file profile_config.json, but a different file can be specified with cmd-line option -profile-config=[filepath].

The profile configuration JSON looks as follows:

{
  "localTraceDir": "/usr/local/google/home/<username>/sd-gfx/Gaming/traces/",
  "targetTraceDir": "/home/<username>/traces/",
  "traces": ["traces_linux_10127_borderlands2.trace"],
  "keepTraceOnTarget": false,
  "localProfileDir": "/usr/local/google/home/<username>/profiles",
  "profileNameSuffix": ".prof",
  "localProfAppPath": "/usr/local/google/home/<username>/src/apitrace/build/",
  "targetProfAppPath": "/home/<username>/apitrace/",
  "profCommand": "/home/<username>/apitrace/glretrace --pcpu --pgpu --min-cpu-time=0 [[trace-file]] > [[prof-file]]",
  "targetDisplay": "0"
}

Here's more info about each option:

localTraceDir: path to the directory that holds the trace files on the host.
targetTraceDir: path to directory to receive the trace files on the target.
traces: list of trace file names.
keepTraceOnTarget: set this to true to keep the trace files on the target device after each profile run. (Beware, trace files can be quite large.) Otherwise, the trace files are deleted immediately after each profile run.
localProfileDir: path to the directory where the generated profile files are stored on the host.
profileNameSuffix: append this to the trace file name to generate profile file name. For example, with “.prof”, traces_linux_10127_borderlands2.trace becomes traces_linux_10127_borderlands2.trace.prof.
localProfAppPath: Path to binaries, compiled for the target device, for the profiling tool. If this path points to a binary file, that single file is copied to the target device. If the path is for a directory, the entire directory content is copied to the device, albeit non-recursively.
targetProfAppPath: Path to directory where to copy the tool's binaries on the target device.
profCommand: This parameter is used to construct the command line to invoke to generate the profile. Placeholder [[trace-file]] is replaced by the path to the input trace file on the target device. Placeholder [[prof-file]] is replaced by the output profile filepath on the target device.
targetDisplay: This is the DISPLAY to use when running the profile command, usually "0" or "1".
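
The placeholder substitution in profCommand can be pictured with a few lines of Python. This is a sketch of the described behavior, not the profiler's code; expand_prof_command is a hypothetical name, and the target-side paths in the usage example are chosen for illustration only.

```python
def expand_prof_command(prof_command, trace_file, prof_file):
    """Replace the [[trace-file]] and [[prof-file]] placeholders."""
    return (prof_command
            .replace("[[trace-file]]", trace_file)
            .replace("[[prof-file]]", prof_file))

template = "glretrace --pcpu --pgpu --min-cpu-time=0 [[trace-file]] > [[prof-file]]"
# Example target-side paths (assumed, not taken from the tool).
print(expand_prof_command(
    template,
    "/home/user/traces/traces_linux_10127_borderlands2.trace",
    "/home/user/traces/traces_linux_10127_borderlands2.trace.prof",
))
```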

Configuration bundle

A configuration bundle is a handy way to specify all configuration properties in a single JSON bundle, together with the ability to override individual properties. A configuration bundle might look as follows:

{
  "files": {
    "sshConfigFile": "ssh_config.json",
    "tunnelConfigFile": "tunnel_config.json",
    "profileConfigFile": "profile_config.json"
  },
  "configs": {
    "profile": {
      "traces": [
        "traces_linux_10127_borderlands2.trace"
      ],
      "profCommand": "/home/gwink/apitrace/glretrace [[trace-file]] > [[prof-file]]"
    }
  }
}

In this example, the base ssh, tunnel and profile configurations are read from the files specified in the files section. However, the configs section overrides two properties in the profile configuration, namely traces and profCommand.

Either section is optional. That is, the configurations could be specified entirely with the files without any override in configs. Or the configurations could be fully specified with the configs section without needing to read data from any file.

The file containing the configuration bundle is specified with cmd-line option -config-bundle.
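
The override behavior can be sketched in Python as loading a base configuration and then layering the configs entries on top. This is an illustration under assumptions: it treats overrides as a shallow, per-property replacement, which matches the example above but is not confirmed for nested values, and apply_overrides is a hypothetical helper.

```python
def apply_overrides(base, overrides):
    """Return a copy of base with each overridden property replaced."""
    merged = dict(base)
    merged.update(overrides)
    return merged

# Stand-in for the contents of profile_config.json.
base_profile = {
    "traces": ["some_other.trace"],
    "keepTraceOnTarget": False,
    "profCommand": "glretrace --pcpu --pgpu --min-cpu-time=0 [[trace-file]] > [[prof-file]]",
}
# The "profile" entry from the bundle's configs section.
bundle_profile = {
    "traces": ["traces_linux_10127_borderlands2.trace"],
    "profCommand": "/home/gwink/apitrace/glretrace [[trace-file]] > [[prof-file]]",
}

effective = apply_overrides(base_profile, bundle_profile)
# Overridden properties come from the bundle; the rest come from the file.
print(effective["traces"], effective["keepTraceOnTarget"])
```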

Merging Profiles

In most cases, it is preferable to gather CPU and GPU profiles separately. That is because measuring GPU timing can significantly skew CPU timing, especially in Crostini. However, the analysis tool expects GPU and CPU timing data to be present in the same file. The merge tool takes a GPU profile and a CPU profile and merges them into a single profile that can be consumed by the analysis tool.

Example:

merge --gpu profile_with_gpu_data.prof --cpu profile_with_cpu_data.prof > profile.prof

A few caveats:

Obviously, the profiles should be gathered on the same device with the same trace.
You most likely want to specify --min-cpu-time=0 when gathering the profiles with glretrace. That ensures that both profiles contain identical call data. Without it, some API calls may be in one profile and not the other, causing merge issues.

Harvest

Harvest is a multifaceted tool that works with trace archives in Google storage and can perform the following tasks:

Download archives, extract trace files and verify them.
Run the trace on attached Crouton and Crostini devices and collect profile data.
Extract the FPS data from the profile, compare and print the output to another file.

Harvest is configured through a JSON file that takes the following form:

harvest-config.json:
{
  "traces": [
    "gs://chromeos-gfx-traces-incoming/steam.copied/final/steam_440-team_fortress_2-nami-20200116_165325.tar",
    "gs://chromeos-gfx-traces-incoming/steam.copied/final/steam_49520-borderlands_2-nami-20200122_153040.tar"
  ],
  "traceCacheDir": "<path to a dir where game archives and traces are downloaded and cached>",
  "keepTracesInCache": true,
  "profileBinPath": "<path to exe for companion profile application>",
  "crostiniBundleTemplate": "<path to profile configuration-bundle template for crostini>",
  "croutonBundleTemplate": "<path to profile configuration-bundle template for crouton>"
}

Note that Harvest tries to run several actions simultaneously. For instance, if you specify several traces, it will download the next trace while the current trace is being profiled on the DUTs. Likewise, it will profile a trace simultaneously on attached Crostini and Crouton devices.
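
That overlap of downloading and profiling can be pictured as a small two-stage pipeline. The Python sketch below is purely illustrative and does not reflect Harvest's implementation; download and profile are stand-in functions that only build strings describing what they would do.

```python
from concurrent.futures import ThreadPoolExecutor

def download(trace):
    return f"{trace} (downloaded)"  # stand-in for fetching and unpacking an archive

def profile(local_trace, device):
    return f"{local_trace} profiled on {device}"  # stand-in for a profile run

def harvest(traces, devices=("crostini", "crouton")):
    results = []
    if not traces:
        return results
    with ThreadPoolExecutor() as pool:
        pending = pool.submit(download, traces[0])
        for i in range(len(traces)):
            local = pending.result()
            if i + 1 < len(traces):
                # Start fetching the next trace before profiling this one.
                pending = pool.submit(download, traces[i + 1])
            # Profile the current trace on all devices at the same time.
            runs = [pool.submit(profile, local, d) for d in devices]
            results.extend(r.result() for r in runs)
    return results

for line in harvest(["a.trace", "b.trace"]):
    print(line)
```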

Bundle Templates

The configuration JSON file has two properties named crostiniBundleTemplate and croutonBundleTemplate. These properties point to template files that are used to generate config JSON files for the companion Profile application. These template files look as follows:

crostini_bundle_template.json:
{
  "files": {
    "sshConfigFile": "ssh_crostini_config.json",
    "tunnelConfigFile": "crostini_tunnel_config.json",
    "profileConfigFile": "crostini_profile_config.json"
  },
  "configs": {
    "profile": {
      "localTraceDir": "<<trace-dir>>",
      "localProfileDir": "/path/to/profiles/dir/",
      "profileNameSuffix": ".crostini.prof",
      "traces": [
        "<<trace-file>>"
      ],
      "profCommand": "/home/gwink/apitrace/glretrace -b [[trace-file]] > [[prof-file]]"
    }
  }
}

Much of that file is exactly like the config-bundle JSON file shown for the Profile application earlier in this document. However, there are two template entries, <<trace-dir>> and <<trace-file>>, that must be left as-is; Harvest will substitute the actual dir and file paths into these entries when running the profiles.
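
The substitution Harvest performs on a template can be sketched in Python as plain text replacement before the JSON is parsed. This is an assumption about the mechanism, not a description of Harvest's code; fill_bundle_template and the example paths are hypothetical.

```python
import json

def fill_bundle_template(template_text, trace_dir, trace_file):
    """Substitute the <<trace-dir>> and <<trace-file>> entries, then parse."""
    filled = (template_text
              .replace("<<trace-dir>>", trace_dir)
              .replace("<<trace-file>>", trace_file))
    return json.loads(filled)

TEMPLATE = """
{
  "configs": {
    "profile": {
      "localTraceDir": "<<trace-dir>>",
      "traces": ["<<trace-file>>"]
    }
  }
}
"""

bundle = fill_bundle_template(TEMPLATE, "/tmp/trace-cache/", "RimWorld-294100.trace")
print(bundle["configs"]["profile"])
```

A real implementation would also need to JSON-escape paths containing quotes or backslashes; the sketch skips that for brevity.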

Command-line options

Harvest takes several command-line options. They are:

-config: specify path to configuration JSON file.
--no-crostini: when set, do not run the trace on the Crostini device.
--no-crouton: when set, do not run the trace on the Crouton device.
--compare-fps: enable FPS comparison step after profiling.
-out: specify path to the comparison-output file (default “compare_out.prof”)
--verbose: enable verbose mode

If both options no-crostini and no-crouton are set, then Harvest effectively becomes an automatic trace downloader and unarchiver.

FPS comparison

When FPS comparison is enabled through the command-line option compare-fps, Harvest outputs FPS data in the following form:

                      Trace name  Crostini   Crouton      %
           RimWorld-294100.trace     42.20    108.33     38.95%
Euro_Truck_Simulator_2-227300.trace     25.16     45.45     55.37%
American_Truck_Simulator-270880.trace     26.58     42.46     62.61%
Kerbal_Space_Program-220200.trace     37.42     50.94     73.46%
           Unturned-304930.trace     27.68     36.79     75.24%
  Crusader_Kings_II-203770.trace     37.87     41.05     92.24%
         Counter-Strike-10.trace     55.06    149.01     36.95%
         Counter-Strike-10.trace     55.69    149.42     37.27%
         Counter-Strike-10.trace     55.23    144.98     38.10%

New FPS data is always appended to the file.
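
Judging from the sample rows, the % column appears to be the Crostini frame rate expressed as a percentage of the Crouton frame rate. That reading is inferred from the numbers and is not stated by the tool. A short Python check, with a hypothetical helper name:

```python
def crostini_percent(crostini_fps, crouton_fps):
    """Crostini FPS as a percentage of Crouton FPS."""
    return 100.0 * crostini_fps / crouton_fps

# RimWorld row from the sample output: 42.20 vs 108.33 FPS, reported as 38.95%.
print(f"{crostini_percent(42.20, 108.33):.2f}%")
```

Values recomputed this way can differ from the table in the last digit, presumably because the displayed FPS figures are themselves rounded.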