Chrome Benchmarking System

Overview

This directory contains benchmarks and infrastructure to test Chrome and Chromium and output performance measurements. These benchmarks are continuously run on the perf waterfall.

For more information on how Chrome measures performance, see here.

Using The Chrome Benchmarking System

Analyzing Results From The Perf Waterfall

The ChromePerf Dashboard is the destination for all metrics generated by the perf waterfall. It provides tools to set up a dashboard that tracks the performance of a set of tests and metrics over time. You can also launch a bisection by selecting a point on the dashboard.

Running A Single Test

The Chrome Benchmarking System has two methods for manually running performance tests: run_benchmark and Pinpoint.

run_benchmark is useful for creating and debugging benchmarks using local devices. Run from the command line, it has a number of flags useful for determining the internal state of the benchmark. For more information, see here.

Pinpoint wraps run_benchmark and provides the ability to remotely run A/B benchmarks using any platform available in our lab. It will run a benchmark for as many iterations as needed to get a statistically significant result, then visualize it.

If you're trying to debug a test or figure out how the infrastructure works, the easiest way is to set up the debugger in VSCode (guide here) and set a breakpoint in /tools/perf/core/benchmark_runner.py.

Creating New Tests (stories)

This document provides an overview of how tests are structured and some of the underlying technologies. After reading that doc, figure out if your story fits into an existing benchmark by checking here (or here for non-Googlers).

If it does, follow the instructions next to it. If there are no instructions, find the test type in src/tools/perf/page_sets.
Otherwise, read this.

After figuring out where your story fits, create a new one. There is a considerable amount of variation between different benchmarks, so use a nearby story as a model. You may also need to introduce custom JavaScript to drive interactions on the page or to deal with nondeterminism. For an example, search this file for browse:tools:sheets:2019.

Next, we need to use WPR (WebPageReplay) to record all of the content requested by the test. By default, tests spin up a local webserver using these recordings, removing one source of nondeterminism. To create a recording, run:

./tools/perf/record_wpr --browser=system --story-filter=STORY_NAME BENCHMARK_NAME

Next, we need to verify the recording works. To do so, run the test:

./tools/perf/run_benchmark run BENCHMARK_NAME --browser=system --story-filter=STORY_NAME

After running this, you will need to verify the following:

Does the browser behave the same as it did when creating the recording? If not, is the difference in behavior acceptable?
Are there any concerning errors generated by Chrome when running run_benchmark? These will appear in the output of run_benchmark.
Check the benchmarks in the link generated by run_benchmark. Does everything look reasonable?

If any problems were encountered, review or add custom JavaScript as described in the previous section. Alternatively, ask for help.
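The record-then-replay steps above can be chained in a small wrapper. This is a minimal sketch, not part of the repository: the `run` helper, the `record_and_verify` function, and the `DRY_RUN` variable are hypothetical conveniences, and BENCHMARK_NAME and STORY_NAME remain placeholders for your own benchmark and story. It uses only the two commands shown earlier, assumes it is run from the same directory as those commands, and by default just prints them.

```shell
#!/bin/sh
# Sketch: record a WPR archive for one story, then replay it to verify.

# Hypothetical helper: with DRY_RUN=1 (the default) it prints the command
# instead of executing it, so the sketch is safe to try anywhere.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the two commands from this document.
# $1 is the benchmark name, $2 is the story name.
record_and_verify() {
  benchmark="$1"
  story="$2"
  # The replay only runs if the recording step succeeded.
  run ./tools/perf/record_wpr --browser=system --story-filter="$story" "$benchmark" &&
  run ./tools/perf/run_benchmark run "$benchmark" --browser=system --story-filter="$story"
}

# Placeholders; override through the environment.
record_and_verify "${BENCHMARK_NAME:-BENCHMARK_NAME}" "${STORY_NAME:-STORY_NAME}"
```

Set DRY_RUN=0 to actually execute the commands. The manual checks listed above (browser behavior, Chrome errors, and the generated results link) still need to be done by hand after the replay.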

If everything looks good, upload your WPR archive by following the instructions in Upload the recording to Cloud Storage and create a CL.

Tools In This Directory

This directory contains a variety of tools that can be used to run benchmarks, interact with speed services, and manage performance waterfall configurations. It also has commands for running functional unittests.

run_tests

This command allows you to run functional tests against the Python code in this directory. For example, try:

./run_tests results_dashboard_unittest

Note that the positional argument can be any substring within the test name.

This may require you to set up your gsutil config first.
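Because the positional argument is matched as a substring, a shorter string selects the same test along with any others whose names contain it. The sketch below only illustrates that matching rule in plain shell. It is not how run_tests is implemented, the `select_tests` function is hypothetical, and every test name other than results_dashboard_unittest is made up for the example.

```shell
#!/bin/sh
# Illustration of substring selection; not run_tests' actual implementation.
# Reads test names from stdin (one per line) and prints those that
# contain the filter string given as $1.
select_tests() {
  filter="$1"
  while IFS= read -r name; do
    case "$name" in
      *"$filter"*) echo "$name" ;;
    esac
  done
}

# Only results_dashboard_unittest comes from this document; the other
# two names are hypothetical.
printf '%s\n' \
  results_dashboard_unittest \
  hypothetical_dashboard_helper_unittest \
  hypothetical_story_filter_unittest |
  select_tests dashboard
```

Here the filter `dashboard` selects the first two names and skips the third, which is why a more specific substring is the safer choice when you want a single test.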

run_benchmark

This command runs benchmarks defined in the Chromium repository, specifically in tools/perf/benchmarks. Documentation is available on how to run benchmarks locally and how to properly set up your device.

update_wpr

A helper script that automates various tasks related to updating Web Page Recordings for our benchmarks. It can help create new recordings from live websites, replay them to make sure they work, upload them to cloud storage, and finally send a CL with the new recordings for review.

pinpoint_cli

A command line interface to the pinpoint service. It allows you to create new jobs, check the status of jobs, and fetch their measurements as csv files.

flakiness_cli

A command line interface to the flakiness dashboard.

soundwave

Fetches data from the Chrome Performance Dashboard and stores it locally in a SQLite database for further analysis and processing. It also lets you define studies, which are presets of measurements a team is interested in tracking, and uploads them to cloud storage to visualize with the help of Data Studio. This currently backs the v8 and health dashboards.

pinboard

Schedules daily pinpoint jobs to compare measurements with and without a patch applied. This is useful for teams developing a new feature behind a flag who want to track the effects on performance as development of their feature progresses. Processed data for relevant measurements is uploaded to cloud storage, where it can be read by Data Studio. This also backs data displayed on the v8 dashboard.