Chrome Benchmarking System

Overview

This directory contains benchmarks and infrastructure to test Chrome and Chromium and output performance measurements. These benchmarks are continuously run on the perf waterfall.

For more information on how Chrome measures performance, see here.

Using The Chrome Benchmarking System

Analyzing Results From The Perf Waterfall

The ChromePerf Dashboard is the destination for all metrics generated by the perf waterfall. It provides tools to set up a dashboard for performance of a set of tests + metrics over time. In addition, it provides the ability to launch a bisection by selecting a point on the dashboard.

Running A Single Test

The Chrome Benchmarking System has two methods for manually running performance tests: run_benchmark and Pinpoint.

run_benchmark is useful for creating and debugging benchmarks using local devices. Run from the command line, it has a number of flags useful for determining the internal state of the benchmark. For more information, see here.

Pinpoint wraps run_benchmark and provides the ability to remotely run A/B benchmarks using any platform available in our lab. It will run a benchmark for as many iterations as needed to get a statistically significant result, then visualize it.

If your're trying to debug a test or figure out how the infrastructure works, the easiest way is to set up the debugger in VSCode (guide here)] and set a breakpoint in /tools/perf/core/benchmark_runner.py.

Creating New Tests (stories)

This document provides an oveview of how tests are structured and some of the underlying technologies. After reading that doc, figure out if your story fits into an existing benchmark by checking here (or here for non-Googlers).

If it does, follow the instructions next to it. If there are no instructions, find the test type in src/tools/perf/page_sets.
Otherwise, read this.

After figuring out where your story fits, create a new one. There is a considerable amount of variation between different benchmarks, so use a nearby story as a model. You may also need to introduce custom JavaScript to drive interactions on the page or to deal with nondeterminsim. For an example, search this file for browse:tools:sheets:2019.

Next, we need to create a recording of all the content requested by the test. We then use this by serving it from a web server running on either the test device or a host device (for Android tests), removing one more source of nondeterminism. To do so, follow these instructions.

Tools In This Directory

This directory contains a variety of tools that can be used to run benchmarks, interact with speed services, and manage performance waterfall configurations. It also has commands for running functional unittests.

run_tests

This command allows you to run functional tests against the python code in this directory. For example, try:

./run_tests results_dashboard_unittest

Note that the positional argument can be any substring within the test name.

This may require you to set up your gsutil config first.

run_benchmark

This command allows running benchmarks defined in the chromium repository, specifically in tools/perf/benchmarks. If you need it, documentation is available on how to run benchmarks locally and how to properly set up your device.

update_wpr

A helper script to automate various tasks related to the update of Web Page Recordings for our benchmarks. In can help creating new recordings from live websites, replay those to make sure they work, upload them to cloud storage, and finally send a CL to review with the new recordings.

flakiness_cli

A command line interface to the flakiness dashboard.

soundwave

Allows to fetch data from the Chrome Performance Dashboard and stores it locally on a SQLite database for further analysis and processing. It also allows defining studies, pre-sets of measurements a team is interested in tracking, and uploads them to cloud storage to visualize with the help of Data Studio. This currently backs the v8 and health dashboards.

pinboard

Allows scheduling daily [pinpoint][] jobs to compare measurements with/without a patch being applied. This is useful for teams developing a new feature behind a flag, who wants to track the effects on performance as the development of their feature progresses. Processed data for relevant measurements is uploaded to cloud storage, where it can be read by Data Studio. This also backs data displayed on the v8 dashboard.