GPU Bot Details

This page describes in detail how the GPU bots are set up, which files affect their configuration, and how to both modify their behavior and add new bots.

Overview of the GPU bots' setup

Chromium's GPU bots, unlike the majority of the project's test machines, are physical pieces of hardware. When end users run the Chrome browser, they are almost surely running it on a physical piece of hardware with a real graphics processor. Some portions of the code base simply cannot be exercised by running the browser in a virtual machine, or on a software implementation of the underlying graphics libraries. The GPU bots were developed and deployed to cover these code paths, and to avoid regressions that are otherwise inevitable in a project the size of the Chromium browser.

The GPU bots are utilized on the chromium.gpu and waterfalls, and various tryservers, as described in Using the GPU Bots.

All of the physical hardware for the bots lives in the Swarming pool, and most of it in the chromium.tests.gpu Swarming pool. The waterfall bots are simply virtual machines which spawn Swarming tasks with the appropriate tags to get them to run on the desired GPU and operating system type. So, for example, the Win10 x64 Release (NVIDIA) bot is actually a virtual machine which spawns all of its jobs with the Swarming parameters:

    "gpu": "10de:1cb3-",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu"

Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient to target the pool of Windows 10-like NVIDIA machines. (There are a few Windows 7-like NVIDIA bots in the pool, which necessitates the OS specifier.)
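The way these parameters select machines can be sketched in Python. This is only an illustration, assuming that each Swarming bot advertises a list of values per dimension and that a task is eligible for a bot when every requested value appears in that bot's list. The bot names, the driver suffix and the exact value lists below are hypothetical, not real pool data.

```python
def bot_matches(bot_dimensions, requested):
    """Return True if the bot advertises every requested dimension value."""
    return all(
        value in bot_dimensions.get(key, [])
        for key, value in requested.items()
    )


# Hypothetical bots; real bots advertise many more dimensions.
BOTS = {
    "win10-nvidia-1": {
        "gpu": ["10de", "10de:1cb3", "10de:1cb3-1.2.3"],
        "os": ["Windows", "Windows-10"],
        "pool": ["chromium.tests.gpu"],
    },
    "win7-nvidia-1": {
        "gpu": ["10de", "10de:1cb3", "10de:1cb3-1.2.3"],
        "os": ["Windows", "Windows-2008ServerR2-SP1"],
        "pool": ["chromium.tests.gpu"],
    },
}

REQUESTED = {
    "gpu": "10de:1cb3",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu",
}

eligible = sorted(
    name for name, dims in BOTS.items() if bot_matches(dims, REQUESTED)
)
print(eligible)
```

Without the "os" entry, both hypothetical bots would be eligible, which is why the OS specifier is needed.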

Details about the bots can be found on and by using src/tools/swarming_client/, for example bots. If you are authenticated with credentials you will be able to make queries of the bots and see, for example, which GPUs are available.

The waterfall bots run tests on a single GPU type in order to make it easier to see regressions or flakiness that affect only a certain type of GPU.

The tryservers which include GPU tests, like win_chromium_rel_ng, on the other hand run tests on more than one GPU type. As of this writing, the Windows tryservers run tests on NVIDIA and AMD GPUs; the Mac tryservers run tests on Intel and NVIDIA GPUs. These tryservers' tests are specified simply by mirroring how one or more waterfall bots work. This is an inherent property of the chromium_trybot recipe, which was designed to eliminate differences in behavior between the tryservers and waterfall bots. Since the tryservers mirror waterfall bots, if the waterfall bot is working, the tryserver should almost by construction be working as well.
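As a rough sketch of the mirroring idea (not the recipe's actual code), a try bot can be modeled as nothing more than a list of waterfall bots whose test lists it reuses. The waterfall bot names and the test lists here are hypothetical.

```python
# Hypothetical per-bot test lists; the real ones live in the JSON files
# under src/testing/buildbot.
WATERFALL_TESTS = {
    "Win Release (NVIDIA)": ["gl_tests", "pixel_test"],
    "Win Release (AMD)": ["gl_tests", "webgl_conformance"],
}

# A try bot has no test list of its own, only the bots it mirrors.
TRYBOT_MIRRORS = {
    "win_chromium_rel_ng": ["Win Release (NVIDIA)", "Win Release (AMD)"],
}


def steps_for_trybot(trybot):
    """Return (waterfall bot, test) pairs the try bot would run."""
    return [
        (bot, test)
        for bot in TRYBOT_MIRRORS[trybot]
        for test in WATERFALL_TESTS[bot]
    ]


for bot, test in steps_for_trybot("win_chromium_rel_ng"):
    print(f"{test} on {bot}")
```

Because the try bot only refers to the waterfall bots, changing a waterfall bot's test list changes the try bot's behavior with it.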

There are a few one-off GPU configurations on the waterfall where the tests are run locally on physical hardware, rather than via Swarming. A few examples are:

There are a couple of reasons to continue to support running tests on a specific machine: it might be too expensive to deploy the required multiple copies of said hardware, or the configuration might not be reliable enough to begin scaling it up.

Adding a new isolated test to the bots

Adding a new test step to the bots requires that the test run via an isolate. Isolates describe both the binary and data dependencies of an executable, and are the underpinning of how the Swarming system works. See the LUCI wiki for background on Isolates and Swarming.

Adding a new isolate

  1. Define your target using the template("test") template in src/testing/test.gni. See test("gl_tests") in src/gpu/ for an example. For a more complex example which invokes a series of scripts that finally launch the browser, see src/chrome/telemetry_gpu_test.isolate.
  2. Add an entry to src/testing/buildbot/gn_isolate_map.pyl that refers to your target. Find a similar target to yours in order to determine the type. The type is referenced in src/tools/mb/mb_config.pyl.
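For orientation, here is a sketch of what such an entry might look like and how a .pyl file can be read. A .pyl file is a Python literal, so ast.literal_eval can parse it without executing code. The target name, the label and the type value below are assumptions for illustration; copy the real values from a similar existing target.

```python
import ast

# Hypothetical entry in the style of gn_isolate_map.pyl.
EXAMPLE_PYL = """
{
  "my_gpu_tests": {
    "label": "//gpu:my_gpu_tests",
    "type": "console_test_launcher",
  },
}
"""


def load_pyl(text):
    """Parse the contents of a .pyl file into Python data."""
    return ast.literal_eval(text.strip())


isolate_map = load_pyl(EXAMPLE_PYL)
for name, entry in isolate_map.items():
    # Assumed minimum: each entry names its GN target and how it is run.
    missing = {"label", "type"} - entry.keys()
    if missing:
        raise ValueError(f"{name} is missing {sorted(missing)}")
    print(f"{name} -> {entry['label']} ({entry['type']})")
```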

At this point you can build and upload your isolate to the isolate server.

See Isolated Testing for SWEs for the most up-to-date instructions. The instructions here are a copy which shows how to run an isolate that's been uploaded to the isolate server on your local machine rather than on Swarming.

If cd'd into src/:

  1. ./tools/mb/ isolate //out/Release [target name]
    • For example: ./tools/mb/ isolate //out/Release angle_end2end_tests
  2. python tools/swarming_client/ batcharchive -I out/Release/[target name].isolated.gen.json
    • For example: python tools/swarming_client/ batcharchive -I out/Release/angle_end2end_tests.isolated.gen.json
  3. This will write a hash to stdout. You can run it via: python tools/swarming_client/ -I -s [HASH] -- [any additional args for the isolate]
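The file name passed to batcharchive in step 2 follows directly from the output directory and the target name. A small helper, written here only as an illustration, makes the pattern explicit:

```python
import posixpath


def isolated_gen_json(out_dir, target):
    """Return the .isolated.gen.json path for a target in an output dir."""
    return posixpath.join(out_dir, target + ".isolated.gen.json")


print(isolated_gen_json("out/Release", "angle_end2end_tests"))
```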

See the section below on isolate server credentials.

Adding your new isolate to the tests that are run on the bots

See Adding new steps to the GPU bots for details on this process.

Relevant files that control the operation of the GPU bots

In the tools/build workspace:

  • masters/master.chromium.gpu and masters/
    • builders.pyl in these two directories defines the bots that show up on the waterfall. If you are adding a new bot, you need to add it to builders.pyl and use go/bug-a-trooper to request a restart of either master.chromium.gpu or
    • Only changes under masters/ require a waterfall restart. All other changes – for example, to scripts/slave/ in this workspace, or the Chromium workspace – do not require a master restart (and go live the minute they are committed).
  • scripts/slave/recipe_modules/chromium_tests/:
    • and define the following for each builder and tester:
      • How the workspace is checked out (e.g., this is where top-of-tree ANGLE is specified)
      • The build configuration (e.g., this is where 32-bit vs. 64-bit is specified)
      • Various gclient defines (like compiling in the hardware-accelerated video codecs, and enabling compilation of certain tests, like the dEQP tests, that can't be built on all of the Chromium builders)
      • Note that the GN configuration of the bots is also controlled by mb_config.pyl in the Chromium workspace; see below.
    • defines how try bots mirror one or more waterfall bots.
      • The concept of try bots mirroring waterfall bots ensures there are no differences in behavior between the waterfall bots and the try bots. This helps ensure that a CL will not pass the commit queue and then break on the waterfall.
      • This file defines the behavior of the following GPU-related try bots:
        • linux-rel, mac-rel, and win7-rel, which run against every Chromium CL, and which mirror the behavior of bots on the chromium.gpu waterfall.
        • The ANGLE try bots, which run against ANGLE CLs, and mirror the behavior of the waterfall (including using top-of-tree ANGLE, and running additional tests not run by the regular Chromium try bots)
        • The optional GPU try servers linux_optional_gpu_tests_rel, mac_optional_gpu_tests_rel and win_optional_gpu_tests_rel, which are triggered manually and run some tests which can't be run on the regular Chromium try servers mainly due to lack of hardware capacity.

In the chromium/src workspace:

In the infradata/config workspace (Google internal only, sorry):

    • Defines a chromium.tests.gpu Swarming pool which contains most of the specialized hardware: as of this writing, the Windows and Linux NVIDIA bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD GPUs. New GPU hardware should be added to this pool.

Walkthroughs of various maintenance scenarios

This section describes various common scenarios that might arise when maintaining the GPU bots, and how they'd be addressed.

How to add a new test or an entire new step to the bots

This is described in Adding new tests to the GPU bots.

How to set up new virtual machine instances

The tests use virtual machines to build binaries and to trigger tests on physical hardware. VMs don't run any tests themselves. Nevertheless the OS of the VM must match the OS of the physical hardware. Android uses Linux VMs for the hosts.

  1. If you need a Mac VM:

    1. File a Chrome Infrastructure Labs ticket requesting 2 virtual machines for the testers. See this example ticket.
    2. Follow the instructions below to add an association between those VM names and the bot names you're adding to and regenerate the auto-generated files.
  2. If you need a non-Mac VM, VMs are allocated using the GCE Provider APIs:

    1. Create a CL in the infradata/config (Google internal) workspace which does the following. Git configure your to if necessary. For reference, see these example CLs:

      1. Adding both Linux and Windows VMs for trybots.
      2. Adding a Linux VM for a waterfall bot.
      3. Adding a Windows VM for a waterfall bot.
    2. Edit to add an entry for the new bot. Currently, the only way to limit the number of concurrent builds per bot is to limit the number of VMs associated with it. This means that each new bot requires a new prefix. Add your new entry to the correct block:

      1. Put waterfall bots under gpu_ci_bots. For example:
        gce_thin_trusty('linux-fyi-skiarenderer-vulkan-nvidia', 'us-east1-c') or
      2. Put trybots under the appropriate gpu_try_bots block (optional GPU trybots, ANGLE trybots, etc.). For example:
    3. Run to regenerate configs/chromium-swarm/bots.cfg and configs/gce-provider/vms.cfg. Double-check your work there.

      Note that previously vms.cfg had to be edited manually. Part of the difficulty was in choosing a zone. This should soon no longer be necessary per, but consult with the Chrome Infra team to find out which of the zones has available capacity.

    4. Get this reviewed and landed. This step associates the VM or pool of VMs with the bot's name on the waterfall.

How to add a new tester bot to the waterfall

When deploying a new GPU configuration, it should be added to the waterfall first. The chromium.gpu waterfall should be reserved for those GPUs which are tested on the commit queue. (Some of the bots violate this rule – namely, the Debug bots – though we should strive to eliminate these differences.) Once the new configuration is ready to be fully deployed on tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers changed to mirror them.

In order to add Release and Debug waterfall bots for a new configuration, experience has shown that at least 4 physical machines are needed in the swarming pool. The reason is that the tests all run in parallel on the Swarming cluster, so the load induced on the swarming bots is higher than it would be if the tests were run strictly serially.

With these prerequisites, these are the steps to add a new (swarmed) tester bot. (Actually, a pair of bots -- Release and Debug. If deploying just one or the other, ignore the other configuration.) These instructions assume that you are reusing one of the existing builders, like GPU FYI Win Builder.

  1. Work with the Chrome Infrastructure Labs team to get the (minimum 4) physical machines added to the Swarming pool. Use or src/tools/swarming_client/ bots to determine the PCI IDs of the GPUs in the bots. (These instructions will need to be updated for Android bots which don't have PCI buses.)

    1. Make sure to add these new machines to the chromium.tests.gpu Swarming pool by creating a CL against in the infradata/config (Google internal) workspace. Git configure your to if necessary. Here is one example CL and a second example.

    2. Run to regenerate configs/chromium-swarm/bots.cfg. Double-check your work there.

  2. Allocate new virtual machines for the bots as described in How to set up new virtual machine instances.

  3. Create a CL in the Chromium workspace which does the following. Here's an example CL.

    1. Adds the new machines to waterfalls.pyl.
      1. The swarming dimensions are crucial. These must match the GPU and OS type of the physical hardware in the Swarming pool. This is what causes the VMs to spawn their tests on the correct hardware. Make sure to use the chromium.tests.gpu pool, and that the new machines were specifically added to that pool.
      2. Make triply sure that there are no collisions between the new hardware you're adding and hardware already in the Swarming pool. For example, it used to be the case that all of the Windows NVIDIA bots ran the same OS version. Later, the Windows 8 flavor bots were added. In order to avoid accidentally running tests on Windows 8 when Windows 7 was intended, the OS in the swarming dimensions of the Win7 bots had to be changed from win to Windows-2008ServerR2-SP1 (the Win7-like flavor running in our data center). Similarly, the Win8 bots had to have a very precise OS description (Windows-2012ServerR2-SP0).
      3. If you're deploying a new bot that's similar to another existing configuration, please search around in src/testing/buildbot/test_suite_exceptions.pyl for references to the other bot's name and see if your new bot needs to be added to any exclusion lists. For example, some of the tests don't run on certain Win bots because of missing OpenGL extensions.
      4. Run to regenerate src/testing/buildbot/
    2. Updates cr-buildbucket.cfg:
      • Add the two new machines (Release and Debug) inside the bucket. This sets up storage for the builds in the system. Use the appropriate mixin; for example, “win-gpu-fyi-ci” has already been set up for Windows GPU FYI bots on the waterfall.
    3. Updates luci-scheduler.cfg:
      • Add new “job” blocks for your new Release and Debug test bots. They should go underneath the builder which triggers them (like “GPU Win FYI Builder”), in alphabetical order. Make sure the “id” and “builder” entries match. This job block should use the acl_sets “triggered-by-parent-builders”, because it's triggered by the builder, and not by changes to the git repository.
    4. Updates luci-milo.cfg:
      • Add new “builders” blocks for your new testers (Release and Debug) on the console. Look at the short names and categories and try to come up with a reasonable organization.
    5. If you were adding a new builder, you would need to also add the new machine to src/tools/mb/mb_config.pyl.
  4. After the Chromium-side CL lands it will take some time for all of the configuration changes to be picked up by the system. The bot will probably be in a red or purple state, claiming that it can't find its configuration. (It might also be in an “empty” state, not running any jobs at all.)

  5. After the Chromium-side CL lands and the bot is on the console, create a CL in the tools/build workspace which does the following. Here's an example CL.

    1. Adds the new VMs to in scripts/slave/recipe_modules/chromium_tests/. Make sure to set the serialize_tests property to True. This is specified for waterfall bots, but not trybots, and helps avoid overloading the physical hardware. Double-check the BUILD_CONFIG and parent_buildername properties for each. They must match the Release/Debug flavor of the builder, like GPU FYI Win Builder vs. GPU FYI Win Builder (dbg).
    2. Get this reviewed and landed. This step tells the Chromium recipe about the newly-deployed waterfall bot, so it knows which JSON file to load out of src/testing/buildbot and which entry to look at.
    3. It used to be necessary to retrain recipe expectations (scripts/slave/ --use-bootstrap test train). This doesn't appear to be necessary any more, but it's something to watch out for if your CL fails presubmit for some reason.
  6. Note that it is crucial that the bot be deployed before hooking it up in the tools/build workspace. In the new LUCI world, if the parent builder can't find its child testers to trigger, that's a hard error on the parent. This will cause the builders to fail. You can and should prepare the tools/build CL in advance, but make sure it doesn't land until the bot's on the console.
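The dimension-collision concern in step 3 can be checked mechanically before landing a CL. The sketch below assumes the same matching rule as Swarming appears to use (every requested value must be among the values a machine advertises). The hardware class names and their advertised value lists are hypothetical; only the OS strings come from the example above.

```python
def matched_hardware(requested, hardware):
    """Return the names of hardware classes the dimensions would match."""
    return sorted(
        name
        for name, dims in hardware.items()
        if all(value in dims.get(key, ()) for key, value in requested.items())
    )


# Hypothetical hardware classes in the chromium.tests.gpu pool.
HARDWARE = {
    "win7-nvidia": {
        "gpu": ["10de"],
        "os": ["win", "Windows-2008ServerR2-SP1"],
        "pool": ["chromium.tests.gpu"],
    },
    "win8-nvidia": {
        "gpu": ["10de"],
        "os": ["win", "Windows-2012ServerR2-SP0"],
        "pool": ["chromium.tests.gpu"],
    },
}

loose = {"gpu": "10de", "os": "win", "pool": "chromium.tests.gpu"}
precise = dict(loose, os="Windows-2008ServerR2-SP1")

print(matched_hardware(loose, HARDWARE))    # more than one class: a collision
print(matched_hardware(precise, HARDWARE))  # exactly one class
```

A set of dimensions that matches more than one hardware class is a sign that the OS or GPU description needs to be made more precise.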

How to start running tests on a new GPU type on an existing try bot

Let's say that you want to cause the win_chromium_rel_ng try bot to run tests on CoolNewGPUType in addition to the types it currently runs (as of this writing, NVIDIA and AMD). To do this:

  1. Make sure there is enough hardware capacity. Unfortunately, tools to report utilization of the Swarming pool are still being developed, but a back-of-the-envelope estimate is that you will need a minimum of 30 machines in the Swarming pool to run the current set of GPU tests on the tryservers. We estimate that 90 machines will be needed in order to additionally run the WebGL 2.0 conformance tests. Plan for the larger capacity, as it's desired to run the larger test suite on as many configurations as possible.
  2. Deploy Release and Debug testers on the chromium.gpu waterfall, following the instructions for the waterfall above. You will also need to temporarily add suppressions to tests/ for these new testers since they aren't yet covered by try bots and are going on a non-FYI waterfall. Make sure these run green for a day or two before proceeding.
  3. Create a CL in the tools/build workspace, adding the new Release tester to win_chromium_rel_ng's bot_ids list in scripts/slave/recipe_modules/chromium_tests/ Rerun scripts/slave/ --use-bootstrap test train.
  4. Once the CL in (3) lands, the commit queue will immediately start running tests on the CoolNewGPUType configuration. Be vigilant and make sure that tryjobs are green. If they are red for any reason, revert the CL and figure out offline what went wrong.
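The change in step 3 amounts to appending one more mirrored tester to the try bot's bot_ids list. The following sketch models that list as plain Python data; the key names (mastername, buildername, tester) and the builder and tester names are assumptions for illustration, so check the real file for the exact structure.

```python
def add_mirror(trybot_spec, mastername, buildername, tester):
    """Add a mirrored waterfall tester to a try bot spec, at most once."""
    entry = {
        "mastername": mastername,
        "buildername": buildername,
        "tester": tester,
    }
    if entry not in trybot_spec["bot_ids"]:
        trybot_spec["bot_ids"].append(entry)
    return trybot_spec


# Hypothetical starting point: the try bot mirrors one NVIDIA tester.
win_chromium_rel_ng = {
    "bot_ids": [
        {
            "mastername": "chromium.gpu",
            "buildername": "GPU Win Builder",
            "tester": "Win Release (NVIDIA)",
        },
    ],
}

add_mirror(win_chromium_rel_ng, "chromium.gpu", "GPU Win Builder",
           "Win Release (CoolNewGPUType)")
# Applying the same change twice still leaves a single new entry.
add_mirror(win_chromium_rel_ng, "chromium.gpu", "GPU Win Builder",
           "Win Release (CoolNewGPUType)")

print([bot["tester"] for bot in win_chromium_rel_ng["bot_ids"]])
```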

How to add a new manually-triggered trybot

There are a lot of one-off GPU types on the waterfall and sometimes a failure happens just on one type. It's helpful to just be able to send a tryjob to a particular machine. Doing so requires a specific trybot to be set up because most if not all of the existing trybots trigger tests on more than one type of GPU.

Here are the steps to set up a new trybot which runs tests just on one particular GPU type. Let's consider that we are adding a manually-triggered trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot gpu_manual_try_win7_nvidia_rel.

  1. Allocate new virtual machines for the bots as described in How to set up new virtual machine instances, following the “trybot” instructions.

  2. Create a CL in the Chromium workspace which does the following. Here's an example CL.

    1. Updates cr-buildbucket.cfg:
      • Add the new trybot to the luci.chromium.try bucket. This is a one-liner, with “name” being “gpu_manual_try_win7_nvidia_rel” and “mixins” being the OS-appropriate mixin, in this case “win-optional-gpu-try”. (We're repurposing the existing ACLs for the “optional” GPU trybots for these manually-triggered ones.)
    2. Updates luci-milo.cfg:
      • Add “builders” blocks for the new trybot to the luci.chromium.try and consoles.
    3. Adds the new trybot to src/tools/mb/mb_config.pyl. Reuse the same mixin as for the optional GPU trybot; in this case, gpu_fyi_tests_release_trybot_x86.
    4. Get this CL reviewed and landed.
  3. Create a CL in the tools/build workspace which does the following. Here's an example CL.

    1. Adds the new trybot to a “Manually-triggered GPU trybots” section in scripts/slave/recipe_modules/chromium_tests/ Create this section after the “Optional GPU bots” section for the appropriate tryserver (, tryserver.chromium.mac, tryserver.chromium.linux, Have the bot mirror the appropriate waterfall bot; in this case, the buildername to mirror is GPU FYI Win Builder and the tester is Win7 FYI Release (NVIDIA).
    2. Adds an exception for your new trybot in tests/, under FAKE_BUILDERS, under the appropriate tryserver waterfall (in this case, This is because this is a LUCI-only bot, and this test verifies the old buildbot configurations.
    3. Get this reviewed and landed. This step tells the Chromium recipe about the newly-deployed trybot, so it knows which JSON file to load out of src/testing/buildbot and which entry to look at to understand which tests to run and on what physical hardware.
    4. It used to be necessary to retrain recipe expectations (scripts/slave/ --use-bootstrap test train). This doesn't appear to be necessary any more, but it's something to watch out for if your CL fails presubmit for some reason.

At this point the new trybot should automatically show up in the “Choose tryjobs” pop-up in the Gerrit UI, under the luci.chromium.try heading, because it was deployed via LUCI. It should be possible to send a CL to it.

(It should not be necessary to modify buildbucket.config as is mentioned at the bottom of the “Choose tryjobs” pop-up. Contact the chrome-infra team if this doesn't work as expected.)

How to add a new try bot that runs a subset of tests or extra tests

Several projects (ANGLE, Dawn) run custom tests using the Chromium recipes. They use try bot configs that run subsets of the Chromium tests, or additional slower tests that can't be run on the main CQ.

These try bots are a little different because they mirror waterfall bots that don't actually exist. The waterfall bots' specifications exist only to tell these try bots which tests to run.

Let's say that you intend to add a new such custom try bot on Windows. Call it win-myproject-rel for example. You will need to add a “fake” mirror bot for each GPU family on which the tests need to run. For a GPU type of “CoolNewGPUType” in this example you could add a “fake” bot named “MyProject GPU Win10 Release (CoolNewGPUType)”.

  1. Allocate new virtual machines for the bots as described in How to set up new virtual machine instances.
  2. Make sure that you have some swarming capacity for the new GPU type. Since it's not running against all Chromium CLs you don't need the recommended 30 minimum bots, though ~10 would be good.
  3. Create a CL in the Chromium workspace that does the following. Here's an example CL.
    1. Add your new bot (for example, “MyProject GPU Win10 Release (CoolNewGPUType)”) to the waterfall in waterfalls.pyl.
    2. Re-run src/testing/buildbot/ to regenerate the JSON files.
    3. Update cr-buildbucket.cfg to add win-myproject-rel.
    4. Update luci-milo.cfg to include win-myproject-rel.
    5. Update luci-scheduler.cfg to include “MyProject GPU Win10 Release (CoolNewGPUType)”.
    6. Update src/tools/mb/mb_config.pyl to include win-myproject-rel.
    7. Also add your fake bot to src/testing/buildbot/ in the get_bots_that_do_not_actually_exist section.
  4. After the Chromium-side CL lands and the bot is on the console, create a CL in the tools/build workspace which does the following. Here's an example CL.
    1. Adds “MyProject GPU Win10 Release (CoolNewGPUType)” to in scripts/slave/recipe_modules/chromium_tests/. You can copy a similar step.
    2. Adds win-myproject-rel to in the same folder. This is where you associate “MyProject GPU Win10 Release (CoolNewGPUType)” with win-myproject-rel. See the sample CL for an example.
    3. Get this reviewed and landed. This step tells the Chromium recipe about the newly-deployed waterfall bot, so it knows which JSON file to load out of src/testing/buildbot and which entry to look at.
  5. After your CLs land you should be able to find and run win-myproject-rel on CLs using Choose Trybots in Gerrit.

How to test and deploy a driver and/or OS update

Let's say that you want to roll out an update to the graphics drivers or the OS on one of the configurations like the Linux NVIDIA bots. In order to verify that the new driver or OS won't destabilize Chromium's commit queue, it's necessary to run the new driver or OS on one of the waterfalls for a day or two to make sure the tests are reliably green before rolling out the driver or OS update. To do this:

  1. Make sure that all of the current Swarming jobs for this OS and GPU configuration are targeted at the “stable” version of the driver and the OS in waterfalls.pyl and mixins.pyl. Make sure that there are “named” stable versions of the driver and the OS there, which target the _TARGETED_DRIVER_VERSIONS and _TARGETED_OS_VERSIONS dictionaries in (Google internal).

  2. File a Build Infrastructure bug, component Infra>Labs, to have ~4 of the physical machines already in the Swarming pool upgraded to the new version of the driver or the OS.

  3. If an “experimental” version of this bot doesn't yet exist, follow the instructions above for How to add a new tester bot to the waterfall to deploy one.

  4. Have this experimental bot target the new version of the driver or the OS in waterfalls.pyl and mixins.pyl. Sample CL.

  5. Hopefully, the new machine will pass the pixel tests. If it doesn't, then it'll be necessary to follow the instructions on updating Gold baselines (step #4).

  6. Watch the new machine for a day or two to make sure it's stable.

  7. When it is, update (Google internal) to add a mapping between the new driver version and the “stable” version. For example:

      # NVIDIA Quadro P400, Ubuntu Stable version
      '10de:1cb3-384.90': 'nvidia-quadro-p400-ubuntu-stable',
      # NVIDIA Quadro P400, new Ubuntu Stable version
      '10de:1cb3-410.78': 'nvidia-quadro-p400-ubuntu-stable',
      # ...

    And/or a mapping between the new OS version and the “stable” version. For example:

      # Linux NVIDIA Quadro P400
      '10de:1cb3': {
        'Ubuntu-14.04': 'linux-nvidia-stable',
        'Ubuntu-19.04': 'linux-nvidia-stable',
      # ...

    The new driver or OS version should match the one just added for the experimental bot. Get this CL reviewed and landed. Sample CL (Google internal).

  8. After it lands, ask the Chrome Infrastructure Labs team to roll out the driver update across all of the similarly configured bots in the swarming pool.

  9. If necessary, update pixel test expectations and remove the suppressions added above.

  10. Remove the old driver or OS version from, leaving the “stable” driver version pointing at the newly upgraded version.

Note that we leave the experimental bot in place. We could reclaim it, but it seems worthwhile to continuously test the “next” version of graphics drivers as well as the current stable ones.
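The effect of steps 7 through 10 on the “stable” name can be sketched as follows. This is a simplified model, assuming the mapping behaves like a plain dictionary from an exact GPU-and-driver string to a named version; the bot names are hypothetical and the real internal configuration may work differently.

```python
# Mapping as it looks after step 7: both drivers count as "stable".
_TARGETED_DRIVER_VERSIONS = {
    "10de:1cb3-384.90": "nvidia-quadro-p400-ubuntu-stable",
    "10de:1cb3-410.78": "nvidia-quadro-p400-ubuntu-stable",
}


def bots_for_named_version(named, bots, mapping):
    """Return the bots whose GPU/driver string maps to the named version."""
    return sorted(
        bot for bot, gpu in bots.items() if mapping.get(gpu) == named
    )


# Hypothetical bots: one not yet upgraded, one already upgraded.
BOTS = {
    "bot-old": "10de:1cb3-384.90",
    "bot-new": "10de:1cb3-410.78",
}
STABLE = "nvidia-quadro-p400-ubuntu-stable"

# During the rollout (steps 7-8), tests targeting "stable" can use both.
during_rollout = bots_for_named_version(STABLE, BOTS, _TARGETED_DRIVER_VERSIONS)

# Step 10: drop the old driver so "stable" means only the new version.
after_cleanup_map = {
    gpu: name
    for gpu, name in _TARGETED_DRIVER_VERSIONS.items()
    if gpu != "10de:1cb3-384.90"
}
after_cleanup = bots_for_named_version(STABLE, BOTS, after_cleanup_map)

print(during_rollout)
print(after_cleanup)
```

Mapping both versions to the same name during the rollout is what lets the machines be upgraded gradually without the “stable” jobs losing capacity.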

Credentials for various servers

Working with the GPU bots requires credentials to various services: the isolate server, the swarming server, and cloud storage.

Isolate server credentials

To upload and download isolates you must first authenticate to the isolate server. From a Chromium checkout, run:

  • ./src/tools/swarming_client/ login --service=

This will open a web browser to complete the authentication flow. A email address is required in order to properly authenticate.

To test your authentication, find a hash for a recent isolate. Consult the instructions on Running Binaries from the Bots Locally to find a random hash from a target like gl_tests. Then run the following:

If authentication succeeded, this will silently download a file called delete_me into the current working directory. If it failed, the script will report multiple authentication errors. In this case, use the following command to log out and then try again:

  • ./src/tools/swarming_client/ logout --service=

Swarming server credentials

The swarming server uses the same script as the isolate server. You will need to authenticate if you want to manually download the results of previous swarming jobs, trigger your own jobs, or run reproduce to re-run a remote job on your local workstation. Follow the instructions above, replacing the service with

Cloud storage credentials

Authentication to Google Cloud Storage is needed for a couple of reasons: uploading pixel test results to the cloud, and potentially uploading and downloading builds as well, at least in Debug mode. Use the copy of gsutil in depot_tools/third_party/gsutil/gsutil, and follow the Google Cloud Storage instructions to authenticate. You must use your email address and be a member of the Chrome GPU team in order to receive read-write access to the appropriate cloud storage buckets. Roughly:

  1. Run gsutil config
  2. Copy/paste the URL into your browser
  3. Log in with your account
  4. Allow the app to access the information it requests
  5. Copy-paste the resulting key back into your Terminal
  6. Press “enter” when prompted for a project-id (i.e., leave it empty)

At this point you should be able to write to the cloud storage bucket.

Navigate to to view the contents of the cloud storage bucket.