This document outlines how to debug a test failure on a specific builder configuration without needing to repeatedly upload new CL revisions or do CQ dry runs.
Swarming is a system operated by the infra team that schedules and runs tasks under a specific set of constraints, like “this must run on a macOS 10.13 host” or “this must run on a host with an Intel GPU”. It is somewhat similar to part of Borg, or to Kubernetes.
An isolate is an archive containing all the files needed to do a specific task on the swarming infrastructure. It contains binaries as well as any libraries they link against or support data. An isolate can be thought of like a tarball, but held by the “isolate server” and identified by a hash of its contents. The isolate also includes the command(s) to run, which is why the command is specified when building the isolate, not when executing it.
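To make the "identified by a hash of its contents" idea concrete, here is a minimal sketch of content addressing. It is not the isolate server's actual implementation: the `content_digest` helper and the file names are hypothetical, and the choice of SHA-1 is an assumption made only for illustration.

```python
import hashlib


def content_digest(files):
    """Return a hex digest that depends only on file names and contents.

    `files` maps a relative path to its bytes. Hypothetical helper: the real
    isolate format is more involved; this only illustrates content addressing.
    """
    h = hashlib.sha1()  # Assumption: SHA-1, for illustration only.
    for path in sorted(files):  # Sort so that insertion order does not matter.
        data = files[path]
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(str(len(data)).encode("ascii"))
        h.update(b"\0")
        h.update(data)
    return h.hexdigest()


a = content_digest({"unit_tests": b"binary", "data/x.pak": b"resources"})
b = content_digest({"data/x.pak": b"resources", "unit_tests": b"binary"})
c = content_digest({"unit_tests": b"binary2", "data/x.pak": b"resources"})
print(a == b)  # True: the same contents give the same identifier.
print(a == c)  # False: changing any byte gives a different identifier.
```

Because the identifier is derived from the contents, two people who archive identical files would refer to the same archive, and an archive cannot change without its identifier changing too.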
Normally, when you do a CQ dry run, something like this happens:
for type in builders_to_run:
  targets = compute_targets_for(type)
  isolates = use_swarming_to_build(type, targets) # uploads isolates for targets
  wait_for_swarming_to_be_done()

  for isolate in isolates:
    use_swarming_to_run(type, isolate) # downloads isolates onto the bots used
  wait_for_swarming_to_be_done()
When you do a CQ retry on a specific set of bots, that simply constrains builders_to_run in the pseudocode above. However, if you're trying to rerun a specific target on a specific bot, because you're trying to reproduce a failure or debug, doing a CQ retry will still waste a lot of time: the retry will still build and run all targets, even if it's only for one bot.
Fortunately, you can manually invoke some steps of this process. What you really want to do is:
isolate = use_swarming_to_build(type, target) # can't do this yet, see below
use_swarming_to_run(type, isolate)
or perhaps:
isolate = upload_to_isolate_server(target_you_built_locally)
use_swarming_to_run(type, isolate)
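The difference in cost between a CQ retry and the manual flow can be sketched as a toy simulation. Nothing here talks to swarming: the builder names and target lists are hypothetical, and `cq_run` and `manual_run` only count the build and run steps each approach would perform.

```python
# Hypothetical builders and targets, chosen only for illustration.
TARGETS = {
    "linux-rel": ["unit_tests", "browser_tests", "content_unittests"],
    "mac-rel": ["unit_tests", "browser_tests"],
}


def cq_run(builders_to_run):
    """List the steps a CQ dry run or retry performs for the given builders."""
    steps = []
    for builder in builders_to_run:
        # Every target is built, then every target is run.
        for target in TARGETS[builder]:
            steps.append(("build", builder, target))
        for target in TARGETS[builder]:
            steps.append(("run", builder, target))
    return steps


def manual_run(builder, target):
    """List the steps of the manual flow: one target on one kind of bot."""
    return [("build", builder, target), ("run", builder, target)]


full = cq_run(["linux-rel", "mac-rel"])
retry = cq_run(["linux-rel"])  # A retry only constrains builders_to_run.
manual = manual_run("linux-rel", "unit_tests")
print(len(full), len(retry), len(manual))  # 10 6 2
```

Constraining the builders shrinks the outer loop only; the manual flow is the one that skips the targets you do not care about.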
At the moment, you can only build an isolate locally, like so (commands you type begin with $):
$ tools/mb/mb.py isolate //$outdir $target
This will produce some files in $outdir. The most pertinent two are $outdir/$target.isolate and $outdir/$target.isolated.
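If you script this step, a small helper can assemble the command and predict the two output paths. This is a sketch only: `mb_isolate_command` and `isolate_outputs` are hypothetical helpers, and `out/Release` is just an example output directory.

```python
import posixpath


def mb_isolate_command(outdir, target):
    """Return the argv for the `mb.py isolate` invocation shown above."""
    return ["tools/mb/mb.py", "isolate", "//" + outdir, target]


def isolate_outputs(outdir, target):
    """Return the paths of the .isolate and .isolated files for `target`."""
    base = posixpath.join(outdir, target)
    return base + ".isolate", base + ".isolated"


# Example values only; substitute your own output directory and target.
print(" ".join(mb_isolate_command("out/Release", "unit_tests")))
print(isolate_outputs("out/Release", "unit_tests"))
```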
Support for building an isolate using swarming, which would allow you to build for a platform you can't build for locally, does not yet exist.
You can then upload the resulting isolate to the isolate server:
$ tools/swarming_client/isolate.py archive \
    -I https://isolateserver.appspot.com \
    -i $outdir/$target.isolate \
    -s $outdir/$target.isolated
You may need to log in to https://isolateserver.appspot.com to do this:
$ python tools/swarming_client/auth.py login \
    --service=https://isolateserver.appspot.com
The isolate.py tool will emit something like this:
e625130b712096e3908266252c8cd779d7f442f1 unit_tests
Do not press Ctrl-C after it prints this, even if it seems to hang for a minute.
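If you capture the tool's output in a script, the hash can be pulled out with a small parser. This is a sketch under one assumption: that each result line has the form "<40 hex characters> <target name>", as in the sample above. `parse_archive_output` is a hypothetical helper, not part of the swarming client.

```python
import re

# Assumption: result lines look like "<40 hex chars> <target name>", matching
# the sample output above. Any other lines the tool prints are ignored.
_RESULT_LINE = re.compile(r"^([0-9a-f]{40})\s+(\S+)$")


def parse_archive_output(text):
    """Map each target name to its isolate hash."""
    hashes = {}
    for line in text.splitlines():
        match = _RESULT_LINE.match(line.strip())
        if match:
            digest, target = match.groups()
            hashes[target] = digest
    return hashes


sample = "e625130b712096e3908266252c8cd779d7f442f1 unit_tests\n"
print(parse_archive_output(sample)["unit_tests"])
```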
Now that the isolate is on the isolate server with hash $hash from the previous step, you can run on bots of your choice:
$ tools/swarming_client/swarming.py trigger \
    -S https://chromium-swarm.appspot.com \
    -I https://isolateserver.appspot.com \
    -d pool $pool \
    $criteria \
    -s $hash
There are two more things you need to fill in here. The first is the pool name; you should pick “Chrome” unless you know otherwise. The pool is the collection of hosts from which swarming will try to pick bots to run your tasks.
The second is the criteria, which is how you specify which bot(s) you want your task scheduled on. These are expressed as “dimensions”, which are passed with -d key val or --dimension=key val. In fact, the -d pool $pool in the command above is selecting based on the “pool” dimension. There are a lot of possible dimensions; one useful one is “os”, like -d os Linux. Examples of other dimensions include:
* -d os Mac10.13.6 to select a specific OS version
* -d device_type "Pixel 3" to select a specific Android device type
* -d gpu 8086:1912 to select a specific GPU

The swarming bot list allows you to see all the dimensions and the values they can take on.
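The mapping from dimensions to command-line flags can be sketched as follows. `dimension_flags` and `trigger_command` are hypothetical helpers that only assemble the argument list for the trigger command shown earlier; they do not run it, and the requirement that a pool be present is a choice made by this sketch because the command above always passes one.

```python
def dimension_flags(dimensions):
    """Turn {"os": "Linux"} into ["-d", "os", "Linux"], sorted by key."""
    flags = []
    for key in sorted(dimensions):
        flags += ["-d", key, dimensions[key]]
    return flags


def trigger_command(isolate_hash, dimensions):
    """Build the argv for `swarming.py trigger` with the given dimensions."""
    if "pool" not in dimensions:
        # This sketch insists on a pool, mirroring the command above.
        raise ValueError("a 'pool' dimension is required")
    return [
        "tools/swarming_client/swarming.py", "trigger",
        "-S", "https://chromium-swarm.appspot.com",
        "-I", "https://isolateserver.appspot.com",
        *dimension_flags(dimensions),
        "-s", isolate_hash,
    ]


cmd = trigger_command(
    "e625130b712096e3908266252c8cd779d7f442f1",
    {"pool": "Chrome", "device_type": "Pixel 3"},
)
print(cmd)
```

Keeping the command as a list of arguments, rather than one shell string, means a value containing a space such as "Pixel 3" needs no extra quoting if the list is later passed to `subprocess.run`.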
When you invoke swarming.py trigger, it will emit two pieces of information: a URL for the task it created, and a command you can run to collect the results of that task. For example:
Triggered task: ellyjones@chromium.org/os=Linux_pool=Chrome/e625130b712096e3908266252c8cd779d7f442f1
To collect results, use:
  tools/swarming_client/swarming.py collect -S https://chromium-swarm.appspot.com 46fc393777163310
Or visit:
  https://chromium-swarm.appspot.com/user/task/46fc393777163310
The ‘collect’ command given there will block until the task is complete, then produce the task's results. Alternatively, you can load that URL and watch the task's progress.
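When triggering tasks from a script, the task ID can be recovered from the trigger output and turned back into a collect command. This is a sketch that assumes the output contains a ".../user/task/<id>" URL as in the sample above; `parse_trigger_output` and `collect_command` are hypothetical helpers.

```python
import re

# Assumption: the task ID is the last path component of the "/user/task/" URL
# printed by `swarming.py trigger`, as in the sample output above.
_TASK_URL = re.compile(r"(https://\S+)/user/task/([0-9a-f]+)")


def parse_trigger_output(text):
    """Return (server, task_id) from trigger output, or None if not found."""
    match = _TASK_URL.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def collect_command(server, task_id):
    """Build the argv for the `swarming.py collect` command shown above."""
    return [
        "tools/swarming_client/swarming.py", "collect",
        "-S", server,
        task_id,
    ]


sample = (
    "To collect results, use:\n"
    "  tools/swarming_client/swarming.py collect"
    " -S https://chromium-swarm.appspot.com 46fc393777163310\n"
    "Or visit:\n"
    "  https://chromium-swarm.appspot.com/user/task/46fc393777163310\n"
)
server, task_id = parse_trigger_output(sample)
print(collect_command(server, task_id))
```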
A lot of this logic is wrapped up in tools/run-swarmed.py, which you can run like this:
$ tools/run-swarmed.py -t $target --out-dir=$outdir
See the --help option of run-swarmed.py for more details about that script.
If you are looking at a Swarming task page, be sure to check the bottom of the page, which gives you commands to: