Important Swarming concepts are Tasks and Bots. A task is a step that has inputs and generates outputs. A bot can run tasks. Simple, right?
There are 3 important classes of data:
The task request (and the task properties that are part of it) is set when the task is created and does not change.
A task is referenced by its task ID. It looks like a hex number but should be treated as a string.
A Swarming task is conceptually a function that has some input and generates outputs. The process can be simplified as F(i, c, b) where F() is the Swarming task, i is the input files, c the command execution environment, b the bot selection (dimensions) description.
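As a rough illustration of the F(i, c, b) view, the three arguments can be modeled as one immutable record. This is only a sketch: the class name, field names and the sample values (including the `cpu:x86-64` dimension) are hypothetical and are not the Swarming API.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class TaskProperties:
    """Hypothetical record holding the arguments of F(i, c, b)."""
    input_files: str              # i: a reference to the input files
    command: Tuple[str, ...]      # c: the command execution environment (only the command here)
    dimensions: Tuple[str, ...]   # b: the bot selection, as key:value strings


props = TaskProperties(
    input_files="<RBE-CAS digest>",
    command=("python", "run_tests.py"),
    dimensions=("os:Windows-7-SP1", "cpu:x86-64"),
)
```

The record is frozen to mirror the fact that task properties do not change once the task is created; assigning to a field raises `FrozenInstanceError`.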
Inputs can be a mix of all 4 of:
Command execution environment is defined as:
Bot selection is defined as:
key:value dimensions. This is the core bot selection mechanism to select which bots are allowed to run the task. It also supports OR, like key:value1|value2.
The dimensions are important. For example, be clear upfront if you assume an Intel processor or a specific OS distribution version, e.g. Windows-7-SP1 vs Windows-Vista-SP2.
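The matching rule described above can be sketched as follows. This is a simplified model, not the server's implementation: it assumes that every requested dimension must be satisfied, that a bot may advertise several values per key, and that `bot_matches` is a hypothetical helper name.

```python
from typing import Dict, List, Set


def bot_matches(requested: List[str], bot_dimensions: Dict[str, Set[str]]) -> bool:
    """Return True if the bot satisfies every requested dimension.

    Each requested entry is "key:value" or "key:value1|value2" (OR).
    """
    for entry in requested:
        # Split on the first colon only, so values may contain colons.
        key, _, values = entry.partition(":")
        allowed = set(values.split("|"))
        # The bot must advertise at least one of the allowed values.
        if not allowed & bot_dimensions.get(key, set()):
            return False
    return True


# Hypothetical bot description used for illustration.
bot = {"os": {"Windows", "Windows-7-SP1"}, "cpu": {"x86-64"}}
```

With this bot, a request for `os:Windows-Vista-SP2|Windows-7-SP1` matches, while a request for `os:Windows-Vista-SP2` alone does not.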
Idempotency is a mechanism to improve the efficiency of the infrastructure. When a task is requested from the server and a task with the exact same properties has previously succeeded on that Swarming server, the previous results can be returned as-is without ever running the task.
Not running anything is always faster than running something. This saves a lot of time and infrastructure usage.
To effectively leverage idempotency, it is important for the input files to be as “stable” as possible. For examples of such efforts, see Debian's reproducible builds initiative and Chromium's deterministic builds effort.
To enable this feature, a task must be declared as idempotent. This tells the server that this request fits the contract that the task implements a pure function: same inputs always produce same outputs. Results of execution of such tasks can be reused.
For a task to be idempotent, it must depend on nothing other than the task inputs and the declared environment. This means the dimensions must uniquely describe the type of bot required: the exact OS version and any other detail that can affect the task output.
Other things of note are:
${ISOLATED_OUTDIR}.
run_isolated.py keeps a local content addressed cache.
If any of the rules above does not hold, the task must not be marked as idempotent since it is not reproducible by definition.
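The result reuse described in this section can be sketched with an in-memory store keyed by a hash of the task properties. Everything here is an assumption made for illustration (the `request_task` and `properties_key` names, the JSON plus SHA-256 key, and storing only results of tasks declared idempotent); it is not how the Swarming server is implemented.

```python
import hashlib
import json
from typing import Callable, Dict

# Hypothetical in-memory store of successful results, keyed by properties hash.
_results: Dict[str, dict] = {}


def properties_key(properties: dict) -> str:
    """Hash the task properties in a stable, order-independent way."""
    blob = json.dumps(properties, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def request_task(properties: dict, idempotent: bool,
                 run: Callable[[dict], dict]) -> dict:
    """Run the task, or reuse an earlier successful result when allowed."""
    key = properties_key(properties)
    if idempotent and key in _results:
        return _results[key]  # nothing runs at all
    result = run(properties)
    if idempotent and result.get("success"):
        _results[key] = result  # only successful results are reusable
    return result
```

A second idempotent request with identical properties returns the stored result without calling `run`, while a request that is not declared idempotent always runs.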
The request is the metadata around the task properties requested. This defines:
EXPIRED.
The result is a collection of:
Once the task is completed, results become immutable.
The result can also be a non-event: the task wasn't run at all. This results in an EXPIRED task. This happens when there was no bot available to run the task before the expiration delay.
An exceptional event can be BOT_DIED. This means that either the bot was lost while the task ran or that the server had an internal failure during the task execution.
To understand how a bot behaves, see Bot.md. This section looks at the bot from the point of view of running a task.
Swarming tasks normally run an isolated tree directly via run_isolated.py.
Swarming's design is inspired by Google's internal test distribution mechanism. As such, it has a few assumptions baked in. A task shall:
/tmp or %TEMP% for files that are irrelevant after the task execution.
${ISOLATED_OUTDIR} for files that are the output of this task.
Once the task has completed, results are uploaded back and the tree is associated with the task result.
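The two write locations above can be illustrated with a small task script. This is a sketch under stated assumptions: it assumes the task command line passes ${ISOLATED_OUTDIR} to the script as an argument, and the function name and file names are hypothetical.

```python
import os
import tempfile


def write_outputs(out_dir: str) -> str:
    """Write scratch data to the temp directory and the real output to out_dir.

    out_dir is assumed to be the expanded value of ${ISOLATED_OUTDIR}.
    """
    # Scratch files go under the system temp directory (/tmp or %TEMP%)
    # and are deleted when the block exits.
    with tempfile.TemporaryDirectory() as scratch:
        scratch_file = os.path.join(scratch, "intermediate.txt")
        with open(scratch_file, "w") as f:
            f.write("scratch data\n")
        with open(scratch_file) as f:
            data = f.read()

    # Files that are the output of the task go to the output directory.
    out_path = os.path.join(out_dir, "result.txt")
    with open(out_path, "w") as f:
        f.write(data.upper())
    return out_path
```

A script using this helper would typically call `write_outputs(sys.argv[1])`, with the command line built so that the first argument is ${ISOLATED_OUTDIR}.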
There are 4 ways to interact with Swarming:
The Web UI has 4 purposes:
bot_config.py and bootstrap.py.
swarming is the client to manage Swarming tasks at the command line.
Warning: This doc is bound to become out of date. Here's one weird trick: “swarming help” gives you all the help you need, so only a quick overview is given here.
Tasks can be triggered asynchronously by using trigger + collect. The general idea is that you trigger all the tests you want to run immediately, then collect the results.
Triggers a task and exits without waiting for it:
swarming trigger -server <host> -digest <RBE-CAS digest> -task-name <name>
<name> is the name you want to give to the task, like “base_unittests”.
Run swarming help trigger for more information.
Collects results for a previously triggered task. The results can be collected multiple times without problem until they are expired on the server.
swarming collect -server <host> <task id>
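For scripting the trigger + collect flow, the two command lines above can be assembled programmatically. The sketch below only builds argument lists from the flags shown in this section; the helper names are hypothetical, and how the task ID is obtained from the trigger step is not covered here, so it is taken as an input.

```python
from typing import List


def trigger_cmd(server: str, digest: str, task_name: str) -> List[str]:
    """Build the 'swarming trigger' command line shown above."""
    return [
        "swarming", "trigger",
        "-server", server,
        "-digest", digest,
        "-task-name", task_name,
    ]


def collect_cmd(server: str, task_id: str) -> List[str]:
    """Build the 'swarming collect' command line shown above.

    The task ID is kept as a string, even though it looks like a hex number.
    """
    return ["swarming", "collect", "-server", server, task_id]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. A caller would trigger every task first, then run one collect command per task ID.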
swarming bots returns state about the known bots. More APIs will be added, like returning tasks, once needed by someone. In the meantime the web frontend shows these.
The client tools are self-documenting. Use “swarming help” for more information.
The API is implemented via Cloud Endpoints v1. It can be browsed at: https://apis-explorer.appspot.com/apis-explorer/?base=https://chromium-swarm.appspot.com/_ah/api#p/swarming/v1/
Until the API is rewritten as proto files, each API struct description can be read at https://github.com/luci/luci-py/blob/master/appengine/swarming/swarming_rpcs.py