The script here manages the Docker containers on swarming bots. By design, it is stateless and event-driven. It currently has a single mode of execution, “launch”, which ensures that a given number of swarming containers is always running on the bot. On the bots, the script is called in this mode every 5 minutes via cron.
It's intended to be run in conjunction with the swarm_docker container image. More information on the image can be found here.
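As a rough illustration of what "stateless" means for the launch mode, the sketch below recomputes which containers are missing from scratch on every run, using only the desired count and the containers currently running. The function name, the `swarm` name prefix, and the numbering scheme are assumptions made for this example, not taken from the script.

```python
def containers_to_launch(desired_count, running_names, prefix="swarm"):
    """Return the names of containers that still need to be started.

    Nothing is remembered between runs: the result depends only on the
    desired count and on what is running right now. The naming scheme
    (prefix plus a 1-based index) is hypothetical.
    """
    running = set(running_names)
    wanted = ["%s%d" % (prefix, i) for i in range(1, desired_count + 1)]
    return [name for name in wanted if name not in running]


# Three containers wanted, two already up: only the missing one is returned.
print(containers_to_launch(3, ["swarm1", "swarm3"]))  # ['swarm2']
```

Because each run starts from the observed state, a missed or overlapping cron invocation does not leave stale bookkeeping behind.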
Every 5 minutes, this script is invoked with the launch argument. This is how the containers get spawned, shut down, and cleaned up. Essentially, it does the following:
To perform various forms of maintenance, the script gracefully shuts down containers and, by association, the swarming bot running inside them. It does this by sending SIGTERM to the swarming bot process, which tells the bot to quit at the next available opportunity (i.e., not during a test). To fetch the pid of the swarming bot, the script runs lsof on the swarming.lck file.
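The lookup-then-signal sequence can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names are invented, the location of swarming.lck is left to the caller, and whether lsof is run on the host or inside the container is not stated here. It relies on `lsof -t`, which prints bare pids, one per line.

```python
import os
import signal
import subprocess


def parse_lsof_pids(output):
    """Parse the output of `lsof -t`: one pid per line, nothing else."""
    return [int(token) for token in output.split() if token.isdigit()]


def find_lock_holders(lock_path):
    """Return the pids of processes that have lock_path open (hypothetical helper)."""
    # check=False: lsof exits non-zero when no process holds the file.
    result = subprocess.run(
        ["lsof", "-t", lock_path], capture_output=True, text=True, check=False)
    return parse_lsof_pids(result.stdout)


def request_graceful_shutdown(pid):
    """Send SIGTERM; the receiving process decides when it actually exits."""
    os.kill(pid, signal.SIGTERM)
```

A caller would pass the path of the lock file to `find_lock_holders` and then call `request_graceful_shutdown` on each pid returned. SIGTERM is used rather than SIGKILL because the receiving process can catch it and finish its current work first.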
When a new container is launched with an image that is missing from the bot's local cache, the image is pulled from the specified container registry. By default, this is the gcloud registry chops-public-images-prod. These images are world-readable, so no authentication with the registry is required. Additionally, when new images are fetched, any old and unused images still present on the machine are deleted.
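One way to think about the cleanup step is as a set difference: everything in the local cache, minus images still backing a container, minus the image that was just fetched. The sketch below assumes that reading of "old and unused"; the function name and the image tags are made up for illustration.

```python
def images_to_delete(local_images, images_in_use, current_image):
    """Return cached images that are neither in use nor the current image.

    Assumption for this sketch: an image is "old and unused" when no
    container references it and it is not the image just fetched.
    """
    keep = set(images_in_use) | {current_image}
    return sorted(image for image in local_images if image not in keep)


# Hypothetical tags: v1 is unreferenced, v2 still backs a container, v3 is new.
print(images_to_delete(["img:v1", "img:v2", "img:v3"], ["img:v2"], "img:v3"))
# ['img:v1']
```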
To prevent multiple simultaneous invocations of the script from stepping on each other, the parts of the script that interact with containers are wrapped in a mutex implemented with flock. Each container has its own lock file.
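A per-container flock mutex can be sketched with Python's `fcntl` module as shown below. The context manager name and the lock file path are hypothetical, and the choice of a non-blocking lock (fail immediately rather than wait) is an assumption of this sketch; the script may instead block or use a timeout.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def container_lock(lock_path):
    """Hold an exclusive flock on lock_path for the duration of the with-body."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB raises BlockingIOError if another invocation holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

With one lock file per container, a slow operation on one container blocks only concurrent work on that same container, and the kernel releases the lock automatically if the holding process dies.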
The script and its dependencies are deployed as a CIPD package via puppet. The package (infra/swarm_docker_tools/$platform) is continuously built on this bot. Puppet deploys it to the relevant bots at these revisions. The canary pin, which tracks the latest version in the repository, currently affects no bots on chromium-swarm-dev, so you can proceed to update the stable pin immediately after making a change.
The call site of the script is also defined in puppet.