A backend package for the GCE GAE app. It consists of independent, idempotent cron jobs that trigger independent, idempotent task queues, which attempt to move the real-world state of GCE instances closer to their configured state. The cron jobs and task queues are fault tolerant: failures do not generally cause inconsistent state, so the cron jobs can simply trigger the task queues again later. Transient failures such as datastore or network outages, insufficient permissions, or insufficient quota therefore affect the backend package only for as long as they remain unresolved. Once the issues are resolved, the backend package should recover without intervention.
A Config is a datastore entity representing a configured type of VM. Creation of Configs is outside the scope of the backend package. Configs are mutable and may be created, updated, or even deleted at any time and the backend package will react accordingly.
A VM is a datastore entity representing a single configured VM, derived from a Config. expandConfig is responsible for the derivation. VMs are mutable, but should only be modified by the backend package. To make changes to a VM, modify its corresponding Config and the backend package will propagate the changes. The Config:VM mapping is 1:n.
A GCE instance is a live virtual machine running in Google Compute Engine. An instance is created from a VM by createInstance. Instances are immutable. Changes made to a VM will only be reflected when creating a new instance. The VM:instance mapping is 1:1.
A Swarming bot is the Swarming server's view of a connected instance. Instances automatically register themselves as bots of a particular Swarming server outside the scope of the backend package. Bots may freely be terminated or deleted from the Swarming server and the backend package will react accordingly. The instance:bot mapping is 1:1.
The deadline is the time until which an instance may live. An instance's deadline is derived from the lifetime in the Config and the instance creation time. Once the deadline has passed, the backend package will attempt to replace the instance after it finishes its current Swarming workload. Replacing the instance is how changes to VMs are picked up, since instances are immutable.
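As a minimal sketch of the deadline rule, assuming the deadline is simply the creation time plus the configured lifetime: the VM type and its field names below are invented for illustration and are not the package's actual schema.

```go
package main

import "time"

// VM holds only the two values the deadline depends on. The field names are
// assumptions for this sketch, not the package's actual schema.
type VM struct {
	Created  time.Time     // when the instance was created
	Lifetime time.Duration // lifetime taken from the Config
}

// Deadline returns the moment after which the instance should be replaced.
func (vm VM) Deadline() time.Time {
	return vm.Created.Add(vm.Lifetime)
}

// PastDeadline reports whether the deadline has passed at time now.
func (vm VM) PastDeadline(now time.Time) bool {
	return now.After(vm.Deadline())
}
```

A caller would evaluate PastDeadline with the current time on every pass, so a missed check is picked up on the next one.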
A drained VM is one scheduled for deletion because the Config has been altered to have its number of VMs decreased. A drained VM will be deleted once its corresponding instance has been deleted. A drained Config is one scheduled for deletion by some external factor. All VMs of a drained Config will be drained. A drained Config will be deleted once its corresponding VMs have been deleted.
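The drain rules can be sketched as a few predicates. This is an illustration under stated assumptions, not the package's code: the Config and VM types, their fields, and the idea that VMs are numbered from zero so that a decrease drains the highest indices are all hypothetical.

```go
package main

// Config and VM are pared-down stand-ins. All field names are assumptions.
type Config struct {
	ID      string
	Amount  int  // configured number of VMs
	Drained bool // scheduled for deletion by some external factor
}

type VM struct {
	ConfigID    string
	Index       int  // assumed position of the VM within its Config
	Drained     bool // scheduled for deletion
	HasInstance bool // a GCE instance still exists for this VM
}

// shouldDrain reports whether vm should be drained. cfg is nil when the
// Config no longer exists.
func shouldDrain(cfg *Config, vm VM) bool {
	if cfg == nil || cfg.Drained {
		return true
	}
	// Assumption: lowering Amount drains the VMs with the highest indices.
	return vm.Index >= cfg.Amount
}

// canDeleteVM reports whether a drained VM may be deleted, which is only
// once its instance is gone.
func canDeleteVM(vm VM) bool {
	return vm.Drained && !vm.HasInstance
}

// canDeleteConfig reports whether a drained Config may be deleted, which is
// only once none of its VMs remain.
func canDeleteConfig(cfg Config, remainingVMs int) bool {
	return cfg.Drained && remainingVMs == 0
}
```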
All cron jobs operate on multiple entities, triggering task queues which operate on a particular entity. All cron jobs are idempotent.
expandConfigsAsync iterates over all Configs and triggers expandConfig for each.
createInstancesAsync iterates over all VMs which have no corresponding instance and triggers createInstance for each.
manageBotsAsync iterates over all VMs which do have a corresponding instance and triggers manageBot for each.
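The fan-out performed by the two VM-oriented cron jobs might look roughly like the following. It is a sketch only: the Task type, the helper names, and the use of an empty URL field to mean "no instance yet" are assumptions made for this example.

```go
package main

// VM is a pared-down stand-in. URL is an assumed field that stays empty
// until an instance has been recorded for the VM.
type VM struct {
	ID  string
	URL string
}

// Task names a task queue and the single entity it should process.
type Task struct {
	Queue string
	VMID  string
}

// createInstanceTasks is a hypothetical helper mirroring the selection made
// by createInstancesAsync: one task per VM without an instance.
func createInstanceTasks(vms []VM) []Task {
	var tasks []Task
	for _, vm := range vms {
		if vm.URL == "" {
			tasks = append(tasks, Task{Queue: "createInstance", VMID: vm.ID})
		}
	}
	return tasks
}

// manageBotTasks is a hypothetical helper mirroring the selection made by
// manageBotsAsync: one task per VM that does have an instance.
func manageBotTasks(vms []VM) []Task {
	var tasks []Task
	for _, vm := range vms {
		if vm.URL != "" {
			tasks = append(tasks, Task{Queue: "manageBot", VMID: vm.ID})
		}
	}
	return tasks
}
```

Because each helper derives its tasks purely from the current entities, running it again after a failure produces the same tasks, which is what makes the cron jobs safe to repeat.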
All task queues are triggered with a particular entity to process. All task queues are idempotent.
expandConfig receives a single Config to expand. It checks how many VMs the Config declares and triggers createVM for each.
createVM receives a single VM to create. It creates the VM if it doesn't exist.
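The idempotence of expandConfig and createVM can be illustrated with an in-memory map standing in for the datastore. Everything here is a simplified assumption: the types, the "config-index" ID scheme, and the sketch function names are invented for this example and are not the package's API.

```go
package main

import "fmt"

// Config and VM are pared-down stand-ins with assumed field names.
type Config struct {
	ID     string
	Amount int // number of VMs the Config declares
}

type VM struct {
	ID     string
	Config string
	Index  int
}

// datastore is an in-memory substitute for the real datastore.
type datastore map[string]VM

// sketchCreateVM stores vm only if it does not exist yet and reports whether
// it created anything.
func sketchCreateVM(ds datastore, vm VM) bool {
	if _, ok := ds[vm.ID]; ok {
		return false
	}
	ds[vm.ID] = vm
	return true
}

// sketchExpandConfig derives one VM per declared slot and returns how many
// were newly created. The ID scheme is an assumption.
func sketchExpandConfig(ds datastore, cfg Config) int {
	created := 0
	for i := 0; i < cfg.Amount; i++ {
		vm := VM{ID: fmt.Sprintf("%s-%d", cfg.ID, i), Config: cfg.ID, Index: i}
		if sketchCreateVM(ds, vm) {
			created++
		}
	}
	return created
}
```

Calling sketchExpandConfig twice with the same Config leaves the store unchanged the second time, since every derived VM already exists.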
createInstance receives a single VM to create an instance for and attempts to idempotently create it. Instance creation in GCE is asynchronous, so the backend package calls createInstance repeatedly until the instance is detected as created, and then records it. If creation has already started for a drained VM it is allowed to complete, but no new creation is started in GCE for a drained VM.
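One way to picture the repeated createInstance calls is as a step function that is safe to invoke any number of times. The sketch below uses a hypothetical in-memory cloud type in place of the GCE API, and the VM fields are assumptions. It only illustrates the retry and drain behavior described above.

```go
package main

// VM is a pared-down stand-in with assumed field names.
type VM struct {
	Hostname string
	Drained  bool
	Created  bool // set once the instance has been observed and recorded
}

// cloud is a hypothetical stand-in for the GCE API.
type cloud struct {
	requested map[string]bool // creation requests already issued
	ready     map[string]bool // instances that have finished creating
}

func newCloud() *cloud {
	return &cloud{requested: map[string]bool{}, ready: map[string]bool{}}
}

// createInstanceStep performs one attempt and is meant to be called
// repeatedly until vm.Created is true.
func createInstanceStep(c *cloud, vm *VM) {
	if vm.Created {
		return // already recorded, nothing to do
	}
	if c.ready[vm.Hostname] {
		vm.Created = true // creation finished, record it
		return
	}
	if c.requested[vm.Hostname] {
		return // creation in progress, check again on the next call
	}
	if vm.Drained {
		return // never start a new creation for a drained VM
	}
	c.requested[vm.Hostname] = true
}
```

Note that the drained check comes after the in-progress checks, so a creation that started before the VM was drained is still completed and recorded.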
manageBot receives a single VM to manage a bot for. It first checks whether the Config referenced by the VM no longer exists or no longer references the given VM, and drains the VM if it isn't already drained. Next, it watches the Swarming server for changes in the bot's state and reacts accordingly. If Swarming reports that the bot has died or been deleted or terminated, it triggers destroyInstance. If the VM's deadline has been exceeded or the VM is drained, it triggers terminateBot.
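The decision manageBot makes after the drain check can be summarized as a small pure function. The state and action names below are invented for this sketch, and it assumes that a bot which is already gone takes priority over the deadline and drain checks.

```go
package main

// botState is a hypothetical summary of what Swarming reports for a bot.
type botState int

const (
	botAlive botState = iota
	botDead
	botDeleted
	botTerminated
)

// action is the follow-up task manageBot would trigger, if any.
type action int

const (
	actionNone action = iota
	actionDestroyInstance
	actionTerminateBot
)

// nextAction picks the follow-up for a VM given the bot's reported state,
// whether the VM's deadline has passed, and whether the VM is drained.
func nextAction(state botState, pastDeadline, drained bool) action {
	switch state {
	case botDead, botDeleted, botTerminated:
		// The bot is gone, so the instance can be destroyed.
		return actionDestroyInstance
	}
	if pastDeadline || drained {
		// Ask Swarming to terminate the bot; a later call will observe
		// the terminated state and destroy the instance.
		return actionTerminateBot
	}
	return actionNone
}
```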
destroyInstance receives a single VM to destroy the created instance for and attempts to idempotently destroy it. Instance deletion in GCE is asynchronous, so the backend package calls destroyInstance repeatedly until it's detected as destroyed and then triggers deleteBot.
deleteBot receives a single VM to delete the bot for. Bot deletion in Swarming is synchronous, so the result is recorded immediately by deleting the VM.
terminateBot receives a single VM to terminate the bot for and attempts to terminate it. Termination in Swarming is asynchronous, so the backend package calls manageBot repeatedly until it's detected as terminated.