This page documents schemas used for tasks and bots in the DB.
The task description core has to be read in order like a story in 4 parts; each block depends on the previous ones:
The scheduling optimisation is done via:
This schema is an example of a task.
Note: Entities marked with an asterisk * may not be stored in certain situations, such as deduplicated tasks, tasks that didn't run due to an internal failure, or tasks with no secret bytes provided (for SecretBytes).
    +--------Root------------------------------------------------------+
    |TaskRequest                                                       |
    |  +--------------+      +----------------+     +----------------+ |
    |  |TaskProperties|      |TaskSlice       |     |TaskSlice       | |
    |  |  +--------+  |      |+--------------+| ... |+--------------+| |
    |  |  |FilesRef|  | *or* ||TaskProperties|| ... ||TaskProperties|| |
    |  |  +--------+  |      |+--------------+|     |+--------------+| |
    |  +--------------+      +----------------+     +----------------+ |
    |id=<based on epoch>                                               |
    +------------------------------------------------------------------+
        |
        |                                                   task_request.py
        |
        +------+
        |      |
        |      v
        |  +-----------+
        |  |SecretBytes|*                                   task_request.py
        |  |id=1       |
        |  +-----------+
        |
        +------+
        |      |
        |      v
        |  +--------------+      +--------------+
        |  |TaskToRunShard|* ... |TaskToRunShard|*          task_to_run.py
        |  |id=<composite>|  ... |id=<composite>|
        |  +--------------+      +--------------+
        |
        v
    +-----------------+
    |TaskResultSummary|                                     task_result.py
    |  +--------+     |
    |  |FilesRef|     |
    |  +--------+     |
    |id=1             |
    +-----------------+
        |
        |
        |
        v
    +-------------+
    |TaskRunResult|*                                        task_result.py
    |  +--------+ |
    |  |FilesRef| |
    |  +--------+ |
    |id=1         |
    +-------------+
        |
        +----------------------+
        |                      |
        v                      v
    +-----------------+   +----------------+
    |TaskOutput       |*  |PerformanceStats|*               task_result.py
    |id=1 (not stored)|   |id=1            |
    +-----------------+   +----------------+
        |
        +------------ ... ----+
        |                     |
        v                     v
    +---------------+      +---------------+
    |TaskOutputChunk|* ... |TaskOutputChunk|*               task_result.py
    |id=1           |  ... |id=N           |
    +---------------+      +---------------+
This schema enables fast task scheduling.
    +-------Root------------+
    |TaskDimensionsRoot     |  (not stored)                 task_queues.py
    |id=<pool:foo or id:foo>|
    +-----------------------+
        |
        +---------------- ... -------+
        |                            |
        v                            v
    +----------------------+     +----------------------+
    |TaskDimensions        | ... |TaskDimensions        |   task_queues.py
    |  +-----------------+ | ... |  +-----------------+ |
    |  |TaskDimensionsSet| |     |  |TaskDimensionsSet| |
    |  +-----------------+ |     |  +-----------------+ |
    |id=<dimension_hash>   |     |id=<dimension_hash>   |
    +----------------------+     +----------------------+
Bot activity generates entities that keep a trace of all the events happening on the bot. A cache is kept in BotInfo so that APIs can query for all active bots.
This schema covers the audit trail of bot events.
                     +-----------+
                     |BotRoot    |                          bot_management.py
                     |id=<bot_id>|
                     +-----------+
                          |
                   +------+--------------+
                   |      |              |
                   |      v              v
                   |  +-----------+  +-------+
                   |  |BotSettings|  |BotInfo|              bot_management.py
                   |  |id=settings|  |id=info|
                   |  +-----------+  +-------+
                   |
            +------+-----------+----- ... ----+
            |                  |              |
            v                  v              v
        +--------+         +--------+     +--------+
        |BotEvent|         |BotEvent| ... |BotEvent|        bot_management.py
        |id=fffff|         |id=ffffe| ... |id=00000|
        +--------+         +--------+     +--------+

    +--------Root---------+
    |DimensionAggregation |                                 bot_management.py
    |id=<all or pool name>|
    +---------------------+
AppEngine's automatic key numbering is never used. The entities are directly created with predefined keys so entity sharding can be tightly controlled to reduce DB contention.
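As an illustration of predefined keys, the sketch below builds explicit (kind, id) key paths for the entities in the task schema. It assumes each arrow in the task diagram denotes a parent/child (ancestor) relationship. The function name `task_key_paths` and the plain-tuple representation are hypothetical and do not mirror the actual datastore API; `request_id` stands in for the epoch-based TaskRequest id, whose exact derivation is not described here.

```python
from typing import Dict, List, Tuple

# A key path is an ordered list of (kind, id) pairs, root first.
KeyPath = List[Tuple[str, int]]


def task_key_paths(request_id: int, n_chunks: int) -> Dict[str, object]:
    """Returns explicit key paths for one task's entity tree.

    Every id is chosen by the caller or fixed by convention (id=1);
    nothing relies on automatic key numbering.
    """
    request: KeyPath = [('TaskRequest', request_id)]
    summary = request + [('TaskResultSummary', 1)]
    run_result = summary + [('TaskRunResult', 1)]
    # TaskOutput only serves as a parent key; the entity itself is not stored.
    output = run_result + [('TaskOutput', 1)]
    return {
        'request': request,
        'secret_bytes': request + [('SecretBytes', 1)],
        'summary': summary,
        'run_result': run_result,
        'performance_stats': run_result + [('PerformanceStats', 1)],
        # Output chunks are numbered 1..N under the TaskOutput key.
        'output_chunks': [
            output + [('TaskOutputChunk', i)] for i in range(1, n_chunks + 1)
        ],
    }
```

Because every key in the tree follows from the request id alone, any entity can be addressed directly without a query.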
The dimensions_hash value is calculated as an int32 from the TaskRequest.properties.dimensions dictionary.
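One plausible way to derive a 32-bit integer from a dimensions dictionary is sketched below. The canonical JSON serialization, the SHA-256 digest, and the unsigned little-endian interpretation are all assumptions made for illustration; the function name `hash_dimensions` is hypothetical here and the real calculation in task_queues.py may differ.

```python
import hashlib
import json
import struct


def hash_dimensions(dimensions: dict) -> int:
    """Maps a dimensions dict to an integer that fits in 32 bits."""
    # Serialize canonically (sorted keys, compact separators) so that equal
    # dictionaries always produce the same bytes regardless of key order.
    data = json.dumps(dimensions, sort_keys=True, separators=(',', ':'))
    digest = hashlib.sha256(data.encode('utf-8')).digest()
    # Interpret the first 4 bytes as an unsigned little-endian 32-bit integer.
    return struct.unpack('<L', digest[:4])[0]
```

With this sketch, `hash_dimensions({'pool': ['default'], 'os': ['Linux']})` returns the same value whichever order the keys are given in, which is the property needed for the hash to serve as a stable entity id.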