This page documents schemas used for tasks and bots in the DB.
The task description core has to be read in order like a story in 4 parts; each block depends on the previous ones:
The scheduling optimisation is done via:
This schema is an example of a task.
Note: Entities marked with an asterisk * may not be stored in certain situations, like for deduplicated tasks, tasks that didn't run due to internal failure, or tasks with no secret bytes provided (for SecretBytes).
    +--------Root------------------------------------------------------+
    |TaskRequest                                                       |
    |  +--------------+      +----------------+     +----------------+ |
    |  |TaskProperties|      |TaskSlice       |     |TaskSlice       | |
    |  |  +--------+  |      |+--------------+| ... |+--------------+| |
    |  |  |FilesRef|  | *or* ||TaskProperties|| ... ||TaskProperties|| |
    |  |  +--------+  |      |+--------------+|     |+--------------+| |
    |  +--------------+      +----------------+     +----------------+ |
    |id=<based on epoch>                                               |
    +------------------------------------------------------------------+
        |                                                       task_request.py
        |
        +------+
        |      |
        |      v
        |  +-----------+
        |  |SecretBytes|*                                       task_request.py
        |  |id=1       |
        |  +-----------+
        |
        +------+
        |      |
        |      v
        |  +--------------+      +--------------+
        |  |TaskToRunShard|* ... |TaskToRunShard|*              task_to_run.py
        |  |id=<composite>|  ... |id=<composite>|
        |  +--------------+      +--------------+
        |
        v
    +-----------------+
    |TaskResultSummary|                                         task_result.py
    |  +--------+     |
    |  |FilesRef|     |
    |  +--------+     |
    |id=1             |
    +-----------------+
        |
        |
        |
        v
    +-------------+
    |TaskRunResult|*                                            task_result.py
    |  +--------+ |
    |  |FilesRef| |
    |  +--------+ |
    |id=1         |
    +-------------+
        |
        +----------------------+
        |                      |
        v                      v
    +-----------------+    +----------------+
    |TaskOutput       |*   |PerformanceStats|*                  task_result.py
    |id=1 (not stored)|    |id=1            |
    +-----------------+    +----------------+
        |
        +------------ ... ----+
        |                     |
        v                     v
    +---------------+      +---------------+
    |TaskOutputChunk|* ... |TaskOutputChunk|*                   task_result.py
    |id=1           |  ... |id=N           |
    +---------------+      +---------------+
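
The parent/child layout in the task diagram can be read as a set of key paths hanging off a single TaskRequest root. The snippet below is a minimal sketch in plain Python, not the datastore API: the function name `task_key_paths` and the list-of-tuples representation are hypothetical, the request id is passed in as an opaque integer because the diagram only says it is based on the epoch, and TaskToRunShard is left out because its composite id is not described here.

```python
from typing import Dict, List, Tuple

# A key path is a list of (kind, id) pairs, root first. This mirrors the
# diagram only; it is not how a datastore client represents keys.
KeyPath = List[Tuple[str, int]]


def task_key_paths(request_id: int, num_chunks: int) -> Dict[str, KeyPath]:
    """Builds illustrative key paths for the entities under one TaskRequest."""
    root: KeyPath = [("TaskRequest", request_id)]
    summary = root + [("TaskResultSummary", 1)]
    run_result = summary + [("TaskRunResult", 1)]
    # TaskOutput is only a parent for the chunks; it is not stored itself.
    output = run_result + [("TaskOutput", 1)]
    paths = {
        "TaskRequest": root,
        "SecretBytes": root + [("SecretBytes", 1)],
        "TaskResultSummary": summary,
        "TaskRunResult": run_result,
        "PerformanceStats": run_result + [("PerformanceStats", 1)],
        "TaskOutput": output,
    }
    # Output chunks are numbered 1..N under the TaskOutput parent.
    for i in range(1, num_chunks + 1):
        paths["TaskOutputChunk%d" % i] = output + [("TaskOutputChunk", i)]
    return paths
```

Because every child id is fixed (1, or 1..N for chunks), any entity belonging to a task can be addressed from the request id alone, without a query.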
This schema enables fast task scheduling.
    +-------Root------------+
    |TaskDimensionsRoot     | (not stored)                   task_queues.py
    |id=<pool:foo or id:foo>|
    +-----------------------+
        |
        +---------------- ... -------+
        |                            |
        v                            v
    +----------------------+     +----------------------+
    |TaskDimensions        | ... |TaskDimensions        |    task_queues.py
    |  +-----------------+ | ... |  +-----------------+ |
    |  |TaskDimensionsSet| |     |  |TaskDimensionsSet| |
    |  +-----------------+ |     |  +-----------------+ |
    |id=<dimension_hash>   |     |id=<dimension_hash>   |
    +----------------------+     +----------------------+
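
The root id in the scheduling diagram takes one of two forms, `pool:foo` or `id:foo`. As a rough illustration of how such an id could be chosen, the sketch below assumes that dimensions are a dictionary mapping a key to a list of string values, and that a bot `id` dimension takes precedence over `pool`. Both the function name `dimensions_root_id` and the precedence rule are assumptions made for this example; they are not stated in this page.

```python
from typing import Dict, List


def dimensions_root_id(dimensions: Dict[str, List[str]]) -> str:
    """Returns an 'id:<value>' or 'pool:<value>' root id (illustrative only)."""
    # Assumption: a request pinned to one specific bot is grouped by bot id,
    # everything else is grouped by pool.
    if dimensions.get("id"):
        return "id:" + dimensions["id"][0]
    if dimensions.get("pool"):
        return "pool:" + dimensions["pool"][0]
    raise ValueError("dimensions need an 'id' or a 'pool' value")
```

For example, `dimensions_root_id({"pool": ["foo"], "os": ["Linux"]})` yields `pool:foo` under these assumptions.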
Bot activity generates entities that keep a trace of every event happening on the bot. A cache is kept in BotInfo so that the APIs can query for all active bots.
This schema covers the audit trail of bot events.
                +-----------+
                |BotRoot    |                         bot_management.py
                |id=<bot_id>|
                +-----------+
                      |
               +------+--------------+
               |      |              |
               |      v              v
               |  +-----------+  +-------+
               |  |BotSettings|  |BotInfo|            bot_management.py
               |  |id=settings|  |id=info|
               |  +-----------+  +-------+
               |
        +------+-----------+----- ... ----+
        |                  |              |
        v                  v              v
    +--------+         +--------+     +--------+
    |BotEvent|         |BotEvent| ... |BotEvent|      bot_management.py
    |id=fffff|         |id=ffffe| ... |id=00000|
    +--------+         +--------+     +--------+
AppEngine's automatic key numbering is never used. The entities are directly created with predefined keys so entity sharding can be tightly controlled to reduce DB contention.
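
To make the idea of predefined keys concrete, here is a minimal sketch of an epoch-based integer id of the kind hinted at by `id=<based on epoch>` in the task diagram. Everything specific in it is an assumption for illustration: the `EPOCH` value, the 16-bit random suffix, and the names `make_request_id` and `new_request_id` do not come from this page.

```python
import datetime
import random

# Hypothetical epoch and bit layout, chosen only for this sketch.
EPOCH = datetime.datetime(2010, 1, 1, tzinfo=datetime.timezone.utc)
RANDOM_BITS = 16


def make_request_id(now: datetime.datetime, suffix: int) -> int:
    """Packs milliseconds since EPOCH and a suffix into one integer id."""
    if not 0 <= suffix < (1 << RANDOM_BITS):
        raise ValueError("suffix out of range")
    # Whole milliseconds elapsed since the epoch.
    delta_ms = (now - EPOCH) // datetime.timedelta(milliseconds=1)
    return (delta_ms << RANDOM_BITS) | suffix


def new_request_id() -> int:
    """Generates an id for 'now' with a random suffix."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return make_request_id(now, random.getrandbits(RANDOM_BITS))
```

With a layout like this, ids created later compare greater than earlier ones, while the random suffix keeps two ids created in the same millisecond from colliding in most cases. The application, not the datastore, decides where each entity lands.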
The dimensions_hash value is calculated as an int32 from the TaskRequest.properties.dimensions dictionary.
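
One way to reduce a dimensions dictionary to a 32-bit integer is sketched below. This is not the actual implementation: the canonical JSON encoding, the choice of SHA-256, the use of the first four digest bytes, and the function name `hash_dimensions` are all assumptions made for illustration. The only property taken from the text is that the result fits in an int32.

```python
import hashlib
import json
import struct
from typing import Dict, List


def hash_dimensions(dimensions: Dict[str, List[str]]) -> int:
    """Reduces a dimensions dict to a signed 32-bit integer (illustrative)."""
    # Canonical encoding: sorted keys and sorted values, so that logically
    # equal dictionaries always produce the same bytes.
    normalized = {key: sorted(values) for key, values in dimensions.items()}
    data = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    # Interpret the first 4 bytes as a little-endian signed int32.
    return struct.unpack("<i", digest[:4])[0]
```

Whatever hash is used, normalizing the dictionary first matters: two requests with the same dimensions listed in a different order should map to the same TaskDimensions entity.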