 # CHROMIUM Sync Token Internals

 Chrome uses a mechanism known as "sync tokens" to synchronize different command
 buffers in the GPU process. This document discusses the internals of the sync
 token system.

 [TOC]

 ## Rationale

 In Chrome, multiple processes, for example the browser and renderer, submit work
 to the GPU process asynchronously through command buffers. However, there are
 dependencies between the work submitted by different processes, such as
 GLRenderer in the display compositor in the browser/viz process rendering a tile
 produced by the raster worker in the renderer process.

 Sync tokens are used to synchronize the work contained in command buffers
 without waiting for the work to complete. This improves pipelining, and with the
 introduction of GPU scheduling, allows prioritization of work. Although
 originally built for synchronizing command buffers, they can be used for other
 work in the GPU process.

 ## Generation

 Sync tokens are represented by a namespace, an identifier, and the *fence release
 count*. The identifier, `CommandBufferId`, is a 64-bit unsigned integer which is
 unique within a `CommandBufferNamespace`. For example, IPC command buffers are in
 the *GPU_IO* `CommandBufferNamespace`, and are identified by a `CommandBufferId`
 with the process id as the MSB and the IPC route id as the LSB.
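The three-part representation above can be sketched as a small struct. This is an illustrative sketch only: `ToyNamespace`, `ToySyncToken`, and `MakeCommandBufferId` are hypothetical names, and packing the two ids into the upper and lower 32 bits is an assumption made for the example, not a statement about the real layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the real types; names and layout are
// illustrative only.
enum class ToyNamespace : std::uint8_t { kInvalid, kGpuIo, kInProcess };

struct ToySyncToken {
  ToyNamespace namespace_id;        // Which family of command buffers.
  std::uint64_t command_buffer_id;  // Unique within the namespace.
  std::uint64_t release_count;      // The fence release count.
};

// Assumption: the process id fills the upper 32 bits and the IPC route id
// fills the lower 32 bits of the 64-bit identifier.
constexpr std::uint64_t MakeCommandBufferId(std::uint32_t process_id,
                                            std::uint32_t route_id) {
  return (static_cast<std::uint64_t>(process_id) << 32) | route_id;
}

static_assert(MakeCommandBufferId(1, 2) == 0x0000000100000002ull,
              "process id lands in the high half, route id in the low half");
```

Because each process id gets its own range of identifiers, two processes can reuse the same route id without producing colliding sync tokens.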

 The fence release count marks completion of some work in a command buffer. Note
 that this is CPU-side work, which includes command decoding, validation, issuing
 GL calls to the driver, etc., and not GPU-side work. See
 [gpu_synchronization.md](/docs/design/gpu_synchronization.md) for more
 information about synchronizing GPU work.

 Fences are typically generated or inserted on the client using a sequential
 counter. The corresponding GL API is `GenSyncTokenCHROMIUM` which generates the
 fence using `CommandBufferProxyImpl::GenerateFenceSyncRelease()`, and also adds
 the fence to the command buffer using the internal `InsertFenceSync` command.
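As a rough illustration of the client side, the sketch below pairs a sequential counter with a recorded fence command. `ToyFenceClient` and `ToyCommand` are hypothetical; the real command buffer encoding and the real `GenerateFenceSyncRelease()` are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical command record; the real command buffer format differs.
enum class ToyCommandKind { kDraw, kInsertFenceSync };

struct ToyCommand {
  ToyCommandKind kind;
  std::uint64_t release_count;  // Only meaningful for kInsertFenceSync.
};

class ToyFenceClient {
 public:
  // Records some ordinary work in the command stream.
  void Draw() { commands_.push_back({ToyCommandKind::kDraw, 0}); }

  // Takes the next value of the sequential counter and records a fence
  // command carrying it, so the service can release it when it gets there.
  std::uint64_t GenSyncToken() {
    const std::uint64_t release = ++next_release_;
    commands_.push_back({ToyCommandKind::kInsertFenceSync, release});
    return release;
  }

  const std::vector<ToyCommand>& commands() const { return commands_; }

 private:
  std::uint64_t next_release_ = 0;
  std::vector<ToyCommand> commands_;
};
```

The position of the fence command in the stream is what gives the release count its meaning: everything recorded before it has been processed once the fence is released.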

 ## Verification

 Different client processes communicate with the GPU process using *channels*. A
 channel wraps around a message pipe which doesn't provide ordering guarantees
 with respect to other pipes. For example, a message from the browser process
 containing a sync token wait can arrive before the message from the renderer
 process that releases or fulfills the sync token promise.

 To prevent the above problem, client processes must verify sync tokens before
 sending them to another process. Verification involves a synchronous nop IPC
 message, `GpuChannelMsg_Nop`, to the GPU process, which ensures that the GPU
 process has read previous messages from the pipe.

 Sync tokens used within a process do not need to be verified, and the
 `GenSyncTokenUnverifiedCHROMIUM` GL API serves this common case. If such a sync
 token is later sent to another process, it must first be verified using
 `VerifySyncTokensCHROMIUM`. Sync tokens generated using `GenSyncTokenCHROMIUM`
 are already verified. `SyncToken` has a `verified_flush` bit that guards against
 accidentally sending unverified sync tokens over IPC.
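The verification flow can be sketched with a flag and a simulated round trip. Everything here is hypothetical (`VerifiableToken`, `ToyChannel`, `VerifyTokens`, `CanSendOverIpc`); the sketch also assumes that one nop round trip is enough to verify every token generated before it, which follows from the pipe having been read up to that point.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct VerifiableToken {
  std::uint64_t release_count = 0;
  bool verified_flush = false;  // Set once the GPU process has seen the flush.
};

// Stand-in for a channel; it only counts synchronous nop round trips.
class ToyChannel {
 public:
  void SyncNop() { ++nop_count_; }
  int nop_count() const { return nop_count_; }

 private:
  int nop_count_ = 0;
};

// Verifies a batch of tokens with at most one simulated round trip.
void VerifyTokens(ToyChannel& channel, std::vector<VerifiableToken*> tokens) {
  bool needs_round_trip = false;
  for (const VerifiableToken* token : tokens) {
    needs_round_trip = needs_round_trip || !token->verified_flush;
  }
  if (!needs_round_trip) return;  // Everything was already verified.
  channel.SyncNop();
  for (VerifiableToken* token : tokens) token->verified_flush = true;
}

// Guard that a sender could check before serializing a token for IPC.
bool CanSendOverIpc(const VerifiableToken& token) {
  return token.verified_flush;
}
```

Batching matters because the round trip is synchronous: verifying many tokens at once costs the same as verifying one.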

 ## Streams

 In the GPU process, command buffers are organized into logical streams of
 execution that are called *sequences*. Within a sequence, tasks are ordered, but
 they are asynchronous with respect to tasks in other sequences. Dependencies
 between tasks are specified as sync tokens. For IPC command buffers, this implies
 flush ordering within a sequence.

 A sequence can be created by `Scheduler::CreateSequence`, which returns a
 `SequenceId`. Tasks are posted to a sequence using `Scheduler::ScheduleTask`.
 Typically there is one sequence per channel, but sometimes there are more, such
 as the raster, compositor, and media streams in the renderer's channel.
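The ordering guarantee can be illustrated with a toy scheduler. `ToyScheduler` only borrows the method names for readability; its signatures are simplified assumptions, and `RunNextTask` is a hypothetical helper that stands in for whatever policy picks the next sequence to run.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <utility>

using ToySequenceId = std::uint32_t;

class ToyScheduler {
 public:
  // Creates an empty sequence and returns its id.
  ToySequenceId CreateSequence() {
    const ToySequenceId id = next_id_++;
    sequences_[id];  // Default-constructs an empty task queue.
    return id;
  }

  // Appends a task; tasks of one sequence run in the order they were posted.
  void ScheduleTask(ToySequenceId id, std::function<void()> task) {
    sequences_.at(id).push_back(std::move(task));
  }

  // Runs the oldest task of one sequence. Returns false if it has none.
  bool RunNextTask(ToySequenceId id) {
    auto& queue = sequences_.at(id);
    if (queue.empty()) return false;
    std::function<void()> task = std::move(queue.front());
    queue.pop_front();
    task();
    return true;
  }

 private:
  ToySequenceId next_id_ = 1;
  std::map<ToySequenceId, std::deque<std::function<void()>>> sequences_;
};
```

The caller is free to interleave sequences in any order, which is what leaves room for prioritization, while each sequence still observes its own posting order.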

 The scheduler also provides a means for co-operative scheduling through
 `Scheduler::ShouldYield` and `Scheduler::ContinueTask`. These allow a task to
 yield and continue once higher priority work is complete. Together with the GPU
 scheduler, multiple sequences provide the means for prioritization of UI work
 over raster prepaint work.
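Co-operative yielding can be pictured as a loop that checks a predicate between units of work and reports where to resume. `RunUntilYield` is a hypothetical helper written for this sketch; it does not model the real `Scheduler::ShouldYield` or `Scheduler::ContinueTask` signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Processes items in [start, total) one at a time. After each item it asks
// should_yield(); if that returns true and work remains, it stops early.
// Returns the index to resume from, which equals total when finished.
std::size_t RunUntilYield(std::size_t start, std::size_t total,
                          const std::function<bool()>& should_yield,
                          const std::function<void(std::size_t)>& process) {
  std::size_t index = start;
  while (index < total) {
    process(index++);
    if (index < total && should_yield()) break;
  }
  return index;
}
```

A long raster task written this way always makes progress on at least one item, then hands control back so that higher priority work can run before it is continued from the returned index.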

 ## Waiting and Completion

 Sync tokens are managed in the GPU process by `SyncPointManager`, and its helper
 classes `SyncPointOrderData` and `SyncPointClientState`. `SyncPointOrderData`
 holds state for a logical stream of execution, typically containing work of
 multiple command buffers from one process. `SyncPointClientState` holds sync token
 state for a client which generated sync tokens, typically an IPC command buffer.

 The GPU scheduler maintains a `SyncPointOrderData` per sequence. Clients must
 create a `SyncPointClientState` using
 `SyncPointManager::CreateSyncPointClientState` and identify their namespace, id,
 and sequence.

 Waiting on a sync token is done by calling `SyncPointManager::Wait()` with a
 sync token, order number for the wait, and a callback. The callbacks are
 enqueued with the `SyncPointClientState` of the target with the release count of
 the sync token. The scheduler does this internally for sync token dependencies
 for scheduled tasks, but the wait can also be performed when running the
 `WaitSyncTokenCHROMIUM` GL command.

 Sync tokens are completed when the fence is released in the GPU process by
 calling `SyncPointClientState::ReleaseFenceSync()`. For GL command buffers, the
 `InsertFenceSync` command, which contains the release count generated in the
 client, calls this when executed in the service. This issues callbacks and
 allows waiting command buffers to resume their work.
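The interplay of waits and releases can be sketched with callbacks keyed by release count. `ToyClientState` is a hypothetical, single-threaded model; in particular, returning `false` from `Wait` when the fence is already released is a choice made for this sketch, and order numbers are left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

class ToyClientState {
 public:
  // Queues callback until release is reached. Returns false, without
  // queuing, if that release has already happened.
  bool Wait(std::uint64_t release, std::function<void()> callback) {
    if (release <= released_) return false;
    waiters_.emplace(release, std::move(callback));
    return true;
  }

  // Marks release as reached and runs every callback waiting on a release
  // count less than or equal to it.
  void ReleaseFenceSync(std::uint64_t release) {
    if (release <= released_) return;
    released_ = release;
    // Collect first so that callbacks may safely call Wait() again.
    std::vector<std::function<void()>> ready;
    const auto end = waiters_.upper_bound(release);
    for (auto it = waiters_.begin(); it != end; ++it) {
      ready.push_back(std::move(it->second));
    }
    waiters_.erase(waiters_.begin(), end);
    for (auto& callback : ready) callback();
  }

 private:
  std::uint64_t released_ = 0;
  std::multimap<std::uint64_t, std::function<void()>> waiters_;
};
```

Since release counts come from a sequential counter, releasing count N also satisfies every waiter on a smaller count, which is why the sketch releases a whole prefix of the map at once.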

 ## Correctness

 Correctness of waits and releases amounts to checking that there are no
 indefinite waits because of broken promises or circular wait chains. This is
 ensured by associating an order number with each wait and release, and
 maintaining the invariant that the order number of the release is less than or
 equal to the order number of the wait.

 Each task is assigned a global sequential order number generated by
 `SyncPointOrderData::GenerateUnprocessedOrderNumber`, and these numbers are
 stored in a queue of unprocessed order numbers. In `SyncPointManager::Wait()`,
 the callbacks are also enqueued with the order number of the waiting task in
 `SyncPointOrderData`, in a queue called `OrderFenceQueue`.

 `SyncPointOrderData` maintains the invariant that all waiting callbacks must
 have an order number greater than the sequence's next unprocessed order number.
 This invariant is checked when enqueuing a new callback in
 `SyncPointOrderData::ValidateReleaseOrderNumber`, and after completing a task in
 `SyncPointOrderData::FinishProcessingOrderNumber`.
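A toy model helps show why the order number check rules out circular waits. `ToyOrderData` borrows the method names for readability only; its logic is a simplified assumption that omits the `OrderFenceQueue`, and the shared counter passed to the constructor stands in for the global order number.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

class ToyOrderData {
 public:
  explicit ToyOrderData(std::uint32_t& global_order_num)
      : global_order_num_(global_order_num) {}

  // Assigns the next global order number to a task on this sequence.
  std::uint32_t GenerateUnprocessedOrderNumber() {
    const std::uint32_t order_num = ++global_order_num_;
    unprocessed_.push_back(order_num);
    return order_num;
  }

  // A wait issued by the task with wait_order_num is accepted only if this
  // (releasing) sequence still has an unprocessed task ordered before the
  // waiter, that is, a task that could still perform the release.
  bool ValidateReleaseOrderNumber(std::uint32_t wait_order_num) const {
    return !unprocessed_.empty() && unprocessed_.front() < wait_order_num;
  }

  // Marks the oldest unprocessed task of this sequence as finished.
  void FinishProcessingOrderNumber(std::uint32_t order_num) {
    assert(!unprocessed_.empty() && unprocessed_.front() == order_num);
    unprocessed_.pop_front();
  }

 private:
  std::uint32_t& global_order_num_;
  std::deque<std::uint32_t> unprocessed_;
};
```

In this model a task can only wait on work with a strictly smaller order number, so following a chain of waits always moves toward smaller numbers and can never loop back to the starting task.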


 ## See Also

 [CHROMIUM_sync_point](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_sync_point.txt)
 [gpu_synchronization.md](/docs/design/gpu_synchronization.md)
 [Lightweight GPU Sync Points](https://docs.google.com/document/d/1XwBYFuTcINI84ShNvqifkPREs3sw5NdaKzKqDDxyeHk/edit)