docs/design/gpu_synchronization.md - chromium/src - Git at Google

 # GPU Synchronization in Chrome

 Chrome supports multiple mechanisms for sequencing GPU drawing operations, this
 document provides a brief overview. The main focus is a high-level explanation
 of when synchronization is needed and which mechanism is appropriate.

 [TOC]

 ## Glossary

 **GL Sync Object**: Generic GL-level synchronization object that can be in a
 "unsignaled" or "signaled" state. The only current implementation of this is a
 GL fence.

 **GL Fence**: A GL sync object that is inserted into the GL command stream. It
 starts out unsignaled and becomes signaled when the GPU reaches this point in the
 command stream, implying that all previous commands have completed.

 **Client Wait**: Block the client thread until a sync object becomes signaled,
 or until a timeout occurs.

 **Server Wait**: Tells the GPU to defer executing commands issued after a fence
 until the fence signals. The client thread continues executing immediately and
 can continue submitting GL commands.

 **CHROMIUM fence sync**: A command buffer specific GL fence that sequences
 operations among command buffer GL contexts without requiring driver-level
 execution of previous commands.

 **Native GL Fence**: A GL Fence backed by a platform-specific cross-process
 synchronization mechanism.

 **GPU Fence Handle**: An IPC-transportable object (typically a file descriptor)
 that can be used to duplicate a native GL fence into a different process's
 context.

 **GPU Fence**: A Chrome abstraction that owns a GPU fence handle representing a
 native GL fence, usable for cross-process synchronization.

 ## Use case overview

 The core scenario is synchronizing read and write access to a shared resorce,
 for example drawing an image into an offscreen texture and compositing the
 result into a final image. The drawing operations need to be completed before
 reading to ensure correct output. A typical effect of wrong synchronization is
 that the output contains blank or incomplete results instead of the expected
 rendered sub-images, causing flickering or tearing.

 "Completed" in this case means that the end result of using a resource as input
 will be equivalent to waiting for everything to finish rendering, but it does
 not necessarily mean that the GPU has fully finished all drawing operations at
 that time.

 ## Single GL context: no synchronization needed

 If all access to the shared resource happens in the same GL context, there is no
 need for explicit synchronization. GL guarantees that commands are logically
 processed in the order they are submitted. This is true both for local GL
 contexts (GL calls via ui/gl/ interfaces) and for a single command buffer GL
 context.

 ## Multiple driver-level GL contexts in the same share group: use GLFence

 A process can create multiple GL contexts that are part of the same share group.
 These contexts can be created in different threads within this process.

 In this case, GL fences must be used for sequencing, for example:

 1. Context A: draw image, create GLFence
 1. Context B: server wait or client wait for GLFence, read image

 [gl::GLFence](/ui/gl/gl_fence.h) and its subclasses provide wrappers for
 GL/EGL fence handling methods such as `eglFenceSyncKHR` and `eglWaitSyncKHR`.
 These fence objects can be used cross-thread as long as both thread's GL
 contexts are part of the same share group.

 For more details, please refer to the underlying extension documentation, for example:

 * https://www.khronos.org/opengl/wiki/Synchronization
 * https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_fence_sync.txt
 * https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_wait_sync.txt

 ## Implementation-dependent: same-thread driver-level GL contexts

 Many GL driver implementations are based on a per-thread command queue,
 with the effect that commands are processed in order even if they were issued
 from different contexts on that thread without explicit synchronization.

 This behavior is not part of the GL standard, and some driver implementations
 use a per-context command queue where this assumption is not true.

 See [issue 510232](http://crbug.com/510243#c23) for an example of a problematic
 sequence:

 ```
 // In one thread:
 MakeCurrent(A);
 Render1();
 MakeCurrent(B);
 Render2();
 CreateSync(X);

 // And in another thread:
 MakeCurrent(C);
 WaitSync(X);
 Render3();
 MakeCurrent(D);
 Render4();
 ```

 The only serialization guarantee is that Render2 will complete before Render3,
 but Render4 could theoretically complete before Render1.

 Chrome assumes that the render steps happen in order Render1, Render2, Render3,
 and Render4, and requires this behavior to ensure security. If the driver doesn't
 ensure this sequencing, Chrome has to emulate it using virtual contexts. (Or by
 using explicit synchronization, but it doesn't do that today.) See also the
 "CHROMIUM fence sync" section below.

 ## Command buffer GL clients: use CHROMIUM sync tokens

 Chrome's command buffer IPC interface uses multiple layers. There are multiple
 active IPC channels (typically one per process, i.e. one per Renderer and one
 for Browser). Each IPC channel has multiple scheduling groups (also called
 streams), and each stream can contain multiple command buffers, which in turn
 contain a sequence of GL commands.

 Command buffers in the same client-side share group must be in the same stream.
 Command scheduling granuarity is at the stream level, and a client can choose to
 create and use multiple streams with different stream priorities. Stream IDs are
 arbitrary integers assigned by the client at creation time, see for example the
 [ws::ContextProviderCommandBuffer](/services/ws/public/cpp/gpu/context_provider_command_buffer.h)
 constructor.

 The CHROMIUM sync token is intended to order operations among command buffer GL
 instructions. It inserts an internal fence sync command in the stream, flushing
 it appropriately (see below), and generating a sync token from it which is a
 cross-context transportable reference to the underlying fence sync. A
 WaitSyncTokenCHROMIUM call does **not** ensure that the underlying GL commands
 have been executed at the GPU driver level, this mechanism is not suitable for
 synchronizing command buffer GL operations with a local driver-level GL context.

 See the
 [CHROMIUM_sync_point](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_sync_point.txt)
 documentation for details.

 Commands issued within a single command buffer don't need to be synchronized
 explicitly, they will be executed in the same order that they were issued.

 Multiple command buffers within the same stream can use an ordering barrier to
 sequence their commands. Sync tokens are not necessary. Example:

 ```c++
 // Command buffers gl1 and gl2 are in the same stream.
 Render1(gl1);
 gl1->OrderingBarrierCHROMIUM()
 Render2(gl2);  // will happen after Render1.
 ```

 Command buffers that are in different streams need to use sync tokens. If both
 are using the same IPC channel (i.e. same client process), an unverified sync
 token is sufficient, and commands do not need to be flushed to the server:

 ```c++
 // stream A
 Render1(glA);
 glA->GenUnverifiedSyncTokenCHROMIUM(out_sync_token);

 // stream B
 glB->WaitSyncTokenCHROMIUM();
 Render2(glB);  // will happen after Render1.
 ```

 Command buffers that are using different IPC channels must use verified sync
 tokens. Verification is a check that the underlying fence sync was flushed to
 the server. Cross-process synchronization always uses verified sync tokens.
 `GenSyncTokenCHROMIUM` will force a shallow flush as a side effect if necessary.
 Example:

 ```c++
 // IPC channel in process X
 Render1(glX);
 glX->GenSyncTokenCHROMIUM(out_sync_token);

 // IPC channel in process Y
 glY->WaitSyncTokenCHROMIUM();
 Render2(glY);  // will happen after Render1.
 ```

 Alternatively, unverified sync tokens can be converted to verified ones in bulk
 by calling `VerifySyncTokensCHROMIUM`. This will wait for a flush to complete as
 necessary. Use this to avoid multiple sequential flushes:

 ```c++
 gl->GenUnverifiedSyncTokenCHROMIUM(out_sync_tokens[0]);
 gl->GenUnverifiedSyncTokenCHROMIUM(out_sync_tokens[1]);
 gl->VerifySyncTokensCHROMIUM(out_sync_tokens, 2);
 ```

 ### Implementation notes

 Correctness of the CHROMIUM fence sync mechanism depends on the assumption that
 commands issued from the command buffer service side happen in the order they
 were issued in that thread. This is handled in different ways:

 * Issue a glFlush on switching contexts on platforms where glFlush is sufficient
   to ensure ordering, i.e. MacOS. (This approach would not be well suited to
   tiling GPUs as used on many mobile GPUs where glFlush is an expensive
   operation, it may force content load/store between tile memory and main
   memory.) See for example
   [gl::GLContextCGL::MakeCurrent](/ui/gl/gl_context_cgl.cc):
 ```c++
   // It's likely we're going to switch OpenGL contexts at this point.
   // Before doing so, if there is a current context, flush it. There
   // are many implicit assumptions of flush ordering between contexts
   // at higher levels, and if a flush isn't performed, OpenGL commands
   // may be issued in unexpected orders, causing flickering and other
   // artifacts.
 ```

 * Force context virtualization so that all commands are issued into a single
   driver-level GL context. This is used on Qualcomm/Adreno chipsets, see [issue
   691102](http://crbug.com/691102).

 * Assume per-thread command queues without explicit synchronization. GLX
   effectively ensures this. On Windows, ANGLE uses a single D3D device
   underneath all contexts which ensures strong ordering.

 GPU control tasks are processed out of band and are only partially ordered in
 respect to GL commands. A gpu_control task always happens before any following
 GL commands issued on the same IPC channel. It usually executes before any
 preceding unflushed GL commands, but this is not guaranteed. A
 `ShallowFlushCHROMIUM` ensures that any following gpu_control tasks will execute
 after the flushed GL commands.

 In this example, DoTask will execute after GLCommandA and before GLCommandD, but
 there is no ordering guarantee relative to CommandB and CommandC:

 ```c++
   // gles2_implementation.cc

   helper_->GLCommandA();
   ShallowFlushCHROMIUM();

   helper_->GLCommandB();
   helper_->GLCommandC();
   gpu_control_->DoTask();

   helper_->GLCommandD();

   // Execution order is one of:
   //   A | DoTask B C | D
   //   A | B DoTask C | D
   //   A | B C DoTask | D
 ```

 The shallow flush adds the pending GL commands to the service's task queue, and
 this task queue is also used by incoming gpu control tasks and processed in
 order. The `ShallowFlushCHROMIUM` command returns as soon as the tasks are
 queued and does not wait for them to be processed.

 ## Cross-process transport: GpuFence and GpuFenceHandle

 Some platforms such as Android (most devices N and above) and ChromeOS support
 synchronizing a native GL context with a command buffer GL context through a
 GpuFence.

 Use the static `gl::GLFence::IsGpuFenceSupported()` method to check at runtime if
 the current platform has support for the GpuFence mechanism including
 GpuFenceHandle transport.

 The GpuFence mechanism supports two use cases:

 * Create a GLFence object in a local context, convert it to a client-side
 GpuFence, duplicate it into a command buffer service-side gpu fence, and
 issue a server wait on the command buffer service side. That service-side
 wait will be unblocked when the *client-side* GpuFence signals.

 * Create a new command buffer service-side gpu fence, request a GpuFenceHandle
 from it, use this handle to create a native GL fence object in the local
 context, then issue a server wait on the local GL fence object. This local
 server wait will be unblocked when the *service-side* gpu fence signals.

 The [CHROMIUM_gpu_fence
 extension](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_gpu_fence.txt) documents
 the GLES API as used through the command buffer interface. This section contains
 additional information about the integration with local GL contexts that is
 needed to work with these objects.

 ### Driver-level wrappers

 In general, you should use the static `gl::GLFence::CreateForGpuFence()` and
 `gl::GLFence::CreateFromGpuFence()` factory methods to create a
 platform-specific local fence object instead of using an implementation class
 directly.

 For Android and ChromeOS, the
 [gl::GLFenceAndroidNativeFenceSync](/ui/gl/gl_fence_android_native_fence_sync.h)
 implementation wraps the
 [EGL_ANDROID_native_fence_sync](https://www.khronos.org/registry/EGL/extensions/ANDROID/EGL_ANDROID_native_fence_sync.txt)
 extension that allows creating a special EGLFence object from which a file
 descriptor can be extracted, and then creating a duplicate fence object from
 that file descriptor that is synchronized with the original fence.

 ### GpuFence and GpuFenceHandle

 A [gfx::GpuFence](/ui/gfx/gpu_fence.h) object owns a GPU fence handle
 representing a native GL fence. The `AsClientGpuFence` method casts it to a
 ClientGpuFence type for use with the [CHROMIUM_gpu_fence
 extension](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_gpu_fence.txt)'s
 `CreateClientGpuFenceCHROMIUM` call.

 A [gfx::GpuFenceHandle](/ui/gfx/gpu_fence_handle.h) is an IPC-transportable
 wrapper for a file descriptor or other underlying primitive object, and is used
 to duplicate a native GL fence into another process. It has value semantics and
 can be copied multiple times, and then consumed exactly one time. Consumers take
 ownership of the underlying resource. Current GpuFenceHandle consumers are:

 * The `gfx::GpuFence(gpu_fence_handle)` constructor takes ownership of the
   handle's resources without constructing a local fence.

 * The IPC subsystem closes resources after sending. The typical idiom is to call
   `gfx::CloneHandleForIPC(handle)` on a GpuFenceHandle retrieved from a
   scope-lifetime object to create a copied handle that will be owned by the IPC
   subsystem.

 ### Sample Code

 A usage example for two-process synchronization is to sequence access to a
 globally shared drawable such as an AHardwareBuffer on Android, where the
 writer uses a local GL context and the reader is a command buffer context in
 the GPU process. The writer process draws into an AHardwareBuffer-backed
 GLImage in the local GL context, then creates a gpu fence to mark the end of
 drawing operations:

 ```c++
     // This example assumes that GpuFence is supported. If not, the application
     // should fall back to a different transport or synchronization method.
     DCHECK(gl::GLFence::IsGpuFenceSupported())

     // ... write to the shared drawable in local context, then create
     // a local fence.
     std::unique_ptr<gl::GLFence> local_fence = gl::GLFence::CreateForGpuFence();

     // Convert to a GpuFence.
     std::unique_ptr<gfx::GpuFence> gpu_fence = local_fence->GetGpuFence();
     // It's ok for local_fence to be destroyed now, the GpuFence remains valid.

     // Create a matching gpu fence on the command buffer context, issue
     // server wait, and destroy it.
     GLuint id = gl->CreateClientGpuFenceCHROMIUM(gpu_fence.AsClientGpuFence());
     // It's ok for gpu_fence to be destroyed now.
     gl->WaitGpuFenceCHROMIUM(id);
     gl->DestroyGpuFenceCHROMIUM(id);

     // ... read from the shared drawable via command buffer. These reads
     // will happen after the local_fence has signalled. The local
     // fence and gpu_fence dn't need to remain alive for this.
 ```

 If a process wants to consume a drawable that was produced through a command
 buffer context in the GPU process, the sequence is as follows:

 ```c++
     // Set up callback that's waiting for the drawable to be ready.
     void callback(std::unique_ptr<gfx::GpuFence> gpu_fence) {
         // Create a local context GL fence from the GpuFence.
         std::unique_ptr<gl::GLFence> local_fence =
             gl::GLFence::CreateFromGpuFence(*gpu_fence);
         local_fence->ServerWait();
         // ... read from the shared drawable in the local context.
     }

     // ... write to the shared drawable via command buffer, then
     // create a gpu fence:
     GLuint id = gl->CreateGpuFenceCHROMIUM();
     context_support->GetGpuFenceHandle(id, base::BindOnce(callback));
     gl->DestroyGpuFenceCHROMIUM(id);
 ```

 It is legal to create the GpuFence on a separate command buffer context instead
 of on the command buffer channel that did the drawing operations, but in that
 case gl->WaitSyncTokenCHROMIUM() or equivalent must be used to sequence the
 operations between the distinct command buffer contexts as usual.
	# GPU Synchronization in Chrome

	Chrome supports multiple mechanisms for sequencing GPU drawing operations, this
	document provides a brief overview. The main focus is a high-level explanation
	of when synchronization is needed and which mechanism is appropriate.

	[TOC]

	## Glossary

	GL Sync Object: Generic GL-level synchronization object that can be in a
	"unsignaled" or "signaled" state. The only current implementation of this is a
	GL fence.

	GL Fence: A GL sync object that is inserted into the GL command stream. It
	starts out unsignaled and becomes signaled when the GPU reaches this point in the
	command stream, implying that all previous commands have completed.

	Client Wait: Block the client thread until a sync object becomes signaled,
	or until a timeout occurs.

	Server Wait: Tells the GPU to defer executing commands issued after a fence
	until the fence signals. The client thread continues executing immediately and
	can continue submitting GL commands.

	CHROMIUM fence sync: A command buffer specific GL fence that sequences
	operations among command buffer GL contexts without requiring driver-level
	execution of previous commands.

	Native GL Fence: A GL Fence backed by a platform-specific cross-process
	synchronization mechanism.

	GPU Fence Handle: An IPC-transportable object (typically a file descriptor)
	that can be used to duplicate a native GL fence into a different process's
	context.

	GPU Fence: A Chrome abstraction that owns a GPU fence handle representing a
	native GL fence, usable for cross-process synchronization.

	## Use case overview

	The core scenario is synchronizing read and write access to a shared resorce,
	for example drawing an image into an offscreen texture and compositing the
	result into a final image. The drawing operations need to be completed before
	reading to ensure correct output. A typical effect of wrong synchronization is
	that the output contains blank or incomplete results instead of the expected
	rendered sub-images, causing flickering or tearing.

	"Completed" in this case means that the end result of using a resource as input
	will be equivalent to waiting for everything to finish rendering, but it does
	not necessarily mean that the GPU has fully finished all drawing operations at
	that time.

	## Single GL context: no synchronization needed

	If all access to the shared resource happens in the same GL context, there is no
	need for explicit synchronization. GL guarantees that commands are logically
	processed in the order they are submitted. This is true both for local GL
	contexts (GL calls via ui/gl/ interfaces) and for a single command buffer GL
	context.

	## Multiple driver-level GL contexts in the same share group: use GLFence

	A process can create multiple GL contexts that are part of the same share group.
	These contexts can be created in different threads within this process.

	In this case, GL fences must be used for sequencing, for example:

	1. Context A: draw image, create GLFence
	1. Context B: server wait or client wait for GLFence, read image

	[gl::GLFence](/ui/gl/gl_fence.h) and its subclasses provide wrappers for
	GL/EGL fence handling methods such as `eglFenceSyncKHR` and `eglWaitSyncKHR`.
	These fence objects can be used cross-thread as long as both thread's GL
	contexts are part of the same share group.

	For more details, please refer to the underlying extension documentation, for example:

	* https://www.khronos.org/opengl/wiki/Synchronization
	* https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_fence_sync.txt
	* https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_wait_sync.txt

	## Implementation-dependent: same-thread driver-level GL contexts

	Many GL driver implementations are based on a per-thread command queue,
	with the effect that commands are processed in order even if they were issued
	from different contexts on that thread without explicit synchronization.

	This behavior is not part of the GL standard, and some driver implementations
	use a per-context command queue where this assumption is not true.

	See [issue 510232](http://crbug.com/510243#c23) for an example of a problematic
	sequence:

	```
	// In one thread:
	MakeCurrent(A);
	Render1();
	MakeCurrent(B);
	Render2();
	CreateSync(X);

	// And in another thread:
	MakeCurrent(C);
	WaitSync(X);
	Render3();
	MakeCurrent(D);
	Render4();
	```

	The only serialization guarantee is that Render2 will complete before Render3,
	but Render4 could theoretically complete before Render1.

	Chrome assumes that the render steps happen in order Render1, Render2, Render3,
	and Render4, and requires this behavior to ensure security. If the driver doesn't
	ensure this sequencing, Chrome has to emulate it using virtual contexts. (Or by
	using explicit synchronization, but it doesn't do that today.) See also the
	"CHROMIUM fence sync" section below.

	## Command buffer GL clients: use CHROMIUM sync tokens

	Chrome's command buffer IPC interface uses multiple layers. There are multiple
	active IPC channels (typically one per process, i.e. one per Renderer and one
	for Browser). Each IPC channel has multiple scheduling groups (also called
	streams), and each stream can contain multiple command buffers, which in turn
	contain a sequence of GL commands.

	Command buffers in the same client-side share group must be in the same stream.
	Command scheduling granuarity is at the stream level, and a client can choose to
	create and use multiple streams with different stream priorities. Stream IDs are
	arbitrary integers assigned by the client at creation time, see for example the
	[ws::ContextProviderCommandBuffer](/services/ws/public/cpp/gpu/context_provider_command_buffer.h)
	constructor.

	The CHROMIUM sync token is intended to order operations among command buffer GL
	instructions. It inserts an internal fence sync command in the stream, flushing
	it appropriately (see below), and generating a sync token from it which is a
	cross-context transportable reference to the underlying fence sync. A
	WaitSyncTokenCHROMIUM call does not ensure that the underlying GL commands
	have been executed at the GPU driver level, this mechanism is not suitable for
	synchronizing command buffer GL operations with a local driver-level GL context.

	See the
	[CHROMIUM_sync_point](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_sync_point.txt)
	documentation for details.

	Commands issued within a single command buffer don't need to be synchronized
	explicitly, they will be executed in the same order that they were issued.

	Multiple command buffers within the same stream can use an ordering barrier to
	sequence their commands. Sync tokens are not necessary. Example:

	```c++
	// Command buffers gl1 and gl2 are in the same stream.
	Render1(gl1);
	gl1->OrderingBarrierCHROMIUM()
	Render2(gl2); // will happen after Render1.
	```

	Command buffers that are in different streams need to use sync tokens. If both
	are using the same IPC channel (i.e. same client process), an unverified sync
	token is sufficient, and commands do not need to be flushed to the server:

	```c++
	// stream A
	Render1(glA);
	glA->GenUnverifiedSyncTokenCHROMIUM(out_sync_token);

	// stream B
	glB->WaitSyncTokenCHROMIUM();
	Render2(glB); // will happen after Render1.
	```

	Command buffers that are using different IPC channels must use verified sync
	tokens. Verification is a check that the underlying fence sync was flushed to
	the server. Cross-process synchronization always uses verified sync tokens.
	`GenSyncTokenCHROMIUM` will force a shallow flush as a side effect if necessary.
	Example:

	```c++
	// IPC channel in process X
	Render1(glX);
	glX->GenSyncTokenCHROMIUM(out_sync_token);

	// IPC channel in process Y
	glY->WaitSyncTokenCHROMIUM();
	Render2(glY); // will happen after Render1.
	```

	Alternatively, unverified sync tokens can be converted to verified ones in bulk
	by calling `VerifySyncTokensCHROMIUM`. This will wait for a flush to complete as
	necessary. Use this to avoid multiple sequential flushes:

	```c++
	gl->GenUnverifiedSyncTokenCHROMIUM(out_sync_tokens[0]);
	gl->GenUnverifiedSyncTokenCHROMIUM(out_sync_tokens[1]);
	gl->VerifySyncTokensCHROMIUM(out_sync_tokens, 2);
	```

	### Implementation notes

	Correctness of the CHROMIUM fence sync mechanism depends on the assumption that
	commands issued from the command buffer service side happen in the order they
	were issued in that thread. This is handled in different ways:

	* Issue a glFlush on switching contexts on platforms where glFlush is sufficient
	to ensure ordering, i.e. MacOS. (This approach would not be well suited to
	tiling GPUs as used on many mobile GPUs where glFlush is an expensive
	operation, it may force content load/store between tile memory and main
	memory.) See for example
	[gl::GLContextCGL::MakeCurrent](/ui/gl/gl_context_cgl.cc):
	```c++
	// It's likely we're going to switch OpenGL contexts at this point.
	// Before doing so, if there is a current context, flush it. There
	// are many implicit assumptions of flush ordering between contexts
	// at higher levels, and if a flush isn't performed, OpenGL commands
	// may be issued in unexpected orders, causing flickering and other
	// artifacts.
	```

	* Force context virtualization so that all commands are issued into a single
	driver-level GL context. This is used on Qualcomm/Adreno chipsets, see [issue
	691102](http://crbug.com/691102).

	* Assume per-thread command queues without explicit synchronization. GLX
	effectively ensures this. On Windows, ANGLE uses a single D3D device
	underneath all contexts which ensures strong ordering.

	GPU control tasks are processed out of band and are only partially ordered in
	respect to GL commands. A gpu_control task always happens before any following
	GL commands issued on the same IPC channel. It usually executes before any
	preceding unflushed GL commands, but this is not guaranteed. A
	`ShallowFlushCHROMIUM` ensures that any following gpu_control tasks will execute
	after the flushed GL commands.

	In this example, DoTask will execute after GLCommandA and before GLCommandD, but
	there is no ordering guarantee relative to CommandB and CommandC:

	```c++
	// gles2_implementation.cc

	helper_->GLCommandA();
	ShallowFlushCHROMIUM();

	helper_->GLCommandB();
	helper_->GLCommandC();
	gpu_control_->DoTask();

	helper_->GLCommandD();

	// Execution order is one of:
	// A \| DoTask B C \| D
	// A \| B DoTask C \| D
	// A \| B C DoTask \| D
	```

	The shallow flush adds the pending GL commands to the service's task queue, and
	this task queue is also used by incoming gpu control tasks and processed in
	order. The `ShallowFlushCHROMIUM` command returns as soon as the tasks are
	queued and does not wait for them to be processed.

	## Cross-process transport: GpuFence and GpuFenceHandle

	Some platforms such as Android (most devices N and above) and ChromeOS support
	synchronizing a native GL context with a command buffer GL context through a
	GpuFence.

	Use the static `gl::GLFence::IsGpuFenceSupported()` method to check at runtime if
	the current platform has support for the GpuFence mechanism including
	GpuFenceHandle transport.

	The GpuFence mechanism supports two use cases:

	* Create a GLFence object in a local context, convert it to a client-side
	GpuFence, duplicate it into a command buffer service-side gpu fence, and
	issue a server wait on the command buffer service side. That service-side
	wait will be unblocked when the client-side GpuFence signals.

	* Create a new command buffer service-side gpu fence, request a GpuFenceHandle
	from it, use this handle to create a native GL fence object in the local
	context, then issue a server wait on the local GL fence object. This local
	server wait will be unblocked when the service-side gpu fence signals.

	The [CHROMIUM_gpu_fence
	extension](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_gpu_fence.txt) documents
	the GLES API as used through the command buffer interface. This section contains
	additional information about the integration with local GL contexts that is
	needed to work with these objects.

	### Driver-level wrappers

	In general, you should use the static `gl::GLFence::CreateForGpuFence()` and
	`gl::GLFence::CreateFromGpuFence()` factory methods to create a
	platform-specific local fence object instead of using an implementation class
	directly.

	For Android and ChromeOS, the
	[gl::GLFenceAndroidNativeFenceSync](/ui/gl/gl_fence_android_native_fence_sync.h)
	implementation wraps the
	[EGL_ANDROID_native_fence_sync](https://www.khronos.org/registry/EGL/extensions/ANDROID/EGL_ANDROID_native_fence_sync.txt)
	extension that allows creating a special EGLFence object from which a file
	descriptor can be extracted, and then creating a duplicate fence object from
	that file descriptor that is synchronized with the original fence.

	### GpuFence and GpuFenceHandle

	A [gfx::GpuFence](/ui/gfx/gpu_fence.h) object owns a GPU fence handle
	representing a native GL fence. The `AsClientGpuFence` method casts it to a
	ClientGpuFence type for use with the [CHROMIUM_gpu_fence
	extension](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_gpu_fence.txt)'s
	`CreateClientGpuFenceCHROMIUM` call.

	A [gfx::GpuFenceHandle](/ui/gfx/gpu_fence_handle.h) is an IPC-transportable
	wrapper for a file descriptor or other underlying primitive object, and is used
	to duplicate a native GL fence into another process. It has value semantics and
	can be copied multiple times, and then consumed exactly one time. Consumers take
	ownership of the underlying resource. Current GpuFenceHandle consumers are:

	* The `gfx::GpuFence(gpu_fence_handle)` constructor takes ownership of the
	handle's resources without constructing a local fence.

	* The IPC subsystem closes resources after sending. The typical idiom is to call
	`gfx::CloneHandleForIPC(handle)` on a GpuFenceHandle retrieved from a
	scope-lifetime object to create a copied handle that will be owned by the IPC
	subsystem.

	### Sample Code

	A usage example for two-process synchronization is to sequence access to a
	globally shared drawable such as an AHardwareBuffer on Android, where the
	writer uses a local GL context and the reader is a command buffer context in
	the GPU process. The writer process draws into an AHardwareBuffer-backed
	GLImage in the local GL context, then creates a gpu fence to mark the end of
	drawing operations:

	```c++
	// This example assumes that GpuFence is supported. If not, the application
	// should fall back to a different transport or synchronization method.
	DCHECK(gl::GLFence::IsGpuFenceSupported())

	// ... write to the shared drawable in local context, then create
	// a local fence.
	std::unique_ptr<gl::GLFence> local_fence = gl::GLFence::CreateForGpuFence();

	// Convert to a GpuFence.
	std::unique_ptr<gfx::GpuFence> gpu_fence = local_fence->GetGpuFence();
	// It's ok for local_fence to be destroyed now, the GpuFence remains valid.

	// Create a matching gpu fence on the command buffer context, issue
	// server wait, and destroy it.
	GLuint id = gl->CreateClientGpuFenceCHROMIUM(gpu_fence.AsClientGpuFence());
	// It's ok for gpu_fence to be destroyed now.
	gl->WaitGpuFenceCHROMIUM(id);
	gl->DestroyGpuFenceCHROMIUM(id);

	// ... read from the shared drawable via command buffer. These reads
	// will happen after the local_fence has signalled. The local
	// fence and gpu_fence dn't need to remain alive for this.
	```

	If a process wants to consume a drawable that was produced through a command
	buffer context in the GPU process, the sequence is as follows:

	```c++
	// Set up callback that's waiting for the drawable to be ready.
	void callback(std::unique_ptr<gfx::GpuFence> gpu_fence) {
	// Create a local context GL fence from the GpuFence.
	std::unique_ptr<gl::GLFence> local_fence =
	gl::GLFence::CreateFromGpuFence(*gpu_fence);
	local_fence->ServerWait();
	// ... read from the shared drawable in the local context.
	}

	// ... write to the shared drawable via command buffer, then
	// create a gpu fence:
	GLuint id = gl->CreateGpuFenceCHROMIUM();
	context_support->GetGpuFenceHandle(id, base::BindOnce(callback));
	gl->DestroyGpuFenceCHROMIUM(id);
	```

	It is legal to create the GpuFence on a separate command buffer context instead
	of on the command buffer channel that did the drawing operations, but in that
	case gl->WaitSyncTokenCHROMIUM() or equivalent must be used to sequence the
	operations between the distinct command buffer contexts as usual.