GPU Command Buffer

Authors: chrome-offsec-team@google.com
Last updated: April 10, 2023

Overview

The GPU Command Buffer is used to create and manage high throughput communication paths using a combination of Mojo IPC and shared memory regions. It is currently used by Chrome to support three distinct methods of GPU-accelerated graphics: WebGL2, Skia rasterization, and WebGPU. Mojo IPC messages are used to communicate the metadata necessary to establish new shared memory regions and coordinate state among endpoints producing and consuming from the memory region. Communication paths are between two endpoints: one client and one service. Typically the service endpoint resides in the GPU process and clients reside in the renderer process, but there is support for exceptions such as when GPU functionality is forced to run as a thread in the browser process, or unit tests where everything runs in a single process. A single renderer process may host multiple clients, such as when web content utilizes both WebGL and WebGPU features, where each context has an independent dedicated Command Buffer.

The structure of Command Buffer data can be thought of as a layering of protocols. Common functionality is shared by all client-service types and specialized application-specific functionality is built on top. Core functionality provides the mechanisms to establish and manage the communication path, such as creating new shared memory regions and providing synchronization primitives. Specialized functionality pertains to application-specific features, such as shader configuration and execution.

Security Investment Rationale

Isolating untrustworthy content within a sandboxed process is a cornerstone of the Chrome security model. Communication paths that bridge the sandbox to more trustworthy processes are a key part of the inter-process security boundary. The separation of the signaling mechanism (i.e. Mojo IPC) from the data transfer mechanism (i.e. shared memory) creates the potential for state inconsistencies. The layered nature of communication creates the potential for nuanced, cross-protocol dependencies. The use of shared memory creates the potential for time-of-check-time-of-use issues and other state inconsistencies.

Research Scope & Outcomes

The Command Buffer investigation was part of a broader effort to identify areas of interest to attackers within the Chrome GPU acceleration stack. After narrowing our focus to code supporting the WebGPU subsystem, we still found the stack to be far larger and more complex than we could comprehensively audit in one pass. To further narrow scope we identified discrete components of functionality underpinning the WebGPU stack. The Command Buffer is one such component. Investigation of the Command Buffer proceeded in parallel to analysis of other GPU subsystems.

Command Buffer communication is structured in layers. The foundational or common layer is shared by all higher layers. We chose to start our analysis with the common features because of their applicability to all higher layers. The Command Buffer also supports application-specific features that behave like higher level protocols; examples of higher level layers include support for WebGL and WebGPU. It is these higher level features - specifically the implications of cross-protocol interactions with the low level features - that we believe represent the bulk of the complexity and attack surface. However, they remain unexplored and motivate revisiting this subsystem.

Command Buffer Usage in Chrome

The Command Buffer is used in four different scenarios:

  • Proxy: Used by sandboxed processes to communicate with the CommandBuffer service in the GPU process.
  • Direct: Used by the CommandBuffer service within the GPU process.
  • In-process: Used when clients and services are all in the same process, such as unit tests and when Chrome is run in single-process mode.
  • Pepper (deprecated): Used by plugins - and possibly also extensions and apps - to communicate with the CommandBuffer service in the GPU process.

Each client type implements at least one of two additional classes, each of which provides a means of IPC signaling (a simplified sketch of the two directions follows the list):

  1. CommandBufferClient: Used by the service to send messages to clients. Implemented via Mojo for multi-process use cases; the CommandBufferServiceClient is an equivalent variation to support single-process operation via direct C++ API calls.
  2. GPUControl: Used by clients to send messages to the service. Implemented via Mojo for multi-process clients and C++ for in-process use cases.
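
The sketch below is illustrative only: the class and method names are simplified stand-ins for the two signaling directions described above and do not match the actual Chromium declarations of CommandBufferClient or GPUControl.

    // Illustrative sketch, not the real Chromium interfaces.
    #include <cstdint>

    // Direction 1: service -> client notifications (CommandBufferClient role).
    class ClientSignalInterfaceSketch {
     public:
      virtual ~ClientSignalInterfaceSketch() = default;
      // Hypothetical notifications a service might deliver to its client.
      virtual void OnConsoleMessage(const char* message) = 0;
      virtual void OnStateError(int32_t error_code) = 0;
    };

    // Direction 2: client -> service requests (GPUControl role).
    class ServiceSignalInterfaceSketch {
     public:
      virtual ~ServiceSignalInterfaceSketch() = default;
      // Hypothetical requests a client issues to drive the service side.
      virtual void Flush(int32_t put_offset) = 0;
      virtual void RegisterTransferBuffer(int32_t buffer_id) = 0;
    };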

The Proxy use case is the most interesting because it is the attack surface available from a sandboxed renderer process as deployed in real Chrome instances. However, we intended to pursue fuzzing, and Chrome's primary fuzzing framework is built on single-process unit tests, so it was necessary to instead target the in-process implementation used by unit tests, even though it stubs out or emulates the interesting features.

Fuzzing

Existing fuzzers for the Command Buffer were developed by the GPU team and predate our analysis. We studied these fuzzers - in particular how they bootstrap the graphics subsystem for testing - and developed new fuzzers with different generation strategies, finding one new high-severity security bug in the same narrow feature set already covered by existing fuzzers. Much like our fuzzers from this first pass, the existing fuzzers targeted specific portions of functionality. An emerging theme for improving fuzzing of Chrome at large applies here as well: layering and integration of fuzzers is expected to increase reachability of complex state and therefore increase aggregate fuzzer effectiveness.

In other words, the coverage resulting from a combination of individual fuzzers is greater than the sum of its parts. Consequently, we intend to incrementally extend and integrate fuzzers in order to exercise cross-feature complexity during future work in the graphics subsystem. The next step is integration with a new and complementary fuzzer targeting the WebGPU Dawn Wire protocol.

We developed one new fuzzer tailored for the Command Buffer. Its design is similar to an existing fuzzer, but the new fuzzer differs in a few ways:

  • It uses libprotobuf-mutator (LPM) for generation.
  • The LPM grammar targets CommandBuffer features that are common to all client-service types, making it suitable for combination with other LPM fuzzers targeting higher level client-service types, such as WebGPU.

It so happens that most of the CommandBuffer features targeted by the new fuzzer also involve Mojo IPC.
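
For orientation, the following sketch shows the general shape of an LPM-based harness under stated assumptions: DEFINE_PROTO_FUZZER is the standard libprotobuf-mutator entry-point macro, while the fuzz::CommandBufferSession proto, its include path, and the replay helper are hypothetical names invented here to illustrate the structure, not the actual fuzzer sources.

    // Sketch of an LPM harness shape; proto and helper names are hypothetical.
    #include "src/libfuzzer/libfuzzer_macro.h"   // libprotobuf-mutator macro
    #include "command_buffer_fuzz.pb.h"          // hypothetical generated proto

    // Hypothetical helper that replays a grammar-generated sequence of common
    // CommandBuffer operations (register buffer, flush, sync token, ...)
    // against an in-process client/service pair.
    void ReplayOnInProcessCommandBuffer(
        const fuzz::CommandBufferSession& session) {
      // In a real fuzzer this would translate each proto operation into
      // CommandBuffer client calls; elided in this sketch.
    }

    DEFINE_PROTO_FUZZER(const fuzz::CommandBufferSession& session) {
      // Each iteration drives one generated session through the stack.
      ReplayOnInProcessCommandBuffer(session);
    }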

Higher Layers

Three types of CommandBuffer clients make direct use of its features: WebGL2, Skia Raster, and WebGPU. This section characterizes how each client type makes use of the CommandBuffer.

One renderer may act as many clients. For example, if web content makes use of both WebGL2 and WebGPU, the renderer would have at least two separate CommandBuffer sessions.

CommandBuffer Protocol Structure

CommandBuffer commands all start with a header containing two fields: an 11-bit command identifier and a 21-bit size. The header is followed by zero or more data fields.

[Figure: Command Buffer command structure]

The structure allows for up to 2^11 (2048) distinct command identifiers, each with a customizable data payload. This is the mechanism used to implement the common commands.
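
A minimal sketch of the header layout described above, assuming the two fields pack into a single 32-bit word; the field order and the unit of the size field (assumed here to be 32-bit entries including the header) reflect our reading and should be treated as illustrative rather than the exact Chromium definition.

    // Illustrative header layout: 11-bit command id + 21-bit size.
    #include <cstdint>

    struct CommandHeaderSketch {
      uint32_t size : 21;     // assumed: command length in 32-bit entries,
                              // header included
      uint32_t command : 11;  // which command to dispatch
    };
    static_assert(sizeof(CommandHeaderSketch) == sizeof(uint32_t),
                  "header packs into a single 32-bit entry");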

WebGL2 and Skia Raster

Just like the common commands, Skia Raster commands and WebGL2/GLES2 commands are implemented as native CommandBuffer commands: each new command is assigned an identifier and its parameters are defined using CommandBuffer conventions. All native CommandBuffer commands are validated at the CommandBuffer level.
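
A hypothetical example of what this convention looks like, reusing the CommandHeaderSketch from the previous sketch: the common header is followed by command-specific parameters. This is not an actual autogenerated Chromium command struct; the command name and parameter are invented for illustration.

    // Hypothetical native command: common header + command-specific fields.
    struct ClearCommandSketch {
      CommandHeaderSketch header;  // assigned command identifier + total size
      uint32_t mask;               // command-specific parameter
    };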

WebGPU

WebGPU takes a different approach: a few native CommandBuffer commands support many new Dawn APIs. In particular, a single CommandBuffer command called DawnCommands (code pointer: client / service) implements the bulk of the Dawn Wire protocol.

Unlike native CommandBuffer commands, which each have a unique id at the CommandBuffer level, Dawn Wire commands are nested inside the data portion of the DawnCommands CommandBuffer command. In the GPU process, the CommandBuffer command handler reads the command data, which includes an identifier for the Dawn command, then uses a switch statement to determine which Dawn command to execute and how to further decode the data. Consequently, Dawn commands use an entirely discrete set of serialization, deserialization and validation logic.
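
The sketch below is a simplified illustration of that nested dispatch, not the generated Dawn Wire handler: the outer command's data payload carries its own inner command identifier, and a switch statement selects the Dawn-specific decode path. All names and the inner command set are hypothetical.

    // Hypothetical nested dispatch over the DawnCommands data payload.
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    enum class DawnCmdSketch : uint32_t { kCreateBuffer = 1, kWriteBuffer = 2 };

    bool HandleDawnCommandsSketch(const uint8_t* data, size_t size) {
      while (size >= sizeof(uint32_t)) {
        uint32_t inner_id;
        std::memcpy(&inner_id, data, sizeof(inner_id));
        switch (static_cast<DawnCmdSketch>(inner_id)) {
          case DawnCmdSketch::kCreateBuffer:
            // Dawn-specific deserialization and validation would run here.
            break;
          case DawnCmdSketch::kWriteBuffer:
            break;
          default:
            return false;  // unknown inner command
        }
        // Real handlers advance by the decoded command's length; simplified.
        data += sizeof(uint32_t);
        size -= sizeof(uint32_t);
      }
      return true;
    }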

Notably, the command handler is platform-specific and auto-generated, and its design allows the individual handlers for Dawn commands (i.e. function pointers) to be overridden when the WebGPU session is established.

Specific Dawn commands of interest are described in a dedicated document.

Features of Particular Interest

This section describes high level feature operation to introduce the concepts. Later documents go into more detail about specific use cases.

TransferBuffer: Efficient Bulk Data Transfer

Enabling efficient bulk data transfer is a core CommandBuffer feature. GPU clients - namely a renderer process - create a gpu::TransferBuffer that includes a pointer to a memory region the client intends to share with the GPU process. However, a pointer is only meaningful within a single process's address space and the goal is cross-process sharing, so gpu::TransferBuffer also contains a unique numeric identifier, buffer_id_, that will be consistent across processes.

When a gpu::TransferBuffer is allocated and initialized, an important step is “registration” with the GPU process. Registration is a Mojo IPC message containing the buffer_id_ of the newly created gpu::TransferBuffer, which allows the GPU process to record an association between the buffer_id_ and a pointer to the shared buffer within its own address space. After registration both client and service can refer to the same memory by its ID rather than a pointer.

The gpu::TransferBuffer includes data structures that allow the client and service to treat the shared memory region as a ring buffer. These data structures include alignments and offsets within the shared buffer, as well as features to let client and service indicate their respective positions of production and consumption within the buffer. These structures must be kept in sync across both the client and service to achieve intended operation, which requires cooperation of both client and service. Notably, since the client creates the shared memory it may also unilaterally reorganize or altogether de-allocate the memory; this means the service must guard against a compromised renderer that has many opportunities to manipulate state and shared resources.
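
A hypothetical sketch of the service-side view under these constraints: a registry maps each client-chosen buffer ID to the service's own mapping of the shared region, and any client-supplied offset and length must be re-validated against that mapping, because the client controls the memory and can change it at any time. The class and field names here are invented for illustration.

    // Hypothetical service-side registry keyed by client-chosen buffer id.
    #include <cstddef>
    #include <cstdint>
    #include <map>

    struct MappedBufferSketch {
      uint8_t* base = nullptr;  // service-local mapping of the shared region
      size_t size = 0;
    };

    class TransferBufferRegistrySketch {
     public:
      void Register(int32_t buffer_id, MappedBufferSketch mapping) {
        buffers_[buffer_id] = mapping;
      }

      // Returns a pointer only if the requested range fits inside the
      // registered region; the service should never trust offsets read from
      // shared memory without a check of this kind.
      uint8_t* GetAndValidate(int32_t buffer_id, size_t offset, size_t len) {
        auto it = buffers_.find(buffer_id);
        if (it == buffers_.end()) return nullptr;
        const MappedBufferSketch& b = it->second;
        if (offset > b.size || len > b.size - offset) return nullptr;
        return b.base + offset;
      }

     private:
      std::map<int32_t, MappedBufferSketch> buffers_;
    };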

SyncToken: Foundation for Synchronization

The SyncToken is a CommandBuffer synchronization primitive. A token is inserted into the command stream as a shared point of reference for both producers and consumers. The shared point of reference allows building higher level synchronization features. For example, client and service can communicate expectations about the ordering of operations in the stream relative to a token.
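
A hypothetical simplification of the concept (not the gpu::SyncToken API): a token records a position in a producing command stream as a release count, and a waiter is satisfied once that stream's current release count reaches the token's value. The struct and function names are invented for illustration.

    // Hypothetical sync-token model: released once the producer catches up.
    #include <cstdint>

    struct SyncTokenSketch {
      uint64_t command_buffer_id = 0;  // which stream produced the token
      uint64_t release_count = 0;      // position within that stream
    };

    bool IsReleased(const SyncTokenSketch& token,
                    uint64_t current_release_count) {
      // True once the producing stream has executed past the token.
      return current_release_count >= token.release_count;
    }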

Several higher level synchronization features, such as CHROMIUM_sync_point and CHROMIUM_ordering_barrier, build on the SyncToken and are implemented as extensions to the CommandBuffer protocol.

Dedicated documentation goes into more detail about Chromium GPU synchronization features.

Mailbox: A cross-process identity for shared resources

A gpu::Mailbox is a 16-byte random value that can be shared across GPU contexts and processes. The gpu::Mailbox name serves as a single shared identifier for referring to the same object. Similar to a gpu::TransferBuffer ID, this common identifier gives GPU clients and services in different processes a way to name data across address spaces.

The key feature of the gpu::Mailbox is allowing producers and consumers to associate multiple resources with a single name. After establishing a mailbox, communicating endpoints can reference a collection of related resources using a common name rather than managing many separate resources.
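
The sketch below is illustrative only: a 16-byte random name used as the key for a bundle of related resources, so producer and consumer can exchange one name instead of many per-resource handles. The real gpu::Mailbox and the resources it names are more involved; the names and the use of std::random_device as a stand-in RNG are assumptions made for the sketch.

    // Hypothetical mailbox-style naming of a resource bundle.
    #include <array>
    #include <cstdint>
    #include <map>
    #include <random>
    #include <vector>

    using MailboxNameSketch = std::array<uint8_t, 16>;

    MailboxNameSketch GenerateMailboxName() {
      std::random_device rd;  // stand-in for a suitable random source
      MailboxNameSketch name;
      for (auto& byte : name) byte = static_cast<uint8_t>(rd());
      return name;
    }

    // Producer and consumer look up the same resource bundle by name
    // instead of exchanging individual resource handles.
    std::map<MailboxNameSketch, std::vector<uint64_t>> g_resources_by_mailbox;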

SharedImage: Allowing many endpoints to operate on shared binary data

SharedImage is a complex collection of features for passing binary data between producers and consumers. Its key feature is allowing multiple contexts of potentially different types - e.g. Skia Raster, WebGL, and WebGPU - to operate on the same object.

Observations & Lessons Learned

Layering and integration of fuzzers is expected to increase reachable state and therefore increase aggregate fuzzer effectiveness. Consequently, we intend to incrementally extend and integrate fuzzers in order to exercise cross-protocol complexity during future work in the graphics subsystem. Modest changes to existing fuzzers can yield new bugs.

Layering Dawn on top of an independent communication mechanism has been a source of security bugs (e.g. 1, 2, 3, 4) because operations at the lower CommandBuffer level can violate assumptions made at the higher level.

Prior Work