VaAPI

This page documents tracing and debugging the Video Acceleration API (VaAPI or VA-API) on ChromeOS. The VA-API is an open-source library and API specification, providing access to graphics hardware acceleration capabilities for video and image processing. The VaAPI is used on ChromeOS on both Intel and AMD platforms.

Overview

VaAPI code is developed upstream on the VaAPI GitHub repository, from which ChromeOS is a downstream client via the libva package, with packaged backends for both Intel and AMD.

Tracing VaAPI video decoding

A simplified diagram of the buffer circulation is provided below. The “client” is always a Renderer process, communicating via Mojo/IPC. Essentially, the VaAPI Video Decode Accelerator (VaVDA) receives encoded BitstreamBuffers from the “client” and sends them to the “va internals”, which eventually produce decoded video in PictureBuffers. The VaVDA may or may not use the Vpp unit for pixel format adaptation, depending on the codec used, the silicon generation and other specifics.

      K BitstreamBuffers   +-----+    +-------------------+
 C   --------------------->| Va  | ----->                 |
 L   <---------------------| VDA | <----     va internals |
 I      (encoded stuff)    |     |    |                   |
 E                         |     |    | +-----+       +----+
 N   <---------------------|     | <----|     |<------| lib|
 T   --------------------->|     | ---->| Vpp |------>| va |
                 N         +-----+    +-+-----+   M   +----+
           PictureBuffers                      VASurfaces
           (decoded stuff)

PictureBuffers are created by the “client” but allocated and filled in by the VaVDA. K is unrelated to both M and N.
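
For orientation, the sketch below shows the core libva call sequence that the VaVDA wraps. This is a hedged, simplified sketch rather than Chromium code: the VP9 profile, the surface count and the function name are illustrative assumptions, error handling is omitted, and the codec-specific parameter buffers (VAPictureParameterBufferType et al.) are elided.

    #include <va/va.h>

    // Illustrative sketch of one decode cycle. Assumes |va_display| was
    // already set up with vaInitialize() and that the encoded contents of a
    // BitstreamBuffer are available in |bitstream_data|/|bitstream_size|.
    void DecodeOneFrameSketch(VADisplay va_display, int coded_width,
                              int coded_height, void* bitstream_data,
                              unsigned int bitstream_size) {
      // Configuration for an illustrative codec: VP9 profile 0, VLD (decode)
      // entrypoint.
      VAConfigID config_id;
      vaCreateConfig(va_display, VAProfileVP9Profile0, VAEntrypointVLD,
                     /*attrib_list=*/nullptr, /*num_attribs=*/0, &config_id);

      // The "M" internal VASurfaces of the diagram above; 4 is illustrative.
      constexpr unsigned int kNumSurfaces = 4;
      VASurfaceID surfaces[kNumSurfaces];
      vaCreateSurfaces(va_display, VA_RT_FORMAT_YUV420, coded_width,
                       coded_height, surfaces, kNumSurfaces,
                       /*attrib_list=*/nullptr, /*num_attribs=*/0);

      VAContextID context_id;
      vaCreateContext(va_display, config_id, coded_width, coded_height,
                      VA_PROGRESSIVE, surfaces, kNumSurfaces, &context_id);

      // Wrap the encoded data in a VABuffer. A real decoder also submits
      // picture/slice parameter buffers, elided here.
      VABufferID slice_data_buffer;
      vaCreateBuffer(va_display, context_id, VASliceDataBufferType,
                     bitstream_size, 1, bitstream_data, &slice_data_buffer);

      vaBeginPicture(va_display, context_id, surfaces[0]);
      vaRenderPicture(va_display, context_id, &slice_data_buffer,
                      /*num_buffers=*/1);
      vaEndPicture(va_display, context_id);

      // Block until the decoded frame lands in surfaces[0].
      vaSyncSurface(va_display, surfaces[0]);
    }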

Tracing memory consumption

Tracing memory consumption is done via the MemoryInfra system. Please take a minute and read that document (in particular the difference between effective_size and size). The VaAPI lives inside the GPU process (a.k.a. the Viz process), so please familiarize yourself with the GPU Memory Tracing document. The VaVDA provides information by implementing the Memory Dump Provider interface, but the information provided varies with the execution mode, as explained next.
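
As a reference for how a provider plugs into MemoryInfra, below is a hedged sketch of the base::trace_event::MemoryDumpProvider interface; it is not the actual VaVDA implementation, and |total_surface_bytes_| is a placeholder for whatever the decoder knows it has allocated.

    #include "base/trace_event/memory_dump_manager.h"
    #include "base/trace_event/memory_dump_provider.h"
    #include "base/trace_event/process_memory_dump.h"

    // Sketch of a Memory Dump Provider. A real instance would be registered
    // with base::trace_event::MemoryDumpManager::GetInstance()
    //     ->RegisterDumpProvider(this, "VaapiDecoderSketch", task_runner).
    class VaapiDecoderMemoryDumpProviderSketch
        : public base::trace_event::MemoryDumpProvider {
     public:
      bool OnMemoryDump(const base::trace_event::MemoryDumpArgs& args,
                        base::trace_event::ProcessMemoryDump* pmd) override {
        // Report one allocator dump under the gpu/vaapi/decoder category
        // discussed in the next section.
        base::trace_event::MemoryAllocatorDump* dump =
            pmd->CreateAllocatorDump("gpu/vaapi/decoder");
        dump->AddScalar(base::trace_event::MemoryAllocatorDump::kNameSize,
                        base::trace_event::MemoryAllocatorDump::kUnitsBytes,
                        total_surface_bytes_);
        return true;
      }

     private:
      uint64_t total_surface_bytes_ = 0;
    };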

Internal VASurfaces accountancy

The usage of the Vpp unit is controlled by the member variable |decode_using_client_picture_buffers_| and is very advantageous in terms of CPU, power and memory consumption (see crbug.com/822346).

  • When |decode_using_client_picture_buffers_| is false, libva uses a set of internally allocated VASurfaces that are accounted for in the gpu/vaapi/decoder tracing category (see screenshot below). Each of these VASurfaces is backed by a Buffer Object large enough to hold, at least, the decoded image in YUV semiplanar format. In the diagram above, M varies: 4 for VP8, 9 for VP9, 4-12 for H264/AVC1 (see GetNumReferenceFrames()); the sketch after this list estimates the resulting memory footprint.

  • When |decode_using_client_picture_buffers_| is true, libva can decode directly on the client's PictureBuffers, M = 0, and the gpu/vaapi/decoder category is not present in the GPU MemoryInfra.
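
To put numbers on the above, the back-of-the-envelope sketch below estimates the footprint of the M internal VASurfaces, assuming a plain NV12 (YUV semiplanar, 12 bits per pixel) layout with no driver padding or alignment, both simplifying assumptions.

    #include <cstdint>
    #include <cstdio>

    // Rough NV12 footprint: a full-resolution Y plane plus a half-size
    // interleaved UV plane, i.e. width * height * 3 / 2 bytes. Real Buffer
    // Objects include padding/alignment, so treat this as a lower bound.
    uint64_t Nv12SurfaceBytes(uint64_t width, uint64_t height) {
      return width * height * 3 / 2;
    }

    int main() {
      const uint64_t kWidth = 1920, kHeight = 1080;
      // M from the list above: 4 for VP8, 9 for VP9.
      for (int num_surfaces : {4, 9}) {
        const uint64_t total =
            num_surfaces * Nv12SurfaceBytes(kWidth, kHeight);
        std::printf("%d surfaces: %.1f MiB\n", num_surfaces,
                    total / (1024.0 * 1024.0));
      }
      // Prints roughly 11.9 MiB for VP8 (M = 4) and 26.7 MiB for VP9 (M = 9).
      return 0;
    }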

PictureBuffers accountancy

VaVDA allocates storage for the N PictureBuffers provided by the client by means of VaapiPicture{NativePixmapOzone}s, backed by NativePixmaps, themselves backed by DmaBufs (the client only knows about the client Texture IDs). The GPU's TextureManager accounts for these textures.

See e.g. the following ToT example for 10 1920x1080p textures (32bpp), i.e. roughly 79 MiB in total (1920 × 1080 × 4 bytes ≈ 7.9 MiB each); finding the desired context_group can be tricky.

Tracing power consumption

Power consumption is available on ChromeOS test/dev images via the command line binary dump_intel_rapl_consumption; this tool averages the power consumption of the four SoC domains over a configurable period of time, usually a few seconds. These domains are, in the order presented by the tool:

  • pkg: estimated power consumption of the whole SoC; in particular, this is a superset of pp0 and pp1, including all accessory silicon, e.g. video processing.
  • pp0: CPU set.
  • pp1/gfx: Integrated GPU or GPUs.
  • dram: estimated power consumption of the DRAM, from the bus activity.

Googlers can read more about this topic under go/power-consumption-meas-in-intel.

dump_intel_rapl_consumption is usually run while a given workload is active (e.g. a video playback), with an interval larger than a second to smooth out all kinds of system services that would otherwise show up in shorter sampling periods, e.g. WiFi.

dump_intel_rapl_consumption --interval_ms=2000 --repeat --verbose

E.g. on a nocturne main1, the average power consumption in watts while playing back the first minute of a 1080p VP9 video is:

  pkg     pp0     pp1/gfx   dram
  2.63    1.44    0.29      0.87

As can be seen, pkg ~= pp0 + pp1 + 1W; this extra watt is the cost of all the associated silicon, e.g. bridges, bus controllers, caches, and the media processing engine.
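
Under the hood, the tool derives average power from the RAPL energy counters exposed by the kernel; the sketch below illustrates the same idea via the Linux powercap sysfs interface. It is an assumption-laden illustration, not the tool's actual implementation: the /sys/class/powercap/intel-rapl:0/energy_uj path (typically the pkg domain) varies per system, and counter wraparound is ignored.

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <fstream>
    #include <thread>

    // Reads a RAPL energy counter, in microjoules, from powercap sysfs.
    uint64_t ReadEnergyMicrojoules(const char* path) {
      std::ifstream file(path);
      uint64_t energy_uj = 0;
      file >> energy_uj;
      return energy_uj;
    }

    int main() {
      // Assumed path for the package ("pkg") domain; subdomains such as pp0
      // live under e.g. intel-rapl:0:0.
      const char* kPkgPath = "/sys/class/powercap/intel-rapl:0/energy_uj";
      const auto kInterval = std::chrono::milliseconds(2000);

      // Average power = energy delta / time delta, the same averaging that
      // dump_intel_rapl_consumption performs over its --interval_ms.
      const uint64_t before_uj = ReadEnergyMicrojoules(kPkgPath);
      std::this_thread::sleep_for(kInterval);
      const uint64_t after_uj = ReadEnergyMicrojoules(kPkgPath);

      const double seconds = kInterval.count() / 1000.0;
      std::printf("pkg: %.2f W\n", (after_uj - before_uj) / 1e6 / seconds);
      return 0;
    }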

Tracing CPU cycles and instantaneous buffer usage

TODO(mcasas): fill in this section.

Verifying VaAPI installation and usage

Verify the VaAPI is correctly installed and can be loaded

vainfo is a small command line utility used to enumerate the supported operation modes; it's developed in the libva-utils repository, and is available on ChromeOS dev images (media-video/libva-utils package) and on Debian systems (vainfo package). vainfo will try to load the appropriate backend driver for the system and/or GPUs, and fails if it cannot find/load it.
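
Programmatically, the gist of what vainfo does is to initialize a display and query the supported profiles; the sketch below shows that core, assuming a DRM render node at /dev/dri/renderD128 (device paths vary per system) and omitting the per-profile entrypoint enumeration that vainfo also performs.

    #include <fcntl.h>
    #include <unistd.h>

    #include <cstdio>
    #include <vector>

    #include <va/va.h>
    #include <va/va_drm.h>

    int main() {
      // Open a DRM render node; the path is an assumption for illustration.
      const int drm_fd = open("/dev/dri/renderD128", O_RDWR);
      if (drm_fd < 0)
        return 1;

      VADisplay display = vaGetDisplayDRM(drm_fd);
      int major = 0, minor = 0;
      if (vaInitialize(display, &major, &minor) != VA_STATUS_SUCCESS) {
        // Comparable to the vaInitialize failures surfaced by vainfo and
        // chrome://gpu.
        std::fprintf(stderr, "vaInitialize failed\n");
        close(drm_fd);
        return 1;
      }
      std::printf("libva %d.%d\n", major, minor);

      // Enumerate the supported profiles (VAProfileVP9Profile0 etc.).
      std::vector<VAProfile> profiles(vaMaxNumProfiles(display));
      int num_profiles = 0;
      if (vaQueryConfigProfiles(display, profiles.data(), &num_profiles) ==
          VA_STATUS_SUCCESS) {
        std::printf("%d supported profiles\n", num_profiles);
      }

      vaTerminate(display);
      close(drm_fd);
      return 0;
    }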

Verify the VaAPI supports and/or uses a given codec

A few steps are customary to verify the support and use of a given codec.

To verify that the build and platform support video acceleration, launch Chromium and navigate to chrome://gpu, then:

  • Search for the “Video Acceleration Information” Section: this should enumerate the available accelerated codecs and resolutions.

  • If this section is empty, the “Log Messages” Section immediately below will often indicate an associated error, e.g.:

    vaInitialize failed: unknown libva error

    that can usually be reproduced with vainfo, see the previous section.

To verify that a given video is being played back using the accelerated video decoding backend:

  • Navigate to a URL that causes a video to be played. Leave it playing.
  • Navigate to the chrome://media-internals tab.
  • Find the entry associated to the video-playing tab.
  • Scroll down to “Player Properties” and check the “video_decoder” entry: it should say “GpuVideoDecoder”.

VaAPI on Linux

This configuration is unsupported (see docs/linux_hw_video_decode.md); the following instructions are provided only as a reference for developers to test the code paths on a Linux machine.

  • Follow the instructions under the Linux build setup document, adding the GN argument use_vaapi=true in the args.gn file (please refer to the Setting up the build Section).
  • To support proprietary codecs, e.g. H264/AVC1, add the options proprietary_codecs = true and ffmpeg_branding = "Chrome" to the GN args.
  • Build Chromium as usual.

At this point you should make sure the appropriate VA driver backend is working correctly; try running vainfo from the command line and verify no errors show up.

To run Chromium using VaAPI, two arguments are necessary:

  • --ignore-gpu-blacklist
  • --use-gl=desktop or --use-gl=egl

./out/gn/chrome --ignore-gpu-blacklist --use-gl=egl

Note that you can set the environment variable MESA_GLSL_CACHE_DISABLE=false if you want the GPU process to run in sandboxed mode (see crbug.com/264818). To check whether the running GPU process is sandboxed, open chrome://gpu and search for Sandboxed in the driver information table. In addition, passing --gpu-sandbox-failures-fatal=yes will prevent the GPU process from running in non-sandboxed mode.

Refer to the previous section to verify support and use of the VaAPI.