### Section 2: Format Overview {#h-02-00}
VP8 works exclusively with an 8-bit YUV 4:2:0 image format. In this
format, each 8-bit pixel in the two chroma planes (U and V)
corresponds positionally to a 2x2 block of 8-bit luma pixels in the
Y plane; the coordinates of the upper-left corner of the Y block are
exactly twice the coordinates of the corresponding chroma pixel.
When we refer to pixels or pixel distances without
specifying a plane, we are implicitly referring to the Y plane or to
the complete image, both of which have the same (full) resolution.
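
The coordinate relationship between the chroma and luma planes can be sketched in C as follows. This is only an illustration: the `pixel_pos` type and both function names are hypothetical and do not come from the reference decoder.

```c
#include <assert.h>

/* Hypothetical helper type for this sketch; not part of the reference decoder. */
typedef struct { int x, y; } pixel_pos;

/* Upper-left luma (Y) pixel of the 2x2 block covered by chroma pixel (cx, cy). */
static pixel_pos chroma_to_luma(int cx, int cy)
{
    pixel_pos p;
    assert(cx >= 0 && cy >= 0);
    p.x = 2 * cx;
    p.y = 2 * cy;
    return p;
}

/* Chroma (U or V) pixel whose 2x2 luma block contains luma pixel (yx, yy). */
static pixel_pos luma_to_chroma(int yx, int yy)
{
    pixel_pos p;
    assert(yx >= 0 && yy >= 0);
    p.x = yx / 2;
    p.y = yy / 2;
    return p;
}
```

For example, chroma pixel (3, 5) covers the luma pixels (6, 10), (7, 10), (6, 11), and (7, 11).
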
As is usually the case, the pixels are simply a large array of bytes
stored in rows from top to bottom, each row being stored from left to
right. This "left to right" then "top to bottom" raster-scan order
is reflected in the layout of the compressed data as well.
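
As a minimal sketch of raster-scan addressing, the byte offset of a pixel within one plane could be computed as shown here. The function name and the `stride` parameter (bytes per stored row, which may exceed the visible width if rows are padded) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of pixel (x, y) in a plane stored in raster-scan order:
   rows run top to bottom, and each row runs left to right.
   `stride` is the assumed number of bytes per stored row. */
static size_t pixel_offset(int x, int y, int stride)
{
    assert(x >= 0 && y >= 0 && stride > x);
    return (size_t)y * (size_t)stride + (size_t)x;
}
```
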
Provision has been made in the VP8 bitstream header for the support
of a secondary YUV color format, in the form of a reserved bit.

Occasionally, at very low datarates, a compression system may decide
to reduce the resolution of the input signal to facilitate efficient
compression. The VP8 data format supports this via optional
upscaling of its internal reconstruction buffer prior to output (this
is completely distinct from the optional postprocessing discussed
earlier, which has nothing to do with decoding per se). This
upscaling restores the video frames to their original resolution.

In other words, the compression/decompression system can be viewed as
a "black box", where the input and output are always at a given
resolution. The compressor might decide to "cheat" and process the
signal at a lower resolution. In that case, the decompressor needs
the ability to restore the signal to its original resolution.

Internally, VP8 decomposes each output frame into an array of
macroblocks. A macroblock is a square array of pixels whose Y
dimensions are 16x16 and whose U and V dimensions are 8x8.
Macroblock-level data in a compressed frame occurs (and must be
processed) in a raster order similar to that of the pixels comprising
the frame.
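
The macroblock grid and its raster ordering can be illustrated with the sketch below. The function names are hypothetical, and rounding the macroblock count up for frame dimensions that are not a multiple of 16 is an assumption of this sketch rather than something stated in this section.

```c
#include <assert.h>

/* Number of 16-pixel macroblock columns (or rows) needed to cover
   `pixels` luma pixels. Rounding up is an assumption of this sketch. */
static int mb_count(int pixels)
{
    assert(pixels > 0);
    return (pixels + 15) / 16;
}

/* Raster-order index of the macroblock containing luma pixel (x, y):
   macroblock rows run top to bottom, and each row runs left to right. */
static int mb_index(int x, int y, int frame_width)
{
    assert(x >= 0 && x < frame_width && y >= 0);
    return (y / 16) * mb_count(frame_width) + (x / 16);
}
```

With a frame 176 pixels wide, each macroblock row holds 11 macroblocks, so the luma pixel at (17, 16) falls in macroblock 12.
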
Macroblocks are further decomposed into 4x4 subblocks. Every
macroblock has 16 Y subblocks, 4 U subblocks, and 4 V subblocks. Any
subblock-level data (and processing of such data) again occurs in
raster order, this time in raster order within the containing
macroblock.
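
The raster ordering of subblocks within a macroblock might be expressed as in the following sketch. Both function names are hypothetical, and the coordinates are taken relative to the upper-left corner of the containing macroblock.

```c
#include <assert.h>

/* Raster-order index (0..15) of the 4x4 Y subblock containing the luma
   pixel at (x, y), relative to the macroblock's 16x16 luma area. */
static int y_subblock_index(int x, int y)
{
    assert(x >= 0 && x < 16 && y >= 0 && y < 16);
    return (y / 4) * 4 + (x / 4);
}

/* Raster-order index (0..3) of the 4x4 U or V subblock containing the
   chroma pixel at (x, y), relative to the macroblock's 8x8 chroma area. */
static int uv_subblock_index(int x, int y)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);
    return (y / 4) * 2 + (x / 4);
}
```
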
As discussed in further detail below, data can be specified at the
levels of both macroblocks and their subblocks.

Pixels are always treated, at a minimum, at the level of subblocks,
which may be thought of as the "atoms" of the VP8 algorithm. In
particular, the 2x2 chroma blocks corresponding to 4x4 Y subblocks
are never treated explicitly in the data format or in the algorithm
specification.

The DCT and WHT always operate at a 4x4 resolution. The DCT is used
for the 16 Y, 4 U, and 4 V subblocks. The WHT is used (with some but
not all prediction modes) to encode a 4x4 array comprising the
average intensities of the 16 Y subblocks of a macroblock. These
average intensities are, up to a constant normalization factor,
nothing more than the 0th DCT coefficients of the Y subblocks. This
"higher-level" WHT is a substitute for the explicit specification of
those coefficients, in exactly the same way as the DCT of a subblock
substitutes for the specification of the pixel values comprising the
subblock. We consider this 4x4 array as a second-order subblock
called Y2, and think of a macroblock as containing 24 "real"
subblocks and, sometimes, a 25th "virtual" subblock. This is dealt
with further in Section 13.
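
To make the role of the "virtual" subblock concrete, the sketch below shows one way a decoder could hand the 16 values of an already inverse-transformed Y2 array to the Y subblocks as their 0th coefficients. The function name, the array layout, and the raster-order correspondence between Y2 entries and Y subblocks are assumptions for illustration; the transform itself is not shown.

```c
#include <assert.h>

enum { Y_SUBBLOCKS = 16, COEFFS_PER_SUBBLOCK = 16 };

/* Copy the 16 values of an inverse-transformed Y2 array into coefficient
   position 0 of the 16 Y subblocks, assuming raster order for both.
   All other coefficients are left untouched. */
static void scatter_y2(const short y2[Y_SUBBLOCKS],
                       short y_coeffs[Y_SUBBLOCKS][COEFFS_PER_SUBBLOCK])
{
    int i;
    assert(y2 != 0 && y_coeffs != 0);
    for (i = 0; i < Y_SUBBLOCKS; i++)
        y_coeffs[i][0] = y2[i];
}
```

When a macroblock has no Y2 subblock, this step would simply be skipped and the 0th coefficients would be read with the rest of each Y subblock's data.
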
The frame layout used by the reference decoder may be found in the
file `vpx_image.h` (Section 20.23).