text_src/02.00__vp8-bitstream__format-overview.txt - webm/bitstream-guide - Git at Google



 ### Chapter 2: Format Overview                               {#h-02-00}

 VP8 works exclusively with an 8-bit YUV 4:2:0 image format. In this format, each 8-bit pixel in the two chroma planes (U and V) corresponds positionally to a 2x2 block of 8-bit luma pixels in the Y plane; coordinates of the upper left corner of the Y block are of course exactly twice the coordinates of the corresponding chroma pixels. When we refer to pixels or pixel distances without specifying a plane, we are implicitly referring to the Y plane or to the complete image, both of which have the same (full) resolution.

 As is usually the case, the pixels are simply a large array of bytes stored in rows from top to bottom, each row being stored from left to right. This "left to right" then "top to bottom" raster-scan order is reflected in the layout of the compressed data as well.

 Provision has been made in the VP8 bitstream header for the support of a secondary YUV color format, in the form of a reserved bit.

 Occasionally, at very low datarates, a compression system may decide to reduce the resolution of the input signal to facilitate efficient compression. The VP8 data format supports this via optional upscaling of its internal reconstruction buffer prior to output (this is completely distinct from the optional postprocessing discussed earlier, which has nothing to do with decoding per se). This upsampling restores the video frames to their original resolution. In other words, the compression/decompression system can be viewed as a "black box", where the input and output is always at a given resolution. The compressor might decide to "cheat" and process the signal at a lower resolution. In that case, the decompressor needs the ability to restore the signal to its original resolution.

 Internally, VP8 decomposes each output frame into an array of macroblocks. A macroblock is a square array of pixels whose Y dimensions are 16x16 and whose U and V dimensions are 8x8. Macroblock-level data in a compressed frame occurs (and must be processed) in a raster order similar to that of the pixels comprising the frame.

 Macroblocks are further decomposed into 4x4 subblocks. Every macroblock has 16 Y subblocks, 4 U subblocks, and 4 V subblocks. Any subblock-level data (and processing of such data) again occurs in raster order, this time in raster order within the containing macroblock.

 As discussed in further detail below, data can be specified at the levels of both macroblocks and their subblocks.

 Pixels are always treated, at a minimum, at the level of subblocks, which may be thought of as the "atoms" of the VP8 algorithm. In particular, the 2x2 chroma blocks corresponding to 4x4 Y subblocks are never treated explicitly in the data format or in the algorithm specification.

 The DCT and WHT always operate at a 4x4 resolution. The DCT is used for the 16Y, 4U and 4V subblocks. The WHT is used (with some but not all prediction modes) to encode a 4x4 array comprising the average intensities of the 16 Y subblocks of a macroblock. These average intensities are, up to a constant normalization factor, nothing more that the zero<sup>th</sup> DCT coefficients of the Y subblocks. This "higher-level" WHT is a substitute for the explicit specification of those coefficients, in exactly the same way as the DCT of a subblock substitutes for the specification of the pixel values comprising the subblock.  We consider this 4x4 array as a _second-order_ subblock called Y2, and think of a macroblock as containing 24 "real" subblocks and, sometimes, a 25th "virtual" subblock. This is dealt with further in Chapter 13.

 The frame layout used by the reference decoder may be found in the file `yv12config.h`.


	### Chapter 2: Format Overview {#h-02-00}

	VP8 works exclusively with an 8-bit YUV 4:2:0 image format. In this format, each 8-bit pixel in the two chroma planes (U and V) corresponds positionally to a 2x2 block of 8-bit luma pixels in the Y plane; coordinates of the upper left corner of the Y block are of course exactly twice the coordinates of the corresponding chroma pixels. When we refer to pixels or pixel distances without specifying a plane, we are implicitly referring to the Y plane or to the complete image, both of which have the same (full) resolution.

	As is usually the case, the pixels are simply a large array of bytes stored in rows from top to bottom, each row being stored from left to right. This "left to right" then "top to bottom" raster-scan order is reflected in the layout of the compressed data as well.

	Provision has been made in the VP8 bitstream header for the support of a secondary YUV color format, in the form of a reserved bit.

	Occasionally, at very low datarates, a compression system may decide to reduce the resolution of the input signal to facilitate efficient compression. The VP8 data format supports this via optional upscaling of its internal reconstruction buffer prior to output (this is completely distinct from the optional postprocessing discussed earlier, which has nothing to do with decoding per se). This upsampling restores the video frames to their original resolution. In other words, the compression/decompression system can be viewed as a "black box", where the input and output is always at a given resolution. The compressor might decide to "cheat" and process the signal at a lower resolution. In that case, the decompressor needs the ability to restore the signal to its original resolution.

	Internally, VP8 decomposes each output frame into an array of macroblocks. A macroblock is a square array of pixels whose Y dimensions are 16x16 and whose U and V dimensions are 8x8. Macroblock-level data in a compressed frame occurs (and must be processed) in a raster order similar to that of the pixels comprising the frame.

	Macroblocks are further decomposed into 4x4 subblocks. Every macroblock has 16 Y subblocks, 4 U subblocks, and 4 V subblocks. Any subblock-level data (and processing of such data) again occurs in raster order, this time in raster order within the containing macroblock.

	As discussed in further detail below, data can be specified at the levels of both macroblocks and their subblocks.

	Pixels are always treated, at a minimum, at the level of subblocks, which may be thought of as the "atoms" of the VP8 algorithm. In particular, the 2x2 chroma blocks corresponding to 4x4 Y subblocks are never treated explicitly in the data format or in the algorithm specification.

	The DCT and WHT always operate at a 4x4 resolution. The DCT is used for the 16Y, 4U and 4V subblocks. The WHT is used (with some but not all prediction modes) to encode a 4x4 array comprising the average intensities of the 16 Y subblocks of a macroblock. These average intensities are, up to a constant normalization factor, nothing more that the zero<sup>th</sup> DCT coefficients of the Y subblocks. This "higher-level" WHT is a substitute for the explicit specification of those coefficients, in exactly the same way as the DCT of a subblock substitutes for the specification of the pixel values comprising the subblock. We consider this 4x4 array as a _second-order_ subblock called Y2, and think of a macroblock as containing 24 "real" subblocks and, sometimes, a 25th "virtual" subblock. This is dealt with further in Chapter 13.

	The frame layout used by the reference decoder may be found in the file `yv12config.h`.