| |
| |
| ### Section 1: Introduction {#h-01-00} |
| |
| This document describes the VP8 compressed video data format, |
| together with a discussion of the decoding procedure for the format. |
| It is intended to be used in conjunction with, and as a guide to, the |
| reference decoder source code provided in Attachment One |
| (Section 20). If there are any conflicts between this narrative and |
| the reference source code, the reference source code should be |
| considered correct. The bitstream is defined by the reference source |
| code and not this narrative. |
| |
| Like many modern video compression schemes, VP8 is based on |
| decomposition of frames into square subblocks of pixels, prediction |
| of such subblocks using previously constructed blocks, and adjustment |
| of such predictions (as well as synthesis of unpredicted blocks) |
| using a discrete cosine transform (hereafter abbreviated as DCT). In |
| one special case, however, VP8 uses a Walsh-Hadamard transform |
| (hereafter abbreviated as WHT) instead of a DCT. |
| |
| Roughly speaking, such systems reduce datarate by exploiting the |
| temporal and spatial coherence of most video signals. It is more |
| efficient to specify the location of a visually similar portion of a |
| prior frame than it is to specify pixel values. The frequency |
| segregation provided by the DCT and WHT facilitates the exploitation |
| of both spatial coherence in the original signal and the tolerance of |
| the human visual system to moderate losses of fidelity in the |
| reconstituted signal. |
| |
| VP8 augments these basic concepts with, among other things, |
| sophisticated usage of contextual probabilities. The result is a |
| significant reduction in datarate at a given quality. |
| |
| Unlike some similar schemes (the older MPEG formats, for example), |
| VP8 specifies exact values for reconstructed pixels. Specifically, |
| the specification for the DCT and WHT portions of the reconstruction |
| does not allow for any "drift" caused by truncation of fractions. |
| Rather, the algorithm is specified using fixed-precision integer |
| operations exclusively. This greatly facilitates the verification of |
| the correctness of a decoder implementation and also avoids |
| difficult-to-predict visual incongruities between such |
| implementations. |
| |
| It should be remarked that, in a complete video playback system, the |
| displayed frames may or may not be identical to the reconstructed |
| frames. Many systems apply a final level of filtering (commonly |
| referred to as postprocessing) to the reconstructed frames prior to |
| viewing. Such postprocessing has no effect on the decoding and |
| reconstruction of subsequent frames (which are predicted using the |
| completely specified reconstructed frames) and is beyond the scope of |
| this document. In practice, the nature and extent of this sort of |
| postprocessing is dependent on both the taste of the user and on the |
| computational facilities of the playback environment. |
| |
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", |
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this |
| document are to be interpreted as described in RFC 2119 [RFC2119]. |
| |