text_src/13.03__vp8-bitstream__token-probabilities.txt - webm/bitstream-guide - Git at Google



 #### 13.3 Token Probabilities                                {#h-13-03}


 The probability specification for the token tree (unlike that for the "extra bits" described above) is rather involved. It uses three pieces of context to index a large probability table, the contents of which may be incrementally modified in the frame header. The full (non-constant) probability table is laid out as follows.


 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Prob coeff_probs [4] [8] [3] [num_dct_tokens-1];
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 {:lang="c"}


 Working from the outside in, the outermost dimension is indexed by the type of plane being decoded:

   * `0` - Y beginning at coefficient 1 (i.e., Y after Y2)
   * `1` - Y2
   * `2` - U or V
   * `3` - Y beginning at coefficient 0 (i.e., Y in the absence of Y2).

 The next dimension is selected by the position of the coefficient being decoded. That position c steps by ones up to 15, starting from zero for block types 1, 2, or 3 and starting from one for block type 0. The second array index is then


 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 coeff_bands [c]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 {:lang="c"}


 where


 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 const int coeff_bands [16] = {
      0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7
 };
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 {:lang="c"}


 is a fixed mapping of position to "band".

 The third dimension is the trickiest. Roughly speaking, it measures the "local complexity" or extent to which nearby coefficients are non-zero.

 For the first coefficient (DC, unless the block type is 0), we consider the (already encoded) blocks within the same plane (Y2, Y, U, or V) above and to the left of the current block. The context index is then the number (0, 1 or 2) of these blocks that had at least one non-zero coefficient in their residue record. Specifically for Y2, because macroblocks above and to the left may or may not have a Y2 block, the block above is determined by the most recent macroblock in the same column that has a Y2 block, and the block to the left is determined by the most recent macroblock in the same row that has a Y2 block.

 Beyond the first coefficient, the context index is determined by the absolute value of the most recently decoded coefficient (necessarily within the current block) and is `0` if the last coefficient was a zero, `1` if it was plus or minus one, and `2` if its absolute value exceeded one.

 Note that the intuitive meaning of this measure changes as coefficients are decoded. For example, prior to the first token, a zero means that the neighbors are empty, suggesting that the current block may also be empty. After the first token, because an end-of-block token must have at least one non-zero value before it, a zero means that we just decoded a zero and hence guarantees that a non-zero coefficient will appear later in this block. However, this shift in meaning is perfectly okay because the complete context depends also on the coefficient band (and since band 0 is occupied exclusively by position 0).

 As with other contexts used by VP8, the "neighboring block" context described here needs a special definition for subblocks lying along the top row or left edge of the frame. These "non-existent" predictors above and to the left of the image are simply taken to be empty -- that is, taken to contain no non-zero coefficients.

 The residue decoding of each macroblock then requires, in each of two directions (above and to the left), an aggregate coefficient predictor consisting of a single Y2 predictor, two predictors for each of U and V, and four predictors for Y. In accordance with the scan-ordering of macroblocks, a decoder needs to maintain a single "left" aggregate predictor and a row of "above" aggregate predictors.

 Before decoding any residue, these maintained predictors may simply be cleared, in compliance with the definition of "non-existent" prediction. After each block is decoded, the two predictors referenced by the block are replaced with the (empty or non-empty) state of the block, in preparation for the later decoding of the blocks below and to the right of the block just decoded.

 The fourth, and final, dimension of the token probability array is of course indexed by (half) the position in the token tree structure, as are all tree probability arrays.

 The below pseudocode illustrates the decoding process. Note that criteria, functions, etc. delimited with `**` are either dependent on decoder architecture or are elaborated on elsewhere in this document.


 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 int block[16] = { 0 }; /* current 4x4 block coeffs */
 int firstCoeff = 0;
 int plane;
 int ctx2;
 int ctx3 = 0; /* the 3rd context referred to in above description */
 Prob *probTable;
 int token;
 int sign;
 int absValue;
 int extraBits;
 bool prevCoeffWasZero = false;
 bool currentBlockHasCoeffs = false;
 /* base coeff abs values per each category, elem #0 is
    DCT_VAL_CATEGORY1, * #1 is DCT_VAL_CATEGORY2 etc */
 int categoryBase[6] = { 5, 7, 11, 19, 35, 67 };

 /* Determine plane to use */
 if( **current_block_is_Y2_block** )       plane = 0;
 else if ( **current_block_is_chroma** )   plane = 2;
 else if ( **current_macroblock_has_Y2** ) plane = 1;
 else                                      plane = 3;

 /* For luma blocks of an "Y2 macroblock" we skip coeff index #0 */
 if( plane == 1 )
     firstCoeff++;

 /* Determine whether neighbour 4x4 blocks have coeffiecients.
    This is dependant of the plane we are currently decoding;
    i.e. we check only coefficients from same plane as current
    block. */
 if( **left_neighbor_block_has_coefficients(plane)** )
     ctx3++;
 if( **above_neighbor_block_has_coefficients(plane)** )
     ctx3++;

 for( i = firstCoeff ; i < 16 ; ++i )
 {
     ctx2 = coeff_bands[i];
     probTable = coeff_probs[plane][ctx2][ctx3];

     /* skip first code (dct_eob) if previous token was DCT_0 */
     if( prevCoeffWasZero )
         token = treed_read ( d, **coeff_tree_without_eob**,
           probTable );
     else
         token = treed_read ( d, coeff_tree, probTable );

     if( token == dct_eob )
         break;

     if( token != DCT_0 )
     {
         currentBlockHasCoeffs = true;
   if( **token_has_extra_bits(token)** )
   {
       extraBits = DCTextra( token );
       absValue =
           categoryBase[**token_to_cat_index(token)**] +
     extraBits;
   }
   else
   {
       absValue = **token_to_abs_value(token)**;
   }

   sign = read_bool(d, 128);
         block[i] = sign ? -absValue : absValue;
     }
     else
     {
         absValue = 0;
     }

     /* Set contexts and stuff for next coeff */
     if( absValue == 0 )         ctx3 = 0;
     else if ( absValue == 1 )   ctx3 = 1;
     else                        ctx3 = 2;
     prevCoeffWasZero = true;
 }

 /* Store current block status to decoder internals */
 **block_has_coefficients[currentMb][currentBlock]** =
   currentBlockHasCoeffs;
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 {:lang="c"}


 While we have in fact completely described the coefficient decoding procedure, the reader will probably find it helpful to consult the reference implementation, which can be found in the file `tokens.c`.


	#### 13.3 Token Probabilities {#h-13-03}


	The probability specification for the token tree (unlike that for the "extra bits" described above) is rather involved. It uses three pieces of context to index a large probability table, the contents of which may be incrementally modified in the frame header. The full (non-constant) probability table is laid out as follows.


	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	Prob coeff_probs [4] [8] [3] [num_dct_tokens-1];
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	{:lang="c"}


	Working from the outside in, the outermost dimension is indexed by the type of plane being decoded:

	* `0` - Y beginning at coefficient 1 (i.e., Y after Y2)
	* `1` - Y2
	* `2` - U or V
	* `3` - Y beginning at coefficient 0 (i.e., Y in the absence of Y2).

	The next dimension is selected by the position of the coefficient being decoded. That position c steps by ones up to 15, starting from zero for block types 1, 2, or 3 and starting from one for block type 0. The second array index is then


	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	coeff_bands [c]
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	{:lang="c"}


	where


	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	const int coeff_bands [16] = {
	0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7
	};
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	{:lang="c"}


	is a fixed mapping of position to "band".

	The third dimension is the trickiest. Roughly speaking, it measures the "local complexity" or extent to which nearby coefficients are non-zero.

	For the first coefficient (DC, unless the block type is 0), we consider the (already encoded) blocks within the same plane (Y2, Y, U, or V) above and to the left of the current block. The context index is then the number (0, 1 or 2) of these blocks that had at least one non-zero coefficient in their residue record. Specifically for Y2, because macroblocks above and to the left may or may not have a Y2 block, the block above is determined by the most recent macroblock in the same column that has a Y2 block, and the block to the left is determined by the most recent macroblock in the same row that has a Y2 block.

	Beyond the first coefficient, the context index is determined by the absolute value of the most recently decoded coefficient (necessarily within the current block) and is `0` if the last coefficient was a zero, `1` if it was plus or minus one, and `2` if its absolute value exceeded one.

	Note that the intuitive meaning of this measure changes as coefficients are decoded. For example, prior to the first token, a zero means that the neighbors are empty, suggesting that the current block may also be empty. After the first token, because an end-of-block token must have at least one non-zero value before it, a zero means that we just decoded a zero and hence guarantees that a non-zero coefficient will appear later in this block. However, this shift in meaning is perfectly okay because the complete context depends also on the coefficient band (and since band 0 is occupied exclusively by position 0).

	As with other contexts used by VP8, the "neighboring block" context described here needs a special definition for subblocks lying along the top row or left edge of the frame. These "non-existent" predictors above and to the left of the image are simply taken to be empty -- that is, taken to contain no non-zero coefficients.

	The residue decoding of each macroblock then requires, in each of two directions (above and to the left), an aggregate coefficient predictor consisting of a single Y2 predictor, two predictors for each of U and V, and four predictors for Y. In accordance with the scan-ordering of macroblocks, a decoder needs to maintain a single "left" aggregate predictor and a row of "above" aggregate predictors.

	Before decoding any residue, these maintained predictors may simply be cleared, in compliance with the definition of "non-existent" prediction. After each block is decoded, the two predictors referenced by the block are replaced with the (empty or non-empty) state of the block, in preparation for the later decoding of the blocks below and to the right of the block just decoded.

	The fourth, and final, dimension of the token probability array is of course indexed by (half) the position in the token tree structure, as are all tree probability arrays.

	The below pseudocode illustrates the decoding process. Note that criteria, functions, etc. delimited with `**` are either dependent on decoder architecture or are elaborated on elsewhere in this document.


	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	int block[16] = { 0 }; /* current 4x4 block coeffs */
	int firstCoeff = 0;
	int plane;
	int ctx2;
	int ctx3 = 0; /* the 3rd context referred to in above description */
	Prob *probTable;
	int token;
	int sign;
	int absValue;
	int extraBits;
	bool prevCoeffWasZero = false;
	bool currentBlockHasCoeffs = false;
	/* base coeff abs values per each category, elem #0 is
	DCT_VAL_CATEGORY1, * #1 is DCT_VAL_CATEGORY2 etc */
	int categoryBase[6] = { 5, 7, 11, 19, 35, 67 };

	/* Determine plane to use */
	if( current_block_is_Y2_block ) plane = 0;
	else if ( current_block_is_chroma ) plane = 2;
	else if ( current_macroblock_has_Y2 ) plane = 1;
	else plane = 3;

	/* For luma blocks of an "Y2 macroblock" we skip coeff index #0 */
	if( plane == 1 )
	firstCoeff++;

	/* Determine whether neighbour 4x4 blocks have coeffiecients.
	This is dependant of the plane we are currently decoding;
	i.e. we check only coefficients from same plane as current
	block. */
	if( left_neighbor_block_has_coefficients(plane) )
	ctx3++;
	if( above_neighbor_block_has_coefficients(plane) )
	ctx3++;

	for( i = firstCoeff ; i < 16 ; ++i )
	{
	ctx2 = coeff_bands[i];
	probTable = coeff_probs[plane][ctx2][ctx3];

	/* skip first code (dct_eob) if previous token was DCT_0 */
	if( prevCoeffWasZero )
	token = treed_read ( d, coeff_tree_without_eob,
	probTable );
	else
	token = treed_read ( d, coeff_tree, probTable );

	if( token == dct_eob )
	break;

	if( token != DCT_0 )
	{
	currentBlockHasCoeffs = true;
	if( token_has_extra_bits(token) )
	{
	extraBits = DCTextra( token );
	absValue =
	categoryBase[token_to_cat_index(token)] +
	extraBits;
	}
	else
	{
	absValue = token_to_abs_value(token);
	}

	sign = read_bool(d, 128);
	block[i] = sign ? -absValue : absValue;
	}
	else
	{
	absValue = 0;
	}

	/* Set contexts and stuff for next coeff */
	if( absValue == 0 ) ctx3 = 0;
	else if ( absValue == 1 ) ctx3 = 1;
	else ctx3 = 2;
	prevCoeffWasZero = true;
	}

	/* Store current block status to decoder internals */
	block_has_coefficients[currentMb][currentBlock] =
	currentBlockHasCoeffs;
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	{:lang="c"}


	While we have in fact completely described the coefficient decoding procedure, the reader will probably find it helpful to consult the reference implementation, which can be found in the file `tokens.c`.