text_src/18.04__vp8-bitstream__filter-properties.txt - webm/bitstream-guide - Git at Google



 #### 18.4 Filter Properties                                {#h-18-04}


 We discuss briefly the rationale behind the choice of filters.  Our
 approach is necessarily cursory; a genuinely accurate discussion
 would require a couple of books.  Readers unfamiliar with signal
 processing may or may not wish to skip this.

 All digital signals are of course sampled in some fashion.  The case
 where the inter-sample spacing (say in time for audio samples, or
 space for pixels) is uniform, that is, the same at all positions, is
 particularly common and amenable to analysis.  Many aspects of the
 treatment of such signals are best-understood in the frequency domain
 via Fourier Analysis, particularly those aspects of the signal that
 are not changed by shifts in position, especially when those
 positional shifts are not given by a whole number of samples.

 Non-integral translates of a sampled signal are a textbook example of
 the foregoing.  In our case of non-integral motion vectors, we wish
 to say what the underlying image "really is" at these pixels;
 although we don't have values for them, we feel that it makes sense
 to talk about them.  The correctness of this feeling is predicated on
 the underlying signal being band-limited, that is, not containing any
 energy in spatial frequencies that cannot be faithfully rendered at
 the pixel resolution at our disposal.  In one dimension, this range
 of "OK" frequencies is called the Nyquist band; in our two-
 dimensional case of integer-grid samples, this range might be termed
 a Nyquist rectangle.  The finer the grid, the more we know about the
 image, and the wider the Nyquist rectangle.

 It turns out that, for such band-limited signals, there is indeed an
 exact mathematical formula to produce the correct sample value at an
 arbitrary point.  Unfortunately, this calculation requires the
 consideration of every single sample in the image, as well as needing
 to operate at infinite precision.  Also, strictly speaking, all band-
 limited signals have infinite spatial (or temporal) extent, so
 everything we are discussing is really some sort of approximation.

 It is true that the theoretically correct subsampling procedure, as
 well as any approximation thereof, is always given by a translation-
 invariant weighted sum (or filter) similar to that used by VP8.  It
 is also true that the reconstruction error made by such a filter can
 be simply represented as a multiplier in the frequency domain; that
 is, such filters simply multiply the Fourier transform of any signal
 to which they are applied by a fixed function associated to the
 filter.  This fixed function is usually called the frequency response
 (or transfer function); the ideal subsampling filter has a frequency
 response equal to one in the Nyquist rectangle and zero everywhere
 else.

 Another basic fact about approximations to "truly correct"
 subsampling is that the wider the subrectangle (within the Nyquist
 rectangle) of spatial frequencies one wishes to "pass" (that is,
 correctly render) or, put more accurately, the closer one wishes to
 approximate the ideal transfer function, the more samples of the
 original signal must be considered by the subsampling, and the wider
 the calculation precision necessitated.

 The filters chosen by VP8 were chosen, within the constraints of 4 or
 6 taps and 7-bit precision, to do the best possible job of handling
 the low spatial frequencies near the 0th DC frequency along with
 introducing no resonances (places where the absolute value of the
 frequency response exceeds one).

 The justification for the foregoing has two parts.  First, resonances
 can produce extremely objectionable visible artifacts when, as often
 happens in actual compressed video streams, filters are applied
 repeatedly.  Second, the vast majority of energy in real-world images
 lies near DC and not at the high end.

 To get slightly more specific, the filters chosen by VP8 are the best
 resonance-free 4- or 6-tap filters possible, where "best" describes
 the frequency response near the origin: The response at 0 is required
 to be 1, and the graph of the response at 0 is as flat as possible.

 To provide an intuitively more obvious point of reference, the "best"
 2-tap filter is given by simple linear interpolation between the
 surrounding actual pixels.

 Finally, it should be noted that, because of the way motion vectors
 are calculated, the (shorter) 4-tap filters (used for odd fractional
 displacements) are applied in the chroma plane only.  Human color
 perception is notoriously poor, especially where higher spatial
 frequencies are involved.  The shorter filters are easier to
 understand mathematically, and the difference between them and a
 theoretically slightly better 6-tap filter is negligible where chroma
 is concerned.


	#### 18.4 Filter Properties {#h-18-04}


	We discuss briefly the rationale behind the choice of filters. Our
	approach is necessarily cursory; a genuinely accurate discussion
	would require a couple of books. Readers unfamiliar with signal
	processing may or may not wish to skip this.

	All digital signals are of course sampled in some fashion. The case
	where the inter-sample spacing (say in time for audio samples, or
	space for pixels) is uniform, that is, the same at all positions, is
	particularly common and amenable to analysis. Many aspects of the
	treatment of such signals are best-understood in the frequency domain
	via Fourier Analysis, particularly those aspects of the signal that
	are not changed by shifts in position, especially when those
	positional shifts are not given by a whole number of samples.

	Non-integral translates of a sampled signal are a textbook example of
	the foregoing. In our case of non-integral motion vectors, we wish
	to say what the underlying image "really is" at these pixels;
	although we don't have values for them, we feel that it makes sense
	to talk about them. The correctness of this feeling is predicated on
	the underlying signal being band-limited, that is, not containing any
	energy in spatial frequencies that cannot be faithfully rendered at
	the pixel resolution at our disposal. In one dimension, this range
	of "OK" frequencies is called the Nyquist band; in our two-
	dimensional case of integer-grid samples, this range might be termed
	a Nyquist rectangle. The finer the grid, the more we know about the
	image, and the wider the Nyquist rectangle.

	It turns out that, for such band-limited signals, there is indeed an
	exact mathematical formula to produce the correct sample value at an
	arbitrary point. Unfortunately, this calculation requires the
	consideration of every single sample in the image, as well as needing
	to operate at infinite precision. Also, strictly speaking, all band-
	limited signals have infinite spatial (or temporal) extent, so
	everything we are discussing is really some sort of approximation.

	It is true that the theoretically correct subsampling procedure, as
	well as any approximation thereof, is always given by a translation-
	invariant weighted sum (or filter) similar to that used by VP8. It
	is also true that the reconstruction error made by such a filter can
	be simply represented as a multiplier in the frequency domain; that
	is, such filters simply multiply the Fourier transform of any signal
	to which they are applied by a fixed function associated to the
	filter. This fixed function is usually called the frequency response
	(or transfer function); the ideal subsampling filter has a frequency
	response equal to one in the Nyquist rectangle and zero everywhere
	else.

	Another basic fact about approximations to "truly correct"
	subsampling is that the wider the subrectangle (within the Nyquist
	rectangle) of spatial frequencies one wishes to "pass" (that is,
	correctly render) or, put more accurately, the closer one wishes to
	approximate the ideal transfer function, the more samples of the
	original signal must be considered by the subsampling, and the wider
	the calculation precision necessitated.

	The filters chosen by VP8 were chosen, within the constraints of 4 or
	6 taps and 7-bit precision, to do the best possible job of handling
	the low spatial frequencies near the 0th DC frequency along with
	introducing no resonances (places where the absolute value of the
	frequency response exceeds one).

	The justification for the foregoing has two parts. First, resonances
	can produce extremely objectionable visible artifacts when, as often
	happens in actual compressed video streams, filters are applied
	repeatedly. Second, the vast majority of energy in real-world images
	lies near DC and not at the high end.

	To get slightly more specific, the filters chosen by VP8 are the best
	resonance-free 4- or 6-tap filters possible, where "best" describes
	the frequency response near the origin: The response at 0 is required
	to be 1, and the graph of the response at 0 is as flat as possible.

	To provide an intuitively more obvious point of reference, the "best"
	2-tap filter is given by simple linear interpolation between the
	surrounding actual pixels.

	Finally, it should be noted that, because of the way motion vectors
	are calculated, the (shorter) 4-tap filters (used for odd fractional
	displacements) are applied in the chroma plane only. Human color
	perception is notoriously poor, especially where higher spatial
	frequencies are involved. The shorter filters are easier to
	understand mathematically, and the difference between them and a
	theoretically slightly better 6-tap filter is negligible where chroma
	is concerned.