| |
| |
| #### 18.4 Filter Properties {#h-18-04} |
| |
| |
| We discuss briefly the rationale behind the choice of filters. Our |
| approach is necessarily cursory; a genuinely accurate discussion |
| would require a couple of books. Readers unfamiliar with signal |
| processing may wish to skip this section. |
| |
| All digital signals are of course sampled in some fashion. The case |
| where the inter-sample spacing (say in time for audio samples, or |
| space for pixels) is uniform, that is, the same at all positions, is |
| particularly common and amenable to analysis. Many aspects of the |
| treatment of such signals are best understood in the frequency domain |
| via Fourier analysis, particularly those aspects of the signal that |
| are not changed by shifts in position, especially when those |
| positional shifts are not given by a whole number of samples. |
| |
| Non-integral translates of a sampled signal are a textbook example of |
| the foregoing. In our case of non-integral motion vectors, we wish |
| to say what the underlying image "really is" at these pixels; |
| although we don't have values for them, we feel that it makes sense |
| to talk about them. The correctness of this feeling is predicated on |
| the underlying signal being band-limited, that is, not containing any |
| energy in spatial frequencies that cannot be faithfully rendered at |
| the pixel resolution at our disposal. In one dimension, this range |
| of "OK" frequencies is called the Nyquist band; in our two- |
| dimensional case of integer-grid samples, this range might be termed |
| a Nyquist rectangle. The finer the grid, the more we know about the |
| image, and the wider the Nyquist rectangle. |
| |
| It turns out that, for such band-limited signals, there is indeed an |
| exact mathematical formula to produce the correct sample value at an |
| arbitrary point. Unfortunately, this calculation requires the |
| consideration of every single sample in the image, as well as needing |
| to operate at infinite precision. Also, strictly speaking, all band- |
| limited signals have infinite spatial (or temporal) extent, so |
| everything we are discussing is really some sort of approximation. |
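| |
| For reference, the exact formula alluded to above is, in one |
| dimension and with unit sample spacing, the Whittaker-Shannon |
| interpolation formula. The notation x[n] for the samples and x(t) |
| for the reconstructed value at position t is ours, not the |
| document's; the two-dimensional case applies the same sum along each |
| axis. |

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}(t - n),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u},
\quad \operatorname{sinc}(0) = 1 .
```

| The sum runs over every sample, which is why the calculation is |
| impractical as stated. |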
| |
| It is true that the theoretically correct subsampling procedure, as |
| well as any approximation thereof, is always given by a translation- |
| invariant weighted sum (or filter) similar to that used by VP8. It |
| is also true that the reconstruction error made by such a filter can |
| be simply represented as a multiplier in the frequency domain; that |
| is, such filters simply multiply the Fourier transform of any signal |
| to which they are applied by a fixed function associated to the |
| filter. This fixed function is usually called the frequency response |
| (or transfer function); the ideal subsampling filter has a frequency |
| response equal to one in the Nyquist rectangle and zero everywhere |
| else. |
| |
| Another basic fact about approximations to "truly correct" |
| subsampling is that the wider the subrectangle (within the Nyquist |
| rectangle) of spatial frequencies one wishes to "pass" (that is, |
| correctly render) or, put more accurately, the closer one wishes to |
| approximate the ideal transfer function, the more samples of the |
| original signal must be considered by the subsampling, and the wider |
| the calculation precision necessitated. |
| |
| The VP8 filters were chosen, within the constraints of 4 or 6 taps |
| and 7-bit precision, to do the best possible job of handling the low |
| spatial frequencies near DC (zero frequency) while introducing no |
| resonances (places where the absolute value of the frequency |
| response exceeds one). |
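| |
| To make the "6 taps and 7-bit precision" constraint concrete, here is |
| a minimal sketch of applying one such filter to six neighbouring |
| pixels. The helper names and the tap values are assumptions for |
| illustration, not the reference decoder's code; the taps are the |
| half-pixel kernel as we recall it and should be checked against the |
| filter tables in this document. |

```c
/* Assumed half-pixel 6-tap kernel (verify against the filter tables). */
static const int half_pel_taps[6] = { 3, -16, 77, 77, -16, 3 };

/* Clamp an intermediate result to the 8-bit pixel range. */
static unsigned char clamp255(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Interpolate a value between p[2] and p[3] from six consecutive
   pixels p[0..5], using integer taps that sum to 128. */
static unsigned char filter6(const unsigned char *p, const int *taps)
{
    int sum = 0;
    for (int k = 0; k < 6; k++)
        sum += taps[k] * p[k];
    sum += 64;              /* round to nearest before dividing by 128 */
    if (sum < 0)
        return 0;           /* avoid right-shifting a negative value */
    return clamp255(sum >> 7);
}
```

| Because the taps sum to 128, a constant run of pixels is returned |
| unchanged, which is the integer counterpart of a response of 1 at |
| DC. |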
| |
| The justification for the foregoing has two parts. First, resonances |
| can produce extremely objectionable visible artifacts when, as often |
| happens in actual compressed video streams, filters are applied |
| repeatedly. Second, the vast majority of energy in real-world images |
| lies near DC and not at the high end. |
| |
| To get slightly more specific, the filters chosen by VP8 are the best |
| resonance-free 4- or 6-tap filters possible, where "best" describes |
| the frequency response near the origin: The response at 0 is required |
| to be 1, and the graph of the response at 0 is as flat as possible. |
| |
| To provide an intuitively more obvious point of reference, the "best" |
| 2-tap filter is given by simple linear interpolation between the |
| surrounding actual pixels. |
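| |
| The 2-tap case can be sketched as follows. The function name and the |
| eighth-pixel step are assumptions made for this example; the weights |
| use the same 7-bit scaling (summing to 128) as the longer filters. |

```c
/* Linear interpolation between neighbouring pixels a and b at a
   fractional offset of eighths/8 (eighths in 0..7), using 7-bit
   weights that sum to 128. */
static unsigned char lerp2(unsigned char a, unsigned char b, int eighths)
{
    int wb = 16 * eighths;      /* weight of the right-hand pixel */
    int wa = 128 - wb;          /* weight of the left-hand pixel  */
    return (unsigned char)((wa * a + wb * b + 64) >> 7);
}
```

| For example, lerp2(10, 20, 4) evaluates the midpoint between the two |
| pixels and yields 15. |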
| |
| Finally, it should be noted that, because of the way motion vectors |
| are calculated, the (shorter) 4-tap filters (used for odd fractional |
| displacements) are applied in the chroma plane only. Human color |
| perception is notoriously poor, especially where higher spatial |
| frequencies are involved. The shorter filters are easier to |
| understand mathematically, and the difference between them and a |
| theoretically slightly better 6-tap filter is negligible where chroma |
| is concerned. |
| |