Upsampling SSE2/SSE4 speedup.

RGB to YUV conversion was not using SSE to finish up the row.
End data is now copied to a buffer big enough to fit in a
SSE register.
(UPSAMPLE_LAST_BLOCK was already using that trick).

Change-Id: Ie539bcbe570a643a774aa88263503c0d2c41890f
2 files changed