This commit adds the 8 tap SSSE3 subpixel filters back into the code
underneath the convolve API. The C code is still called for 4x4
blocks, as well as compound prediction modes. This restores the
encode performance to be within about 8% of the baseline.
Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c