02ff360b33
Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this makes them all slightly faster, particularly on x86-64. Remove SSE3 sad16x16 version, since the SSE2 version is now faster. About 1.5% overall encoding speedup. Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797