Also add mmxext versions of vsad8 and vsad_intra8, and sse2 versions of
vsad16 and vsad_intra16.
Since vsad8 and vsad16 are not bitexact, they are accordingly marked as
approximate.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
No point in having the sad8 functions separate now that the loop is no
longer unrolled.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
This adds back support for 8x4 and 8x16
it does not support 8x2, i think nothing uses that
Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
sad16_x2, sad16_y2 and sad16_xy2 (%15 to %20 faster than mmxext).
Since the _xy2 versions are not bitexact, they are accordingly marked as
approximate.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>