Commit Graph

3 Commits

Author SHA1 Message Date
Kyle Siefring
ae35425ae6 Optimize convolve8 SSSE3 and AVX2 intrinsics
Changed the intrinsics to perform summation similiar to the way the assembly does.

The new code diverges from the assembly by preferring unsaturated additions.

Results for haswell

SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%

AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%

BUG=webm:1471

Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
2017-10-24 10:39:48 -04:00
Linfeng Zhang
580d32240f Add 4 to 3 scaling SSSE3 optimization
Note this change will trigger the different C version on SSSE3 and
generate different scaled output.

Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().

Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
2017-10-16 15:42:42 -07:00
Linfeng Zhang
6543213e87 Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 13:02:05 -07:00