openh264/test/processing
Sindre Aamås 8a0af4a3f2 [Processing/x86] DyadicBilinearDownsample optimizations
Average vertically before horizontally; horizontal averaging is more
costly. Doing the vertical averaging first halves the number of
horizontal averages.
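
The reordering can be illustrated with a scalar sketch. This is not the assembly from the commit; the function names below are hypothetical, and the sketch assumes both input rows have the same even length and that each step uses a rounding average, (a + b + 1) >> 1. In the vertical-first version the horizontal pass runs once on the combined row instead of once per source row.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rounding average of two bytes: (a + b + 1) >> 1.
inline std::uint8_t avgRound(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>((a + b + 1) >> 1);
}

// Hypothetical helper: vertical pass first, then ONE horizontal pass.
// Assumes top.size() == bottom.size() and that the size is even.
std::vector<std::uint8_t> downsampleVerticalFirst(
    const std::vector<std::uint8_t>& top,
    const std::vector<std::uint8_t>& bottom) {
  // Vertical pass: element-wise, so it maps directly onto SIMD lanes.
  std::vector<std::uint8_t> vert(top.size());
  for (std::size_t i = 0; i < top.size(); ++i) {
    vert[i] = avgRound(top[i], bottom[i]);
  }
  // Horizontal pass: combines adjacent pairs of the single averaged row.
  std::vector<std::uint8_t> out(top.size() / 2);
  for (std::size_t x = 0; x < out.size(); ++x) {
    out[x] = avgRound(vert[2 * x], vert[2 * x + 1]);
  }
  return out;
}

// Hypothetical helper: horizontal pass on EACH row, then a vertical pass.
// Same assumptions as above; shown only for comparison.
std::vector<std::uint8_t> downsampleHorizontalFirst(
    const std::vector<std::uint8_t>& top,
    const std::vector<std::uint8_t>& bottom) {
  std::vector<std::uint8_t> out(top.size() / 2);
  for (std::size_t x = 0; x < out.size(); ++x) {
    const std::uint8_t hTop = avgRound(top[2 * x], top[2 * x + 1]);
    const std::uint8_t hBottom = avgRound(bottom[2 * x], bottom[2 * x + 1]);
    out[x] = avgRound(hTop, hBottom);
  }
  return out;
}
```

Because each intermediate average rounds, the two orders can disagree by one for some inputs, for example a top row of {0, 1} over a bottom row of {2, 1}.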

Use pmaddubsw and pavgw to do the horizontal averaging for a slight
performance improvement.
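
A scalar model of the arithmetic may help here. It is a sketch, not the committed assembly: the helper names are hypothetical, and it assumes pmaddubsw is used with an all-ones multiplier to sum adjacent byte pairs into 16-bit lanes, and that pavgw against zero then halves each sum with rounding. The corresponding intrinsics are `_mm_maddubs_epi16` and `_mm_avg_epu16`.

```cpp
#include <cstdint>

// Scalar model of one 16-bit lane of pmaddubsw: two unsigned bytes are
// multiplied by two signed bytes and the products are added with signed
// saturation.
inline std::int16_t maddubsLane(std::uint8_t a0, std::uint8_t a1,
                                std::int8_t b0, std::int8_t b1) {
  int sum = a0 * b0 + a1 * b1;
  if (sum > 32767) sum = 32767;
  if (sum < -32768) sum = -32768;
  return static_cast<std::int16_t>(sum);
}

// Scalar model of one lane of pavgw: rounding average of two unsigned
// 16-bit values, (a + b + 1) >> 1.
inline std::uint16_t avgwLane(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(
      (static_cast<std::uint32_t>(a) + b + 1u) >> 1);
}

// Hypothetical helper: horizontal rounding average of two adjacent pixels.
// With multipliers of 1 the pair sum is at most 510, so the saturation in
// maddubsLane never triggers; averaging with 0 yields (sum + 1) >> 1.
inline std::uint8_t horizontalAverage(std::uint8_t left, std::uint8_t right) {
  const std::int16_t sum = maddubsLane(left, right, 1, 1);
  return static_cast<std::uint8_t>(
      avgwLane(static_cast<std::uint16_t>(sum), 0));
}
```

In the vector form, one such multiply-add covers eight pixel pairs at a time, and the 16-bit results still have to be packed back to bytes afterwards.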

Minor tweaks.

Improve the SSSE3 dyadic downsample routines and drop the SSE4 routines.
The non-temporal loads used in the SSE4 routines do nothing for ordinary
write-back cacheable memory AFAIK.

Adjust tests because averaging vertically first gives slightly different
output.

~2.39x speedup for the widthx32 routine on Haswell when not memory-bound.
~2.20x speedup for the widthx16 routine on Haswell when not memory-bound.

Note that the widthx16 routine can be unrolled for further speedup.
2016-06-02 13:44:28 +02:00
File                                Last commit                                                            Date
ProcessUT_AdaptiveQuantization.cpp  Add checks for cpu features in tests                                   2015-01-24 22:47:23 +02:00
ProcessUT_DownSample.cpp            [Processing/x86] DyadicBilinearDownsample optimizations                2016-06-02 13:44:28 +02:00
ProcessUT_ScrollDetection.cpp       rename namespace and function name to avoid conflicts with old library 2014-09-17 15:50:59 +08:00
ProcessUT_VaaCalc.cpp               [UT] Test VAA routines with a wider variety of resolutions             2016-04-11 16:40:36 +02:00
targets.mk                          improve py, and change mk according to mk                              2014-09-12 10:25:46 +08:00