openh264/test/processing
Sindre Aamås b43e58a366 [Processing/x86] Add an AVX2 implementation of GeneralBilinearFastDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 8. Because
pshufb does not cross 128-bit lanes, the overhead of address
calculations and loads is relatively greater than in an
SSSE3 implementation.
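
To illustrate the lane restriction mentioned above, here is a minimal scalar sketch in C++ rather than the actual assembly. It models the documented behaviour of the AVX2 byte shuffle (each 16-byte lane is shuffled independently, and an index with its high bit set produces zero). The helper names `BuildLaneIndices` and `ShufflePerLane`, the 16.16 fixed-point step, and the choice of gathering left/right neighbour pairs for 8 output pixels are assumptions made for illustration; the commit may organise its offsets differently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Lane = std::array<uint8_t, 16>;
using Vec256 = std::array<uint8_t, 32>;

// Hypothetical helper: for up to 8 output pixels, compute the byte offsets of
// the left and right source pixels, relative to the start of a 16-byte lane.
// step_16_16 is the horizontal ratio in 16.16 fixed point (assumed <= 8).
Lane BuildLaneIndices(uint32_t step_16_16) {
  assert(step_16_16 > 0 && step_16_16 <= (8u << 16));
  const uint8_t kZero = 0x80;  // high bit set: the shuffle writes 0
  Lane idx;
  for (uint32_t k = 0; k < 8; ++k) {
    const uint32_t left = (k * step_16_16) >> 16;
    // A pair that does not fit in this lane would need a separate load.
    const bool fits = left + 1 < 16;
    idx[2 * k] = fits ? static_cast<uint8_t>(left) : kZero;
    idx[2 * k + 1] = fits ? static_cast<uint8_t>(left + 1) : kZero;
  }
  return idx;
}

// Scalar model of a 256-bit byte shuffle: the same indices are applied to
// both lanes, and each output byte can only come from its own lane.
Vec256 ShufflePerLane(const Vec256& src, const Lane& idx) {
  Vec256 out{};
  for (int i = 0; i < 32; ++i) {
    const int lane_base = (i / 16) * 16;
    const uint8_t sel = idx[i % 16];
    out[i] = (sel & 0x80) ? 0 : src[lane_base + (sel & 0x0F)];
  }
  return out;
}
```

With a ratio of 3, only five left/right pairs fit in one lane, and the upper lane draws from bytes 16 to 31 only. This is one way to see why a wider vector does not simply double the number of pixels gathered per load.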

Fall back to a generic approach for ratios > 8.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.
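
The assumption above can be pictured with a small scalar sketch, given here in C++ as an illustration and not as the routine's actual code. The hypothetical function `AverageRowPairs` stores its results in whole 16-byte blocks, so the bytes between the end of the output line and the next block boundary are overwritten with a marker value. This is only safe when the stride leaves that much room before the next line starts, which the assertion checks. The block size, the 2:1 averaging, and the `0xEE` marker are all assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kBlock = 16;  // assumed store width in bytes

// Hypothetical 2:1 horizontal averaging of one row, written block by block.
// Bytes in [dst_width, padded) are dirtied with a marker value.
void AverageRowPairs(const uint8_t* src, int dst_width,
                     uint8_t* dst, int dst_stride) {
  const int padded = (dst_width + kBlock - 1) / kBlock * kBlock;
  // The dirtied tail must end before the next line begins.
  assert(padded <= dst_stride);
  for (int x0 = 0; x0 < padded; x0 += kBlock) {
    uint8_t block[kBlock];
    for (int i = 0; i < kBlock; ++i) {
      const int x = x0 + i;
      block[i] = x < dst_width
                     ? static_cast<uint8_t>((src[2 * x] + src[2 * x + 1] + 1) >> 1)
                     : static_cast<uint8_t>(0xEE);  // stands in for junk data
    }
    std::memcpy(dst + x0, block, kBlock);
  }
}
```

In this sketch the padding of the current line is clobbered, while the first byte of the following line is left untouched.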

Speedup is ~10.42x/~5.23x (32-bit/64-bit) for horizontal ratios <= 2,
~9.49x/~4.64x for ratios within (2, 4], ~6.43x/~3.18x for ratios
within (4, 8], and ~5.42x/~2.50x for ratios > 8 when not memory-bound
on Haswell as compared with the current SSE2 implementation.
2016-05-23 20:23:47 +02:00
..
ProcessUT_AdaptiveQuantization.cpp Add checks for cpu features in tests 2015-01-24 22:47:23 +02:00
ProcessUT_DownSample.cpp [Processing/x86] Add an AVX2 implementation of GeneralBilinearFastDownsample 2016-05-23 20:23:47 +02:00
ProcessUT_ScrollDetection.cpp rename namespace and function name to avoid conflicts with old library 2014-09-17 15:50:59 +08:00
ProcessUT_VaaCalc.cpp [UT] Test VAA routines with a wider variety of resolutions 2016-04-11 16:40:36 +02:00
targets.mk improve py, and change mk according to mk 2014-09-12 10:25:46 +08:00