openh264/test/processing
Sindre Aamås b1013095b1 [Processing/x86] Add an SSE4.1 implementation of GeneralBilinearAccurateDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.

Fall back to a generic approach for ratios > 4.

The use of blendps makes this implementation require SSE4.1. Replacing
blendps and the instructions preceding it with an equivalent sequence would
allow the pshufb path to be backported to SSSE3 and the generic path to
SSE2, at the cost of a minor reduction in performance.

The implementation assumes that data beyond the end of each line, before
the next line begins, can be dirtied, which AFAICT is safe with the
current usage of these routines.

Speedup over the current SSE2 implementation on Haswell, when not
memory-bound (32-bit/64-bit):
- horizontal ratios <= 2: ~5.32x/~4.25x
- ratios within (2, 4]: ~5.06x/~3.97x
- ratios > 4: ~3.93x/~3.13x
2016-05-23 20:23:39 +02:00
ProcessUT_AdaptiveQuantization.cpp Add checks for cpu features in tests 2015-01-24 22:47:23 +02:00
ProcessUT_DownSample.cpp [Processing/x86] Add an SSE4.1 implementation of GeneralBilinearAccurateDownsample 2016-05-23 20:23:39 +02:00
ProcessUT_ScrollDetection.cpp rename namespace and function name to avoid conflicts with old library 2014-09-17 15:50:59 +08:00
ProcessUT_VaaCalc.cpp [UT] Test VAA routines with a wider variety of resolutions 2016-04-11 16:40:36 +02:00
targets.mk improve py, and change mk according to mk 2014-09-12 10:25:46 +08:00