openh264

Sindre Aamås b43e58a366 [Processing/x86] Add an AVX2 implementation of GeneralBilinearFastDownsample

Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 8. Because
pshufb does not cross 128-bit lanes, the overhead of address
calculations and loads is relatively greater as compared with an
SSSE3 implementation.
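
As an illustration of the lane restriction mentioned above, here is a minimal scalar sketch of how the 256-bit byte shuffle behaves. It is not code from this commit: the names Vec256, ShuffleBytesPerLane and MakeStrideControl are hypothetical, and the fixed stride stands in for the per-pixel relative offsets the real routine would track.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 256-bit register holding 32 bytes.
using Vec256 = std::array<std::uint8_t, 32>;

// Scalar model of the AVX2 byte shuffle (vpshufb): every output byte is
// taken from the SAME 128-bit lane as its own position. Only the low four
// control bits select a source byte; a set high bit zeroes the output byte.
Vec256 ShuffleBytesPerLane(const Vec256& src, const Vec256& ctrl) {
  Vec256 out{};
  for (int i = 0; i < 32; ++i) {
    const int laneBase = i & ~15;  // 0 for the low lane, 16 for the high lane
    out[i] = (ctrl[i] & 0x80) ? std::uint8_t{0}
                              : src[laneBase + (ctrl[i] & 0x0F)];
  }
  return out;
}

// Hypothetical helper: a control mask that packs every `step`-th byte of
// each lane to the front of that lane and zeroes the remaining bytes.
Vec256 MakeStrideControl(int step) {
  assert(step >= 1 && step <= 16);
  Vec256 ctrl;
  ctrl.fill(0x80);  // default: zero the output byte
  for (int lane = 0; lane < 2; ++lane) {
    for (int k = 0; k * step < 16; ++k) {
      ctrl[lane * 16 + k] = static_cast<std::uint8_t>(k * step);
    }
  }
  return ctrl;
}
```

With a stride of 4, each lane yields only four useful bytes, and bytes 16..31 can never reach the low lane, so each lane has to be fed by its own load.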

Fall back to a generic approach for ratios > 8.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.

Speedup is ~10.42x/~5.23x (32-bit/64-bit) for horizontal ratios <= 2,
~9.49x/~4.64x for ratios within (2, 4], ~6.43x/~3.18x for ratios
within (4, 8], and ~5.42x/~2.50x for ratios > 8 when not memory-bound
on Haswell as compared with the current SSE2 implementation.

2016-05-23 20:23:47 +02:00

ProcessUT_AdaptiveQuantization.cpp

Add checks for cpu features in tests

2015-01-24 22:47:23 +02:00

ProcessUT_DownSample.cpp

[Processing/x86] Add an AVX2 implementation of GeneralBilinearFastDownsample

2016-05-23 20:23:47 +02:00

ProcessUT_ScrollDetection.cpp

rename namespace and function name to avoid conflicts with old library

2014-09-17 15:50:59 +08:00

ProcessUT_VaaCalc.cpp

[UT] Test VAA routines with a wider variety of resolutions

2016-04-11 16:40:36 +02:00

targets.mk

improve py, and change mk according to mk

2014-09-12 10:25:46 +08:00