openh264/codec/common
Sindre Aamås 1995e03d91 [Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.

Fall back to a generic approach for ratios > 4. Note that the generic
approach can be backported to SSE2.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied; which AFAICT is safe with
the current usage of these routines.

Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2,
~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios
> 4 when not memory-bound on Haswell as compared with the current SSE2
implementation.
2016-05-23 20:23:31 +02:00
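The offset-tracking idea in the commit message above can be illustrated with a small scalar sketch. This is not the OpenH264 assembly: `shuffle_bytes` is a portable stand-in for what `pshufb` does on one 16-byte register, and `build_pair_mask` and `downsample8` are hypothetical helper names introduced here. The sketch assumes 16.16 fixed-point positions, full-precision bilinear weights, and that all eight (left, right) pixel pairs fall inside a single 16-byte block. That holds for ratios up to about 2 in this layout; larger ratios would need more than one block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for pshufb on one 16-byte register: each output byte is
 * selected from src by the low 4 bits of the mask byte, or zeroed when the
 * mask byte has its high bit set. */
static void shuffle_bytes(const uint8_t src[16], const uint8_t mask[16],
                          uint8_t dst[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Hypothetical helper: turn relative pixel offsets into a shuffle mask that
 * gathers the (left, right) source pair for each of 8 output pixels.
 * x16 is the start position and step16 the horizontal ratio, both in 16.16
 * fixed point relative to the start of the block.
 * Returns 0 if any pair would fall outside the 16-byte block. */
static int build_pair_mask(uint32_t x16, uint32_t step16, uint8_t mask[16]) {
    for (int i = 0; i < 8; i++) {
        uint32_t idx = x16 >> 16;          /* integer part: left pixel index */
        if (idx + 1 > 15)
            return 0;                      /* right neighbour out of range */
        mask[2 * i]     = (uint8_t)idx;
        mask[2 * i + 1] = (uint8_t)(idx + 1);
        x16 += step16;
    }
    return 1;
}

/* Hypothetical helper: horizontally downsample 8 pixels from one 16-byte
 * block using one shuffle followed by a bilinear blend of each pair. */
static int downsample8(const uint8_t src[16], uint32_t x16, uint32_t step16,
                       uint8_t dst[8]) {
    uint8_t mask[16], pairs[16];
    assert(src != NULL && dst != NULL);
    if (!build_pair_mask(x16, step16, mask))
        return 0;
    shuffle_bytes(src, mask, pairs);       /* one "pshufb" gathers all pairs */
    for (int i = 0; i < 8; i++) {
        uint32_t frac = x16 & 0xFFFF;      /* fractional part: right weight */
        uint32_t v = pairs[2 * i] * (0x10000 - frac)
                   + pairs[2 * i + 1] * frac
                   + 0x8000;               /* round to nearest */
        dst[i] = (uint8_t)(v >> 16);
        x16 += step16;
    }
    return 1;
}
```

In a SIMD version the mask would typically be computed once per group of output columns and reused for every row, since the horizontal offsets do not depend on the row; that reuse is an assumption about how such a routine would be structured, not a description of the committed code.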
arm add new AArch32 asm functions to support sub8x8 mode 2015-07-07 10:13:56 +08:00
arm64 modify neon comment 2016-04-14 14:49:11 +08:00
inc remove sink in WelsThreadPool and hide the construtor to finish the singleTon 2016-03-02 17:08:09 -08:00
src remove sink in WelsThreadPool and hide the construtor to finish the singleTon 2016-03-02 17:08:09 -08:00
x86 [Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample 2016-05-23 20:23:31 +02:00
generate_version.sh More fixes for out-of-tree build: 2015-05-29 14:57:07 +02:00
targets.mk [Decoder] Use encoder x86 IDCT routines 2016-03-09 10:41:42 +01:00