openh264/codec/processing
Sindre Aamås 1995e03d91 [Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.
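
The pixel extraction described above can be illustrated with a scalar model. This is only a sketch under stated assumptions, not the code from this commit: `pshufb_model` reproduces the documented per-byte behaviour of the PSHUFB instruction on one 16-byte vector, while `build_ctrl`, the 16.16 fixed-point position format, and the choice of four output pixels per load are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PSHUFB on a 16-byte vector: each output byte is
 * src[ctrl & 15], or 0 when bit 7 of the control byte is set. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctrl[16]) {
    uint8_t tmp[16];
    for (int i = 0; i < 16; ++i)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    memcpy(dst, tmp, 16);
}

/* Hypothetical helper: build a shuffle control for 4 output pixels.
 * frac is the fractional start position and step the horizontal ratio,
 * both in assumed 16.16 fixed point. For each output pixel the control
 * selects the left and right source neighbours, addressed relative to
 * the start of the 16-byte load. With step <= 4.0 the largest index
 * used is 13, so one load covers all four pixels. */
static void build_ctrl(uint8_t ctrl[16], uint32_t frac, uint32_t step) {
    assert(frac < (1u << 16));
    assert(step <= (4u << 16));
    uint32_t pos = frac;
    for (int k = 0; k < 4; ++k) {
        uint8_t off = (uint8_t)(pos >> 16);   /* relative pixel offset */
        ctrl[2 * k]     = off;                /* left neighbour */
        ctrl[2 * k + 1] = (uint8_t)(off + 1); /* right neighbour */
        pos += step;
    }
    for (int i = 8; i < 16; ++i)
        ctrl[i] = 0x80;                       /* zero the unused lanes */
}
```

With a ratio of 2.5 and a zero start fraction, the relative offsets are 0, 2, 5 and 7, so a single shuffle gathers the source pairs (0,1), (2,3), (5,6) and (7,8). A real SSSE3 path would presumably replace `pshufb_model` with the instruction itself and follow it with the bilinear weighting.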

Fall back to a generic approach for ratios > 4. Note that the generic
approach can be backported to SSE2.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.
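
One way such an assumption can arise is a loop that always stores whole vector-sized blocks. The sketch below is illustrative only: the 16-byte block size, `bytes_touched`, and `fill_rows_blockwise` are assumptions made for this example and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { kBlock = 16 }; /* assumed store width in bytes */

/* Bytes per row touched by a loop that stores whole kBlock-byte blocks. */
static size_t bytes_touched(size_t width) {
    return (width + kBlock - 1) / kBlock * kBlock;
}

/* Fill every row with value, storing whole blocks as a SIMD loop might.
 * Bytes in [width, bytes_touched(width)) of each row are overwritten too,
 * so the padding between width and stride must be safe to dirty. */
static void fill_rows_blockwise(uint8_t *dst, size_t width, size_t height,
                                size_t stride, uint8_t value) {
    assert(bytes_touched(width) <= stride); /* overrun stays inside the row */
    for (size_t y = 0; y < height; ++y) {
        uint8_t *row = dst + y * stride;
        for (size_t x = 0; x < width; x += kBlock)
            memset(row + x, value, kBlock);
    }
}
```

For a 20-pixel row with a 48-byte stride, bytes 20 through 31 of each row are overwritten, while bytes 32 through 47 and the start of the next row are left alone. The precondition in the `assert` is what keeps the overrun from reaching the following line.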

Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2,
~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios
> 4 when not memory-bound on Haswell as compared with the current SSE2
implementation.
2016-05-23 20:23:31 +02:00
build/win32  Rename a vcproj folder to camelcase, to match all other folders in the same project  2015-03-25 11:46:41 +02:00
interface    Remove tabs in struct and class definitions  2015-06-10 10:22:01 +03:00
src          [Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample  2016-05-23 20:23:31 +02:00
targets.mk   refine common moudle for part of intra prediction function  2014-09-25 14:03:11 +08:00