openh264

History

Sindre Aamås e490215990 [Processing/x86] Add an AVX2 implementation of GeneralBilinearAccurateDownsample

Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 8. Because
pshufb does not cross 128-bit lanes, the overhead of address
calculations and loads is relatively greater as compared with an
SSSE3/SSE4.1 implementation.

Fall back to a generic approach for ratios > 8.
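
As a rough illustration of the shuffle-based extraction described above, here is a scalar C sketch. It is not the AVX2 assembly from this commit. The function names, the 16.16 fixed-point step, the rounding, and the choice of two output pixels per 16-byte window are all assumptions made for the example. It models pshufb on a single 128-bit lane and shows why a horizontal ratio <= 8 keeps the needed source bytes within one lane: consecutive output pixels start at most 8 bytes apart, so both byte pairs sit at relative offsets 0..9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FP_SHIFT 16
#define FP_ONE (1u << FP_SHIFT)

/* Scalar model of pshufb on one 128-bit lane: output byte i is
   src[idx[i] & 15], or 0 when bit 7 of idx[i] is set. */
static void shuffle_lane(const uint8_t src[16], const uint8_t idx[16],
                         uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Blend two pixels with a 16-bit fraction (hypothetical rounding,
   not necessarily the arithmetic used by the library). */
static uint8_t lerp(uint8_t a, uint8_t b, uint32_t frac) {
    return (uint8_t)((a * (FP_ONE - frac) + b * frac + FP_ONE / 2) >> FP_SHIFT);
}

/* Plain reference: one source position per output pixel. */
static void downsample_row_ref(const uint8_t *src, size_t src_w,
                               uint8_t *dst, size_t dst_w, uint32_t step_fp) {
    uint64_t pos = 0;
    for (size_t j = 0; j < dst_w; j++, pos += step_fp) {
        size_t x = (size_t)(pos >> FP_SHIFT);
        dst[j] = lerp(src[x], src[min_sz(x + 1, src_w - 1)],
                      (uint32_t)(pos & (FP_ONE - 1)));
    }
}

/* Shuffle-based variant: load a 16-byte window once, then pick the
   pixel pairs for two outputs by their offsets relative to the window. */
static void downsample_row_shuffle(const uint8_t *src, size_t src_w,
                                   uint8_t *dst, size_t dst_w, uint32_t step_fp) {
    assert(step_fp <= 8 * FP_ONE); /* larger ratios need the generic path */
    uint64_t pos = 0;
    for (size_t j = 0; j < dst_w; j += 2, pos += 2 * (uint64_t)step_fp) {
        size_t base = (size_t)(pos >> FP_SHIFT);
        uint8_t window[16], idx[16], picked[16];
        uint32_t frac[2] = {0, 0};
        /* Models a 16-byte load; the clamp stands in for reading past
           the end of the line, which real SIMD code would do directly. */
        for (size_t k = 0; k < 16; k++)
            window[k] = src[min_sz(base + k, src_w - 1)];
        for (int i = 0; i < 16; i++) idx[i] = 0x80; /* unused bytes -> 0 */
        for (size_t n = 0; n < 2 && j + n < dst_w; n++) {
            uint64_t p = pos + n * (uint64_t)step_fp;
            uint8_t rel = (uint8_t)((p >> FP_SHIFT) - base); /* 0..8 */
            idx[2 * n] = rel;
            idx[2 * n + 1] = (uint8_t)(rel + 1);
            frac[n] = (uint32_t)(p & (FP_ONE - 1));
        }
        shuffle_lane(window, idx, picked);
        for (size_t n = 0; n < 2 && j + n < dst_w; n++)
            dst[j + n] = lerp(picked[2 * n], picked[2 * n + 1], frac[n]);
    }
}
```

Both functions should produce identical rows for any step up to 8.0. The shuffle variant only restructures how source bytes are gathered.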

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.

Speedup is ~8.52x/~6.89x (32-bit/64-bit) for horizontal ratios <= 2,
~7.81x/~6.13x for ratios within (2, 4], ~5.81x/~4.52x for ratios
within (4, 8], and ~5.06x/~4.09x for ratios > 8 when not memory-bound
on Haswell as compared with the current SSE2 implementation.

2016-05-23 20:23:47 +02:00

build/win32

Rename a vcproj folder to camelcase, to match all other folders in the same project

2015-03-25 11:46:41 +02:00

interface

Remove tabs in struct and class definitions

2015-06-10 10:22:01 +03:00

src

[Processing/x86] Add an AVX2 implementation of GeneralBilinearAccurateDownsample

2016-05-23 20:23:47 +02:00

targets.mk

refine common module for part of intra prediction function

2014-09-25 14:03:11 +08:00