openh264

History

Sindre Aamås 57fc3e9917 [Processing] Add AVX2 VAA routines

Process 8 lines at a time rather than 16 lines at a time because
this appears to give more reliable memory subsystem performance on
Haswell.

Speedup is > 2x as compared to SSE2 when not memory-bound on Haswell.
On my Haswell MBP, VAACalcSadSsdBgd is about 3x faster when uncached,
which appears to be related to processing 8 lines at a time as opposed
to 16 lines at a time. The other routines are also faster as compared
to the SSE2 routines in this case but to a lesser extent.

2016-04-11 16:09:56 +02:00

ProcessUT_AdaptiveQuantization.cpp

Add checks for cpu features in tests

2015-01-24 22:47:23 +02:00

ProcessUT_DownSample.cpp

add new C and assembly functions to optimize downsampler when downscale ratio equals 1:3 or 1:4

2015-09-11 16:45:40 +08:00

ProcessUT_ScrollDetection.cpp

rename namespace and function name to avoid conflicts with old library

2014-09-17 15:50:59 +08:00

ProcessUT_VaaCalc.cpp

[Processing] Add AVX2 VAA routines

2016-04-11 16:09:56 +02:00

targets.mk

improve py, and change mk according to mk

2014-09-12 10:25:46 +08:00