openh264/codec/common
Sindre Aamås b6c4a5447c [Decoder/x86] IDCT one block at a time with SSE2
At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.

Stick with four blocks at a time with AVX2 for which that appears
to be consistently faster on Haswell.
2016-03-16 19:55:11 +01:00
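
The trade-off described in the commit message above can be sketched in scalar C. This is only an illustration, not openh264's code: the real routines are SSE2/AVX2 assembly, and the function names, the flat coefficient layout, and the nonzero_mask parameter are all hypothetical. The commit message does not say why per-block processing wins at lower bitrates; the sketch assumes it is because many 4x4 blocks then carry no coefficients and can be skipped.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a reconstructed sample to the 8-bit pixel range. */
static uint8_t clip_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One 4-point pass of the H.264 inverse core transform, in place.
 * `step` is 1 for a row pass and 4 for a column pass.
 * Relies on arithmetic right shift of negative ints (true on common compilers). */
static void idct4_1d(int *d, int step) {
    int e0 = d[0] + d[2 * step];
    int e1 = d[0] - d[2 * step];
    int e2 = (d[step] >> 1) - d[3 * step];
    int e3 = d[step] + (d[3 * step] >> 1);
    d[0]        = e0 + e3;
    d[step]     = e1 + e2;
    d[2 * step] = e1 - e2;
    d[3 * step] = e0 - e3;
}

/* Inverse-transform one 4x4 block and add the residual to the prediction in dst. */
static void idct4x4_add(uint8_t *dst, int stride, const int16_t *coef) {
    int t[16];
    for (int i = 0; i < 16; i++) t[i] = coef[i];
    for (int r = 0; r < 4; r++) idct4_1d(t + 4 * r, 1);  /* rows */
    for (int c = 0; c < 4; c++) idct4_1d(t + c, 4);      /* columns */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] =
                clip_u8(dst[y * stride + x] + ((t[4 * y + x] + 32) >> 6));
}

/* "Wide" strategy: always process all four 4x4 blocks of an 8x8 region.
 * coef holds 4 blocks x 16 coefficients, blocks in raster order. */
void idct_four_blocks(uint8_t *dst, int stride, const int16_t *coef) {
    for (int i = 0; i < 4; i++)
        idct4x4_add(dst + (i >> 1) * 4 * stride + (i & 1) * 4, stride,
                    coef + 16 * i);
}

/* "One block at a time" strategy: skip blocks whose bit in nonzero_mask
 * is clear, i.e. blocks assumed to have no nonzero coefficients. */
void idct_per_block(uint8_t *dst, int stride, const int16_t *coef,
                    unsigned nonzero_mask) {
    assert(nonzero_mask <= 0xFu);
    for (int i = 0; i < 4; i++)
        if (nonzero_mask & (1u << i))
            idct4x4_add(dst + (i >> 1) * 4 * stride + (i & 1) * 4, stride,
                        coef + 16 * i);
}
```

Both functions produce the same pixels when the mask is accurate, since an all-zero block adds a zero residual. The per-block variant simply does less work when few blocks are coded, while the wide variant has a fixed cost that pays off when most blocks are coded.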
arm                  add new AArch32 asm functions to support sub8x8 mode                              2015-07-07 10:13:56 +08:00
arm64                Add new ARM AArch64 assembly functions to support sub8x8 mode                     2015-07-08 10:34:49 +08:00
inc                  remove sink in WelsThreadPool and hide the construtor to finish the singleTon     2016-03-02 17:08:09 -08:00
src                  remove sink in WelsThreadPool and hide the construtor to finish the singleTon     2016-03-02 17:08:09 -08:00
x86                  [Decoder/x86] IDCT one block at a time with SSE2                                  2016-03-16 19:55:11 +01:00
generate_version.sh  More fixes for out-of-tree build:                                                 2015-05-29 14:57:07 +02:00
targets.mk           [Decoder] Use encoder x86 IDCT routines                                           2016-03-09 10:41:42 +01:00