* commit '82bb3048013201c0095d2853d4623633d912252f':
dsputil: Use correct type in me_cmp_func function pointer
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18':
build: Group general components separate from de/encoders in arch Makefiles
Conflicts:
libavcodec/arm/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5169e688956be3378adb3b16a93962fe0048f1c9':
dsputil: Propagate bit depth information to all (sub)init functions
Conflicts:
libavcodec/arm/dsputil_init_arm.c
libavcodec/arm/dsputil_init_armv5te.c
libavcodec/arm/dsputil_init_armv6.c
libavcodec/arm/dsputil_init_neon.c
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/ppc/dsputil_ppc.c
libavcodec/x86/dsputil_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Replace mulps+subps with fnmaddps, resulting in two less instructions inside the
inner loops.
About 1% faster FMA3 performance.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'db3f61a04f1f66746660f921bb2780ddf1141f3b':
x86: dsputil_init: Drop some unnecessary parentheses
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '6a8b35dc88b4a1a452f192fbbf53ae7f59bc3f23':
dsputilenc_mmx: Merge two assignment blocks with identical conditions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '55519926ef855c671d084ccc151056de9e3d3a77':
x86: Make function prototype comments in assembly code consistent
Conflicts:
libavcodec/x86/sbrdsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'edd1f833fa145eb9c5026877c699ebe6efca00a0':
x86: h264_idct_10_bit: Use proper type in function prototype comments
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '831a1180785a786272cdcefb71566a770bfb879e':
Update dsputil- and SIMD-related comments to match reality more closely
Conflicts:
libavcodec/x86/hpeldsp.asm
libavutil/arm/float_dsp_init_arm.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Also undo the changes to ra144enc.c from previous commits.
Should fix ticket #3429
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '3741aa37c2a0d0717faff74a5c4cc357d16f6d1d':
x86: cabac: Use correct #includes to make header compile standalone
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Should fix compilation failures with --disable-yasm on some compilers
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This reverts the changes 64672098361361cd15d37e36f747ab44de5b80ca
and 68c3ed936a76c3ff7738f602fa90237ac7e3ce08 did to the SSE2 version,
which generated a hit of about 5 cycles.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Sandy Bridge Win64:
180 cycles on ff_synth_filter_inner_sse2
150 cycles on ff_synth_filter_inner_avx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>