* qatar/master:
x86: ac3dsp: Remove 3dnow version of ff_ac3_extract_exponents
Conflicts:
tests/fate/ac3.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The function requires increasing the fuzz factor for the ac3/eac3 encode
tests and even so makes fate fail. It only provides a slight encoding
speedup for legacy CPUs that do not support SS2. Thus its benefit is not
worth the trouble it creates and fixing it would be a waste of time.
* qatar/master:
x86: Move some conditional code around to avoid unused variable warnings
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/vf_yadif_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '78fa0bd0f7067868943c0899907e313414492426':
x86: cavs: Put mmx-specific code into its own init function
Merged-by: Michael Niedermayer <michaelni@gmx.at>
233 to 105 cycles on Arrandale and Win64.
Replacing the multiplication by s_m[m] by a pand and a pxor with
appropriate vectors is slower. Unrolling is a 15 cycles win.
A SSE version was 4 cycles slower.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c46819f2299c73cd1bfa8ef04d08b0153a5699d3':
x86: Move constants to the only place where they are used
Conflicts:
libavcodec/x86/vp3dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This way, the special IDCT permutations are no longer needed. This
is similar to how H264 does it, and removes the dsputil dependency
imposed by the scantable code.
Also remove the unused type == 0 cases from the plain C version
of the idct.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'e027032fc6a49db5a4ce12fc3e09ffb86ff20522':
x86: dsputil: ff_h263_*_loop_filter declarations to a more suitable place
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a89c05500f68d94a0269e68bc522abfd420c5497':
x86: h264qpel: int --> ptrdiff_t for some line_size parameters
Conflicts:
libavcodec/x86/qpelbase.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.
Signed-off-by: Martin Storsjö <martin@martin.st>
From 312 to 89/68 (sse/sse2) cycles on Arrandale and Win64.
Sandybridge: 68/47 cycles.
Having a loop counter is a 7 cycle gain.
Unrolling is another 7 cycle gain.
Working in reverse scan is another 6 cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Timing on Arrandale:
C SSE
Win32: 57 44
Win64: 47 38
Unrolling and not storing mask both save some cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Timing on Arrandale:
C SSE
Win32: 57 44
Win64: 47 38
Unrolling and not storing mask both save some cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e5c2794a7162e485eefd3133af5b98fd31386aeb':
x86: consistently use unaligned movs in the unaligned bswap
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This way, the special IDCT permutations are no longer needed. Bfin code
is disabled until someone updates it. This is similar to how H264 does
it, and removes the dsputil dependency imposed by the scantable code.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e8c52271c45ec27d783e74238dcfad0c2008731c':
Revert "Move H264/QPEL specific asm from dsputil.asm to h264_qpel_*.asm."
Conflicts:
libavcodec/x86/dsputil.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This reverts commit f90ff772e7.
The code should be put back in h264_qpel_8bit.asm, but unfortunately
it is unconditionally used from dsputil_mmx.c since 71155d7.
* commit '845cfc92f908791714b8c4c8a49c91b8c64b685e':
x86: dsputil: Drop aliasing of ff_put_pixels8_mmx to ff_put_pixels8_mmxext
Conflicts:
libavcodec/x86/dsputil_mmx.c
Note, the commit message is wrong, there are no mmxext instructions as
claimed in the function. The change should do no harm though
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '096cc11ec102701a18951b4f0437d609081ca1dd':
x86: vc1dsp: Move ff_avg_vc1_mspel_mc00_mmxext out of dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The external assembly function uses mmxext instructions and should not be
masqueraded as an mmx-only function. Instead, use the mmx-only inline
assembly function.
This fixes crashes in chromium on win64 on machines with AVX
(crashes that apparently aren't triggered by fate).
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Reference:
commit 3615e2be84
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 22:02:57 2003 +0000
h263_h_loop_filter_mmx
Originally committed as revision 2553 to svn://svn.ffmpeg.org/ffmpeg/trunk
commit 359f98ded9
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 20:28:10 2003 +0000
h263_v_loop_filter_mmx
Originally committed as revision 2552 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: dsputil: Fix h263 loop filter link error in some configurations
Conflicts:
libavcodec/x86/dsputil.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This was caused by unconditionally referencing a conditionally compiled
table. Now the code is also compiled conditionally.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The symbol "ff_h263_loop_filter_strength" is defined in h263.c, but
the h263 loopfilter functions (in the .asm file) are not optimized
out (even though their function pointers are never assigned).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '304b806cb524fb040f8e09a241040f1af2cb820b':
build: Make library minor version visible in the Makefile
x86: mpeg4qpel: Make movsxifnidn do the right thing
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Also use the resulting 16bpp functions for anything >8 and <=16, not just
9 and 10. This fixes 12 and 14bpp H264 support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Reference:
commit 3615e2be84
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 22:02:57 2003 +0000
h263_h_loop_filter_mmx
Originally committed as revision 2553 to svn://svn.ffmpeg.org/ffmpeg/trunk
commit 359f98ded9
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 20:28:10 2003 +0000
h263_v_loop_filter_mmx
Originally committed as revision 2552 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a846dccb29d2bb0798af1d47d06100eda9ca87cc':
h264chroma: x86: Fix building with yasm disabled
rv34: Drop now unnecessary dsputil dependencies
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '620289a20e022b9c16c10d546ef86cc0bb77cc84':
sh4: Fix silly type vs. variable name search and replace typo
configure: Group all hwaccels together in a separate variable
Add av_cold attributes to arch-specific init functions
Conflicts:
configure
libavcodec/arm/mpegvideo_armv5te.c
libavcodec/x86/mlpdsp.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideoenc.c
libavcodec/x86/videodsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '25841dfe806a13de526ae09c11149ab1f83555a8':
Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter.
Conflicts:
libavcodec/alpha/dsputil_alpha.c
libavcodec/dsputil_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: hpel: Move {avg,put}_pixels16_sse2 to hpeldsp
configure: Add a comment indicating why uclibc is checked before glibc
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '098eed95bc1a6b2c8ac97f126f62bb74699670cf':
mdec: merge mdec_common_init() into decode_init().
eatgv: use fixed-width types where appropriate.
x86: Simplify some arch conditionals
bfin: Separate VP3 initialization code
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '05b0998f511ffa699407465d48c7d5805f746ad2':
dsputil: Fix error by not using redzone and register name
swscale: GBRP output support
Conflicts:
libswscale/output.c
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/utils.c
tests/ref/lavfi/pixdesc
tests/ref/lavfi/pixfmts_copy
tests/ref/lavfi/pixfmts_null
tests/ref/lavfi/pixfmts_scale
tests/ref/lavfi/pixfmts_vflip
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Also fix project name
See git blame/log/show and
commit 826f429ae9
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Sun Jan 5 15:57:10 2003 +0000
qpel in mmx2/3dnow
qpel refinement quality parameter
Originally committed as revision 1393 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This also fixes the project name
Original authors fabrice and nick go back to the initial ffmpeg commit
Others for example contributed in: (for a complete list please use git blame / show / log)
commit e9c0a38ff0
Author: Zdenek Kabelac <kabi@informatics.muni.cz>
Date: Tue May 28 16:35:58 2002 +0000
* optimized avg_* functions (except xy2)
* minor speedup for put_pixels_x2 & cleanup
Originally committed as revision 619 to svn://svn.ffmpeg.org/ffmpeg/trunk
commit 607dce96c0
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Fri May 17 01:04:14 2002 +0000
hopefully faster mmx2&3dnow MC
Originally committed as revision 506 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f90ff772e7e35b4923c2de429d1fab9f2569b568':
Move H264/QPEL specific asm from dsputil.asm to h264_qpel_*.asm.
doc: update the reference for the title
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '69c25c9284645cf5189af2ede42d6f53828f3b45':
dnxhdenc: fix invalid reads in dnxhd_mb_var_thread().
x86: h264qpel: Move stray comment to the right spot and clarify it
atrac3: use correct loop variable in add_tonal_components()
Conflicts:
tests/ref/vsynth/vsynth1-dnxhd-1080i
tests/ref/vsynth/vsynth2-dnxhd-1080i
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '2c10e2a2f62477efaef5b641974594f7df4ca339':
build: Make the H.264 parser select h264qpel
x86: h264qpel: add cpu flag checks for init function
Conflicts:
libavcodec/x86/h264_qpel.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>