* commit 'dc1b328d0df6e5ad5ff0ca4ae031e08466624f9c':
x86: hpeldsp: Only compile MMX hpeldsp code if MMX is enabled
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9e5e76ef9ea803432ef2782a3f528c3f5bab621e':
x86: More specific ifdefs for dsputil/hpeldsp init functions
Conflicts:
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Sandybridge: 47 cycles
Having a loop counter is a 7 cycle gain.
Unrolling is another 7 cycle gain.
Working in reverse scan is another 6 cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* commit 'bf7c3c6b157f7938578f964b62cffd5e504940be':
x86: dsputil: Move cavs and vc1-specific functions where they belong
Conflicts:
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '932806232108872655556100011fe369125805d3':
x86: dsputil: Move avg_pixels16_mmx() out of rnd_template.c
x86: dsputil: Move avg_pixels8_mmx() out of rnd_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9b3a04d30691e85b77e63f75f5f26a93c3a000cd':
x86: Move duplicated put_pixels{8|16}_mmx functions into their own file
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7f75f2f2bd692857c1c1ca7f414eb30ece3de93d':
ppc: Drop unnecessary ff_ name prefixes from static functions
x86: Drop unnecessary ff_ name prefixes from static functions
arm: Drop unnecessary ff_ name prefixes from static functions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '6b110d3a739c31602b59887ad65c67025df3f49d':
ppc: More consistent names for H.264 optimizations files
mpegaudiosp: More consistent names for ppc/x86 optimization files
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: ac3dsp: Remove 3dnow version of ff_ac3_extract_exponents
Conflicts:
tests/fate/ac3.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The function requires increasing the fuzz factor for the ac3/eac3 encode
tests and even so makes fate fail. It only provides a slight encoding
speedup for legacy CPUs that do not support SS2. Thus its benefit is not
worth the trouble it creates and fixing it would be a waste of time.
* qatar/master:
x86: Move some conditional code around to avoid unused variable warnings
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/vf_yadif_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '78fa0bd0f7067868943c0899907e313414492426':
x86: cavs: Put mmx-specific code into its own init function
Merged-by: Michael Niedermayer <michaelni@gmx.at>
233 to 105 cycles on Arrandale and Win64.
Replacing the multiplication by s_m[m] by a pand and a pxor with
appropriate vectors is slower. Unrolling is a 15 cycles win.
A SSE version was 4 cycles slower.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c46819f2299c73cd1bfa8ef04d08b0153a5699d3':
x86: Move constants to the only place where they are used
Conflicts:
libavcodec/x86/vp3dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This way, the special IDCT permutations are no longer needed. This
is similar to how H264 does it, and removes the dsputil dependency
imposed by the scantable code.
Also remove the unused type == 0 cases from the plain C version
of the idct.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'e027032fc6a49db5a4ce12fc3e09ffb86ff20522':
x86: dsputil: ff_h263_*_loop_filter declarations to a more suitable place
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a89c05500f68d94a0269e68bc522abfd420c5497':
x86: h264qpel: int --> ptrdiff_t for some line_size parameters
Conflicts:
libavcodec/x86/qpelbase.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.
Signed-off-by: Martin Storsjö <martin@martin.st>
From 312 to 89/68 (sse/sse2) cycles on Arrandale and Win64.
Sandybridge: 68/47 cycles.
Having a loop counter is a 7 cycle gain.
Unrolling is another 7 cycle gain.
Working in reverse scan is another 6 cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Timing on Arrandale:
C SSE
Win32: 57 44
Win64: 47 38
Unrolling and not storing mask both save some cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Timing on Arrandale:
C SSE
Win32: 57 44
Win64: 47 38
Unrolling and not storing mask both save some cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e5c2794a7162e485eefd3133af5b98fd31386aeb':
x86: consistently use unaligned movs in the unaligned bswap
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This way, the special IDCT permutations are no longer needed. Bfin code
is disabled until someone updates it. This is similar to how H264 does
it, and removes the dsputil dependency imposed by the scantable code.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e8c52271c45ec27d783e74238dcfad0c2008731c':
Revert "Move H264/QPEL specific asm from dsputil.asm to h264_qpel_*.asm."
Conflicts:
libavcodec/x86/dsputil.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This reverts commit f90ff772e7.
The code should be put back in h264_qpel_8bit.asm, but unfortunately
it is unconditionally used from dsputil_mmx.c since 71155d7.
* commit '845cfc92f908791714b8c4c8a49c91b8c64b685e':
x86: dsputil: Drop aliasing of ff_put_pixels8_mmx to ff_put_pixels8_mmxext
Conflicts:
libavcodec/x86/dsputil_mmx.c
Note, the commit message is wrong, there are no mmxext instructions as
claimed in the function. The change should do no harm though
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '096cc11ec102701a18951b4f0437d609081ca1dd':
x86: vc1dsp: Move ff_avg_vc1_mspel_mc00_mmxext out of dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The external assembly function uses mmxext instructions and should not be
masqueraded as an mmx-only function. Instead, use the mmx-only inline
assembly function.
This fixes crashes in chromium on win64 on machines with AVX
(crashes that apparently aren't triggered by fate).
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Reference:
commit 3615e2be84
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 22:02:57 2003 +0000
h263_h_loop_filter_mmx
Originally committed as revision 2553 to svn://svn.ffmpeg.org/ffmpeg/trunk
commit 359f98ded9
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 20:28:10 2003 +0000
h263_v_loop_filter_mmx
Originally committed as revision 2552 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: dsputil: Fix h263 loop filter link error in some configurations
Conflicts:
libavcodec/x86/dsputil.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This was caused by unconditionally referencing a conditionally compiled
table. Now the code is also compiled conditionally.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The symbol "ff_h263_loop_filter_strength" is defined in h263.c, but
the h263 loopfilter functions (in the .asm file) are not optimized
out (even though their function pointers are never assigned).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '304b806cb524fb040f8e09a241040f1af2cb820b':
build: Make library minor version visible in the Makefile
x86: mpeg4qpel: Make movsxifnidn do the right thing
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Also use the resulting 16bpp functions for anything >8 and <=16, not just
9 and 10. This fixes 12 and 14bpp H264 support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Reference:
commit 3615e2be84
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 22:02:57 2003 +0000
h263_h_loop_filter_mmx
Originally committed as revision 2553 to svn://svn.ffmpeg.org/ffmpeg/trunk
commit 359f98ded9
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Tue Dec 2 20:28:10 2003 +0000
h263_v_loop_filter_mmx
Originally committed as revision 2552 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a846dccb29d2bb0798af1d47d06100eda9ca87cc':
h264chroma: x86: Fix building with yasm disabled
rv34: Drop now unnecessary dsputil dependencies
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '620289a20e022b9c16c10d546ef86cc0bb77cc84':
sh4: Fix silly type vs. variable name search and replace typo
configure: Group all hwaccels together in a separate variable
Add av_cold attributes to arch-specific init functions
Conflicts:
configure
libavcodec/arm/mpegvideo_armv5te.c
libavcodec/x86/mlpdsp.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideoenc.c
libavcodec/x86/videodsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '25841dfe806a13de526ae09c11149ab1f83555a8':
Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter.
Conflicts:
libavcodec/alpha/dsputil_alpha.c
libavcodec/dsputil_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: hpel: Move {avg,put}_pixels16_sse2 to hpeldsp
configure: Add a comment indicating why uclibc is checked before glibc
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '098eed95bc1a6b2c8ac97f126f62bb74699670cf':
mdec: merge mdec_common_init() into decode_init().
eatgv: use fixed-width types where appropriate.
x86: Simplify some arch conditionals
bfin: Separate VP3 initialization code
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '05b0998f511ffa699407465d48c7d5805f746ad2':
dsputil: Fix error by not using redzone and register name
swscale: GBRP output support
Conflicts:
libswscale/output.c
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/utils.c
tests/ref/lavfi/pixdesc
tests/ref/lavfi/pixfmts_copy
tests/ref/lavfi/pixfmts_null
tests/ref/lavfi/pixfmts_scale
tests/ref/lavfi/pixfmts_vflip
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Also fix project name
See git blame/log/show and
commit 826f429ae9
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Sun Jan 5 15:57:10 2003 +0000
qpel in mmx2/3dnow
qpel refinement quality parameter
Originally committed as revision 1393 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This also fixes the project name
Original authors fabrice and nick go back to the initial ffmpeg commit
Others for example contributed in: (for a complete list please use git blame / show / log)
commit e9c0a38ff0
Author: Zdenek Kabelac <kabi@informatics.muni.cz>
Date: Tue May 28 16:35:58 2002 +0000
* optimized avg_* functions (except xy2)
* minor speedup for put_pixels_x2 & cleanup
Originally committed as revision 619 to svn://svn.ffmpeg.org/ffmpeg/trunk
commit 607dce96c0
Author: Michael Niedermayer <michaelni@gmx.at>
Date: Fri May 17 01:04:14 2002 +0000
hopefully faster mmx2&3dnow MC
Originally committed as revision 506 to svn://svn.ffmpeg.org/ffmpeg/trunk
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f90ff772e7e35b4923c2de429d1fab9f2569b568':
Move H264/QPEL specific asm from dsputil.asm to h264_qpel_*.asm.
doc: update the reference for the title
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '69c25c9284645cf5189af2ede42d6f53828f3b45':
dnxhdenc: fix invalid reads in dnxhd_mb_var_thread().
x86: h264qpel: Move stray comment to the right spot and clarify it
atrac3: use correct loop variable in add_tonal_components()
Conflicts:
tests/ref/vsynth/vsynth1-dnxhd-1080i
tests/ref/vsynth/vsynth2-dnxhd-1080i
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '2c10e2a2f62477efaef5b641974594f7df4ca339':
build: Make the H.264 parser select h264qpel
x86: h264qpel: add cpu flag checks for init function
Conflicts:
libavcodec/x86/h264_qpel.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* commit '2e4bb99f4df7052b3e147ee898fcb4013a34d904':
vorbisdsp: convert x86 simd functions from inline asm to yasm.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
* qatar/master:
proresdec: support mixed interlaced/non-interlaced content
vp3/5: move put_no_rnd_pixels_l2 from dsputil to VP3DSPContext.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ce378f0dd0c4e5350b3280e6b3e8d6b46fe4b0a3':
fate: Use wmv2 IDCT for wmv2 tests
vorbisdsp: change block_size type from int to intptr_t.
Conflicts:
tests/fate-run.sh
tests/fate/vcodec.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '8a4f26206d7914eaf2903954ce97cb7686933382':
dsputil: remove butterflies_float_interleave.
srtp: Move a variable to a local scope
srtp: Add tests for the crypto suite with 32/80 bit HMAC
Conflicts:
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'aeaf268e52fc11c1f64914a319e0edddf1346d6a':
vp3: integrate clear_blocks with idct of previous block.
mpegvideo: fix loop condition in draw_line()
dvdsubdec: parse the size from the extradata
Conflicts:
libavcodec/dvdsubdec.c
libavcodec/mpegvideo.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This is identical to what e.g. vp8 does, and prevents the function call
overhead (plus dependency on dsputil for this particular function).
Arm asm updated by Janne Grunau <janne-libav@jannau.net>.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* qatar/master:
x86: dsputil: Drop some unused macro definitions
x86: Add a Yasm-based emms() replacement
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavc: Move vector_fmul_window to AVFloatDSPContext
rtpdec_mpeg4: Check the remaining amount of data before reading
Conflicts:
libavcodec/dsputil.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'dae1d507af94261bafd3b11549884e5d1eca590e':
x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags
vf_fps: add final flushed frames to the dropped frame count
rv34_parser: Adjust #if for disabling individual parsers
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd8c772de53d29afb1bada88afa859fce8489c668':
nutdec: Always return a value from nut_read_timestamp()
configure: Make warnings from -Wreturn-type fatal errors
x86: ABS2: port to cpuflags
vdpau: Remove av_unused attribute from function declaration
h264: fix ff_generate_sliding_window_mmcos() prototype.
Conflicts:
configure
libavformat/nutdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This could also be fixed by changing the argument type if
someone prefers that and wants to change it ...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0':
x86: ABS1: port to cpuflags
v210x: cosmetics, reformat
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The asm code is not valid for older compilers as it uses too many
operands, ICC on x86_32 seems affected by this.
This patch disables the affected code for ICC on x86_32 and should
make it compileable again.
A better fix would be to use fewer operands or to change this code
to yasm, later is being worked on AFAIK so this is a temporary
solution.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* commit '30b39164256999efc8d77edc85e2e0b963c24834':
ac3dec: make downmix() take array of pointers to channel data
Conflicts:
libavcodec/ac3dsp.c
libavcodec/ac3dsp.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Start and end index are multiple of 2, therefore guaranteeing aligned access.
Also, this allows to generate 4 floats per loop, keeping the alignment all
along.
Timing:
- 32 bits: 326c -> 172c
- 64 bits: 323c -> 156c
Signed-off-by: Diego Biurrun <diego@biurrun.de>