* qatar/master:
x86: fix build with nasm 2.08
x86: use nop cpu directives only if supported
x86: fix rNmp macros with nasm
build: add trailing / to yasm/nasm -I flags
x86: use 32-bit source registers with movd instruction
x86: add colons after labels
Conflicts:
Makefile
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
yasm tolerates a mismatch between movd/movq and the source register size,
adjusting the instruction according to the register. nasm is more
strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2
rational: add av_inv_q() returning the inverse of an AVRational
dpx: Make start offset unsigned
lavfi: properly signal out-of-memory error in ff_filter_samples
cosmetics: Fix a few switched periods and linebreaks
zerocodec: Fix memleak in decode_frame
zerocodec: Cosmetics
Conflicts:
ffmpeg.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavr: fix handling of custom mix matrices
fate: force pix_fmt in lagarith-rgb32 test
fate: add tests for lagarith lossless video codec.
ARMv6: vp8: fix stack allocation with Apple's assembler
ARM: vp56: allow inline asm to build with clang
fft: 3dnow: fix register name typo in DECL_IMDCT macro
x86: dct32: port to cpuflags
x86: build: replace mmx2 by mmxext
Revert "wmapro: prevent division by zero when sample rate is unspecified"
wmapro: prevent division by zero when sample rate is unspecified
lagarith: fix color plane inversion for YUY2 output.
lagarith: pad RGB buffer by 1 byte.
dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.
Conflicts:
doc/APIchanges
libavcodec/lagarith.c
libavfilter/x86/gradfun.c
libavutil/cpu.h
libavutil/version.h
libswscale/utils.c
libswscale/version.h
libswscale/x86/yuv2rgb.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
vc1dec: Remove separate scaling function for interlaced field MVs
vc1dec: Invoke edge_emulation regardless of MV precision
x86: Use consistent 3dnowext function and macro name suffixes
g723_1: scale output as supposed for the case with postfilter disabled
g723_1: increase excitation storage by 4
g723_1: fix upper bound parameter from inverse maximum autocorrelation
g723_1: make scale_vector() behave like the reference
g723_1: fix off-by-one error in normalize_bits()
g723_1: save/restore excitation with offset to store LPC history
wmapro: prevent division by zero when sample rate is unspecified
x86: proresdsp: improve SIGNEXTEND macro comments
x86: h264dsp: K&R formatting cosmetics
LICENSE: Document all GPL files
Conflicts:
libavcodec/g723_1.c
libavcodec/wmaprodec.c
libavcodec/x86/h264dsp_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
This makes add_hfyu_left_prediction_sse4() handle sources that are not
16-byte aligned in its own function rather than by proxying the call to
add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the
sse4 version clobbers xmm6, but the ssse3 version (which uses MMX regs)
does not restore it, thus leading to XMM clobbering and RSP being off.
Fixes bug 342.
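The following minimal scalar sketch shows what HuffYUV left prediction computes. It also shows the kind of 16-byte alignment check that separates an aligned fast path from an unaligned one. Both function names are hypothetical. The sketch models the arithmetic only: it is not the SSE4 code this commit changes, and it does not model the Win64 xmm6 save/restore issue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of left prediction: each output byte is the
 * running sum of all input bytes so far, truncated to 8 bits. The
 * untruncated accumulator is returned so the caller can carry it into
 * the next call. */
static int left_prediction_ref(uint8_t *dst, const uint8_t *src, int w, int acc)
{
    assert(w >= 0);
    for (int i = 0; i < w; i++) {
        acc += src[i];
        dst[i] = (uint8_t)acc;  /* keep the low 8 bits */
    }
    return acc;
}

/* Hypothetical helper: nonzero when p is 16-byte aligned, i.e. when
 * aligned SSE loads such as movdqa may be used on it; otherwise an
 * unaligned load such as movdqu is required. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}
```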
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
Some calculations were changed in b6a3849 to use mmsize, which was not correct
for the AVX version, which uses INIT_YMM and therefore has mmsize == 32.
Fixes Bug 341.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate. There is thus no reason to ever use them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
dca: Switch dca_sample_rates to avpriv_ prefix; it is used across libs
ARM: use =const syntax instead of explicit literal pools
ARM: use standard syntax for all LDRD/STRD instructions
fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64.
dct-test: allow to compile without HAVE_INLINE_ASM.
x86/dsputilenc: bury inline asm under HAVE_INLINE_ASM.
dca: Move tables used outside of dcadec.c to a separate file.
dca: Rename dca.c ---> dcadec.c
x86: h264dsp: Remove unused variable ff_pb_3_1
apetag: change a forgotten return to return 0
Conflicts:
libavcodec/Makefile
libavcodec/dca.c
libavcodec/x86/fft_3dn.c
libavcodec/x86/fft_3dn2.c
libavcodec/x86/fft_mmx.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mpc8: return more meaningful error codes.
mpc: return more meaningful error codes.
wv,mpc8: don't return apetag data in packets.
rtmp: do not warn about receiving metadata packets
x86: h264dsp: Adjust YASM #ifdefs
x86: yadif: Mark mmxext optimizations as such
h264: convert loop filter strength dsp function to yasm.
Improve descriptiveness of a number of codec and container long names
Conflicts:
libavcodec/flvdec.c
libavcodec/libopenjpegdec.c
libavformat/apetag.c
libavformat/mp3dec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
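As relative savings, the cycle counts quoted above correspond to roughly:

```latex
\frac{182 - 172}{182} \approx 5.5\%\ \text{(x86-64)}, \qquad
\frac{201 - 193}{201} \approx 4.0\%\ \text{(x86-32)}
```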
* qatar/master: (35 commits)
h264_idct_10bit: port x86 assembly to cpuflags.
x86inc: clip num_args to 7 on x86-32.
x86inc: sync to latest version from x264.
fft: rename "z" to "zc" to prevent name collision.
wv: return meaningful error codes.
wv: return AVERROR_EOF on EOF, not EIO.
mp3dec: forward errors for av_get_packet().
mp3dec: remove a pointless local variable.
mp3dec: remove commented out cruft.
lavfi: bump minor to mark stabilizing the ABI.
FATE: add tests for yadif.
FATE: add a test for delogo video filter.
FATE: add a test for amix audio filter.
audiogen: allow specifying random seed as a commandline parameter.
vc1dec: Override invalid macroblock quantizer
vc1: avoid reading beyond the last line in vc1_draw_sprites()
vc1dec: check that coded slice positions and interlacing match.
vc1dec: Do not ignore ff_vc1_parse_frame_header_adv return value
configure: Move parts that should not be user-selectable to CONFIG_EXTRA
lavf: remove commented out cruft in avformat_find_stream_info()
...
Conflicts:
Makefile
configure
libavcodec/vc1dec.c
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264dsp_mmx.c
libavfilter/version.h
libavformat/mp3dec.c
libavformat/utils.c
libavformat/wv.c
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Without this, cglobal will expand "z" to "zh" to access the high byte
in a register's word, which causes a name collision with the ZH(x) macro
further up in this file.
* qatar/master:
proresdsp: port x86 assembly to cpuflags.
lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro
lavfi: better channel layout negotiation
alac: check for truncated packets
alac: reverse lpc coeff order, simplify filter
lavr: add x86-optimized mixing functions
x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
tscc2: fix typo in array index
build: use COMPILE template for HOSTOBJS
build: do full flag handling for all compiler-type tools
eval: fix printing of NaN in eval fate test.
build: Rename aandct component to more descriptive aandcttables
mpegaudio: bury inline asm under HAVE_INLINE_ASM.
x86inc: automatically insert vzeroupper for YMM functions.
rtmp: Check the buffer length of ping packets
rtmp: Allow having more unknown data at the end of a chunk size packet without failing
rtmp: Prevent reading outside of an allocated buffer when receiving server bandwidth packets
Conflicts:
Makefile
configure
libavcodec/x86/proresdsp.asm
libavutil/eval.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
libopenjpeg: support YUV and deep RGB pixel formats
Fix typo in v410 decoder.
vf_yadif: unset cur_buf on the input link.
vf_overlay: ensure the overlay frame does not get leaked.
vf_overlay: prevent premature freeing of cur_buf
Support urlencoded http authentication credentials
rtmp: Return an error when the client bandwidth is incorrect
rtmp: Return proper error code in handle_server_bw
rtmp: Return proper error code in handle_client_bw
rtmp: Return proper error codes in handle_chunk_size
lavr: x86: add missing vzeroupper in ff_mix_1_to_2_fltp_flt()
vp8: Replace x*155/100 by x*101581>>16.
vp3: don't use calls to inline asm in yasm code.
x86/dsputil: put inline asm under HAVE_INLINE_ASM.
dsputil_mmx: fix incorrect assembly code
rtmp: Factorize the code by adding handle_invoke
rtmp: Factorize the code by adding handle_chunk_size
rtmp: Factorize the code by adding handle_ping
rtmp: Factorize the code by adding handle_client_bw
rtmp: Factorize the code by adding handle_server_bw
Conflicts:
libavcodec/libopenjpegdec.c
libavcodec/x86/dsputil_mmx.c
libavfilter/vf_overlay.c
libavformat/Makefile
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
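The vp8 entry in this merge replaces a division by a multiply and a shift. The small C sketch below compares the two forms (the function names are hypothetical). Because 101581/65536 exceeds 155/100 by exactly 1/327680, the shifted form can only round differently once that excess accumulates. For non-negative int inputs, the first disagreement is at x = 16389. The commit message does not state which input range vp8 actually feeds through this expression, so the sketch only measures where the two forms diverge.

```c
#include <assert.h>

/* Original form: multiply, then integer division by 100. */
static int scale_div(int x)
{
    return x * 155 / 100;
}

/* Replacement form: 101581 / 65536 = 1.5500030517578125, so a multiply
 * and a 16-bit right shift approximate x * 1.55 without a division. */
static int scale_shift(int x)
{
    return (x * 101581) >> 16;
}

/* Return the first x in [0, limit) where the two forms differ, or -1.
 * limit is capped so that x * 101581 cannot overflow a 32-bit int. */
static int first_mismatch(int limit)
{
    assert(limit >= 0 && limit <= 21140);
    for (int x = 0; x < limit; x++)
        if (scale_div(x) != scale_shift(x))
            return x;
    return -1;
}
```

For example, both forms give 155 for x = 100, and `first_mismatch(20000)` reports 16389. At that point the division yields 25402 and the shift yields 25403.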
Mixing yasm and inline asm is a bad idea, since if either yasm or inline
asm is not supported by your toolchain, all of the asm stops working.
Thus, it is better to use either one or the other alone.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
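The following minimal C sketch illustrates the "one or the other" rule. It assumes compile-time switches in the style of the HAVE_INLINE_ASM flag mentioned elsewhere in this log. HAVE_YASM and every function name here are assumptions for illustration. Each optimized variant depends on exactly one toolchain feature, and the plain C version is always built. As a result, losing yasm or inline asm only disables the variant that needs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed configuration switches; a real build would take these from a
 * configure-generated header. Defaulting to 0 keeps the sketch portable. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif
#ifndef HAVE_YASM
#define HAVE_YASM 0
#endif

typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Plain C version: always compiled, needs no assembler at all. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];  /* wraps modulo 256 */
}

#if HAVE_INLINE_ASM
/* Hypothetical variant written entirely in inline asm. */
void add_bytes_inline(uint8_t *dst, const uint8_t *src, size_t n);
#endif
#if HAVE_YASM
/* Hypothetical variant in a .asm file; it must not call inline-asm code. */
void add_bytes_yasm(uint8_t *dst, const uint8_t *src, size_t n);
#endif

/* Pick the best variant that the toolchain can actually build. */
static add_bytes_fn select_add_bytes(void)
{
    add_bytes_fn fn = add_bytes_c;
#if HAVE_INLINE_ASM
    fn = add_bytes_inline;  /* depends on inline asm only */
#endif
#if HAVE_YASM
    fn = add_bytes_yasm;    /* depends on yasm only */
#endif
    return fn;
}
```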
In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
In the first block (in the unrolled loop), the instructions
"movq 8%3, %%mm1 \n\t", and so forth, have problems.
From the instruction above, it is clear what the programmer wants: a load from
p + 8. But this assembly code does not guarantee that. It only works if the
compiler puts p in a register to produce an instruction like this:
"movq 8(%edi), %mm1". During compiler optimization, it is possible that the
compiler will be able to constant propagate into p. Suppose p = &x[10000].
Then operand 3 can become 10000(%edi), where %edi holds &x, and the instruction
becomes "movq 810000(%edi), %mm1". That is, it will load from an offset of
810000 instead of 8.
This will cause a segmentation fault.
This error was fixed in the second block of the assembly code, but not in
the unrolled loop.
How to reproduce:
This error is exposed when building with the Intel C++ Compiler with IPO+PGO
optimization enabled; the resulting binary crashed when decoding an MJPEG video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
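The failure comes from plain text substitution: the compiler pastes operand 3's text directly after the literal "8" in the template. The C sketch below reproduces only that substitution with snprintf; it does not run any assembly, and its helper names are hypothetical. It also shows one common way to avoid the problem: pass p in a register (an "r" constraint) and write the displacement explicitly as "8(%3)". This message does not say whether the actual fix used exactly this form.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: expand the buggy template "movq 8%3, %%mm1" the way
 * the compiler does, by pasting operand 3's text right after the "8".
 * "%%" in the format string yields a single "%" in the output. */
static void expand_mem_operand(char *out, size_t size, const char *operand3)
{
    int n = snprintf(out, size, "movq 8%s, %%mm1", operand3);
    assert(n >= 0 && (size_t)n < size);  /* output must not be truncated */
}

/* Hypothetical helper: expand the template "movq 8(%3), %%mm1" where
 * operand 3 is a register ("r" constraint), so the displacement stays 8. */
static void expand_reg_operand(char *out, size_t size, const char *reg3)
{
    int n = snprintf(out, size, "movq 8(%s), %%mm1", reg3);
    assert(n >= 0 && (size_t)n < size);
}
```

With the memory operand, a constant-propagated address such as 10000(%edi) yields "movq 810000(%edi), %mm1". With the register operand, the compiler must first load p into a register, so the displacement stays 8.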
In file libavcodec/x86/dsputil_mmx.c, function ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t" etc. have problems.
From the instruction above, it is clear what the programmer wants: a load from p + 8. But this assembly code does not guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x, and the instruction becomes "movq 810000(%edi), %mm1". That is, it will load from an offset of 810000 instead of 8.
This will cause a segmentation fault.
This error was fixed in the second block of the assembly code, but not in the unrolled loop.
How to reproduce:
This error is exposed when building FFmpeg with the Intel C++ Compiler with IPO+PGO optimization; the resulting ffmpeg binary crashed when decoding an MJPEG video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
v410dec: Implement explode mode support
zerocodec: fix direct rendering.
wav: init st to NULL to avoid a false-positive warning.
wavpack: set bits_per_raw_sample for S32 samples to properly identify 24-bit
h264: refactor NAL decode loop
RTMPTE protocol support
RTMPE protocol support
rtmp: Add ff_rtmp_calc_digest_pos()
rtmp: Rename rtmp_calc_digest to ff_rtmp_calc_digest and make it global
swscale: add missing HAVE_INLINE_ASM check.
lavfi: place x86 inline assembly under HAVE_INLINE_ASM.
vc1: Add a test for interlaced field pictures
swscale: Mark all init functions as av_cold
swscale: x86: Drop pointless _mmx suffix from filenames
lavf: use conditional notation for default codec in muxer declarations.
swscale: place inline assembly bilinear scaler under HAVE_INLINE_ASM.
dsputil: ppc: cosmetics: pretty-print
dsputil: x86: add SHUFFLE_MASK_W macro
configure: respect CC_O setting in check_cc
Conflicts:
Changelog
configure
libavcodec/v410dec.c
libavcodec/zerocodec.c
libavformat/asfenc.c
libavformat/version.h
libswscale/utils.c
libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Print full compiler identification, not only version number
flacdec: reverse lpc coeff order, simplify filter
x86: dsputil: drop some unused CPU flag debug code
Conflicts:
cmdutils.c
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>