* commit '2e4bb99f4df7052b3e147ee898fcb4013a34d904':
vorbisdsp: convert x86 simd functions from inline asm to yasm.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
* qatar/master:
proresdec: support mixed interlaced/non-interlaced content
vp3/5: move put_no_rnd_pixels_l2 from dsputil to VP3DSPContext.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '8a4f26206d7914eaf2903954ce97cb7686933382':
dsputil: remove butterflies_float_interleave.
srtp: Move a variable to a local scope
srtp: Add tests for the crypto suite with 32/80 bit HMAC
Conflicts:
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: dsputil: Drop some unused macro definitions
x86: Add a Yasm-based emms() replacement
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavc: Move vector_fmul_window to AVFloatDSPContext
rtpdec_mpeg4: Check the remaining amount of data before reading
Conflicts:
libavcodec/dsputil.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* commit '5ae72f54532960cb9eae82a1c9e8d505106c022b':
flashsv: check for keyframe before using differential coding
h264: enable low delay only if no delayed frames were seen
x86: fix build without inline asm
Conflicts:
libavcodec/h264.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The qpel functions referenced here are not related to h264 and should
thus never have been under CONFIG_H264QPEL.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The only CPUs that have 3dnow and don't have mmxext are 12 years old.
Moreover, AMD has dropped 3dnow extensions from newer CPUs.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master:
x86: dsputil: port to cpuflags
crc: av_crc() parameter names should match between .c, .h and doxygen
avserver: replace av_read_packet with av_read_frame
avserver: fix constness casting warnings
Conflicts:
libavcodec/x86/dsputil.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
fate: add ac3/eac3 tests to FATE_SAMPLES_AVCONV
avconv_opt, cmdutils: Add missing function parameter Doxygen
x86: Move optimization suffix to end of function names
Conflicts:
cmdutils.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'fa8fcab1e0d31074c0644c4ac5194474c6c26415':
x86: h264_chromamc_10bit: drop pointless PAVG %define
x86: mmx2 ---> mmxext in function names
swscale: do not forget to swap data in formats with different endianness
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/gradfun.c
libswscale/input.c
libswscale/utils.c
libswscale/x86/swscale.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'be923ed659016350592acb9b3346f706f8170ac5':
x86: fmtconvert: port to cpuflags
x86: MMX2 ---> MMXEXT in macro names
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
yuv4mpeg: return proper error codes.
Give all anonymously typedeffed structs in headers a name
fate: Add parseutils test
parseutils-test: Drop random colors from parsing test
vf_pad/scale: use double precision for aspect ratios.
build: error on variable-length arrays
ppc: swscale: rework yuv2planeX_altivec()
ppc: fmtconvert: kill VLA in float_to_int16_interleave_altivec()
x86: dsputil: kill VLA in gmc_mmx()
libspeexenc: Updated commentary to reflect recent changes
libspeexenc: Add an option for enabling DTX
doc/APIchanges: fill in missing dates and hashes.
lavr: bump major to 1 and declare it stable.
lavr: change the type of the data buffers to uint8_t**.
lavc: deprecate the audio resampling API.
Conflicts:
cmdutils.h
configure
doc/APIchanges
ffplay.c
libavcodec/dwt.h
libavcodec/libspeexenc.c
libavfilter/vf_pad.c
libavfilter/vf_scale.c
libavformat/asf.h
tests/fate/libavutil.mak
tests/ref/fate/parseutils
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Instead of using an evil VLA, fall back to C version when edge
emulation is needed. MPEG4 GMC is a rarely used fringe feature
so the speed loss is an acceptable cost for safer code.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
x86: dsputil: Move Xvid IDCT put/add functions to a more suitable place
trasher: Include all the necessary headers
x86: Remove some leftover declarations for non-existent functions
ARM: libavresample: NEON optimised generic fltp to s16 conversion
ARM: libavresample: NEON optimised stereo fltp to s16 conversion
ARM: libavresample: NEON optimised flat float to s16 conversion
Conflicts:
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
os_support: Choose between direct.h and io.h using a configure check
os_support: Include io.h instead of direct.h on mingw32ce
x86: ac3dsp: Only refer to the ac3_downmix_sse symbol if it has been declared
swscale: Remove two bogus asserts
ac3: move ac3_downmix() from dsputil to ac3dsp
lavr/audio_mix_matrix: acknowledge the existence of LFE2.
mlp_parser: avoid mapping multiple disctinct TrueHD channels to the same Libav channel.
lavu/audioconvert: add a second low frequency channel.
Conflicts:
doc/APIchanges
libavcodec/ac3dsp.c
libavcodec/ac3dsp.h
libavcodec/mlp_parser.c
libavutil/audioconvert.c
libavutil/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mov_chan: Only set the channel_layout if setting it to a nonzero value
mov_chan: Reindent an incorrectly indented line
mp2 muxer: mark as AVFMT_NOTIMESTAMPS.
x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
x86: more specific checks for availability of required assembly capabilities
x86: avcodec: Drop silly "_mmx" suffix from dsputil template names
fate: Drop redundant setting of FUZZ to 1
cavsdsp: set idct permutation independently of dsputil
x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavformat/mp3enc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
For some reason add_hfyu_median_prediction_cmov is only selected
on 3Dnow-capable CPUs, even though it uses no 3Dnow instructions.
This patch allows it to be selected on any cpu with cmov with the
possibility of being overridden by the mmxext version.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
fate: Allow setting the ld parameter from the config file
x86: dsputil: Do not redundantly check for CPU caps before calling init funcs
configure: Disable some warnings in MSVC
x86: vp56: cmov version of vp56_rac_get_prob requires inline asm
avopt: fix examples to match the same style about default values as the actual code.
configure: Add support for MSVC cl.exe/link.exe
lavu: add snprintf(), vsnprint() and strtod() replacements for MS runtime.
Conflicts:
libavutil/opt.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
MSS1 and MSS2: set final pixel format after common stuff has been initialised
MSS2 decoder
configure: handle --disable-asm before check_deps
x86: Split inline and external assembly #ifdefs
configure: x86: Separate inline from standalone assembler capabilities
pktdumper: Use a custom define instead of PATH_MAX for buffers
pktdumper: Use av_strlcpy instead of strncpy
pktdumper: Use sizeof(variable) instead of the direct buffer length
Conflicts:
Changelog
configure
libavcodec/allcodecs.c
libavcodec/avcodec.h
libavcodec/codec_desc.c
libavcodec/dct-test.c
libavcodec/imgconvert.c
libavcodec/mss12.c
libavcodec/version.h
libavfilter/x86/gradfun.c
libswscale/x86/yuv2rgb.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ec36aa69448f20a78d8c4588265022e0b2272ab5':
x86: Fix linking with some or all of yasm, mmx, optimizations disabled
configure: Add more fine-grained SSE CPU capabilities flags
avfilter: x86: Use more precise compile template names
x86: cosmetics: Comment some #endifs for better readability
g723_1: add comfort noise generation
utvideoenc: Switch to dsputils' median prediction
utvideoenc: Avoid writing into the input picture
avtools: remove the distinction between func_arg and func2_arg.
avconv: make the -passlogfile option per-stream.
avconv: make the -pass option per-stream.
cmdutils: make -codecs print lossy/lossless flags.
lavc: add lossy/lossless codec properties.
Conflicts:
Changelog
cmdutils.c
configure
doc/APIchanges
ffmpeg.h
ffmpeg_opt.c
ffprobe.c
libavcodec/codec_desc.c
libavcodec/g723_1.c
libavcodec/utvideoenc.c
libavcodec/version.h
libavcodec/x86/mpegaudiodec.c
libavcodec/x86/rv40dsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd488c3bcbaf7ddda42597e014deb661a7e9e2112':
configure: support Bitrig OS
yuv2rgb: handle line widths that are not a multiple of 4.
graph2dot: Use the fallback getopt implementation if needed
tools: Include io.h for open/read/write/close if unistd.h doesn't exist
testprogs: Remove unused includes
qt-faststart: Use other seek/tell functions on MSVC than on mingw
ismindex: Include direct.h for _mkdir on windows
sdp: Use static const char arrays instead of pointers to strings
x86: avcodec: Drop silly "_mmx" suffixes from filenames
x86: avcodec: Drop silly "_sse" suffixes from filenames
sdp: Include profile-level-id for H264
utvideoenc: use ff_huff_gen_len_table
huffman: add ff_huff_gen_len_table
cllc: simplify/fix swapped data buffer allocation.
rtpdec_h264: Don't set the pixel format
h264: Check that the codec isn't null before accessing it
audio_frame_queue: Define af_queue_log_state before using it
Conflicts:
libavcodec/audio_frame_queue.c
libavcodec/h264.c
libavcodec/huffman.h
libavcodec/huffyuv.c
libavcodec/utvideoenc.c
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mpegvideo: reduce excessive inlining of mpeg_motion()
mpegvideo: convert mpegvideo_common.h to a .c file
build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
Move MASK_ABS macro to libavcodec/mathops.h
x86: move MANGLE() and related macros to libavutil/x86/asm.h
x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
aacdec: Don't fall back to the old output configuration when no old configuration is present.
rtmp: Add message tracking
rtsp: Support mpegts in raw udp packets
rtsp: Support receiving plain data over UDP without any RTP encapsulation
rtpdec: Remove an unused include
rtpenc: Remove an av_abort() that depends on user-supplied data
vsrc_movie: discourage its use with avconv.
avconv: allow no input files.
avconv: prevent invalid reads in transcode_init()
avconv: rename OutputStream.is_past_recording_time to finished.
Conflicts:
configure
doc/filters.texi
ffmpeg.c
ffmpeg.h
libavcodec/Makefile
libavcodec/aacdec.c
libavcodec/mpegvideo.c
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavr: fix handling of custom mix matrices
fate: force pix_fmt in lagarith-rgb32 test
fate: add tests for lagarith lossless video codec.
ARMv6: vp8: fix stack allocation with Apple's assembler
ARM: vp56: allow inline asm to build with clang
fft: 3dnow: fix register name typo in DECL_IMDCT macro
x86: dct32: port to cpuflags
x86: build: replace mmx2 by mmxext
Revert "wmapro: prevent division by zero when sample rate is unspecified"
wmapro: prevent division by zero when sample rate is unspecified
lagarith: fix color plane inversion for YUY2 output.
lagarith: pad RGB buffer by 1 byte.
dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.
Conflicts:
doc/APIchanges
libavcodec/lagarith.c
libavfilter/x86/gradfun.c
libavutil/cpu.h
libavutil/version.h
libswscale/utils.c
libswscale/version.h
libswscale/x86/yuv2rgb.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
vc1dec: Remove separate scaling function for interlaced field MVs
vc1dec: Invoke edge_emulation regardless of MV precision
x86: Use consistent 3dnowext function and macro name suffixes
g723_1: scale output as supposed for the case with postfilter disabled
g723_1: increase excitation storage by 4
g723_1: fix upper bound parameter from inverse maximum autocorrelation
g723_1: make scale_vector() behave like the reference
g723_1: fix off-by-one error in normalize_bits()
g723_1: save/restore excitation with offset to store LPC history
wmapro: prevent division by zero when sample rate is unspecified
x86: proresdsp: improve SIGNEXTEND macro comments
x86: h264dsp: K&R formatting cosmetics
LICENSE: Document all GPL files
Conflicts:
libavcodec/g723_1.c
libavcodec/wmaprodec.c
libavcodec/x86/h264dsp_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate. There is thus no reason to ever use them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master: (35 commits)
h264_idct_10bit: port x86 assembly to cpuflags.
x86inc: clip num_args to 7 on x86-32.
x86inc: sync to latest version from x264.
fft: rename "z" to "zc" to prevent name collision.
wv: return meaningful error codes.
wv: return AVERROR_EOF on EOF, not EIO.
mp3dec: forward errors for av_get_packet().
mp3dec: remove a pointless local variable.
mp3dec: remove commented out cruft.
lavfi: bump minor to mark stabilizing the ABI.
FATE: add tests for yadif.
FATE: add a test for delogo video filter.
FATE: add a test for amix audio filter.
audiogen: allow specifying random seed as a commandline parameter.
vc1dec: Override invalid macroblock quantizer
vc1: avoid reading beyond the last line in vc1_draw_sprites()
vc1dec: check that coded slice positions and interlacing match.
vc1dec: Do not ignore ff_vc1_parse_frame_header_adv return value
configure: Move parts that should not be user-selectable to CONFIG_EXTRA
lavf: remove commented out cruft in avformat_find_stream_info()
...
Conflicts:
Makefile
configure
libavcodec/vc1dec.c
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264dsp_mmx.c
libavfilter/version.h
libavformat/mp3dec.c
libavformat/utils.c
libavformat/wv.c
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
libopenjpeg: support YUV and deep RGB pixel formats
Fix typo in v410 decoder.
vf_yadif: unset cur_buf on the input link.
vf_overlay: ensure the overlay frame does not get leaked.
vf_overlay: prevent premature freeing of cur_buf
Support urlencoded http authentication credentials
rtmp: Return an error when the client bandwidth is incorrect
rtmp: Return proper error code in handle_server_bw
rtmp: Return proper error code in handle_client_bw
rtmp: Return proper error codes in handle_chunk_size
lavr: x86: add missing vzeroupper in ff_mix_1_to_2_fltp_flt()
vp8: Replace x*155/100 by x*101581>>16.
vp3: don't use calls to inline asm in yasm code.
x86/dsputil: put inline asm under HAVE_INLINE_ASM.
dsputil_mmx: fix incorrect assembly code
rtmp: Factorize the code by adding handle_invoke
rtmp: Factorize the code by adding handle_chunk_size
rtmp: Factorize the code by adding handle_ping
rtmp: Factorize the code by adding handle_client_bw
rtmp: Factorize the code by adding handle_server_bw
Conflicts:
libavcodec/libopenjpegdec.c
libavcodec/x86/dsputil_mmx.c
libavfilter/vf_overlay.c
libavformat/Makefile
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
In the first block (in the unrolled loop), the instructions
"movq 8%3, %%mm1 \n\t", and so forth, have problems.
From above instruction, it is clear what the programmer wants: a load from
p + 8. But this assembly code doesn’t guarantee that. It only works if the
compiler puts p in a register to produce an instruction like this:
"movq 8(%edi), %mm1". During compiler optimization, it is possible that the
compiler will be able to constant propagate into p. Suppose p = &x[10000].
Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction
becomes "movq 810000(%edx)". That is, it will stride by 810000 instead of 8.
This will cause a segmentation fault.
This error was fixed in the second block of the assembly code, but not in
the unrolled loop.
How to reproduce:
This error is exposed when we build using Intel C++ Compiler, with
IPO+PGO optimization enabled. Crashed when decoding an MJPEG video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
In file libavcodec/x86/dsputil_mmx.c, function ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t" etc have problem.
For above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: “movq 8(%edi), %mm1”. During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes “movq 810000(%edx)”. That is, it will stride by 810000 instead of 8.
This will cause the segmentation fault.
This error was fixed in the second block of the assembly code, but not in the unrolled loop.
How to reproduce:
This error is exposed when we build the ffmpeg using Intel C++ Compiler, IPO+PGO optimization. The ffmpeg was crashed when decoding a mjpeg video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Print full compiler identification, not only version number
flacdec: reverse lpc coeff order, simplify filter
x86: dsputil: drop some unused CPU flag debug code
Conflicts:
cmdutils.c
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
ppc: fix build with altivec disabled
vp3: move idct and loop filter pointers to new vp3dsp context
build: add CONFIG_VP3DSP, reduce repetition in OBJS lists
tscc2: do not add/subtract 128 bias during DCT
tscc2: fix typo in DCT
configure: clarify external library section of help output
configure: mark libfdk-aac as nonfree
configure: cosmetics: drop some unnecessary backslashes
os_support: K&R formatting cosmetics
Conflicts:
configure
libavcodec/vp3.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context. There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
x86: Only use optimizations with cmov if the CPU supports the instruction
x86: Add CPU flag for the i686 cmov instruction
x86: remove unused inline asm macros from dsputil_mmx.h
x86: move some inline asm macros to the only places they are used
lavfi: Add the af_channelmap audio channel mapping filter.
lavfi: add join audio filter.
lavfi: allow audio filters to request a given number of samples.
lavfi: support automatically inserting the fifo filter when needed.
lavfi/audio: eliminate ff_default_filter_samples().
Conflicts:
Changelog
libavcodec/x86/h264dsp_mmx.c
libavfilter/Makefile
libavfilter/allfilters.c
libavfilter/avfilter.h
libavfilter/avfiltergraph.c
libavfilter/version.h
libavutil/x86/cpu.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
float_dsp: ppc: add a separate header for Altivec function prototypes
ARM: fix float_dsp breakage from d5a7229
Add a float DSP framework to libavutil
PPC: Move types_altivec.h and util_altivec.h from libavcodec to libavutil
ARM: Move asm.S from libavcodec to libavutil
vc1dsp: mark put/avg_vc1_mspel_mc() always_inline
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
dwt: check malloc calls
ppc: Drop unused header regs.h
af_resample: remove an extra space in the log output
Convert vector_fmul range of functions to YASM and add AVX versions
lavfi: add an audio split filter
lavfi: rename vf_split.c to split.c
Conflicts:
doc/filters.texi
libavcodec/ppc/regs.h
libavfilter/Makefile
libavfilter/allfilters.c
libavfilter/f_split.c
libavfilter/split.c
libavfilter/version.h
libavfilter/vf_split.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (25 commits)
rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC
ape: Use unsigned integer maths
arm: dsputil: fix overreads in put/avg_pixels functions
h264: K&R formatting cosmetics for header files (part II/II)
h264: K&R formatting cosmetics for header files (part I/II)
rtmp: Implement check bandwidth notification.
rtmp: Support 'rtmp_swfurl', an option which specifies the URL of the SWF player.
rtmp: Support 'rtmp_flashver', an option which overrides the version of the Flash plugin.
rtmp: Support 'rtmp_tcurl', an option which overrides the URL of the target stream.
cmdutils: Add fallback case to switch in check_stream_specifier().
sctp: be consistent with socket option level
configure: Add _XOPEN_SOURCE=600 to Solaris preprocessor flags.
vcr1enc: drop pointless empty encode_init() wrapper function
vcr1: drop pointless write-only AVCodecContext member from VCR1Context
vcr1: group encoder code together to save #ifdefs
vcr1: cosmetics: K&R prettyprinting, typos, parentheses, dead code, comments
mov: make one comment slightly more specific
lavr: replace the SSE version of ff_conv_fltp_to_flt_6ch() with SSE4 and AVX
lavfi: move audio-related functions to a separate file.
lavfi: remove some audio-related function from public API.
...
Conflicts:
cmdutils.c
libavcodec/h264.h
libavcodec/h264_mvpred.h
libavcodec/vcr1.c
libavfilter/avfilter.c
libavfilter/avfilter.h
libavfilter/defaults.c
libavfilter/internal.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions
Benchmark (rounded to tens of unit):
V8x8 H8x8 2D8x8 V16x16 H16x16 2D16x16
C 445 358 985 1785 1559 3280
MMX* 219 271 478 714 929 1443
SSE2 131 158 294 425 515 892
SSSE3 120 122 248 387 390 763
End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master:
4xm: fix invalid array indexing
rv34dsp: factorize a multiplication in the noround inverse transform
rv40: perform bitwise checks in loop filter
rv34: remove inline keyword from rv34_decode_block().
rv40: change a logical test into a bitwise one.
rv34: remove constant parameter
rv40: don't always do the full prev_type search
dsputil x86: revert a test back to its previous value
rv34dsp x86: implement MMX2 inverse transform
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The new lowres support is limited to decoders where lowres decoding
is possible in high quality.
I was not able to measure any speed difference, but if one is found
the 2-3 lines that might affect speed can be made compile time conditional
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
ARM: allow runtime masking of CPU features
dsputil: remove unused functions
mov: Treat keyframe indexes as 1-origin if starting at non-zero.
mov: Take stps entries into consideration also about key_off.
Remove lowres video decoding
Conflicts:
ffmpeg.c
ffplay.c
libavcodec/arm/vp8dsp_init_arm.c
libavcodec/libopenjpegdec.c
libavcodec/mjpegdec.c
libavcodec/mpegvideo.c
libavcodec/utils.c
libavformat/mov.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
avcodec: remove AVCodecContext.dsp_mask
avconv: fix a segfault when default encoder for a format doesn't exist.
utvideo: general cosmetics
aac: Handle HE-AACv2 when sniffing a channel order.
movenc: Support high sample rates in isomedia formats by setting the sample rate field in stsd to 0.
xxan: Remove write-only variable in xan_decode_frame_type0().
ivi_common: Initialize a variable at declaration in ff_ivi_decode_blocks().
Conflicts:
ffmpeg.c
libavcodec/utvideo.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump. It has been superseded
by av_set_cpu_flag_mask() which, unlike this field, works everywhere.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
h264: Factorize declaration of mb_sizes array.
vsrc_buffer: when no frame is available, return an error instead of segfaulting.
configure: add dl to frei0r extralibs.
dsputil x86: use SSE float instruction instead of SSE2 integer equivalent
dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype
vp8dsp x86: perform rounding shift with a single instruction
fate: add BMP tests.
swscale: handle complete dimensions for monoblack/white.
aacenc: Mark deinterleave_input_samples argument as const.
vf_unsharp: Mark readonly variable as const.
h264: fix 4:2:2 PCM-macroblocks decoding
Conflicts:
configure
libavcodec/h264.h
libavcodec/x86/dsputil_mmx.c
libavfilter/vf_unsharp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (31 commits)
cdxl demux: do not create packets with uninitialized data at EOF.
Replace computations of remaining bits with calls to get_bits_left().
amrnb/amrwb: Remove get_bits usage.
cosmetics: reindent
avformat: do not require a pixel/sample format if there is no decoder
avformat: do not fill-in audio packet duration in compute_pkt_fields()
lavf: Use av_get_audio_frame_duration() in get_audio_frame_size()
dca_parser: parse the sample rate and frame durations
libspeexdec: do not set AVCodecContext.frame_size
libopencore-amr: do not set AVCodecContext.frame_size
alsdec: do not set AVCodecContext.frame_size
siff: do not set AVCodecContext.frame_size
amr demuxer: do not set AVCodecContext.frame_size.
aiffdec: do not set AVCodecContext.frame_size
mov: do not set AVCodecContext.frame_size
ape: do not set AVCodecContext.frame_size.
rdt: remove workaround for infinite loop with aac
avformat: do not require frame_size in avformat_find_stream_info() for CELT
avformat: do not require frame_size in avformat_find_stream_info() for MP1/2/3
avformat: do not require frame_size in avformat_find_stream_info() for AAC
...
Conflicts:
doc/APIchanges
libavcodec/Makefile
libavcodec/avcodec.h
libavcodec/h264.c
libavcodec/h264_ps.c
libavcodec/utils.c
libavcodec/version.h
libavcodec/x86/dsputil_mmx.c
libavformat/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This splits ff_dsputil_init_mmx() into multiple functions, one for
each MMX/SSE level, somewhat simplifying the nested conditions.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master: (26 commits)
avconv: deprecate the -deinterlace option
doc: Fix the name of the new function
aacenc: make sure to encode enough frames to cover all input samples.
aacenc: only use the number of input samples provided by the user.
wmadec: Verify bitstream size makes sense before calling init_get_bits.
kmvc: Log into a context at a log level constant.
mpeg12: Pad framerate tab to 16 entries.
kgv1dec: Increase offsets array size so it is large enough.
kmvc: Check palsize.
nsvdec: Propagate errors
nsvdec: Be more careful with av_malloc().
nsvdec: Fix use of uninitialized streams.
movenc: cosmetics: Get rid of camelCase identifiers
swscale: more generic check for planar destination formats with alpha
doc: Document mov/mp4 fragmentation options
build: Use order-only prerequisites for creating FATE reference file dirs.
x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf
rtsp: Remove some unused variables from ff_rtsp_connect().
avutil: make intfloat api public
avformat_write_header(): detail error message
...
Conflicts:
doc/APIchanges
doc/ffmpeg.texi
doc/muxers.texi
ffmpeg.c
libavcodec/kmvc.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_yasm.asm
libavcodec/x86/pngdsp-init.c
libavformat/movenc.c
libavformat/movenc.h
libavformat/mpegtsenc.c
libavformat/nsvdec.c
libavformat/utils.c
libavutil/avutil.h
libswscale/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific codepaths
are provided for this case.
For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2: ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads: ~ 30k cycles
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master:
png: add missing #if HAVE_SSSE3 around function pointer assignment.
imdct36: mark SSE functions as using all 16 XMM registers.
png: move DSP functions to their own DSP context.
sunrast: Add a sample request for TIFF, IFF, and Experimental Rastfile formats.
sunrast: Cosmetics
sunrast: Remove if (unsigned int < 0) check.
sunrast: Replace magic number by a macro.
Conflicts:
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/pngdec.c
libavcodec/sunrast.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (22 commits)
wma: Clip WMA1 and WMA2 frame length to 11 bits.
movenc: Don't require frame_size to be set for modes other than mov
doc: Update APIchanges with info on muxer flushing
movenc: Reindent a block
tools: Remove some unnecessary #undefs.
rv20: prevent calling ff_h263_decode_mba() with unset height/width
tools: K&R reformatting cosmetics
Ignore generated aviocat and ismindex tools.
build: Automatically include architecture-specific library Makefile snippets.
indeo5: prevent null pointer dereference on broken files
pktdumper: Use usleep instead of sleep
cosmetics: Remove some unnecessary block braces.
Drop unnecessary prefix from *sink* variable and struct names.
Add a tool for creating smooth streaming manifests
movdec: Calculate an average bit rate for fragmented streams, too
movenc: Write the sample rate instead of time scale in the stsd atom
movenc: Add a separate ismv/isma (smooth streaming) muxer
movenc: Allow the caller to decide on fragmentation
libavformat: Add a flag for muxers that support write_packet(NULL) for flushing
movenc: Add support for writing fragmented mov files
...
Conflicts:
Changelog
cmdutils.c
cmdutils.h
doc/APIchanges
ffmpeg.c
ffplay.c
libavfilter/Makefile
libavformat/Makefile
libavformat/avformat.h
libavformat/movenc.c
libavformat/movenc.h
libavformat/version.h
tools/graph2dot.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
applehttp: Properly clean up if unable to probe a segment
applehttp: Avoid reading uninitialized memory
fate: Replace misleading "aac" in the name of an ADTS test with "adts".
fate: Drop pointless "-an" from pictor test command.
fate: split off image codec FATE tests into their own file
fate: split off WMA codec FATE tests into their own file
fate: split off lossless video and audio FATE tests into their own files
fate: split off qtrle codec FATE tests into their own file
fate: split off Ut Video codec FATE tests into their own file
fate: split off screen codec FATE tests into their own file
fate: split off Real Inc. codec FATE tests into their own file
fate: split off AC-3 codec FATE tests into their own file
mpegvideo: remove abort() in ff_find_unused_picture()
rv40: NEON optimised loop filter strength selection
rv40: rearrange loop filter functions
configure: cosmetics: sort some lists where appropriate
swscale_mmx: drop no longer required parameters from VSCALEX macros
swscale: Mark yuv2planeX_8_mmx as MMX2; it contains MMX2 instructions.
build: conditionally compile x86 H.264 chroma optimizations
v410 encoder and decoder
...
Conflicts:
Changelog
configure
doc/developer.texi
doc/general.texi
libavcodec/arm/asm.S
libavcodec/avcodec.h
libavcodec/v410dec.c
libavcodec/v410enc.c
libavcodec/version.h
libavcodec/x86/Makefile
libavcodec/x86/dsputil_mmx.c
libswscale/x86/swscale_mmx.c
tests/Makefile
tests/fate2.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (22 commits)
aacdec: Fix PS in ADTS.
avconv: Consistently use PIX_FMT_NONE.
dsputil: use cpuflags in x86 emu_edge_core
dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()
wma: initialize prev_block_len_bits, next_block_len_bits, and block_len_bits.
mov: Remove some redundant and obsolete comments.
Add libavutil/mathematics.h #includes for INFINITY
doxy: structure libavformat groups
doxy: introduce an empty structure in libavcodec
doxy: provide a start page and document libavutil
doxy: cleanup pixfmt.h
regtest: split video encode/decode tests into individual targets
ARM: add explicit .arch and .fpu directives to asm.S
pthread: do not touch has_b_frames
avconv: cleanup the transcoding loop in output_packet().
avconv: split subtitle transcoding out of output_packet().
avconv: split video transcoding out of output_packet().
avconv: split audio transcoding out of output_packet().
avconv: reindent.
avconv: move streamcopy-only code out of decoding loop.
...
Conflicts:
avconv.c
libavcodec/aaccoder.c
libavcodec/pthread.c
libavcodec/version.h
libavutil/audioconvert.h
libavutil/avutil.h
libavutil/mem.h
tests/ref/vsynth1/dv
tests/ref/vsynth1/mpeg2thread
tests/ref/vsynth2/dv
tests/ref/vsynth2/mpeg2thread
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
vble: remove vble_error_close
VBLE Decoder
tta: use an integer instead of a pointer to iterate output samples
shorten: do not modify samples pointer when interleaving
mpc7: only support stereo input.
dpcm: do not try to decode empty packets
dpcm: remove unneeded buf_size==0 check.
twinvq: add SSE/AVX optimized sum/difference stereo interleaving
vqf/twinvq: pass vqf COMM chunk info in extradata
vqf: do not set bits_per_coded_sample for TwinVQ.
twinvq: check for allocation failure in init_mdct_win()
swscale: add padding to conversion buffer.
rtpdec: Simplify finalize_packet
http: Handle proxy authentication
http: Print an error message for Authorization Required, too
AVOptions: don't return an invalid option when option list is empty
AIFF: add 'twos' FourCC for the mux/demuxer (big endian PCM audio)
Conflicts:
libavcodec/avcodec.h
libavcodec/tta.c
libavcodec/vble.c
libavcodec/version.h
libavutil/opt.c
libswscale/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
avformat: Avoid a warning about mixed declarations and code
BMV demuxer and decoder
matroskaenc: Make sure the seekhead struct is freed even on seek failure
mpeg12enc: Remove write-only variables.
mpeg12enc: Don't set up run-level info for level 0.
msmpeg4: Don't set up run-level info for level 0.
avformat: Warn about using network functions without calling avformat_network_init
avformat: Revise wording
rdt: Set AVFMT_NOFILE on ff_rdt_demuxer
rdt: Check the return value of avformat_open
rtsp: Discard the dynamic handler, if it has an alloc function which failed
dsputil: use cpuflags in x86 versions of vector_clip_int32()
Conflicts:
libavcodec/avcodec.h
libavcodec/version.h
libavformat/Makefile
libavformat/allformats.c
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Look for MMX_DISABLED to find the disabled functions.
Authors of this code are Marco Gerards <marco@gnu.org> and David Conrad <lessen42@gmail.com>
With changes from Jordi Ortiz <nenjordi@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (35 commits)
libopencore-amr: check output buffer size before decoding
libopencore-amr: remove unneeded buf_size==0 check.
libopencore-amr: remove unneeded frame_count field.
aac_latm: remove unneeded check for zero-size packet.
pcmdec: fix output buffer size check by calculating the actual output size prior to decoding.
pcmdec: move codec-specific variable declarations to the corresponding codec blocks.
pcmdec: return buf_size instead of src-buf.
avcodec: remove the Zork PCM encoder.
pcm_zork: use AV_SAMPLE_FMT_U8 instead of shifting all samples by 8.
pcmenc: remove unneeded sample_fmt check.
pcmdec: move number of channels check to pcm_decode_init()
pcmdec: remove unnecessary check for sample_fmt change
pcmdec: move DVD PCM bits_per_coded_sample check near to the code that sets the sample size.
pcmdec: do not needlessly set *data_size to 0
alacdec: remove unneeded NULL or zero-size packet checks.
alacdec: simplify buffer allocation by using FF_ALLOC_OR_GOTO()
alacdec: ask for a sample for unsupported sample depths.
alacdec: cosmetics: use 'ch' instead of 'chan' to iterate channels
alacdec: move some declarations to the top of the function
alacdec: always use get_sbits_long() for uncompressed samples
...
Conflicts:
libavcodec/pcm.c
tests/ref/acodec/pcm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
fix AC3ENC_OPT_MODE_ON/OFF
h264: fix HRD parameters parsing
prores: implement multithreading.
prores: idct sse2/sse4 optimizations.
swscale: use aligned move for storage into temporary buffer.
prores: extract idct into its own dspcontext and merge with put_pixels.
h264: fix invalid shifts in init_cavlc_level_tab()
intfloat_readwrite: fix signed addition overflows
mov: do not misreport empty stts
mov: cosmetics, fix for and if spacing
id3v2: fix NULL pointer dereference
mov: read album_artist atom
mov: fix disc/track numbers and totals
doc: fix references to obsolete presets directories for avconv/ffmpeg
flashsv: return more meaningful error value
flashsv: fix typo in av_log() message
smacker: validate channels and sample format.
smacker: check buffer size before reading output size
smacker: validate number of channels
smacker: Separate audio flags from sample rates in smacker demuxer.
...
Conflicts:
cmdutils.h
doc/ffmpeg.texi
libavcodec/Makefile
libavcodec/motion_est_template.c
libavformat/id3v2.c
libavformat/mov.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Fix NASM include directive
dsputil_mmx: Honor HAVE_AMD3DNOW
lavf,lavd: remove all usage of AVFormatParameters from demuxers.
jack: add 'channels' private option.
VC-1: fix reading of custom PAR.
Remove redundant and dubious video codec detection by its extradata
mpeg12: remove repeat-field code disabled since May 2002
patch checklist: suggest fate instead of regression tests
Turn on resampling on sudden size change instead of bailing out during recode.
avtools: reinitialise filter chain when input video stream changes dimensions
Conflicts:
Makefile
avconv.c
doc/developer.texi
ffplay.c
libavcodec/x86/dsputil_mmx.c
libavdevice/libdc1394.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b2c087871dafc7d030b2d48457ddff597dfd4925':
Move x86util.asm from libavcodec/ to libavutil/.
Move x86inc.asm to libavutil/.
APIchanges: note error_recognition in lavf
lavf: add support for error_recognition, use it in avidec, and bump minor API version
avconv: change semantics of -map
avconv: get rid of new* options.
cmdutils: allow precisely specifying a stream for AVOptions.
configure: add missing CFLAGS to fix building on the HURD
libx264: Include hint for possible values for configuring libx264
cmdutils: allow ':'-separated modifiers in option names.
avconv: make -map_metadata work consistently with the other options
avconv: remove deprecated options.
avconv: make -map_chapters accept only the input file index.
Make a copy of ffmpeg under a new name -- avconv.
ffmpeg: add a warning stating that the program is deprecated.
Add weighted motion compensation for RV40 B-frames
RV3/4: calculate B-frame motion weights once per frame
Move RV3/4-specific DSP functions into their own context
mjpeg: propagate decode errors from ff_mjpeg_decode_sos and ff_mjpeg_decode_dqt
h264: notice memory allocation failure
Conflicts:
.gitignore
Makefile
cmdutils.c
configure
doc/ffplay.texi
doc/ffprobe.texi
doc/ffserver.texi
libavcodec/libx264.c
libavformat/avformat.h
libavformat/avidec.c
libavformat/version.h
tests/lavf-regression.sh
tests/lavfi-regression.sh
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mxfdec: Include FF_INPUT_BUFFER_PADDING_SIZE when allocating extradata.
H.264: tweak some other x86 asm for Atom
probe: Fix insane flow control.
mpegts: remove invalid error check
s302m: use nondeprecated audio sample format API
lavc: use designated initialisers for all codecs.
x86: cabac: add operand size suffixes missing from 6c32576
Conflicts:
libavcodec/ac3enc_float.c
libavcodec/flacenc.c
libavcodec/frwu.c
libavcodec/pictordec.c
libavcodec/qtrleenc.c
libavcodec/v210enc.c
libavcodec/wmv2dec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
dnxhddec: optimise dnxhd_decode_dct_block()
rtp: remove disabled code
eac3enc: use different numbers of blocks per frame to allow higher bitrates
dnxhd: add regression test for 10-bit
dnxhd: 10-bit support
dsputil: update per-arch init funcs for non-h264 high bit depth
dsputil: template get_pixels() for different bit depths
dsputil: create 16/32-bit dctcoef versions of some functions
jfdctint: add 10-bit version
mov: add clcp type track as Subtitle stream.
mpeg4: add Mpeg4 Profiles names.
mpeg4: decode Level Profile for MPEG4 Part 2.
ffprobe: display bitstream level.
imgconvert: remove unused glue and xglue macros
Conflicts:
libavcodec/dsputil_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>