* qatar/master:
x86: vc1: call ff_vc1dsp_init_x86() under if (ARCH_X86)
x86: cavs: call ff_cavsdsp_init_x86() under if (ARCH_X86)
x86: call most of the x86 dsp init functions under if (ARCH_X86)
doc: support the new website layout
doc: remove a warning from filters.texi
doc: initial nut documentation
segment: drop global headers setting
lavu: fix typo in Makefile
Conflicts:
doc/Makefile
doc/filters.texi
doc/t2h.init
libavcodec/fmtconvert.c
libavcodec/proresdsp.c
libavcodec/x86/Makefile
libavcodec/x86/vc1dsp_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: dsputil: Only compile motion_est code when encoders are enabled
mem: fix typo in check for __ICC
fate: mp3: drop redundant CMP setting
rtp: Depacketization of JPEG (RFC 2435)
Rename ff_put_string to avpriv_put_string
mjpeg: Rename some symbols to avpriv_* instead of ff_*
yadif: cosmetics
Conflicts:
Changelog
libavcodec/mjpegenc.c
libavcodec/x86/Makefile
libavfilter/vf_yadif.c
libavformat/version.h
libavutil/mem.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
build: export filtered -lz flag in config.mak
build: add separate setting for host linker
configure: probe_cc: use separate variable for linker output flag
x86: Always compile files with functions that are called unconditionally
x86: mpegvideoenc: fix linking with --disable-mmx
x86: mpegvideoenc: Do not abuse HAVE_ variables for template instantiation
Conflicts:
Makefile
configure
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd488c3bcbaf7ddda42597e014deb661a7e9e2112':
configure: support Bitrig OS
yuv2rgb: handle line widths that are not a multiple of 4.
graph2dot: Use the fallback getopt implementation if needed
tools: Include io.h for open/read/write/close if unistd.h doesn't exist
testprogs: Remove unused includes
qt-faststart: Use other seek/tell functions on MSVC than on mingw
ismindex: Include direct.h for _mkdir on windows
sdp: Use static const char arrays instead of pointers to strings
x86: avcodec: Drop silly "_mmx" suffixes from filenames
x86: avcodec: Drop silly "_sse" suffixes from filenames
sdp: Include profile-level-id for H264
utvideoenc: use ff_huff_gen_len_table
huffman: add ff_huff_gen_len_table
cllc: simplify/fix swapped data buffer allocation.
rtpdec_h264: Don't set the pixel format
h264: Check that the codec isn't null before accessing it
audio_frame_queue: Define af_queue_log_state before using it
Conflicts:
libavcodec/audio_frame_queue.c
libavcodec/h264.c
libavcodec/huffman.h
libavcodec/huffyuv.c
libavcodec/utvideoenc.c
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
vf_hqdn3d: Don't declare the loop variable within the for loop
huffyuv: update to current coding style
huffman: update to current coding style
rtsp: Free the rtpdec context properly
build: fft: x86: Drop unused YASM-OBJS-FFT- variable
Conflicts:
libavcodec/huffman.c
libavcodec/huffyuv.c
libavcodec/x86/Makefile
libavfilter/vf_hqdn3d.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a1bcc76e6036e78f25cbb7323c145056cfca9d93': (21 commits)
cmdutils: fix a memleak when specifying an option twice.
x86: mpegvideo: more sensible names for optimization file and init function
x86: mpegvideoenc: Split optimizations off into a separate file
dnxhdenc: x86: more sensible names for optimization file and init function
svq1/svq3: Move common code out of SVQ1 decoder-specific file
dirac: add Comments and references to the standard
lavr: x86: optimized 6-channel flt to fltp conversion
lavr: x86: optimized 2-channel flt to fltp conversion
lavr: x86: optimized 6-channel flt to s16p conversion
lavr: x86: optimized 2-channel flt to s16p conversion
lavr: x86: optimized 6-channel s16 to fltp conversion
lavr: x86: optimized 2-channel s16 to fltp conversion
lavr: x86: optimized 6-channel s16 to s16p conversion
lavr: x86: optimized 2-channel s16 to s16p conversion
lavr: x86: optimized 2-channel fltp to flt conversion
lavr: x86: optimized 6-channel fltp to s16 conversion
lavr: x86: optimized 2-channel fltp to s16 conversion
lavr: x86: optimized 6-channel s16p to flt conversion
lavr: x86: optimized 2-channel s16p to flt conversion
lavr: x86: optimized 6-channel s16p to s16 conversion
...
Conflicts:
libavcodec/dirac.c
libavcodec/mpegvideo.h
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
build: x86: Only compile mpegvideo optimizations when necessary
configure: Drop fastdiv option
build: Make the E-AC-3 encoder select the AC-3 encoder
fate: flac: Only run tests requiring samples when samples are available
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Fix even more missing includes after the common.h removal
build: Factor out rangecoder dependencies to CONFIG_RANGECODER
build: Factor out error resilience dependencies to CONFIG_ERROR_RESILIENCE
x86: avcodec: Consistently name all init files
Add more missing includes after removing the implicit common.h
Add some more missing includes after removing the implicit common.h
Don't include common.h from avutil.h
rtmp: Automatically compute the hash for SWFVerification
Conflicts:
configure
doc/APIchanges
doc/examples/decoding_encoding.c
libavcodec/Makefile
libavcodec/assdec.c
libavcodec/audio_frame_queue.c
libavcodec/avpacket.c
libavcodec/dv_profile.c
libavcodec/dwt.c
libavcodec/libtheoraenc.c
libavcodec/rawdec.c
libavcodec/rv40dsp.c
libavcodec/tiff.c
libavcodec/tiffenc.c
libavcodec/v210dec.h
libavcodec/vc1dsp.c
libavcodec/x86/Makefile
libavfilter/asrc_anullsrc.c
libavfilter/avfilter.c
libavfilter/buffer.c
libavfilter/formats.c
libavfilter/vf_ass.c
libavfilter/vf_drawtext.c
libavfilter/vf_fade.c
libavfilter/vf_select.c
libavfilter/video.c
libavfilter/vsrc_testsrc.c
libavformat/version.h
libavutil/audioconvert.c
libavutil/error.h
libavutil/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
rtmp: Add support for SWFVerification
api-example: use new video encoding API.
x86: avcodec: Appropriately name files containing only init functions
mpegvideo_mmx_template: drop some commented-out cruft
libavresample: add mix level normalization option
w32pthreads: Add missing #includes to make header compile standalone
rtmp: Gracefully ignore _checkbw errors by tracking them
rtmp: Do not send _checkbw calls as notifications
prores: interlaced ProRes encoding
Conflicts:
doc/examples/decoding_encoding.c
libavcodec/proresenc_kostya.c
libavcodec/w32pthreads.h
libavcodec/x86/Makefile
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (22 commits)
g723.1: do not pass large structs by value
g723.1: do not bounce intermediate values via memory
g723.1: declare a variable in the block it is used
g723.1: avoid saving/restoring excitation
g723.1: avoid unnecessary memcpy() in residual_interp()
g723.1: make postfilter write directly to output buffer
g723.1: drop unnecessary variable buf_ptr in formant_postfilter()
g723.1: make scale_vector() output to a separate buffer
g723.1: make autocorr_max() work on an arbitrary buffer
g723.1: do not needlessly use int64_t
g723.1: use saturating addition functions
g723.1: optimise scale_vector()
g723.1: remove useless uses of MUL64()
g723.1: remove unnecessary argument 'shift' from dot_product()
g723.1: deobfuscate "(x << 4) - x" to "15 * x"
celp: optimise ff_celp_lp_synthesis_filter()
libavutil: add saturating addition functions
cllc: Implement ARGB support
cllc: Add support for QRGB
cllc: Rename some funcs to represent what they actually do
...
Conflicts:
LICENSE
libavcodec/g723_1.c
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
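For readers unfamiliar with the "saturating addition functions" entry in the merge above: a saturating add clamps the result to the representable range instead of wrapping on overflow. The following is a minimal sketch only; the names sat_add32 and sat_add32_selfcheck are hypothetical and this is not the libavutil code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not the libavutil code.
 * Adds two signed 32-bit values and clamps the result to the
 * int32_t range instead of letting it wrap around. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b; /* cannot overflow in 64 bits */

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}

/* Small self-check showing the clamping behaviour at both ends. */
static void sat_add32_selfcheck(void)
{
    assert(sat_add32(1, 2) == 3);
    assert(sat_add32(INT32_MAX, 1) == INT32_MAX);
    assert(sat_add32(INT32_MIN, -1) == INT32_MIN);
}
```

Widening to 64 bits keeps the sketch free of signed-overflow undefined behaviour; a production version might use compiler builtins or bit tricks instead.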
These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate. There is thus no reason to ever use them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
ppc: fix build with altivec disabled
vp3: move idct and loop filter pointers to new vp3dsp context
build: add CONFIG_VP3DSP, reduce repetition in OBJS lists
tscc2: do not add/subtract 128 bias during DCT
tscc2: fix typo in DCT
configure: clarify external library section of help output
configure: mark libfdk-aac as nonfree
configure: cosmetics: drop some unnecessary backslashes
os_support: K&R formatting cosmetics
Conflicts:
configure
libavcodec/vp3.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context. There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
wtv: Check the return value from gmtime
x86: fft: convert sse inline asm to yasm
x86: place some inline asm under #if HAVE_INLINE_ASM
Conflicts:
libavcodec/x86/fft_sse.c
libavformat/wtv.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
build: ppc: drop stray leftover backslash
build: Only clean the architecture subdirectory we build for.
build: drop some unnecessary dependencies from the H.264 parser
build: prettyprinting cosmetics
libavutil: Remove pointless rational test program.
libavutil: Remove broken and pointless lzo test program.
lavf doxy: expand AVStream.codec doxy.
lavf doxy: improve AVStream.time_base doxy.
lavf doxy: add some basic documentation about reading from the demuxer.
lavf doxy: document passing options to demuxers.
lavf doxy: clarify that an AVPacket contains encoded data.
mpegtsenc: allow user triggered PES packet flushing
APIchanges: mark the place where 0.7 was cut.
APIchanges: mark the place where 0.8 was cut.
APIchanges: fill in missing dates and hashes.
smacker: convert palette and header reading to bytestream2.
alac: convert extradata reading to bytestream2.
Conflicts:
doc/APIchanges
libavcodec/smacker.c
libavcodec/x86/Makefile
libavfilter/Makefile
libavutil/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
docs: use -bsf:[vas] instead of -[vas]bsf.
mpegaudiodec: Prevent premature clipping of mp3 input buffer.
lavf: move the packet keyframe setting code.
oggenc: free comment header for all codecs
lcl: error out if uncompressed input buffer is smaller than framesize.
mjpeg: abort decoding if packet is too large.
golomb: use HAVE_BITS_REMAINING() macro to prevent infloop on EOF.
get_bits: add HAVE_BITS_REMAINING macro.
lavf/output-example: use new audio encoding API correctly.
lavf/output-example: more proper usage of the new API.
tiff: Prevent overreads in the type_sizes array.
tiff: Make the TIFF_LONG and TIFF_SHORT types unsigned.
apetag: do not leak memory if avio_read() fails
apetag: propagate errors.
SBR DSP x86: implement SSE sbr_hf_g_filt
SBR DSP x86: implement SSE sbr_sum_square_sse
SBR DSP: use intptr_t for the ixh parameter.
Conflicts:
doc/bitstream_filters.texi
doc/examples/muxing.c
doc/ffmpeg.texi
libavcodec/golomb.h
libavcodec/x86/Makefile
libavformat/oggenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square:
  C   / 32 bits: 82c (unrolled) / 102c
  C   / 64 bits: 69c (unrolled) / 82c
  SSE / 32 bits: 42c
  SSE / 64 bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
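To illustrate the "unrolled" versus plain C figures quoted above, here is a rough sketch of a sum-of-squares over complex float pairs in both forms. The function names, the unroll factor of two pairs, and the requirement that n be even are assumptions made for this example; it is not the code that was benchmarked.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: accumulate re*re + im*im over n complex pairs. */
static float sum_square_plain(float (*x)[2], size_t n)
{
    float sum = 0.0f;

    for (size_t i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

/* Unrolled by two pairs per iteration, with two independent
 * accumulators so the additions do not form one long dependency chain.
 * Assumption for this sketch: n is even. */
static float sum_square_unrolled(float (*x)[2], size_t n)
{
    float sum0 = 0.0f, sum1 = 0.0f;

    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        sum0 += x[i][0] * x[i][0] + x[i][1] * x[i][1];
        sum1 += x[i + 1][0] * x[i + 1][0] + x[i + 1][1] * x[i + 1][1];
    }
    return sum0 + sum1;
}
```

Because floating-point addition is not associative, the two variants can differ in the last bits on general input; they agree exactly only when every intermediate sum is exactly representable.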
* qatar/master:
libx264: fix indentation.
vorbis: fix overflows in floor1[] vector and inverse db table index.
win64: add a XMM clobber test configure option.
movdec: Parse the dvc1 atom
ARM: ac3: fix ac3_bit_alloc_calc_bap_armv6
swscale: K&R formatting cosmetics for Blackfin code
frwu: lowercase the FRWU codec name
movdec: fix dts generation in fragmented files
fate: make acodec-ac3_fixed test output raw AC3
APIchanges: add missing commit hashes
swscale: implement MMX, SSE2 and AVX functions for RGB32 input.
ra144enc: drop pointless "encoder" from .long_name
bethsoftvideo: fix palette reading.
mpc7: use av_fast_padded_malloc()
mpc7: simplify handling of packet sizes that are not a multiple of 4 bytes
doc: decoding Forward Uncompressed is supported
Fix a typo in the x86 asm version of ff_vector_clip_int32()
pcmenc: Do not set avpkt->size.
ff_alloc_packet: modify the size of the packet to match the requested size
Conflicts:
doc/APIchanges
libavcodec/libx264.c
libavcodec/mpc7.c
libavformat/isom.h
libswscale/Makefile
libswscale/bfin/yuv2rgb_bfin.c
tests/ref/fate/bethsoft-vid
tests/ref/seek/ac3_ac3
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.
Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
* qatar/master: (29 commits)
fate: add golomb-test
golomb-test: K&R formatting cosmetics
h264: Split h264-test off into a separate file - golomb-test.c.
h264-test: cleanup: drop timer invocations, commented out code and other cruft
h264-test: Remove unused DSP and AVCodec contexts and related init calls.
adpcm: Add missing stdint.h #include to fix standalone header compilation.
lavf: add functions for accessing the fourcc<->CodecID mapping tables.
lavc: set AVCodecContext.codec in avcodec_get_context_defaults3().
lavc: make avcodec_close() work properly on unopened codecs.
lavc: add avcodec_is_open().
lavf: rename AVInputFormat.value to raw_codec_id.
lavf: remove the pointless value field from flv and iv8
lavc/lavf: remove unnecessary symbols from the symbol version script.
lavc: reorder AVCodec fields.
lavf: reorder AVInput/OutputFormat fields.
mp3dec: Fix a heap-buffer-overflow
adpcmenc: remove some unneeded casts
adpcmenc: use int16_t and uint8_t instead of short and unsigned char.
adpcmenc: fix adpcm_ms extradata allocation
adpcmenc: return proper AVERROR codes instead of -1
...
Conflicts:
doc/APIchanges
libavcodec/Makefile
libavcodec/adpcmenc.c
libavcodec/avcodec.h
libavcodec/h264.c
libavcodec/libavcodec.v
libavcodec/mpc7.c
libavcodec/mpegaudiodec.c
libavcodec/options.c
libavformat/Makefile
libavformat/avformat.h
libavformat/flvdec.c
libavformat/libavformat.v
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (26 commits)
avconv: deprecate the -deinterlace option
doc: Fix the name of the new function
aacenc: make sure to encode enough frames to cover all input samples.
aacenc: only use the number of input samples provided by the user.
wmadec: Verify bitstream size makes sense before calling init_get_bits.
kmvc: Log into a context at a log level constant.
mpeg12: Pad framerate tab to 16 entries.
kgv1dec: Increase offsets array size so it is large enough.
kmvc: Check palsize.
nsvdec: Propagate errors
nsvdec: Be more careful with av_malloc().
nsvdec: Fix use of uninitialized streams.
movenc: cosmetics: Get rid of camelCase identifiers
swscale: more generic check for planar destination formats with alpha
doc: Document mov/mp4 fragmentation options
build: Use order-only prerequisites for creating FATE reference file dirs.
x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf
rtsp: Remove some unused variables from ff_rtsp_connect().
avutil: make intfloat api public
avformat_write_header(): detail error message
...
Conflicts:
doc/APIchanges
doc/ffmpeg.texi
doc/muxers.texi
ffmpeg.c
libavcodec/kmvc.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_yasm.asm
libavcodec/x86/pngdsp-init.c
libavformat/movenc.c
libavformat/movenc.h
libavformat/mpegtsenc.c
libavformat/nsvdec.c
libavformat/utils.c
libavutil/avutil.h
libswscale/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).
*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips
This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
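The timer figures above can be turned into per-function speedup ratios against the C reference. The sketch below does only that arithmetic on the decicycle counts quoted in the message; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the timer report: decicycles for the 16x16 and 8x8 cases. */
struct timing {
    const char *name;
    double dc16;
    double dc8;
};

/* Figures copied from the report in the commit message above. */
static const struct timing timings[] = {
    { "C",               9015, 2656 },
    { "MMX",             4156, 1206 },
    { "MMX fast path",   2760,  995 },
    { "SSE2",            2163,  832 },
    { "SSE2 fast path",  1783,  711 },
    { "SSSE3",           2117,  814 },
    { "SSSE3 fast path", 1315,  578 },
};

/* Speedup of a variant relative to the reference cycle count. */
static double speedup(double reference, double variant)
{
    assert(variant > 0.0);
    return reference / variant;
}

/* Print each variant's speedup over the C row (first table entry). */
static void print_speedups(void)
{
    size_t n = sizeof(timings) / sizeof(timings[0]);

    for (size_t i = 1; i < n; i++)
        printf("%-16s 16x16: %.2fx  8x8: %.2fx\n", timings[i].name,
               speedup(timings[0].dc16, timings[i].dc16),
               speedup(timings[0].dc8,  timings[i].dc8));
}
```

These are ratios for the isolated function only, which is why they are far larger than the roughly 4% figure the message reports for whole sequences.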