FFALIGN doesn't work with non-powers-of-2.
This reverts commit 7ad1b612c8.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Line sizes are only 8-byte aligned, so use unaliged loads
for add_bytes_l2 pointers.
Increasing the alignment requirement to 16 seemed a bit extreme
(png may be used for rather small sizes).
Also fix a mov that had its arguments swapped, leading
add_bytes_l2 being applied on up to 8 bytes too few.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* qatar/master:
libx264: fix indentation.
vorbis: fix overflows in floor1[] vector and inverse db table index.
win64: add a XMM clobber test configure option.
movdec: Parse the dvc1 atom
ARM: ac3: fix ac3_bit_alloc_calc_bap_armv6
swscale: K&R formatting cosmetics for Blackfin code
frwu: lowercase the FRWU codec name
movdec: fix dts generation in fragmented files
fate: make acodec-ac3_fixed test output raw AC3
APIchanges: add missing commit hashes
swscale: implement MMX, SSE2 and AVX functions for RGB32 input.
ra144enc: drop pointless "encoder" from .long_name
bethsoftvideo: fix palette reading.
mpc7: use av_fast_padded_malloc()
mpc7: simplify handling of packet sizes that are not a multiple of 4 bytes
doc: decoding Forward Uncompressed is supported
Fix a typo in the x86 asm version of ff_vector_clip_int32()
pcmenc: Do not set avpkt->size.
ff_alloc_packet: modify the size of the packet to match the requested size
Conflicts:
doc/APIchanges
libavcodec/libx264.c
libavcodec/mpc7.c
libavformat/isom.h
libswscale/Makefile
libswscale/bfin/yuv2rgb_bfin.c
tests/ref/fate/bethsoft-vid
tests/ref/seek/ac3_ac3
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.
Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
* qatar/master: (22 commits)
frwu: Employ more meaningful return values.
fraps: Use av_fast_padded_malloc() instead of av_realloc()
mjpegdec: use av_fast_padded_malloc()
eatqi: use av_fast_padded_malloc()
asv1: use av_fast_padded_malloc()
avcodec: Add av_fast_padded_malloc().
swscale: enable dithering in MMX functions.
swscale: make rgb24 function macros slightly smaller.
avcodec.h: Remove some disabled cruft.
swscale: remove obsolete comment.
swscale-test: Drop unused argc and argv arguments from main().
zmbv: Employ more meaningful return values.
zmbvenc: Employ more meaningful return values.
vc1: prevent null pointer dereference on broken files
zmbv: check av_realloc() return values and avoid memleaks on ENOMEM
truespeech: align buffer
ac3: Do not read past the end of ff_ac3_band_start_tab.
dv: Fix small stack overread related to CVE-2011-3929 and CVE-2011-3936.
dv: Fix null pointer dereference due to ach=0
dv: check stype
...
Conflicts:
doc/APIchanges
libavcodec/asv1.c
libavcodec/avcodec.h
libavcodec/eatqi.c
libavcodec/fraps.c
libavcodec/frwu.c
libavcodec/zmbv.c
libavformat/dv.c
libswscale/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Return the correct number of consumed bytes and set *data_size = 0.
Returned size is 1 too small, leading to that 1 byte being read as the next
frame, which results in an extra blank frame at the beginning of the stream.
Avoids doing malloc/free for each frame.
Also fixes valgrind errors due to use of uninitialized padding bytes.
Based on a patch by Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Wrapper around av_fast_malloc() that keeps FF_INPUT_BUFFER_PADDING_SIZE
zero-padded bytes at the end of the used buffer.
Based on a patch by Reimar Döffinger <Reimar.Doeffinger@gmx.de>.
"Copyright (c) 2001 Michael Niedermayer" and "part of Libav" is not likely
not only am i not a libav developer there also was no libav in 2001
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (29 commits)
fate: add golomb-test
golomb-test: K&R formatting cosmetics
h264: Split h264-test off into a separate file - golomb-test.c.
h264-test: cleanup: drop timer invocations, commented out code and other cruft
h264-test: Remove unused DSP and AVCodec contexts and related init calls.
adpcm: Add missing stdint.h #include to fix standalone header compilation.
lavf: add functions for accessing the fourcc<->CodecID mapping tables.
lavc: set AVCodecContext.codec in avcodec_get_context_defaults3().
lavc: make avcodec_close() work properly on unopened codecs.
lavc: add avcodec_is_open().
lavf: rename AVInputFormat.value to raw_codec_id.
lavf: remove the pointless value field from flv and iv8
lavc/lavf: remove unnecessary symbols from the symbol version script.
lavc: reorder AVCodec fields.
lavf: reorder AVInput/OutputFormat fields.
mp3dec: Fix a heap-buffer-overflow
adpcmenc: remove some unneeded casts
adpcmenc: use int16_t and uint8_t instead of short and unsigned char.
adpcmenc: fix adpcm_ms extradata allocation
adpcmenc: return proper AVERROR codes instead of -1
...
Conflicts:
doc/APIchanges
libavcodec/Makefile
libavcodec/adpcmenc.c
libavcodec/avcodec.h
libavcodec/h264.c
libavcodec/libavcodec.v
libavcodec/mpc7.c
libavcodec/mpegaudiodec.c
libavcodec/options.c
libavformat/Makefile
libavformat/avformat.h
libavformat/flvdec.c
libavformat/libavformat.v
Merged-by: Michael Niedermayer <michaelni@gmx.at>
get_ue_golomb_long() is only tested for values up to 2^15 - 2 since
we can not write larger values.
Silence the test on success and return a non-zero value on error.
Use an heap scratch buffer instead of large stack buffer.
Remove unneeded includes.
* shariman/wmall:
Cosmetics: Fix some whitespace errors and indentation
Use correct variable type for 32-bit samples buffer
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This way, if the AVCodecContext is allocated for a specific codec, the
caller doesn't need to store this codec separately and then pass it
again to avcodec_open2().
It also allows to set codec private options using av_opt_set_* before
opening the codec.
It allows to check whether an AVCodecContext is open in a documented
way. Right now the undocumented way this check is done in lavf/lavc is
by checking whether AVCodecContext.codec is NULL. However it's desirable
to be able to set AVCodecContext.codec before avcodec_open2().
* qatar/master: (26 commits)
avconv: deprecate the -deinterlace option
doc: Fix the name of the new function
aacenc: make sure to encode enough frames to cover all input samples.
aacenc: only use the number of input samples provided by the user.
wmadec: Verify bitstream size makes sense before calling init_get_bits.
kmvc: Log into a context at a log level constant.
mpeg12: Pad framerate tab to 16 entries.
kgv1dec: Increase offsets array size so it is large enough.
kmvc: Check palsize.
nsvdec: Propagate errors
nsvdec: Be more careful with av_malloc().
nsvdec: Fix use of uninitialized streams.
movenc: cosmetics: Get rid of camelCase identifiers
swscale: more generic check for planar destination formats with alpha
doc: Document mov/mp4 fragmentation options
build: Use order-only prerequisites for creating FATE reference file dirs.
x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf
rtsp: Remove some unused variables from ff_rtsp_connect().
avutil: make intfloat api public
avformat_write_header(): detail error message
...
Conflicts:
doc/APIchanges
doc/ffmpeg.texi
doc/muxers.texi
ffmpeg.c
libavcodec/kmvc.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_yasm.asm
libavcodec/x86/pngdsp-init.c
libavformat/movenc.c
libavformat/movenc.h
libavformat/mpegtsenc.c
libavformat/nsvdec.c
libavformat/utils.c
libavutil/avutil.h
libswscale/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
In some cases, what is left to read from ptr is smaller than EXTRABYTES.
Based on a patch by Thierry Foucu <tfoucu@gmail.com>.
Signed-off-by: Alex Converse <alex.converse@gmail.com>
Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).
*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips
This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Currently, any samples in the final frame are not decoded because they are
only represented by one frame instead of two. So we encode two final frames to
cover both the analysis delay and the MDCT delay.
I have no idea what the idea was behind the original code,
but the new code is equivalent to it.
In that loop that places the new node nodes[j] contains
always the data of the new node (since the steps are always
in order: FFSWAP copies node[j] to node[j-1], j is decremented).
Thus nodes[j].no == i and nodes[j].sym == HNODE.
make fate still passes and contains VP6 samples which use
FF_HUFFMAN_FLAG_HNODE_FIRST.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific codepaths
are provided for this case.
For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2: ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads: ~ 30k cycles
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master:
png: add missing #if HAVE_SSSE3 around function pointer assignment.
imdct36: mark SSE functions as using all 16 XMM registers.
png: move DSP functions to their own DSP context.
sunrast: Add a sample request for TIFF, IFF, and Experimental Rastfile formats.
sunrast: Cosmetics
sunrast: Remove if (unsigned int < 0) check.
sunrast: Replace magic number by a macro.
Conflicts:
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/pngdec.c
libavcodec/sunrast.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Offsets are relative to the end of the header, not the
start of the buffer, thus the buffer size needs to be subtracted.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Codec is too simple to gain much from it at lower resolutions,
but should help at very high resolutions, particularly for
v3 and v5 where a not too optimized pseudo-YUV to RGB
is done in the codec.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This reverts e6e7bfc1 and 365e1ec2.
The code may be incorrect both before and after the revert, but we
do not have any samples that were fixed by the original commits.
Fixes ticket #871.
With gcc 4.6 this part of the code is ca. 4x faster, resulting
in an overall speedup of around 5% for fate-fraps-v5 sample.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
On x86-64, it indeed uses all 16 registers (and on x86-32, this gets
clipped to 8). Not marking it properly causes callers of this function
to fail randomly because of XMM register clobbering.
Note: This fixes the following GCC warning :-
libavcodec/sunrast.c:94: warning: comparison of unsigned expression < 0 is always false.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Codec has only I- and skip-frames, so there is no
need for reget_buffer, change it so it works with
get_buffer.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* qatar/master:
aacenc: Fix LONG_START windowing.
aacenc: Fix a bug where deinterleaved samples were stored in the wrong place.
avplay: use the correct array size for stride.
lavc: extend doxy for avcodec_alloc_context3().
APIchanges: mention avcodec_alloc_context()/2/3
avcodec_align_dimensions2: set only 4 linesizes, not AV_NUM_DATA_POINTERS.
aacsbr: ARM NEON optimised sbrdsp functions
aacsbr: align some arrays
aacsbr: move some simdable loops to function pointers
cosmetics: Remove extra newlines at EOF
Conflicts:
libavcodec/utils.c
libavfilter/formats.c
libavutil/mem.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>