Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).
*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips
This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Currently, any samples in the final frame are not decoded because they are
only represented by one frame instead of two. So we encode two final frames to
cover both the analysis delay and the MDCT delay.
I have no idea what the idea was behind the original code,
but the new code is equivalent to it.
In the loop that places the new node, nodes[j] always contains
the data of the new node (the steps always happen in this order:
FFSWAP moves the new node from nodes[j] to nodes[j-1], then j is decremented).
Thus nodes[j].no == i and nodes[j].sym == HNODE.
"make fate" still passes, and it includes VP6 samples that use
FF_HUFFMAN_FLAG_HNODE_FIRST.
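To illustrate why the two forms are equivalent, here is a minimal C sketch of both insertion styles. The Node layout, the HNODE value and the hnode_first tie rule (modeled on the FF_HUFFMAN_FLAG_HNODE_FIRST name) are assumptions for illustration, not the actual libavcodec definitions. Because every swap moves the new node exactly one slot down, it always sits at nodes[j], so a plain shift loop followed by a single write to nodes[j] produces the same array.

```c
#include <stdint.h>

#define HNODE -1 /* hypothetical marker for internal (non-leaf) nodes */

typedef struct Node {
    int16_t  sym;   /* symbol, or HNODE for an internal node */
    int16_t  n0;    /* index of the first child (illustrative name) */
    uint32_t count; /* frequency */
} Node;

/* Old style: write the new node at nodes[cur_node], then swap it downward.
 * lo is the lowest slot it may reach; lo >= i + 2 so the children at
 * nodes[i] and nodes[i + 1] are never moved. */
static int insert_by_swap(Node *nodes, int lo, int cur_node, int i,
                          int hnode_first)
{
    int j = cur_node;
    nodes[j].sym   = HNODE;
    nodes[j].n0    = i;
    nodes[j].count = nodes[i].count + nodes[i + 1].count;
    while (j > lo && (nodes[j].count < nodes[j - 1].count ||
                      (nodes[j].count == nodes[j - 1].count && hnode_first))) {
        Node tmp     = nodes[j];   /* FFSWAP(Node, nodes[j], nodes[j - 1]) */
        nodes[j]     = nodes[j - 1];
        nodes[j - 1] = tmp;
        j--;
    }
    return j;
}

/* New style: shift larger entries up, then fill in nodes[j] once. */
static int insert_by_shift(Node *nodes, int lo, int cur_node, int i,
                           int hnode_first)
{
    uint32_t cur_count = nodes[i].count + nodes[i + 1].count;
    int j;
    for (j = cur_node; j > lo; j--) {
        if (cur_count > nodes[j - 1].count ||
            (cur_count == nodes[j - 1].count && !hnode_first))
            break;
        nodes[j] = nodes[j - 1];
    }
    nodes[j].sym   = HNODE;
    nodes[j].n0    = i;
    nodes[j].count = cur_count;
    return j;
}
```

With hnode_first set, a new node with the same count as an existing one ends up before it; without it, after it. Both functions agree in either mode.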
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical for performance, so specific
codepaths are provided for aligned buffers.
For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2: ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads: ~ 30k cycles
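A minimal sketch of the SSE2 byte-swap emulation, assuming 32-bit words and using unaligned loads for simplicity; the function names are hypothetical, not the huffyuv DSP ones. Without pshufb, the swap takes a 16-bit shift/or plus two 16-bit shuffles; with SSSE3, a single _mm_shuffle_epi8 (pshufb) with a byte-reversal mask replaces those three steps. The aligned fast path from the text (e.g. _mm_load_si128 on 16-byte-aligned input) is omitted here.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Scalar reference: reverse the byte order of one 32-bit word. */
static uint32_t bswap32_scalar(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Byte-swap n 32-bit words from src into dst. */
static void bswap_buf(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        /* Swap the two bytes inside every 16-bit lane... */
        v = _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
        /* ...then swap the two 16-bit halves inside every 32-bit lane. */
        v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
        v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    for (; i < n; i++) /* scalar tail, or the whole buffer without SSE2 */
        dst[i] = bswap32_scalar(src[i]);
}
```

When compiled without SSE2 (for example on non-x86 targets), the scalar loop handles the whole buffer.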
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master:
png: add missing #if HAVE_SSSE3 around function pointer assignment.
imdct36: mark SSE functions as using all 16 XMM registers.
png: move DSP functions to their own DSP context.
sunrast: Add a sample request for TIFF, IFF, and Experimental Rastfile formats.
sunrast: Cosmetics
sunrast: Remove if (unsigned int < 0) check.
sunrast: Replace magic number by a macro.
Conflicts:
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/pngdec.c
libavcodec/sunrast.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Offsets are relative to the end of the header, not the
start of the buffer, thus the buffer size needs to be subtracted.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
The codec is too simple to gain much from it at lower resolutions,
but it should help at very high resolutions, particularly for
v3 and v5, where a not-too-optimized pseudo-YUV to RGB
conversion is done in the codec.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This reverts e6e7bfc1 and 365e1ec2.
The code may be incorrect both before and after the revert, but we
do not have any samples that were fixed by the original commits.
Fixes ticket #871.
With gcc 4.6 this part of the code is ca. 4x faster, resulting
in an overall speedup of around 5% for fate-fraps-v5 sample.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
On x86-64, it indeed uses all 16 registers (and on x86-32, this gets
clipped to 8). Not marking it properly causes callers of this function
to fail randomly because of XMM register clobbering.
Note: This fixes the following GCC warning:
libavcodec/sunrast.c:94: warning: comparison of unsigned expression < 0 is always false.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The codec has only I- and skip-frames, so there is no
need for reget_buffer; change it to work with
get_buffer.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* qatar/master:
aacenc: Fix LONG_START windowing.
aacenc: Fix a bug where deinterleaved samples were stored in the wrong place.
avplay: use the correct array size for stride.
lavc: extend doxy for avcodec_alloc_context3().
APIchanges: mention avcodec_alloc_context()/2/3
avcodec_align_dimensions2: set only 4 linesizes, not AV_NUM_DATA_POINTERS.
aacsbr: ARM NEON optimised sbrdsp functions
aacsbr: align some arrays
aacsbr: move some simdable loops to function pointers
cosmetics: Remove extra newlines at EOF
Conflicts:
libavcodec/utils.c
libavfilter/formats.c
libavutil/mem.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Move libgsm_encode_close before its first use and call it
with the correct number of arguments.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
10l: Forgot to adjust deinterleave for new location of incoming samples in 7946a5a.
This produced incorrect, but surprisingly listenable results.
Thanks to Justin Ruggles for the report.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This fixes the video frame pts (off by one for each MVIh)
and makes the "key frames" decode stand-alone (MVIh
contains only a palette, and marking such a palette-only
frame as a key frame is not really correct).
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Previously the decoder would raise an error.
The end result is the same; the time stamps only change
because the regression tests create time stamps incorrectly.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This prepares for assembly optimisations by moving the most
time-consuming loops to functions called through pointers
in a new context.
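The function-pointer pattern can be sketched roughly as follows. ExampleDSPContext, its members and the init function are hypothetical names, not the actual aacsbr context; the point is that callers always go through the pointers, so an assembly version can later replace a C default without touching the call sites.

```c
#include <stddef.h>

/* Hypothetical DSP context: hot loops are reached through pointers so
 * that optimized (e.g. NEON or SSE) versions can be swapped in at init. */
typedef struct ExampleDSPContext {
    float (*sum_square)(const float *x, size_t n);
    void  (*scale)(float *dst, const float *src, float gain, size_t n);
} ExampleDSPContext;

/* Plain C reference implementations. */
static float sum_square_c(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

static void scale_c(float *dst, const float *src, float gain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * gain;
}

/* Fill the context with the C versions; an architecture-specific init
 * function (hypothetical here) would then overwrite individual pointers. */
static void example_dsp_init(ExampleDSPContext *s)
{
    s->sum_square = sum_square_c;
    s->scale      = scale_c;
}
```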
Signed-off-by: Mans Rullgard <mans@mansr.com>
Earlier, calling avcodec_encode_audio worked fine even if time_base
wasn't set. Now it crashes while trying to scale the output pts to
the codec context time base. This affects e.g. VLC.
If no time_base is set for audio codecs, set it to 1/sample_rate.
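A minimal sketch of the fallback, using a stand-in Rational struct instead of AVRational and a naive rescale in place of av_rescale_q() (which additionally rounds and avoids intermediate overflow). In this naive version an unset {0, 0} target time base would divide by zero; the fallback ensures such a time base is never used.

```c
#include <stdint.h>

typedef struct Rational { int num, den; } Rational; /* stand-in for AVRational */

/* If the caller left the time base unset, fall back to one tick per sample. */
static Rational audio_time_base(Rational tb, int sample_rate)
{
    if (tb.num <= 0 || tb.den <= 0)
        tb = (Rational){ 1, sample_rate };
    return tb;
}

/* Rescale ts from time base a to time base b. Truncates and has no
 * overflow protection; it only illustrates the arithmetic. */
static int64_t rescale(int64_t ts, Rational a, Rational b)
{
    return ts * a.num * b.den / ((int64_t)a.den * b.num);
}
```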
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
MVDATA may or may not be transmitted. If it is not, both
dmv_x and dmv_y are to be assumed zero.
This may not produce a wrong picture on all systems, but
it's a bug nevertheless. Fixes SA10116.vc1 on my 64-bit
Windows 7.
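A hypothetical sketch of the rule: both components default to zero and are only overwritten when MVDATA is present. The MVSource struct stands in for the real bitstream reader. Skipping the defaults would leave the values dependent on stale or uninitialized data.

```c
/* Hypothetical stand-in for what the bitstream says about MVDATA. */
typedef struct MVSource {
    int has_mvdata;   /* whether MVDATA was transmitted */
    int dmv_x, dmv_y; /* values decoded from MVDATA when present */
} MVSource;

static void read_dmv(const MVSource *src, int *dmv_x, int *dmv_y)
{
    /* Absent MVDATA means a zero differential motion vector. */
    *dmv_x = 0;
    *dmv_y = 0;
    if (src->has_mvdata) {
        *dmv_x = src->dmv_x;
        *dmv_y = src->dmv_y;
    }
}
```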
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* qatar/master:
smacker: Sanity check huffman tables found in the headers.
smacker: remove dead store
qdm2: Check data block size for bytes to bits overflow.
mxfdec: Fix files with essence containers larger than 2 GiB.
mxfdec: Employ correct printf conversion specifiers for POSIX int types.
vc1: always read the bfraction element for interlaced fields
fate: add XWD image regression test
lavf: prevent infinite loops while flushing in avformat_find_stream_info
matroskadec: Pad AAC extradata.
ismindex: Fix build on mingw
Conflicts:
libavformat/mxfdec.c
libavformat/utils.c
tests/lavf-regression.sh
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Fixes not yet fixed parts of CVE-2011-3946.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Previously, it would not be read if refdist_flag was not set; however,
according to the spec and the reference decoder, it should always be read.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
* qatar/master: (22 commits)
wma: Clip WMA1 and WMA2 frame length to 11 bits.
movenc: Don't require frame_size to be set for modes other than mov
doc: Update APIchanges with info on muxer flushing
movenc: Reindent a block
tools: Remove some unnecessary #undefs.
rv20: prevent calling ff_h263_decode_mba() with unset height/width
tools: K&R reformatting cosmetics
Ignore generated aviocat and ismindex tools.
build: Automatically include architecture-specific library Makefile snippets.
indeo5: prevent null pointer dereference on broken files
pktdumper: Use usleep instead of sleep
cosmetics: Remove some unnecessary block braces.
Drop unnecessary prefix from *sink* variable and struct names.
Add a tool for creating smooth streaming manifests
movdec: Calculate an average bit rate for fragmented streams, too
movenc: Write the sample rate instead of time scale in the stsd atom
movenc: Add a separate ismv/isma (smooth streaming) muxer
movenc: Allow the caller to decide on fragmentation
libavformat: Add a flag for muxers that support write_packet(NULL) for flushing
movenc: Add support for writing fragmented mov files
...
Conflicts:
Changelog
cmdutils.c
cmdutils.h
doc/APIchanges
ffmpeg.c
ffplay.c
libavfilter/Makefile
libavformat/Makefile
libavformat/avformat.h
libavformat/movenc.c
libavformat/movenc.h
libavformat/version.h
tools/graph2dot.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The MDCT buffers in the decoder are only sized for up to 11 bits. The
reverse-engineered documentation for WMA1/2 headers says that for
all sample rates above 32 kHz, 11 bits are used. Support for 12 and 13 bits
was added for WMAPro. I was unable to make any Microsoft tools generate
a test file at a sample rate above 48 kHz.
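A sketch of the clamp, assuming WMA1 and WMA2 are identified by versions 1 and 2 and WMA Pro by a higher version; the constant and function names are hypothetical. Only the 11-bit limit for WMA1/2 and the 12/13-bit range for WMA Pro come from the text above.

```c
/* Hypothetical limits: 11 bits for WMA1/2 MDCT buffers, 13 for WMA Pro. */
#define MAX_FRAME_LEN_BITS_WMA12  11
#define MAX_FRAME_LEN_BITS_WMAPRO 13

/* Clamp a frame-length exponent derived from the header so that WMA1/2
 * never exceeds what their MDCT buffers can hold. */
static int clip_frame_len_bits(int frame_len_bits, int version)
{
    int max_bits = version < 3 ? MAX_FRAME_LEN_BITS_WMA12
                               : MAX_FRAME_LEN_BITS_WMAPRO;
    return frame_len_bits > max_bits ? max_bits : frame_len_bits;
}
```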
Discovered by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
This fixes a double release of the current frame on deinit.
Fixes CVE-2011-3934
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The index of the motion vector has to be checked before being
multiplied by 2 for the array index.
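One way to follow that rule, sketched with a hypothetical table of (x, y) pairs: the index is validated against the number of vectors, in the same units as the bound, and only then scaled to an element offset.

```c
#include <stdint.h>

/* Hypothetical table of motion vectors stored as (x, y) pairs. */
#define NUM_MVS 4
static const int16_t mv_table[NUM_MVS * 2] = { 0, 0, 1, -1, 2, 3, -4, 5 };

static int get_mv(int idx, int16_t *mx, int16_t *my)
{
    if (idx < 0 || idx >= NUM_MVS) /* check the MV index itself... */
        return -1;
    *mx = mv_table[2 * idx];       /* ...and only then scale it */
    *my = mv_table[2 * idx + 1];
    return 0;
}
```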
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
aacenc: Fix identification padding when the bitstream is already aligned.
aacenc: Write correct length for long identification strings.
aud: remove unneeded field, audio_stream_index from context
aud: fix time stamp calculation for ADPCM IMA WS
aud: simplify header parsing
aud: set pts_wrap_bits to 64.
cosmetics: indentation
aud: support Westwood SND1 audio in AUD files.
adpcm_ima_ws: fix stereo decoding
avcodec: add a new codec_id for CRYO APC IMA ADPCM.
vqa: remove unused context fields, audio_samplerate and audio_bits
vqa: clean up audio header parsing
vqa: set time base to frame rate as coded in the header.
vqa: set packet duration.
vqa: use 1/sample_rate as the audio stream time base
vqa: set stream start_time to 0.
lavc: postpone the removal of AVCodecContext.request_channels.
lavf: postpone removing av_close_input_file().
lavc: postpone removing old audio encoding and decoding API
avplay: remove the -er option.
...
Conflicts:
Changelog
libavcodec/version.h
libavdevice/v4l.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Although it has been deprecated for a long time, its intended
replacement (request_channel_layout) is not actually used anywhere, so
request_channels is currently the only way to access that functionality.
Previously this was only checked for slice threads,
but frame threads currently do not support this either.
Making them support it is of course the long-term goal.
Fixes bug155
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Remove ffmpeg.
aacenc: Simplify windowing
aacenc: Move saved overlap samples to the beginning of the same buffer as incoming samples.
aacenc: Deinterleave input samples before processing.
aacenc: Store channel count in AACEncContext.
aacenc: Move Q^3/4 calculation to it's own table
aacenc: Request normalized float samples instead of converting s16 samples to float.
aacpsy: Replace an if with FFMAX in LAME windowing.
aacenc: cosmetics, replace 'rd' with 'bits' in codebook_trellis_rate to make it more clear what is being calculated.
aacpsy: cosmetics, change a FIXME to a NOTE about subshort comparisons
aacenc: cosmetics: move init() and end() to the bottom of the file.
aacenc: aac_encode_init() cleanup
XWD encoder and decoder
vc1: don't read the interpfrm and bfraction elements for interlaced frames
mxfdec: fix memleak on mxf_read_close()
westwood: split the AUD and VQA demuxers into separate files.
Conflicts:
.gitignore
Changelog
Makefile
configure
doc/ffmpeg.texi
ffmpeg.c
libavcodec/Makefile
libavcodec/aacenc.c
libavcodec/allcodecs.c
libavcodec/avcodec.h
libavcodec/version.h
libavformat/Makefile
libavformat/img2.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The previous implementation assumed that a new picture would always
supersede the previous picture. Similarly, presentation segments
were assumed to pertain to the most-recently-read picture.
However, each presentation segment may refer to 0 or more pictures
by their ID. Picture IDs may repeat, and a repeated picture ID
indicates that the old picture for that ID is no longer needed
and may be discarded.
The new implementation allocates a buffer with one slot for each
possible picture ID (the picture ID is a 16-bit field) and
properly decodes presentation segments so that all relevant
pictures are output upon encountering a display segment.
Given that most PGS streams are unlikely to use more than a small
fraction of the available picture IDs, it would probably be better
to use a more memory-efficient data structure. I'm lazy though, so
I leave this to a more motivated individual.
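A rough sketch of the slot-per-ID approach, with hypothetical Picture and PictureStore types. On a 64-bit system the 65536 pointers alone take 512 KiB, which is the memory cost mentioned above; a map keyed by ID would be one more compact alternative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical decoded-picture record. */
typedef struct Picture {
    int w, h;
    uint8_t *rle; /* RLE data, owned by the slot */
} Picture;

/* One slot per possible 16-bit picture ID. */
typedef struct PictureStore {
    Picture *slot[1 << 16];
} PictureStore;

/* A repeated ID means the old picture is no longer needed, so it is
 * released before being replaced. Passing NULL just frees the slot. */
static void store_picture(PictureStore *ps, uint16_t id, Picture *pic)
{
    Picture *old = ps->slot[id];
    if (old) {
        free(old->rle);
        free(old);
    }
    ps->slot[id] = pic;
}

/* Look up a picture referenced by a presentation segment (NULL if unknown). */
static const Picture *find_picture(const PictureStore *ps, uint16_t id)
{
    return ps->slot[id];
}

static void free_store(PictureStore *ps)
{
    for (uint32_t id = 0; id < (1u << 16); id++)
        store_picture(ps, (uint16_t)id, NULL);
    free(ps);
}
```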
I've tested the code with MKV files in VLC (a recent revision from
their git repo) and with HandBrake (a version that I hacked up to
use ffmpeg's PGS subtitle decoder).
Review-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes memory corruption when seeking in broken streams.
A random MPEG-4 in NUT file was used to debug this.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This matches the spec as well as the reference decoder, and fixes a bug
with interlaced frame decoding.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
* qatar/master: (25 commits)
riff: fix invalid av_freep() calls on EOF in ff_read_riff_info
pam: Fix a typo that broke writing and reading PAM files.
mxfdec: fix memleak on av_realloc failures
mxfdec: Do not parse slices or DeltaEntryArrays.
mxfdec: hybrid demuxing/seeking solution
mxfdec: Add Avid's essence element key.
mfxdec: Separate mxf_essence_container_uls for audio and video.
mxfdec: Compute packet offsets properly.
mxfdec: Use MaterialPackage - Track - TrackID instead of the system_item hack.
mxfdec: use av_dlog() for 'no corresponding source package found'
mxfdec: Make mxf->partitions sorted by offset.
mxfdec: parse ThisPartition
mxfdec: Speed up metadata and index parsing.
mxfdec: Make sure DataDefinition is consistent between material track and source track.
mxfdec: add EssenceContainer UL found in 0001GL00.MXF.A1.mxf_opatom.mxf
mxfdec: Add hack that adjusts the n_delta calculation when system items are present.
mxfdec: Parse IndexTableSegments and convert them into AVIndexEntry arrays.
mxfdec: Move FooterPartition to MXFContext and make sure it is never zero.
mxfdec: check return value of avio_seek
mxfdec: skip to end of structural sets
...
Conflicts:
configure
libavcodec/pnm.c
libavformat/mxfdec.c
libavformat/riff.c
libavformat/rtsp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This was a regression that came in when I switched to using the
h.264 annex b filter all the time. As the filter modifies extradata,
its use violates the statelessness assumption that exists in the
'ffmpeg' command line tool, and maybe elsewhere. It assumes that
a decoder can be reinitialised, pointed to an existing stream, and
get the same results.
For now, the only way to meet this requirement is to back up the
extradata.
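A minimal sketch of the backup approach, assuming a hypothetical DecoderState whose extradata buffer is heap-allocated and may be replaced by the filter. A pristine copy is taken before filtering and put back on close, so a reinitialised decoder sees the same extradata as the first one.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder state. */
typedef struct DecoderState {
    uint8_t *extradata;        /* may be rewritten or replaced by a filter */
    size_t   extradata_size;
    uint8_t *extradata_backup; /* untouched copy of the original */
    size_t   backup_size;
} DecoderState;

/* Call before the filter runs for the first time. */
static int backup_extradata(DecoderState *s)
{
    s->extradata_backup = malloc(s->extradata_size ? s->extradata_size : 1);
    if (!s->extradata_backup)
        return -1;
    memcpy(s->extradata_backup, s->extradata, s->extradata_size);
    s->backup_size = s->extradata_size;
    return 0;
}

/* Call on close: drop the filtered data and restore the original. */
static void restore_extradata(DecoderState *s)
{
    free(s->extradata);
    s->extradata        = s->extradata_backup;
    s->extradata_size   = s->backup_size;
    s->extradata_backup = NULL;
    s->backup_size      = 0;
}
```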
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The written length was off by 2, causing AAC decoders to fail on the data.
Luckily, the encoder was marked as experimental and not used much.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
rtpdec: Use our own SSRC in the SDES field when sending RRs
Finalize changelog for 0.8 Release
Prepare for 0.8 Release
threads: change the default for threads back to 1
threads: update slice_count and slice_offset from user context
aviocat: Remove useless includes
doc/APIChanges: fill in missing dates and hashes
Revert "avserver: fix build after the next bump."
mpegaudiodec: switch error detection check to AV_EF_BUFFER
lavf: rename fer option and document resulting (f_)err_detect options
lavc: rename err_filter option to err_detect and document it
mpegvideo: fix invalid memory access for small video dimensions
movenc: Reorder entries in the MOVIentry struct, for tigheter packing
rtsp: Remove extern declarations for variables that don't exist
aviocat: Flush the output before closing
Conflicts:
Changelog
RELEASE
libavcodec/mpegaudiodec.c
libavcodec/pthread.c
libavformat/options.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Using threaded decoding by default breaks backward compatibility if
AVHWAccel is used or if an application sets thread-unsafe callbacks.
Avconv and avplay still use -threads auto if not specified.
When either video dimension is only one macroblock, subtractions
based on v_edge_pos and the macroblock size may be negative. In
that situation, an unsigned comparison isn't sufficient to test for
MV overruns, because a limit of (unsigned)-1 will let any other
value pass.
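A small illustration with hypothetical helpers (not the actual mpegvideo code). Casting to unsigned lets one comparison catch both negative positions and positions past the limit, but only while the limit is non-negative. Here h = 17 stands for a 16-line block plus one extra row, as might be needed for half-pel interpolation (an assumption for the example), so a 16-pixel-high picture gives a limit of -1.

```c
/* Single unsigned comparison: catches src_y < 0 (it wraps to a huge value)
 * and src_y > limit, but only while the limit is >= 0. */
static int overruns_unsigned(int src_y, int v_edge_pos, int h)
{
    return (unsigned)src_y > (unsigned)(v_edge_pos - h);
}

/* Signed form: still correct when v_edge_pos - h is negative. */
static int overruns_signed(int src_y, int v_edge_pos, int h)
{
    return src_y < 0 || src_y > v_edge_pos - h;
}
```

With v_edge_pos = 16 and h = 17, the unsigned limit becomes UINT_MAX, so overruns_unsigned(0, 16, 17) reports no overrun even though the block reads one row past the edge.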
Signed-off-by: Anton Khirnov <anton@khirnov.net>
That way all mix levels as exported by the parser
will have the same meaning.
Previously the 3-bit center mix level for E-AC-3 was
used to index into a 4-entry table, leading to out-of-array reads.
This change removes the table and offsets the AC-3 variable by 4
so that it matches the E-AC-3 meanings, except for the reserved case.
The reserved case is then explicitly handled.
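A sketch of the mapping, with hypothetical function names. AC-3 center mix level codes 0, 1 and 2 mean -3, -4.5 and -6 dB; adding 4 lines them up with the corresponding entries on the 3-bit E-AC-3 scale. How the reserved AC-3 code 3 is handled here (mapped to index 5, the -4.5 dB entry) is an assumption for the sketch.

```c
/* AC-3: 2-bit code; 0, 1, 2 (-3, -4.5, -6 dB) become 4, 5, 6. */
static int ac3_cmixlev_to_common(unsigned code)
{
    if (code == 3)      /* reserved in AC-3: handled explicitly */
        return 5;       /* assumption: treat as -4.5 dB */
    return (int)code + 4;
}

/* E-AC-3 already uses the 3-bit scale; the mask keeps the value in 0..7,
 * so it can safely index an 8-entry table. */
static int eac3_cmixlev_to_common(unsigned code)
{
    return (int)(code & 7);
}
```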
Idea-by: Justin Ruggles <justin.ruggles@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Add a tool that uses avio to read and write, doing a plain copy of data
ARM: fix build with FFT enabled and MDCT disabled
lavf: force single-threaded decoding in avformat_find_stream_info
avidec: migrate last of lavf from FF_ER_* to AV_EF_*
avserver: fix build after the next bump.
Conflicts:
libavformat/Makefile
libavformat/avidec.c
libavformat/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mpeg12: check for available bits to avoid an infinite loop
fate: add some shorthands to run groups of tests
fate: Give some tests more sensible names.
cosmetics: Rename ffsink to avsink.
Conflicts:
avconv.c
cmdutils.c
cmdutils.h
ffmpeg.c
ffplay.c
tests/fate/audio.mak
tests/fate/demux.mak
tests/fate/dpcm.mak
tests/fate/image.mak
tests/fate/lossless-audio.mak
tests/fate/lossless-video.mak
tests/fate/microsoft.mak
tests/fate/pcm.mak
tests/fate/real.mak
tests/fate/screen.mak
tests/fate/video.mak
tests/fate/voice.mak
tests/fate/wma.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This one was missed in the previous fraps fix; the
allocation is exactly the same in both cases.
Fixes fraps-v5 under valgrind.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This is needed in case the get_buffer() callback doesn't set
width/height.
Ideally all decoders would call the callbacks through some wrapper,
and that wrapper would call ff_init_buffer_info().
But until that's done, the default reget buffer must call it
itself, as it needs the values for the changed-size check later.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
tta: cast output data pointer to the correct type
avconv: fix -frames for video encoders with delay.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The same as av_fast_malloc but uses av_mallocz and keeps extra
always-0 padding.
This does not mean the memory will be 0-initialized after each call,
only after each growth of the buffer.
However, this ensures that
a) all data anywhere in the buffer is always initialized
b) the padding is always 0
c) the user does not have to bother with adding the padding themselves
Fixes another valgrind warning about use of uninitialized data,
this time with fate-vsynth1-jpegls.
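A minimal sketch of the semantics described above, written with calloc()/free() so it stands alone; the function name, PAD_SIZE value and growth headroom are illustrative assumptions, not the libavutil code. It returns the buffer instead of taking a void ** so no pointer-type punning is needed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAD_SIZE 16 /* illustrative padding size */

/* Grow-only buffer with zeroed padding:
 * - on growth, the whole buffer is freshly zero-allocated, so every byte
 *   is initialized (old contents are NOT preserved);
 * - on every call, the PAD_SIZE bytes after min_size are re-zeroed, so the
 *   padding is 0 even if the buffer was used with a larger size before.
 * Returns the (possibly new) buffer, or NULL with *size == 0 on failure. */
static uint8_t *fast_padded_malloc(uint8_t *buf, size_t *size, size_t min_size)
{
    if (min_size > SIZE_MAX - PAD_SIZE) {
        free(buf);
        *size = 0;
        return NULL;
    }
    if (!buf || min_size + PAD_SIZE > *size) {
        size_t new_size = min_size + PAD_SIZE;
        /* Some headroom to avoid reallocating on every small growth. */
        if (new_size <= SIZE_MAX - new_size / 16 - 32)
            new_size += new_size / 16 + 32;
        free(buf);
        buf = calloc(1, new_size);
        if (!buf) {
            *size = 0;
            return NULL;
        }
        *size = new_size;
    }
    memset(buf + min_size, 0, PAD_SIZE);
    return buf;
}
```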
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This combination is quite odd and almost certainly a bug if
it happens.
Reviewed-by: Justin Ruggles <justin.ruggles@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
rv34: add NEON rv34_idct_add
rv34: 1-pass inter MB reconstruction
add SMJPEG muxer
avformat: split out common SMJPEG code
pictordec: Use bytestream2 functions
avconv: use avcodec_encode_audio2()
pcmenc: use AVCodec.encode2()
avcodec: bump minor version and add APIChanges for the new audio encoding API
avcodec: Add avcodec_encode_audio2() as replacement for avcodec_encode_audio()
avcodec: add a public function, avcodec_fill_audio_frame().
rv34: Intra 16x16 handling
rv34: Inter/intra MB code split
Conflicts:
Changelog
libavcodec/avcodec.h
libavcodec/pictordec.c
libavcodec/utils.c
libavcodec/version.h
libavcodec/x86/rv34dsp.asm
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Overall almost 4% faster, idct_add down from 350 to 85 cycles, idct_dc_add
down from 83 to 30 cycles.
squash: rv34 idct rearrange partial register loads