Reduces the number of calls to tmvp derivation from 933685 to 586271 on
a sequence.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The position is either rounded or not checked, so delay the wait to
check the proper value.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
After finishing parsing VPS/SPS/PPS/slice header, check remaining bits,
and if an overconsumption occurred, report invalid data.
Liked-by: BBB
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'cc1d8c54c19dd14fb851e3e7a7793d6b3bd75e94':
avcodec: Postpone FF_IDCT_XVIDMMX removal until the next version bump
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This makes the SPS parsing a little, but barely, safer.
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
They might be left uninitialized otherwise since 3ad04608.
Fixes ticket #3840.
Found-by: Carl Eugen Hoyos <ce@hoyos.ws>
Reported-by: Piotr Bandurski <ami_stuff@o2.pl>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
mplayer-specifc hacks should not be in our codebase. mplayer should fix
its own code. It is not our responsibility to work around their broken
code.
This reverts commit e8e575633f.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Only use PAL8 if palette is present, else use GRAY8 for pixfmt.
Instead of simulating a grayscale palette, use real grayscale pixels, if no
palette is actually defined.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
It previously used the output, cropped size, causing overreads/writes.
Fixes ticket #3839.
This issue was introduced by d249e682, which is not part of any release
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It causes build failures in some cases and the functions are provided by
libavutil so the wraper should not be needed anymore
Found-by: jamrial
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Some applications still use this deprecated API
Its not nice to remove it when its still in use and as long as it doesnt
cause us any work to keep it.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This prevents a build failure when bumping.
the uses could easily be updated / removed, if people prefer.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
(cherry picked from commit eedc3f36532e4c6de782fe1c2dc59d192418a8fc)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f4c444e17d137c786f0ed2da0e5943df505d5f9e':
Postpone API-incompatible changes until the next bump.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a7a17e3f1915ce69b787dc58c5d8dba0910fc0a4':
hevc_filter: move some conditions out of loops
Conflicts:
libavcodec/hevc_filter.c
This is possibly less readable than the variant used before.
Thus please take a look and if people agree its worse, dont
hesitate to revert.
See: 83976e40e8
Merged-by: Michael Niedermayer <michaelni@gmx.at>
1) each of the loops run within a single CTB, so the relevant reference
list is constant
2) when that CTB is, or lies on the same slice as, the current one, we
can use a simple access instead of a relatively expensive call to
ff_hevc_get_ref_list()
* commit '84d173d3de97c753234ab0c0b50551d51413d663':
xvididct: Ensure that the scantable permutation is always set correctly
Merged-by: Michael Niedermayer <michaelni@gmx.at>
It allows attaching other external, opaque data to the frame and passing it
through the reordering process, for cases when the caller wants other data
than just the plain packet pts. There is no way to cleanly achieve this
without the field.
Used to expose ff_raw_pix_fmt_tags[] to other libav* libraries
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c7d9b473e28238d4a4ef1b7e8b42c1cca256da36':
cdgraphics: do not return 0 from the decode function
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The input data must remain constant, make a copy instead. This is in
theory a performance hit, but since I failed to find any samples
using this feature, this should not matter in practice.
Also, check the size of the header, avoiding invalid reads on truncated
data.
CC:libav-stable@libav.org
The x86 asm expects int32_t so use that type.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The previous implementation of the parser made four passes over each input
buffer (reduced to two if the container format already guaranteed the input
buffer corresponded to frames, such as with MKV). But these buffers are
often 200K in size, certainly enough to flush the data out of L1 cache, and
for many CPUs, all the way out to main memory. The passes were:
1) locate frame boundaries (not needed for MKV etc)
2) copy the data into a contiguous block (not needed for MKV etc)
3) locate the start codes within each frame
4) unescape the data between start codes
After this, the unescaped data was parsed to extract certain header fields,
but because the unescape operation was so large, this was usually also
effectively operating on uncached memory. Most of the unescaped data was
simply thrown away and never processed further. Only step 2 - because it
used memcpy - was using prefetch, making things even worse.
This patch reorganises these steps so that, aside from the copying, the
operations are performed in parallel, maximising cache utilisation. No more
than the worst-case number of bytes needed for header parsing is unescaped.
Most of the data is, in practice, only read in order to search for a start
code, for which optimised implementations already existed in the H264 codec
(notably the ARM version uses prefetch, so we end up doing both remaining
passes at maximum speed). For MKV files, we know when we've found the last
start code of interest in a given frame, so we are able to avoid doing even
that one remaining pass for most of the buffer.
In some use-cases (such as the Raspberry Pi) video decode is handled by the
GPU, but the entire elementary stream is still fed through the parser to
pick out certain elements of the header which are necessary to manage the
decode process. As you might expect, in these cases, the performance of the
parser is significant.
To measure parser performance, I used the same VC-1 elementary stream in
either an MPEG-2 transport stream or a MKV file, and fed it through avconv
with -c:v copy -c:a copy -f null. These are the gperftools counts for
those streams, both filtered to only include vc1_parse() and its callees,
and unfiltered (to include the whole binary). Lower numbers are better:
Before After
File Filtered Mean StdDev Mean StdDev Confidence Change
M2TS No 861.7 8.2 650.5 8.1 100.0% +32.5%
MKV No 868.9 7.4 731.7 9.0 100.0% +18.8%
M2TS Yes 250.0 11.2 27.2 3.4 100.0% +817.9%
MKV Yes 149.0 12.8 1.7 0.8 100.0% +8526.3%
Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
case does show a larger absolute improvement though, since it was worse
to begin with.
This patch has been tested with the FATE suite (albeit on x86 for speed).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>