This does not seem to have an effect currently. Fate-h264 passes with
THREADS=1..16 and both threading types as before. It fixes however a
segfault during error resilience with my adaptive-frame-mt patchset.
A picture in use during error resilience gets realloced in another
thread in the fuzzed sample sample_varPAR.avi_s226019.
Dropping frames is undesirable but that is the only way by which the
decoder could return to low delay mode. Instead emit a warning and
continue with delayed frames.
Fixes a crash in fuzzed sample nasa-8s2.ts_s20033 caused by a larger
than expected has_b_frames value. Low delay keeps getting re-enabled
from a presumely broken SPS.
CC: libav-stable@libav.org
s->mb_x is reset to zero a couple of lines above. It does not make
sense to call ff_er_add_slice() with 0 as endx when the end of the
macroblock row was reached. Fixes unnecessary and counterproductive
error resilience in https://bugzilla.libav.org/show_bug.cgi?id=394.
CC: libav-stable@libav.org
Some invocations include a verb in the log message, others do not. Yet
av_log_missing_feature expects callers to provide a verb. Change the
function to include a verb instead and update the callers accordingly.
The result is a more natural function API and correct English in the
function invocations.
When decode_nal_units() previously encountered a NAL_END_SEQUENCE,
and there are some junk bytes left in the input buffer, but no start codes,
buf_index gets stuck 3 bytes before the end of the buffer.
This can trigger an infinite loop in the caller code, eg. in
try_decode_trame(), as avcodec_decode_video() then keeps returning zeroes,
with 3 bytes of the input packet still available.
With this change, the remaining bytes are skipped so the whole packet gets
consumed.
CC:libav-stable@libav.org
Signed-off-by: Jindřich Makovička <makovick@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It is possible in various error paths as well as gap handling
that this has already been allocated. It is not clear why that
would be a problem with the current code, thus disable the
assert to avoid a common assert failure when asserts are enabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
The h264_vdpau decoder crashed if output colorspace was not 8-bit 420.
Add a check to error out instead (current hardware does not support
other colorspaces, so successful decoding is not possible).
Signed-off-by: Martin Storsjö <martin@martin.st>
Write out the NAL decoding loops in full so that they are easier
to parse for a preprocessor without it having to be aware of macros
or other such things in C code.
This also makes the code more readable.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Instead of inlining everything into ff_h264_hl_decode_mb(), use
explicit templating to create versions of the called functions
with constant parameters filled in. This greatly speeds up
compilation of h264.c and reduces the code size without any
measurable impact on performance.
Compilation time for h264.c on an i7 goes from 30s to 5.5s.
Code size is reduced by 430kB.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Override the frame size from the SPS with AVCodecContext values
if the latter specify a size smaller by less than one macroblock.
This is required for correct cropping of MOV files from Canon cameras.
Signed-off-by: Mans Rullgard <mans@mansr.com>
If decoding a second complementary field, and the first was
decoded in our thread, mark decoding of that field as complete.
If decoding fails, mark the decoded field/frame as complete.
Do not allow switching between field modes or field/frame mode
between slices within the same field/frame. Ensure that two
subsequent fields cover top/bottom (rather than top/frame,
bottom/frame or such nonsense situations).
Fixes various deadlocks when decoding samples with errors in
reference frames.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Progressive images can have only 16 references, error out if there are
more, since the data is almost certainly corrupt, and the invalid value
will lead to random crashes or invalid writes later on.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Parsing the entire NAL as SPS fixes decoding of some AVC bitstreams
with broken escaping. Since the size of the NAL unit is known and
checked against the buffer end we can parse it entirely without buffer
overreads.
Fixes playback of
http://streams.videolan.org/streams/mp4/Mr_MrsSmith-h264_aac.mp4
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
This reverts commit 729ebb2f18.
There was an off-by-one error in the bit mask calculation clearing
actually the last valid bit and causing
http://bugzilla.libav.org/show_bug.cgi?id=227
The broken sample (Mr_MrsSmith-h264_aac.mp4) the commit was fixing
does not work after correcting the off-by-one error.
CC: libav-stable@libav.org
Fixes invalid reads while initializing the dequant tables, which uses
the bit depth to determine the QP table size.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Conversion of the luma intra prediction mode to one of the constrained
("alzheimer") ones can happen by crafting special bitstreams, causing
a crash because we'll call a NULL function pointer for 16x16 block intra
prediction, since constrained intra prediction functions are only
implemented for chroma (8x8 blocks).
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
This fixes standalone compilation of some decoders with --disable-optimizations.
cabac.h defines some inline functions that use symbols from cabac.c. Without
optimizations these inline functions are not eliminated and linking fails with
references to non-existing symbols.
Splitting the inline functions off into their own header and only #including
it in the places where the inline functions are used allows #including cabac.h
from anywhere without ill effects.
This fixes compilation failures related to START_TIMER/STOP_TIMER macros and
-Werror=declaration-after-statement. START_TIMER declares variables and thus
may not be placed after statements outside of a new block.
Adds a new member to MpegEncContext to hold the number of used slice
contexts. Fixes segfaults with '-threads 17 -thread_type slice' and
fate-vsynth{1,2}-mpeg{2,4}thread{,_ilace} with --disable-pthreads.
Trailing bits are likely to be non-zero if the NAL unit is truncated.
Clearing the bits make overreads of the bitstream less likely in this
case. Fixes playback of
http://streams.videolan.org/streams/mp4/Mr_MrsSmith-h264_aac.mp4 which
has a forbidden byte sequence of 0x00 0x00 0x00 in it SPS.
Start code emulation prevention is only required in Annex B bytestream
packed NAL units. For other coding formats the size is already known.
Looking for a start code prefix can result in false positives like in
http://streams.videolan.org/streams/mp4/Mr_MrsSmith-h264_aac.mp4
which has a false positive in the SPS.
The keyframe after a POC reset may not be the first to be returned to
the user. Therefore, don't reset the expected next POC once we return
a keyframe to the user, but once we know that the next frame in the
return-queue is a keyframe.
A new field, AVCodecContext.internal is used to hold a new struct
AVCodecInternal, which has private fields that are not codec-specific and are
used by general libavcodec functions.
Moved internal_buffer, internal_buffer_count, and is_copy.
Fixes the following conformance suite samples:
HCBP1_HHI_A.264, HCBP2_HHI_A.264, HCMP1_HHI_A.264 (main)
HCHP1_HHI_B.264, HCHP2_HHI_A.264, HCHP3_HHI_A.264 (frext)
Replace our incomplete w32threads implementation with x264's pthreads
w32threads wrapper.
Relicensed to LGPL with kind permission by Pegasys Inc.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
This allows concurrent decoding of the last field/frame, rather than
only the last slice, of data packets with multiple NAL units packed
together.
This will fix the slowdown reported in e.g. bug 52.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Correct computation of implicit weight tables when referencing pictures
that are marked for long reference.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
High bitdepth H.264 needs 32-bit transform coefficients, whereas
dnxhd does not. This creates a conflict with the templated
functions operating on DCTELEM data. This patch adds a field
allowing the caller to choose the element size in dsputil_init()
and adds the required functions.
Signed-off-by: Mans Rullgard <mans@mansr.com>
FF_COMMON_FRAME holds the contents of the AVFrame structure and is also copied
to struct Picture. Replace by an embedded AVFrame structure in struct Picture.
By observation it did not seem to handle prev_frame_num > frame_num.
This does not affect any files I have.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
One of the causes of this bug is that the h264 parser defaults low_delay
to 1, but the h264 codec defaults low_delay to 0. Really Ugly.
After many hours of looking at this, I'm still not sure how has_b_frames
is *intended* to behave, but to me the implementation appears way more
complicated than it ought to be.
My patch relies on the encoder to set an optional field in the SPS. This
works for libx264 streams, but I'm not sure that all h264 encoders will
set it.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
When backing up the top-left border, check that the top-left
(rather than left) MB indeed does belong to our slice. If it
doesn't, backing up has no positive effect but may accidentally
interfere with other threads writing in the same space.
Fixes occasional one-off effects when enabling slice-MT.
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
In high bit depth, the QP values may now be up to (51 + 6*(bit_depth-8)).
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
In high bit depth the pixels will not be stored in uint8_t like in the
normal case, but in uint16_t. The pixel size is thus 1 in normal bit
depth and 2 in high bit depth.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
It is pretty hopeless that other considerable projects will adopt
libavutil alone in other projects. Projects that need small footprint
are better off with more specialized libraries such as gnulib or rather
just copy the necessary parts that they need. With this in mind, nobody
is helped by having libavutil and libavcore split. In order to ease
maintenance inside and around FFmpeg and to reduce confusion where to
put common code, avcore's functionality is merged (back) to avutil.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
Don't free RBSP tables (containing decoded NAL units) on resolution
change, because we actually need this data to decode the frame after
reiniting (with new resolution). Fixed issue 2393.
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
It's incomplete, no one is working on it, and when someone asks about
working on it we advise them not to.
Signed-off-by: Mans Rullgard <mans@mansr.com>
No speed improvement, but necessary for some future stuff.
Also opens up the possibility of asm chroma dc idct/dequant.
Originally committed as revision 26349 to svn://svn.ffmpeg.org/ffmpeg/trunk
Doesn't help speed as there isn't an asm implementation yet, but consistency
is a good thing.
Originally committed as revision 26348 to svn://svn.ffmpeg.org/ffmpeg/trunk
Useful so that we don't have to run the hierarchical DC iDCT if there aren't
any coefficients. Opens up some future opportunities for optimization as well.
Originally committed as revision 26337 to svn://svn.ffmpeg.org/ffmpeg/trunk
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
It was an ugly hack to begin with and didn't give any performance.
NOTE: this patch opens up some future simplifications to be made (such as
removing some of the scantables from H264Context) but doesn't take advantage
of them yet.
Originally committed as revision 26329 to svn://svn.ffmpeg.org/ffmpeg/trunk
svq3 still doesn't support multithreading, but it's simpler for clients if
they can enable threading for all codecs by default.
Originally committed as revision 26015 to svn://svn.ffmpeg.org/ffmpeg/trunk
Contrary to progressive, just being able to crop up to 14/15 pixels
is not enough to encode all supported resolutions, and the new
behaviour is also consistent with e.g. MPEG-2 etc.
Originally committed as revision 25669 to svn://svn.ffmpeg.org/ffmpeg/trunk
r25218 made assumptions about the existence of past reference frames that
weren't necessarily true.
Originally committed as revision 25243 to svn://svn.ffmpeg.org/ffmpeg/trunk
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.
Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
The function is only used within that file, so it makes sense to place
it there. This fixes many warnings of the type:
h264.h:1170: warning: ‘fill_filter_caches’ defined but not used
Originally committed as revision 22876 to svn://svn.ffmpeg.org/ffmpeg/trunk
start of decoding a picture instead of at the end.
Fixes mmco01.264
Patch by Stephen Warren
Originally committed as revision 22728 to svn://svn.ffmpeg.org/ffmpeg/trunk
change, not only size changes.
Patch by Janusz Krzysztofik foo=zyszt <jkr$foo@tis.icnet.pl>.
Originally committed as revision 22597 to svn://svn.ffmpeg.org/ffmpeg/trunk
This moves the H264-specific functions from DSPContext to the new
H264DSPContext. The code is made conditional on CONFIG_H264DSP
which is set by the codecs requiring it.
The qpel and chroma MC functions are not moved as these are used by
non-h264 code.
Originally committed as revision 22565 to svn://svn.ffmpeg.org/ffmpeg/trunk
Previously, the area of a lost slice would be left at the slice number of the previous
frame which could occasionally match the number of the next slice and thus a non existing
slice could have been used for prediction leading to additional decoding errors in otherwise
undamaged slices.
Originally committed as revision 22483 to svn://svn.ffmpeg.org/ffmpeg/trunk
This fixes playback of such streams with ffplay (but does not affect
current ffmpeg).
Patch by Janusz Krzysztofik, jkrzyszt A tis D icnet D pl
Originally committed as revision 22112 to svn://svn.ffmpeg.org/ffmpeg/trunk
about 5 cpu cycles slower in the local code but should be overall faster
due to reduced cache use. (my sample though has too few intra4x4 blocks
for this to be meassureable easily either way)
Originally committed as revision 22052 to svn://svn.ffmpeg.org/ffmpeg/trunk
The code read/write code itself was 1 cycle faster, overall its
likely more due to cache effects
Originally committed as revision 22048 to svn://svn.ffmpeg.org/ffmpeg/trunk
This eliminates all aliasing violation warnings in h264 code.
No measurable speed difference with gcc-4.4.3 on i7.
Originally committed as revision 21881 to svn://svn.ffmpeg.org/ffmpeg/trunk
the row code. This function would only be needed on a MB basis for MBAFF+FMO
Originally committed as revision 21860 to svn://svn.ffmpeg.org/ffmpeg/trunk
This should fix a segfault, also it might be faster on systems where the
+52 wasnt free.
Originally committed as revision 21406 to svn://svn.ffmpeg.org/ffmpeg/trunk
loop filter. This removes one obstacle of getting ff_h264_filter_mb_fast()
bitexact. code is maybe 0.1% faster
Originally committed as revision 21280 to svn://svn.ffmpeg.org/ffmpeg/trunk
Run loop filter per row instead of per MB, this also should make it
much easier to switch to per frame filtering and also doing so in a
seperate thread in the future if some volunteer wants to try.
Overall decoding speedup of 1.7% (single thread on pentium dual / cathedral sample)
This change also allows some optimizations to be tried that would not have
been possible before.
Originally committed as revision 21270 to svn://svn.ffmpeg.org/ffmpeg/trunk
Seems to speed the code up a little...
The placement of many generic functions between h264.c and h264.h is still open
Currently they are a little randomly placed between them.
Originally committed as revision 21178 to svn://svn.ffmpeg.org/ffmpeg/trunk
called once per MB in worst case and doesnt seem to benefit from static inline.
Actually the code might be a hair faster now (0.1% according to my benchmark but
this could be random noise)
Originally committed as revision 21173 to svn://svn.ffmpeg.org/ffmpeg/trunk
no speedloss meassured, also its really not touching anything that is speed relevant.
Originally committed as revision 21169 to svn://svn.ffmpeg.org/ffmpeg/trunk
No speedloss meassured (its slightly faster here but that may be random fluctuations)
Originally committed as revision 21165 to svn://svn.ffmpeg.org/ffmpeg/trunk
functions called more than per mb are moved into the header, scan8 is also
as it must be known at compiletime.
The code after this patch duplicates h264data.h, this has been done to minimize
the changes in this step and allow more fine grained benchmarking.
Speedwise this is 1% faster on my pentium dual core with diegos cursed cathedral
sample.
Originally committed as revision 21157 to svn://svn.ffmpeg.org/ffmpeg/trunk
decoder which allows their usage without checking profile_idc.
Patch by Laurent Aimar (fenrir (AT) videolan org)
Originally committed as revision 21107 to svn://svn.ffmpeg.org/ffmpeg/trunk