On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
C: 78902 decicycles in idct, 262071 runs, 73 skips
avx: 32478 decicycles in idct, 262045 runs, 99 skips
Difference between the 2:
stddev: 0.39 PSNR:104.47 MAXDIFF: 2
This is unavoidable and due to the scale factors used in the x86
version, which cannot match the C ones.
In addition, the trick of adding an initial bias to the input of a
pass can overflow, as the input coefficients are already 15bits,
which is the maximum this function can handle.
Overall, however, the omse on 12 bits samples goes from 0.16916 to
0.16883. Reducing rowshift by 1 improves to 0.0908, but causes
overflows.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Modeled from the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires adding offsets in different places compared to
prores or C, and makes the function approximately 2% slower.
For 16 frames of a DNxHD 4:2:2 10bits test sequence:
C: 60861 decicycles in idct, 1048205 runs, 371 skips
sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
avx: 26272 decicycles in idct, 1048171 runs, 405 skips
The add version is not implemented, so the corresponding dsp
function is set to NULL to make that obvious to any code trying to
execute it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When the input of a pass has 15 or 16 bits of precision (in particular
the column pass), the addition of a bias to W4 may lead to overflows
in the input to pmaddwd.
This requires postponing the addition of the bias until after the first
butterfly. To do so, exploit the fact that m15 is zeroed but otherwise
unused. If the pass is safe, an address can be used directly and the
number of xmm regs can be decreased. Otherwise, the 32-bit bias is
loaded into m15.
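The overflow can be illustrated with a scalar analogy in C. This is only a sketch under stated assumptions, not the x86 code: W4, SHIFT, the bias values and the function names are hypothetical, and wrap16() merely emulates what a packed 16-bit add would do.

```c
#include <stdint.h>

/* Hypothetical constants, chosen only to make the arithmetic easy to follow. */
#define W4     16384
#define SHIFT  15
#define BIAS32 (1 << (SHIFT - 1))  /* rounding bias in the 32-bit domain */
#define BIAS16 (BIAS32 / W4)       /* the same bias folded into the 16-bit input */

/* Emulate 16-bit wraparound, as a packed 16-bit add would do. */
int16_t wrap16(int32_t v)
{
    v &= 0xFFFF;
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

/* Bias added to the 16-bit input before the multiply: wraps near INT16_MAX. */
int32_t scale_bias_first(int16_t coef)
{
    return (wrap16(coef + BIAS16) * W4) / (1 << SHIFT);
}

/* Bias added in 32 bits after the multiply: no 16-bit overflow is possible. */
int32_t scale_bias_last(int16_t coef)
{
    return (coef * W4 + BIAS32) / (1 << SHIFT);
}
```

For small inputs both variants agree. For an input that already uses the full 16-bit range, the first variant wraps and flips sign, while the second stays correct.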
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
omse goes from 0.03060703 (which fails for dct-test) to 0.01663750.
This also actually improves the error of decoding the sample generated
by fate-vsynth3-dnxhd1080i-10bit using simple_idct10 to FAANI, which
goes (when resampled to yuv422p) from:
stddev: 0.06 PSNR: 72.28 MAXDIFF: 1
to identical.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This should be reused for a generic simple_idct10 function.
Requires a bit of trickery to declare common constants in C.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Bare ampersand characters are still accepted, even though out-of-spec.
Also fixes adjacent tags not being parsed.
Fixes trac #4915
Signed-off-by: Ricardo Constantino <wiiaboo@gmail.com>
This commit adds the ability for a profile to set the default
options, as well as for the user to override such options
by simply stating them in the command line while still keeping
the same profile, as long as those options are still permitted by
the profile.
Example: setting the profile to aac_low (the default) will turn
PNS and IS on. They can be disabled by -aac_pns 0 and -aac_is 0,
respectively. Turning on -aac_pred 1 will cause the profile to be
elevated to aac_main, as long as no options forbidding aac_main
have been entered (like AAC-LTP, which will be pushed soon).
A useful feature is that by setting the profile to mpeg2_aac_low,
all MPEG4 features will be disabled and if the user tries to enable
them then the program will exit with an error. This profile is
signalled with the same bitstream as aac_low (MPEG4) but some devices
and decoders will fail if any MPEG4 features have been enabled.
This commit implements support for 7.1 channel audio. There are no
more predefined bitstream channel mappings so going beyond 8 channels
(and 7 channels exactly) will require programmable channel elements,
which is already underway.
The bulk of calls to quantize_band_cost are replaced
by a call to a version that memoizes, greatly improving
performance, since during coefficient search there is
a great deal of repeat work.
Memoization cannot always be applied, so do this in a
different function, and leave the original as-is.
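A minimal sketch of the memoization idea, assuming a cache keyed by band and scalefactor index and invalidated with a generation counter. All names here (CostCache, cached_cost, expensive_cost) are hypothetical and the cost function is a dummy stand-in; this is not the encoder's actual code.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BANDS 16   /* hypothetical sizes, kept small for the sketch */
#define NUM_SF    32

typedef struct CostEntry {
    float    cost;
    int      cb;          /* codebook the cost was computed for */
    uint16_t generation;  /* valid only if it matches the cache generation */
} CostEntry;

typedef struct CostCache {
    CostEntry entry[NUM_SF][NUM_BANDS];
    uint16_t  generation;
    unsigned  misses;     /* counts how often the real work was done */
} CostCache;

/* Dummy stand-in for the expensive quantization cost computation. */
float expensive_cost(int band, int sf, int cb)
{
    return (float)(band * 31 + sf * 7 + cb);
}

/* Invalidate every entry in O(1) by bumping the generation. */
void cache_reset(CostCache *c)
{
    if (++c->generation == 0) {          /* counter wrapped: really clear */
        memset(c->entry, 0, sizeof(c->entry));
        c->generation = 1;
    }
}

/* Return the memoized cost, computing it only on a miss. */
float cached_cost(CostCache *c, int band, int sf, int cb)
{
    CostEntry *e = &c->entry[sf][band];
    if (e->generation != c->generation || e->cb != cb) {
        e->cost       = expensive_cost(band, sf, cb);
        e->cb         = cb;
        e->generation = c->generation;
        c->misses++;
    }
    return e->cost;
}
```

Keeping the cached variant separate leaves the uncached function available for callers whose inputs change between calls.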
Intermediate results can indeed violate SF delta. Instead of asserting
there, just make the code safe, and assert on the final result.
Also re-clamp SFs more often in short windows (which tend to violate
the restriction when encoding the switch from one window to the other)
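As an illustration of "make the code safe, then assert on the final result", here is a sketch that re-clamps scalefactors so that consecutive values never differ by more than a maximum delta. The limit of 60 and the function names are assumptions made for this example, and the real coder also has to account for zeroed and special bands.

```c
#define SF_MAX_DELTA 60  /* assumed maximum delta between consecutive scalefactors */

static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Clamp each scalefactor to within SF_MAX_DELTA of its predecessor.
 * Returns the number of values that had to be changed. */
int clamp_sf_deltas(int *sf, int n)
{
    int changed = 0;
    for (int i = 1; i < n; i++) {
        int c = clip_int(sf[i], sf[i - 1] - SF_MAX_DELTA, sf[i - 1] + SF_MAX_DELTA);
        if (c != sf[i]) {
            sf[i] = c;
            changed++;
        }
    }
    return changed;
}

/* Final check, suitable for an assert once the search has finished. */
int sf_deltas_valid(const int *sf, int n)
{
    for (int i = 1; i < n; i++)
        if (sf[i] - sf[i - 1] > SF_MAX_DELTA || sf[i - 1] - sf[i] > SF_MAX_DELTA)
            return 0;
    return 1;
}
```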
It was merged with the iff_ilbm decoder in commit
929a24efff.
Define AV_CODEC_ID_IFF_BYTERUN1 as AV_CODEC_ID_IFF_ILBM for API
compatibility.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This finalizes merging of the work in the patches in ticket #2686.
Improvements to twoloop and RC logic are extensive.
The non-exhaustive list of twoloop improvements includes:
- Tweaks to distortion limits on the RD optimization phase of twoloop
- Deeper search in twoloop
- PNS information marking to let twoloop decide when to use it
(turned out having the decision made separately wasn't working)
- Tonal band detection and prioritization
- Better band energy conservation rules
- Strict hole avoidance
For rate control:
- Use psymodel's bit allocation to allow proper use of the bit
reservoir. Don't work against the bit reservoir by moving lambda
in the opposite direction when psymodel decides to allocate more/less
bits to a frame.
- Retry the encode if the effective rate lies outside a reasonable
margin of psymodel's allocation or the selected ABR.
- Log average lambda at the end. Useful info for everyone, but especially
for tuning of the various encoder constants that relate to lambda
feedback.
Psy:
- Do not apply lowpass with a FIR filter, instead just let the coder
zero bands above the cutoff. The FIR filter induces group delay,
and while zeroing bands causes ripple, it's lost in the quantization
noise.
- Experimental VBR bit allocation code
- Tweak automatic lowpass filter threshold to maximize audio bandwidth
at all bitrates while still providing acceptable, stable quality.
I/S:
- Phase decision fixes. Unrelated to #2686, but the bugs only surfaced
when the merge was finalized. Measure I/S band energy accounting for
phase, and prevent I/S and M/S from being applied both.
PNS:
- Avoid marking short bands with PNS when they're part of a window
group in which there's a large variation of energy from one window
to the next. PNS can't preserve those and the effect is extremely
noticeable.
M/S:
- Implement BMLD protection similar to that specified in
ISO-IEC/13818:7-2003, Appendix C Section 6.1. Since M/S decision
doesn't conform to section 6.1, a different method had to be
implemented, but should provide equivalent protection.
- Move the decision logic closer to the method specified in
ISO-IEC/13818:7-2003, Appendix C Section 6.1. Specifically,
make sure M/S needs less bits than dual stereo.
- Don't apply M/S in bands that are using I/S
Now, this of course needed adjustments in the compare targets and
fuzz factors of the AAC encoder's fate tests, but if wondering why
the targets go up (more distortion), consider the previous coder
was using too many bits on LF content (far more than required by
psy), and thus those signals will now be more distorted, not less.
The extra distortion isn't audible though; I carried out extensive
ABX testing to make sure.
A very similar patch was also extensively tested by Kamendo2 in
the context of #2686.
This should fix the undefined behavior reported in:
https://trac.ffmpeg.org/ticket/4727.
I can reproduce this at runtime: simply stick in an abort call in
asym_quant to check if c < 0 and run FATE. I don't know ac3 so I can't
confirm if negative coefficients are intentional, but at the moment they
clearly are according to FATE.
This resolves the undefined behavior. Tested with FATE.
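For reference, left-shifting a negative signed value is undefined behavior in C. One common way to avoid it is to multiply by a power of two instead. The helper below is a sketch with a hypothetical name, not necessarily the exact change made here, and it still requires the result to fit in 32 bits.

```c
#include <stdint.h>

/* Scale c by 2^e without left-shifting a negative value.
 * Precondition (assumed): 0 <= e <= 30 and the product fits in int32_t. */
int32_t scale_pow2(int32_t c, unsigned e)
{
    return c * (1 << e);
}
```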
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The pps offset is used to locate the pps in the spspps_buf. However, the
current calculation is wrong because it yields the offset within the
original avctx->extradata.
When there is only one sps in the avcc, the value is correct by
coincidence, but it fails for an avcc with multiple sps.
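To see why a single-SPS assumption breaks, consider the avcC layout: a 5-byte header, an SPS count in the low 5 bits of byte 5, then one 16-bit big-endian length plus payload per SPS, and only then the PPS count. The sketch below walks the original avcC record to find where the PPS section starts; it does not model spspps_buf, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the PPS count byte inside an avcC record,
 * or -1 if the buffer is truncated. */
long avcc_pps_offset(const uint8_t *buf, size_t size)
{
    size_t pos = 6;                      /* first SPS length field */
    unsigned nb_sps, i;

    if (size < 7)
        return -1;
    nb_sps = buf[5] & 0x1f;

    for (i = 0; i < nb_sps; i++) {       /* skip every SPS, not just the first */
        size_t len;
        if (size - pos < 2)
            return -1;
        len  = ((size_t)buf[pos] << 8) | buf[pos + 1];
        pos += 2;
        if (size - pos < len)
            return -1;
        pos += len;
    }
    return pos < size ? (long)pos : -1;
}
```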
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This fixes a warning observed on Clang 3.7:
"warning: attribute 'deprecated' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]"
and thus enables deprecation warning for the relevant struct.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This should fix the first undefined behavior reported in:
https://trac.ffmpeg.org/ticket/4727.
I can't reproduce the runtime behavior reported in the ticket, hence I
can't confirm that this actually fixes the exact issue reported in the
ticket.
Regardless, I can confirm that this is a genuine issue, and that
negative shifts can (and do) occur, fixed by this.
Tested with FATE.
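Shifting by a negative count is undefined in C, as is shifting by the width of the type or more. A generic guard can look like the sketch below; the helper name and the choice of returning 0 for out-of-range counts are assumptions made for illustration, not the fix applied here.

```c
#include <stdint.h>

/* Shift left for positive s, right for negative s, never by an invalid count. */
uint32_t shift_by(uint32_t v, int s)
{
    if (s >= 32 || s <= -32)
        return 0;                 /* every bit is shifted out */
    return s >= 0 ? v << s : v >> -s;
}
```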
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There is not much reason to generate such a small table at runtime.
Signed-off-by: Derek Buitenhuis <derekb@vimeo.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The MBAFF handling recently introduced on the decoder side shows that
the encoder does not support it correctly. Therefore, make the related
profile experimental.
Furthermore, current encoder logic treats it as unable to encode as
progressive, which isn't the case.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
MBAFF-like handling of interlaced content in CID 1260 is different from
the other CIDs, and in particular doesn't use the same syntax.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Profiles 1256 & 1270 (currently) signal at the frame header and MB
levels the colorspace used, either RGB or YUV. While a MB-level
varying colorspace is not supported, whether it is constant can be
tracked so as to determine the exact colorspace.
This requires having bitdepth and the ACT and 4:4:4 flags, in turn
needing the CID. Because setting those before having validated
enough things may result in invalid/unset DSP functions, setting
the bitdepth in the context is delayed.
It is not tested against a true RGB sequence, though.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
place primary audio coding header data into DCAAudioHeader
structure to make DCAContext clearer
and move channel related data to DCAChan structure to make
them easier to use by extensions
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Do not fail when original resolution is smaller than current one,
as the frame buffer is resized automatically.
Signed-off-by: Vittorio Giovara <vittorio.giovara at gmail.com>
Modified datatype of function argument (pitch from int32_t to ptrdiff_t)
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Modified sps and pps access from old HEVCContext(s) structure to newly introduced HEVCParamSets(ps)
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The -threads option is ignored with libkvazaar since it does not have
any of the AV_CODEC_CAP_{FRAME,SLICE,AUTO}_THREADS capabilities. This
commit removes the incorrect documentation as well as the no-op of
setting the number of threads in libkvazaar encoder.
Signed-off-by: Arttu Ylä-Outinen <arttu.yla-outinen@tut.fi>
The divisor and dividend in the equation had been swapped, making the
result the inverse of the actual framerate.
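The relationship is simple: if the time base is the duration of one frame, the framerate is its reciprocal. The sketch below uses a hypothetical Rational type and function name, and assumes one tick per frame.

```c
typedef struct Rational {
    int num;
    int den;
} Rational;

/* The framerate is the reciprocal of the time base: den / num, not num / den. */
double framerate_from_time_base(Rational tb)
{
    return (double)tb.den / tb.num;
}
```

With a time base of 1/25, this gives 25 frames per second; the swapped version would give 0.04.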
Signed-off-by: Arttu Ylä-Outinen <arttu.yla-outinen@tut.fi>
Changes function libkvazaar_encode to return proper error codes instead
of crashing when the video dimensions or pixel format change in the
middle of encoding.
Signed-off-by: Arttu Ylä-Outinen <arttu.yla-outinen@tut.fi>
Function encoder_encode in Kvazaar API was changed to have new output
parameters: source picture and frame info. Frame info is used to set the
keyframe flag and source picture is ignored.
Signed-off-by: Arttu Ylä-Outinen <arttu.yla-outinen@tut.fi>
This saves one register in a few cases on 32bit builds with unaligned
stack (e.g. MSVC), making the code slightly easier to maintain.
(Can someone please test this on 32bit+msvc and confirm make fate-vp9
and tests/checkasm/checkasm still work after this patch?)
This patch moves the pointer validity check outside the macro,
and silences the -Waddress observed with GCC 5.2.
Note that this changes the error message slightly, from:
"bad option..." to "Error parsing option...".
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
The structure is copied around and that triggers warnings if it is uninitialized
Fixes CID1322360
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
These are DNxHR profiles with the following properties:
- Variable size in a profile (property added in a previous commit),
requiring variable-sized macroblock table;
- Variable bitdepth, up to 12 bits;
- Better validation of buffer sizes and positions.
Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is a 4:4:4 10 bits profile, where image size is not fixed by the
profile, and which strays a bit outside the old frame header parsing
code.
Fixes ticket #4581 (DNxHR is not strictly supported, but that sequence is).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Currently not used, but will be used to indicate that a CIDEntry field
is not set, because it is variable, and that checks should be adapted.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Move the 'interlaced' flag to this element (arbitrarily set to 16bits).
This should allow better detection/selection of profiles.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit '39f01e346cab464ef6c0d4ec58cc13b7123e60d8':
mmaldec: be more tolerant against MMAL not returning decoded output
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '65db4899fa8790049bec3af16ecdb75dd81051fd':
mmaldec: refactor to have more context per MMAL input buffer
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'abe9adfb31566c415fd830a8d4977c79512d4385':
rangecoder: Use AV_RB16 instead of bytestream_get_be16
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '4628443ca3534060888dd0015b229337eac13fd2':
h263: Drop uninitialized variable use from log message
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'bb198c4997d5036f3bf91de51e44f807115677d0':
d3d11va: make av_d3d11va_alloc_context() available at all times
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
In some situations, MMAL won't return a decoded frame for certain input
frames. This can happen if a frame fails to decode, or if a packet does
not actually contain a complete frame. In these situations, we would
deadlock (or actually timeout) waiting for an expected output frame,
which is not ideal. On the other hand, there are situations where we
definitely have to block to avoid deadlocks. (This mess is a
consequence of trying to map MMAL's asynchronous and flexible
dataflow to libavcodec, which is more static and rigid.)
Solve this by doing a blocking wait only if the amount of buffered data
is too big. The whole purpose of the blocking wait is to avoid excessive
buffering of input data, so we can skip it if it appears to be low. The
consequence is that libavcodec can gracefully return no frame to the
API user.
We want to track the number of full packets to make our heuristic work.
But MMAL buffers are fixed-size, requiring splitting large packets. This
is why the previous commit is needed. We use the ..._FRAME_END flag to
remember packet boundaries, but MMAL does not preserve these buffer
flags when returning buffers to the user.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The next commit needs 1 bit of additional information per MMAL buffer
sent to the MMAL input port. This information will be needed when the
buffer is recycled (i.e. returned by the input port's callback).
Normally, we could use MMAL_BUFFER_HEADER_FLAG_USER0, but that is
unexpectedly not preserved.
Do this by storing a pointer to FFBufferEntry in the MMAL buffer's
user data, instead of an AVBufferRef. This also changes the lifetime
of FFBufferEntry.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
When compiled with --disable-pthreads, e.g
http://fate.ffmpeg.org/report.cgi?time=20150917015044&slot=alpha-debian-qemu-gcc-4.7,
a bunch of -Wunused-function warnings are reported due to missing header guards
around threading related functions.
This patch should silence such warnings.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
The intended meaning is "if this block is the first block in a slice then
its left boundary is a slice boundary". Silence a logical-not-parentheses
warning from gcc.
Silence a warning due to frame assignment in dvenc. All uses of the
reference in dvdec are read only, except the ones in the main decoding
function, so use the frame pointer directly there.
Assumes 'GA94' format (ATSC standard)
Signed-off-by: DHE <git@dehacked.net>
Tested-by: Anshul <anshul.ffmpeg@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit 'd0a3e89d41b05f9ed0e7401c352b60ed4f4d1ed5':
dcadec: make a number of samples per subband per subsubframe a named constant
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
Fixes: https://trac.ffmpeg.org/attachment/ticket/685/movie.264
In the available testcase the actual PPS only uses a few bits
while there are 7kbyte of apparently random data after it
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This affects Annex B streams (such as demuxed from .ts and others). It
also handles the format change in reinit-large_420_8-to-small_420_8.h264
correctly.
Instead of passing through the extradata, create it on the fly from
the currently active SPS and PPS. Since reconstructing the PPS and SPS
NALs would be very complicated and verbose, we use the NALs as they
originally appeared in the bitstream.
The code for writing the extradata is somewhat derived from
libavformat/avc.c, but it's small and different enough that sharing it
is not really worth it.
We assume an upper bound of 4096 bytes for each raw SPS/PPS. It's hard
to determine an exact maximum size, but this value was considered
high enough and safe.
Needed for the following VideotoolBox commit.
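A minimal sketch of writing avcC-style extradata from one raw SPS and one raw PPS NAL, reusing the 4096-byte bound mentioned above. The function name and the single-SPS/single-PPS restriction are assumptions of this example, not a description of the code in this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PS_SIZE 4096  /* assumed upper bound for one raw SPS or PPS */

/* Write an avcC record holding one SPS and one PPS.
 * Returns the number of bytes written, or 0 on invalid input. */
size_t write_avcc(uint8_t *out, size_t out_size,
                  const uint8_t *sps, size_t sps_size,
                  const uint8_t *pps, size_t pps_size)
{
    size_t need = 11 + sps_size + pps_size;
    uint8_t *p = out;

    if (sps_size < 4 || sps_size > MAX_PS_SIZE ||
        pps_size < 1 || pps_size > MAX_PS_SIZE || out_size < need)
        return 0;

    *p++ = 1;                        /* configuration version */
    *p++ = sps[1];                   /* profile */
    *p++ = sps[2];                   /* profile compatibility */
    *p++ = sps[3];                   /* level */
    *p++ = 0xff;                     /* reserved bits + 4-byte NAL length size */
    *p++ = 0xe1;                     /* reserved bits + number of SPS (1) */
    *p++ = (uint8_t)(sps_size >> 8);
    *p++ = (uint8_t)(sps_size & 0xff);
    memcpy(p, sps, sps_size);
    p += sps_size;
    *p++ = 1;                        /* number of PPS */
    *p++ = (uint8_t)(pps_size >> 8);
    *p++ = (uint8_t)(pps_size & 0xff);
    memcpy(p, pps, pps_size);

    return need;
}
```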
The current one, while correct, does not yield the best possible
results. The specifications suggest another formula, which results
in quality gains in the decoded output from fate tests. This
justifies changing said formula.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
CID 1256 is specified as using the same table for luma and chroma,
which is the same as CID 1235 luma table. This is consistent with
the format supposedly being RGB, although most sequences seem to
actually be YCbCr-encoded.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Tables 1258 and 1259 were not zigzagged when added, so it was not
possible to notice the equivalence.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Convert them to zigzag order, as the rest of them are.
When I was adding support for 10-bit DNxHD, I just copy-pasted the
missing quant matrices from the spec. Now it turns out the existing
matrices in dnxhddata.c were in zigzag order. This resulted in wrong
quantization for 10-bit DNxHD. The attached patch fixes the problem by
converting 10-bit quant matrices to zigzag order.
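The conversion amounts to reading a raster-order 8x8 matrix along the zigzag scan. The sketch below generates the standard 8x8 zigzag scan (the JPEG/MPEG-style one) instead of hardcoding a table; the function names are hypothetical, and it is an assumption that this is the ordering the tables in dnxhddata.c use.

```c
#include <stdint.h>

/* Fill scan[i] with the raster index of the i-th coefficient in zigzag order. */
void build_zigzag(uint8_t scan[64])
{
    int i = 0;
    for (int s = 0; s < 15; s++) {       /* s = row + col: one anti-diagonal */
        int lo = s < 8 ? 0 : s - 7;
        int hi = s < 8 ? s : 7;
        for (int k = lo; k <= hi; k++) {
            /* odd diagonals go top-right to bottom-left, even ones the reverse */
            int row = (s & 1) ? k : s - k;
            scan[i++] = (uint8_t)(row * 8 + (s - row));
        }
    }
}

/* Reorder a raster-order 8x8 matrix into zigzag order. */
void raster_to_zigzag(uint8_t dst[64], const uint8_t src[64])
{
    uint8_t scan[64];
    build_zigzag(scan);
    for (int i = 0; i < 64; i++)
        dst[i] = src[scan[i]];
}
```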
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is only (mis-)used to set the dsp functions clear_block(s). But
these functions always work on 16bits-wide elements, which makes
the parameter useless and actually harmful, as it causes all content
on more than 8-bits to not use accelerated functions.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When forwarding the frame type information, x264 by default decides
which kind of keyframe to output. Add an option to force it to output
IDR frames, to support use cases such as preparing content for
segmented stream formats.
x264 build 147 adds the native support for NV21.
Useful to avoid additional pixel format conversion when encoding
from a wide range of capture devices, Android among those.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Fixes deadlock with threads
Found-by: Paul B Mahol
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The variable is not a constant and can lead to race conditions
Fixes: repro.webm (not reproducible with FFmpeg alone)
Found-by: Dale Curtis <dalecurtis@google.com>
Tested-by: Dale Curtis <dalecurtis@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
These fields are difficult to interpret, and are provided by a single
encoder (mpegvideoenc). In general they do not belong to a structure
containing raw data only, so remove them from AVFrame.
Mpegvideoenc now uses a private field in Picture for its internal
computations.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Commits 43bc5cf9 and c5371f77 add code for skipping initial zeros in mp3
packets. This code forgot to report to the user that data was skipped at
all.
Since audio codecs allow partial packet decoding, the user application
has to rely on the return value. It will remove the data reported as
consumed by the decoder, and feed the remainder to the decoder again.
This resulted in the mp3 frame after the zero region being decoded over
and over again, until the zero region was finally skipped by the
application.
Fix this by adding the number of skipped bytes to the number of
consumed bytes returned by the decode call.
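The contract can be sketched as follows. This is a toy stand-in rather than the mp3 decoder: the fixed FRAME_SIZE and the function name are hypothetical, and the only point is that the return value has to cover the skipped zeros as well as the frame.

```c
#include <stdint.h>

#define FRAME_SIZE 4  /* hypothetical fixed frame size for the sketch */

/* Skip leading zero bytes, then "decode" one frame.
 * Returns the total number of bytes consumed from buf. */
int decode_packet(const uint8_t *buf, int size)
{
    int skipped = 0;

    while (skipped < size && buf[skipped] == 0)
        skipped++;

    if (size - skipped < FRAME_SIZE)
        return size;              /* nothing decodable left: consume it all */

    /* ... decode the frame at buf + skipped ... */

    /* Returning only FRAME_SIZE here would make the caller resubmit
     * data that still begins with the same zero bytes. */
    return skipped + FRAME_SIZE;
}
```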
Fixes trac ticket #4890.
VideoToolbox also implements a software decoder for h264, and will fallback to
using it if the file cannot be decoded on the GPU. In these cases though,
we want the hwaccel to fail so that we can use the libavcodec software decoder
instead of the Apple one.
Signed-off-by: wm4 <nfxjfg@googlemail.com>
* commit '4885bde3187a2bb0cae85b67796e07db233bf77f':
motion_est_template: Fix undefined left shift of negative number
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '948f3c19a8bd069768ca411212aaf8c1ed96b10d':
lavc: Make AVPacket.duration int64, and deprecate convergence_duration
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
This bit is 1 in some samples, and seems to coincide with interlaced
mbs and CID1260. The 2008 specs do not know about it, and maintain that
qscale is 11 bits. This looks oversized, but may help larger bitdepths.
Currently, it leads to an obviously incorrect qscale value, meaning
its syntax is shifted by 1. However, reading 11 bits also leads to
obviously incorrect decoding: qscale seems to be 10 bits.
However, as most profiles still have 11bits qscale, the feature is
restricted to the CID1260 profile (this flag is dependent on
a higher-level flag located in the header).
The encoder writes 12 bits of syntax, last and first bits always 0,
which is now somewhat inconsistent with the decoder, but ends up with
the same effect (progressive + reserved bit).
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>