There's a lag of one CTB line for SAO behind the deblocking filter, except for
the last line. However, once SAO has been completed on a line, all its pixels,
i.e. up to y+ctb_size, are filtered and ready to be used as reference.
Without SAO, when the deblocking filter finishes a CTB line, only the
bottom 4 pixels may still be filtered when the next CTB is processed by the
deblocking filter. The await_progress for hevc then checks whether the bottom
pixels of a PU require access beyond that point, so the reporting should
effectively report up to the above limits.
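As a rough sketch of the limits described above (not the actual decoder code;
progress_row and its parameters are hypothetical names introduced only for
illustration), the row limit that can be reported once deblocking has finished
the CTB line starting at y could be computed like this:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: row limit that may be reported as ready for
 * reference once the deblocking filter has finished the CTB line that
 * starts at row y.
 */
static int progress_row(int y, int ctb_size, bool sao_enabled, bool last_line)
{
    if (sao_enabled) {
        /* SAO runs one CTB line behind deblocking, so only the previous
         * line (everything up to y) is final. On the last line SAO
         * completes the current line as well. */
        return last_line ? y + ctb_size : y;
    }
    /* Without SAO, the bottom 4 rows of this CTB line may still change
     * when the next CTB line is deblocked. This sketch stays conservative
     * and applies the same limit to the last line. */
    return y + ctb_size - 4;
}
```

With 64-pixel CTBs and y = 64, this gives 64 with SAO on a non-final line, 128
with SAO on the final line, and 124 without SAO.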
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '1a583c0c60240adb8fa6620c6df33f1a0a0fe5d9':
fdct: Move ppc-specific declarations to a header in the ppc directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5dcc201505f71b1e73e9eef12ce89d4eed252ad0':
simple_idct: Move x86-specific declarations to a header in the x86 directory
Conflicts:
libavcodec/x86/simple_idct.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '85cabb8d002f2cd100ced5cc17d87bfc9460d314':
fdct: Move x86-specific declarations to a header in the x86 directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
- adding one extra pixel all around the frame
- do not copy when SAO is not applied
5% improvement
cherry picked from commit 10fc29fc19a12c4d8168fbe1a954b76386db12d0
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '24af1aa0f70362a66cda04c9d7cd012e019f5572':
fft: Convert FFT/MDCT permutation type #defines to enums
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '746ad4e0df7faf93329804e412ec53c1d929a75b':
dct-test: Improve CPU flags struct member name
Conflicts:
libavcodec/dct-test.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'cb44b21da1f59923be577f08c267ec270529be97':
dct-test: Move cpu_flags variable out of global scope
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7e18a727d2c2a19f22fcf68875d1b05fd2eafcef':
arm: cosmetics: Consistently use lowercase for shift operators
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '87552d54d3337c3241e8a9e1a05df16eaa821496':
armv6: Accelerate ff_fft_calc for general case (nbits != 4)
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementation is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
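The "smaller FFTs, then a post-processing pass" structure can be illustrated
with a deliberately tiny C sketch. It is a plain radix-2 decimation-in-time
4-point FFT built from two 2-point FFTs, not the routine hierarchy or the data
layout used by the real code; the names cplx, fft2, pass4 and fft4_sketch are
made up for this example.

```c
typedef struct { double re, im; } cplx;

/* 2-point FFT in place: a single butterfly. */
static void fft2(cplx *z)
{
    cplx a = z[0], b = z[1];
    z[0].re = a.re + b.re;  z[0].im = a.im + b.im;
    z[1].re = a.re - b.re;  z[1].im = a.im - b.im;
}

/* Post-processing pass: combine the even-sample transform in z[0..1]
 * with the odd-sample transform in z[2..3]. Twiddles are 1 and -i. */
static void pass4(cplx *z)
{
    cplx e0 = z[0], e1 = z[1], o0 = z[2], o1 = z[3];
    cplx t = { o1.im, -o1.re };            /* (-i) * o1 */

    z[0].re = e0.re + o0.re;  z[0].im = e0.im + o0.im;
    z[2].re = e0.re - o0.re;  z[2].im = e0.im - o0.im;
    z[1].re = e1.re + t.re;   z[1].im = e1.im + t.im;
    z[3].re = e1.re - t.re;   z[3].im = e1.im - t.im;
}

/* 4-point FFT: two smaller FFTs followed by the combining pass. */
static void fft4_sketch(const cplx *in, cplx *out)
{
    out[0] = in[0];  out[1] = in[2];       /* even samples */
    out[2] = in[1];  out[3] = in[3];       /* odd samples  */
    fft2(out);
    fft2(out + 2);
    pass4(out);
}
```

For the real input 1, 2, 3, 4 this yields 10, -2+2i, -2, -2-2i. Larger sizes
follow the same pattern, with the combining pass dominating the run time, which
is presumably why unrolling it pays off.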
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
                  Before           After
                  Mean    StdDev   Mean    StdDev  Confidence  Change
Audio decode      2245.5  53.1     1599.6  43.8    100.0%      +40.4%
FFT routines      940.6   22.0     348.1   20.8    100.0%      +170.2%
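The Change column is consistent with the ratio of the two means, i.e.
(before / after - 1) * 100. This formula is inferred from the figures rather
than stated in the message, and change_percent is a hypothetical helper name:

```c
/* Relative change in percent, computed as (before / after - 1) * 100.
 * The formula is an inference from the published figures. */
static double change_percent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

For example, change_percent(2245.5, 1599.6) comes out at roughly 40.4 and
change_percent(940.6, 348.1) at roughly 170.2.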
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
                  Before           After
                  Mean    StdDev   Mean    StdDev  Confidence  Change
aac_decode_frame  2368.1  35.8     2117.2  35.3    100.0%      +11.8%
ff_imdct_half_*   457.5   22.4     251.2   16.2    100.0%      +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>
These were removed by libav in
See: git show -C 2d60444331
diff --git a/libavcodec/dsputil.c b/libavcodec/me_cmp.c
similarity index 98%
rename from libavcodec/dsputil.c
rename to libavcodec/me_cmp.c
index ba71a99..9fcc937 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/me_cmp.c
@@ -1,8 +1,4 @@
/*
- * DSP utils
- * Copyright (c) 2000, 2001 Fabrice Bellard
- * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
- *
* This file is part of Libav.
*
* Libav is free software; you can redistribute it and/or
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '14b4e64eabc84c5a5e57c8ccc56bbeb95380823b':
g2meet: allow size changes within original sizes
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Fixes fate failure with --enable-memory-poisoning && make THREAD_TYPE=slice THREADS=7 fate-hevc-conformance-ENTP_C_Qualcomm_1
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- support for 4:2:2 and 4:4:4 up to 12 bits
- add a new profile for range extension
(cherry picked from commit d3c067fa65bbc871758d28aa07f54123430ca346)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
                  Before           After
                  Mean    StdDev   Mean    StdDev  Confidence  Change
aac_decode_frame  2368.1  35.8     2117.2  35.3    100.0%      +11.8%
ff_imdct_half_*   457.5   22.4     251.2   16.2    100.0%      +82.1%
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f43789b76e661acd93c21664678f140e53cfa1fa':
hevc: set the keyframe flag on output frames
See: e2760de605
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e':
dsputil: Split off pixel block routines into their own context
Conflicts:
configure
libavcodec/dsputil.c
libavcodec/mpegvideo_enc.c
libavcodec/pixblockdsp_template.c
libavcodec/x86/dsputilenc.asm
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f6ee61fb05482c617f5deee29a190d8ff483b3d1':
lavc: export DV profile API used by muxer/demuxer as public
Conflicts:
configure
doc/APIchanges
libavcodec/Makefile
libavcodec/dv_profile.c
libavcodec/dv_profile.h
libavcodec/version.h
libavformat/dvenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The issue affects dvdsub subtitles (a.k.a. VOBSUB).
Some players -- in particular hardware players -- cut off
the lowest row of pixels if the number of rows in the subtitle
is odd.
The patch below implements a work-around for that. If the
number of rows is odd, it is simply rounded up to an even
number, adding an invisible (i.e. fully transparent) row.
The work-around can be enabled or disabled with a new
option -even_rows_fix. The default is disabled, so there
is no change of behaviour for users who don't care about it.
The overhead for the fix is low, and in many cases even zero:
For subtitles with an odd number of rows (i.e. in 50% of
cases on average), the size increases by two bytes because
a fully transparent row is encoded as 0x00 0x00. However,
in the VOBSUB standard, all data packets are padded to 2KB
anyway, so in most cases the additional bytes just use some
part of the padding, so there is no overhead. Only in the
rare case that the 2KB boundary is hit (0.1% chance), a full
2KB block is added.
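The rounding and the overhead estimate can be sketched in C as follows. This is
an illustration under simplifying assumptions, not the encoder code: the helper
names are hypothetical, and packet headers are ignored so that only the raw
payload size is compared against the 2KB padding granularity.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 2048   /* padding granularity stated in the text */

/* Round an odd row count up to the next even number. */
static int even_rows(int rows)
{
    return rows + (rows & 1);
}

/* Extra payload bytes: one transparent row encoded as 0x00 0x00. */
static size_t extra_bytes(int rows)
{
    return (rows & 1) ? 2 : 0;
}

/* Number of 2KB blocks needed once the payload is padded. */
static size_t padded_blocks(size_t payload)
{
    return (payload + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* True if the added row pushes the payload into one more 2KB block. */
static bool adds_block(size_t payload, int rows)
{
    return padded_blocks(payload + extra_bytes(rows)) > padded_blocks(payload);
}
```

In this simplified model, two extra bytes cross a boundary only when the
payload size modulo 2048 is 2047 or 0. If sizes were spread evenly, that would
be 2 out of 2048 cases, or about 0.1%, which is in line with the estimate above.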
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Disable moved functions to prevent build/test failure.
A patch to update and re-enable them is welcome.
A volunteer to maintain the alpha code is welcome too.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The threshold was chosen so that no further size decrease happened with larger lambda
with the test video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '79fce1ec8abd017593c003917fc123f7119a78d6':
arm: Avoid using the 'setend' instruction on ARMv7 and newer
Conflicts:
libavcodec/arm/h264dsp_init_arm.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '1e9a93bfca2c2f43a07e01f2ef9fd5cbafe6c22d':
libfdk-aacdec: Decode the first AAC frame to reliably identify the bitstream
Conflicts:
libavcodec/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
For implicit signaling cases (as possible for Spectral Band Replication
and Parametric Stereo Tools), the decoder must decode the first frame to
correctly identify the stream configuration (as called from
avformat_find_stream_info). The mechanism for this is built-in and only
requires adding CODEC_CAP_CHANNEL_CONF to the libfdk-aacdec AVCodec
struct.
Signed-off-by: Omer Osman <omer.osman@iis.fraunhofer.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '246f869590b8c7313d26e1c2ef56db01f6fd2503':
vmd: Split audio and video decoder
Conflicts:
libavcodec/vmdvideo.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '8d686ca59db14900ad5c12b547fb8a7afc8b0b94':
dsputil: Split off *_8x8basis to a separate context
Conflicts:
libavcodec/dsputil.c
libavcodec/mpegvideo_enc.c
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '2fc85fe96e7e0e5fc433b98eacebf4d3511d2d58':
bmv: Split audio and video decoder
Conflicts:
libavcodec/bmvvideo.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b0633f83f277c05bf1f617a99c7aedd2db8306e3':
paf: split audio and video decoder
Conflicts:
libavcodec/pafvideo.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
These were removed by libav while splitting the file in adcb8392c9
See: de6d9b6404
See: 723106b279
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
libutvideo.h is not a C header and thus fails building as a C file
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Currently, http://ffmpeg.org/doxygen/trunk/group__preproc__misc.html is
broken because of an inclusion bug in Doxygen's preprocessing mode. The
now-unincluded header file is included anyway in avcodec.h, the only
place where it is used, so there should be no problem removing it.
One concern is API compatibility, because old_codec_ids.h is a public
header. However, IMO this is not a problem because currently users
cannot include only this header and not `avcodec.h` anyway, because of
the missing AV_CODEC_ID_NONE symbol.
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b0de1c766329dd8c9960ad1722e2f653160abc1b':
x86: build: Only compile FDCT code if MMX is enabled
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '12f129e545e5a5844b6ad7f3eb6a438015cad8bc':
x86: Unconditionally compile blockdsp and svq1enc init files
Conflicts:
libavcodec/x86/Makefile
blockdsp_mmx is renamed to blockdsp_init as we already have a blockdsp file
and _init is how all other such files are called
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* cehoyos/master:
Use os/2 palette even if it contains less than 256 entries.
Assume that old bmps do not contain transparency information.
Do not detect jp2 images as mov files.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e0bfe34ea8ccf333ec5b17961fd58eb575e74f8b':
libfdk-aacdec: Reduce the default decoder delay by one frame
Conflicts:
libavcodec/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The default error concealment method if none is set via
aacDecoder_SetParam(AAC_CONCEAL_METHOD) is set in
CConcealment_InitCommonData within the fdk-aac library
and is set to Energy Interpolation. This method requires one frame
delay to the output. To reduce the default decoder output delay and
avoid missing the last frame in file based decoding, use Noise
Substitution as the default concealment method.
Signed-off-by: Omer Osman <omer.osman@iis.fraunhofer.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'c6698dfe7cdbc7634f33245875488ed3fa4a8ced':
webpdec: Fix decoding of the huffman group indices.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Improve the debugging function for saving subtitles
to PPM files: actually use the alpha channel.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fix an off-by-one error that causes the height of decoded
subtitles to be too small, thus cutting off the lowest row
of pixels.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'df949b645b12d62bb4da56d629a887c81f67f2e5':
hevc: Use the local context variable when needed
Conflicts:
libavcodec/hevc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>