The position is either rounded or not checked, so delay the wait to
check the proper value.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
beta0 and beta1 will always be the same within a CU
Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr>
cherry picked from commit 4a23d824741a289c7d2d2f2871d1e2621b63fa1b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-new rext bitstreams:
PERSIST_RPARAM_A_RExt_Sony_1.bit ok =
QMATRIX_A_RExt_Sony_1.bit ok =
SAO_A_RExt_MediaTek_1.bit ok =
(cherry picked from commit cdea029d452c521f8e5bcbe589f44b13a4011604)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '6869612f5c7d4d2f20f69a5658328a761deadb1c':
arm: Macroize the test for 'setend' CPU instruction support
Conflicts:
libavcodec/arm/h264dsp_init_arm.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Intrinsics only used on aarch64 since the existing ARMv7 NEON asm
is slightly faster (Cortex-A9, gcc-4.8, micro-benchmarks and full
decoding time).
Signed-off-by: James Yu <james.yu@linaro.org>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* commit '16b7328058fa600d5158c84d9cc621a134eb88bc':
build: Conditionally build and run DCT test program
Conflicts:
libavcodec/Makefile
libavcodec/dct-test.c
tests/fate/libavcodec.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'bd499d9af668aef979ec9f3f3215b8dd508c7ec1':
build: Conditionally build and test iirfilter
Conflicts:
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
They are unneeded and make adding elements slightly harder as they
would need to be constantly updated
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c39059bea3adebcd888571d1181db215eee54495':
h264: Fix direct temporal mvs for bottom-field-first poc order
Conflicts:
libavcodec/h264_direct.c
See: ebd1c505d2
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '4de8b60684ce13dff3e3d372dae4f49b9e53f755':
idct: Move arm-specific declarations to a header in the arm directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Such files can be created using the --bff x264 option.
Sample-Id: h264_direct_temporal_mvs_bff.mkv
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* commit '9f99a5f1d078721a30a76aec27c58805b7b87e58':
mpegencconetxt: Move rv10-specific orig_width/orig_height where they belong
Merged-by: Michael Niedermayer <michaelni@gmx.at>
When dealing with MVs, both components may be processed at a time.
On Win64, 560 to 539 cycles for derive_spatial_merge_candidates.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
There's a lag of one CTB line for SAO behind deblocking filter, except for
last line. However, once SAO has been completed on a line, all its pixels,
i.e. up to y+ctb_size are filtered and ready to be used as reference.
Without SAO, when deblocking filter finishes a CTB line, only the bottom
bottom 4 pixels may be filtered when next CTB is process by the deblocing.
The await_progess for hevc then checks whether the bottom pixels of a PU
requires access beyond that point, so the reporting should effectively
report up to the the above limits.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '1a583c0c60240adb8fa6620c6df33f1a0a0fe5d9':
fdct: Move ppc-specific declarations to a header in the ppc directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5dcc201505f71b1e73e9eef12ce89d4eed252ad0':
simple_idct: Move x86-specific declarations to a header in the x86 directory
Conflicts:
libavcodec/x86/simple_idct.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '85cabb8d002f2cd100ced5cc17d87bfc9460d314':
fdct: Move x86-specific declarations to a header in the x86 directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
- adding one extra pixel all around the frame
- do not copy when SAO is not applied
5% improvement
cherry picked from commit 10fc29fc19a12c4d8168fbe1a954b76386db12d0
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '24af1aa0f70362a66cda04c9d7cd012e019f5572':
fft: Convert FFT/MDCT permutation type #defines to enums
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '746ad4e0df7faf93329804e412ec53c1d929a75b':
dct-test: Improve CPU flags struct member name
Conflicts:
libavcodec/dct-test.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'cb44b21da1f59923be577f08c267ec270529be97':
dct-test: Move cpu_flags variable out of global scope
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7e18a727d2c2a19f22fcf68875d1b05fd2eafcef':
arm: cosmetics: Consistently use lowercase for shift operators
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '87552d54d3337c3241e8a9e1a05df16eaa821496':
armv6: Accelerate ff_fft_calc for general case (nbits != 4)
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 2245.5 53.1 1599.6 43.8 100.0% +40.4%
FFT routines 940.6 22.0 348.1 20.8 100.0% +170.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>
These where removed by libav in
See: git show -C 2d60444331
diff --git a/libavcodec/dsputil.c b/libavcodec/me_cmp.c
similarity index 98%
rename from libavcodec/dsputil.c
rename to libavcodec/me_cmp.c
index ba71a99..9fcc937 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/me_cmp.c
@@ -1,8 +1,4 @@
/*
- * DSP utils
- * Copyright (c) 2000, 2001 Fabrice Bellard
- * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
- *
* This file is part of Libav.
*
* Libav is free software; you can redistribute it and/or
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '14b4e64eabc84c5a5e57c8ccc56bbeb95380823b':
g2meet: allow size changes within original sizes
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Fixes fate failure with --enable-memory-poisoning && make THREAD_TYPE=slice THREADS=7 fate-hevc-conformance-ENTP_C_Qualcomm_1
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- support for 4:2:2 and 4:4:4 up to 12 bits
- add a new profile for range extension
(cherry picked from commit d3c067fa65bbc871758d28aa07f54123430ca346)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>