- adding one extra pixel all around the frame
- do not copy when SAO is not applied
5% improvement
cherry picked from commit 10fc29fc19a12c4d8168fbe1a954b76386db12d0
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c0de9159a7ba5707aa0a5c2bc73ae78b7b87ec46':
avdevice: Give names to anonymously typedeffed structs
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '24af1aa0f70362a66cda04c9d7cd012e019f5572':
fft: Convert FFT/MDCT permutation type #defines to enums
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '746ad4e0df7faf93329804e412ec53c1d929a75b':
dct-test: Improve CPU flags struct member name
Conflicts:
libavcodec/dct-test.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'cb44b21da1f59923be577f08c267ec270529be97':
dct-test: Move cpu_flags variable out of global scope
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7e18a727d2c2a19f22fcf68875d1b05fd2eafcef':
arm: cosmetics: Consistently use lowercase for shift operators
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '87552d54d3337c3241e8a9e1a05df16eaa821496':
armv6: Accelerate ff_fft_calc for general case (nbits != 4)
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5c22e8e4ad0852d61d5c4ba8d67d33fd72339497':
armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)
See: 42c1cc35b7623ce76c7b55c6bc100f135e17cd4f
Merged-by: Michael Niedermayer <michaelni@gmx.at>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1542.8 43.7 1470.5 41.5 100.0% +4.9%
butterflies_float 130.0 11.9 70.2 12.1 100.0% +85.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1598.2 47.4 1529.2 25.4 100.0% +4.5%
vector_fmul_window 244.0 22.1 188.9 22.3 100.0% +29.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 2245.5 53.1 1599.6 43.8 100.0% +40.4%
FFT routines 940.6 22.0 348.1 20.8 100.0% +170.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>
These where removed by libav in
See: git show -C 2d60444331fca1910510038dd3817bea885c2367
diff --git a/libavcodec/dsputil.c b/libavcodec/me_cmp.c
similarity index 98%
rename from libavcodec/dsputil.c
rename to libavcodec/me_cmp.c
index ba71a99..9fcc937 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/me_cmp.c
@@ -1,8 +1,4 @@
/*
- * DSP utils
- * Copyright (c) 2000, 2001 Fabrice Bellard
- * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
- *
* This file is part of Libav.
*
* Libav is free software; you can redistribute it and/or
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a578b0407dc983aecd72028e1127062689b67089':
configure: Assume runtime cpu detection on arm on --target-os=android as well
Merged-by: Michael Niedermayer <michaelni@gmx.at>
When merging the formats around the automatically inserted
convert filters, the refcount of the format lists can not be 0.
Coverity does not detect it, and suspects a memory leak,
because if refcount is 0 the newly allocated lists are not
stored anywhere. That gives CIDs 1224282, 1224283 and 1224284.
Lists with refcount 0 are used in can_merge_formats(), so the
asserts can not be moved inside the merge functions.
The X11 servers by VNC, at 32-bits depths, has the following masks:
R:0x000007ff G:0x003ff800 B:0xffc00000
This is not compatible with AV_PIX_FMT_0RGB32, and the result
is success with completely wrong colors.