The 3*stride value stored in r3src can be loaded much later,
so use r3src instead of a dedicated gpr when possible.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
GCC 4.9.2 on a Core i5-4200U @ 1.60GHz, Linux x86_64
Before
715487 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
After
672104 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
* commit '3a651f599a18b023602370b67a77eb0efa309b20':
dca: Move data tables from a header to an object file
Conflicts:
libavcodec/Makefile
libavcodec/dcadata.h
libavcodec/dcadec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '73ae0a9d12857852222363f9a7c14d07058ebfd3':
g722: Split out computation of band->s_zero and unroll code
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '10f160768b824f00933f33bc69f1fae89a25dfc8':
g722: Reduce number of pointers passed to g722_apply_qmf() function
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '67690683130faf37dd9d969ced15eba2a1940ade':
g722: Split out g722_qmf_apply() function into g722dsp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This also allows replacing several literal numbers by named constants
And it should be faster, the function is not speed relevant though as it is
generally only called a few times at the streams start.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '2a9c6fae927964b5dd0b5d3d9292f5621bd21664':
dca: Move all tables into dcadata.h
Conflicts:
libavcodec/dcadec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b339019de4e5f4d3c661bbdba98ae248ab77e2f0':
dca: Split code for handling the EXSS extension off into a separate file
Conflicts:
libavcodec/Makefile
libavcodec/dcadec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to
get the value in bytes).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
hevc seems to be the only place where the C implementation
of the av_clip function is explicitly selected, precluding
platform-specific optimizations
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Fixes out of array accesses
Fixes: ffmpeg_mjpeg_crash.avi
Found-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Also move EC ref initialization to where the EC code is called.
Fixes out of array read
Fixes: asan_heap-uaf_143f420_142_20110805_112659_ch0.mkv
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The width parameter is now completely at the back, and actually
never used. This helps understanding the actual parameter list.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Their intent was to make the DSP work with wmalossless pro.
The later was fixed to work with the DSP.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The scaling list can be specified in either the SPS or PPS.
Additionally, compensate for the diagonal scan permutation applied
in the decoder.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
pb_eo must be handled as a rip relative address for MSVC64, so an
intermediate register is needed. Should fix link failures.
Suggested by Hendrik Leppkes and Christophe Gisquet.
Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The epel_hv functions were still relying on only epel_hv 8-wide
being the maximum width instanciated.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array read
Fixes: asan_static-oob_30328b6_719_cov_3325483287_H264_artifacts_motion.h264
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This reverts commit 3b4ffba3af.
Unbreaks the SSSE3 code on mingw32
Conflicts:
libavcodec/x86/lossless_audiodsp.asm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is needed as the mmx code is used as fallback from the ssse3 code
Suggested-by: jamrial
Tested-by: wm4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The buffer pointers would be otherwise overwritten, causing a
leak on e.g. PERSIST_RPARAM_A_RExt_Sony_1.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
All the webm/vp9 files I have seen so far can have packets that contain
1 invisible and 1 visible frame. The vp9 parser separates them. Since
the invisible frame is always (?) the first sub-packet, the new packet
is assigned the PTS of the original packet, while the packet containing
the visible frame has no PTS.
This patch essentially reassigns the PTS from the invisible to the
visible frame.
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips
Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
Signed-off-by: James Almer <jamrial@gmail.com>
This ensures we do not loose the frame in case or multiple clears
Fixes out of array read
Fixes: asan_heap-oob_2fa47ea_2100_cov_1278768963_ff_add_pixels_clamped_mmx.m2ts
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
For reasons we are not privy to, nvidia decided that the nvenc encoder
should apply aspect ratio compensation to 'DVD like' content, assuming that
the content is not bt.601 compliant, but needs to be bt.601 compliant. In
this context, that means that they make the following, questionable,
assumptions:
1) If the input dimensions are 720x480 or 720x576, assume the content has
an active area of 704x480 or 704x576.
2) Assume that whatever the input sample aspect ratio is, it does not account
for the difference between 'physical' and 'active' dimensions.
From, these assumptions, they then conclude that they can 'help', by adjusting
the sample aspect ratio by a factor of 45/44. And indeed, if you wanted to
display only the 704 wide active area with the same aspect ratio as the full
720 wide image - this would be the correct adjustment factor, but what if you
don't? And more importantly, what if you're used to ffmpeg not making this kind
of adjustment at encode time - because none of the other encoders do this!
And, what if you had already accounted for bt.601 and your input had the
correct attributes? Well, it's going to apply the compensation anyway!
So, if you take some content, and feed it through nvenc repeatedly, it
will keep scaling the aspect ratio every time, stretching your video out
more and more and more.
So, clearly, regardless of whether you want to apply bt.601 aspect ratio
adjustments or not, this is not the way to do it. With any other ffmpeg
encoder, you would do it as part of defining your input paramters or
do the adjustment at playback time, and there's no reason by nvenc
should be any different.
This change adds some logic to undo the compensation that nvenc would
otherwise do.
nvidia engineers have told us that they will work to make this
compensation mechanism optional in a future release of the nvenc
SDK. At that point, we can adapt accordingly.
Signed-off-by: Philip Langdale <philipl@overt.org>
Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes integer overflow and out of array read
Fixes: asan_heap-oob_1fb2f9b_3780_cov_3984375136_usf.mkv
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
As with sao_band_filter, pass instead the two variables from the struct needed in the function.
This simplifies writing asm optimized versions.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes out of array accesses
Fixes: asan_heap-oob_1c1a4ea_1242_cov_2274415971_TESTcmyk.jpg
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The scaling list can be specified in either the SPS or PPS.
Additionally, compensate for the diagonal scan permutation applied in the decoder.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd615187f74ddf3413778a8b5b7ae17255b0df88e':
aacdec: Support for ER AAC ELD 480.
Conflicts:
libavcodec/aacdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '0ee2573347ecdb9cb5656001f7201d819eec16d8':
aacdec: Support for ER AAC in LATM
Conflicts:
libavcodec/aacdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Some files contain a few additional, all-0 bits.
Check for that case and don't print incorrect "not supported"
message.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Alex Converse <alex.converse@gmail.com>
Use edge emu buffers
And enable the code unconditionally
Speed difference without USE_SAO_SMALL_BUFFER and with the new code:
Decicycles: 26772->26220 (BO32), 83803->80942 (BO64)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
cherry picked from commit 5d9f79edef2c11b915bdac3a025b59a32082f409
SAO edge filter uses pre-SAO pixel data on the left and top of the ctb, so
this data must be kept available. This was done previously by having 2
copies of the frame, one before and one after SAO.
This commit reduces the storage to just that, instead of the previous whole
frame.
Commit message taken from patch by Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is currently always true, the assert protects against
future changes to the code breaking this assumtation
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9b8c8a9395c849639aea0f6b5300e991e93c3a73':
svq1dec: Validate the stages value strictly
Not merged, this is wrong, the condition is not possible
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '3d5d46233cd81f78138a6d7418d480af04d3f6c8':
opus: Factor out imdct15 into a standalone component
Conflicts:
configure
libavcodec/opus_celt.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '28df0151b6618226b05ee52e031af0b11ca531b0':
configure: Add a dependency on vc1_decoder from vc1_parser
See: 6ac3c8c6a0
Merged-by: Michael Niedermayer <michaelni@gmx.at>
libopenjpegenc crashes with "pointer being freed was not allocated" when threading
is enabled with:
ffmpeg -i tests/vsynth1/01.pgm -vcodec libopenjpeg file.j2k
this appears to be a bug in libopenjpeg
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes builds with vc1_parser enabled without vc1_decoder. All
the vc1_decoder object files were included in the vc1_parser line
in libavcodec/Makefile before, but architecture specific object files
for vc1_decoder were not.
Signed-off-by: Martin Storsjö <martin@martin.st>
Prevents an 'Invalid packet' message. Currently mid-stream setup packets
are ignored. Theoretically, they could, based on the specification, be used to
reinitialize the stream if parameters change, but I don't expect that to be
common (and no one seems to have asked for it).
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
For band filter, source and destination are aligned (except for 16x16 ctbs),
and otherwise, they are most often aligned. Overall, the total width is also
too small for amortizing memcpy.
Timings (using an intrinsic version of edge filters):
B/32 B/64 E/32 E/64
Before: 32045 93952 38925 126896
After: 26772 83803 33942 117182