* Reduced xmm register count to 7 (As such they are now enabled for x86_32).
* Removed four movdqa (affects the sse2 version only).
* pxor is now used to clear m0 only once.
~5% faster.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
bytestream2_* will not cause buffer overflow, but in that case, this means
the allocation would be incorrect and the encoded result invalid. Therefore,
assert no overflow occurred.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
With huge sampling rates, the table derivation method does not converge fast
enough. While fixing it using e.g. Newton-Rhapson-like methods (the curve is
nicely convex) is possible, it is much simpler to reject these cases.
The value of 96000 was arbitrarily chosen as a realistic value, though
1000000 would still work and converge.
Fixes ticket #3868.
Suggested-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The AVSampleFormat list of sample_fmts_s16p is missing the trailing "P" for planar formats. AV_SAMPLE_FMT_S16 vs AV_SAMPLE_FMT_S16P
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
They should match but they do not always
Fixes assertion failure
no testcase with unmodified source available
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The allocation didn't account for headers, that can be easily 79 bytes.
As a result, buffers allocated for a few samples (e.g. 5 in the original
bug) could be undersized.
Fixed ticket #2881.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
If the initial max_slice_size is 0 then reallocation is disabled for the first
slice.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Not actually used in huffyuvenc, but rather in setting the frame
threading.
Example for some files:
context=0: 851974 27226 1137281
context=1,ND=0: 471819 22604 972351
context=1,ND=1: 472875 22673 972582
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Prior to 56.1.100, incorrect ALAC files for 24bps content were produced, in
particular not decoding losslessly.
Add an option to allow correctly decoding those streams.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The packet buffer allocation considers the alpha channel as DCT-coded,
while it is actually run-coded and thus requires a larger buffer.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The buffer allocation may be incorrect (e.g. with an alpha plane),
and currently causes the buffer to be set to NULL by init_put_bits,
causing a crash later on.
So, detect that situation, and if detected, reallocate the buffer
and ask for a sample that shows the problem.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
If the allocated size, despite best efforts, is too small, exit
with the appropriate error.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The LZMA support is a semi-official extension supported by libtiff 4.0.0
and later.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The reasoning behind this addition is that various third party
applications are interested in getting some motion information out of a
video "for free" when it is available.
It was considered to export other information as well (such as the intra
information about the block, or the quantization) but the structure
might have ended up into a half full-generic, half full of codec
specific cruft. If more information is necessary, it should either be
added in the "flags" field of the AVMotionVector structure, or in
another side-data.
This commit also includes an example exporting them in a CSV stream.
Some files seem to have an off-by-one error. In most cases, it appears to
be on the image width. Therefore, if the decoded image doesn't fit in the
screen:
- If it is wider than the screen (and the lzw decoding buffer), reject it;
- Otherwise, decode the indicated amount, but only write a truncated amount
to the screen.
Fixes ticket #3538.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The raw coded bits are extracted prior to decorrelation, as is correctly
performed by the decoder, and not after.
Fixes ticket #2768.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6':
build: Add explanatory comments to (optimization) blocks in the Makefiles
Conflicts:
libavcodec/ppc/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This should help to clarify the API.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The encoder produces files that are no longer compatible with previous
versions of the decoder, and may actually cause decoding issues for other
software, so indicate that change to allow decoder quirks.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array read
Fixes: yuv111_no_compr_crash.avi
Found-by: Piotr Bandurski <ami_stuff@o2.pl>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The packet buffer allocation considered as dct-coded, while it is
actually run-coded and thus requires a larger buffer.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
SMPTE 268M-2003 specifies that each line starts at a 4-bytes boundary.
Therefore, modify correspondingly the input buffer strides and size.
Partially fixes ticket #3692: DLAD_8b_3c_big.dpx still has inverted
colors, which might be related to endianness.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
S268M-2003 specifies that each line start is aligned on a 4-byte boundary.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It now does 12 samples per iteration, up from 4.
From 1.8 to 3.2 times faster again. 3.6 to 5.7 times faster overall.
Runtime is reduced by a further 2 to 18%. Overall runtime reduced by
4 to 50%.
Same conditions as before apply.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This reduces code duplication and differences with the fork.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
From 1.8 to 2.4 times faster. Runtime is reduced by 2 to 39%. The
speed-up generally increases with compression_level.
This lpc encoder is not used with levels < 3 so it provides no speed-up
in these cases.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Thanks to Pascal Massimino and Michael Militzer for permission to use under LGPL
The xvid idct code is from xvid, and nearly unchanged to make future syncing easy
the integration into ffmpeg is done by the commiter
the commit message is written by the commiter
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'da7d839a0d3ec40423a665dc85e0cfaed3f92eb8':
ffv1dec: check that global parameters do not change in version 0/1
Conflicts:
libavcodec/ffv1dec.c
See: b05cd1ea7e
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Fixes mismatch in first keyframe in sample
ffvp9_fails_where_libvpx.succeeds.webm from ticket 3849. There's still
a second mismatch a few frames into the sample.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Such changes are neither allowed nor supported
Found-by: ami_stuff
Bug-Id: CVE-2013-7020
CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Reduces the number of calls to tmvp derivation from 933685 to 586271 on
a sequence.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The position is either rounded or not checked, so delay the wait to
check the proper value.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
After finishing parsing VPS/SPS/PPS/slice header, check remaining bits,
and if an overconsumption occurred, report invalid data.
Liked-by: BBB
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'cc1d8c54c19dd14fb851e3e7a7793d6b3bd75e94':
avcodec: Postpone FF_IDCT_XVIDMMX removal until the next version bump
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This makes the SPS parsing a little, but barely, safer.
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
They might be left uninitialized otherwise since 3ad04608.
Fixes ticket #3840.
Found-by: Carl Eugen Hoyos <ce@hoyos.ws>
Reported-by: Piotr Bandurski <ami_stuff@o2.pl>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
mplayer-specifc hacks should not be in our codebase. mplayer should fix
its own code. It is not our responsibility to work around their broken
code.
This reverts commit e8e575633f.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Only use PAL8 if palette is present, else use GRAY8 for pixfmt.
Instead of simulating a grayscale palette, use real grayscale pixels, if no
palette is actually defined.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
It previously used the output, cropped size, causing overreads/writes.
Fixes ticket #3839.
This issue was introduced by d249e682, which is not part of any release
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It causes build failures in some cases and the functions are provided by
libavutil so the wraper should not be needed anymore
Found-by: jamrial
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Some applications still use this deprecated API
Its not nice to remove it when its still in use and as long as it doesnt
cause us any work to keep it.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This prevents a build failure when bumping.
the uses could easily be updated / removed, if people prefer.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
(cherry picked from commit eedc3f36532e4c6de782fe1c2dc59d192418a8fc)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f4c444e17d137c786f0ed2da0e5943df505d5f9e':
Postpone API-incompatible changes until the next bump.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a7a17e3f1915ce69b787dc58c5d8dba0910fc0a4':
hevc_filter: move some conditions out of loops
Conflicts:
libavcodec/hevc_filter.c
This is possibly less readable than the variant used before.
Thus please take a look and if people agree its worse, dont
hesitate to revert.
See: 83976e40e8
Merged-by: Michael Niedermayer <michaelni@gmx.at>
1) each of the loops run within a single CTB, so the relevant reference
list is constant
2) when that CTB is, or lies on the same slice as, the current one, we
can use a simple access instead of a relatively expensive call to
ff_hevc_get_ref_list()
* commit '84d173d3de97c753234ab0c0b50551d51413d663':
xvididct: Ensure that the scantable permutation is always set correctly
Merged-by: Michael Niedermayer <michaelni@gmx.at>
It allows attaching other external, opaque data to the frame and passing it
through the reordering process, for cases when the caller wants other data
than just the plain packet pts. There is no way to cleanly achieve this
without the field.
Used to expose ff_raw_pix_fmt_tags[] to other libav* libraries
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c7d9b473e28238d4a4ef1b7e8b42c1cca256da36':
cdgraphics: do not return 0 from the decode function
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The input data must remain constant, make a copy instead. This is in
theory a performance hit, but since I failed to find any samples
using this feature, this should not matter in practice.
Also, check the size of the header, avoiding invalid reads on truncated
data.
CC:libav-stable@libav.org
The x86 asm expects int32_t so use that type.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The previous implementation of the parser made four passes over each input
buffer (reduced to two if the container format already guaranteed the input
buffer corresponded to frames, such as with MKV). But these buffers are
often 200K in size, certainly enough to flush the data out of L1 cache, and
for many CPUs, all the way out to main memory. The passes were:
1) locate frame boundaries (not needed for MKV etc)
2) copy the data into a contiguous block (not needed for MKV etc)
3) locate the start codes within each frame
4) unescape the data between start codes
After this, the unescaped data was parsed to extract certain header fields,
but because the unescape operation was so large, this was usually also
effectively operating on uncached memory. Most of the unescaped data was
simply thrown away and never processed further. Only step 2 - because it
used memcpy - was using prefetch, making things even worse.
This patch reorganises these steps so that, aside from the copying, the
operations are performed in parallel, maximising cache utilisation. No more
than the worst-case number of bytes needed for header parsing is unescaped.
Most of the data is, in practice, only read in order to search for a start
code, for which optimised implementations already existed in the H264 codec
(notably the ARM version uses prefetch, so we end up doing both remaining
passes at maximum speed). For MKV files, we know when we've found the last
start code of interest in a given frame, so we are able to avoid doing even
that one remaining pass for most of the buffer.
In some use-cases (such as the Raspberry Pi) video decode is handled by the
GPU, but the entire elementary stream is still fed through the parser to
pick out certain elements of the header which are necessary to manage the
decode process. As you might expect, in these cases, the performance of the
parser is significant.
To measure parser performance, I used the same VC-1 elementary stream in
either an MPEG-2 transport stream or a MKV file, and fed it through avconv
with -c:v copy -c:a copy -f null. These are the gperftools counts for
those streams, both filtered to only include vc1_parse() and its callees,
and unfiltered (to include the whole binary). Lower numbers are better:
Before After
File Filtered Mean StdDev Mean StdDev Confidence Change
M2TS No 861.7 8.2 650.5 8.1 100.0% +32.5%
MKV No 868.9 7.4 731.7 9.0 100.0% +18.8%
M2TS Yes 250.0 11.2 27.2 3.4 100.0% +817.9%
MKV Yes 149.0 12.8 1.7 0.8 100.0% +8526.3%
Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
case does show a larger absolute improvement though, since it was worse
to begin with.
This patch has been tested with the FATE suite (albeit on x86 for speed).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The unpacks/shuffles later on makes it unnecessary.
Before:
1508 decicycles in h, 2096759 runs, 393 skips
2512 decicycles in v, 2095422 runs, 1730 skips
After:
1477 decicycles in h, 2096745 runs, 407 skips
2484 decicycles in v, 2095297 runs, 1855 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The rationale is that you have a packed format in form
<greyscale sample> <alpha sample> <greyscale sample> <alpha sample>
and shortening greyscale to 'G' might make one thing about Greenscale instead.
An alias pixel format and color space name are provided for compatibility.
* commit 'f89d76c10355242c39b08f253c1d1524f45ef778':
mpeg4video: Initialize xvididct for all threads
Conflicts:
libavcodec/mpeg4videodec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Bug-Id: CVE-2013-0868
inspired by a patch from Michael Niedermayer <michaelni@gmx.at>
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Diego Biurrun <diego@biurrun.de>
CC: libav-stable@libav.org
llvm's integrated assembler does not accept spaces as macro argument
delimiter when targeting darwin. Using a explicit delimiter is a good
idea in principle since it makes case like 'macro 4 -2' vs 'macro 4 - 2'
clear.
* commit 'c697c590fbf296b1679b80c8f4071e4c8a6c884b':
lcl: Disentangle pointers to input data and decompression buffer
Conflicts:
libavcodec/lcldec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7835c24e19d9e1cb43fba5a02ce9d81d518f1300':
dv: Update DV-profile-related functions to current public API
Conflicts:
libavcodec/dvdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Up to four instructions less depending on function and instruction set.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ffa4d4ef0bd66c4e8bde7357b69bdedc78123ea8':
ppc: fft: Build AltiVec optimizations in the standard way
Conflicts:
libavcodec/ppc/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5735552f1f17ea01dcbc99b08f54b5bf52176a8f':
pngenc: Drop pointless pointer cast in png_write_row()
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a786c8259dafeca9744252230b5d78f67810770c':
idct: Split off Xvid IDCT
Conflicts:
libavcodec/Makefile
libavcodec/mpeg4videodec.c
libavcodec/x86/Makefile
libavcodec/x86/idctdsp_init.c
This split is somewhat restructured leaving the xvid IDCT available
outside mpeg4 if manually selected.
The code also could not be merged unchanged as it conflicted with a
bugfix in FFmpeg
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '03c9f357a4c2307a7913cea2cbf0ba817e80beb6':
ppc: idctdsp: Immediately return if no AltiVec is available
Conflicts:
libavcodec/ppc/idctdsp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Properly address CVE-2011-3946 and parse bitstream as described in the spec.
CC: libav-stable@libav.org
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
In order not to break a sequence like "SPS IDR SPS IDR", the boolean
telling that the SPS/PPS has been seen should always be set.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
source index, as well as dest one, is unconditionnaly set afterwards,
before being effectively used.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Make sure the buffer size does not exceed the expected
RLE size.
Prevent an out of array bound write.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Bug-Id: CVE-2013-0852
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Cosmetic change. No measurable difference in speed.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
If there are consecutive IDR pictures, then SPS/PPS should be prepended
to all of them, not only the first one.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ccbf370f2000b9b27f4af259c23007d67f7ea46e':
mpegvideo: move vol_control_parameters to the only place it is used
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This should help cache locality. On win64:
Before: 1397x cycles, 16216 bytes
After: 1369x cycles, 16040 bytes
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '019a28cd630286ecb2b06ee62025a17c821b493e':
sanm: Use correct printf conversion specifiers for POSIX int types
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e76f2d11970484266e67a12961f2339a5c2fccf9':
hevc: eliminate the last element from TransformTree
Conflicts:
libavcodec/hevc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '0daa2554636ba1d31f3162ffb86991e84eb938a8':
hevc: do not store the transform inter_split flag in the context
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c5fca0174db9ed45be821177f49bd9633152704d':
lavc: add a property for marking codecs that support frame reordering
Conflicts:
doc/APIchanges
libavcodec/avcodec.h
libavcodec/codec_desc.c
libavcodec/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
- They are be replaced by passing additional parameters to the transform
functions.
- Adaptation to 4:2:2
Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr>
cherry picked from commit f518bb22531c648f1c37f978b0c7ad2e71e04c25
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This provides a public sustainable API/ABI for DCT functions.
Only externally used dct functions are included.
The structure is extensible without ABI issues compared to the
existing dct contexts.
See Mailing list and IRC log of 2014-07-26/27
Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '541427ab4d5b4b6f5a90a687a06decdb78e7bc3c':
eamad: use the bytestream2 API instead of AV_RL
Conflicts:
libavcodec/eamad.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '53abe32409f13687c864b3cda077a1aa906a2459':
avcodec: Mark argument in av_{parser|hwaccel|bitstream_filter}_next as const
Conflicts:
libavcodec/avcodec.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '73bb8f61d48dbf7237df2e9cacd037f12b84b00a':
hevcdsp: remove an unneeded variable in the loop filter
Conflicts:
libavcodec/hevc_filter.c
See: d7e162d46b
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
Benchmarks on an Intel Core i5-4200U:
idct8x8_dc
SSE2 MMXEXT C
cycles 22 26 57
idct16x16_dc
AVX2 SSE2 C
cycles 27 32 249
idct32x32_dc
AVX2 SSE2 C
cycles 62 126 1375
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Additional contributions by James Almer <jamrial@gmail.com>,
Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and
Anton Khirnov <anton@khirnov.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
* commit 'd8520d3ee032bf18f28897e0109f44b405caf5e3':
mpegvideo: Move QMAT_SHIFT* defines to the only place they are used
Conflicts:
libavcodec/mpegvideo.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '4fbb62a21bd04bf261da2382d5ba6c249c702af8':
mpegvideo: Move ME_MAP_* defines to the only place they are used
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ff85334375c6733c6116ea3686f128b4a11f33e7':
mpegvideo: Drop unused MPEG_BUF_SIZE and CHROMA_444 defines
CHROMA_444 is not removed as we do support CHROMA_444 and use the
identifier
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '165e9df19567ec0b6abee1ee2c26027e6d7aa7bf':
fft-test: Pass the right struct members instead of casting
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac':
qpeldsp: Mark source pointer in qpel_mc_func function pointer const
Conflicts:
libavcodec/h264qpel_template.c
libavcodec/x86/cavsdsp.c
libavcodec/x86/rv40dsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The typedefs also exist in the avfft.h header and since typedefs cannot be
legally redefined in C, the code fails to compile with some compilers.
This reverts commits 11c7155cce and 57f1b1dcc7.
Reduces on a sequence number of calls from 933685 to 586271.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '2b6ab3a2bd7e7cee5e7a55dd2e48b8feb4a826bb':
mpegvideo: Move QUANT_BIAS_SHIFT define to the only place it is used
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '57f1b1dcc77a93c2a5c503d4e47fe2f567cf9db5':
fft-test: Drop unnecessary #ifdefs around header includes
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The position is either rounded or not checked, so delay the wait to
check the proper value.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
beta0 and beta1 will always be the same within a CU
Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr>
cherry picked from commit 4a23d824741a289c7d2d2f2871d1e2621b63fa1b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-new rext bitstreams:
PERSIST_RPARAM_A_RExt_Sony_1.bit ok =
QMATRIX_A_RExt_Sony_1.bit ok =
SAO_A_RExt_MediaTek_1.bit ok =
(cherry picked from commit cdea029d452c521f8e5bcbe589f44b13a4011604)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '6869612f5c7d4d2f20f69a5658328a761deadb1c':
arm: Macroize the test for 'setend' CPU instruction support
Conflicts:
libavcodec/arm/h264dsp_init_arm.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Intrinsics only used on aarch64 since the existing ARMv7 NEON asm
is slightly faster (Cortex-A9, gcc-4.8, micro-benchmarks and full
decoding time).
Signed-off-by: James Yu <james.yu@linaro.org>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* commit '16b7328058fa600d5158c84d9cc621a134eb88bc':
build: Conditionally build and run DCT test program
Conflicts:
libavcodec/Makefile
libavcodec/dct-test.c
tests/fate/libavcodec.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'bd499d9af668aef979ec9f3f3215b8dd508c7ec1':
build: Conditionally build and test iirfilter
Conflicts:
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
They are unneeded and make adding elements slightly harder as they
would need to be constantly updated
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c39059bea3adebcd888571d1181db215eee54495':
h264: Fix direct temporal mvs for bottom-field-first poc order
Conflicts:
libavcodec/h264_direct.c
See: ebd1c505d2
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '4de8b60684ce13dff3e3d372dae4f49b9e53f755':
idct: Move arm-specific declarations to a header in the arm directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Such files can be created using the --bff x264 option.
Sample-Id: h264_direct_temporal_mvs_bff.mkv
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* commit '9f99a5f1d078721a30a76aec27c58805b7b87e58':
mpegencconetxt: Move rv10-specific orig_width/orig_height where they belong
Merged-by: Michael Niedermayer <michaelni@gmx.at>
When dealing with MVs, both components may be processed at a time.
On Win64, 560 to 539 cycles for derive_spatial_merge_candidates.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
There's a lag of one CTB line for SAO behind deblocking filter, except for
last line. However, once SAO has been completed on a line, all its pixels,
i.e. up to y+ctb_size are filtered and ready to be used as reference.
Without SAO, when deblocking filter finishes a CTB line, only the bottom
bottom 4 pixels may be filtered when next CTB is process by the deblocing.
The await_progess for hevc then checks whether the bottom pixels of a PU
requires access beyond that point, so the reporting should effectively
report up to the the above limits.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '1a583c0c60240adb8fa6620c6df33f1a0a0fe5d9':
fdct: Move ppc-specific declarations to a header in the ppc directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5dcc201505f71b1e73e9eef12ce89d4eed252ad0':
simple_idct: Move x86-specific declarations to a header in the x86 directory
Conflicts:
libavcodec/x86/simple_idct.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '85cabb8d002f2cd100ced5cc17d87bfc9460d314':
fdct: Move x86-specific declarations to a header in the x86 directory
Merged-by: Michael Niedermayer <michaelni@gmx.at>
- adding one extra pixel all around the frame
- do not copy when SAO is not applied
5% improvement
cherry picked from commit 10fc29fc19a12c4d8168fbe1a954b76386db12d0
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '24af1aa0f70362a66cda04c9d7cd012e019f5572':
fft: Convert FFT/MDCT permutation type #defines to enums
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '746ad4e0df7faf93329804e412ec53c1d929a75b':
dct-test: Improve CPU flags struct member name
Conflicts:
libavcodec/dct-test.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'cb44b21da1f59923be577f08c267ec270529be97':
dct-test: Move cpu_flags variable out of global scope
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7e18a727d2c2a19f22fcf68875d1b05fd2eafcef':
arm: cosmetics: Consistently use lowercase for shift operators
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '87552d54d3337c3241e8a9e1a05df16eaa821496':
armv6: Accelerate ff_fft_calc for general case (nbits != 4)
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 2245.5 53.1 1599.6 43.8 100.0% +40.4%
FFT routines 940.6 22.0 348.1 20.8 100.0% +170.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>
These where removed by libav in
See: git show -C 2d60444331
diff --git a/libavcodec/dsputil.c b/libavcodec/me_cmp.c
similarity index 98%
rename from libavcodec/dsputil.c
rename to libavcodec/me_cmp.c
index ba71a99..9fcc937 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/me_cmp.c
@@ -1,8 +1,4 @@
/*
- * DSP utils
- * Copyright (c) 2000, 2001 Fabrice Bellard
- * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
- *
* This file is part of Libav.
*
* Libav is free software; you can redistribute it and/or
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>