Yields 2x improvement in function performance, and boosts aac encoding
speed by ~ 4% overall. Sample benchmark (Haswell+GCC under -march=native):
after:
ffmpeg -i sin.flac -acodec aac -y sin_new.aac 5.22s user 0.03s system 105% cpu 4.970 total
before:
ffmpeg -i sin.flac -acodec aac -y sin_new.aac 5.40s user 0.05s system 105% cpu 5.162 total
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanag@gmail.com>
Understanding the mips32r6 and mips64r6 ISAs in the configure script is
not enough. In order to have full support for MIPS R6 in FFmpeg we need
to be able to build it, and for that we need to make sure we don't use
incompatible assembler code which makes the build fail. Ifdefing the
offending code is sufficient to fix the problem.
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This ensures gcc does not create unnecessary
loads or stores and possibly even does not vectorize
the negation.
Speeds up mp3 to aac transcoding with default settings
by 10% when using "gcc (Debian 5.3.1-10) 5.3.1 20160224".
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
The change of bps from 0 doesn't contain any info useful to the
user. This message is now at info log level only if the original
value is !=0, otherwise pushed back to debug log level. The
original value is displayed additionally.
Signed-off-by: Moritz Barsnick <barsnick@gmx.net>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The first X96 channel set can have more channels than core, causing X96
decoding to be skipped. Clear the number of decoded X96 channels to zero
in this rudimentary case.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Umair Khan <omerjerk@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
I cannot see any point whatsoever to use
double here instead of float, the results
are likely identical in all cases..
Using float allows for much more
efficient use of SIMD.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Fixes compilation of fft with hardcoded tables
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
This was a leftover from before the slices were encoded in parallel.
Since the put_bits context is initialized per slice aligning it
aferwards is pointless.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit solves most of the crashes and issues with the encoder and
the bitrate setting. Now the encoder will always allocate the absolute
lowest amount of memory regardless of what the bitrate has been set to.
Therefore if a user inputs a very low bitrate the encoder will use the
maximum possible quantization (basically zero out all coefficients),
allocate a packet and encode it. There is no coupling between the
bitrate and the allocation size and so no crashes because the buffer
isn't large enough.
The maximum quantizer was raised to the size of the table now to both
keep the overshoot at ridiculous bitrates low and to improve quality
with higher bit depths (since the coefficients grow larger per transform
quantizing them to the same relative level requires larger quantization
indices).
Since the quantization index start follows the previous quantization
index for that slice, the quantization step was reduced to a static 1
to improve performance. Previously with quant/5 the step was usually
set to 0 upon start (and was later clipped to 1), that isn't a big change.
As the step size increases so does the amount of bits leftover and so
the redistribution algorithm has to iterate more and thus waste more
time.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
AVClass is now a const, the rest are no-op.
* commit '5e555f93009f0605db120eec78262d0fe337e645':
mpeg12enc: always write closed gops for intra only outputs
h264: Add an AVClass pointer to H264Context
libx264: Fix noise_reduction option assignment
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
These macros were added in OS X 10.11, and the file compiles without warnings
on both 10.10 and 10.11 with them removed.
Thanks to mark4o on IRC for pointing out the failure and testing the patch.
Autodetected by default. Encode using -codec:v h264_videotoolbox.
Signed-off-by: Rick Kern <kernrj@gmail.com>
Signed-off-by: wm4 <nfxjfg@googlemail.com>
It makes no sense whatsoever to do this at each function call; we
already have a table for this.
Yields a 2x improvement in find_min_book (x86-64, Haswell+GCC):
ffmpeg -i sin.flac -acodec aac -y sin.aac
find_min_book
old
605 decicycles in find_min_book, 8388453 runs, 155 skips.9x
606 decicycles in find_min_book,16776912 runs, 304 skips.9x
607 decicycles in find_min_book,33553819 runs, 613 skips.2x
607 decicycles in find_min_book,67107668 runs, 1196 skips.3x
607 decicycles in find_min_book,134215360 runs, 2368 skips3x
new
359 decicycles in find_min_book, 8388552 runs, 56 skips.3x
360 decicycles in find_min_book,16777112 runs, 104 skips.1x
361 decicycles in find_min_book,33554218 runs, 214 skips.4x
361 decicycles in find_min_book,67108381 runs, 483 skips.5x
361 decicycles in find_min_book,134216725 runs, 1003 skips5x
and more importantly a non-negligible speedup (~ 8%) to overall AAC encoding:
old:
ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_new.aac 6.82s user 0.03s system 104% cpu 6.565 total
new:
ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_old.aac 6.24s user 0.03s system 104% cpu 5.993 total
This also improves accuracy of the expression by ~ 2 ulp in some cases.
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanag@gmail.com>
This fixes a data race warning by ThreadSanitizer.
FrameThreadContext.die is read by all the worker threads but is not
protected by any mutex. Move it to PerThreadContext so that each worker
thread reads its own copy of |die|, which can then be protected with
PerThreadContext.mutex.
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This was a regression introduced by commit e7345abe05 which
enabled full use of the allocated packet but due to the overhead of
using field coding the buffer was too small and triggered warnings and
crashes.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
The fact that now all quantization indices costs are cached justifies
storing 20 more integers in a structure already allocated on heap.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>