This further speeds up runtime initialization, with identical generated tables.
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
34441423 decicycles in mpegaudio_tableinit, 8192 runs, 0 skips
new:
10776291 decicycles in mpegaudio_tableinit, 8192 runs, 0 skips
Most low hanging fruit is taken care of here. For some idea, note that
83,064 array elements totalling 233,722 bytes need to be initialized.
Thus, with this patch, we average ~ 12.9 cycles per element or ~ 4.6
cycles per byte.
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This does some miscellaneous stuff mainly avoiding the usage of pow to
achieve significant speedups. This is not speed critical, but is
unnecessary latency and cycles wasted for a user.
All tables tested and are identical to the old ones
(bit-exact even in floating point case).
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
102329530 decicycles in mpegaudio_tableinit, 1 runs, 0 skips
new:
34111900 decicycles in mpegaudio_tableinit, 1 runs, 0 skips
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
cbrtf() took floats but it represented 1/3 exactly
and even if not more precission should be better in theory
for the table generation
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
You cannot count on it being present on all systems, and you
cannot include libm.h in a host tool, so just hard code a baseline
implementation.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
You cannot count on them being present on all systems, and you
cannot include libm.h in a host tool, so just hard code baseline
implementations.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* qatar/master:
ffmpeg: get rid of the -vglobal option.
dct32: Add AVX implementation of 32-point DCT
dct32: Change pass 6 permutation to allow for AVX implementation
dct32: port SSE 32-point DCT to YASM
multiple inclusion guard cleanup
avio: document buffer must created with av_malloc() and friends
avio: check AVIOContext malloc failure
swscale: point out an alternative to sws_getContext
svq3: Do initialization after parsing the extradata
add changelog entries for 0.7_beta2
mp3lame: add #include required for AV_RB32 macro.
Conflicts:
Changelog
libavcodec/svq3.c
libavcodec/x86/dct32_sse.c
libavfilter/vsrc_buffer.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Fix compilation of iirfilter-test.
libx264: handle closed GOP codec flag
lavf: remove duplicate assignment in avformat_alloc_context.
lavf: use designated initializers for AVClasses.
flvdec: clenup debug code
asfdec: fix possible overread on broken files.
asfdec: do not fall back to binary/generic search
asfdec: reindent after previous commit c7bd5ed
asfdec: fallback to binary search internally
mpegaudio: add _fixed suffix to some names
Modify x86util.asm to ease transitioning to 10-bit H.264 assembly.
dct: build dct32 as separate object files
qdm2: include correct header for rdft
Conflicts:
ffpresets/libx264-fast.ffpreset
ffpresets/libx264-fast_firstpass.ffpreset
ffpresets/libx264-faster.ffpreset
ffpresets/libx264-faster_firstpass.ffpreset
ffpresets/libx264-medium.ffpreset
ffpresets/libx264-medium_firstpass.ffpreset
ffpresets/libx264-placebo.ffpreset
ffpresets/libx264-placebo_firstpass.ffpreset
ffpresets/libx264-slow.ffpreset
ffpresets/libx264-slow_firstpass.ffpreset
ffpresets/libx264-slower.ffpreset
ffpresets/libx264-slower_firstpass.ffpreset
ffpresets/libx264-superfast.ffpreset
ffpresets/libx264-superfast_firstpass.ffpreset
ffpresets/libx264-ultrafast.ffpreset
ffpresets/libx264-ultrafast_firstpass.ffpreset
ffpresets/libx264-veryfast.ffpreset
ffpresets/libx264-veryfast_firstpass.ffpreset
ffpresets/libx264-veryslow.ffpreset
ffpresets/libx264-veryslow_firstpass.ffpreset
libavformat/flvdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This adds a _fixed suffix to the fixed-point versions of things
with both float and fixed-point variants. This makes it more
consistent with other dual-implementation things, e.g. fft.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
mpegaudiodec: group #includes more sanely
mpegaudio: remove #if 0 blocks
ffmpeg.c: reset avoptions after each input/output file.
ffmpeg.c: store per-output stream sws flags.
mpegaudio: remove CONFIG_MPEGAUDIO_HP option
mpegtsenc: Clear st->priv_data when freeing it
udp: Fix receiving RTP data over multicast
rtpproto: Remove an unused variable
regtest: fix wma tests
NOT pulled: mpegaudio: remove CONFIG_AUDIO_NONSHORT
regtest: separate flags for encoding and decoding
Conflicts:
ffmpeg.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The low quality mode is off by default and never tested. The high
quality mode is also plenty fast enough.
Signed-off-by: Mans Rullgard <mans@mansr.com>
config.h must not be included in that file. The table generator runs
on the host system, but config.h describes the target.
Originally committed as revision 20620 to svn://svn.ffmpeg.org/ffmpeg/trunk