This uses Stein's binary GCD algorithm:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
to get a roughly 4x speedup over Euclidean GCD on standard architectures
with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
At the moment, the compiler intrinsic is used on GCC and Clang due to
its easy availability.
Quick note regarding overflow: yes, subtractions on int64_t can, but the
llabs takes care of that. The llabs is also guaranteed to be safe, with
no annoying INT64_MIN business since INT64_MIN being a power of 2, is
shifted down before being sent to llabs.
The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On
GCC, this is provided by a built-in. On Microsoft, there is a
BitScanForward64 analog of BitScanForward that should work; but I can't confirm.
Apparently it is not available on 32 bit builds; so this may or may not
work correctly. On Intel, per the documentation there is only an
intrinsic for _bit_scan_forward and people have posted on forums
regarding _bit_scan_forward64, but often their documentation is
woeful. Again, I don't have it, so I can't test.
As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest
use a compiled version based on the De-Bruijn method of Leiserson et al:
http://supertech.csail.mit.edu/papers/debruijn.pdf.
Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell)
with a START_TIMER and STOP_TIMER in libavutil/rationsl.c, followed by a
make fate.
aac-am00_88.err:
builtin:
714 decicycles in av_gcd, 4095 runs, 1 skips
de-bruijn:
1440 decicycles in av_gcd, 4096 runs, 0 skips
previous:
2889 decicycles in av_gcd, 4096 runs, 0 skips
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit '94a417acc05cc5151b473abc0bf51fad26f8c5a0':
mathematics: remove asserts from av_rescale_rnd()
Conflicts:
libavutil/mathematics.c
The asserts are left in place for now as no code checks the return
value, but we sure can change this if application developers
prefer
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '930c9d4373e0f3cb7c64fcfc129127a309f6d066':
avutil: Duplicate ff_log2_tab instead of sharing it across libs
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9734b8ba56d05e970c353dfd5baafa43fdb08024':
Move avutil tables only used in libavcodec to libavcodec.
Conflicts:
libavcodec/mathtables.c
libavutil/intmath.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (36 commits)
adpcmenc: Use correct frame_size for Yamaha ADPCM.
avcodec: add ff_samples_to_time_base() convenience function to internal.h
adx parser: set duration
mlp parser: set duration instead of frame_size
gsm parser: set duration
mpegaudio parser: set duration instead of frame_size
(e)ac3 parser: set duration instead of frame_size
flac parser: set duration instead of frame_size
avcodec: add duration field to AVCodecParserContext
avutil: add av_rescale_q_rnd() to allow different rounding
pnmdec: remove useless .pix_fmts
libmp3lame: support float and s32 sample formats
libmp3lame: renaming, rearrangement, alignment, and comments
libmp3lame: use the LAME default bit rate
libmp3lame: use avpriv_mpegaudio_decode_header() for output frame parsing
libmp3lame: cosmetics: remove some pointless comments
libmp3lame: convert some debugging code to av_dlog()
libmp3lame: remove outdated comment.
libmp3lame: do not set coded_frame->key_frame.
libmp3lame: improve error handling in MP3lame_encode_init()
...
Conflicts:
doc/APIchanges
libavcodec/libmp3lame.c
libavcodec/pcxenc.c
libavcodec/pnmdec.c
libavcodec/pnmenc.c
libavcodec/sgienc.c
libavcodec/utils.c
libavformat/hls.c
libavutil/avutil.h
libswscale/x86/swscale_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (21 commits)
swscale: Add Doxygen for hyscale_fast/hScale.
fate: enable lavfi-pixmt tests on big endian systems
PPC: swscale: disable altivec functions for unsupported formats
fate: merge identical pixdesc_be/le tests
swscale: Add Doxygen for yuv2planar*/yuv2packed* functions.
build: call texi2pod.pl with full path instead of symlink
build: include sub-makefiles using full path instead of symlinks
swscale: update big endian reference values after dff5a835.
wavpack: skip blocks with no samples
cosmetics: remove outdated comment that is no longer true
build: replace some addprefix/addsuffix with substitution refs
avutil: Remove unused arbitrary precision integer code.
configure: Drop check for availability of ten assembler operands.
aacenc: Save channel configuration for later use.
aacenc: Fix codebook trellising for zeroed bands.
swscale: change prototypes of scaled YUV output functions.
swscale: re-add support for non-native endianness.
swscale: disentangle yuv2rgbX_c_full() into small functions.
swscale: split yuv2packed[12X]_c() remainders into small functions.
swscale: split yuv2packedX_altivec in smaller functions.
...
Conflicts:
Makefile
configure
libavcodec/x86/dsputil_mmx.c
libavfilter/Makefile
libavformat/Makefile
libavutil/integer.c
libavutil/integer.h
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/x86/swscale_template.c
tests/ref/lavfi/pixdesc_le
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
This reduces the number of false dependencies on header files and
speeds up compilation.
Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk
Otherwise doxygen complains about ambiguous filenames when files exist
under the same name in different subdirectories.
Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk