29-105% faster apply_filter, 6-90% faster ape decoding on core2
(Any x86 other than core2 probably gets much less, since this is mostly due to ssse3 cachesplit avoidance and I haven't written the full gamut of other cachesplit modes.)
9-123% faster ape decoding on G4.
Originally committed as revision 20739 to svn://svn.ffmpeg.org/ffmpeg/trunk
necessary ff_cos_tabs tables are initialized.
Fixes issue 1507 (QDM2 broken since r20237 without hardcoded tables).
Originally committed as revision 20464 to svn://svn.ffmpeg.org/ffmpeg/trunk
While this "wastes" up to 2x32 bytes it makes the code slightly simpler and
less confusing.
Originally committed as revision 20449 to svn://svn.ffmpeg.org/ffmpeg/trunk
2.2x faster than C on conroe, 3.6x on penryn.
4-6% faster huffyuv decoding if using left or plane mode and yuv
Originally committed as revision 20287 to svn://svn.ffmpeg.org/ffmpeg/trunk
corresponding dsputil functions and remove their dependency on the FLAC
encoder.
Fixes Issue1486.
Originally committed as revision 20266 to svn://svn.ffmpeg.org/ffmpeg/trunk
initialized by ff_fft_init and using different code can result in slightly
different values, in addition it crashes when the tables are hardcoded.
On amd64 this slightly changes qdm2 output.
Originally committed as revision 20237 to svn://svn.ffmpeg.org/ffmpeg/trunk
--enable-hardcoded-tables was used.
Due to the size, the code for the tables is generated at compile time.
Originally committed as revision 20232 to svn://svn.ffmpeg.org/ffmpeg/trunk
The src3 and step arguments to vector_fmul_add_add() are always zero
and one, respectively. This removes these arguments from the function,
simplifies the code accordingly, and renames the function to better
match the new operation.
Originally committed as revision 20061 to svn://svn.ffmpeg.org/ffmpeg/trunk
This adds a function pointer for forward MDCT to FFTContext and
initialises it with the existing C function. ff_calc_mdct() is
changed to an inline function calling the selected version as
done for other fft/mdct functions.
Originally committed as revision 19818 to svn://svn.ffmpeg.org/ffmpeg/trunk
The DECLARE_ALIGNED_8 macro is defined to align to 16 bytes instead
the 8 suggested by the name on some CPUs. None of the uses of this
macro ever need 16-byte alignment, cases which once did having been
changed to always specify 16 bytes explicitly.
Originally committed as revision 19737 to svn://svn.ffmpeg.org/ffmpeg/trunk
Includes mmx2 asm for the various functions.
Note that the actual idct still does not have an x86 SIMD implemtation.
For wmv3 files using regular idct, the decoder just falls back to simple_idct,
since simple_idct_dc doesn't exist (yet).
Originally committed as revision 19204 to svn://svn.ffmpeg.org/ffmpeg/trunk
Scaling (i)MDCT output has no runtime overhead and can be used to improve
performance of audio codecs. All the changes are only needed in
'ff_mdct_init' function and slow down initialization a bit.
Originally committed as revision 18855 to svn://svn.ffmpeg.org/ffmpeg/trunk
Otherwise doxygen complains about ambiguous filenames when files exist
under the same name in different subdirectories.
Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk