This separation allows these functions to be used in a cleaner
fashion from other codecs (e.g. qdm2) and simplifies creating
optimised versions of them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This reverts commit f8bed30d8b. The reason
for this is that the overlap filter, which runs after IDCT, should run
on unclamped values, and thus IDCT and put_pixels() cannot be merged if
we want to attempt to be bitexact.
According to ISO 9899:1999 S 6.5.7/4:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are filled with zeros. If E1 has an unsigned type, the value of the
result is E1× 2^E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1× 2^E2 is representable in the result type, then
that is the resulting value; otherwise, the behavior is undefined.
GCC 4.3 and later are more particular about signedness matching
in vector operations. The operations under if(rangered) were
missing assignments and thus had no effect.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
mm_support() instead.
Reduce complexity and simplify pending move to libavutil.
Originally committed as revision 25074 to svn://svn.ffmpeg.org/ffmpeg/trunk
On PPC a leaf function has a 288-byte red zone below the stack pointer,
sparing these functions the chore of setting up a full stack frame.
When a function call is disguised within an inline asm block, the
compiler might not adjust the stack pointer as required before a
function call, resulting in the red zone being clobbered.
Moving the entire function to pure asm avoids this problem and also
results in somewhat better code.
Originally committed as revision 24044 to svn://svn.ffmpeg.org/ffmpeg/trunk
1.8x faster than altivec radix-2 on a G4
8% faster vorbis decoding
Patch (mostly) by Loren Merritt
Originally committed as revision 23956 to svn://svn.ffmpeg.org/ffmpeg/trunk
This checks which assembler syntax is supported and defines macros
for register names accordingly.
Originally committed as revision 23952 to svn://svn.ffmpeg.org/ffmpeg/trunk
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
This moves the H264-specific functions from DSPContext to the new
H264DSPContext. The code is made conditional on CONFIG_H264DSP
which is set by the codecs requiring it.
The qpel and chroma MC functions are not moved as these are used by
non-h264 code.
Originally committed as revision 22565 to svn://svn.ffmpeg.org/ffmpeg/trunk
This fixes a compilation issue on OS X 10.4, where some system headers were
included implicitly through dsputil_altivec.h (with _POSIX_C_SOURCE defined),
and other system headers included later, with _POSIX_C_SOURCE undefined at
that time.
Originally committed as revision 22327 to svn://svn.ffmpeg.org/ffmpeg/trunk
These macros are redundant. All uses are replaced with the generic
DECLARE_ALIGNED macro instead.
Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk
29-105% faster apply_filter, 6-90% faster ape decoding on core2
(Any x86 other than core2 probably gets much less, since this is mostly due to ssse3 cachesplit avoidance and I haven't written the full gamut of other cachesplit modes.)
9-123% faster ape decoding on G4.
Originally committed as revision 20739 to svn://svn.ffmpeg.org/ffmpeg/trunk
The src3 and step arguments to vector_fmul_add_add() are always zero
and one, respectively. This removes these arguments from the function,
simplifies the code accordingly, and renames the function to better
match the new operation.
Originally committed as revision 20061 to svn://svn.ffmpeg.org/ffmpeg/trunk
Storing a single element from a vector where all elements have the same
value does not require an aligned destination. Which element is stored
depends on the alignment of the destination address, but since they all
have the same value, the result is the same regardless of the alignment.
Originally committed as revision 19696 to svn://svn.ffmpeg.org/ffmpeg/trunk
Instead of filling a local array with the desired value and loading it,
load a single element and vec_splat() it to fill the vector.
Originally committed as revision 19691 to svn://svn.ffmpeg.org/ffmpeg/trunk
As a side-effect this also gives it the correct value on e.g. PPC970FX-based
PPC64 systems, thus fixing "make test" (mp2/mp3 decoding).
Originally committed as revision 18953 to svn://svn.ffmpeg.org/ffmpeg/trunk
GCC makes a mess of these operations, so give it a hand.
55% faster MP3 decoding on G4.
Originally committed as revision 18794 to svn://svn.ffmpeg.org/ffmpeg/trunk
Left to its own devices, gcc calculates the full 64-bit product only to
discard the low 32 bits. This forces it to do the right thing.
20% faster MP3 decoding on G4.
Originally committed as revision 18737 to svn://svn.ffmpeg.org/ffmpeg/trunk
Otherwise doxygen complains about ambiguous filenames when files exist
under the same name in different subdirectories.
Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk
allowing to re-enable ff_h264_idct_add_altivec's usage.
Patch by David Conrad %lessen42 A gmail P com%
Originally committed as revision 16465 to svn://svn.ffmpeg.org/ffmpeg/trunk
(parameter 'len' is a long not an int).
Patch by David Conrad % lessen42 A gmail P com %
Originally committed as revision 16451 to svn://svn.ffmpeg.org/ffmpeg/trunk
h264_idct_add16intra, h264_idct_add8 need to be implemented.
Add C version of ff_h264_idct8_dc_add in AltiVec so that ff_h264_idct8_add_altivec
can be used.
Originally committed as revision 16311 to svn://svn.ffmpeg.org/ffmpeg/trunk