The shift parameter was removed from this interface in 7e1ce6a.
This updates the Altivec implementation to match.
Signed-off-by: Mans Rullgard <mans@mansr.com>
To load unaligned vector data in the usual way, explicit vec_ld()
should be used rather than dereferencing a pointer to a vector type.
When the VSX extension is enabled, gcc may compile vector pointer
dereferences using the VSX lxvw4x instruction instead of the lvx
instruction typically used with Altivec/VMX. As the behaviour of
these instructions with unaligned addresses differs, it is important
that only lvx is used here.
Signed-off-by: Mans Rullgard <mans@mansr.com>
On 32-bit ppc, the GOT pointer must be loaded manually.
This adds a "get_got" assembler macro to compute the
GOT address. The "movrel" macro is updated to take an
additional parameter containing the GOT address since
no register is reserved for this purpose on ppc32.
These changes have no effect on ppc64 builds.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This separation allows these functions to be used in a cleaner
fashion from other codecs (e.g. qdm2) and simplifies creating
optimised versions of them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This patch lets e.g. dsputil_init choose dsp functions according to
the bit depth being decoded. The naming scheme for bit-depth-dependent
functions is <base name>_<bit depth>[_<prefix>] (e.g. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the high bit depth functions do not depend on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
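A minimal sketch of the <base name>_<bit depth>_c naming scheme from the patch above, assuming a macro generates one function per bit depth. SketchDSPContext, DEF_CLEAR_BLOCKS and sketch_dsp_init are hypothetical names, and the 10-bit variant, the coefficient types and the six-blocks-of-64 layout are illustrative assumptions, not the patch's actual code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real DSP context; one member only. */
typedef struct SketchDSPContext {
    void (*clear_blocks)(void *blocks);
} SketchDSPContext;

/* Expands to clear_blocks_<depth>_c, which zeroes six blocks of 64
 * coefficients; the coefficient type (an assumption) sets the size. */
#define DEF_CLEAR_BLOCKS(depth, coef_t)                 \
void clear_blocks_ ## depth ## _c(void *blocks)        \
{                                                       \
    memset(blocks, 0, sizeof(coef_t) * 6 * 64);         \
}

DEF_CLEAR_BLOCKS(8,  int16_t)
DEF_CLEAR_BLOCKS(10, int32_t)

/* Picks the function matching the bit depth; returns -1 if unsupported. */
int sketch_dsp_init(SketchDSPContext *c, int bit_depth)
{
    switch (bit_depth) {
    case 8:  c->clear_blocks = clear_blocks_8_c;  return 0;
    case 10: c->clear_blocks = clear_blocks_10_c; return 0;
    default: return -1;
    }
}
```

Callers go through the function pointer only, so they never need to know which bit depth was selected at init time.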
This reverts commit f8bed30d8b. The reason
for this is that the overlap filter, which runs after IDCT, should run
on unclamped values, and thus IDCT and put_pixels() cannot be merged if
we want to attempt to be bitexact.
According to ISO 9899:1999 §6.5.7/4:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are filled with zeros. If E1 has an unsigned type, the value of the
result is E1 × 2^E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1 × 2^E2 is representable in the result type, then
that is the resulting value; otherwise, the behavior is undefined.
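One common way to stay within the rule quoted above is to perform the shift on the unsigned representation and convert back. The following is only a sketch; shl_wrap is a hypothetical helper, not a function from the code base. Note that converting an out-of-range unsigned value back to int32_t is implementation-defined rather than undefined (GCC documents it as reduction modulo 2^32):

```c
#include <stdint.h>

/* Left-shift a possibly negative value without undefined behaviour:
 * the shift itself happens on an unsigned type, where it is defined
 * as multiplication by 2^bits reduced modulo 2^32. */
int32_t shl_wrap(int32_t value, unsigned bits)
{
    return (int32_t)((uint32_t)value << (bits & 31));
}
```

Masking the shift count with 31 keeps it below the width of the type, which the standard also requires.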
GCC 4.3 and later are more particular about signedness matching
in vector operations. The operations under if(rangered) were
missing assignments and thus had no effect.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
mm_support() instead.
Reduce complexity and simplify pending move to libavutil.
Originally committed as revision 25074 to svn://svn.ffmpeg.org/ffmpeg/trunk
On PPC a leaf function has a 288-byte red zone below the stack pointer,
sparing these functions the chore of setting up a full stack frame.
When a function call is disguised within an inline asm block, the
compiler might not adjust the stack pointer as required before a
function call, resulting in the red zone being clobbered.
Moving the entire function to pure asm avoids this problem and also
results in somewhat better code.
Originally committed as revision 24044 to svn://svn.ffmpeg.org/ffmpeg/trunk
1.8x faster than altivec radix-2 on a G4
8% faster vorbis decoding
Patch (mostly) by Loren Merritt
Originally committed as revision 23956 to svn://svn.ffmpeg.org/ffmpeg/trunk
This checks which assembler syntax is supported and defines macros
for register names accordingly.
Originally committed as revision 23952 to svn://svn.ffmpeg.org/ffmpeg/trunk
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
This moves the H264-specific functions from DSPContext to the new
H264DSPContext. The code is made conditional on CONFIG_H264DSP
which is set by the codecs requiring it.
The qpel and chroma MC functions are not moved as these are used by
non-h264 code.
Originally committed as revision 22565 to svn://svn.ffmpeg.org/ffmpeg/trunk
This fixes a compilation issue on OS X 10.4, where some system headers were
included implicitly through dsputil_altivec.h (with _POSIX_C_SOURCE defined),
and other system headers included later, with _POSIX_C_SOURCE undefined at
that time.
Originally committed as revision 22327 to svn://svn.ffmpeg.org/ffmpeg/trunk
These macros are redundant. All uses are replaced with the generic
DECLARE_ALIGNED macro instead.
Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk
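For readers unfamiliar with this kind of macro, here is a sketch of how a generic alignment macro taking the alignment as a parameter can be written. SKETCH_DECLARE_ALIGNED is a hypothetical name, and the GCC-style attribute form is an assumption; the real DECLARE_ALIGNED definition may differ per compiler:

```c
#include <stdint.h>

/* Assumed GCC/Clang form: declare variable v of type t aligned to n bytes. */
#define SKETCH_DECLARE_ALIGNED(n, t, v) t __attribute__((aligned(n))) v

/* Returns nonzero if p is a multiple of n bytes. */
int sketch_is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

A declaration such as SKETCH_DECLARE_ALIGNED(16, int16_t, block)[64]; then yields a 16-byte-aligned array, so one macro covers every alignment that would otherwise need its own variant.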
29-105% faster apply_filter, 6-90% faster ape decoding on core2
(Any x86 other than core2 probably gets much less, since this is mostly due to ssse3 cachesplit avoidance and I haven't written the full gamut of other cachesplit modes.)
9-123% faster ape decoding on G4.
Originally committed as revision 20739 to svn://svn.ffmpeg.org/ffmpeg/trunk
The src3 and step arguments to vector_fmul_add_add() are always zero
and one, respectively. This removes these arguments from the function,
simplifies the code accordingly, and renames the function to better
match the new operation.
Originally committed as revision 20061 to svn://svn.ffmpeg.org/ffmpeg/trunk
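The simplification in the commit above (revision 20061) can be sketched in plain C as follows. Both functions carry a hypothetical sketch_ prefix, and the argument order and loop bodies are assumptions based on the description, not copies of the real code:

```c
/* Old shape: src3 is a constant bias and step a destination stride. */
void sketch_vector_fmul_add_add(float *dst, const float *src0,
                                const float *src1, const float *src2,
                                int src3, int len, int step)
{
    for (int i = 0; i < len; i++)
        dst[i * step] = src0[i] * src1[i] + src2[i] + src3;
}

/* New shape: with src3 == 0 and step == 1 both arguments vanish. */
void sketch_vector_fmul_add(float *dst, const float *src0,
                            const float *src1, const float *src2, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i] + src2[i];
}
```

With a unit stride and no bias the loop becomes a plain multiply-add over contiguous data, which is the form SIMD implementations handle most easily.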
Storing a single element from a vector where all elements have the same
value does not require an aligned destination. Which element is stored
depends on the alignment of the destination address, but since they all
have the same value, the result is the same regardless of the alignment.
Originally committed as revision 19696 to svn://svn.ffmpeg.org/ffmpeg/trunk
Instead of filling a local array with the desired value and loading it,
load a single element and vec_splat() it to fill the vector.
Originally committed as revision 19691 to svn://svn.ffmpeg.org/ffmpeg/trunk
As a side-effect this also gives it the correct value on e.g. PPC970FX-based
PPC64 systems, thus fixing "make test" (mp2/mp3 decoding).
Originally committed as revision 18953 to svn://svn.ffmpeg.org/ffmpeg/trunk
GCC makes a mess of these operations, so give it a hand.
55% faster MP3 decoding on G4.
Originally committed as revision 18794 to svn://svn.ffmpeg.org/ffmpeg/trunk
Left to its own devices, gcc calculates the full 64-bit product only to
discard the low 32 bits. This forces it to do the right thing.
20% faster MP3 decoding on G4.
Originally committed as revision 18737 to svn://svn.ffmpeg.org/ffmpeg/trunk
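As a portable reference for the operation discussed in the commit above (revision 18737), the high 32 bits of a signed 32x32-bit product can be written as below. sketch_mulh is a hypothetical name; the commit's actual fix is not shown here. On 32-bit PowerPC a single mulhw instruction computes this value directly, whereas a compiler given this C code may materialise the whole 64-bit product first:

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product of a and b.
 * Right-shifting a negative value is implementation-defined in C;
 * mainstream compilers use an arithmetic shift. */
int32_t sketch_mulh(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}
```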
Otherwise doxygen complains about ambiguous filenames when files exist
under the same name in different subdirectories.
Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk
allowing to re-enable ff_h264_idct_add_altivec's usage.
Patch by David Conrad %lessen42 A gmail P com%
Originally committed as revision 16465 to svn://svn.ffmpeg.org/ffmpeg/trunk
(parameter 'len' is a long not an int).
Patch by David Conrad % lessen42 A gmail P com %
Originally committed as revision 16451 to svn://svn.ffmpeg.org/ffmpeg/trunk
h264_idct_add16intra, h264_idct_add8 need to be implemented.
Add C version of ff_h264_idct8_dc_add in AltiVec so that ff_h264_idct8_add_altivec
can be used.
Originally committed as revision 16311 to svn://svn.ffmpeg.org/ffmpeg/trunk
Add missing one for FF_MM_ALTIVEC to avcodec.h.
Rename all the occurrences of MM_* to the corresponding FF_MM_*.
Originally committed as revision 15770 to svn://svn.ffmpeg.org/ffmpeg/trunk
Neither the asm() nor the __asm__() keyword is part of the C99
standard, but while GCC accepts the former in C89 syntax, it is not
accepted in C99 unless GNU extensions are turned on (with -fasm). The
latter form is accepted in any syntax as an extension (without
requiring further command-line options).
The Sun Studio C99 compiler also does not accept asm() while accepting
__asm__(), albeit with warnings that it is not valid C99 syntax.
Originally committed as revision 15627 to svn://svn.ffmpeg.org/ffmpeg/trunk
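To illustrate the spelling discussed in the commit above (revision 15627), here is a tiny function using __asm__, which GCC-compatible compilers accept even with -std=c99. sketch_opaque is a hypothetical helper; the empty asm body emits no instructions and only hides the value from the optimiser:

```c
/* Returns x unchanged. The "+r" constraint tells the compiler that the
 * asm statement reads and writes x in a register, so it cannot assume
 * anything about the value afterwards. */
int sketch_opaque(int x)
{
    __asm__ volatile ("" : "+r"(x));
    return x;
}
```

Writing the same statement with the plain asm keyword would fail to compile under -std=c99 with GCC unless GNU extensions are enabled.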
similar typedefs that sysctl.h needs. Since sysctl() itself isn't POSIX,
undefining _POSIX_C_SOURCE for check_altivec.c seems the best way to fix this.
patch by David Conrad lessen42 at gmail com
Originally committed as revision 15306 to svn://svn.ffmpeg.org/ffmpeg/trunk
Consistently apply this rule: the guard name is obtained from the
filename by stripping the leading "lib", converting '/' and '.' to
'_' and uppercasing the resulting name. Guard names in the root
directory have to be prefixed by "FFMPEG_".
Originally committed as revision 15120 to svn://svn.ffmpeg.org/ffmpeg/trunk
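The guard naming rule in the commit above (revision 15120) can be sketched as a small function. sketch_guard_name is a hypothetical helper, and treating any path without a '/' as a root-directory file is an assumption made for this example:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the guard name for path into out (capacity outsz).
 * Returns 0 on success, -1 if the buffer is too small. */
int sketch_guard_name(const char *path, char *out, size_t outsz)
{
    /* Assumption: no '/' in the path means a root-directory file. */
    const char *prefix = strchr(path, '/') ? "" : "FFMPEG_";
    size_t plen = strlen(prefix);
    size_t i;

    if (strncmp(path, "lib", 3) == 0)
        path += 3;                      /* strip the leading "lib" */

    if (plen + strlen(path) + 1 > outsz)
        return -1;

    memcpy(out, prefix, plen);
    for (i = 0; path[i]; i++) {
        unsigned char c = (unsigned char)path[i];
        /* '/' and '.' become '_'; everything else is uppercased */
        out[plen + i] = (c == '/' || c == '.') ? '_' : (char)toupper(c);
    }
    out[plen + i] = '\0';
    return 0;
}
```

For example, "libavcodec/ppc/dsputil_altivec.h" maps to AVCODEC_PPC_DSPUTIL_ALTIVEC_H, and a root-level "cmdutils.h" maps to FFMPEG_CMDUTILS_H.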
The original problem was that FSF and Apple gcc used a different syntax
for vector declarations, i.e. {} vs. (). Nowadays Apple gcc versions support
the standard {} syntax and versions that support {} are available on all
relevant Mac OS X versions. Thus the greater compatibility is no longer
worth cluttering the code with macros.
Originally committed as revision 14366 to svn://svn.ffmpeg.org/ffmpeg/trunk
This includes indentation changes, comment reformatting, consistent brace
placement and some prettyprinting.
Originally committed as revision 14318 to svn://svn.ffmpeg.org/ffmpeg/trunk
This includes indentation changes, comment reformatting, consistent brace
placement and some prettyprinting.
Originally committed as revision 14316 to svn://svn.ffmpeg.org/ffmpeg/trunk