Commit Graph

602 Commits

Author SHA1 Message Date
Vitor Sessak
18b131de04 dct32: Add SSE2 ASM optimizations
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-02 10:17:29 -07:00
Jason Garrett-Glaser
a3bf7b864a H.264: tweak some other x86 asm for Atom 2011-07-29 12:24:15 -07:00
Mans Rullgard
3ad1684126 x86: cabac: add operand size suffixes missing from 6c32576
This fixes build with clang.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-28 18:59:23 -07:00
Mans Rullgard
f5f004bc5a x86: cabac: don't load/store context values in asm
Inspection of compiled code shows gcc handles these fine on its own.
Benchmarking also shows no measurable speed difference.

Removing the remaining cases in get_cabac_bypass_sign_x86() does
cause more substantial changes to the compiled code with uncertain
impact.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-28 22:25:21 +01:00
Jason Garrett-Glaser
6c32576548 H.264: optimize CABAC x86 asm for Atom 2011-07-28 13:06:13 -07:00
Mans Rullgard
da4c7cce21 x86: fix build with gcc 4.7
The upcoming gcc 4.7 has more advanced constant propagation
resulting some inline asm operands becoming constants and thus
emitted as literals, sometimes in contexts where this results
in invalid instructions.

This patch changes the constraints of the relevant operands
to "rm" thus forcing a valid type.  While obviously suboptimal,
this is what older gcc versions already did, and there is no
change to the code generated with these.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-26 22:17:43 +01:00
Daniel Kang
406fbd24dc H.264: Add optimizations to predict x86 assembly.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-22 14:54:33 -07:00
Joseph Artsimovich
5ab21439fd dnxhd: 10-bit support
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:44:40 +01:00
Mans Rullgard
a617c6aaa3 dsputil: update per-arch init funcs for non-h264 high bit depth
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
874f1a901d dsputil: template get_pixels() for different bit depths
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
0a72533e98 jfdctint: add 10-bit version
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
e7a972e113 simple_idct: add 10-bit version
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-20 17:49:48 +01:00
Diego Biurrun
65083b4911 dsputil: remove disabled code 2011-07-18 11:48:35 +02:00
Martin Storsjö
8f62ef0f95 x86: Use LOCAL_ALIGNED in mpegvideo_mmx_template
Signed-off-by: Martin Storsjö <martin@martin.st>
2011-07-18 00:10:45 +03:00
Diego Biurrun
e0ae2174db simple_idct: remove disabled code 2011-07-17 17:32:37 +02:00
Daniel Kang
ac4a85f476 H.264: Add more x86 assembly for 10-bit H.264 predict functions
Mainly ported from 8-bit H.264 predict.

Some code ported from x264. LGPL ok by author.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-13 18:44:51 -07:00
Jason Garrett-Glaser
b5bbc84fe2 H.264: add filter_mb_fast support for >8-bit decoding
Much faster high bit depth deblocking.
2011-07-11 14:58:50 -07:00
Mans Rullgard
710b8df949 dsputil: remove ff_emulated_edge_mc macro used in one place
This macro can cause problems in conjunction with the bitdepth
template expansion.  It was presumably added to keep source
compatibility when high bitdepth support was added.  However,
emulated_edge_mc is a dsputil pointer and should not be called
directly, so there is little reason to keep such a macro.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-10 17:55:58 +01:00
Daniel Kang
c0483d0c7a H.264: Add x86 assembly for 10-bit H.264 predict functions
Mainly ported from 8-bit H.264 predict.

Some code ported from x264. LGPL ok by author.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-08 15:59:29 -07:00
Daniel Kang
3c7c16fde3 YASM: Shut up unused variable compiler warning with --disable-yasm.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-07-04 18:49:09 +02:00
Daniel Kang
567a32b5b2 x86_32: Fix build on x86_32 with --disable-yasm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-04 08:47:09 -07:00
Daniel Kang
58f7aad051 Fix build with --disable-yasm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-03 22:56:09 -07:00
Daniel Kang
9bfa5363da H.264: Add x86 assembly for 10-bit H.264 qpel functions.
Mainly ported from 8-bit H.264 qpel.

Some code ported from x264. LGPL ok by author.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-03 07:43:38 -07:00
Justin Ruggles
f99a5ef92e ac3dsp: add x86-optimized versions of ac3dsp.extract_exponents(). 2011-07-01 13:02:11 -04:00
Justin Ruggles
6054cd25b4 ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions. 2011-07-01 13:02:11 -04:00
Diego Biurrun
d2ee495fb2 configure: Drop check for availability of ten assembler operands.
This was done to support gcc 2.95, which is an old legacy compiler
that fails to compile the current codebase anyway.
2011-06-28 13:14:37 +02:00
Diego Biurrun
adbfc605f6 doxygen: Consistently use '@' instead of '\' for Doxygen markup.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-06-24 00:37:49 +02:00
Daniel Kang
84e70ef004 h264: Add x86 assembly for 10-bit weight/biweight H.264 functions.
Mainly ported from 8-bit H.264 weight/biweight.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-06-21 15:24:13 +02:00
Mans Rullgard
c5ee740745 x86: cabac: fix register constraints for 32-bit mode
Some operands need to be accessed in byte mode, which restricts the
available registers in 32-bit mode.  Using the 'q' constraint selects
a suitable register.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 23:36:40 +01:00
Mans Rullgard
2143d69bdd cabac: move x86 asm to libavcodec/x86/cabac.h
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 22:36:31 +01:00
Mans Rullgard
d075e7d540 x86: h264: cast pointers to intptr_t rather than int
Only the low-order bits are used here so the type is not important,
but this avoids a compiler warning.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 22:36:31 +01:00
Mans Rullgard
3a4edb76d6 x86: h264: remove hardcoded edi in decode_significance_8x8_x86()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 22:36:31 +01:00
Mans Rullgard
b92c1a6d26 x86: h264: remove hardcoded esi in decode_significance[_8x8]_x86()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 22:36:31 +01:00
Mans Rullgard
3fc4e36c78 x86: h264: remove hardcoded edx in decode_significance[_8x8]_x86()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 22:36:31 +01:00
Mans Rullgard
e4b5a204aa x86: h264: remove hardcoded eax in decode_significance[_8x8]_x86()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 22:36:30 +01:00
Mans Rullgard
018c33838e x86: cabac: remove hardcoded ebx in inline asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 22:36:30 +01:00
Mans Rullgard
6b712acc0e x86: cabac: remove hardcoded struct offsets from inline asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-20 22:36:30 +01:00
Ronald S. Bultje
ed63f527f2 Fix build if yasm is not available. 2011-06-18 08:34:14 -04:00
Daniel Kang
f188a1e0ca H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions.
Mainly ported from 8-bit H.264 MC Chroma.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-18 07:52:19 -04:00
Jason Garrett-Glaser
c90b94424c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 21:16:30 -07:00
Jason Garrett-Glaser
504811baea Roll back 4:4:4 H.264 for now
Needs some ARM/PPC asm modifications.
2011-06-13 13:38:46 -07:00
Jason Garrett-Glaser
c9c493872c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 12:21:39 -07:00
Oskar Arvidsson
6c031a3338 h264: Fix 10-bit H.264 x86 chroma v loopfilter asm.
The tc variable was not splatted correctly.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-10 14:44:57 -04:00
Daniel Kang
4de83b7b6d H264: x86 predict init cosmetics.
Change indentation and whitespace; also move HAVE_YASM blocks.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-06-08 00:22:52 +02:00
Daniel Kang
a8d44f9dd5 Add x86 assembly for some 10-bit H.264 intra predict functions.
Parts are inspired from the 8-bit H.264 predict code in Libav.
Other parts ported from x264 with relicensing permission from author.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-06-06 01:31:02 +02:00
Loren Merritt
53be7b23e9 Cosmetic changes to h264_idct_10bit.asm.
Removes redundant dword tags and whitespace changes.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-02 07:07:15 -07:00
Loren Merritt
994c3550ff 2x faster h264_idct_add8_10.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-02 07:07:02 -07:00
Ronald S. Bultje
e6635a9a19 h264: remove CONFIG_GPL from x86 intra prediction code.
The authors permitted relicensing to LGPL a long time ago (Holger,
Loren and Jason).
2011-06-02 07:02:46 -07:00
Daniel Kang
f3aa65af3a h264/10bit: add HAVE_ALIGNED_STACK checks.
Fixes regression in 836f47d34b in ICC-10.x,
since ICC<=11.0 doesn't align stack upon function calls.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-31 21:43:20 -07:00
Daniel Kang
348493db60 Update 8-bit H.264 IDCT function names to reflect bit-depth.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
2011-05-31 15:02:32 -07:00
Daniel Kang
836f47d34b Add IDCT functions for 10-bit H.264.
Ports the majority of IDCT functions for 10-bit H.264.

Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.

Signed-off-by: Ronald S. Bultje <rbultje@google.com>
2011-05-31 15:02:32 -07:00
Justin Ruggles
70bb747a57 ac3dsp: do not use the ff_* prefix when referencing ff_ac3_bap_bits.
this should fix the windows builds

Signed-off-by: Martin Storsjö <martin@martin.st>
2011-05-28 22:43:40 +03:00
Justin Ruggles
6ca23db9cc ac3enc: modify mantissa bit counting to keep bap counts for all values of bap
instead of just 0 to 4.

This does all the actual bit counting as a final step.
2011-05-28 12:39:28 -04:00
Diego Biurrun
5e528cffcf x86: Add appropriate ifdefs around certain AVX functions.
nasm versions prior to 2.09 have trouble assembling some of our AVX code.
Protect these sections by preprocessor macros to allow compilation to pass.
2011-05-27 21:18:12 +02:00
Dave Yeo
a10fb79070 x86 asm: Add SECTION_TEXT to dct32_sse.asm.
This fixes the following error on OS/2:
error: segment name `.text align=16' not recognized

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-23 12:47:53 +02:00
Loren Merritt
422b2362fc dct32_sse: eliminate some spills
125->104 cycles on penryn (x86_64 only)
2011-05-22 19:27:18 +02:00
Vitor Sessak
165c7c420d Fix dct32() compilation with --disable-yasm
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-22 07:10:19 -04:00
Vitor Sessak
6204feb160 dct32: Add AVX implementation of 32-point DCT 2011-05-21 17:42:26 +02:00
Vitor Sessak
4e653b98c8 dct32: Change pass 6 permutation to allow for AVX implementation 2011-05-21 17:42:26 +02:00
Vitor Sessak
3758eb0eb9 dct32: port SSE 32-point DCT to YASM 2011-05-21 17:42:26 +02:00
Diego Biurrun
153382e1b6 multiple inclusion guard cleanup
Add missing multiple inclusion guards; clean up #endif comments;
add missing library prefixes; keep guard names consistent.
2011-05-21 13:48:10 +02:00
Dave Yeo
d69f9a4234 Add support for a.out object format to assembler macros.
This format is still used by e.g. OS/2.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-20 17:52:21 +02:00
Mans Rullgard
0b5e44ed29 mpegaudiodsp: fix x86 and ppc makefiles
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-19 16:32:24 +01:00
Mans Rullgard
c4f5c2d6f4 Move some mpegaudio functions to new mpegaudiodsp subsystem
This separation allows these functions to be used in a cleaner
fashion from other codecs (e.g. qdm2) and simplifies creating
optimised versions of them.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-19 12:25:34 +01:00
Justin Ruggles
e98a95e779 10l: wrap float_interleave functions in HAVE_YASM.
fixes compilation with --disable-yasm
2011-05-18 20:18:08 -04:00
Justin Ruggles
32f8fb8ecf Add float_interleave() to FmtConvertContext with x86-optimized versions.
Partially based on patches by clsid2 in ffdshow-tryout.
ff_float_interleave6() x86 improvements by Loren Merrit.
2011-05-18 17:27:05 -04:00
Daniel Kang
d0005d347d Modify x86util.asm to ease transitioning to 10-bit H.264 assembly.
Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-17 20:44:48 +02:00
Gil Pedersen
257de5fb25 h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64.
This fixes linking errors due to undefined symbols on x86_64 OS X.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-16 15:35:53 +02:00
Diego Biurrun
888fa31eca Fix FSF address copy paste error in some license headers. 2011-05-14 21:32:31 +02:00
Jason Garrett-Glaser
5705b02079 10-bit H.264 x86 chroma v loopfilter asm
Also delete some unused deblock asm macros.
2011-05-11 11:09:10 -07:00
Jason Garrett-Glaser
9f3d6ca4f1 Port x86 10-bit H.264 deblock asm from x264 2011-05-10 20:02:15 -07:00
Jason Garrett-Glaser
8ad77b65b5 Update x86 H.264 deblock asm
Includes AVX versions from x264.
2011-05-10 20:01:58 -07:00
Ronald S. Bultje
86b29553f8 h264dsp_mmx: place bracket outside #if/#endif block.
Should fix compile on systems missing yasm/nasm.
2011-05-10 08:39:38 -04:00
Oskar Arvidsson
19a0729b4c Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).

Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.

Preparatory patch for high bit depth h264 decoding support.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-10 07:24:36 -04:00
Diego Biurrun
a734fa575f Remove disabled non-optimized code variants. 2011-04-29 20:01:13 +02:00
Vitor Sessak
9d35fa520e Add AVX FFT implementation.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
2011-04-26 18:25:24 +02:00
Vitor Sessak
33cbfa6fa3 Update x86inc.asm from x264 to allow AVX emulation using SSE and MMX.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
2011-04-26 18:18:22 +02:00
Alexander Strange
1500be13f2 dsputil: allow to skip drawing of top/bottom edges. 2011-03-26 17:45:38 -04:00
Justin Ruggles
e6e9823488 Add apply_window_int16() to DSPContext with x86-optimized versions and use it
in the ac3_fixed encoder.
2011-03-22 21:08:30 -04:00
Mans Rullgard
0aded9484d Move dct and rdft definitions to separate files
This leaves fft.h with only the core FFT and MDCT definitions
thus making it more managable.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-20 17:15:33 +00:00
Mans Rullgard
2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Justin Ruggles
0f999cfddb ac3enc: add float_to_fixed24() with x86-optimized versions to AC3DSPContext
and use in scale_coefficients() for the floating-point AC-3 encoder.
2011-03-17 16:46:48 -04:00
Justin Ruggles
79414257e2 mathops: fix MULL() when the compiler does not inline the function.
If the function is not inlined, an immmediate cannot be used for the
shift parameter, so the %cl register must be used instead in that case.

This fixes compilation for x86-32 using gcc with --disable-optimizations.
2011-03-15 20:49:37 -04:00
Justin Ruggles
aaff3b312e mathops: change "g" constraint to "rm" in x86-32 version of MUL64().
The 1-arg imul instruction cannot take an immediate argument, only a register
or memory argument.
2011-03-15 13:43:47 -04:00
Justin Ruggles
b181b8fb96 mathops: convert MULL/MULH/MUL64 to inline functions rather than macros.
This fixes unexpected name collisions that were occurring with variables
declared within the macros.
It also fixes the fate-acodec-ac3_fixed regression test on x86-32.
2011-03-15 13:43:47 -04:00
Justin Ruggles
f1efbca5e9 ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder. 2011-03-14 08:45:31 -04:00
Mans Rullgard
a5444fee06 Add CONFIG_AC3DSP symbol to simplify makefiles
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-12 11:35:26 +00:00
Ronald S. Bultje
bf6fa73245 dsputil_mmx.c: remove ff_vector128.
Remove ff_vector128, it is identical to ff_pb_80.
2011-02-19 10:51:15 -05:00
Ronald S. Bultje
12802ec060 dsputil: move VC1-specific stuff into VC1DSPContext. 2011-02-17 17:35:35 -05:00
Justin Ruggles
1f004fc512 ac3dsp: Change punpckhqdq to movhlps in ac3_max_msb_abs_int16().
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-16 14:08:34 -05:00
Justin Ruggles
fbb6b49dab ac3enc: Add x86-optimized function to speed up log2_tab().
AC3DSPContext.ac3_max_msb_abs_int16() finds the maximum MSB of the absolute
value of each element in an array of int16_t.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-13 16:49:39 -05:00
Loren Merritt
e6b1ed693a FFT: factor a shuffle out of the inner loop and merge it into fft_permute.
6% faster SSE FFT on Conroe, 2.5% on Penryn.

Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-02-13 15:36:39 +01:00
Justin Ruggles
dda3f0ef48 Add x86-optimized versions of exponent_min().
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-10 15:32:47 -05:00
Ronald S. Bultje
17cf7c68ed Fix ff_emu_edge_core_sse() on Win64.
Fix emu_edge_v_extend_15 to be <128 bytes on Win64, by being more strict
on the size of registers and which registers are being used for operations
where multiple are available. This fixes segfaults in emulated_edge()
function calls on Win64.
2011-02-08 18:25:12 -05:00
Justin Ruggles
c73d99e672 Separate format conversion DSP functions from DSPContext.
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:44:53 +00:00
Alex Converse
770c410fbb Fix ff_imdct_calc_sse() on gcc-4.6
Gcc 4.6 only preserves the first value when using an array with an "m"
constraint.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:40:05 +00:00
Ronald S. Bultje
81f2a3f4ff Implement a SIMD version of emulated_edge_mc() for x86.
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
2011-01-31 20:55:56 -05:00
Justin Ruggles
d19b744a36 cosmetics: indentation
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:30:15 +00:00
Justin Ruggles
80ba1ddb58 Remove unneeded add bias from 3 functions.
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:28:42 +00:00
Mans Rullgard
80944df720 x86: fix overflow in h264 8x8 planar prediction
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-24 23:24:28 +00:00
Justin Ruggles
6eabb0d3ad Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-22 17:53:27 +00:00
Justin Ruggles
1c189fc533 cosmetics related to LPC changes.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-21 19:59:08 +00:00
Justin Ruggles
77a78e9bdc Separate window function from autocorrelation.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-21 19:59:08 +00:00
Justin Ruggles
56f8952b25 Move lpc_compute_autocorr() from DSPContext to a new struct LPCContext.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-21 19:58:59 +00:00
Ronald S. Bultje
b9c7f66e6d Fix horizontal/horizontal_up 8x8l intra prediction x86/simd functions.
The original functions did not work correctly for edge pixels, e.g.
when CODEC_FLAG_EMU_EDGE is set, leading to corrupt output in e.g. VLC.
Based on a patch by Daniel Kang <daniel d kang gmail com>.

Signed-off-by: Ronald S. Bultje <rsbultje gmail com>
2011-01-19 20:34:42 -05:00
Mans Rullgard
ef4a65149d Replace ASMALIGN() with .p2align
This macro has unconditionally used .p2align for a long time and
serves no useful purpose.
2011-01-18 20:48:24 +00:00
Mans Rullgard
ac3c9d0169 x86: remove VLA in ac3_downmix_sse 2011-01-18 20:48:24 +00:00
Janne Grunau
2c3589bfda consolidate .gitignore patters into a single file
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-01-18 21:32:05 +01:00
Janne Grunau
348b8218f7 convert svn:ignore properties to .gitignore files
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-01-17 15:50:14 +01:00
Ronald S. Bultje
1b3e43e4fd Fix overflow in pred16x16_plane x86 simd code. Fixes issue 2547.
Originally committed as revision 26381 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-15 22:00:44 +00:00
Ronald S. Bultje
ec3233a855 Fix ff_pw_3 alignment.
Originally committed as revision 26344 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 23:26:34 +00:00
Jason Garrett-Glaser
19fb234e4a H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.

Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 21:34:25 +00:00
Daniel Kang
004357a11f Fix compilation on x86-32 with --disable-optimizations,
fixes issue 2127.

Patch by Daniel Kang, daniel.d.kang at gmail

Originally committed as revision 26204 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-03 11:30:04 +00:00
Daniel Kang
0790caba60 Fix invalid reads in valgrind fate, patch by Daniel Kang <daniel dot d dot
kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26177 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-31 01:29:06 +00:00
Daniel Kang
536e9b2f58 Port pred8x8l_down_left_mmxext (H.264 intra prediction) from x264 (authors:
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26162 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 23:48:44 +00:00
Daniel Kang
720ea2d5b2 Port pred4x4_down_right_mmxext (H.264 intra prediction) from x264 (authors:
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26159 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 21:55:51 +00:00
Daniel Kang
d0aebe23e2 Port pred4x4_vertical_right_mmxext (H.264 intra prediction) from x264 (authors:
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26158 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 21:52:41 +00:00
Daniel Kang
76497232ef Port pred4x4_horizontal_down_mmxext (H.264 intra prediction) from x264
(authors:Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26157 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 21:49:57 +00:00
Daniel Kang
e9c576a467 Port pred4x4_horizontal_up_mmxext (H.264 intra prediction) from x264 (authors:
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26156 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 21:42:33 +00:00
Daniel Kang
92f441ae86 Port pred4x4_vertical_left_mmxext (H.264 intra prediction) from x264 (authors:
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26155 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 21:35:34 +00:00
Ronald S. Bultje
e8d98764cc Merge a few superfluous CONFIG_GPL checks.
Originally committed as revision 26154 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 21:30:47 +00:00
Ronald S. Bultje
42a59278cf Whitespace cosmetics.
Originally committed as revision 26152 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 20:43:15 +00:00
Daniel Kang
57b1f334d1 Port pred8x8l_horizontal_down_sse2/ssse3 (H.264 intra prediction) from x264
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26151 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 20:42:15 +00:00
Daniel Kang
04cbdf3d24 Port pred8x8l_horizontal_down_mmxext (H.264 intra prediction) from x264
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26150 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 20:38:06 +00:00
Daniel Kang
98c6053cd0 Port pred8x8l_horizontal_up_mmxext/ssse3 (H.264 intra prediction) from x264
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26149 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 20:35:31 +00:00
Daniel Kang
ecc7efbbb6 Port pred8x8l_vertical_left_sse2/ssse3 (H.264 intra prediction) from x264
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26148 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 20:06:22 +00:00
Daniel Kang
bdd93f1b25 Port pred8x8l_vertical_right_sse2/ssse3 (H.264 intra prediction) from x264
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26147 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 19:54:05 +00:00
Daniel Kang
f25112fc09 Port pred8x8l_vertical_right_mmxext (H.264 intra prediction) from x264
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26146 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 19:46:09 +00:00
Daniel Kang
602a4cb25a Port pred8x8l_down_right_sse2/ssse3 (H.264 intra prediction) from x264
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26145 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 19:19:49 +00:00
Daniel Kang
e916acbcd1 Port pred8x8l_down_right_mmxext (H.264 intra prediction) from x264 (authors:
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26143 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 19:12:02 +00:00
Daniel Kang
c249e66576 Port pred8x8l_down_left_sse2/ssse3 (H.264 intra prediction) from x264 (authors:
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at
gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26142 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 19:02:50 +00:00
Daniel Kang
ee1ba9c326 Port pred8x8l_vertical_mmxext/ssse3 (H.264 intra prediction) from x264 to
FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-
Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and
Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing
for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.

Originally committed as revision 26140 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 18:46:40 +00:00
Daniel Kang
04207ef353 Port pred8x8l_horizontal_mmxext/ssse3 (H.264 intra prediction) from x264 to
FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-
Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and
Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing
for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.

Originally committed as revision 26139 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 18:40:53 +00:00
Daniel Kang
abab14eac0 Port pred8x8l_dc_mmx/ssse3 (H.264 intra prediction) from x264 to FFmpeg.
Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser
<darkshikari gmail com> (approves LGPL relicensing for this code) and Loren
Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for
this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.

Originally committed as revision 26138 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 18:33:10 +00:00
Daniel Kang
2e93fd4b5e Port pred8x8l_top_dc_mmxext/ssse3 (H.264 intra prediction) from x264 to FFmpeg.
Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser
<darkshikari gmail com> (approves LGPL relicensing for this code) and Loren
Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for
this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.

Originally committed as revision 26137 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 18:11:27 +00:00
Ronald S. Bultje
54a959e483 Move PRED4x4_LOWPASS up so it can be used in 8x8l predict functions while
keeping the functions ordered in the source file (i.e. cosmetics).

Originally committed as revision 26136 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 18:04:57 +00:00
Ronald S. Bultje
a2dfe8d18d Port pred8x8_dc_mmxext (H.264 intra prediction) from x264 to FFmpeg. Original
authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari
gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm
at u dot washington dot edu> (approves LGPL relicensing for this code). Patch
by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI
2010.

Originally committed as revision 26135 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 18:00:26 +00:00
Ronald S. Bultje
83ff3f72e5 Add missing authors to copyright headers.
Originally committed as revision 26133 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 17:45:26 +00:00
Daniel Kang
725a3f9dfb Port pred8x8_top_dc_mmxext (H.264 intra prediction) from x264 to FFmpeg.
Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser
<darkshikari gmail com> (approves LGPL relicensing for this code) and Loren
Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for
this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.

Originally committed as revision 26132 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 17:42:34 +00:00
Ronald S. Bultje
98928c83e0 Mark recently added pred4x4_down_left_mmxext as CONFIG_GPL. Although Holger
initially said he'd be OK with relicensing, he also said he wanted to have
another look at the patch, and then he went on vacation, so let's play it
safe for now. We can consider removing this again later.

Originally committed as revision 26131 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-29 17:34:00 +00:00
Daniel Kang
911b32f482 Port pred4x4_down_left_mmxext (H.264 intra prediction) from x264 to FFmpeg.
LGPL relicensing approved by original authors: Holger Lubitz <holger lubitz
org>, Jason Garrett-Glaser <darkshikari gmail com> and Loren Merritt <lorenm
at u dot washington dot edu>. Patch by Daniel Kang <daniel dot d dot kang at
gmail com>, as part of Google's GCI 2010.

Originally committed as revision 26087 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-24 22:43:07 +00:00
Ronald S. Bultje
8d147f1f60 For rounding in chroma MC SSSE3, use 16-byte pw_3/4 instead of reading 8 bytes
and then using movlhps to dup it into the higher half of the register.

Originally committed as revision 26086 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-24 17:23:22 +00:00
Baptiste Coudurier
90f1f3bf00 In yadif filter, declare asm constants directly to avoid dependency on libavcodec
Originally committed as revision 25895 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-06 00:14:15 +00:00
Baptiste Coudurier
9e95999e2a 10l, add ff_pw_1 to dsputil_mmx for yadif sse2
Originally committed as revision 25881 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-04 13:06:06 +00:00
avcoder
1761272ba9 Use SECTION .text for yasm code.
Patch by avcoder, ffmpeg gmail

Originally committed as revision 25859 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-01 13:12:39 +00:00
Ramiro Polla
4f9d25ddc8 dnxhd_mmx: prefer xmm registers below xmm6 when they are available
Originally committed as revision 25634 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-11-02 03:09:16 +00:00
İsmail Dönmez
80e33d2451 dsputil: Use explicit movzbl instead of movzx
This fixes compilation with the latest clang trunk version.

Patch by İsmail Dönmez, ismail at namtrac dot org

Originally committed as revision 25628 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-11-01 19:35:51 +00:00
Ramiro Polla
a4ece893e1 lpc_mmx: add xmm registers to clobber list
Originally committed as revision 25620 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 23:37:15 +00:00
Ramiro Polla
e5d5407e26 lpc_mmx: merge some asm blocks
These blocks depended on the compiler keeping xmm registers untouched between
them.

Originally committed as revision 25619 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 23:36:26 +00:00
Ramiro Polla
eed299b897 sad16_sse2: merge 2 asm blocks
Originally committed as revision 25617 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 21:20:20 +00:00
Ramiro Polla
153ca56b38 xmm_clobbers: list xmm registers first in clobber list
suncc does not like the leading commas inside the macro, but it has no problem
with trailing commas.

Originally committed as revision 25615 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 18:14:48 +00:00
Ramiro Polla
ba40452095 idct_sse2_xvid: only mark xmm>=8 as clobbered on x86_64
Originally committed as revision 25614 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 16:28:28 +00:00
Ramiro Polla
05c018078c motion_est_mmx: prefer xmm registers below xmm6 when they are available
Originally committed as revision 25612 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 15:07:21 +00:00
Ramiro Polla
5d543a3d13 dsputil_mmx: add xmm registers to clobber list
Originally committed as revision 25611 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:57:58 +00:00
Ramiro Polla
e2d13c5882 cosmetics: split long line
Originally committed as revision 25610 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:46:17 +00:00
Ramiro Polla
0d729e0de2 fdct_mmx: add xmm registers to clobber list
Originally committed as revision 25609 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:45:04 +00:00
Ramiro Polla
616735eb97 idct_sse2_xvid: add xmm registers to clobber list
Originally committed as revision 25608 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:17:43 +00:00
Ramiro Polla
9943f3b91c mpegvideo_mmx: add xmm registers to clobber list
Originally committed as revision 25607 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:15:16 +00:00
Ramiro Polla
559738eff3 dsputil_mmx: prefer xmm registers below xmm6 when they are available
Originally committed as revision 25606 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:13:53 +00:00
Ramiro Polla
51d592dbcb h264dsp: add xmm registers to clobber list
Originally committed as revision 25604 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-30 17:14:22 +00:00
Ramiro Polla
ac19f4a3e8 indent
Originally committed as revision 25598 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-28 18:31:30 +00:00
Ramiro Polla
cae05859e1 h264dsp: merge some more asm blocks
Originally committed as revision 25597 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-28 18:22:21 +00:00
Ramiro Polla
c6a908be58 dct32: mark xmm registers in clobber list in ff_dct32_float_sse()
Originally committed as revision 25569 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-25 20:29:29 +00:00
Ramiro Polla
b32c9ca9a3 h264dsp: merge some asm blocks
Some code was initializing some xmm registers in one asm block and using them
in the following block, assuming they wouldn't be changed in between blocks.

Originally committed as revision 25568 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-25 18:02:02 +00:00
Reimar Döffinger
6c2142809c Add d modifier to asm argument to fix nasm compilation.
Originally committed as revision 25397 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-07 19:18:18 +00:00
Ramiro Polla
326bf69acc fft: mark xmm registers as clobbered in ff_imdct_calc_sse
Originally committed as revision 25363 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-06 01:27:02 +00:00
Ronald S. Bultje
dd68d4db43 MMX, MMX2, SSE2 and SSSE3 optimizations for pred16x16/8x8_plane H264 intra
prediction (plus some with different rounding for svq3/rv40). Speedup (for
SSSE3) about ~6-fold, 3.6% faster overall with cathedral sample.

Originally committed as revision 25361 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-05 22:06:18 +00:00
İsmail Dönmez
9276bdddca snowdsp: Explicitly state the operand sizes
Fixes compilation with clang's builtin assembler

Patch by İsmail Dönmez, ismail at namtrac dot org

Originally committed as revision 25331 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-04 13:08:13 +00:00
Ronald S. Bultje
a52ffc3f54 Move static inline function to a macro, so that constant propagation in
inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE
breakage after r25254.

Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 17:42:26 +00:00
Eli Friedman
329d689f75 Use sse2 variant of put_pixels16() for no_rnd also. Provides a minor speed
increase to e.g. vc1, snow and mpeg decoding.

Patch by Eli Friedman <eli dot friedman gmail com>.

Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 15:34:43 +00:00
Ronald S. Bultje
cd17285e6c Merge b_idx and edge variables, and optimize the ASM to directly load variables
from memory locations/offsets depending on b_idx plus constants, rather than
having gcc do this. This saves several lea calls and together saves about
10 cycles in h264_loop_filter_strength_mmx2().

Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:04:39 +00:00
Ronald S. Bultje
0cc8a5d088 Remove mv_mask variable. Replace the related pand -1/0 instructions by either
a pxor, or remove the instruction alltogether. Altogether, this saves 1
instruction.

Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:03:30 +00:00
Ronald S. Bultje
c0673f2cf4 Remove d_idx as a variable, and instead load it as a constant in the asm.
This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.

Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:02:32 +00:00
Ronald S. Bultje
2c3135f6d3 Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid
of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.

Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 13:35:24 +00:00
Ronald S. Bultje
4b81511cab Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows
inlining various constants within the loop code. 20 cycles faster on
cathedral sample.

Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 13:34:20 +00:00
Reimar Döffinger
02b424d9c8 Add d suffix to movd target register to make it work with nasm.
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-26 09:15:18 +00:00
Reimar Döffinger
dc77e985b7 Split and then simplify address generation macro.
Allows nasm to work for this code.

Originally committed as revision 25205 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-26 09:08:11 +00:00
Ronald S. Bultje
7e117771cd Remove unused variable.
Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 15:31:46 +00:00
Ronald S. Bultje
ae11291865 Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 14:07:23 +00:00
Ronald S. Bultje
4bca677494 Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the
code directly also and remove loop setup. 20% faster in function, 0.8% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 14:05:45 +00:00
Måns Rullgård
c0bc8b9afb x86: disable SSE functions using stack when stack is not aligned
This fixes crashes with ICC 10.1.

Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-21 17:57:21 +00:00
Måns Rullgård
f41237c9db x86: remove hack disabling sse2 h264 loop filter with 32-bit icc
Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-18 20:44:32 +00:00
Ronald S. Bultje
ada65af9d1 Don't access upper 32 bits of a 32-bit int on 64-bit systems.
Originally committed as revision 25140 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 12:24:22 +00:00
Ronald S. Bultje
6c3d021891 Properly add HAVE_YASM around yasmified symbols. Should fix compile error
on configurations using --disable-yasm.

Originally committed as revision 25138 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 03:01:57 +00:00
Ronald S. Bultje
e2e341048e Move hadamard_diff{,16}_{mmx,mmx2,sse2,ssse3}() from inline asm to yasm,
which will hopefully solve the Win64/FATE failures caused by these functions.

Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 01:56:06 +00:00
Ronald S. Bultje
d0acc2d2e9 Move sse16_sse2() from inline asm to yasm. It is one of the functions causing
Win64/FATE issues.

Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 01:44:17 +00:00
Ronald S. Bultje
1d16a1cf99 Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.

Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.

Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser
8acb554aff LGPL SSE2 H.264 iDCT
This leaves no more GPL-only H.264 decoding asm code.

Approved by Loren.

Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-10 02:25:12 +00:00
Stefano Sabatini
c6c98d0897 Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Reimar Döffinger
b1c32fb5e5 Use "d" suffix for general-purpose registers used with movd.
This increases compatibilty with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.

Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-05 10:10:16 +00:00
Stefano Sabatini
7160bb716b Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-04 09:59:08 +00:00
Ronald S. Bultje
2c166c3af1 Port latest x264 deblock asm (before they moved to using NV12 as internal
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-03 16:52:46 +00:00
Eli Friedman
a10a9f5cd0 Fix typo in r25019.
Patch by Eli Friedman <eli.friedman at gmail dot com>.

Originally committed as revision 25022 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 23:19:36 +00:00
Ronald S. Bultje
615da9b1d9 Unscrew breakage after my last commit because of symbol prefixes.
Originally committed as revision 25020 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 21:10:19 +00:00
Ronald S. Bultje
a33a2562c1 Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC.

Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:56:16 +00:00
Ronald S. Bultje
14bc1f2485 Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c,
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:48:59 +00:00
Ronald S. Bultje
5929b3a651 Fix vertical align.
Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-31 12:32:24 +00:00
Ronald S. Bultje
79ce0f002e Fix compilation failure if yasm is disabled (missing vp3 symbols).
Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 20:30:40 +00:00
Ronald S. Bultje
de1c253bab Split intra prediction initialization (i.e. assigning of function pointers)
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in
dsputil_mmx.c).

Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:34:13 +00:00
Ronald S. Bultje
d0eb5a1174 Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
fate failures on Win64.

Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:31:04 +00:00
Ronald S. Bultje
e9f5f020c6 Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6
issues on Win64.

Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:25:46 +00:00
Ronald S. Bultje
7e7c4b6008 Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx()
functions.

Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:22:27 +00:00
Loren Merritt
19d929f9a3 cosmetics in imdct_sse
Originally committed as revision 24958 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-28 21:03:13 +00:00
Ronald S. Bultje
4eca52ed19 Fix typos when converting inline asm to yasm, fixes MMX-only fate-ea-vp61.
Originally committed as revision 24948 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-26 14:33:39 +00:00
Ronald S. Bultje
6697bc33e2 Revert r24931, it broke Win32 and some BSD compiles (yay fate).
Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 20:36:35 +00:00
Ronald S. Bultje
72f642400b Mark xmm6 and xmm7 as clobbered in ff_vp3_idct_sse2(), which is contributing
to the VP6 fate failures on Win64.

Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 19:57:05 +00:00
Måns Rullgård
69dad87c48 VP6: fix vp6_filter_diag4_mmx/sse on 64-bit
The stride can be negative and must be sign extended before being
used in pointer arithmetic.

Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 15:41:11 +00:00
Ronald S. Bultje
89fa3504ed Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should
help in fixing the Win64 fate failures.

Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 13:44:16 +00:00
Ronald S. Bultje
3a0885146c Move vp6_filter_diag4() from DSPContext to VP56DSPContext.
Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 13:42:28 +00:00
Måns Rullgård
c0ec9918b0 Remove global mm_flags variable
Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 17:47:05 +00:00
Ronald S. Bultje
3611c45ab7 Mark xmm registers as clobbered in simple loopfilter. Should fix the last
two VP8-related fate failures on Win64.

Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 16:52:27 +00:00
Alex Converse
cb4f12466b imdct/x86: Use "s->mdct_size" instead of "1 << s->mdct_bits".
It generates smaller cleaner code.

Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-23 15:51:09 +00:00
Ronald S. Bultje
684d608bde Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures).
Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-23 02:41:22 +00:00
Alex Converse
78b5c97d3e Convert ff_imdct_half_sse() to yasm.
This is to avoid split asm sections that attempt to preserve some
registers between sections.

Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-22 14:39:58 +00:00
Jason Garrett-Glaser
05c04cdf54 VP5/6/8: ~7% faster arithmetic decoding
Grab from the bitstream in 16-bit chunks instead of 8-bit chunks.
TODO: grab in 32-bit chunks on 64-bit systems.

Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-12 01:11:32 +00:00
Jason Garrett-Glaser
4a384de5b8 Split h264dsp and h264pred in configure.
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself.

Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-07 23:10:25 +00:00
Jason Garrett-Glaser
98fe09df7b Add file missing in r24702
Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-05 00:49:48 +00:00
Eli Friedman
c12d6955e2 H.264: SSE2/SSSE3 weighted prediction asm
Patch by Eli Friedman <eli.friedman at gmail dot com>

Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-05 00:13:38 +00:00
Måns Rullgård
f079a64aea Move cavs dsp functions to their own struct
Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-03 20:59:00 +00:00
Jason Garrett-Glaser
8b9b5e085f VP5/6/8: add one inline missed in r24677
Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-03 11:21:22 +00:00
Jason Garrett-Glaser
827d43bb9d VP8: move zeroing of luma DC block into the WHT
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.

Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-02 20:18:09 +00:00
Ronald S. Bultje
6341838f3c Use word-writing instead of dword-writing (with two cached but otherwise
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds up even more
(e.g. 25% faster on Core i7).

Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-31 23:13:15 +00:00
Vitor Sessak
fa738b3ad1 Remove x86/mmx.h. It is not used anymore and has been deprecated for years.
Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-31 16:20:45 +00:00
Vitor Sessak
de4bc44abb Convert deinterlacing MMX code to YASM
Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-31 14:50:51 +00:00
Vitor Sessak
740dfe7012 Fix compilation in x86_64. I broke it with r24580.
Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-29 22:45:21 +00:00
Vitor Sessak
2c3dda6838 Translate libmpeg2 MMX IDCT to plain asm
Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-29 22:19:54 +00:00
Ronald S. Bultje
ab4d031889 Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster.
Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 21:18:19 +00:00
Jason Garrett-Glaser
e25dee602f VP8: Much faster SSE2 MC
5-10% faster or more on Phenom, Athlon 64, and some others.
Helps some on pre-SSSE3 Intel chips as well, but not as much.

Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 19:34:00 +00:00
Ronald S. Bultje
48adb7e7a4 Enable no-loop memory/register saving for ssse3/sse4 also.
Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 14:07:57 +00:00
Ronald S. Bultje
2a180c69ea Save a register (or regsize of stackspace for x86-32) for the no-loop
mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.

Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 14:00:15 +00:00
Ronald S. Bultje
bcd4aa6498 Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this
construct was always enabled, even for <ssse3 versions).

Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 13:56:51 +00:00
Ronald S. Bultje
2208053bd3 Split pextrw macro-spaghetti into several opt-specific macros, this will make
future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used...

Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 13:50:59 +00:00
Ronald S. Bultje
6de5b7c6b8 Fix obvious bug in assignment. Somehow, the test vectors don't test this...
Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-25 02:42:40 +00:00
Ronald S. Bultje
e3f7bf774c Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.

Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 19:33:05 +00:00
Eli Friedman
3611e7a309 Inline asm for VP56 arith coder
This is a lot more reliable to get cmov rather than trying to trick gcc into
generating it, useful since it's 2% faster overall.

Patch by Eli Friedman <eli.friedman at gmail>

Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:30 +00:00
Jason Garrett-Glaser
3ae079a3c8 VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser
51c9156438 VP8 asm: cosmetics (spacing)
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 03:02:56 +00:00
Jason Garrett-Glaser
8a467b2d44 VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?

Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 02:58:27 +00:00
Jason Garrett-Glaser
c25c776708 VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.

Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 00:07:16 +00:00
Ronald S. Bultje
dc5eec8085 Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on
CPUs supporting it.

Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 19:59:34 +00:00
Ronald S. Bultje
003243c3c2 Fix and enable horizontal >=SSE2 mbedge loopfilter.
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 01:35:26 +00:00
Loren Merritt
c7b1d9768c relicense h264 deblock sse2 to lgpl
Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 00:39:49 +00:00
Loren Merritt
532e769701 sync yasm macros from x264
Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:45:16 +00:00
Jason Garrett-Glaser
8731dbd890 Eliminate one instruction in VP8 dc_add_sse4
Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:41:37 +00:00
Jason Garrett-Glaser
7dd224a42d Various VP8 x86 deblocking speedups
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.

Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:11:03 +00:00
Jason Garrett-Glaser
b8b231b5dc Make mmx VP8 WHT faster
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.

Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 20:51:01 +00:00
David Conrad
af521abc28 Add header declarations for mmx/sse constants missing them
Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 10:02:07 +00:00
David Conrad
c7eec58170 Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c
Should fix compilation with icc and should help prevent any future duplicates

Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 10:02:03 +00:00
Ronald S. Bultje
e9e456d850 VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
and chroma (width=8).

Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:58:56 +00:00
Ronald S. Bultje
268821e76e Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:04:18 +00:00