600 Commits

Author SHA1 Message Date
Ronald S. Bultje
af79a0c48a png: add support for bpp>4 to paeth x86 SIMD code.
This fixes playback of e.g. RGB48 (bpp=6) content on x86 CPUs. Fixes
bug 214.
2012-01-29 21:22:50 -08:00
Ronald S. Bultje
f91c4b7824 png: add SSE2 version for add_bytes_l2. 2012-01-29 18:52:17 -08:00
Ronald S. Bultje
59f474b49d png: convert DSP functions to yasm. 2012-01-29 18:47:50 -08:00
Ronald S. Bultje
20a7d3178f png: add missing #if HAVE_SSSE3 around function pointer assignment. 2012-01-29 12:31:59 -08:00
Ronald S. Bultje
331e7c4cb3 imdct36: mark SSE functions as using all 16 XMM registers.
On x86-64, it indeed uses all 16 registers (and on x86-32, this gets
clipped to 8). Not marking it properly causes callers of this function
to fail randomly because of XMM register clobbering.
2012-01-29 08:14:05 -08:00
Ronald S. Bultje
e92003514d png: move DSP functions to their own DSP context. 2012-01-29 08:11:18 -08:00
Ronald S. Bultje
3b15a6d742 config.asm: change %ifdef directives to %if directives.
This allows combining multiple conditionals in a single statement.
2012-01-27 10:19:57 +08:00
Ronald S. Bultje
c3af52fa8b dsputil: use vertical component for drawing bottom edge.
Current code only writes 8 pixels of vertical edge for YUV422, which
causes MC artifacts when subsequent frames use data from that edge.
2012-01-25 18:06:36 +08:00
Christophe GISQUET
9ba9c34024 rv34: 1-pass inter MB reconstruction
Implement 1-pass inverse transform and reconstruction for inter blocks.
2012-01-16 19:26:41 +01:00
Christophe GISQUET
d78062386e rv34: Intra 16x16 handling
Extract processing of intra 16x16 blocks from intra macroblock
processing.
Also implement a function performing inverse transform and block
reconstruction for DC-only blocks in 1 pass instead of 2.
2012-01-16 00:41:51 +01:00
Christophe GISQUET
3faa303a47 rv34: DC-only inverse transform
When decoding coefficients, detect whether the block is DC-only, and take
advantage of this knowledge to perform DC-only inverse transform.

This is achieved by:
- first, changing the 108x4 element modulo_three_table into a 108 element
  table (kind of base4), and accessing each value using mask and shifts.
- then, checking low bits for 0 (as they represent the presence of higher
  frequency coefficients)

Also provide x86 SIMD code for the DC-only inverse transform.

Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
2012-01-12 09:52:33 +01:00
Henrik Gramner
e7d02b04dc fft: init functions with INIT_XMM/YMM.
This is required to handle clobbering of XMM registers on Win64
correctly. Fixes FFT and all tests depending on FFT on Win64.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-01-11 20:12:26 +01:00
Vitor Sessak
39df0c434c mpegaudiodec: optimized iMDCT transform
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-01-08 17:40:55 -08:00
Martin Storsjö
676a9ee1d2 x86: Fix constraints for decode_significance*_x86
Originally, prior to 8742a4ff8, the caller code was compiled
within this condition:

ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)

Since HAVE_7REGS is defined as
(ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal
to HAVE_7REGS (for 32 bit at least). The correct simplification
of the original condition thus is HAVE_7REGS, not
HAVE_EBX_AVAILABLE.

This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0
and HAVE_EBX_AVAILABLE = 1.

Signed-off-by: Martin Storsjö <martin@martin.st>
2011-12-27 09:05:14 +02:00
Diego Biurrun
6fdb2ce34a x86: Tighten register constraints for decode_significance*_x86.
On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register
is not available, but required to assemble the functions.

This reverts commit 8742a4f to a simplified version of the original constraints.
2011-12-21 12:06:37 +01:00
Diego Biurrun
30bbd5cbc0 x86: conditionally compile dnxhd encoder optimizations 2011-12-19 13:54:10 +01:00
Diego Biurrun
88b9735753 build: conditionally compile x86 H.264 chroma optimizations 2011-12-14 11:58:45 +01:00
Martin Storsjö
f1dba9e498 x86: Require 7 registers for the cabac asm
The change in 599b4c6ef didn't turn out to work properly on
i386 on OS X, where it broke building with PIC enabled.

Signed-off-by: Martin Storsjö <martin@martin.st>
2011-12-12 15:36:20 +02:00
Mans Rullgard
599b4c6efd x86: cabac: replace explicit memory references with "m" operands
This replaces the explicit offset(reg) memory references with
"m" operands for the same locations.  As a result, one fewer
register operand is needed for these inline asm statements.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-11 22:29:22 +00:00
Diego Biurrun
da9cea77e3 Fix a bunch of common typos. 2011-12-11 00:32:25 +01:00
Justin Ruggles
0e8fdd41c2 dsputil: use cpuflags in x86 emu_edge_core
avoids passing around the extra argument among all the macros it uses
2011-11-22 15:40:51 -05:00
Justin Ruggles
395f2e70dd dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()
This allows emulated_edge_mc_sse() and gmc_sse() to be used under
AV_CPU_FLAG_SSE.
2011-11-22 15:40:51 -05:00
Justin Ruggles
9d06037d48 twinvq: add SSE/AVX optimized sum/difference stereo interleaving 2011-11-11 14:13:58 -05:00
Diego Biurrun
ce33320b30 Remove redundant filename self-references inside files.
Filenames are brittle across renames and add no useful information.
2011-11-08 17:52:56 +01:00
Diego Biurrun
276b995d85 x86: drop pointless ARCH_X86 #ifdef from files in x86 subdirectory 2011-11-08 17:52:55 +01:00
Justin Ruggles
b8f02f5b4e dsputil: use cpuflags in x86 versions of vector_clip_int32() 2011-11-06 20:50:06 -05:00
Ronald S. Bultje
717401aff2 h264_weight: remove duplication functions. 2011-11-05 07:16:30 -07:00
Justin Ruggles
5463e83dbc fmtconvert: fix int32_to_float_fmul_scalar() for windows x86_64
The calling convention only allows 4 non-stack parameter, with each
float or int register being skipped if not used.

fixes Bug 64
2011-11-02 21:44:58 -04:00
Daniel Kang
ded3e9f054 H.264: Cometics to dsputil_mmx.c
Add whitespace.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-10-26 06:41:32 -07:00
Ronald S. Bultje
b0b3231074 h264_weight: initialize "height" function argument properly.
Right now it's not actually initialized on 32-bit, leading to crashes
on win32.
2011-10-22 00:23:24 -07:00
Justin Ruggles
aad3429d4e fmtconvert: port float_to_int16_interleave() 2-channel x86 inline asm to yasm 2011-10-21 10:13:05 -04:00
Justin Ruggles
4e8e262476 fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm 2011-10-21 10:13:05 -04:00
Justin Ruggles
185142a5ea fmtconvert: check compile-time x86 instruction set flags 2011-10-21 10:13:05 -04:00
Justin Ruggles
708ab7dd69 fmtconvert: port float_to_int16() x86 inline asm to yasm 2011-10-21 10:13:05 -04:00
Ronald S. Bultje
c2d337429c H264: change weight/biweight functions to take a height argument.
Neon parts by Mans Rullgard <mans@mansr.com>.
2011-10-21 01:00:45 -07:00
Ronald S. Bultje
229d263cc9 Support for lossless and inter H264 4:2:2. 2011-10-21 01:00:45 -07:00
Baptiste Coudurier
76741b0e56 h264: 4:2:2 intra decoding support
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-10-21 01:00:41 -07:00
Diego Biurrun
265980dabc x86: Move some variable declarations below the appropriat #ifdef.
This avoids some unused variable warnings with YASM disabled.
2011-10-20 16:19:27 +02:00
Diego Biurrun
2cb7c81669 x86: Fix linking of ProRes DSP ASM with YASM disabled. 2011-10-20 16:19:13 +02:00
Ronald S. Bultje
05c8f119cc proresdsp: fix function prototypes.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2011-10-14 21:34:46 +02:00
Ronald S. Bultje
e3f530feca prores: idct sse2/sse4 optimizations.
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
2011-10-11 07:50:48 -07:00
Sean McGovern
c2d3f56107 fft: avoid a signed overflow
As a signed integer, 1<<31 overflows, so force it to unsigned.

Signed-off-by: Alex Converse <alex.converse@gmail.com>
2011-09-23 17:02:58 -07:00
Ronald S. Bultje
38e06c2969 Move clipd macros to x86util.asm.
This allows sharing them between multiple .asm files.
2011-08-17 20:56:06 -07:00
Dave Yeo
cc73511e8e Fix NASM include directive
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-15 11:24:35 -07:00
Alex Converse
48f7163f13 dsputil_mmx: Honor HAVE_AMD3DNOW 2011-08-15 11:20:08 -07:00
Ronald S. Bultje
b2c087871d Move x86util.asm from libavcodec/ to libavutil/.
This allows using it in swscale also.
2011-08-12 11:43:03 -07:00
Ronald S. Bultje
3a39195b1d Move x86inc.asm to libavutil/.
This allows using it in libswscale/ also.
2011-08-12 11:43:02 -07:00
Kostya Shishkov
d241f51e0f Move RV3/4-specific DSP functions into their own context
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-11 16:07:15 -07:00
Vitor Sessak
18b131de04 dct32: Add SSE2 ASM optimizations
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-02 10:17:29 -07:00
Jason Garrett-Glaser
a3bf7b864a H.264: tweak some other x86 asm for Atom 2011-07-29 12:24:15 -07:00