Commit Graph

123 Commits

Author SHA1 Message Date
Mans Rullgard
356ee8d7de x86: clean up ff_dsputil_init_mmx()
This splits ff_dsputil_init_mmx() into multiple functions, one for
each MMX/SSE level, somewhat simplifying the nested conditions.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-03-05 14:40:03 +01:00
Martin Storsjö
9cf0841ef3 dsputil: Add ff_ prefix to the dsputil*_init* functions
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-02-15 22:06:34 +02:00
Christophe Gisquet
6b03900382 x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf
While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific codepaths
are provided for this case.

For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2:                        ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads:   ~ 30k cycles

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-01-30 10:19:55 +01:00
Ronald S. Bultje
e92003514d png: move DSP functions to their own DSP context. 2012-01-29 08:11:18 -08:00
Ronald S. Bultje
c3af52fa8b dsputil: use vertical component for drawing bottom edge.
Current code only writes 8 pixels of vertical edge for YUV422, which
causes MC artifacts when subsequent frames use data from that edge.
2012-01-25 18:06:36 +08:00
Diego Biurrun
88b9735753 build: conditionally compile x86 H.264 chroma optimizations 2011-12-14 11:58:45 +01:00
Justin Ruggles
395f2e70dd dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()
This allows emulated_edge_mc_sse() and gmc_sse() to be used under
AV_CPU_FLAG_SSE.
2011-11-22 15:40:51 -05:00
Justin Ruggles
9d06037d48 twinvq: add SSE/AVX optimized sum/difference stereo interleaving 2011-11-11 14:13:58 -05:00
Justin Ruggles
b8f02f5b4e dsputil: use cpuflags in x86 versions of vector_clip_int32() 2011-11-06 20:50:06 -05:00
Daniel Kang
ded3e9f054 H.264: Cometics to dsputil_mmx.c
Add whitespace.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-10-26 06:41:32 -07:00
Ronald S. Bultje
e3f530feca prores: idct sse2/sse4 optimizations.
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
2011-10-11 07:50:48 -07:00
Alex Converse
48f7163f13 dsputil_mmx: Honor HAVE_AMD3DNOW 2011-08-15 11:20:08 -07:00
Kostya Shishkov
d241f51e0f Move RV3/4-specific DSP functions into their own context
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-11 16:07:15 -07:00
Jason Garrett-Glaser
a3bf7b864a H.264: tweak some other x86 asm for Atom 2011-07-29 12:24:15 -07:00
Mans Rullgard
a617c6aaa3 dsputil: update per-arch init funcs for non-h264 high bit depth
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
e7a972e113 simple_idct: add 10-bit version
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-20 17:49:48 +01:00
Diego Biurrun
65083b4911 dsputil: remove disabled code 2011-07-18 11:48:35 +02:00
Mans Rullgard
710b8df949 dsputil: remove ff_emulated_edge_mc macro used in one place
This macro can cause problems in conjunction with the bitdepth
template expansion.  It was presumably added to keep source
compatibility when high bitdepth support was added.  However,
emulated_edge_mc is a dsputil pointer and should not be called
directly, so there is little reason to keep such a macro.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-10 17:55:58 +01:00
Daniel Kang
c0483d0c7a H.264: Add x86 assembly for 10-bit H.264 predict functions
Mainly ported from 8-bit H.264 predict.

Some code ported from x264. LGPL ok by author.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-08 15:59:29 -07:00
Daniel Kang
3c7c16fde3 YASM: Shut up unused variable compiler warning with --disable-yasm.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-07-04 18:49:09 +02:00
Daniel Kang
58f7aad051 Fix build with --disable-yasm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-03 22:56:09 -07:00
Daniel Kang
9bfa5363da H.264: Add x86 assembly for 10-bit H.264 qpel functions.
Mainly ported from 8-bit H.264 qpel.

Some code ported from x264. LGPL ok by author.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-03 07:43:38 -07:00
Justin Ruggles
6054cd25b4 ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions. 2011-07-01 13:02:11 -04:00
Diego Biurrun
d2ee495fb2 configure: Drop check for availability of ten assembler operands.
This was done to support gcc 2.95, which is an old legacy compiler
that fails to compile the current codebase anyway.
2011-06-28 13:14:37 +02:00
Ronald S. Bultje
ed63f527f2 Fix build if yasm is not available. 2011-06-18 08:34:14 -04:00
Daniel Kang
f188a1e0ca H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions.
Mainly ported from 8-bit H.264 MC Chroma.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-18 07:52:19 -04:00
Jason Garrett-Glaser
c90b94424c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 21:16:30 -07:00
Jason Garrett-Glaser
504811baea Roll back 4:4:4 H.264 for now
Needs some ARM/PPC asm modifications.
2011-06-13 13:38:46 -07:00
Jason Garrett-Glaser
c9c493872c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 12:21:39 -07:00
Jason Garrett-Glaser
9f3d6ca4f1 Port x86 10-bit H.264 deblock asm from x264 2011-05-10 20:02:15 -07:00
Oskar Arvidsson
19a0729b4c Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).

Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.

Preparatory patch for high bit depth h264 decoding support.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-10 07:24:36 -04:00
Alexander Strange
1500be13f2 dsputil: allow to skip drawing of top/bottom edges. 2011-03-26 17:45:38 -04:00
Justin Ruggles
e6e9823488 Add apply_window_int16() to DSPContext with x86-optimized versions and use it
in the ac3_fixed encoder.
2011-03-22 21:08:30 -04:00
Mans Rullgard
2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Ronald S. Bultje
bf6fa73245 dsputil_mmx.c: remove ff_vector128.
Remove ff_vector128, it is identical to ff_pb_80.
2011-02-19 10:51:15 -05:00
Ronald S. Bultje
12802ec060 dsputil: move VC1-specific stuff into VC1DSPContext. 2011-02-17 17:35:35 -05:00
Justin Ruggles
c73d99e672 Separate format conversion DSP functions from DSPContext.
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:44:53 +00:00
Ronald S. Bultje
81f2a3f4ff Implement a SIMD version of emulated_edge_mc() for x86.
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
2011-01-31 20:55:56 -05:00
Justin Ruggles
d19b744a36 cosmetics: indentation
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:30:15 +00:00
Justin Ruggles
80ba1ddb58 Remove unneeded add bias from 3 functions.
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:28:42 +00:00
Justin Ruggles
6eabb0d3ad Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-22 17:53:27 +00:00
Mans Rullgard
ef4a65149d Replace ASMALIGN() with .p2align
This macro has unconditionally used .p2align for a long time and
serves no useful purpose.
2011-01-18 20:48:24 +00:00
Mans Rullgard
ac3c9d0169 x86: remove VLA in ac3_downmix_sse 2011-01-18 20:48:24 +00:00
Ronald S. Bultje
ec3233a855 Fix ff_pw_3 alignment.
Originally committed as revision 26344 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 23:26:34 +00:00
Jason Garrett-Glaser
19fb234e4a H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.

Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 21:34:25 +00:00
Ronald S. Bultje
8d147f1f60 For rounding in chroma MC SSSE3, use 16-byte pw_3/4 instead of reading 8 bytes
and then using movlhps to dup it into the higher half of the register.

Originally committed as revision 26086 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-24 17:23:22 +00:00
Baptiste Coudurier
90f1f3bf00 In yadif filter, declare asm constants directly to avoid dependency on libavcodec
Originally committed as revision 25895 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-06 00:14:15 +00:00
Baptiste Coudurier
9e95999e2a 10l, add ff_pw_1 to dsputil_mmx for yadif sse2
Originally committed as revision 25881 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-04 13:06:06 +00:00
İsmail Dönmez
80e33d2451 dsputil: Use explicit movzbl instead of movzx
This fixes compilation with the latest clang trunk version.

Patch by İsmail Dönmez, ismail at namtrac dot org

Originally committed as revision 25628 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-11-01 19:35:51 +00:00
Ramiro Polla
153ca56b38 xmm_clobbers: list xmm registers first in clobber list
suncc does not like the leading commas inside the macro, but it has no problem
with trailing commas.

Originally committed as revision 25615 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 18:14:48 +00:00