Ronald S. Bultje
e3f530feca
prores: idct sse2/sse4 optimizations.
...
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
2011-10-11 07:50:48 -07:00
Alex Converse
48f7163f13
dsputil_mmx: Honor HAVE_AMD3DNOW
2011-08-15 11:20:08 -07:00
Kostya Shishkov
d241f51e0f
Move RV3/4-specific DSP functions into their own context
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-11 16:07:15 -07:00
Jason Garrett-Glaser
a3bf7b864a
H.264: tweak some other x86 asm for Atom
2011-07-29 12:24:15 -07:00
Mans Rullgard
a617c6aaa3
dsputil: update per-arch init funcs for non-h264 high bit depth
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
e7a972e113
simple_idct: add 10-bit version
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-20 17:49:48 +01:00
Diego Biurrun
65083b4911
dsputil: remove disabled code
2011-07-18 11:48:35 +02:00
Mans Rullgard
710b8df949
dsputil: remove ff_emulated_edge_mc macro used in one place
...
This macro can cause problems in conjunction with the bitdepth
template expansion. It was presumably added to keep source
compatibility when high bitdepth support was added. However,
emulated_edge_mc is a dsputil pointer and should not be called
directly, so there is little reason to keep such a macro.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-10 17:55:58 +01:00
Daniel Kang
c0483d0c7a
H.264: Add x86 assembly for 10-bit H.264 predict functions
...
Mainly ported from 8-bit H.264 predict.
Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-08 15:59:29 -07:00
Daniel Kang
3c7c16fde3
YASM: Shut up unused variable compiler warning with --disable-yasm.
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-07-04 18:49:09 +02:00
Daniel Kang
58f7aad051
Fix build with --disable-yasm.
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-03 22:56:09 -07:00
Daniel Kang
9bfa5363da
H.264: Add x86 assembly for 10-bit H.264 qpel functions.
...
Mainly ported from 8-bit H.264 qpel.
Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-03 07:43:38 -07:00
Justin Ruggles
6054cd25b4
ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.
2011-07-01 13:02:11 -04:00
Diego Biurrun
d2ee495fb2
configure: Drop check for availability of ten assembler operands.
...
This was done to support gcc 2.95, which is an old legacy compiler
that fails to compile the current codebase anyway.
2011-06-28 13:14:37 +02:00
Ronald S. Bultje
ed63f527f2
Fix build if yasm is not available.
2011-06-18 08:34:14 -04:00
Daniel Kang
f188a1e0ca
H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions.
...
Mainly ported from 8-bit H.264 MC Chroma.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-18 07:52:19 -04:00
Jason Garrett-Glaser
c90b94424c
4:4:4 H.264 decoding support
...
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 21:16:30 -07:00
Jason Garrett-Glaser
504811baea
Roll back 4:4:4 H.264 for now
...
Needs some ARM/PPC asm modifications.
2011-06-13 13:38:46 -07:00
Jason Garrett-Glaser
c9c493872c
4:4:4 H.264 decoding support
...
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 12:21:39 -07:00
Jason Garrett-Glaser
9f3d6ca4f1
Port x86 10-bit H.264 deblock asm from x264
2011-05-10 20:02:15 -07:00
Oskar Arvidsson
19a0729b4c
Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
...
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-10 07:24:36 -04:00
Alexander Strange
1500be13f2
dsputil: allow to skip drawing of top/bottom edges.
2011-03-26 17:45:38 -04:00
Justin Ruggles
e6e9823488
Add apply_window_int16() to DSPContext with x86-optimized versions and use it
...
in the ac3_fixed encoder.
2011-03-22 21:08:30 -04:00
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Ronald S. Bultje
bf6fa73245
dsputil_mmx.c: remove ff_vector128.
...
Remove ff_vector128, it is identical to ff_pb_80.
2011-02-19 10:51:15 -05:00
Ronald S. Bultje
12802ec060
dsputil: move VC1-specific stuff into VC1DSPContext.
2011-02-17 17:35:35 -05:00
Justin Ruggles
c73d99e672
Separate format conversion DSP functions from DSPContext.
...
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:44:53 +00:00
Ronald S. Bultje
81f2a3f4ff
Implement a SIMD version of emulated_edge_mc() for x86.
...
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
2011-01-31 20:55:56 -05:00
Justin Ruggles
d19b744a36
cosmetics: indentation
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:30:15 +00:00
Justin Ruggles
80ba1ddb58
Remove unneeded add bias from 3 functions.
...
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:28:42 +00:00
Justin Ruggles
6eabb0d3ad
Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-22 17:53:27 +00:00
Mans Rullgard
ef4a65149d
Replace ASMALIGN() with .p2align
...
This macro has unconditionally used .p2align for a long time and
serves no useful purpose.
2011-01-18 20:48:24 +00:00
Mans Rullgard
ac3c9d0169
x86: remove VLA in ac3_downmix_sse
2011-01-18 20:48:24 +00:00
Ronald S. Bultje
ec3233a855
Fix ff_pw_3 alignment.
...
Originally committed as revision 26344 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 23:26:34 +00:00
Jason Garrett-Glaser
19fb234e4a
H.264: split luma dc idct out and implement MMX/SSE2 versions
...
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 21:34:25 +00:00
Ronald S. Bultje
8d147f1f60
For rounding in chroma MC SSSE3, use 16-byte pw_3/4 instead of reading 8 bytes
...
and then using movlhps to dup it into the higher half of the register.
Originally committed as revision 26086 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-24 17:23:22 +00:00
Baptiste Coudurier
90f1f3bf00
In yadif filter, declare asm constants directly to avoid dependency on libavcodec
...
Originally committed as revision 25895 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-06 00:14:15 +00:00
Baptiste Coudurier
9e95999e2a
10l, add ff_pw_1 to dsputil_mmx for yadif sse2
...
Originally committed as revision 25881 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-04 13:06:06 +00:00
İsmail Dönmez
80e33d2451
dsputil: Use explicit movzbl instead of movzx
...
This fixes compilation with the latest clang trunk version.
Patch by İsmail Dönmez, ismail at namtrac dot org
Originally committed as revision 25628 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-11-01 19:35:51 +00:00
Ramiro Polla
153ca56b38
xmm_clobbers: list xmm registers first in clobber list
...
suncc does not like the leading commas inside the macro, but it has no problem
with trailing commas.
Originally committed as revision 25615 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 18:14:48 +00:00
Ramiro Polla
5d543a3d13
dsputil_mmx: add xmm registers to clobber list
...
Originally committed as revision 25611 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:57:58 +00:00
Ramiro Polla
559738eff3
dsputil_mmx: prefer xmm registers below xmm6 when they are available
...
Originally committed as revision 25606 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:13:53 +00:00
Ronald S. Bultje
dd68d4db43
MMX, MMX2, SSE2 and SSSE3 optimizations for pred16x16/8x8_plane H264 intra
...
prediction (plus some with different rounding for svq3/rv40). Speedup (for
SSSE3) about ~6-fold, 3.6% faster overall with cathedral sample.
Originally committed as revision 25361 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-05 22:06:18 +00:00
Eli Friedman
329d689f75
Use sse2 variant of put_pixels16() for no_rnd also. Provides a minor speed
...
increase to e.g. vc1, snow and mpeg decoding.
Patch by Eli Friedman <eli dot friedman gmail com>.
Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 15:34:43 +00:00
Stefano Sabatini
c6c98d0897
Move mm_support() from libavcodec to libavutil, make it a public
...
function and rename it to av_get_cpu_flags().
Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Stefano Sabatini
7160bb716b
Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
...
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.
Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-04 09:59:08 +00:00
Ronald S. Bultje
2c166c3af1
Port latest x264 deblock asm (before they moved to using NV12 as internal
...
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.
Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-03 16:52:46 +00:00
Ronald S. Bultje
14bc1f2485
Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c,
...
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.
Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:48:59 +00:00
Ronald S. Bultje
79ce0f002e
Fix compilation failure if yasm is disabled (missing vp3 symbols).
...
Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 20:30:40 +00:00
Ronald S. Bultje
d0eb5a1174
Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
...
fate failures on Win64.
Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:31:04 +00:00