Commit Graph

526 Commits

Author SHA1 Message Date
Justin Ruggles
d5a7229ba4 Add a float DSP framework to libavutil
Move vector_fmul() from DSPContext to AVFloatDSPContext.
2012-06-08 13:14:38 -04:00
Vitor Sessak
bac0729d9e x86: use new schema for ASM macros
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-05-29 14:49:45 +02:00
Justin Ruggles
713548cbad x86: lavc: use %if HAVE_AVX guards around AVX functions in yasm code.
This is needed for older versions of yasm/nasm that do not support AVX.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-05-22 20:46:02 +02:00
Kieran Kunhya
5ff01259a8 Convert vector_fmul range of functions to YASM and add AVX versions
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-05-21 17:13:05 -04:00
Michael Kostylev
6797d1948b x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions. 2012-05-15 23:54:08 +02:00
Justin Ruggles
95a98ab3f0 ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16
Simplifies the code by using cpuflags and a new macro.
Also fixes the invalid use of the MMX2 pshufw operation in the MMX-only
function.
2012-05-15 15:23:59 -04:00
Vitor Sessak
fcc456b829 x86: use more standard construct for setting ASM functions in FFT code
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-05-14 15:38:42 +02:00
Michael Kostylev
ea60dfe284 x86: vc1: drop MMX loop filter implementation, which uses MMX2 instructions. 2012-05-12 14:02:45 +02:00
Christophe Gisquet
110d0cdc9d rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC
Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions

Benchmark (rounded to tens of unit):
        V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C       445    358   985    1785    1559    3280
MMX*    219    271   478     714     929    1443
SSE2    131    158   294     425     515     892
SSSE3   120    122   248     387     390     763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-05-10 18:42:43 +02:00
Ronald S. Bultje
bec207f9f9 snowdsp: explicitily state instruction size.
Fixes a compile error with clang at -O0.
2012-05-02 09:57:12 -07:00
Christophe GISQUET
e75d1d4f73 dsputil x86: revert a test back to its previous value
Commit 356ee8d caused the initial inversion.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 11:00:51 -07:00
Christophe Gisquet
fe5ed69dc7 rv34dsp x86: implement MMX2 inverse transform
141 cycles down to 51.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 10:58:47 -07:00
Roland Scheidegger
9b9df1cdff h264: new assembly version of get_cabac for x86_64 with PIC
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 09:43:25 -07:00
Roland Scheidegger
14e9ffc1e4 h264: use one table instead of several for cabac functions
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.

The assembly uses the new table but still won't work for PIC case.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 08:26:12 -07:00
Roland Scheidegger
444f47b55c h264: (trivial) remove unneeded macro argument in x86/cabac.h
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 08:24:56 -07:00
Mans Rullgard
2bcbd98459 Remove lowres video decoding
This feature is complex, of questionable utility, and slows down
normal decoding.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-21 18:56:19 +01:00
Mans Rullgard
95510be8c3 avcodec: remove AVCodecContext.dsp_mask
This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump.  It has been superseded
by av_set_cpu_flag_mask() which, unlike this field, works everywhere.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-21 18:30:01 +01:00
Ronald S. Bultje
87a246341b h264: use proper PROLOGUE statement for a function using 8 registers.
Fixes crashes when using biweight on win64.
2012-04-16 08:07:21 -07:00
Ronald S. Bultje
b089ca871a dsputil: fix optimized emu_edge function on Win64.
Recent register allocation changes (x86inc.asm update) changed the
register order and thus opcodes for the inner loops. One of them became
>128bytes, which confuses other parts of this function where it jumps
to fixed-offset positions to extend the edge by fixed amounts. A simple
register change fixes this.
2012-04-13 11:28:30 -07:00
Justin Ruggles
de7f22ab0c ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE
Fixes ac3-encode and eac3-encode FATE test failures with SSE2 disabled.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-12 21:33:04 -07:00
Ronald S. Bultje
76538d7a78 h264: fix 10bit biweight functions after recent x86inc.asm fixes.
This should have been updated in the x86inc.asm update, but was
accidently forgotten.
2012-04-12 21:13:57 -07:00
Diego Biurrun
7bb3a302fe build: Consistently handle conditional compilation for all optimization OBJS. 2012-04-12 09:00:49 +02:00
Henrik Gramner
729f90e268 x86inc improvements for 64-bit
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-11 15:47:00 -04:00
Christophe GISQUET
2130bd8f5b rv40dsp x86: use only one register, for both increment and loop counter
Around 10 cycles faster for luma.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-10 10:07:09 -07:00
Christophe GISQUET
272b252c01 rv40dsp: implement prescaled versions for biweight.
Quite often, the original weights are multiple of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.

The x86 code already used that trick.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-10 10:06:48 -07:00
Christophe GISQUET
6b81da2fd0 dsputil x86: use SSE float instruction instead of SSE2 integer equivalent
All the more required since the users are pure SSE functions.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-04 11:24:27 -07:00
Christophe GISQUET
cd88105f6f dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-04 11:24:08 -07:00
Christophe GISQUET
f9888520cc vp8dsp x86: perform rounding shift with a single instruction
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-04 11:23:36 -07:00
Ronald S. Bultje
a940198130 cabac: add overread protection to BRANCHLESS_GET_CABAC().
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
2012-03-28 08:01:29 -07:00
Ronald S. Bultje
448dc42571 cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC(). 2012-03-28 08:01:29 -07:00
Ronald S. Bultje
16f6e83f74 cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE(). 2012-03-28 08:01:29 -07:00
Ronald S. Bultje
951014e5bb cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC(). 2012-03-28 08:01:29 -07:00
Ronald S. Bultje
a0bdcb019e h264: add overread protection to get_cabac_bypass_sign_x86(). 2012-03-28 08:01:29 -07:00
Ronald S. Bultje
95bfa4ead7 h264: reindent get_cabac_bypass_sign_x86(). 2012-03-28 08:01:29 -07:00
Ronald S. Bultje
db025929f2 h264: use struct offsets in get_cabac_bypass_sign_x86(). 2012-03-28 08:01:29 -07:00
Diego Biurrun
ad0e31f134 build: prettyprinting cosmetics 2012-03-26 13:00:10 +02:00
Diego Biurrun
62ce9defb8 x86: dsputil: prettyprint gcc inline asm 2012-03-25 11:50:48 +02:00
Diego Biurrun
3b54912113 x86: K&R prettyprinting cosmetics for dsputil_mmx.c 2012-03-25 11:50:48 +02:00
Diego Biurrun
915a2a0a65 x86: conditionally compile H.264 QPEL optimizations 2012-03-25 11:50:45 +02:00
Diego Biurrun
3816642eab dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks.
This makes them safe to use in non-fully braced if-blocks and similar.
2012-03-25 11:48:37 +02:00
Ronald S. Bultje
71ea26811c aacsbr: handle m_max values smaller than 4.
Prevents a signflip in the counter, and a subsequent crash because of
overreads/overwrites.

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
2012-03-23 12:56:08 -07:00
Ronald S. Bultje
a928ed3751 vp8: convert mbedge loopfilter x86 assembly to use named arguments. 2012-03-10 11:36:33 -08:00
Ronald S. Bultje
bee330e300 vp8: convert inner loopfilter x86 assembly to use named arguments. 2012-03-10 11:36:33 -08:00
Reimar Döffinger
6eda85e15b sbrdsp.asm: convert all instructions to float/SSE ones.
Since the values are floats, using the float operations
makes sense, improves performance on some CPUs and
makes the code SSE compatible instead of needing SSE2.

Based on suggestion by Jason.

Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-03-07 13:50:13 -08:00
Christophe GISQUET
7e1ce6a6ac dsputil: remove shift parameter from scalarproduct_int16
There is only one caller, which does not need the shifting. Other use cases
are situations where different roundings would be needed.

The x86 and neon versions are modified accordingly.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-03-07 10:29:52 -08:00
Diego Biurrun
1e9d55e45e x86: Remove duplicated AVG_3DNOW_OP / AVG_MMX2_OP macros from h264_qpel_mmx.c. 2012-03-07 09:36:04 +01:00
Reimar Döffinger
b5161908e0 SBR DSP: fix SSE code to not use SSE2 instructions.
movq from SSE register _to_ memory is an SSE2 instruction.
Use the SSE movlps function instead that does the same thing.

Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-03-06 13:40:35 -08:00
Mans Rullgard
356ee8d7de x86: clean up ff_dsputil_init_mmx()
This splits ff_dsputil_init_mmx() into multiple functions, one for
each MMX/SSE level, somewhat simplifying the nested conditions.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-03-05 14:40:03 +01:00
Ronald S. Bultje
b4188f0d46 vp8: convert simple loopfilter x86 assembly to use named arguments. 2012-03-03 20:40:00 -08:00
Ronald S. Bultje
8476ca3b4e vp8: convert idct x86 assembly to use named arguments. 2012-03-03 20:40:00 -08:00