Michael Niedermayer
a0c5cd3475
avcodec/x86/dsputilenc: set the count of SSE registers correctly for get_pixels
...
Found-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 05:52:25 +02:00
Christophe Gisquet
86ae0da60c
x86: hpeldsp: propagate changes across codecs
...
Some codecs still use mmx versions, so have them use the versions
with newer instruction sets.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-26 15:37:04 +02:00
Michael Niedermayer
a3950a90f6
Revert "x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm"
...
This reverts commit ad733089b024e4a3ff8f024d247a032f79a50ac8.
breaks with --disable-yasm
revert requested by: Christophe Gisquet <christophe.gisquet@gmail.com>
2014-05-25 19:42:18 +02:00
Timothy Gu
ad733089b0
x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-25 16:30:08 +02:00
James Almer
d94e255dd1
x86/dsputilenc: make the SUM_ABS_DCTELEM macro more readable
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-25 02:03:54 +02:00
James Almer
61eea421b2
x86/dsputilenc: port sum_abs_dctelem functions to yasm
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 21:46:25 +02:00
Christophe Gisquet
81aa0f4604
x86: hpeldsp: implement SSSE3 version of _xy2
...
Loading pb_1 rather than pw_8192 was benchmarked to be more efficient.
Loading of the 2 yields no advantage. Loading of one saves ~11 cycles.
decicycles count:
put8: 3223(mmx) -> 2387
avg8: 2863(mmxext) -> 2125
put16: 4356(sse2) -> 3553
avg16: 4481(sse2) -> 3513
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 15:15:56 +02:00
Christophe Gisquet
9722a6a3f3
x86: hpeldsp: implement SSE2 put_pixels16_xy2
...
This is obviously equivalent to the avg version, without the avg.
3223(mmx) -> 2006(sse2)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 03:45:17 +02:00
Christophe Gisquet
f0aca50e0b
x86: hpeldsp: implement SSE2 versions
...
Those are mostly used in codecs older than H.264, eg MPEG-2.
put16 versions:
mmx mmx2 sse2
x2: 1888 1185 552
y2: 1778 1092 510
avg16 xy2: 3509(mmx2) -> 2169(sse2)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 03:29:48 +02:00
James Almer
7538ad2248
x86/hevc_deblock: improve chroma functions register allocation
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 01:16:26 +02:00
James Almer
584327f22f
x86/dsputil: fix argument declaration in vector_clipf
...
Should fix fate failures in msvc x86_64
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-23 23:10:17 +02:00
James Almer
518cbf9b4a
x86/dsputil: fix VECTOR_CLIP_INT32 macro
...
The inline loop was incrementing and using the value of %%i
the wrong way.
Disassembly of ff_vector_clip_int32_sse2 before and after
this patch:
movdqa (%rdx),%xmm0 | movdqa (%rdx),%xmm0
movdqa 0x10(%rdx),%xmm1 | movdqa 0x10(%rdx),%xmm1
movdqa 0x20(%rdx),%xmm2 | movdqa 0x20(%rdx),%xmm2
movdqa 0x30(%rdx),%xmm3 | movdqa 0x30(%rdx),%xmm3
[...] |
movdqa %xmm0,(%rcx) | movdqa %xmm0,(%rcx)
movdqa %xmm1,0x10(%rcx) | movdqa %xmm1,0x10(%rcx)
movdqa %xmm2,0x20(%rcx) | movdqa %xmm2,0x20(%rcx)
movdqa %xmm3,0x30(%rcx) | movdqa %xmm3,0x30(%rcx)
movdqa (%rdx),%xmm0 | movdqa 0x40(%rdx),%xmm0
movdqa 0x20(%rdx),%xmm1 | movdqa 0x50(%rdx),%xmm1
movdqa 0x40(%rdx),%xmm2 | movdqa 0x60(%rdx),%xmm2
movdqa 0x60(%rdx),%xmm3 | movdqa 0x70(%rdx),%xmm3
[...] |
movdqa %xmm0,(%rcx) | movdqa %xmm0,0x40(%rcx)
movdqa %xmm1,0x20(%rcx) | movdqa %xmm1,0x50(%rcx)
movdqa %xmm2,0x40(%rcx) | movdqa %xmm2,0x60(%rcx)
movdqa %xmm3,0x60(%rcx) | movdqa %xmm3,0x70(%rcx)
add $0x80,%rdx | add $0x80,%rdx
add $0x80,%rcx | add $0x80,%rcx
Other versions were unaffected.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-23 22:59:55 +02:00
James Almer
6a4832caae
x86/diracdsp: mark all functions as yasm
...
No inline asm dirac code remains in the tree, so replace every relevant check.
This also moves all the dirac functions from dsputil_mmx.c to diracdsp_mmx.c
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-23 15:02:42 +02:00
James Almer
1d36defe94
x86/dsputil: port ff_vector_clipf_sse to yasm
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-23 00:08:21 +02:00
Christophe Gisquet
c081ca851c
x86: hpeldsp: avg_pixels_xy2 for mmx2&3dnow
...
This is a port of the inline assembly of the mmx version to use the
pavg(us|)b instruction.
8 16
mmx 1498 4355
mmx2 1242 3509
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:49 +02:00
Christophe Gisquet
17ac998055
x86: hpeldsp: mark _xy2 versions as approximate
...
Currently, only the mmx version is bitexact, the others (mmxext and
3dnow) are not, in spite of their naming.
Therefore, make their name more obvious. Also restore a comment that
was removed in 71155d7b.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:45 +02:00
Christophe Gisquet
f8de35ebc4
x86: hpeldsp: kill hpeldsp_mmx.c
...
before:
1987 decicycles in 8_x2, 262121 runs, 23 skips
after:
1902 decicycles in 8_x2, 262112 runs, 32 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:40 +02:00
James Almer
80ee2dfcf6
x86/dsputil: port ff_put_signed_pixels_clamped_mmx to yasm
...
Also add an SSE2 version
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 23:33:45 +02:00
James Almer
7b05267239
x86/dsputil: port clear_block functions to yasm
...
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 23:33:45 +02:00
Michael Niedermayer
3d4e365073
avcodec/x86/hpeldsp_init: remove redundant if()
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 13:38:27 +02:00
Hendrik Leppkes
cd9e08e110
hpeldsp: fix build without inline asm
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 13:37:38 +02:00
Christophe Gisquet
d1a32c3f49
x86: kill fpel_mmx.c
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 03:25:08 +02:00
James Almer
d43c303038
x86/hevc_deblock: use constants instead of generating values at runtime
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-19 23:09:33 +02:00
James Almer
057ebf1222
x86/hevc_deblock: remove some duplicated instructions
...
Also remove a couple unnecessary cmps
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-18 23:28:17 +02:00
Christophe Gisquet
f1793fe9cd
x86: hevc_mc: specify coefficients registers
...
By default, macro EPEL_FILTER loads the coefficients inconditionally
into m14/m15. This forces an unneeded higher register count.
Reduce that count by making them parameters of EPEL_FILTER.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-18 16:23:58 +02:00
Carl Eugen Hoyos
ef2713747f
Fix compilation of libavcodec/x86/hevc_deblock.asm with nasm.
...
Suggested-by: Reimar
2014-05-17 12:50:55 +02:00
James Almer
be1fbc02b8
x86/hevc_deblock: use movhps instead of shuffling values
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-17 05:40:14 +02:00
James Almer
8aac77fede
x86/hevc_deblock: fix label names
...
Also remove some unnecessary jmps
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-17 05:40:08 +02:00
James Almer
521eaea63a
x86/hevc_deblock: fix usage of ABS1
...
The second argument is a temp register for non-SSSE3 cases
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-17 05:39:55 +02:00
James Almer
45110d2290
x86/hevc_deblock: merge movs with other instructions
...
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-17 05:39:34 +02:00
plepere
ef7c4cd001
avcodec/x86/hevc: updated to use x86util macros
...
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-16 21:11:07 +02:00
plepere
de7b89fd43
avcodec/x86/hevc: added DBF assembly functions
...
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-16 21:11:03 +02:00
Michael Niedermayer
bebce653e5
avcodec/x86/dsputil_mmx: Fix build with clang-usan
...
Found-by: Katerina Barone-Adesi
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-15 23:56:39 +02:00
Christophe Gisquet
d1310c591e
x86: sbrdsp: implement SSE qmf_deint_neg
...
From 133 (unrolled av_intfloat32 C) to 59 cycles on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-15 23:11:18 +02:00
Hendrik Leppkes
87f2d8079a
hevcdsp: correctly indicate that hevc_put_hevc_bi_epel_h uses 9 GPRs
...
Fixes FATE on Windows.
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-12 17:00:48 +02:00
James Almer
8e07800001
hevcdsp: include stddef.h for ptrdiff_t definition
...
Including stdint.h was enough for systems like Mingw, but apparently not for Linux.
This should fix make checkheaders failures on every platform
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-10 18:23:30 +02:00
James Almer
fa23190a7a
hevcdsp: add missing header include
...
Fixes make checkheaders
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-10 14:55:03 +02:00
Michael Niedermayer
341cacb9ac
avcodec/x86/hevcdsp_init: fix build failure with --disable-mmx
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-09 05:16:27 +02:00
plepere
63832e01c3
hvcodec/x86/hevcdsp: make macros more modular to support functions that are not sse4
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-09 00:14:50 +02:00
Matt Oliver
1898c2f49d
inline asm: fix arrays as named constraints.
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-07 15:02:45 +02:00
Michael Niedermayer
fc7d0d8201
avcodec/x86/hevcdsp_init: fix SSE4 checks
...
Found-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:27:49 +02:00
Michael Niedermayer
7be230b5fa
avcodec/x86/Makefile: remove duplicate line
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:42 +02:00
Michael Niedermayer
3b3db02f2e
avcodec/x86/hevcdsp_init: fix build on 32bit
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:42 +02:00
plepere
7a2491c436
HEVC : added assembly MC functions
...
pretty print x86
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:36 +02:00
Matt Oliver
ac9869ffb0
x86/mpegaudiodsp.c: msvc compilation error without sse/avx_external
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 14:15:02 +02:00
Michael Niedermayer
ebf2c2c3a8
avcodec/lossless_videodsp: fix incompatible pointer type warning
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-05 05:49:18 +02:00
Matt Oliver
3c3e02b8d1
x86/cavdsp: prevent named constraints appearing twice.
2014-05-03 17:47:55 +02:00
James Almer
5ac10d40fb
x86/mpegaudiodsp: define apply_window_mp3 as SSE
...
None of the handwritten asm in this function seems to be SSE2
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 00:38:01 +02:00
Hendrik Leppkes
5809c2a99d
vc1dsp: fix build without inline asm
...
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-22 14:01:53 +02:00
Clément Bœsch
62d31307c1
avcodec/x86/vp9lpf: add a comment above a bunch of SWAP.
2014-04-20 21:33:58 +02:00