Commit Graph

1679 Commits

Author SHA1 Message Date
Diego Biurrun
054013a0fc dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
Diego Biurrun
65d5d58658 dsputil: Move SVQ1 encoding specific bits into svq1enc 2014-05-29 06:41:15 -07:00
Michael Niedermayer
b50559fc0b libavcodec/x86/dsputilenc: drop and 0xffff that should have becomei redundant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 00:16:52 +02:00
James Almer
561bfc85eb x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 23:29:34 +02:00
Christophe Gisquet
0810608e23 x86: hevc_mc: better register allocation
The xmm reg count was incorrect, and manual loading of the gprs
furthermore allows to noticeable reduce the number needed.

The modified functions are used in weighted prediction, so only a
few samples like WP_* exhibit a change. For this one and Win64
(some widths removed because of too few occurrences):

WP_A_Toshiba_3.bit, ff_hevc_put_hevc_uni_w
         16    32
before: 2194  3872
after:  2119  3767

WP_B_Toshiba_3.bit, ff_hevc_put_hevc_bi_w
         16    32    64
before: 2819  4960  9396
after:  2617  4788  9150

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 17:39:34 +02:00
Michael Niedermayer
48a6916308 Merge commit '512f3ffe9b4bb86767c2b1176554407c75fe1a5c'
* commit '512f3ffe9b4bb86767c2b1176554407c75fe1a5c':
  dsputil: Split off HuffYUV encoding bits into their own context

Conflicts:
	configure
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/huffyuv.h
	libavcodec/huffyuvenc.c
	libavcodec/pngenc.c
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 00:03:59 +02:00
Michael Niedermayer
e2abc0d5ca Merge commit '0d439fbede03854eac8a978cccf21a3425a3c82d'
* commit '0d439fbede03854eac8a978cccf21a3425a3c82d':
  dsputil: Split off HuffYUV decoding bits into their own context

Conflicts:
	configure
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/huffyuv.h
	libavcodec/huffyuvdec.c
	libavcodec/lagarith.c
	libavcodec/vble.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 23:16:06 +02:00
Diego Biurrun
512f3ffe9b dsputil: Split off HuffYUV encoding bits into their own context
Also shorten HuffYUV context member names to avoid clutter.
2014-05-27 08:54:53 -07:00
Diego Biurrun
0d439fbede dsputil: Split off HuffYUV decoding bits into their own context
Also shorten HuffYUV context member names to avoid clutter.
2014-05-27 08:52:34 -07:00
James Almer
5863207086 x86/dsputilenc: use HADDD in ff_sse16_sse2
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 15:12:50 +02:00
James Almer
e64e079ece x86/dsputilenc: implement SSE2 version of diff_pixels
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 05:55:11 +02:00
Michael Niedermayer
a0c5cd3475 avcodec/x86/dsputilenc: set the count of SSE registers correctly for get_pixels
Found-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 05:52:25 +02:00
Christophe Gisquet
86ae0da60c x86: hpeldsp: propagate changes across codecs
Some codecs still use mmx versions, so have them use the versions
with newer instruction sets.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-26 15:37:04 +02:00
Michael Niedermayer
a3950a90f6 Revert "x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm"
This reverts commit ad733089b0.

breaks with --disable-yasm
revert requested by: Christophe Gisquet <christophe.gisquet@gmail.com>
2014-05-25 19:42:18 +02:00
Timothy Gu
ad733089b0 x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-25 16:30:08 +02:00
James Almer
d94e255dd1 x86/dsputilenc: make the SUM_ABS_DCTELEM macro more readable
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-25 02:03:54 +02:00
James Almer
61eea421b2 x86/dsputilenc: port sum_abs_dctelem functions to yasm
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 21:46:25 +02:00
Christophe Gisquet
81aa0f4604 x86: hpeldsp: implement SSSE3 version of _xy2
Loading pb_1 rather than pw_8192 was benchmarked to be more efficient.
Loading of the 2 yields no advantage. Loading of one saves ~11 cycles.

decicycles count:
put8:  3223(mmx)    -> 2387
avg8:  2863(mmxext) -> 2125
put16: 4356(sse2)   -> 3553
avg16: 4481(sse2)   -> 3513

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 15:15:56 +02:00
Christophe Gisquet
9722a6a3f3 x86: hpeldsp: implement SSE2 put_pixels16_xy2
This is obviously equivalent to the avg version, without the avg.

3223(mmx) -> 2006(sse2)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 03:45:17 +02:00
Christophe Gisquet
f0aca50e0b x86: hpeldsp: implement SSE2 versions
Those are mostly used in codecs older than H.264, eg MPEG-2.

put16 versions:
      mmx  mmx2  sse2
x2:  1888  1185   552
y2:  1778  1092   510

avg16 xy2: 3509(mmx2) -> 2169(sse2)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 03:29:48 +02:00
James Almer
7538ad2248 x86/hevc_deblock: improve chroma functions register allocation
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 01:16:26 +02:00
James Almer
584327f22f x86/dsputil: fix argument declaration in vector_clipf
Should fix fate failures in msvc x86_64

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-23 23:10:17 +02:00
James Almer
518cbf9b4a x86/dsputil: fix VECTOR_CLIP_INT32 macro
The inline loop was incrementing and using the value of %%i
the wrong way.

Disassembly of ff_vector_clip_int32_sse2 before and after
this patch:

    movdqa (%rdx),%xmm0      |  movdqa (%rdx),%xmm0
    movdqa 0x10(%rdx),%xmm1  |  movdqa 0x10(%rdx),%xmm1
    movdqa 0x20(%rdx),%xmm2  |  movdqa 0x20(%rdx),%xmm2
    movdqa 0x30(%rdx),%xmm3  |  movdqa 0x30(%rdx),%xmm3
[...]                        |
    movdqa %xmm0,(%rcx)      |  movdqa %xmm0,(%rcx)
    movdqa %xmm1,0x10(%rcx)  |  movdqa %xmm1,0x10(%rcx)
    movdqa %xmm2,0x20(%rcx)  |  movdqa %xmm2,0x20(%rcx)
    movdqa %xmm3,0x30(%rcx)  |  movdqa %xmm3,0x30(%rcx)
    movdqa (%rdx),%xmm0      |  movdqa 0x40(%rdx),%xmm0
    movdqa 0x20(%rdx),%xmm1  |  movdqa 0x50(%rdx),%xmm1
    movdqa 0x40(%rdx),%xmm2  |  movdqa 0x60(%rdx),%xmm2
    movdqa 0x60(%rdx),%xmm3  |  movdqa 0x70(%rdx),%xmm3
[...]                        |
    movdqa %xmm0,(%rcx)      |  movdqa %xmm0,0x40(%rcx)
    movdqa %xmm1,0x20(%rcx)  |  movdqa %xmm1,0x50(%rcx)
    movdqa %xmm2,0x40(%rcx)  |  movdqa %xmm2,0x60(%rcx)
    movdqa %xmm3,0x60(%rcx)  |  movdqa %xmm3,0x70(%rcx)
    add    $0x80,%rdx        |  add    $0x80,%rdx
    add    $0x80,%rcx        |  add    $0x80,%rcx

Other versions were unaffected.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-23 22:59:55 +02:00
James Almer
6a4832caae x86/diracdsp: mark all functions as yasm
No inline asm dirac code remains in the tree, so replace every relevant check.
This also moves all the dirac functions from dsputil_mmx.c to diracdsp_mmx.c

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-23 15:02:42 +02:00
James Almer
1d36defe94 x86/dsputil: port ff_vector_clipf_sse to yasm
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-23 00:08:21 +02:00
Christophe Gisquet
c081ca851c x86: hpeldsp: avg_pixels_xy2 for mmx2&3dnow
This is a port of the inline assembly of the mmx version to use the
pavg(us|)b instruction.

        8    16
mmx   1498  4355
mmx2  1242  3509

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:49 +02:00
Christophe Gisquet
17ac998055 x86: hpeldsp: mark _xy2 versions as approximate
Currently, only the mmx version is bitexact, the others (mmxext and
3dnow) are not, in spite of their naming.

Therefore, make their name more obvious. Also restore a comment that
was removed in 71155d7b.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:45 +02:00
Christophe Gisquet
f8de35ebc4 x86: hpeldsp: kill hpeldsp_mmx.c
before:
1987 decicycles in 8_x2, 262121 runs, 23 skips

after:
1902 decicycles in 8_x2, 262112 runs, 32 skips

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:40 +02:00
James Almer
80ee2dfcf6 x86/dsputil: port ff_put_signed_pixels_clamped_mmx to yasm
Also add an SSE2 version

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 23:33:45 +02:00
James Almer
7b05267239 x86/dsputil: port clear_block functions to yasm
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 23:33:45 +02:00
Michael Niedermayer
3d4e365073 avcodec/x86/hpeldsp_init: remove redundant if()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 13:38:27 +02:00
Hendrik Leppkes
cd9e08e110 hpeldsp: fix build without inline asm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 13:37:38 +02:00
Christophe Gisquet
d1a32c3f49 x86: kill fpel_mmx.c
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-21 03:25:08 +02:00
James Almer
d43c303038 x86/hevc_deblock: use constants instead of generating values at runtime
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-19 23:09:33 +02:00
James Almer
057ebf1222 x86/hevc_deblock: remove some duplicated instructions
Also remove a couple unnecessary cmps

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-18 23:28:17 +02:00
Christophe Gisquet
f1793fe9cd x86: hevc_mc: specify coefficients registers
By default, macro EPEL_FILTER loads the coefficients inconditionally
into m14/m15. This forces an unneeded higher register count.

Reduce that count by making them parameters of EPEL_FILTER.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-18 16:23:58 +02:00
Carl Eugen Hoyos
ef2713747f Fix compilation of libavcodec/x86/hevc_deblock.asm with nasm.
Suggested-by: Reimar
2014-05-17 12:50:55 +02:00
James Almer
be1fbc02b8 x86/hevc_deblock: use movhps instead of shuffling values
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-17 05:40:14 +02:00
James Almer
8aac77fede x86/hevc_deblock: fix label names
Also remove some unnecessary jmps

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-17 05:40:08 +02:00
James Almer
521eaea63a x86/hevc_deblock: fix usage of ABS1
The second argument is a temp register for non-SSSE3 cases

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-17 05:39:55 +02:00
James Almer
45110d2290 x86/hevc_deblock: merge movs with other instructions
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-17 05:39:34 +02:00
plepere
ef7c4cd001 avcodec/x86/hevc: updated to use x86util macros
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-16 21:11:07 +02:00
plepere
de7b89fd43 avcodec/x86/hevc: added DBF assembly functions
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Ronald S. Bultje
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-16 21:11:03 +02:00
Michael Niedermayer
bebce653e5 avcodec/x86/dsputil_mmx: Fix build with clang-usan
Found-by: Katerina Barone-Adesi
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-15 23:56:39 +02:00
Christophe Gisquet
d1310c591e x86: sbrdsp: implement SSE qmf_deint_neg
From 133 (unrolled av_intfloat32 C) to 59 cycles on Arrandale/Win64.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-15 23:11:18 +02:00
Hendrik Leppkes
87f2d8079a hevcdsp: correctly indicate that hevc_put_hevc_bi_epel_h uses 9 GPRs
Fixes FATE on Windows.

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-12 17:00:48 +02:00
James Almer
8e07800001 hevcdsp: include stddef.h for ptrdiff_t definition
Including stdint.h was enough for systems like Mingw, but apparently not for Linux.
This should fix make checkheaders failures on every platform

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-10 18:23:30 +02:00
James Almer
fa23190a7a hevcdsp: add missing header include
Fixes make checkheaders

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-10 14:55:03 +02:00
Michael Niedermayer
341cacb9ac avcodec/x86/hevcdsp_init: fix build failure with --disable-mmx
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-09 05:16:27 +02:00
plepere
63832e01c3 hvcodec/x86/hevcdsp: make macros more modular to support functions that are not sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-09 00:14:50 +02:00
Matt Oliver
1898c2f49d inline asm: fix arrays as named constraints.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-07 15:02:45 +02:00
Michael Niedermayer
fc7d0d8201 avcodec/x86/hevcdsp_init: fix SSE4 checks
Found-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:27:49 +02:00
Michael Niedermayer
7be230b5fa avcodec/x86/Makefile: remove duplicate line
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:42 +02:00
Michael Niedermayer
3b3db02f2e avcodec/x86/hevcdsp_init: fix build on 32bit
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:42 +02:00
plepere
7a2491c436 HEVC : added assembly MC functions
pretty print x86

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:36 +02:00
Matt Oliver
ac9869ffb0 x86/mpegaudiodsp.c: msvc compilation error without sse/avx_external
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 14:15:02 +02:00
Michael Niedermayer
ebf2c2c3a8 avcodec/lossless_videodsp: fix incompatible pointer type warning
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-05 05:49:18 +02:00
Matt Oliver
3c3e02b8d1 x86/cavdsp: prevent named constraints appearing twice. 2014-05-03 17:47:55 +02:00
James Almer
5ac10d40fb x86/mpegaudiodsp: define apply_window_mp3 as SSE
None of the handwritten asm in this function seems to be SSE2

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 00:38:01 +02:00
Hendrik Leppkes
5809c2a99d vc1dsp: fix build without inline asm
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-22 14:01:53 +02:00
Clément Bœsch
62d31307c1 avcodec/x86/vp9lpf: add a comment above a bunch of SWAP. 2014-04-20 21:33:58 +02:00
Clément Bœsch
f0d368d758 avcodec/x86/vp9lpf: merge a few movs with other instructions. 2014-04-20 21:29:11 +02:00
Christophe Gisquet
319235c67c vc1dsp: introduce cases for 8x8 and 16x16
This allows further unrolling the DSP implementation where possible.

x86 and ARM DSP modified by simply moving the multiple calls from vc1dec
to the DSP code. Decoding improvements should only occurs because of the
compiler actually able to unroll more.

Decoding time: ~8.80s -> 8.64s (ie around 2%)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 18:25:36 +02:00
Clément Bœsch
010732b73a vp9/x86: simplify FILTER_INIT.
In the 2 FILTER_INIT usages, the source is already preloaded so that
extra complexity taken from FILTER_UPDATE is not necessary.

Also add forgotten "mask" argument in FILTER_{INIT,UPDATE} comments.
2014-04-19 17:30:33 +02:00
Clément Bœsch
b8d002dc95 vp9/x86: clarify mixed splatb. 2014-04-19 17:00:51 +02:00
Carl Eugen Hoyos
b38910c979 Fix compilation with !HAVE_6REGS.
Can be tested with:
$ ./configure --cc='cc -m32' --disable-optimizations --enable-pic
2014-04-19 09:56:01 +02:00
Carl Eugen Hoyos
72c93abaad Use MANGLE in cavsdsp.c to save two registers using gcc.
Fixes compilation with !HAVE_6REGS.
2014-04-19 09:54:26 +02:00
James Almer
197fe392db x86/dsputil: use HADDD where applicable
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-17 14:15:35 +02:00
James Almer
76ed71a72b x86: move horizontal add macros to x86util
Also port relevant AVX2/XOP optimizations from x264 with permission
to relicense to LGPL from the corresponding authors

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-17 14:15:09 +02:00
Michael Niedermayer
46d5625f44 avcodec/x86/idct_sse2_xvid: fix non C99 inline function
Found-by: Matt Oliver <protogonoi@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-14 18:04:57 +02:00
James Almer
0f524b6c69 x86/synth_filter: remove the fma3 version ifdefs
This fixes compilation failures with --disable-fma3

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-04-13 11:29:28 +02:00
Timothy Gu
71c32ed533 DNxHD: convert inline asm to yasm 2014-04-11 12:09:09 +02:00
Timothy Gu
676856204b DNxHD: make get_pixel_8x4_sym accept ptrdiff_t as stride 2014-04-11 12:09:09 +02:00
Matt Oliver
d1e6e5c887 avcodec/x86: Exclude broken get_cabac under icl.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-10 17:47:22 +02:00
Matt Oliver
158a80cc0b Remove leal op to fix icl inline asm.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-07 13:02:54 +02:00
Hendrik Leppkes
fc7e02f0ff dcadsp: fix SSE code to not use SSE2 instructions.
movq from SSE register to memory is an SSE2 instruction.
Instead, use SSE movlps, which does the same thing.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-06 18:31:22 +02:00
Michael Niedermayer
e6f69b324e Merge commit '57b5b84e208ad61ffdd74ad849bed212deb92bc5'
* commit '57b5b84e208ad61ffdd74ad849bed212deb92bc5':
  x86: dsputil: Move ff_apply_window_int16_* bits to ac3dsp, where they belong

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 19:36:21 +02:00
Michael Niedermayer
e3c3f277a9 Merge commit 'c2c5be57494e6117086771bca34c8cd4c72c8e99'
* commit 'c2c5be57494e6117086771bca34c8cd4c72c8e99':
  x86: h264_qpel: Simplify an #if conditional

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 19:30:44 +02:00
Michael Niedermayer
ebb21887b8 Merge commit '01c5779f56cf708e6cb88b11cfdc248cae7e2ee8'
* commit '01c5779f56cf708e6cb88b11cfdc248cae7e2ee8':
  x86: Drop some unnecessary YASM ifdefs

Conflicts:
	libavfilter/x86/vf_yadif_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 19:16:39 +02:00
Michael Niedermayer
874f27a8f7 Merge commit 'b42f49e42f8cde25a788b2d13d03e99ca2956647'
* commit 'b42f49e42f8cde25a788b2d13d03e99ca2956647':
  x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 19:05:00 +02:00
Michael Niedermayer
5440151fa4 Merge commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d'
* commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d':
  Remove a number of unnecessary dsputil.h #includes

Conflicts:
	libavcodec/h264pred.c
	libavcodec/vc1dsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 18:54:15 +02:00
James Almer
a1ac12bddd x86/dcadsp: add ff_dca_lfe_fir0_fma3
~10% faster than the SSE version.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 13:55:59 +02:00
James Almer
7d2116dd09 x86/synth_filter: compile avx and fma3 functions unconditionally
Fixes compilation failures with "--disable-{avx,fma3} --disable-optimizations"

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 05:15:27 +02:00
Michael Niedermayer
490d53e335 avcodec/x86/dcadsp_init: fix compilation failure without FMA3
alternatively the call could be put under #if or the #if
over the function removed

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 00:11:48 +02:00
Michael Niedermayer
51fd962c0b Merge commit 'c74b86699c86bdf62e8570f41d8a38be5710baa3'
* commit 'c74b86699c86bdf62e8570f41d8a38be5710baa3':
  x86/synth_filter: add synth_filter_fma3
  x86/synth_filter: add synth_filter_avx
  x86/synth_filter: add synth_filter_sse

Conflicts:
	libavcodec/x86/dcadsp.asm
	libavcodec/x86/dcadsp_init.c

See: 6467209836
See: 68c3ed936a
See: 7fd64e3e36
See: aa1f38015c
See: dfd865e51b
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-04 23:40:08 +02:00
Christophe Gisquet
dfd865e51b x86/synth_filter: remove the main loop when it's not needed
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-04 22:35:45 +02:00
Diego Biurrun
57b5b84e20 x86: dsputil: Move ff_apply_window_int16_* bits to ac3dsp, where they belong 2014-04-04 19:08:05 +02:00
Diego Biurrun
c2c5be5749 x86: h264_qpel: Simplify an #if conditional
The extra conditions are covered by previous #ifs and conditional compilation.
2014-04-04 19:08:05 +02:00
Diego Biurrun
01c5779f56 x86: Drop some unnecessary YASM ifdefs
Dead code elimination is enough to avoid undefined references in these cases.
2014-04-04 19:08:05 +02:00
Diego Biurrun
b42f49e42f x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes 2014-04-04 19:08:05 +02:00
Diego Biurrun
3dc6272bed Remove a number of unnecessary dsputil.h #includes 2014-04-04 19:08:05 +02:00
James Almer
c74b86699c x86/synth_filter: add synth_filter_fma3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-04-04 17:40:51 +02:00
James Almer
81e02fae6e x86/synth_filter: add synth_filter_avx
Sandy Bridge Win64:
180 cycles in ff_synth_filter_inner_sse2
150 cycles in ff_synth_filter_inner_avx

Also switch some instructions to a three operand format to avoid
assembly errors with Yasm 1.1.0 or older.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-04-04 17:40:51 +02:00
James Almer
2025d8026f x86/synth_filter: add synth_filter_sse
Build only on x86_32 targets.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-04-04 17:40:51 +02:00
Michael Niedermayer
fb61ed1e9f Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f'
* commit 'ac4b32df71bd932838043a4838b86d11e169707f':
  On2 VP7 decoder

Conflicts:
	Changelog
	libavcodec/arm/h264pred_init_arm.c
	libavcodec/arm/vp8dsp.h
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/arm/vp8dsp_init_armv6.c
	libavcodec/arm/vp8dsp_init_neon.c
	libavcodec/avcodec.h
	libavcodec/h264pred.c
	libavcodec/version.h
	libavcodec/vp8.c
	libavcodec/vp8.h
	libavcodec/vp8data.h
	libavcodec/vp8dsp.c
	libavcodec/vp8dsp.h
	libavcodec/x86/h264_intrapred_init.c
	libavcodec/x86/vp8dsp_init.c

See: 89f2f5dbd7 and others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-04 14:46:10 +02:00
Peter Ross
ac4b32df71 On2 VP7 decoder
Further performance improvements and security fixes by
Vittorio Giovara, Luca Barbato and Diego Biurrun.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-04-04 04:00:11 +02:00
Matt Oliver
0f2588d7e5 Use intel compliant CDQ instead of CLTD in inline asm.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-30 23:14:36 +02:00
Clément Bœsch
c4148a6668 x86/vp9mc: add vp9 namespace. 2014-03-29 18:13:15 +01:00
Timothy Gu
9d34dce05b x86: convert DNxHDenc inline asm to yasm
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-27 23:16:17 +01:00
Timothy Gu
cb11b9e89e dnxhdenc: make get_pixel_8x4_sym accept ptrdiff_t as stride
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-27 23:09:10 +01:00