James Almer
4bb6cb4c7d
x86/vp9mc: fix string concatenation of fullpel function names
...
Fixes compilation with NASM
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-20 12:32:27 -03:00
Ronald S. Bultje
344d519040
vp9: add subpel MC SIMD for 10/12bpp.
2015-09-16 21:11:34 -04:00
Ronald S. Bultje
77f359670f
vp9: add fullpel (avg) MC SIMD for 10/12bpp.
2015-09-16 21:11:34 -04:00
Ronald S. Bultje
6354ff0383
vp9: add fullpel (put) MC SIMD for 10/12bpp.
2015-09-16 21:11:34 -04:00
Ronald S. Bultje
cae893f692
vp9/x86: sse2 MC assembly.
...
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 02:34:05 +01:00
James Almer
6b2caa321f
x86/vp9: add AVX and AVX2 MC
...
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-22 22:35:03 -03:00
Christophe Gisquet
4e128ab0b1
x86: vpx/h264/hevc/mpeg2: share constants
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-06 18:36:31 +02:00
Clément Bœsch
c4148a6668
x86/vp9mc: add vp9 namespace.
2014-03-29 18:13:15 +01:00
James Almer
26800e3864
vp9/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
...
pavgb is an sse integer instruction, so the mmxext flag is enough
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-18 17:08:25 +01:00
Clément Bœsch
9cc8fa63dd
vp9/x86: simplify a few mc inits.
2014-01-16 07:48:27 +01:00
Ronald S. Bultje
18175baa54
vp9/x86: 16px MC functions (64bit only).
...
Cycle counts for large MCs (old -> new on ped1080p.webm, mx!=0&&my!=0):
16x8: 876 -> 870 (0.7%)
16x16: 1444 -> 1435 (0.7%)
16x32: 2784 -> 2748 (1.3%)
32x16: 2455 -> 2349 (4.5%)
32x32: 4641 -> 4084 (13.6%)
32x64: 9200 -> 7834 (17.4%)
64x32: 8980 -> 7197 (24.8%)
64x64: 17330 -> 13796 (25.6%)
Total decoding time goes from 9.326sec to 9.182sec.
2013-12-26 21:05:10 -05:00
Ronald S. Bultje
8729964b99
vp9: split x86 assembly in two files.
...
(And in future, loopfilter or intra pred could be put in their own
respective files also.)
2013-12-07 12:39:35 -05:00