1892 Commits

Author SHA1 Message Date
Christophe Gisquet
7aeafacfd0 x86/sbrdsp: Use different mem moves
Before
2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips

After
2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-25 18:20:43 -03:00
James Almer
449b21bfab x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3}
2 to 2.5 times faster.

Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-25 18:20:39 -03:00
James Almer
08810a8895 x86/flacdsp: remove unneeded ifdeffery
x86inc can translate r*m into a register or stack on its own

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-05 16:29:28 -03:00
James Almer
37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
Ronald S. Bultje
3aefca68ca vp9/x86: add myself to copyright holders for loopfilter assembly.
2014-12-27 16:55:16 -05:00
Ronald S. Bultje
afd8c464b7 vp9/x86: make filter_16_h work on 32-bit.
2014-12-27 16:55:16 -05:00
Ronald S. Bultje
b26bc3520f vp9/x86: make filter_48/84/88_h work on 32-bit.
2014-12-27 16:55:15 -05:00
Ronald S. Bultje
8a1cff1c35 vp9/x86: make filter_44_h work on 32-bit.
2014-12-27 16:55:15 -05:00
Ronald S. Bultje
047088b8c6 vp9/x86: make filter_16_v work on 32-bit.
2014-12-27 16:55:14 -05:00
Ronald S. Bultje
0cc9c23ea1 vp9/x86: make filter_48/84_v work on 32-bit.
2014-12-27 16:55:14 -05:00
Ronald S. Bultje
6433a9133f vp9/x86: make filter_88_v work on 32-bit.
2014-12-27 16:55:14 -05:00
Ronald S. Bultje
75f8e52089 vp9/x86: make filter_44_v work on 32-bit.
2014-12-27 16:55:13 -05:00
Ronald S. Bultje
7f80c3344c vp8/x86: save one register in SIGN_ADD/SUB.
2014-12-27 16:55:13 -05:00
Ronald S. Bultje
8ea2194ebb vp9/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
2014-12-27 16:55:13 -05:00
Ronald S. Bultje
e42409479f vp8/x86: move variable assigned inside macro branch.
The value is not used outside the branch.
2014-12-27 16:55:12 -05:00
Ronald S. Bultje
418c202c63 vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
2014-12-27 16:55:12 -05:00
Ronald S. Bultje
d1c55654e1 vp8/x86: remove unused register from ABSSUB_CMP macro.
2014-12-27 16:55:12 -05:00
Ronald S. Bultje
e59bd08986 vp9/x86: slightly simplify 44/48/84/88 h stores.
2014-12-27 16:55:11 -05:00
Ronald S. Bultje
8132629bd5 vp9/x86: make cglobal statement more conservative in register allocation.
2014-12-27 16:55:11 -05:00
Ronald S. Bultje
c013ca58c5 vp9/x86: save one register in loopfilter surface coverage.
2014-12-27 16:55:11 -05:00
James Almer
32c836cb11 x86/vp9: remove duplicate function prototypes
Fixes "redundant redeclaration" warnings.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-23 00:56:51 -03:00
James Almer
7696e429c7 x86/vp3dsp: port put_vp_no_rnd_pixels8_l2_mmx to yasm
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-20 13:25:43 +01:00
James Almer
a4d62f7775 x86/constants: fix alignment of pw_255
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 20:21:34 +01:00
Ronald S. Bultje
bdc1e3e3b2 vp9/x86: intra prediction sse2/32bit support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 14:07:19 +01:00
Ronald S. Bultje
b6e1711223 vp9/x86: invert hu_ipred left array ordering.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 14:07:18 +01:00
Ronald S. Bultje
0a7964dca5 vp9/x86: save one register on 32bit idct32x32.
Fixes build on win32.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-16 02:51:26 +01:00
Ronald S. Bultje
cae893f692 vp9/x86: sse2 MC assembly.
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 02:34:05 +01:00
Ronald S. Bultje
fd77fbb390 vp9/x86: 32bit and sse2 support for vp9 inverse transform assembly
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 00:38:05 +01:00
Michael Niedermayer
a03f72e744 avcodec/x86/hevc_mc: fix sse register counts
These fix failures of --enable-xmm-clobber-test
It would be better to change the code to use fewer registers, but until
someone does the used register count must not be too small

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-11 13:17:26 +01:00
Michael Niedermayer
d43d5c5707 avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-10 07:34:49 +01:00
Michael Niedermayer
ed9be7dd47 avcodec/x86/pngdsp: fix off by 1 error
This fixes artifacts in the last pixel of rows with some widths and pixel formats

Found-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Tested-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-08 18:24:40 +01:00
Michael Niedermayer
1d048f762d Merge commit '9a738c27dceb4b975784b23213a46f5cb560d1c2'
* commit '9a738c27dceb4b975784b23213a46f5cb560d1c2':
  v210enc: Add SIMD optimised 8-bit and 10-bit encoders

Conflicts:
	libavcodec/v210enc.c
	libavcodec/v210enc.h
	libavcodec/x86/Makefile
	libavcodec/x86/v210enc.asm
	libavcodec/x86/v210enc_init.c
	tests/ref/vsynth/vsynth1-v210
	tests/ref/vsynth/vsynth2-v210

See: 36091742d1
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-06 01:54:10 +01:00
Kieran Kunhya
9a738c27dc v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Reimar Döffinger
49d9cbe55d h264_i386: Fix operand size
Fixes fate failure on macosx clang x86-64

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-03 23:03:13 +01:00
Christophe Gisquet
9fa056ba75 pngdsp x86: use unaligned access
For test images manually generated to contain only up prediction,
timing results:
         8380x3032    255x185
before:   138635       1992
after:    139232       1996

Actually jumping to the proper version depending on the alignment:
8380x3032: 138767

A 0.5% speed improvement for gigantic images is not worth the code
duplication.

Fixes ticket #4148

Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Tested-by: Benoit Fouet <benoit.fouet@free.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-03 11:56:22 +01:00
Kieran Kunhya
36091742d1 v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Michael Niedermayer
ea41e6d637 Merge commit '9c12c6ff9539e926df0b2a2299e915ae71872600'
* commit '9c12c6ff9539e926df0b2a2299e915ae71872600':
  motion_est: convert stride to ptrdiff_t

Conflicts:
	libavcodec/me_cmp.c
	libavcodec/ppc/me_cmp.c
	libavcodec/x86/me_cmp_init.c

See: 9c669672c7
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-24 12:13:00 +01:00
Vittorio Giovara
9c12c6ff95 motion_est: convert stride to ptrdiff_t
CC: libav-stable@libav.org
Bug-Id: CID 700556 / CID 700557 / CID 700558
2014-11-24 01:30:10 +00:00
Carl Eugen Hoyos
600e38f563 Fix standalone compilation of the apng decoder on x86.
2014-11-23 13:21:29 +01:00
Michael Niedermayer
65ce8f8895 avcodec/x86/Makefile: fix order
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-23 01:49:04 +01:00
Michael Niedermayer
d3512a0e89 avcodec/x86/lossless_audiodsp: fix fallback code for 32bit
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-22 21:08:38 +01:00
Michael Niedermayer
4327088da3 avcodec/x86/lossless_audiodsp: support len %16 == 8 in scalarproduct_and_madd_int16()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-22 20:40:36 +01:00
Reimar Döffinger
478c61ccb2 h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.
11674 -> 10877 decicycles on my Phenom II.
Overall speedup was unfortunately within measurement error.

Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2014-11-22 14:06:48 +01:00
James Almer
3cec54b7d7 x86/flacdsp: add SSE2 and AVX decorrelate functions
Two to four times faster depending on instruction set, block size and channel count.
2014-11-13 13:47:55 -03:00
James Almer
84ccc317ce x86/flacdsp: separate decoder and encoder dsp initialization
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-12 14:41:45 -03:00
James Almer
7292b0477a x86/hpeldsp: fix loop in {avg,avg_no_rnd}_pixels16_x2_mmx
Handle it inside the __asm__() block.
Fixes fate-vc1_ilaced_twomv when using the gcc-usan toolchain.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-23 13:11:05 -03:00
Michael Niedermayer
3c1378ce0a Merge commit '2d91abade29e43bb45c881d45909b8ee77e904e2'
* commit '2d91abade29e43bb45c881d45909b8ee77e904e2':
  x86: h264_intrapred: Don't treat 32-bit integers as 64-bit

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-08 11:48:58 +02:00
Henrik Gramner
2d91abade2 x86: h264_intrapred: Don't treat 32-bit integers as 64-bit
The upper halves are not guaranteed to be zero in x86-64.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-10-08 08:15:52 +00:00
Mickaël Raulet
4ba6371a83 x86/hevc: get rid of packusdw for ssse3 compatibility
cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2

Fixes out of array access
Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-04 21:14:15 +02:00
James Almer
0de1d6287e x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2}
2x to 2.5x faster than the C version.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-02 22:11:55 -03:00