1959 Commits

Author SHA1 Message Date
Christophe Gisquet
691b7f5e9e lavc/lossless_audiodsp: revert various commits
Their intent was to make the DSP work with wmalossless pro.
The later was fixed to work with the DSP.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 15:15:19 +01:00
Christophe Gisquet
9dc45d1f42 x86: lavc: share more constants
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 23:35:02 +01:00
Mickaël Raulet
6ecc3fd612 x86/hevc_mc: use aligned loads
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 21:38:00 +01:00
James Almer
383fddeec6 x86/lossless_audiodsp: fix compilation with --disable-yasm
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-06 17:30:17 -03:00
James Almer
aea29a891f x86/hevc_sao: fix loading of RIP address
pb_eo must be handled as a rip relative address for MSVC64, so an
intermediate register is needed. Should fix link failures.

Suggested by Hendrik Leppkes and Christophe Gisquet.

Tested-By: Hendrik Leppkes <h.leppkes@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-06 15:06:15 -03:00
Mickaël Raulet
bcb0925115 x86/hevc: use CLIPW macro when possible
Conflicts:
	libavcodec/x86/hevc_mc.asm

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:38:47 +01:00
Christophe Gisquet
5eedd36df1 x86: hevc_mc: use epel_hv 16-wide function
The epel_hv functions were still relying on only epel_hv 8-wide
being the maximum width instanciated.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:37:56 +01:00
Pierre Edouard Lepere
a0d1300f71 x86: hevc_mc: add AVX2 optimizations
before
33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
38138 decicycles in luma_bi_2, 523427 runs, 861 skips
13490 decicycles in luma_uni, 516138 runs, 8150 skips
after
20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
10397 decicycles in luma_uni, 515715 runs, 8573 skips

Conflicts:
	libavcodec/x86/hevc_mc.asm
	libavcodec/x86/hevcdsp_init.c

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:20:47 +01:00
Michael Niedermayer
a6c2c8fe3f Revert "avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar"
This reverts commit 3b4ffba3af968ae702e3a44f6b5f53445efc7363.

Unbreaks the SSSE3 code on mingw32

Conflicts:

	libavcodec/x86/lossless_audiodsp.asm

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 02:31:45 +01:00
Michael Niedermayer
f1214763af avcodec/x86/lossless_audiodsp: Move order&8 fallback into C code
This is simpler and more robust, and fixes mismatching XMM save restore
mismatches

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 02:18:54 +01:00
Michael Niedermayer
3b4ffba3af avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar
This is needed as the mmx code is used as fallback from the ssse3 code

Suggested-by: jamrial
Tested-by: wm4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 00:20:59 +01:00
James Almer
15574c505b x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2}
Original x86 intrinsics code by Pierre-Edouard Lepere.
Yasm port, refactoring and optimizations by James Almer.

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

Width 32
342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips
29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips
13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips

Width 64
581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips
59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips
28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:33 -03:00
James Almer
042c1159fc x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2}
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips

Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-05 15:02:27 -03:00
James Almer
aa945dc112 x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-02 00:01:35 -03:00
James Almer
71e2cb4706 x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 21:45:52 -03:00
Christophe Gisquet
bff7feb328 x86: hevc/sao: aligned source buffers
Usefull for at least band filter, for which:
- Band filter call only:
           32      64
Before:  16556    54015
After:   16497    52355
- Whole case:
           32      64
Before:  37031   103008
After:   32045    93952
2015-02-01 20:22:54 -03:00
James Almer
fa3eccb4f9 x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips

width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 20:22:35 -03:00
Christophe Gisquet
7aeafacfd0 x86/sbrdsp: Use different mem moves
Before
2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips

After
2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips

Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-25 18:20:43 -03:00
James Almer
449b21bfab x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3}
2 to 2.5 times faster.

Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-25 18:20:39 -03:00
James Almer
08810a8895 x86/flacdsp: remove unneeded ifdeffery
x86inc can translate r*m into a register or stack on its own

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-05 16:29:28 -03:00
James Almer
37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
Ronald S. Bultje
3aefca68ca vp9/x86: add myself to copyright holders for loopfilter assembly. 2014-12-27 16:55:16 -05:00
Ronald S. Bultje
afd8c464b7 vp9/x86: make filter_16_h work on 32-bit. 2014-12-27 16:55:16 -05:00
Ronald S. Bultje
b26bc3520f vp9/x86: make filter_48/84/88_h work on 32-bit. 2014-12-27 16:55:15 -05:00
Ronald S. Bultje
8a1cff1c35 vp9/x86: make filter_44_h work on 32-bit. 2014-12-27 16:55:15 -05:00
Ronald S. Bultje
047088b8c6 vp9/x86: make filter_16_v work on 32-bit. 2014-12-27 16:55:14 -05:00
Ronald S. Bultje
0cc9c23ea1 vp9/x86: make filter_48/84_v work on 32-bit. 2014-12-27 16:55:14 -05:00
Ronald S. Bultje
6433a9133f vp9/x86: make filter_88_v work on 32-bit. 2014-12-27 16:55:14 -05:00
Ronald S. Bultje
75f8e52089 vp9/x86: make filter_44_v work on 32-bit. 2014-12-27 16:55:13 -05:00
Ronald S. Bultje
7f80c3344c vp8/x86: save one register in SIGN_ADD/SUB. 2014-12-27 16:55:13 -05:00
Ronald S. Bultje
8ea2194ebb vp9/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
2014-12-27 16:55:13 -05:00
Ronald S. Bultje
e42409479f vp8/x86: move variable assigned inside macro branch.
The value is not used outside the branch.
2014-12-27 16:55:12 -05:00
Ronald S. Bultje
418c202c63 vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning. 2014-12-27 16:55:12 -05:00
Ronald S. Bultje
d1c55654e1 vp8/x86: remove unused register from ABSSUB_CMP macro. 2014-12-27 16:55:12 -05:00
Ronald S. Bultje
e59bd08986 vp9/x86: slightly simplify 44/48/84/88 h stores. 2014-12-27 16:55:11 -05:00
Ronald S. Bultje
8132629bd5 vp9/x86: make cglobal statement more conservative in register allocation. 2014-12-27 16:55:11 -05:00
Ronald S. Bultje
c013ca58c5 vp9/x86: save one register in loopfilter surface coverage. 2014-12-27 16:55:11 -05:00
James Almer
32c836cb11 x86/vp9: remove duplicate function prototypes
Fixes "redundant redeclaration" warnings.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-23 00:56:51 -03:00
James Almer
7696e429c7 x86/vp3dsp: port put_vp_no_rnd_pixels8_l2_mmx to yasm
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-20 13:25:43 +01:00
James Almer
a4d62f7775 x86/constants: fix alignment of pw_255
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 20:21:34 +01:00
Ronald S. Bultje
bdc1e3e3b2 vp9/x86: intra prediction sse2/32bit support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 14:07:19 +01:00
Ronald S. Bultje
b6e1711223 vp9/x86: invert hu_ipred left array ordering.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-19 14:07:18 +01:00
Ronald S. Bultje
0a7964dca5 vp9/x86: save one register on 32bit idct32x32.
Fixes build on win32.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-16 02:51:26 +01:00
Ronald S. Bultje
cae893f692 vp9/x86: sse2 MC assembly.
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 02:34:05 +01:00
Ronald S. Bultje
fd77fbb390 vp9/x86: 32bit and sse2 support for vp9 inverse transform assembly
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 00:38:05 +01:00
Michael Niedermayer
a03f72e744 avcodec/x86/hevc_mc: fix sse register counts
These fix failures of --enable-xmm-clobber-test
It would be better to change the code to use fewer registers, but until
someone does the used register count must not be too small

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-11 13:17:26 +01:00
Michael Niedermayer
d43d5c5707 avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-10 07:34:49 +01:00
Michael Niedermayer
ed9be7dd47 avcodec/x86/pngdsp: fix off by 1 error
This fixes artifacts in the last pixel of rows with some widths and pixel formats

Found-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Tested-by: Dominique Leroux <Dominique.Leroux@autodesk.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-08 18:24:40 +01:00
Michael Niedermayer
1d048f762d Merge commit '9a738c27dceb4b975784b23213a46f5cb560d1c2'
* commit '9a738c27dceb4b975784b23213a46f5cb560d1c2':
  v210enc: Add SIMD optimised 8-bit and 10-bit encoders

Conflicts:
	libavcodec/v210enc.c
	libavcodec/v210enc.h
	libavcodec/x86/Makefile
	libavcodec/x86/v210enc.asm
	libavcodec/x86/v210enc_init.c
	tests/ref/vsynth/vsynth1-v210
	tests/ref/vsynth/vsynth2-v210

See: 36091742d182b3ad4411aae22682354b3834a974
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-06 01:54:10 +01:00
Kieran Kunhya
9a738c27dc v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00