32 Commits

Author SHA1 Message Date
Christophe Gisquet
398f531915 x86: hevc_mc: fewer xmm regs used in epel h/v
11 xmm regs seem to be required only for avx2.

Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 15:19:19 +01:00
Christophe Gisquet
89cb4995fa x86: hevc_mc: save 1 gpr in epel filter loading
The 3*stride value stored in r3src can be loaded much later,
so use r3src instead of a dedicated gpr when possible.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-16 21:53:51 +01:00
Christophe Gisquet
b533949813 x86: hevc: remove a parameter to WP internals
The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to
get the value in bytes).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-14 17:22:50 +01:00
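The stride arithmetic in the message above can be sketched in C. This is a minimal illustration, not FFmpeg code. It assumes MAX_PB_SIZE is 64, which is consistent with the "assume 2nd source stride is 64" commit further down. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value: the stride-64 commit in this log suggests 64 samples. */
#define MAX_PB_SIZE 64

/* Stride of the int16_t intermediate buffer in bytes: MAX_PB_SIZE times 2. */
static ptrdiff_t intermediate_stride_bytes(void)
{
    return MAX_PB_SIZE * (ptrdiff_t)sizeof(int16_t);
}

/* Hypothetical helper: start of row y, stepping in bytes the way asm code
 * addresses memory, so the fixed stride never needs to be passed in. */
static const int16_t *intermediate_row(const int16_t *buf, int y)
{
    return (const int16_t *)((const uint8_t *)buf +
                             y * intermediate_stride_bytes());
}
```

Because the stride is a compile-time constant (128 bytes), an asm routine can fold it into an immediate instead of spending an argument and a register on it.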
James Almer
1679d68dbf x86/hevc_mc: optimize AVX2 mc functions
Before
40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips

After
37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:58 -03:00
Christophe Gisquet
b61b9e4919 x86: hevc_mc: remove lea in EPEL_LOAD
The second parameter to the macro is always an immediate address,
so no lea is needed.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-08 22:19:35 +01:00
Christophe Gisquet
4919b38421 x86: hevc_mc: fewer gpr autoloads for _v filters
For _v filters, only my needs to be loaded; mx/r3src is not used.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-08 22:19:34 +01:00
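The C sketch below is a reference for what an 8-bit vertical (_v) EPEL pass computes. It is not FFmpeg's implementation, and the function name is hypothetical. The coefficient table is the HEVC 4-tap chroma interpolation filter.

The sketch shows two things:

- Only my (the vertical fraction) selects the taps, so mx is irrelevant.
- The four taps sit at offsets 0, stride, 2*stride and 3*stride from src - stride. This is presumably why a 3*stride value is kept around in the asm.

For 8-bit input the post-filter shift is 0, so none is applied.

```c
#include <stddef.h>
#include <stdint.h>

/* HEVC 4-tap chroma filters for fractional positions 1..7; each row sums to 64. */
static const int8_t epel_filters[7][4] = {
    { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 },
    { -2, 10, 58, -2 },
};

/* Hypothetical 8-bit vertical EPEL pass into a 16-bit intermediate buffer.
 * my is the vertical fraction (1..7); the horizontal fraction is not needed. */
static void epel_v_8bit(int16_t *dst, ptrdiff_t dststride,
                        const uint8_t *src, ptrdiff_t srcstride,
                        int width, int height, int my)
{
    const int8_t *f = epel_filters[my - 1];
    for (int y = 0; y < height; y++) {
        const uint8_t *s = src - srcstride;  /* topmost tap row */
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(f[0] * s[x] +
                               f[1] * s[x + srcstride] +
                               f[2] * s[x + 2 * srcstride] +
                               f[3] * s[x + 3 * srcstride]);
        src += srcstride;
        dst += dststride;
    }
}
```

Here is a worked check with rows holding 0, 10, 20 and 30 and my = 4 (the half-sample filter). The output is -4*0 + 36*10 + 36*20 - 4*30 = 960, which is 15 (midway between 10 and 20) scaled by 64.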
Christophe Gisquet
626d6184ce x86: lavc/hevc_mc: fix comments
The width parameter is now last in the list, and actually
never used. This helps in understanding the actual parameter list.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 20:52:03 +01:00
Christophe Gisquet
ed450d4acf x86: lavc: share more constants through defines
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 17:48:14 +01:00
Christophe Gisquet
9dc45d1f42 x86: lavc: share more constants
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 23:35:02 +01:00
Mickaël Raulet
6ecc3fd612 x86/hevc_mc: use aligned loads
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 21:38:00 +01:00
Mickaël Raulet
bcb0925115 x86/hevc: use CLIPW macro when possible
Conflicts:
	libavcodec/x86/hevc_mc.asm

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:38:47 +01:00
Pierre Edouard Lepere
a0d1300f71 x86: hevc_mc: add AVX2 optimizations
before
33304 decicycles in luma_bi_1, 523066 runs, 1222 skips
38138 decicycles in luma_bi_2, 523427 runs, 861 skips
13490 decicycles in luma_uni, 516138 runs, 8150 skips
after
20185 decicycles in luma_bi_1, 519970 runs, 4318 skips
24620 decicycles in luma_bi_2, 521024 runs, 3264 skips
10397 decicycles in luma_uni, 515715 runs, 8573 skips

Conflicts:
	libavcodec/x86/hevc_mc.asm
	libavcodec/x86/hevcdsp_init.c

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 17:20:47 +01:00
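The before/after figures above can be turned into relative gains with simple arithmetic. The helpers below are illustrative only. The unit comes from FFmpeg's START_TIMER/STOP_TIMER macros, which, as far as I know, report tenths of a cycle ("decicycles"); the ratios do not depend on the unit anyway.

Computed from the numbers in the message:

- luma_bi_1 needs about 39% fewer decicycles (roughly 1.65x faster).
- luma_bi_2 needs about 35% fewer.
- luma_uni needs about 23% fewer.

```c
/* Relative reduction in average time between two runs measured in the
 * same unit (e.g. decicycles): 0.25 means 25% less time. */
static double time_reduction(double before, double after)
{
    return (before - after) / before;
}

/* Speedup factor: how many times faster the "after" run is. */
static double speedup(double before, double after)
{
    return before / after;
}
```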
Michael Niedermayer
a03f72e744 avcodec/x86/hevc_mc: fix sse register counts
These fix failures of --enable-xmm-clobber-test
It would be better to change the code to use fewer registers, but until
someone does, the used register count must not be too small.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-11 13:17:26 +01:00
Michael Niedermayer
d43d5c5707 avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-10 07:34:49 +01:00
Mickaël Raulet
4ba6371a83 x86/hevc: get rid of packusdw for ssse3 compatibility
cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2

Fixes out-of-array access
Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-04 21:14:15 +02:00
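packusdw (unsigned saturation of 32-bit lanes to 16 bits) is an SSE4.1 instruction, while packssdw (signed saturation) is available from SSE2. The message does not show what replaced packusdw. The scalar model below only illustrates one reason the SSE4.1 instruction can be dropped: when the packed result is clipped to [0, pixel_max] anyway, with pixel_max at most 32767, both saturation modes give the same final value. The function names are hypothetical.

```c
#include <stdint.h>

/* Per-lane saturation done by SSE4.1 packusdw: clamp to [0, 65535]. */
static int32_t sat_packusdw(int32_t v)
{
    return v < 0 ? 0 : v > 65535 ? 65535 : v;
}

/* Per-lane saturation done by SSE2 packssdw: clamp to [-32768, 32767]. */
static int32_t sat_packssdw(int32_t v)
{
    return v < -32768 ? -32768 : v > 32767 ? 32767 : v;
}

/* Word clip to the valid pixel range, in the spirit of a CLIPW macro. */
static int32_t clip_pixel(int32_t v, int32_t pixel_max)
{
    return v < 0 ? 0 : v > pixel_max ? pixel_max : v;
}

/* Nonzero when signed-pack-then-clip equals unsigned-pack-then-clip.
 * This holds for every v whenever 0 <= pixel_max <= 32767. */
static int same_after_clip(int32_t v, int32_t pixel_max)
{
    return clip_pixel(sat_packssdw(v), pixel_max) ==
           clip_pixel(sat_packusdw(v), pixel_max);
}
```

The equivalence works because negative inputs clip to 0 either way, and anything at or above 32767 ends up at pixel_max either way.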
Christophe Gisquet
38e2aa3759 x86: hevc_mc: correct unneeded use of SSE4 code
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-24 11:43:33 +02:00
Christophe Gisquet
2346f2b5db x86: hevcdsp: use compilation-time-fixed constant
The stride for some buffers is known.

Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 16:26:30 +02:00
Christophe Gisquet
dad7f15567 hevcdsp: remove more instances of compile-time-fixed parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 15:22:42 +02:00
Christophe Gisquet
d4f44b66d3 hevcdsp: remove compilation-time-fixed parameter
The dststride parameter is always MAX_PB_SIZE.

Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 14:57:37 +02:00
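To illustrate what dropping dststride means, here is an 8-bit "pel_pixels"-style copy into the intermediate buffer. The destination stride is a compile-time constant instead of a parameter. This is a sketch under assumptions: MAX_PB_SIZE is taken to be 64, and the function name is hypothetical. The left shift by 14 - 8 raises 8-bit samples to the 14-bit intermediate precision used by HEVC MC.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PB_SIZE 64  /* assumed value of the intermediate buffer width */

/* Hypothetical sketch: copy 8-bit pixels into the int16_t intermediate
 * buffer. There is no dststride argument; each output row advances by
 * MAX_PB_SIZE elements. */
static void put_pel_pixels_8(int16_t *dst, const uint8_t *src,
                             ptrdiff_t srcstride, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << (14 - 8));  /* 8-bit -> 14-bit */
        src += srcstride;
        dst += MAX_PB_SIZE;  /* compile-time-fixed stride */
    }
}
```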
Christophe Gisquet
fb1a98ec5b x86: hevc_mc: assume 2nd source stride is 64
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-22 13:21:37 +02:00
James Almer
b7863c972c x86/hevc_mc: use fewer instructions in hevc_put_hevc_{uni, bi}_w[24]_{8, 10, 12}
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 14:47:15 +02:00
James Almer
b1a44e6bf5 x86/hevc_mc: remove an unnecessary pxor
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-04 14:35:08 +02:00
Christophe Gisquet
a507623bad x86: hevc_mc: fix register count usage
A macro was using a fixed register, causing too many GPRs to be
declared as used.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-29 22:50:50 +02:00
Christophe Gisquet
81943a10b5 x86: hevc_mc: load less data in epel filters
Before:
5679 decicycles in epel_bi, 2059976 runs, 37176 skips
3468 decicycles in epel_uni, 1040886 runs, 7690 skips

After:
5323 decicycles in epel_bi, 2059493 runs, 37659 skips
3262 decicycles in epel_uni, 1040871 runs, 7705 skips

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 18:34:39 +02:00
Christophe Gisquet
36284ae981 x86: hevc_mc: replace one lea by add
Should have been in 036f11bdb5.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-27 17:42:56 +02:00
Christophe Gisquet
036f11bdb5 x86: hevc_mc: replace simple leas by adds
lea is detrimental in these simple cases, though the overall
impact of the change is negligible.

Before:
15017 decicycles in q, 1016152 runs, 32424 skips
15382 decicycles in q_bi, 1013673 runs, 34903 skips
3713 decicycles in e, 2074534 runs, 22618 skips
3901 decicycles in e_bi, 2065509 runs, 31643 skips
7852 decicycles in q_uni, 520165 runs, 4123 skips
2398 decicycles in e_uni, 1043339 runs, 5237 skips

After:
14898 decicycles in q, 1016295 runs, 32281 skips
15119 decicycles in q_bi, 1015392 runs, 33184 skips
3682 decicycles in e, 2073224 runs, 23928 skips
3720 decicycles in e_bi, 2065043 runs, 32109 skips
7643 decicycles in q_uni, 520280 runs, 4008 skips
2363 decicycles in e_uni, 1043780 runs, 4796 skips

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 05:41:04 +02:00
Mickaël Raulet
bd0f2d316f x86/hevc: add 12bits support for MC
cherry picked from commit 3fcb7a4595a6f40100a22110a5805e3b7510c0fd

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 01:55:20 +02:00
Christophe Gisquet
dcd2a6ca36 x86: hevc_mc: remove unneeded shift
The shift's immediate value may be 0, making the shift a no-op.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-01 23:34:33 +02:00
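The message does not say which shift was removed, so this is only one plausible case. In HEVC MC, the shift applied after the first filter pass is Min(4, BitDepth - 8), which is 0 for 8-bit content. The helper below tabulates that shift and two related ones. It follows my reading of the H.265 interpolation and default weighted prediction processes, with field names of my own choosing. The 10- and 12-bit rows also show what changes with the 12-bit MC support added in bd0f2d316f.

```c
/* Hypothetical helper: shift amounts in HEVC inter prediction for a
 * bit depth of 8..12. */
typedef struct {
    int filter_shift; /* after the first filter pass: Min(4, BitDepth - 8) */
    int pel_shift;    /* unfiltered sample to 14-bit: Max(2, 14 - BitDepth) */
    int bi_shift;     /* averaging two 14-bit predictions: 15 - BitDepth */
} mc_shifts;

static mc_shifts hevc_mc_shifts(int bit_depth)
{
    mc_shifts s;
    s.filter_shift = bit_depth - 8 < 4 ? bit_depth - 8 : 4;
    s.pel_shift    = 14 - bit_depth > 2 ? 14 - bit_depth : 2;
    s.bi_shift     = 15 - bit_depth;
    return s;
}
```

For 8-bit input this yields a filter_shift of 0, so a shift by that immediate does nothing.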
Christophe Gisquet
0810608e23 x86: hevc_mc: better register allocation
The xmm reg count was incorrect, and loading the gprs manually
furthermore noticeably reduces the number needed.

The modified functions are used in weighted prediction, so only a
few samples, such as WP_*, exhibit a change. Results on Win64
(some widths removed because of too few occurrences):

WP_A_Toshiba_3.bit, ff_hevc_put_hevc_uni_w
         16    32
before: 2194  3872
after:  2119  3767

WP_B_Toshiba_3.bit, ff_hevc_put_hevc_bi_w
         16    32    64
before: 2819  4960  9396
after:  2617  4788  9150

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 17:39:34 +02:00
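The uni_w functions benchmarked above implement explicit (weighted) uni-directional prediction. The per-sample arithmetic follows the H.265 formula: multiply the 14-bit intermediate sample by the weight, round, shift right by denom + 14 - BitDepth, add the offset scaled to the bit depth, then clip. The sketch below is a scalar model of that formula, not the asm. The function name and argument order are hypothetical, and ox is assumed to be given in 8-bit units.

```c
#include <stdint.h>

/* Hypothetical scalar model of HEVC explicit uni-directional weighted
 * prediction for one sample.
 *   src:   14-bit intermediate sample (e.g. from the MAX_PB_SIZE buffer)
 *   denom: log2 weight denominator, wx: weight, ox: offset in 8-bit units
 * Note: >> on a negative int is implementation-defined in C (arithmetic
 * on mainstream compilers). */
static int uni_w_sample(int16_t src, int bit_depth, int denom, int wx, int ox)
{
    const int shift  = denom + 14 - bit_depth;  /* >= 2 for bit depths 8..12 */
    const int offset = 1 << (shift - 1);        /* rounding term */
    const int max    = (1 << bit_depth) - 1;
    int v = ((src * wx + offset) >> shift) + ox * (1 << (bit_depth - 8));
    return v < 0 ? 0 : v > max ? max : v;
}
```

With wx equal to 1 << denom and ox = 0, the function returns the original pixel value. A nonzero ox shifts the result, and the final clip keeps it in range.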
Christophe Gisquet
f1793fe9cd x86: hevc_mc: specify coefficients registers
By default, macro EPEL_FILTER loads the coefficients unconditionally
into m14/m15. This forces an unnecessarily high register count.

Reduce that count by making them parameters of EPEL_FILTER.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-18 16:23:58 +02:00
Hendrik Leppkes
87f2d8079a hevcdsp: correctly indicate that hevc_put_hevc_bi_epel_h uses 9 GPRs
Fixes FATE on Windows.

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-12 17:00:48 +02:00
plepere
7a2491c436 HEVC : added assembly MC functions
pretty print x86

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-06 18:23:36 +02:00