Linfeng Zhang
d0e687bf8c
remove mmx sad functions
...
There are sse2 equivalents, and sse2 is a reasonable modern baseline.
Change-Id: Ibbe536a5ad1c2cccef6bdcc75c13b3dde35a56ba
2016-05-11 10:50:04 -07:00
Jim Bankoski
da33728f48
vpx_dsp: Rename postproc.c to add_noise.c.
...
Change-Id: I4906d1b79a2951e659995202b9fa97e2ea5cfba0
2016-05-10 06:52:58 -07:00
Scott LaVarnway
c2c5297595
Merge "VPX: refactor vpx_idct16x16_1_add_sse2()"
2016-05-09 22:15:17 +00:00
Scott LaVarnway
1490342be5
VPX: refactor vpx_idct16x16_1_add_sse2()
...
Change-Id: I431ea0d9abe764d110a1ba32a8cb15e2fdac8805
2016-05-09 09:50:00 -07:00
Johann
b23bd2360f
The subfunctions are only defined for sse2
...
See highbd_subpel_variance_impl_sse2.asm
Change-Id: Id13b97f4f6d189ed71cdc6d52b3c4ea63dc1da05
2016-05-06 18:58:49 -07:00
Johann
a761197fbd
Unlike non-hbd variance, opt2 is never used
...
Change-Id: I1d342725df332c4efc6006d9e3dcb7372c41f448
2016-05-06 18:38:04 -07:00
James Zern
2184692c07
vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include
...
Change-Id: I103be7eee36492f8619144ce8325bc916d4975c7
2016-05-04 15:06:44 -07:00
James Bankoski
89f905e5e5
Merge "libvpx: add a unit test for plane_add_noise."
2016-05-04 13:09:05 +00:00
Jim Bankoski
34d5aff747
libvpx: add a unit test for plane_add_noise.
...
In so doing, this fixes a couple of bugs:
vpx_plane_add_noise.c needed to subtract a clamp instead of adding it.
The assembly (mmx, sse) assumed that its parameters were contiguous in
memory, which was not true.
Change-Id: I76f2c43cf54bfc838eb2edf8a443eaaa7565d7b5
2016-05-03 16:23:06 -07:00
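The clamp fix named in the commit above can be illustrated with a minimal C sketch. It assumes the clamp is headroom reserved below 255 so that adding noise to an 8-bit pixel cannot wrap. The name clamp_for_noise and its parameters are hypothetical, not the libvpx function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (assumption, not libvpx code): limit a pixel to
 * [black_clamp, 255 - white_clamp] before noise is added.
 * Subtracting white_clamp from 255 leaves headroom for the noise. A
 * comparison against 255 + white_clamp could never be true for a uint8_t,
 * so the upper clamp would silently do nothing. */
static uint8_t clamp_for_noise(uint8_t pixel, uint8_t black_clamp,
                               uint8_t white_clamp) {
  const int hi = 255 - white_clamp;
  assert(black_clamp <= hi); /* the allowed range must not be empty */
  if (pixel < black_clamp) return black_clamp;
  if (pixel > hi) return (uint8_t)hi;
  return pixel;
}
```

With both clamps set to 16, a pixel of 250 is pulled down to 239, so noise of up to 16 can be added without exceeding 255.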
James Bankoski
e755a283dd
Merge "Move vpx_add_plane from codec to vpx_dsp and dedup."
2016-05-03 14:11:57 +00:00
Jim Bankoski
fce3cee8dd
Move vpx_add_plane from codec to vpx_dsp and dedup.
...
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2016-05-02 12:17:39 -07:00
Alex Converse
a68b24fdee
Tweak casts on vpx_sub_pixel_variance to avoid implicit overflow.
...
Change-Id: I481eb271b082fa3497b0283f37d9b4d1f6de270c
2016-04-27 16:37:18 -07:00
Alex Converse
6c4007be1c
Be explicit about overflow in vpx_variance16x16_sse2.
...
The product always fits in uint32_t, but the operands don't.
An optimizing compiler should generate the wraparound code.
(Verified with clang).
Change-Id: I25eb64df99152992bc898b8ccbb01d55c8d16e3c
2016-04-27 15:22:17 -07:00
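The overflow reasoning in the commit above can be worked through in a short C sketch. It assumes 8-bit pixels, so each of the 256 differences lies in [-255, 255]. The helper name variance16x16_from_sums is hypothetical and the exact cast sequence in libvpx may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the libvpx function: finish a 16x16 variance
 * from the sum of squared differences (sse) and the sum of differences. */
static uint32_t variance16x16_from_sums(uint32_t sse, int sum) {
  /* 256 pixels, each difference in [-255, 255]. */
  assert(sum >= -65280 && sum <= 65280);
  /* sum * sum can reach 4261478400: above INT32_MAX, below UINT32_MAX.
   * Widen before multiplying, then narrow the product explicitly. */
  const uint32_t sum_sq = (uint32_t)((int64_t)sum * sum);
  return sse - (sum_sq >> 8); /* divide by the 256 pixels */
}
```

In the extreme case where every difference is -255, sse is 16646400 and sum is -65280, and the function returns 0 as expected for a constant offset.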
Alex Converse
ccb894ce73
Remove casts on < 16x16 variance.
...
These blocks will never overflow since max sum is +/-255*w*h.
Change-Id: Ia2c630339fd9cfb411b56b6040ff402095f12a2e
2016-04-27 15:21:58 -07:00
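The bound quoted in the commit above can be checked with a small C sketch. It assumes the quantity of concern is the sum * sum product used when finishing a variance. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest possible |sum| of 8-bit pixel differences over a w x h block. */
static int64_t max_abs_sum(int w, int h) {
  assert(w > 0 && h > 0);
  return (int64_t)255 * w * h;
}

/* Returns 1 when sum * sum is guaranteed to fit in a signed 32-bit int. */
static int sum_sq_fits_int32(int w, int h) {
  const int64_t m = max_abs_sum(w, h);
  return m * m <= INT32_MAX;
}
```

For a 128-pixel block such as 8x16, the maximum |sum| is 32640 and its square is 1065369600, which fits in a signed 32-bit int. At 16x16 the square no longer fits, which is why that size keeps explicit handling.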
James Zern
38bc1d0f4b
vpx_fdct16x16_1_sse2: improve load pattern
...
load the full row rather than doing 2 8-wide columns
Change-Id: I7a1c0cba06b0dc1ae86046410922b1efccb95c95
2016-04-04 16:03:42 -07:00
James Zern
3735def667
vpx_fdctNxN_1_sse2: reduce store size
...
only output[0] needs to be set, store_output is more involved than a
movdqa in the high bitdepth case
Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
2016-04-04 16:02:06 -07:00
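A scalar C sketch of a DC-only forward transform shows why only output[0] needs a store, as the commit above notes. The name fdct16x16_dc_only, the int32_t output type, and the >> 1 scaling are assumptions for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical DC-only 16x16 forward transform: sums every input sample
 * and writes a single coefficient. The >> 1 scaling is an assumption. */
static void fdct16x16_dc_only(const int16_t *input, int32_t *output,
                              int stride) {
  int sum = 0;
  assert(stride >= 16);
  for (int r = 0; r < 16; ++r) {
    /* Walk one full 16-wide row at a time. */
    for (int c = 0; c < 16; ++c) sum += input[r * stride + c];
  }
  output[0] = sum >> 1; /* no other output element is touched */
}
```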
Scott LaVarnway
67c4c8244a
VPX: loopfilter_mmx.asm using x86inc 2
...
This reverts commit 9aa083d164e0d39086aa0c83f0d1a0d0f0d1ba61.
Fixes a decoder mismatch with 32bit PIC builds.
Change-Id: I94717df662834810302fe3594b38c53084a4e284
2016-03-08 04:24:47 -08:00
James Zern
9aa083d164
Revert "VPX: loopfilter_mmx.asm using x86inc"
...
This reverts commit 15ecdc3970462c15fdf7185d373cb52664f40c0f.
breaks 32-bit pic builds
Change-Id: I8bb1b9471a293f05ac7423aaba0339d408931b7a
2016-03-04 18:23:45 -08:00
Scott LaVarnway
dd6729f826
VPX: Remove pmin/pmax from subpixel functions.
...
These instructions are unnecessary if the adds
are done in the correct order.
Change-Id: I4e533b8267c32e610a4b94203ad052dc9fdabd71
2016-02-27 05:47:56 -08:00
Scott LaVarnway
51beb29f52
Merge "VPX: vpx_filter_block1d16_(v8, v8_avg)"
2016-02-27 13:31:18 +00:00
James Zern
654d2163c9
x86/convolve.h: remove redundant check in FUN_CONV_2D
...
the filter will be the same in this case
Change-Id: I95159bcb05bbfb71b57da741393e80cc7ffc5cff
2016-02-25 23:31:50 -08:00
James Zern
6d8c8c6201
x86/convolve.h: replace while w/if for w < 16
...
in non-hbd configurations; any high-bitdepth changes will be done in a
follow-up
Change-Id: Ia74e30971b744c1faab68c92fdeda1a053988c77
2016-02-25 21:44:06 -08:00
Scott LaVarnway
1f736e400f
VPX: vpx_filter_block1d16_(v8, v8_avg)
...
Store result with one 16 byte store instead of
two 8 byte stores.
Change-Id: I43acbc5edfd6d6055a926f9b9605d47127400f09
2016-02-25 06:15:24 -08:00
James Zern
b3ceb629ba
x86/convolve.h: change filter[] || chains to |
...
Change-Id: I661f64390f232826857b259e7a67e77f5a3a91ad
2016-02-24 19:47:43 -08:00
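The change from || to | named in the commit above can be sketched in C as follows. The helper name has_outer_taps and the choice of the first three taps are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when any of the first three taps is nonzero.
 * Bitwise | evaluates every operand with no short-circuit, and the result
 * is zero exactly when all operands are zero, so the truth value matches
 * filter[0] || filter[1] || filter[2]. */
static int has_outer_taps(const int16_t *filter) {
  assert(filter);
  return (filter[0] | filter[1] | filter[2]) != 0;
}
```

Because the taps are plain loads with no side effects, dropping the short-circuit does not change behavior. It only removes the sequencing that || imposes, which can let the compiler emit a single test.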
Scott LaVarnway
06d0e2fe6c
BUG FIX: vpx_filter_block1d(8,4)_(v8, v8_avg)
...
Change-Id: Ic7ea79988ed0864e7ddbfeb312516bcf77eaaac1
2016-02-23 12:23:41 -08:00
Scott LaVarnway
15ecdc3970
VPX: loopfilter_mmx.asm using x86inc
...
Change-Id: Idcf29281d617b275e3ca50f77e6d00c60992a36d
2016-02-18 15:34:58 -08:00
James Zern
9b44d9d00f
split vpx_highbd_lpf_horizontal_16 in two
...
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter
Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
2016-02-16 23:13:58 -08:00
James Zern
1b519fb666
split vpx_lpf_horizontal_16 in two
...
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter
Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
2016-02-16 22:57:45 -08:00
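The split described in the commit above follows a common pattern: one shared worker keeps the count internally, and two fixed-width entry points replace the public count parameter. The C sketch below uses a stand-in kernel and hypothetical names. It is not the libvpx loop filter.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the real filter kernel: marks 8 * count pixels as filtered. */
static void lpf_edge_w(uint8_t *s, int count) {
  assert(count == 1 || count == 2);
  for (int i = 0; i < 8 * count; ++i) s[i] = 1;
}

/* Two fixed-width entry points replace a single function taking count. */
static void lpf_horizontal_edge_8(uint8_t *s) { lpf_edge_w(s, 1); }
static void lpf_horizontal_edge_16(uint8_t *s) { lpf_edge_w(s, 2); }
```

Callers pick the width by function name, so optimized versions of each entry point no longer need to handle a runtime count.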
James Zern
e7a23d703b
vpx_highbd_lpf_horizontal_4: remove unused count param
...
Change-Id: I655a771e1b1a8753be5669ef9348a312ba6cfdbc
2016-02-16 22:57:45 -08:00
James Zern
5171857329
vpx_highbd_lpf_horizontal_8: remove unused count param
...
Change-Id: Iaca71ea3796115d4c2d43563b4e6f3914e21f1bf
2016-02-16 22:57:44 -08:00
James Zern
3c1019e49d
vpx_highbd_lpf_vertical_4: remove unused count param
...
Change-Id: Ic6da723c5cf3cd8127db1f476c3e46ea134cb774
2016-02-16 22:57:44 -08:00
James Zern
72a9f06ac2
vpx_highbd_lpf_vertical_8: remove unused count param
...
Change-Id: Id16f7259897654831d31642c2d5e0bbe5e13416c
2016-02-16 22:57:44 -08:00
James Zern
b1e97c6a25
vpx_lpf_horizontal_4: remove unused count param
...
Change-Id: Iec7d8eda343991f7d7d46931dca17af23c821d11
2016-02-16 22:57:27 -08:00
James Zern
bd5a5bb561
vpx_lpf_horizontal_8: remove unused count param
...
Change-Id: I48741e167a7b09b7c9ad3bfc1c4b88ef1029ae46
2016-02-16 22:54:40 -08:00
James Zern
109a47b342
vpx_lpf_vertical_4: remove unused count param
...
Change-Id: I43a191cb3d42e51e7bca266adfa11c6239a8064c
2016-02-16 14:59:00 -08:00
James Zern
37225744db
vpx_lpf_vertical_8: remove unused count param
...
Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c
2016-02-16 14:59:00 -08:00
Yaowu Xu
0aef1bc898
Enable sse2 version of inverse wht for hbd build
...
Change-Id: If8f5efd701a11c8a7ad3078d10ec3cd0fe27667e
2016-01-29 14:47:56 -08:00
Yaowu Xu
b229710811
SSSE3 idct8x8 functions for highbitdepth build
...
This commit changes the SSSE3 optimized idct8x8 functions to work with
the highbitdepth build.
With this commit and the previous one that enabled the SSSE3 idct32x32
functions, tests showed virtually no difference in decoding speed for
file fdJc1_IBKJA.248.webm between the build with the
--enable-vp9-highbitdepth option and the build without it.
Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
2016-01-29 12:36:53 -08:00
Yaowu Xu
aac1ef7f80
Enable hbd_build to use SSSE3 optimized functions
...
This commit changes the SSSE3 assembly functions for idct32x32 to
support the highbitdepth build.
On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference
between the hbd and lbd builds from 3-4% to 1-2%.
Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
2016-01-29 01:30:43 +00:00
James Zern
3a2ad10de2
Merge "Code clean of sad4xNx4D_sse"
2016-01-25 20:57:15 +00:00
Alex Converse
ed3df445d9
Revert "Merge "Change highbd variance rounding to prevent negative variance.""
...
This reverts commit ea48370a500537906d62544ca4ed75301d79e772, reversing
changes made to 15939cb2d76c773950cda40988ede89e111872ea.
The commit was insufficiently tested and causes failures.
Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
2016-01-13 11:19:06 -08:00
Alex Converse
ea48370a50
Merge "Change highbd variance rounding to prevent negative variance."
2016-01-13 00:25:54 +00:00
Jian Zhou
26a6ce4c6d
Code clean of highbd_tm_predictor_32x32
...
Remove the ARCH_X86_64 constraint. No performance hit on either
big core or small core.
Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
2015-12-22 16:51:57 -08:00
Jian Zhou
355bfa2193
Code clean of highbd_tm_predictor_16x16
...
Remove the ARCH_X86_64 constraint.
Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
2015-12-22 16:34:40 -08:00
Jian Zhou
a4c265f1b7
Code clean of highbd_dc_predictor_32x32
...
Remove the ARCH_X86_64 constraint.
Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
2015-12-22 16:06:54 -08:00
James Zern
cedb1db594
Merge "Code clean of highbd_tm_predictor_4x4"
2015-12-22 16:45:01 +00:00
James Zern
a097963f80
Merge "Code clean of highbd_dc_predictor_4x4"
2015-12-22 16:30:37 +00:00
Jian Zhou
52e7f4153b
Merge "Code clean of highbd_v_predictor_4x4"
2015-12-21 18:07:48 +00:00
Yunqing Wang
b597e3e188
Merge "Fix for issue 1114 compile error"
2015-12-19 04:29:39 +00:00
James Zern
8b2ddbc728
sad_sse2: fix sad4xN(_avg) on windows
...
reduce the register count by 1 to avoid xmm6 and unnecessarily
penalizing the other users of the base macro
Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
2015-12-18 19:19:32 -08:00