generic-library/vpx

Author	SHA1	Message	Date
Jian Zhou	c91dd55eda	Code clean of highbd_v_predictor_4x4 MMX replaced with SSE2, same performance. Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696	2015-12-18 15:25:27 -08:00
Jian Zhou	b158d9a649	Code clean of sad4xN(_avg)_sse Replace MMX with SSE2, reduce psadbw ops which may help Silvermont. Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d	2015-12-17 11:10:42 -08:00
James Zern	b81f04a0cc	Merge "move vp9_avg to vpx_dsp"	2015-12-15 03:41:22 +00:00
James Zern	d36659cec7	move vp9_avg to vpx_dsp Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f	2015-12-14 14:42:12 -08:00
Jian Zhou	2404e3290e	Merge "Code clean of tm_predictor_32x32"	2015-12-14 17:56:01 +00:00
Jian Zhou	6e87880e7f	Merge "Speed up tm_predictor_16x16"	2015-12-11 18:55:46 +00:00
Jian Zhou	88120481a4	Code clean of tm_predictor_32x32 Reallocate the xmm register usage so that no ARCH_X86_64 required. Reduce memory access to the left neighbor by half. Speed up by single digit on big core machine. Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990	2015-12-11 10:32:08 -08:00
Jian Zhou	62f986265f	Merge "SSE2 based h_predictor_32x32"	2015-12-11 18:02:34 +00:00
James Zern	ecb8dff768	Merge "dc_left_pred[48]: fix pic builds"	2015-12-11 02:48:11 +00:00
Jian Zhou	5604924945	Merge "Code clean of dc_left/top_predictor_16x16"	2015-12-11 01:53:44 +00:00
James Zern	40ee78bc19	dc_left_pred[48]: fix pic builds GET_GOT modifies the stack pointer so the offset for left's address will be wrong if loaded afterward. Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536	2015-12-10 15:44:31 -08:00
Yunqing Wang	322ea7ff5b	Fix the win32 crash when GET_GOT is not defined This patch continues to fix the win32 crash issue: https://bugs.chromium.org/p/webm/issues/detail?id=1105 Johann's patch is here: https://chromium-review.googlesource.com/#/c/316446/2 Change-Id: I7fe191c717e40df8602e229371321efb0d689375	2015-12-10 14:25:01 -08:00
Jian Zhou	4ec5953080	Code clean of dc_left/top_predictor_16x16 Remove some redundant code. Change-Id: Ida2e8c0ce28770f7a9545ca014fe792b04295260	2015-12-10 11:59:58 -08:00
Jian Zhou	c90a8a1a43	SSE2 based h_predictor_32x32 Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8, and reduce mem access to left. Speed up by single digit in ./test_intra_pred_speed on big core machines. Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960	2015-12-10 10:09:58 -08:00
Johann Koenig	420b9f5bd3	Merge "fix null pointer crash in Win32 because esp register is broken"	2015-12-09 19:31:12 +00:00
Jian Zhou	aa5b517a39	Re-enable SSE2 based intra 4x4 prediction 4x4 Intra predictor implemented with MMX is replaced with SSE2. Segfault in change 315561 when decoding vp8 is taken care of. Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76	2015-12-07 18:50:37 -08:00
Scott LaVarnway	c7e557b82c	Merge "VP9: Add ssse3 version of vpx_idct32x32_135_add()"	2015-12-07 21:13:35 +00:00
Sergey Kolomenkin	5fc9688792	fix null pointer crash in Win32 because esp register is broken https://bugs.chromium.org/p/webm/issues/detail?id=1105 Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d	2015-12-07 12:57:06 -08:00
James Zern	79a9add666	Revert "MMX in intra 4x4 prediction replaced with SSE2" This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a. This causes a segfault when decoding vp8, in both 32 and 64-bit Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367	2015-12-05 10:20:39 -08:00
Jian Zhou	e86c7c863e	Speed up h_predictor_16x16 Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4, and reduce mem access to left. Speed up by >20% in ./test_intra_pred_speed. Change-Id: Ie48229c2e32404706b722442942c84983bda74cc	2015-12-04 12:12:55 -08:00
Jian Zhou	da3f08fac3	Speed up h_predictor_8x8 Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2, and reduce mem access to left. Speed up by >20% in ./test_intra_pred_speed. Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e	2015-12-04 11:36:44 -08:00
Jian Zhou	aa2764abdd	MMX in intra 8x8 prediction replaced with SSE2 8x8 Intra predictor implemented with MMX is replaced with SSE2. Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52	2015-12-03 18:11:06 -08:00
Jian Zhou	89a1efa4c4	MMX in intra 4x4 prediction replaced with SSE2 4x4 Intra predictor implemented with MMX is replaced with SSE2. Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc	2015-12-03 16:40:23 -08:00
Jian Zhou	623e988add	Merge "SSE2 speed up of h_predictor_4x4"	2015-12-02 18:49:00 +00:00
Scott LaVarnway	f0b0b1fe62	VP9: Add ssse3 version of vpx_idct32x32_135_add() Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727	2015-12-02 04:50:46 -08:00
Jian Zhou	c7fae5d893	Speed up tm_predictor_16x16 Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed with the same instruction size. Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15	2015-11-30 17:46:40 -08:00
Scott LaVarnway	2669e05949	Merge "VPX: x86 asm version of vpx_idct32x32_1024_add()"	2015-11-30 23:28:27 +00:00
Jian Zhou	9d29d76280	SSE2 speed up of h_predictor_4x4 Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers. Speed up by ~25% in ./test_intra_pred_speed. Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d	2015-11-30 10:08:05 -08:00
Scott LaVarnway	0148e20c3c	VPX: x86 asm version of vpx_idct32x32_1024_add() Change-Id: I3ba4ede553e068bf116dce59d1317347988b3542	2015-11-25 10:11:29 -08:00
Jian Zhou	901d20369a	Merge "Speed up tm_predictor_8x8"	2015-11-25 02:34:07 +00:00
Jian Zhou	f4621c5c8d	Speed up tm_predictor_8x8 Left neighbor read from memory only once. Speed up by ~20% in ./test_intra_pred_speed. Change-Id: Ia1388630df6fed0dce9a6eeded6cb855bbc43505	2015-11-24 16:07:06 -08:00
Scott LaVarnway	97e6cc6198	VPX: Removed unnecessary pmulhrsw in IDCT32X32_34 and fixed macro name. Change-Id: I306b98a2b4ec80b130ae80290b4cd9c7a5363311	2015-11-23 10:24:09 -08:00
James Zern	16eba81f69	Revert "Speed up h_predictor_4x4" This reverts commit d76032ae87e535be5b924d9e88bbd67189380534. breaks 32-bit builds Change-Id: If6266ec2a405b5a21d615112f0f37e8a71193858	2015-11-20 22:25:29 -08:00
James Zern	1b10753ad7	Merge "Speed up h_predictor_4x4"	2015-11-21 01:12:42 +00:00
Scott LaVarnway	e7fc39fdf5	Merge "VPX: x86 asm version of vpx_idct32x32_34_add()"	2015-11-20 15:11:00 +00:00
Jian Zhou	d76032ae87	Speed up h_predictor_4x4 Modify h_predictor_4x4 with XMM registers. Speed up by ~25% in ./test_intra_pred_speed. Change-Id: Id01c34c48e75b9d56dfc2e93af12cf0c0326a279	2015-11-19 11:34:22 -08:00
Jian Zhou	79b68626ae	Speed up tm_predictor_4x4 tm_predictor_4x4 is implemented with SSE2 using XMM registers. Speed up by ~25% in ./test_intra_pred_speed. Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0	2015-11-18 16:44:25 -08:00
Scott LaVarnway	ed833048c2	VPX: x86 asm version of vpx_idct32x32_34_add() Change-Id: Ic81f38998fb1b8d33f5a5d7424c2c41002786cef	2015-11-17 17:42:24 -08:00
James Zern	0ccad4d649	Revert "VPX: x86 asm version of vpx_idct32x32_34_add()" This reverts commit 9aeaa2016e7470c4e316d90da33d883098eed6f4. This causes some test vectors to fail. Change-Id: I3659a2068404ec5a0591fba5c88b1bec0c9059a4	2015-11-11 11:12:38 -08:00
James Zern	e3efed7f4c	Merge "convolve_copy_sse2: replace SSE w/SSE2 code"	2015-11-10 22:35:12 +00:00
Scott LaVarnway	f48321974b	Merge "VPX: x86 asm version of vpx_idct32x32_34_add()"	2015-11-10 21:40:11 +00:00
Scott LaVarnway	9aeaa2016e	VPX: x86 asm version of vpx_idct32x32_34_add() Change-Id: I8a933c63b7fbf3c65e2c06dbdca9646cadd0b7cb	2015-11-10 11:54:56 -08:00
James Zern	40dab58941	convolve_copy_sse2: replace SSE w/SSE2 code this should be neutral or slightly faster on modern (P4+) architectures Change-Id: Iec4c080275941eb8c9e05a66a2daf0405d86a69b	2015-11-09 23:45:16 -08:00
Geza Lore	9cfba09ac0	Optimize vpx_quantize_{b,b_32x32} assembler. Added optimization of the 8 bit assembly quantizer routines. This makes these functions up to 100% faster, depending on encoding parameters. This patch makes the encoder faster in both the high bitdepth and 8bit configurations. In the high bitdepth configuration, it affects profile 0 only. Based on my profiling using 1080p input the net gain is between 1-3% for the 8 bit config, and around 2.5-4.5% for the high bitdepth config, depending on target bitrate. The difference between the 8 bit and high bitdepth configurations for the same encoder run is reduced by 1% in all cases I have profiled. Change-Id: I86714a6b7364da20cd468cd784247009663a5140	2015-10-20 10:11:19 +01:00
Johann	ec623a0bb7	Upstream Mozilla fix for older Apple clang builds Also use the _mm_broadcastsi128_si256 intrinsic for Apple clang versions 4.[012] https://bugzilla.mozilla.org/show_bug.cgi?id=1085607 https://code.google.com/p/webm/issues/detail?id=1082 Change-Id: I6bc821d8163387194ef663e94bfed91fa7281d88	2015-10-14 07:41:23 -07:00
Alex Converse	0c00af126d	Add vpx_highbd_convolve_{copy,avg}_sse2 single-threaded: swanky (silvermont): ~1% faster overall peppy (celeron,haswell): ~1.5% faster overall Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073	2015-10-09 11:50:25 -07:00
Geza Lore	cbada4a982	Remove 4 mova insts from quantize_ssse3_x86_64.asm Change-Id: If3cb9345b44162e600e6c74873e0cb4c207fc7fb	2015-10-09 07:52:04 -07:00
Julia Robson	37c68efee2	SSSE3 optimisation for quantize in high bit depth When configured with high bit depth enabled, the 8bit quantize function stopped using optimised code. This made 8bit content decode slowly. This commit re-enables the SSSE3 optimisations. Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5	2015-10-06 13:32:02 +01:00
Scott LaVarnway	b212094839	Merge "VPX: refactor vpx_idct32x32_1_add_sse2()"	2015-10-06 11:35:15 +00:00
Julia Robson	5e6533e707	SSE2 optimisation for quantize in high bit depth When configured with high bit depth enabled, the 8bit quantize function stopped using optimised code. This made 8bit content decode slowly. This commit re-enables the SSE2 optimisation (but not the SSSE3 optimisation). Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9	2015-10-05 10:59:16 -07:00

111 Commits