vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2()
see:
17fae3a Change to use correct check for halfpel
Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.
The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().
Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
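For context, a minimal illustration (not the actual patch) of why rounding a
negated value differs from negating the rounded value, using the
ROUND_POWER_OF_TWO() macro as libvpx defines it:

    #include <stdio.h>

    /* ROUND_POWER_OF_TWO() as defined in libvpx: add half, shift right. */
    #define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))

    int main(void) {
      printf("%d\n", ROUND_POWER_OF_TWO(-8, 4)); /* (-8 + 8) >> 4 == 0    */
      printf("%d\n", -ROUND_POWER_OF_TWO(8, 4)); /* -((8 + 8) >> 4) == -1 */
      return 0;
    }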
Left-shifting a negative value is undefined; this quiets a ubsan warning.
The shift is applied to a constant, so there is no change in the generated
code.
Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
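As a general illustration of the fix pattern (hypothetical code, not the
actual patch), either multiply or shift in an unsigned type; for a constant
operand both compile to the same instructions:

    #include <stdint.h>

    /* Undefined when v is negative: v << 2. Well-defined equivalents: */
    static int32_t shl2_mul(int32_t v) { return v * 4; }
    static int32_t shl2_unsigned(int32_t v) {
      return (int32_t)((uint32_t)v << 2);
    }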
This reverts commit aa1c4cd140.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
Previously, the optimized path was skipped in this range.
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
This reverts commit 03f5e300d6.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
Allow the right shift to operate on 64 bits; this matches the rest of
the implementations.
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
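A sketch of the pattern (the 16x16 case; the helper name is hypothetical):
casting one operand to int64_t before the multiply makes both the product
and the right shift 64-bit:

    #include <stdint.h>

    /* variance = sse - sum^2 / 256 for a 16x16 block. With a 32-bit sum,
     * `sum * sum` can overflow before the shift; the cast prevents that. */
    static uint32_t var16x16(uint32_t sse, int sum) {
      return sse - (uint32_t)(((int64_t)sum * sum) >> 8);
    }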
Allow the right shift to operate on 64 bits; this matches the rest of
the implementations.
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
Left-shifting a negative value is undefined; this quiets a ubsan warning.
The shift is applied to a constant, so there is no change in the generated
code.
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
Split into load_input_data4() and load_input_data8().
Use the pack-with-signed-saturation instruction for high bitdepth.
Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
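A sketch of the high-bitdepth load-and-pack (the tran_low_t typedef and the
function shape are assumptions, not the exact libvpx code):

    #include <emmintrin.h>
    #include <stdint.h>

    typedef int32_t tran_low_t; /* 32-bit coefficients in highbd builds */

    /* Load eight 32-bit coefficients (assumed 16-byte aligned) and pack
     * them to 16 bits with signed saturation (packssdw), yielding eight
     * int16 values in one register. */
    static __m128i load_input_data8(const tran_low_t *data) {
      const __m128i lo = _mm_load_si128((const __m128i *)data);
      const __m128i hi = _mm_load_si128((const __m128i *)(data + 4));
      return _mm_packs_epi32(lo, hi);
    }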
vpx_idct32x32_1024_add_ssse3() is actually an SSE2 function and is faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All changes are
code relocations; no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
It's almost identical to vpx_idct8x8_64_add_sse2(), except for small
differences in instruction order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
Replace with CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding the offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
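A small illustration of the rule, using the CAST_TO_* names from the
message (the macro definitions shown are assumptions):

    #include <stdint.h>

    #define CAST_TO_SHORTPTR(x) ((uint16_t *)(x))
    #define CAST_TO_BYTEPTR(x) ((uint8_t *)(x))

    /* `p + offset` would advance by `offset` bytes, i.e. only offset/2
     * uint16_t elements. Casting to a short ptr first doubles the byte
     * distance, as described above. */
    static uint8_t *advance(uint8_t *p, int offset) {
      return CAST_TO_BYTEPTR(CAST_TO_SHORTPTR(p) + offset);
    }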
Provides over 15x speedup for width > 8.
Due to smaller loads and shifting for width == 8, it gets about an 8x
speedup.
For width == 4 it's only about a 4x speedup, because there is a lot of
shuffling and shifting to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
For 8-bit the subtrahend is small enough to fit into uint32_t.
This is the same as what was done for:
c0241664a Resolve -Wshorten-64-to-32 in variance.
For 10/12-bit apply:
63a37d16f Prevent negative variance
Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
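A sketch of the 8-bit range argument (names are illustrative): for an NxN
8-bit block, sum <= 255 * N * N, so (sum * sum) / (N * N) <= 255^2 * N * N,
at most about 2.7e8 for 64x64, which fits in uint32_t:

    #include <stdint.h>

    static uint32_t variance_8bit(uint32_t sse, int64_t sum,
                                  int log2_count) {
      /* Safe narrowing: for 8-bit input the subtrahend fits in 32 bits. */
      const uint32_t subtrahend = (uint32_t)((sum * sum) >> log2_count);
      return sse - subtrahend;
    }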
- Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi
compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
operations.
Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
Only the first 3 parameters can be aligned to 16 as required by __m128i;
make them all pointers for consistency.
since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance
BUG=webm:1384
Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
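A sketch of the signature change (the function below is a placeholder
shape, not the real butterfly): passing every __m128i input by pointer
avoids the win32 by-value alignment limit past the third parameter:

    #include <emmintrin.h>

    /* Before: static void butterfly(__m128i in0, __m128i in1, ...);
     * After: all inputs by pointer. Body is a trivial stand-in. */
    static void butterfly(const __m128i *in0, const __m128i *in1,
                          __m128i *out0, __m128i *out1) {
      *out0 = _mm_add_epi16(*in0, *in1);
      *out1 = _mm_sub_epi16(*in0, *in1);
    }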
- Split the inv txfm into three parts to avoid stack spillover.
- Function-level speed improves by ~12%.
- Use function and macro to remove some repeated code.
Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function-level speed improves by ~16%.
Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
- Replace the corresponding assembly code.
- No user-level speed degradation.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
The previous implementation confused bits, bytes, and elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.
Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.
Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().
Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.
Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16
Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h
Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.
Does not enable -Wextra yet. There are more issues to fix.
BUG=webm:1069
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
Assume __clang_major__ == 0 has the latest version of
_mm256_broadcastsi128_si256. This fixes builds with custom clang toolchains.
BUG=b/30970831
Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
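A hedged sketch of the version gate (the cutoff and the fallback shown are
assumptions, not the real libvpx macro):

    #include <immintrin.h>

    /* __clang_major__ == 0 (some custom toolchains) is assumed current and
     * gets _mm256_broadcastsi128_si256(); old released clangs emulate it. */
    #if defined(__clang__) && (__clang_major__ != 0) && (__clang_major__ < 4)
    #define BROADCAST_SI128(x) \
      _mm256_inserti128_si256(_mm256_castsi128_si256(x), (x), 1)
    #else
    #define BROADCAST_SI128(x) _mm256_broadcastsi128_si256(x)
    #endif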
This function only exists as a shortcut to subpixel variance with
predefined offsets: xoffset = 4 for horizontal, yoffset = 4 for vertical,
and both for "hv".
Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.
BUG=webm:1273
Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
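To make the offset mapping concrete (the prototype below is reproduced from
memory and the wrapper is hypothetical):

    #include <stdint.h>

    uint32_t vpx_sub_pixel_variance16x16_sse2(const uint8_t *src,
                                              int src_stride, int xoffset,
                                              int yoffset,
                                              const uint8_t *ref,
                                              int ref_stride, uint32_t *sse);

    /* Halfpel variance is sub-pixel variance with offsets pinned to 4
     * (4/8 == 1/2 pel): (4, 0) horizontal, (0, 4) vertical, (4, 4) "hv". */
    static uint32_t halfpel_variance_hv(const uint8_t *src, int src_stride,
                                        const uint8_t *ref, int ref_stride,
                                        uint32_t *sse) {
      return vpx_sub_pixel_variance16x16_sse2(src, src_stride, 4, 4, ref,
                                              ref_stride, sse);
    }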
Disable clang-format for bilinear_filters_avx2.
This restores the row layout prior to:
099bd7f vpx_dsp: apply clang-format
but keeps the justification used by clang-format.
Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7
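The mechanism is clang-format's standard guard comments; the table below is
a placeholder, not the real coefficients:

    /* clang-format off */
    static const signed char bilinear_example[2][4] = {
      { 16,  0, 16,  0 },
      { 14,  2, 14,  2 },
    };
    /* clang-format on */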
The LLVM trunk has reached 4.0, and __clang_major__ is no longer enough
to distinguish between the old Xcode clang and the new 'real' clang.
Using __apple_build_version__ makes this distinction possible.
BUG=chromium:631144
Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9
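A minimal sketch of the check described:

    #if defined(__clang__) && defined(__apple_build_version__)
    /* Apple's Xcode clang: __clang_major__ follows Apple's own numbering,
     * so gate on __apple_build_version__ instead. */
    #define IS_APPLE_CLANG 1
    #elif defined(__clang__)
    /* Upstream LLVM clang: __clang_major__ matches the LLVM release. */
    #define IS_APPLE_CLANG 0
    #endif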
Use the pixel-domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.
Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
Speed tests show the new vertical filters cause a degradation on Celeron
Chromebooks. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical-filter code is activated. For now, simply activate the code
that shows no degradation on Celeron. Later there should be two sets of
ssse3 vertical filter functions, with a jump table choosing between them
based on CPU type.
Change-Id: Iba2f1f2fe059a9d142c396d03a6b8d2d3b981e87
Due to rounding, hbd variance may become negative. This commit adds a
check and clamps negative values to 0.
Change-Id: I610d9c8aa2d4eebe7bc5f2c5624a9e3cadad4c94
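A sketch of the clamp (names and shapes assumed): compute the difference in
a signed 64-bit type and floor it at zero before narrowing:

    #include <stdint.h>

    static uint32_t highbd_variance(uint64_t sse, int64_t sum,
                                    int log2_count) {
      /* Rounding in the hbd path can push sse below sum^2 / count. */
      const int64_t var = (int64_t)sse - ((sum * sum) >> log2_count);
      return var >= 0 ? (uint32_t)var : 0;
    }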
Add a trailing ':'. Though it's optional with the tools we support, it's
more common to use one to mark a label. This also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
Speed tests show the new vertical filters cause a degradation on Celeron
Chromebooks. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical-filter code is activated. For now, simply activate the code
that shows no degradation on Celeron. Later there should be two sets of
ssse3 vertical filter functions, with a jump table choosing between them
based on CPU type.
Change-Id: I37e3e9c5694737d9134a6bce6698d3e43f8fc962
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with newly created
vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in the good cases, and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).
Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd
Followed the code style of other lpf functions.
These 2 functions put 2 rows of data in a single xmm register,
so they have similar but not identical filter operations
and cannot share the same macros.
Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
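A sketch of the two-rows-per-register layout (8-pixel-wide case; names
assumed):

    #include <emmintrin.h>
    #include <stdint.h>

    /* Load 8 bytes from each of two rows into the low and high halves of
     * a single xmm register. */
    static __m128i load_two_rows(const uint8_t *s, int pitch) {
      const __m128i row0 = _mm_loadl_epi64((const __m128i *)s);
      const __m128i row1 = _mm_loadl_epi64((const __m128i *)(s + pitch));
      return _mm_unpacklo_epi64(row0, row1);
    }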
This reverts commit 2468163e07.
It causes valgrind errors due to a buffer overread in SubpelVarianceTest.
Change-Id: I448e52c76f815ac199305b71f7d169f2bc167679
This commit clarifies the integer value ranges for variables used in
several variance functions, and changes them to use proper type
conversions to reflect those ranges.
Change-Id: Ic3234b83a912ce1ad12d1b254f3378763e15cc5c