generic-library/vpx

Author	SHA1	Message	Date
Yi Luo	8440cc4817	Merge "Improve idct32x32_1024_add SSSE3 intrinsics performance"	2017-03-15 02:32:52 +00:00
Linfeng Zhang	c756eb01c8	Fix overflow issue in 32x32 idct NEON intrinsics. Similar issue as Change `bc1c18e`. The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon() in high bit-depth mode exposes 16-bit overflow in the final stage of pass 2 when the number of tests is raised from 1,000 to 1,000,000. Change to use saturating add/sub for vpx_idct32x32_34_add_neon(), vpx_idct32x32_135_add_neon() and vpx_idct32x32_1024_add_neon() in high bit-depth mode. Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f	2017-03-14 16:59:14 -07:00
Yi Luo	fedcf83f33	Improve idct32x32_1024_add SSSE3 intrinsics performance - Function level speed improves ~12%. Change-Id: I9b7dbddabf08c7d0f6b25264e6074d5ccbe39290	2017-03-14 14:04:08 -07:00
Linfeng Zhang	b0bfcc368c	Merge "Add vpx_highbd_idct32x32_135_add_c()"	2017-03-13 18:49:01 +00:00
James Zern	48fca113d1	inv_txfm_ssse3,butterfly: fix win32 abi compatibility. Only the first 3 parameters can be aligned to 16 as required by __m128i; make them all pointers for consistency. Since: `07c48ccfe` Improve idct32x32_34_add SSSE3 intrinsics performance. BUG=webm:1384 Change-Id: I0324f701e723a27cb470036a180693ba8829d01d	2017-03-10 19:57:17 -08:00
Yi Luo	327add990f	Improve idct32x32_135_add SSSE3 intrinsics performance - Split the inv txfm into three parts to avoid stack spillover. - Function level speed improves ~12%. - Use function and macro to remove some repeated code. Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee	2017-03-09 16:17:54 -08:00
Linfeng Zhang	77311e0dff	Update vpx_idct32x32_1024_add_neon(). Most changes are cosmetic. Speed is unchanged with clang 3.8 and about 5% faster with gcc 4.8.4. Tried the strategy used in 8x8 and 16x16 (whose order of operations is similar to the C code): speed gets better with gcc but worse with clang. Tried removing store_in_output(), but speed gets worse. Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e	2017-03-08 12:39:04 -08:00
Linfeng Zhang	48f5886605	Add vpx_highbd_idct32x32_135_add_c() When eob is less than or equal to 135 for high-bitdepth 32x32 idct, call this function. BUG=webm:1301 Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6	2017-03-08 10:46:33 -08:00
Linfeng Zhang	c4e5c54d69	cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon() No speed changes and disassembly is almost identical. Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f	2017-03-08 08:58:32 -08:00
Linfeng Zhang	3cf5c213f1	cosmetics,dsp/arm/: rename a variable Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency. Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a	2017-03-08 08:55:41 -08:00
Yi Luo	07c48ccfe0	Improve idct32x32_34_add SSSE3 intrinsics performance - Split the transform into first half and second half. - Reschedule the instructions to avoid stack spillover. - Function level speed improves ~16%. Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35	2017-03-01 11:14:48 -08:00
James Zern	47d6f16a04	get_prob(): rationalize int types promote the unsigned int calculation to uint64_t rather than int64_t for type consistency Change-Id: Ic34dee1dc707d9faf6a3ae250bfe39b60bef3438	2017-02-24 15:36:52 -08:00
Jerome Jiang	b1dcaf7f1e	Merge "Fix segmentation fault caused by denoiser working with spatial SVC."	2017-02-22 04:44:55 +00:00
Yi Luo	6036a0d24f	Following SSSE3 intrinsics functions also work for HBD - vpx_idct8x8_12_add_ssse3 vpx_idct8x8_64_add_ssse3 vpx_idct32x32_34_add_ssse3 vpx_idct32x32_135_add_ssse3 vpx_idct32x32_1024_add_ssse3 - turn on unit tests. Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7	2017-02-21 12:37:53 -08:00
Jerome Jiang	0d1e5a21c4	Fix segmentation fault caused by denoiser working with spatial SVC. Re-enable the affected test. BUG=webm:1374 Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb	2017-02-21 09:38:28 -08:00
Yi Luo	1f8e8e5bf1	Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests - In SSSE3 optimization, 16-bit addition and subtraction would overflow when input coefficient is 16-bit signed extreme values. - Function-level speed becomes slower (unit ms): idct8x8_64: 284 -> 294 idct8x8_12: 145 -> 158. BUG=webm:1332 Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b	2017-02-17 14:05:05 -08:00
James Zern	3e7025022e	Merge "Add vpx_highbd_idct16x16_10_add_neon()"	2017-02-17 20:29:37 +00:00
Yi Luo	f62dcc9c33	Replace idct32x32_1024_add_ssse3 assembly with intrinsics - Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on i7-6700, no obvious user-level speed performance downgrade. - Passed unit tests. Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc	2017-02-16 16:10:40 -08:00
Johann Koenig	a9b81da575	Merge "block error avx2: use tran_low_t"	2017-02-16 23:51:14 +00:00
Linfeng Zhang	0620081731	Add vpx_highbd_idct16x16_10_add_neon() BUG=webm:1301 Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab	2017-02-16 15:13:50 -08:00
James Zern	0f014c97e5	Merge "Fix mips vpx_post_proc_down_and_across_mb_row_msa function"	2017-02-16 23:02:10 +00:00
Johann Koenig	06a82af0de	Merge "correct bitdepth_conversion_sse2.h header guard"	2017-02-16 21:41:28 +00:00
Johann	6c2d732bf4	correct bitdepth_conversion_sse2.h header guard Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519	2017-02-16 12:43:33 -08:00
Yi Luo	1cb44945fb	Merge "Add idct32x32_135_add SSSE3 intrinsics"	2017-02-16 20:43:29 +00:00
Johann	2104454607	block error avx2: use tran_low_t Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c	2017-02-16 12:39:02 -08:00
Yi Luo	72a43e2378	Add idct32x32_135_add SSSE3 intrinsics - Replace the corresponding assembly code. - No user level speed performance degrade. - Unit tests passed. Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5	2017-02-16 11:29:34 -08:00
Johann	4682130b60	quantize_fp highbd ssse3: use tran_low_t for coeff Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8	2017-02-16 07:40:56 -08:00
Johann	44600442dc	bitdepth conversion: really use num elements The previous implementation confused bit/bytes/elements. It was using '32' as the multiplier but that was mistakenly adopted because a 32x32 transform embedded the stride. Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a	2017-02-16 15:02:48 +00:00
Kaustubh Raste	fddf66b741	Fix mips vpx_post_proc_down_and_across_mb_row_msa function Added fix to handle non-multiple of 16 cols case for size 16 Change-Id: If3a6d772d112077c5e0a9be9e612e1148f04338c	2017-02-16 13:17:00 +05:30
Johann Koenig	b63e88e506	Merge "Use 'packssdw' for loading tran_low_t values"	2017-02-16 02:41:00 +00:00
Linfeng Zhang	106c342659	cosmetics,dsp/inv_txfm.c: reorder functions Change-Id: Ie0f7689ebe230c68eadb22a32b14838c1a7543a6	2017-02-15 11:40:35 -08:00
Linfeng Zhang	81914ce68a	Add vpx_highbd_idct16x16_38_add_neon() BUG=webm:1301 Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe	2017-02-15 09:12:02 -08:00
Linfeng Zhang	e07e74fb0f	Add vpx_highbd_idct16x16_38_add_c() When eob is less than or equal to 38 for high-bitdepth 16x16 idct, call this function. BUG=webm:1301 Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060	2017-02-14 17:25:52 -08:00
Johann	327a02d77e	Use 'packssdw' for loading tran_low_t values This matches bitdepth_conversion_sse2.asm and produces substantially better assembly. The old way had lots of 'movzwl' and 'shl' and storing back to memory before loading into an xmm register. Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b	2017-02-14 22:39:49 +00:00
Linfeng Zhang	429e652809	Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6	2017-02-14 13:08:41 -08:00
clang-format	4b402746ca	apply clang-format Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce	2017-02-14 12:45:52 -08:00
Yi Luo	c1a90dc160	Merge "Replace idct32x32_34_add_ssse3 assembly with intrinsics"	2017-02-14 20:13:27 +00:00
Yi Luo	bd86de1ac8	Replace idct32x32_34_add_ssse3 assembly with intrinsics - No user-level speed performance change. - Pass unit tests. Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6	2017-02-14 10:38:36 -08:00
Linfeng Zhang	de9ae32b93	Merge "Add vpx_highbd_idct16x16_256_add_neon()"	2017-02-14 01:15:34 +00:00
Linfeng Zhang	5ad4159ebb	Add vpx_highbd_idct16x16_256_add_neon() BUG=webm:1301 Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec	2017-02-13 15:50:33 -08:00
Johann	5ecde212a8	fdct8x8 highbd neon: use tran_low_t for output Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba	2017-02-13 22:16:14 +00:00
Linfeng Zhang	016933ad48	Add vpx_highbd_idct{16x16,32x32}_1_add_neon() and update vpx_highbd_idct8x8_1_add_neon() BUG=webm:1301 Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4	2017-02-13 10:25:22 -08:00
James Zern	91f87e7513	Merge "Add vpx_idct16x16_38_add_neon()"	2017-02-11 03:42:36 +00:00
Linfeng Zhang	bc1c18e18c	Add vpx_idct16x16_38_add_neon() The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of pass 2. Change to use saturating add/sub for both vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high bitdepth. Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712	2017-02-08 12:15:22 -08:00
Yi Luo	ac04d11abc	Replace idct8x8_12_add_ssse3 assembly code with intrinsics - Performance achieves the same as assembly. - Unit tests pass. Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd	2017-02-08 10:07:45 -08:00
Linfeng Zhang	cf76ee2cb7	Add vpx_idct16x16_38_add_c() When eob is less than or equal to 38 for 16x16 idct, call this function. Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087	2017-02-07 09:40:51 -08:00
Linfeng Zhang	66695533a8	Merge "Update 16x16 8-bit idct NEON intrinsics"	2017-02-07 16:52:40 +00:00
Johann	641fda79bb	highbd x86: consolidate tran_low_t conversions Create new helper files specifically for converting tran_low_t types. Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905	2017-02-06 10:43:26 -08:00
Jingning Han	bb40844e32	Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"	2017-02-02 22:18:32 +00:00
Kaustubh Raste	5b10674b5c	Merge "Add mips msa sum_squares_2d_i16 function"	2017-02-02 08:09:21 +00:00

600 Commits
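
Commits c756eb01c8 and bc1c18e18c above both describe a 16-bit overflow in the last stage of an inverse DCT pass, and both resolve it by switching to saturating add/sub. The scalar sketch below only illustrates the difference between a wrapping and a saturating 16-bit add. The function names are hypothetical and are not taken from libvpx. The commit messages do not name the intrinsics used, though NEON saturating adds such as `vqaddq_s16` clamp each lane in the same way.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libvpx.
 * Plain 16-bit add: the sum is reduced modulo 2^16, so a result above
 * INT16_MAX comes back as a negative value. The final conversion to
 * int16_t is implementation-defined before C23, but it wraps on the
 * usual two's-complement targets. */
static int16_t wrapping_add_s16(int16_t a, int16_t b) {
  return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Hypothetical helper, not part of libvpx.
 * Saturating 16-bit add: compute the exact sum in 32 bits, then clamp
 * it to the representable int16_t range instead of letting it wrap. */
static int16_t saturating_add_s16(int16_t a, int16_t b) {
  const int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}
```

With inputs 30000 and 10000, the wrapping version produces a negative number while the saturating version clamps to 32767. Clamping keeps the sign and rough magnitude of an out-of-range result, whereas wraparound flips it to the opposite extreme.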