generic-library/vpx

Author	SHA1	Message	Date
Johann Koenig	b63e88e506	Merge "Use 'packssdw' for loading tran_low_t values"	2017-02-16 02:41:00 +00:00
Linfeng Zhang	106c342659	cosmetics,dsp/inv_txfm.c: reorder functions Change-Id: Ie0f7689ebe230c68eadb22a32b14838c1a7543a6	2017-02-15 11:40:35 -08:00
Linfeng Zhang	81914ce68a	Add vpx_highbd_idct16x16_38_add_neon() BUG=webm:1301 Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe	2017-02-15 09:12:02 -08:00
Linfeng Zhang	e07e74fb0f	Add vpx_highbd_idct16x16_38_add_c() When eob is less than or equal to 38 for high-bitdepth 16x16 idct, call this function. BUG=webm:1301 Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060	2017-02-14 17:25:52 -08:00
Johann	327a02d77e	Use 'packssdw' for loading tran_low_t values This matches bitdepth_conversion_sse2.asm and produces substantially better assembly. The old way had lots of 'movzwl' and 'shl' and storing back to memory before loading into an xmm register. Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b	2017-02-14 22:39:49 +00:00
Linfeng Zhang	429e652809	Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6	2017-02-14 13:08:41 -08:00
clang-format	4b402746ca	apply clang-format Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce	2017-02-14 12:45:52 -08:00
Yi Luo	c1a90dc160	Merge "Replace idct32x32_34_add_ssse3 assembly with intrinsics"	2017-02-14 20:13:27 +00:00
Yi Luo	bd86de1ac8	Replace idct32x32_34_add_ssse3 assembly with intrinsics - No user-level speed performance change. - Pass unit tests. Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6	2017-02-14 10:38:36 -08:00
Linfeng Zhang	de9ae32b93	Merge "Add vpx_highbd_idct16x16_256_add_neon()"	2017-02-14 01:15:34 +00:00
Linfeng Zhang	5ad4159ebb	Add vpx_highbd_idct16x16_256_add_neon() BUG=webm:1301 Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec	2017-02-13 15:50:33 -08:00
Johann	5ecde212a8	fdct8x8 highbd neon: use tran_low_t for output Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba	2017-02-13 22:16:14 +00:00
Linfeng Zhang	016933ad48	Add vpx_highbd_idct{16x16,32x32}_1_add_neon() and update vpx_highbd_idct8x8_1_add_neon() BUG=webm:1301 Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4	2017-02-13 10:25:22 -08:00
James Zern	91f87e7513	Merge "Add vpx_idct16x16_38_add_neon()"	2017-02-11 03:42:36 +00:00
Linfeng Zhang	bc1c18e18c	Add vpx_idct16x16_38_add_neon() The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of pass 2. Change to use saturating add/sub for both vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high bitdepth. Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712	2017-02-08 12:15:22 -08:00
Yi Luo	ac04d11abc	Replace idct8x8_12_add_ssse3 assembly code with intrinsics - Performance achieves the same as assembly. - Unit tests pass. Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd	2017-02-08 10:07:45 -08:00
Linfeng Zhang	cf76ee2cb7	Add vpx_idct16x16_38_add_c() When eob is less than or equal to 38 for 16x16 idct, call this function. Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087	2017-02-07 09:40:51 -08:00
Linfeng Zhang	66695533a8	Merge "Update 16x16 8-bit idct NEON intrinsics"	2017-02-07 16:52:40 +00:00
Johann	641fda79bb	highbd x86: consolidate tran_low_t conversions Create new helper files specifically for converting tran_low_t types. Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905	2017-02-06 10:43:26 -08:00
Jingning Han	bb40844e32	Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"	2017-02-02 22:18:32 +00:00
Kaustubh Raste	5b10674b5c	Merge "Add mips msa sum_squares_2d_i16 function"	2017-02-02 08:09:21 +00:00
Johann Koenig	726556dde9	Merge "Remove neon assembly for idct 16x16 and 8x8"	2017-02-02 03:25:31 +00:00
Johann Koenig	ce6318f254	Merge changes I43521ad3,I013659f6 * changes: satd highbd neon: use tran_low_t for coeff satd highbd sse2: use tran_low_t for coeff	2017-02-02 03:03:58 +00:00
Linfeng Zhang	e4985cf619	Update 16x16 8-bit idct NEON intrinsics Remove redundant memory accesses. Change-Id: I8049074bdba5f49eab7e735b2b377423a69cd4c8	2017-02-01 17:04:33 -08:00
Jingning Han	8f95389742	Add SSSE3 intrinsic 8x8 inverse 2D-DCT The intrinsic version reduces the average cycles from 183 to 175. Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03	2017-02-01 14:47:53 -08:00
Johann Koenig	dc90501ba3	Merge changes I374dfc08,I7e15192e,Ica414007 * changes: hadamard highbd ssse3: use tran_low_t for coeff hadamard highbd neon: use tran_low_t for coeff hadamard highbd sse2: use tran_low_t for coeff	2017-02-01 21:56:36 +00:00
Johann Koenig	f60171bb4f	Merge "deblock: annotate postproc parameters"	2017-02-01 19:57:29 +00:00
Johann	f8d744d91a	satd highbd neon: use tran_low_t for coeff BUG=webm:1365 Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84	2017-02-01 11:55:47 -08:00
Johann	2ba383474d	satd highbd sse2: use tran_low_t for coeff BUG=webm:1365 Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb	2017-02-01 11:55:16 -08:00
Johann	0f751ecee3	hadamard highbd ssse3: use tran_low_t for coeff BUG=webm:1365 Change-Id: I374dfc08732932382043905f128e928b08cb4f57	2017-02-01 11:51:15 -08:00
Johann	1eb8a718bf	hadamard highbd neon: use tran_low_t for coeff BUG=webm:1365 Change-Id: I7e15192ead3a3631755b386f102c979f06e26279	2017-02-01 11:50:46 -08:00
Johann	2dac808dd1	hadamard highbd sse2: use tran_low_t for coeff BUG=webm:1365 Change-Id: Ica414007d8412ceebfffa9e58e8416226a3fe934	2017-02-01 11:46:57 -08:00
Johann Koenig	3bda634576	Merge "quantize ssse3: remove unused pxor"	2017-02-01 19:41:41 +00:00
Jingning Han	969957f9f2	Fix real-time compression regression in hbd mode This commit resolves the compression performance regression in real-time encoding setting when high bit-depth mode is enabled. The current solution temporarily disables the SIMD implementations of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode. The commit makes the coding results bit-wise identical between regular coding pipeline and high bit-depth at profile 0. BUG=webm:1365 Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf	2017-01-31 23:17:09 -08:00
Johann	32f68cc58c	deblock: annotate postproc parameters Clears a clang static analyzer warning where 'cols' is assumed to be less than 0, preventing the for loop from executing. The assembly already requires that the size be 8 or 16 (U/V or Y plane) and cols is a multiple of 8. Change-Id: Ica4612690ead1638c94cfe56b306e87f8ce644f9	2017-01-31 15:58:57 -08:00
Kaustubh Raste	750e753134	Add mips msa sum_squares_2d_i16 function average improvement ~4x-5x Change-Id: I8d91b71d0677009be52b412e4f52b40b98573a53	2017-01-31 12:22:43 +00:00
Kaustubh Raste	df7e1fecc1	Add mips msa vpx_minmax_8x8 function average improvement ~4x-5x Change-Id: I83aee9977534fddb8a9b80d31af646c0b6b1a8c3	2017-01-31 10:00:43 +05:30
Johann	dcfff3ccc8	quantize ssse3: remove unused pxor Change-Id: Ifa22d77fd530827de0b32ae71810dc2213ab2937	2017-01-30 17:02:57 -08:00
Kaustubh Raste	4ce20fb3f4	Add mips msa vpx_vector_var function average improvement ~4x-5x Change-Id: I2f63ef83d816052ca8dc42421e7e9d42f7a7af6b	2017-01-28 08:53:20 +00:00
Kaustubh Raste	407fad2356	Add mips msa vpx Integer projection row/col functions average improvement ~4x-5x Change-Id: I17c41383250282b39f5ecae0197ef1df7de20801	2017-01-27 11:11:42 +05:30
Kaustubh Raste	182ea677a0	Add mips msa vpx satd function average improvement ~4x-5x Change-Id: If8683d636fe2606d4ca1038e28185bca53bbe244	2017-01-24 10:44:22 +05:30
Johann	13234d3c43	Remove neon assembly for idct 16x16 and 8x8 Tested using test/partial_idct_test.cc:DISABLED_Speed. Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements using the intrinsics (columns: function: clang asm / gcc asm / clang intrin / gcc intrin): idct16x16_256: 1720ms / 1703ms / 1546ms / 1554ms; idct16x16_10: 1320ms / 1247ms / 518ms / 488ms; idct16x16_1: 107ms / 108ms / 64ms / 68ms; idct8x8_64: 924ms / 931ms / 866ms / 989ms; idct8x8_12: 826ms / 824ms / 519ms / 514ms; idct8x8_1: 172ms / 166ms / 110ms / 125ms. idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics) but as a counter example idct16x16_10 goes from ~1300ms to ~500ms. On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly stable. BUG=webm:1303 Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4	2017-01-19 12:27:31 -08:00
Kaustubh Raste	e0c0e65378	Add mips msa vpx hadamard functions average improvement ~4x-5x Change-Id: I167132d894c04fa85dda8dde7906ff9c61b3a65d	2017-01-19 14:44:03 +05:30
Jingning Han	b6fe63a505	Merge "Rework 8x8 transpose SSSE3 for avg computation"	2017-01-13 18:25:17 +00:00
Jingning Han	553e9e291f	Merge "Rework 8x8 transpose SSSE3 for inverse 2D-DCT"	2017-01-13 18:25:09 +00:00
Jingning Han	39fff1bea0	Rework 8x8 transpose SSSE3 for avg computation Use same transpose process as inv_txfm_sse2 does. Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c	2017-01-12 15:16:07 -08:00
Jingning Han	f65170ea84	Rework 8x8 transpose SSSE3 for inverse 2D-DCT Use same transpose process as inv_txfm_sse2 does. Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94	2017-01-12 15:13:18 -08:00
Johann Koenig	9f27d1f843	Merge "arm idct16x16: remove extra config guards"	2017-01-11 20:22:27 +00:00
Johann	68d0f46ec0	arm idct16x16: remove extra config guards This file is guarded by HAVE_NEON_ASM in the .mk file now. Change-Id: I513a621c234aa90ad52e426c8ed494d8a7d4b74a	2017-01-11 10:17:14 -08:00
Jingning Han	9a780fa7db	Rework forward 8x8 2D-DCT ssse3 implementation This commit reworks the SSSE3 implementation of the forward 8x8 2D-DCT. It uses a cyclic rotation approach to the temporary xmm registers. It reduces the average cycles from 158 to 154. The SSE2 version uses 169 cycles. Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa	2017-01-10 12:50:55 -08:00

571 Commits