generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	bb40844e32	Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"	2017-02-02 22:18:32 +00:00
Kaustubh Raste	5b10674b5c	Merge "Add mips msa sum_squares_2d_i16 function"	2017-02-02 08:09:21 +00:00
Johann Koenig	726556dde9	Merge "Remove neon assembly for idct 16x16 and 8x8"	2017-02-02 03:25:31 +00:00
Johann Koenig	ce6318f254	Merge changes I43521ad3,I013659f6 * changes: satd highbd neon: use tran_low_t for coeff satd highbd sse2: use tran_low_t for coeff	2017-02-02 03:03:58 +00:00
Jingning Han	8f95389742	Add SSSE3 intrinsic 8x8 inverse 2D-DCT The intrinsic version reduces the average cycles from 183 to 175. Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03	2017-02-01 14:47:53 -08:00
Johann Koenig	dc90501ba3	Merge changes I374dfc08,I7e15192e,Ica414007 * changes: hadamard highbd ssse3: use tran_low_t for coeff hadamard highbd neon: use tran_low_t for coeff hadamard highbd sse2: use tran_low_t for coeff	2017-02-01 21:56:36 +00:00
Johann Koenig	f60171bb4f	Merge "deblock: annotate postproc parameters"	2017-02-01 19:57:29 +00:00
Johann	f8d744d91a	satd highbd neon: use tran_low_t for coeff BUG=webm:1365 Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84	2017-02-01 11:55:47 -08:00
Johann	2ba383474d	satd highbd sse2: use tran_low_t for coeff BUG=webm:1365 Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb	2017-02-01 11:55:16 -08:00
Johann	0f751ecee3	hadamard highbd ssse3: use tran_low_t for coeff BUG=webm:1365 Change-Id: I374dfc08732932382043905f128e928b08cb4f57	2017-02-01 11:51:15 -08:00
Johann	1eb8a718bf	hadamard highbd neon: use tran_low_t for coeff BUG=webm:1365 Change-Id: I7e15192ead3a3631755b386f102c979f06e26279	2017-02-01 11:50:46 -08:00
Johann	2dac808dd1	hadamard highbd sse2: use tran_low_t for coeff BUG=webm:1365 Change-Id: Ica414007d8412ceebfffa9e58e8416226a3fe934	2017-02-01 11:46:57 -08:00
Johann Koenig	3bda634576	Merge "quantize ssse3: remove unused pxor"	2017-02-01 19:41:41 +00:00
Jingning Han	969957f9f2	Fix real-time compression regression in hbd mode This commit resolves the compression performance regression in real-time encoding setting when high bit-depth mode is enabled. The current solution temporarily disables the SIMD implementations of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode. The commit makes the coding results bit-wise identical between regular coding pipeline and high bit-depth at profile 0. BUG=webm:1365 Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf	2017-01-31 23:17:09 -08:00
Johann	32f68cc58c	deblock: annotate postproc parameters Clears a clang static analyzer warning where 'cols' is assumed to be less than 0, preventing the for loop from executing. The assembly already requires that the size be 8 or 16 (U/V or Y plane) and cols is a multiple of 8. Change-Id: Ica4612690ead1638c94cfe56b306e87f8ce644f9	2017-01-31 15:58:57 -08:00
Kaustubh Raste	750e753134	Add mips msa sum_squares_2d_i16 function average improvement ~4x-5x Change-Id: I8d91b71d0677009be52b412e4f52b40b98573a53	2017-01-31 12:22:43 +00:00
Kaustubh Raste	df7e1fecc1	Add mips msa vpx_minmax_8x8 function average improvement ~4x-5x Change-Id: I83aee9977534fddb8a9b80d31af646c0b6b1a8c3	2017-01-31 10:00:43 +05:30
Johann	dcfff3ccc8	quantize ssse3: remove unused pxor Change-Id: Ifa22d77fd530827de0b32ae71810dc2213ab2937	2017-01-30 17:02:57 -08:00
Kaustubh Raste	4ce20fb3f4	Add mips msa vpx_vector_var function average improvement ~4x-5x Change-Id: I2f63ef83d816052ca8dc42421e7e9d42f7a7af6b	2017-01-28 08:53:20 +00:00
Kaustubh Raste	407fad2356	Add mips msa vpx Integer projection row/col functions average improvement ~4x-5x Change-Id: I17c41383250282b39f5ecae0197ef1df7de20801	2017-01-27 11:11:42 +05:30
Kaustubh Raste	182ea677a0	Add mips msa vpx satd function average improvement ~4x-5x Change-Id: If8683d636fe2606d4ca1038e28185bca53bbe244	2017-01-24 10:44:22 +05:30
Johann	13234d3c43	Remove neon assembly for idct 16x16 and 8x8 Tested using test/partial_idct_test.cc:DISABLED_Speed Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements using the intrinsics (columns: function, clang asm, gcc asm, clang intrin, gcc intrin): idct16x16_256: 1720ms, 1703ms, 1546ms, 1554ms; idct16x16_10: 1320ms, 1247ms, 518ms, 488ms; idct16x16_1: 107ms, 108ms, 64ms, 68ms; idct8x8_64: 924ms, 931ms, 866ms, 989ms; idct8x8_12: 826ms, 824ms, 519ms, 514ms; idct8x8_1: 172ms, 166ms, 110ms, 125ms. idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics) but as a counter example idct16x16_10 goes from ~1300ms to ~500ms. On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly stable. BUG=webm:1303 Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4	2017-01-19 12:27:31 -08:00
Kaustubh Raste	e0c0e65378	Add mips msa vpx hadamard functions average improvement ~4x-5x Change-Id: I167132d894c04fa85dda8dde7906ff9c61b3a65d	2017-01-19 14:44:03 +05:30
Jingning Han	b6fe63a505	Merge "Rework 8x8 transpose SSSE3 for avg computation"	2017-01-13 18:25:17 +00:00
Jingning Han	553e9e291f	Merge "Rework 8x8 transpose SSSE3 for inverse 2D-DCT"	2017-01-13 18:25:09 +00:00
Jingning Han	39fff1bea0	Rework 8x8 transpose SSSE3 for avg computation Use same transpose process as inv_txfm_sse2 does. Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c	2017-01-12 15:16:07 -08:00
Jingning Han	f65170ea84	Rework 8x8 transpose SSSE3 for inverse 2D-DCT Use same transpose process as inv_txfm_sse2 does. Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94	2017-01-12 15:13:18 -08:00
Johann Koenig	9f27d1f843	Merge "arm idct16x16: remove extra config guards"	2017-01-11 20:22:27 +00:00
Johann	68d0f46ec0	arm idct16x16: remove extra config guards This file is guarded by HAVE_NEON_ASM in the .mk file now. Change-Id: I513a621c234aa90ad52e426c8ed494d8a7d4b74a	2017-01-11 10:17:14 -08:00
Jingning Han	9a780fa7db	Rework forward 8x8 2D-DCT ssse3 implementation This commit reworks the SSSE3 implementation of the forward 8x8 2D-DCT. It uses a cyclic rotation approach to the temporary xmm registers. It reduces the average cycles from 158 to 154. The SSE2 version uses 169 cycles. Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa	2017-01-10 12:50:55 -08:00
James Zern	9480da21e8	Merge "Refine 8-bit 16x16 idct NEON intrinsics"	2017-01-09 23:52:29 +00:00
Johann Koenig	371a64bfe7	Merge "postproc: vpx_mbpost_proc_down_neon"	2017-01-09 19:53:15 +00:00
Johann Koenig	8a7847c2c9	Merge "Fix mips dspr2 idct32x32 functions for large coefficient input"	2017-01-09 19:47:47 +00:00
Johann Koenig	bf168b24f5	Merge "Fix mips dspr2 idct16x16 functions for large coefficient input"	2017-01-09 19:47:00 +00:00
Johann Koenig	08d0a7fd0f	Merge "Fix mips dspr2 idct8x8 functions for large coefficient input"	2017-01-09 19:46:18 +00:00
Johann Koenig	ab20869221	Merge "Fix mips dspr2 idct4x4 functions for large coefficient input"	2017-01-09 19:45:54 +00:00
Johann	c23970ec25	postproc: vpx_mbpost_proc_down_neon This was much more amenable to optimization than the across filter. Speedup of almost 2.5x BUG=webm:1320 Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4	2017-01-09 10:21:56 -08:00
Johann Koenig	9af97fb630	Merge "postproc: vpx_mbpost_proc_across_ip_neon"	2017-01-09 18:17:26 +00:00
Kaustubh Raste	50dd3eb62c	Fix mips dspr2 idct32x32 functions for large coefficient input Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2	2017-01-09 17:21:09 +05:30
Kaustubh Raste	c06991fce6	Fix mips dspr2 idct16x16 functions for large coefficient input Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a	2017-01-09 16:35:28 +05:30
Kaustubh Raste	24d804f79c	Fix mips dspr2 idct8x8 functions for large coefficient input Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61	2017-01-09 16:22:19 +05:30
Kaustubh Raste	afd2d797eb	Fix mips dspr2 idct4x4 functions for large coefficient input Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89	2017-01-09 15:28:30 +05:30
Linfeng Zhang	6abdd31555	Refine 8-bit 16x16 idct NEON intrinsics Speed test shows 25% gain on vpx_idct16x16_256_add_neon(), and vpx_idct16x16_10_add_neon() got tripled. Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541	2017-01-06 17:52:07 -08:00
Johann	4dca923454	postproc: vpx_mbpost_proc_across_ip_neon The speedup is pretty poor. I would be concerned except the SSE2 is worse: Existing SSE2 improvement: 22% New neon improvement: 35% BUG=webm:1320 Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62	2017-01-06 16:39:17 -08:00
Linfeng Zhang	2d12a52ff0	Merge "Add high bitdepth 8x8 idct NEON intrinsics"	2017-01-06 16:47:23 +00:00
Linfeng Zhang	911bb980b1	Clean DC only idct NEON intrinsics BUG=webm:1301 Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5	2016-12-28 13:51:44 -08:00
Linfeng Zhang	9b187954df	Add high bitdepth 8x8 idct NEON intrinsics BUG=webm:1301 Change-Id: I56e3bc3aab9214e2debac93796389a7194991084	2016-12-27 16:28:53 -08:00
Linfeng Zhang	6d5a3fe583	Clean idct 8x8 neon functions BUG=webm:1301 Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307	2016-12-21 14:24:17 -08:00
James Zern	a68b36c752	vpx_idct32x32_1024_add_neon: quiet uninitialized warning relocate the assignment to 'in' outside of the for loop. this quiets a spurious warning in visual studio builds since: 86e340c enable vpx_idct32x32_1024_add_neon in hbd builds + give the variable a more descriptive name BUG=webm:1294 Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd	2016-12-19 12:49:44 -08:00
Linfeng Zhang	7e23f895ca	Merge "Clean hbd idct 4x4 neon functions and other"	2016-12-19 17:09:26 +00:00

551 Commits