generic-library/vpx

Author	SHA1	Message	Date
Kaustubh Raste	4ce20fb3f4	Add mips msa vpx_vector_var function average improvement ~4x-5x Change-Id: I2f63ef83d816052ca8dc42421e7e9d42f7a7af6b	2017-01-28 08:53:20 +00:00
Kaustubh Raste	407fad2356	Add mips msa vpx Integer projection row/col functions average improvement ~4x-5x Change-Id: I17c41383250282b39f5ecae0197ef1df7de20801	2017-01-27 11:11:42 +05:30
Kaustubh Raste	182ea677a0	Add mips msa vpx satd function average improvement ~4x-5x Change-Id: If8683d636fe2606d4ca1038e28185bca53bbe244	2017-01-24 10:44:22 +05:30
Kaustubh Raste	e0c0e65378	Add mips msa vpx hadamard functions average improvement ~4x-5x Change-Id: I167132d894c04fa85dda8dde7906ff9c61b3a65d	2017-01-19 14:44:03 +05:30
Jingning Han	b6fe63a505	Merge "Rework 8x8 transpose SSSE3 for avg computation"	2017-01-13 18:25:17 +00:00
Jingning Han	553e9e291f	Merge "Rework 8x8 transpose SSSE3 for inverse 2D-DCT"	2017-01-13 18:25:09 +00:00
Jingning Han	39fff1bea0	Rework 8x8 transpose SSSE3 for avg computation Use same transpose process as inv_txfm_sse2 does. Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c	2017-01-12 15:16:07 -08:00
Jingning Han	f65170ea84	Rework 8x8 transpose SSSE3 for inverse 2D-DCT Use same transpose process as inv_txfm_sse2 does. Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94	2017-01-12 15:13:18 -08:00
Johann Koenig	9f27d1f843	Merge "arm idct16x16: remove extra config guards"	2017-01-11 20:22:27 +00:00
Johann	68d0f46ec0	arm idct16x16: remove extra config guards This file is guarded by HAVE_NEON_ASM in the .mk file now. Change-Id: I513a621c234aa90ad52e426c8ed494d8a7d4b74a	2017-01-11 10:17:14 -08:00
Jingning Han	9a780fa7db	Rework forward 8x8 2D-DCT ssse3 implementation This commit reworks the SSSE3 implementation of the forward 8x8 2D-DCT. It uses a cyclic rotation approach to the temporary xmm registers. It reduces the average cycles from 158 to 154. The SSE2 version uses 169 cycles. Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa	2017-01-10 12:50:55 -08:00
James Zern	9480da21e8	Merge "Refine 8-bit 16x16 idct NEON intrinsics"	2017-01-09 23:52:29 +00:00
Johann Koenig	371a64bfe7	Merge "postproc: vpx_mbpost_proc_down_neon"	2017-01-09 19:53:15 +00:00
Johann Koenig	8a7847c2c9	Merge "Fix mips dspr2 idct32x32 functions for large coefficient input"	2017-01-09 19:47:47 +00:00
Johann Koenig	bf168b24f5	Merge "Fix mips dspr2 idct16x16 functions for large coefficient input"	2017-01-09 19:47:00 +00:00
Johann Koenig	08d0a7fd0f	Merge "Fix mips dspr2 idct8x8 functions for large coefficient input"	2017-01-09 19:46:18 +00:00
Johann Koenig	ab20869221	Merge "Fix mips dspr2 idct4x4 functions for large coefficient input"	2017-01-09 19:45:54 +00:00
Johann	c23970ec25	postproc: vpx_mbpost_proc_down_neon This was much more amenable to optimization than the across filter. Speedup of almost 2.5x BUG=webm:1320 Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4	2017-01-09 10:21:56 -08:00
Johann Koenig	9af97fb630	Merge "postproc: vpx_mbpost_proc_across_ip_neon"	2017-01-09 18:17:26 +00:00
Kaustubh Raste	50dd3eb62c	Fix mips dspr2 idct32x32 functions for large coefficient input Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2	2017-01-09 17:21:09 +05:30
Kaustubh Raste	c06991fce6	Fix mips dspr2 idct16x16 functions for large coefficient input Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a	2017-01-09 16:35:28 +05:30
Kaustubh Raste	24d804f79c	Fix mips dspr2 idct8x8 functions for large coefficient input Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61	2017-01-09 16:22:19 +05:30
Kaustubh Raste	afd2d797eb	Fix mips dspr2 idct4x4 functions for large coefficient input Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89	2017-01-09 15:28:30 +05:30
Linfeng Zhang	6abdd31555	Refine 8-bit 16x16 idct NEON intrinsics Speed test shows 25% gain on vpx_idct16x16_256_add_neon(), and vpx_idct16x16_10_add_neon() got tripled. Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541	2017-01-06 17:52:07 -08:00
Johann	4dca923454	postproc: vpx_mbpost_proc_across_ip_neon The speedup is pretty poor. I would be concerned except the SSE2 is worse: Existing SSE2 improvement: 22% New neon improvement: 35% BUG=webm:1320 Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62	2017-01-06 16:39:17 -08:00
Linfeng Zhang	2d12a52ff0	Merge "Add high bitdepth 8x8 idct NEON intrinsics"	2017-01-06 16:47:23 +00:00
Linfeng Zhang	911bb980b1	Clean DC only idct NEON intrinsics BUG=webm:1301 Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5	2016-12-28 13:51:44 -08:00
Linfeng Zhang	9b187954df	Add high bitdepth 8x8 idct NEON intrinsics BUG=webm:1301 Change-Id: I56e3bc3aab9214e2debac93796389a7194991084	2016-12-27 16:28:53 -08:00
Linfeng Zhang	6d5a3fe583	Clean idct 8x8 neon functions BUG=webm:1301 Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307	2016-12-21 14:24:17 -08:00
James Zern	a68b36c752	vpx_idct32x32_1024_add_neon: quiet uninitialized warning relocate the assignment to 'in' outside of the for loop. this quiets a spurious warning in visual studio builds since: `86e340c` enable vpx_idct32x32_1024_add_neon in hbd builds + give the variable a more descriptive name BUG=webm:1294 Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd	2016-12-19 12:49:44 -08:00
Linfeng Zhang	7e23f895ca	Merge "Clean hbd idct 4x4 neon functions and other"	2016-12-19 17:09:26 +00:00
Johann	41b0888a84	postproc: neon down and across macroblock filter Implement vpx_post_proc_down_and_across_mb_row in NEON. Runs about 6-7x faster than C. BUG=webm:1320 Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2	2016-12-14 15:11:28 -08:00
Linfeng Zhang	c8f25fa5c0	Clean hbd idct 4x4 neon functions and other BUG=webm:1301 Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342	2016-12-14 11:38:28 -08:00
James Zern	86e340c76e	enable vpx_idct32x32_1024_add_neon in hbd builds BUG=webm:1294 Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de	2016-12-12 19:28:35 -08:00
Linfeng Zhang	5d4aa325a6	Cosmetics by unifying dest_stride to stride in idct Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba	2016-12-12 15:13:22 -08:00
Johann	2c24f7178d	Move load_and_transpose to transpose_neon.h Allows for use outside the idcts without pulling in idct_neon.h Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf	2016-12-09 12:54:55 -08:00
James Zern	6defef4ab2	idct16x16_add_neon: fix arm visual studio builds after: `2d3d95f` enable vpx_idct16x16_256_add_neon in hbd builds reorder INCLUDEs and fix indent of IF/ENDIFs remove vpx_config.asm to avoid multiple symbol definitions in windows builds and shift idct_neon.asm.S to the top to allow use of CONFIG_VP9_HIGHBITDEPTH in the export list. Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029	2016-12-08 15:17:57 -08:00
Linfeng Zhang	174528de1e	Merge "Update idct NEON optimization to not use narrowing saturating shift"	2016-12-07 21:03:21 +00:00
James Zern	f16a0a1aa4	Merge "enable vpx_idct16x16_256_add_neon in hbd builds"	2016-12-07 20:26:44 +00:00
Linfeng Zhang	018a2adcb1	Update idct NEON optimization to not use narrowing saturating shift Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b	2016-12-07 10:25:09 -08:00
James Zern	2d3d95f7ac	enable vpx_idct16x16_256_add_neon in hbd builds BUG=webm:1294 Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1	2016-12-06 18:32:21 -08:00
James Zern	228c9940ea	Merge changes Ibad079f2,I7858a0a1 * changes: enable vpx_idct16x16_10_add_neon in hbd builds idct16x16,NEON: rm output_stride from pass1 fns	2016-12-07 01:40:28 +00:00
James Zern	8befcd0089	enable vpx_idct16x16_10_add_neon in hbd builds BUG=webm:1294 Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825	2016-12-06 16:09:19 -08:00
James Zern	af9d7aa9fb	idct16x16,NEON: rm output_stride from pass1 fns vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon: this was a constant 8 in all cases meaning the results are stored contiguously, this allows the number of stores to be reduced. Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e	2016-12-06 15:13:33 -08:00
Linfeng Zhang	cb339d628f	Refine 8-bit 8x8 idct NEON intrinsics Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd	2016-12-05 17:50:14 -08:00
Linfeng Zhang	a8eee97b43	Check in vpx_lpf_vertical_4_dual_neon() assembly This replaces its C version. Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a	2016-12-02 15:54:30 -08:00
James Zern	a7fa1314da	Merge changes I4afc130e,Iaa64d23f * changes: Add high bitdepth 4x4 idct NEON intrinsics Update idct x86 intrinsics to not use saturated add and sub	2016-12-02 04:01:28 +00:00
Linfeng Zhang	17a8cf5cc3	Add high bitdepth 4x4 idct NEON intrinsics Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b	2016-11-30 13:07:13 -08:00
Linfeng Zhang	264f6e70ec	Update idct x86 intrinsics to not use saturated add and sub Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4	2016-11-29 17:06:08 -08:00
James Zern	c6641782c3	idct16x16,NEON,cosmetics: normalize fn signatures + remove unused parameters from vpx_idct16x16_10_add_neon_pass2 Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4	2016-11-28 16:46:01 -08:00

532 Commits