generic-library/vpx

Author	SHA1	Message	Date
Johann Koenig	b63e88e506	Merge "Use 'packssdw' for loading tran_low_t values"	2017-02-16 02:41:00 +00:00
Linfeng Zhang	106c342659	cosmetics,dsp/inv_txfm.c: reorder functions Change-Id: Ie0f7689ebe230c68eadb22a32b14838c1a7543a6	2017-02-15 11:40:35 -08:00
Linfeng Zhang	81914ce68a	Add vpx_highbd_idct16x16_38_add_neon() BUG=webm:1301 Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe	2017-02-15 09:12:02 -08:00
Linfeng Zhang	e07e74fb0f	Add vpx_highbd_idct16x16_38_add_c() When eob is less than or equal to 38 for high-bitdepth 16x16 idct, call this function. BUG=webm:1301 Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060	2017-02-14 17:25:52 -08:00
Johann	327a02d77e	Use 'packssdw' for loading tran_low_t values This matches bitdepth_conversion_sse2.asm and produces substantially better assembly. The old way had lots of 'movzwl' and 'shl' and storing back to memory before loading into an xmm register. Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b	2017-02-14 22:39:49 +00:00
Linfeng Zhang	429e652809	Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6	2017-02-14 13:08:41 -08:00
clang-format	4b402746ca	apply clang-format Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce	2017-02-14 12:45:52 -08:00
Yi Luo	c1a90dc160	Merge "Replace idct32x32_34_add_ssse3 assembly with intrinsics"	2017-02-14 20:13:27 +00:00
Yi Luo	bd86de1ac8	Replace idct32x32_34_add_ssse3 assembly with intrinsics - No user-level speed performance change. - Pass unit tests. Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6	2017-02-14 10:38:36 -08:00
Linfeng Zhang	de9ae32b93	Merge "Add vpx_highbd_idct16x16_256_add_neon()"	2017-02-14 01:15:34 +00:00
Linfeng Zhang	5ad4159ebb	Add vpx_highbd_idct16x16_256_add_neon() BUG=webm:1301 Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec	2017-02-13 15:50:33 -08:00
Johann	5ecde212a8	fdct8x8 highbd neon: use tran_low_t for output Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba	2017-02-13 22:16:14 +00:00
Linfeng Zhang	016933ad48	Add vpx_highbd_idct{16x16,32x32}_1_add_neon() and update vpx_highbd_idct8x8_1_add_neon() BUG=webm:1301 Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4	2017-02-13 10:25:22 -08:00
James Zern	91f87e7513	Merge "Add vpx_idct16x16_38_add_neon()"	2017-02-11 03:42:36 +00:00
Linfeng Zhang	bc1c18e18c	Add vpx_idct16x16_38_add_neon() The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of pass 2. Change to use saturating add/sub for both vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high bitdepth. Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712	2017-02-08 12:15:22 -08:00
Yi Luo	ac04d11abc	Replace idct8x8_12_add_ssse3 assembly code with intrinsics - Performance achieves the same as assembly. - Unit tests pass. Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd	2017-02-08 10:07:45 -08:00
Linfeng Zhang	cf76ee2cb7	Add vpx_idct16x16_38_add_c() When eob is less than or equal to 38 for 16x16 idct, call this function. Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087	2017-02-07 09:40:51 -08:00
Linfeng Zhang	66695533a8	Merge "Update 16x16 8-bit idct NEON intrinsics"	2017-02-07 16:52:40 +00:00
Johann	641fda79bb	highbd x86: consolidate tran_low_t conversions Create new helper files specifically for converting tran_low_t types. Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905	2017-02-06 10:43:26 -08:00
Jingning Han	bb40844e32	Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"	2017-02-02 22:18:32 +00:00
Kaustubh Raste	5b10674b5c	Merge "Add mips msa sum_squares_2d_i16 function"	2017-02-02 08:09:21 +00:00
Johann Koenig	726556dde9	Merge "Remove neon assembly for idct 16x16 and 8x8"	2017-02-02 03:25:31 +00:00
Johann Koenig	ce6318f254	Merge changes I43521ad3,I013659f6 * changes: satd highbd neon: use tran_low_t for coeff satd highbd sse2: use tran_low_t for coeff	2017-02-02 03:03:58 +00:00
Linfeng Zhang	e4985cf619	Update 16x16 8-bit idct NEON intrinsics Remove redundant memory accesses. Change-Id: I8049074bdba5f49eab7e735b2b377423a69cd4c8	2017-02-01 17:04:33 -08:00
Jingning Han	8f95389742	Add SSSE3 intrinsic 8x8 inverse 2D-DCT The intrinsic version reduces the average cycles from 183 to 175. Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03	2017-02-01 14:47:53 -08:00
Johann Koenig	dc90501ba3	Merge changes I374dfc08,I7e15192e,Ica414007 * changes: hadamard highbd ssse3: use tran_low_t for coeff hadamard highbd neon: use tran_low_t for coeff hadamard highbd sse2: use tran_low_t for coeff	2017-02-01 21:56:36 +00:00
Johann Koenig	f60171bb4f	Merge "deblock: annotate postproc parameters"	2017-02-01 19:57:29 +00:00
Johann	f8d744d91a	satd highbd neon: use tran_low_t for coeff BUG=webm:1365 Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84	2017-02-01 11:55:47 -08:00
Johann	2ba383474d	satd highbd sse2: use tran_low_t for coeff BUG=webm:1365 Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb	2017-02-01 11:55:16 -08:00
Johann	0f751ecee3	hadamard highbd ssse3: use tran_low_t for coeff BUG=webm:1365 Change-Id: I374dfc08732932382043905f128e928b08cb4f57	2017-02-01 11:51:15 -08:00
Johann	1eb8a718bf	hadamard highbd neon: use tran_low_t for coeff BUG=webm:1365 Change-Id: I7e15192ead3a3631755b386f102c979f06e26279	2017-02-01 11:50:46 -08:00
Johann	2dac808dd1	hadamard highbd sse2: use tran_low_t for coeff BUG=webm:1365 Change-Id: Ica414007d8412ceebfffa9e58e8416226a3fe934	2017-02-01 11:46:57 -08:00
Johann Koenig	3bda634576	Merge "quantize ssse3: remove unused pxor"	2017-02-01 19:41:41 +00:00
Jingning Han	969957f9f2	Fix real-time compression regression in hbd mode This commit resolves the compression performance regression in real-time encoding setting when high bit-depth mode is enabled. The current solution temporarily disables the SIMD implementations of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode. The commit makes the coding results bit-wise identical between regular coding pipeline and high bit-depth at profile 0. BUG=webm:1365 Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf	2017-01-31 23:17:09 -08:00
Johann	32f68cc58c	deblock: annotate postproc parameters Clears a clang static analyzer warning where 'cols' is assumed to be less than 0, preventing the for loop from executing. The assembly already requires that the size be 8 or 16 (U/V or Y plane) and cols is a multiple of 8. Change-Id: Ica4612690ead1638c94cfe56b306e87f8ce644f9	2017-01-31 15:58:57 -08:00
Kaustubh Raste	750e753134	Add mips msa sum_squares_2d_i16 function average improvement ~4x-5x Change-Id: I8d91b71d0677009be52b412e4f52b40b98573a53	2017-01-31 12:22:43 +00:00
Kaustubh Raste	df7e1fecc1	Add mips msa vpx_minmax_8x8 function average improvement ~4x-5x Change-Id: I83aee9977534fddb8a9b80d31af646c0b6b1a8c3	2017-01-31 10:00:43 +05:30
Johann	dcfff3ccc8	quantize ssse3: remove unused pxor Change-Id: Ifa22d77fd530827de0b32ae71810dc2213ab2937	2017-01-30 17:02:57 -08:00
Kaustubh Raste	4ce20fb3f4	Add mips msa vpx_vector_var function average improvement ~4x-5x Change-Id: I2f63ef83d816052ca8dc42421e7e9d42f7a7af6b	2017-01-28 08:53:20 +00:00
Kaustubh Raste	407fad2356	Add mips msa vpx Integer projection row/col functions average improvement ~4x-5x Change-Id: I17c41383250282b39f5ecae0197ef1df7de20801	2017-01-27 11:11:42 +05:30
Kaustubh Raste	182ea677a0	Add mips msa vpx satd function average improvement ~4x-5x Change-Id: If8683d636fe2606d4ca1038e28185bca53bbe244	2017-01-24 10:44:22 +05:30
Johann	13234d3c43	Remove neon assembly for idct 16x16 and 8x8 Tested using test/partial_idct_test.cc:DISABLED_Speed. Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements using the intrinsics (columns: function | clang asm | gcc asm | clang intrin | gcc intrin): idct16x16_256 | 1720ms | 1703ms | 1546ms | 1554ms; idct16x16_10 | 1320ms | 1247ms | 518ms | 488ms; idct16x16_1 | 107ms | 108ms | 64ms | 68ms; idct8x8_64 | 924ms | 931ms | 866ms | 989ms; idct8x8_12 | 826ms | 824ms | 519ms | 514ms; idct8x8_1 | 172ms | 166ms | 110ms | 125ms. idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics) but as a counter example idct16x16_10 goes from ~1300ms to ~500ms. On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly stable. BUG=webm:1303 Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4	2017-01-19 12:27:31 -08:00
Kaustubh Raste	e0c0e65378	Add mips msa vpx hadamard functions average improvement ~4x-5x Change-Id: I167132d894c04fa85dda8dde7906ff9c61b3a65d	2017-01-19 14:44:03 +05:30
Jingning Han	b6fe63a505	Merge "Rework 8x8 transpose SSSE3 for avg computation"	2017-01-13 18:25:17 +00:00
Jingning Han	553e9e291f	Merge "Rework 8x8 transpose SSSE3 for inverse 2D-DCT"	2017-01-13 18:25:09 +00:00
Jingning Han	39fff1bea0	Rework 8x8 transpose SSSE3 for avg computation Use same transpose process as inv_txfm_sse2 does. Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c	2017-01-12 15:16:07 -08:00
Jingning Han	f65170ea84	Rework 8x8 transpose SSSE3 for inverse 2D-DCT Use same transpose process as inv_txfm_sse2 does. Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94	2017-01-12 15:13:18 -08:00
Johann Koenig	9f27d1f843	Merge "arm idct16x16: remove extra config guards"	2017-01-11 20:22:27 +00:00
Johann	68d0f46ec0	arm idct16x16: remove extra config guards This file is guarded by HAVE_NEON_ASM in the .mk file now. Change-Id: I513a621c234aa90ad52e426c8ed494d8a7d4b74a	2017-01-11 10:17:14 -08:00
Jingning Han	9a780fa7db	Rework forward 8x8 2D-DCT ssse3 implementation This commit reworks the SSSE3 implementation of the forward 8x8 2D-DCT. It uses a cyclic rotation approach to the temporary xmm registers. It reduces the average cycles from 158 to 154. The SSE2 version uses 169 cycles. Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa	2017-01-10 12:50:55 -08:00
James Zern	9480da21e8	Merge "Refine 8-bit 16x16 idct NEON intrinsics"	2017-01-09 23:52:29 +00:00
Johann Koenig	371a64bfe7	Merge "postproc: vpx_mbpost_proc_down_neon"	2017-01-09 19:53:15 +00:00
Johann Koenig	8a7847c2c9	Merge "Fix mips dspr2 idct32x32 functions for large coefficient input"	2017-01-09 19:47:47 +00:00
Johann Koenig	bf168b24f5	Merge "Fix mips dspr2 idct16x16 functions for large coefficient input"	2017-01-09 19:47:00 +00:00
Johann Koenig	08d0a7fd0f	Merge "Fix mips dspr2 idct8x8 functions for large coefficient input"	2017-01-09 19:46:18 +00:00
Johann Koenig	ab20869221	Merge "Fix mips dspr2 idct4x4 functions for large coefficient input"	2017-01-09 19:45:54 +00:00
Johann	c23970ec25	postproc: vpx_mbpost_proc_down_neon This was much more amenable to optimization than the across filter. Speedup of almost 2.5x BUG=webm:1320 Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4	2017-01-09 10:21:56 -08:00
Johann Koenig	9af97fb630	Merge "postproc: vpx_mbpost_proc_across_ip_neon"	2017-01-09 18:17:26 +00:00
Kaustubh Raste	50dd3eb62c	Fix mips dspr2 idct32x32 functions for large coefficient input Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2	2017-01-09 17:21:09 +05:30
Kaustubh Raste	c06991fce6	Fix mips dspr2 idct16x16 functions for large coefficient input Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a	2017-01-09 16:35:28 +05:30
Kaustubh Raste	24d804f79c	Fix mips dspr2 idct8x8 functions for large coefficient input Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61	2017-01-09 16:22:19 +05:30
Kaustubh Raste	afd2d797eb	Fix mips dspr2 idct4x4 functions for large coefficient input Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89	2017-01-09 15:28:30 +05:30
Linfeng Zhang	6abdd31555	Refine 8-bit 16x16 idct NEON intrinsics Speed test shows 25% gain on vpx_idct16x16_256_add_neon(), and vpx_idct16x16_10_add_neon() got tripled. Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541	2017-01-06 17:52:07 -08:00
Johann	4dca923454	postproc: vpx_mbpost_proc_across_ip_neon The speedup is pretty poor. I would be concerned except the SSE2 is worse: Existing SSE2 improvement: 22% New neon improvement: 35% BUG=webm:1320 Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62	2017-01-06 16:39:17 -08:00
Linfeng Zhang	2d12a52ff0	Merge "Add high bitdepth 8x8 idct NEON intrinsics"	2017-01-06 16:47:23 +00:00
Linfeng Zhang	911bb980b1	Clean DC only idct NEON intrinsics BUG=webm:1301 Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5	2016-12-28 13:51:44 -08:00
Linfeng Zhang	9b187954df	Add high bitdepth 8x8 idct NEON intrinsics BUG=webm:1301 Change-Id: I56e3bc3aab9214e2debac93796389a7194991084	2016-12-27 16:28:53 -08:00
Linfeng Zhang	6d5a3fe583	Clean idct 8x8 neon functions BUG=webm:1301 Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307	2016-12-21 14:24:17 -08:00
James Zern	a68b36c752	vpx_idct32x32_1024_add_neon: quiet uninitialized warning relocate the assignment to 'in' outside of the for loop. this quiets a spurious warning in visual studio builds since: `86e340c` enable vpx_idct32x32_1024_add_neon in hbd builds + give the variable a more descriptive name BUG=webm:1294 Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd	2016-12-19 12:49:44 -08:00
Linfeng Zhang	7e23f895ca	Merge "Clean hbd idct 4x4 neon functions and other"	2016-12-19 17:09:26 +00:00
Johann	41b0888a84	postproc: neon down and across macroblock filter Implement vpx_post_proc_down_and_across_mb_row in NEON. Runs about 6-7x faster than C. BUG=webm:1320 Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2	2016-12-14 15:11:28 -08:00
Linfeng Zhang	c8f25fa5c0	Clean hbd idct 4x4 neon functions and other BUG=webm:1301 Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342	2016-12-14 11:38:28 -08:00
James Zern	86e340c76e	enable vpx_idct32x32_1024_add_neon in hbd builds BUG=webm:1294 Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de	2016-12-12 19:28:35 -08:00
Linfeng Zhang	5d4aa325a6	Cosmetics by unifying dest_stride to stride in idct Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba	2016-12-12 15:13:22 -08:00
Johann	2c24f7178d	Move load_and_transpose to transpose_neon.h Allows for use outside the idcts without pulling in idct_neon.h Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf	2016-12-09 12:54:55 -08:00
James Zern	6defef4ab2	idct16x16_add_neon: fix arm visual studio builds after: `2d3d95f` enable vpx_idct16x16_256_add_neon in hbd builds reorder INCLUDEs and fix indent of IF/ENDIFs remove vpx_config.asm to avoid multiple symbol definitions in windows builds and shift idct_neon.asm.S to the top to allow use of CONFIG_VP9_HIGHBITDEPTH in the export list. Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029	2016-12-08 15:17:57 -08:00
Linfeng Zhang	174528de1e	Merge "Update idct NEON optimization to not use narrowing saturating shift"	2016-12-07 21:03:21 +00:00
James Zern	f16a0a1aa4	Merge "enable vpx_idct16x16_256_add_neon in hbd builds"	2016-12-07 20:26:44 +00:00
Linfeng Zhang	018a2adcb1	Update idct NEON optimization to not use narrowing saturating shift Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b	2016-12-07 10:25:09 -08:00
James Zern	2d3d95f7ac	enable vpx_idct16x16_256_add_neon in hbd builds BUG=webm:1294 Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1	2016-12-06 18:32:21 -08:00
James Zern	228c9940ea	Merge changes Ibad079f2,I7858a0a1 * changes: enable vpx_idct16x16_10_add_neon in hbd builds idct16x16,NEON: rm output_stride from pass1 fns	2016-12-07 01:40:28 +00:00
James Zern	8befcd0089	enable vpx_idct16x16_10_add_neon in hbd builds BUG=webm:1294 Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825	2016-12-06 16:09:19 -08:00
James Zern	af9d7aa9fb	idct16x16,NEON: rm output_stride from pass1 fns vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon: this was a constant 8 in all cases meaning the results are stored contiguously, this allows the number of stores to be reduced. Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e	2016-12-06 15:13:33 -08:00
Linfeng Zhang	cb339d628f	Refine 8-bit 8x8 idct NEON intrinsics Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd	2016-12-05 17:50:14 -08:00
Linfeng Zhang	a8eee97b43	Check in vpx_lpf_vertical_4_dual_neon() assembly This replaces its C version. Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a	2016-12-02 15:54:30 -08:00
James Zern	a7fa1314da	Merge changes I4afc130e,Iaa64d23f * changes: Add high bitdepth 4x4 idct NEON intrinsics Update idct x86 intrinsics to not use saturated add and sub	2016-12-02 04:01:28 +00:00
Linfeng Zhang	17a8cf5cc3	Add high bitdepth 4x4 idct NEON intrinsics Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b	2016-11-30 13:07:13 -08:00
Linfeng Zhang	264f6e70ec	Update idct x86 intrinsics to not use saturated add and sub Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4	2016-11-29 17:06:08 -08:00
James Zern	c6641782c3	idct16x16,NEON,cosmetics: normalize fn signatures + remove unused parameters from vpx_idct16x16_10_add_neon_pass2 Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4	2016-11-28 16:46:01 -08:00
James Zern	21a1abd8e3	enable vpx_idct32x32_135_add_neon in hbd builds BUG=webm:1294 Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4	2016-11-23 19:59:43 -08:00
James Zern	568d4b1d63	idct_neon: rename load_tran_low_to_s16 -> ...s16q BUG=webm:1294 Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b	2016-11-23 19:57:48 -08:00
James Zern	d757d7e998	Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2 * changes: Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test Refine 8-bit 4x4 idct NEON intrinsics Add idct speed test. Update partial_idct_test.cc to support high bitdepth	2016-11-24 03:31:25 +00:00
Jerome Jiang	97ec6291ee	Change C/MSA post proc to match SSE2. BUG=webm:1321 Change-Id: I719023375dc48cf7d8ed72188853f0f1ccc4ad7f	2016-11-23 10:42:11 -08:00
Linfeng Zhang	05e2b5a59f	Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction"	2016-11-22 23:20:53 +00:00
Linfeng Zhang	6cc76ec73f	Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10	2016-11-22 11:35:05 -08:00
Linfeng Zhang	974e81d184	Refine 8-bit 4x4 idct NEON intrinsics Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f	2016-11-22 11:26:03 -08:00
Kaustubh Raste	ecc5998bcf	Fix mips dspr2 build warning Change-Id: Ia8fb3ed124f01384e7896e309c9ff22c05b40719	2016-11-22 17:49:17 +05:30
Kaustubh Raste	a38e9f412d	Merge "Fix SingleLargeCoeff idct test"	2016-11-19 03:37:29 +00:00
James Zern	cbeae53e76	Merge "Clean horizontal intra prediction NEON optimization"	2016-11-19 01:29:37 +00:00
Jerome Jiang	de5fd00ec5	Change _xmm to _sse2 in deblocker assembly functions. Some cosmetic changes because xmm is an anachronism. Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7	2016-11-17 23:38:04 +00:00
Kaustubh Raste	c56e5dd620	Fix SingleLargeCoeff idct test Updated idct code to handle single large coefficient (-32768) Change-Id: Ia13ab1ab434a9a1b9954a5914088977a88841cc7	2016-11-17 11:41:07 +00:00
Jerome Jiang	5d48663e04	Merge "Change C and msa to match results from sse2."	2016-11-17 05:16:27 +00:00
Jerome Jiang	cb1b1b8fef	Change C and msa to match results from sse2. Re-enable the tests to check CvsAssembly. BUG=webm:1321 Change-Id: Id7f7d74b06c469fb6c8f5d04e91359e9cd9097a6	2016-11-16 17:05:26 -08:00
Linfeng Zhang	85c1ee434d	Add high bitdepth intra prediction NEON optimization (mode tm) BUG=webm:1316 Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b	2016-11-15 14:19:46 -08:00
Linfeng Zhang	a3128ad33a	Add high bitdepth intra prediction NEON optimization (h and v) BUG=webm:1316 Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46	2016-11-12 12:00:19 -08:00
James Zern	80f6b243a7	Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3 * changes: fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd partial_idct_test,NEON: add missing idct variants enable vpx_idct32x32_34_add_neon in hbd builds	2016-11-10 05:02:39 +00:00
Linfeng Zhang	40ab0424d4	Add high bitdepth intra prediction NEON optimization (mode d45 and d135) BUG=webm:1316 Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e	2016-11-09 12:04:04 -08:00
James Zern	738c8f23c6	enable vpx_idct32x32_34_add_neon in hbd builds replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is used in idct32_8_neon() where the input is the correctly sized output from the earlier stage. BUG=webm:1294 Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9	2016-11-08 17:03:36 -08:00
Johann	50b40f114c	Optimize idct32x32_135_add for NEON BUG=webm:1295 Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462	2016-11-08 22:06:07 +00:00
Linfeng Zhang	64a5a8fd6f	Merge "Add high bitdepth intra prediction NEON optimization (mode dc)"	2016-11-08 16:53:42 +00:00
Linfeng Zhang	d545c19afa	Rename vpx_highbd_idct8x8_10{}() to vpx_highbd_idct8x8_12{}() Also update its trigger threshold from 10 to 12. Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f	2016-11-07 09:07:55 -08:00
Linfeng Zhang	a9874961f0	Merge "Replace highbd_dct_const_round_shift with dct_const_round_shift"	2016-11-07 16:55:01 +00:00
Johann	e10c95dc83	Update vp9_fdct8x8_quant_ssse3 for highbitdepth Borrow transition functions from fdct.h nee vpx_quantize_b_sse2 BUG=webm:1304 Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1	2016-11-05 01:23:07 +00:00
Linfeng Zhang	04c3bf3c85	Replace highbd_dct_const_round_shift with dct_const_round_shift They are identical. Change-Id: I1ccaf03c81c3cbf88e82d77ffeb8204f5b063c61	2016-11-04 16:15:02 -07:00
Linfeng Zhang	32326c2f13	Merge "Cosmetics of inv_txfm.c"	2016-11-04 22:40:03 +00:00
Johann Koenig	900ec31bea	Merge "Extract high bit depth helper functions"	2016-11-04 21:03:17 +00:00
Linfeng Zhang	b68d8107cb	Cosmetics of inv_txfm.c Unify code of 8-bit and high bitdepth. Change-Id: I3fe441577af0249030ca3a1ef769eb9030711434	2016-11-04 13:24:41 -07:00
Johann	cf35ffc025	Extract high bit depth helper functions These can be used in the vp9 fdct as well. Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e	2016-11-04 18:13:51 +00:00
Martin Storsjo	34c35b6fb6	Add a missing END directive in idct_neon.asm This fixes building with MS armasm. Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03	2016-11-04 12:21:18 +02:00
Linfeng Zhang	1338c71dfb	Clean horizontal intra prediction NEON optimization Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05	2016-11-02 11:43:45 -07:00
Linfeng Zhang	1868582e7d	Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction Change-Id: I852616794244490123eb615ac750da50265f0fa5	2016-11-02 11:40:37 -07:00
Johann Koenig	5ac7a59a05	Merge "arm idct: move to-be-shared code to header"	2016-11-02 18:09:45 +00:00
Linfeng Zhang	3b74066b10	Add high bitdepth intra prediction NEON optimization (mode dc) BUG=webm:1316 Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f	2016-11-01 17:07:36 -07:00
Johann	bf8ab194ee	arm idct: move to-be-shared code to header Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e	2016-11-01 15:43:56 -07:00
James Zern	1b275ab898	Merge "idct32x32_1_add_neon: clear a couple conv warnings"	2016-11-01 22:34:59 +00:00
James Zern	9de91855ef	Merge changes I08af3a54,If5959a25,I6763e62e * changes: build/make/Android.mk: s/armv8/arm64/ build/make/Android.mk: fix armeabi-v7a build use .S suffix rather than .s for NEON asm	2016-11-01 21:43:13 +00:00
Linfeng Zhang	cc5f49767a	Refine 8-bit intra prediction NEON optimization (mode tm) Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c	2016-11-01 09:45:16 -07:00
James Zern	7625c803b3	idct32x32_1_add_neon: clear a couple conv warnings int16_t -> uint8_t Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76	2016-10-31 18:56:34 -07:00
James Zern	1ddb4c0362	use .S suffix rather than .s for NEON asm for compatibility with other build systems Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3	2016-10-31 16:39:05 -07:00
James Zern	410d947c5f	Merge "idct,NEON: add a tran_low_t->s16 load adapter"	2016-10-31 21:59:12 +00:00
James Zern	3ae25974fd	idct,NEON: add a tran_low_t->s16 load adapter enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the expansion can be avoided at all in this case remains a TODO. roughly matches sse2. BUG=webm:1294 Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036	2016-10-31 11:21:16 -07:00
Linfeng Zhang	a347118f3c	Refine 8-bit intra prediction NEON optimization (mode h and v) Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854	2016-10-31 10:33:44 -07:00
Linfeng Zhang	4ae9f5c092	Refine 8-bit intra prediction NEON optimization (mode d45 and d135) dst += stride behaving better with gcc/clang. Unroll loops. Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1	2016-10-27 14:24:50 -07:00
Linfeng Zhang	9c0680bd43	Merge "Refine 8-bit intra prediction NEON optimization (mode dc)"	2016-10-26 16:51:44 +00:00
Johann	9720b58aac	Optimize idct32x32_34_add for NEON Approximately 3 times faster than the 1024 version which was used previously. BUG=webm:1295 Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9	2016-10-25 15:43:58 -07:00
Linfeng Zhang	ce88b8f5c5	Refine 8-bit intra prediction NEON optimization (mode dc) dst += stride behaving better with gcc/clang. Expanding inline function dc_SIZExSIZE() saves instructions for vpx_dc_predictor_SIZExSIZE_neon(). Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d	2016-10-24 13:18:51 -07:00
James Zern	2e6a1976a0	Merge "remove idct32x32*_add_neon.asm"	2016-10-22 02:29:56 +00:00
James Zern	5d91752a98	Merge "vpx_highbd_convolve_copy_neon: use multi reg loads"	2016-10-22 02:28:15 +00:00
James Zern	9dbb3ad396	remove idct32x32*_add_neon.asm the intrinsics are neutral to ~20% faster on cros/android devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the r13 ndk. neutral results typically came with gcc-4.9 while larger positive gains were achieved with clang 3.8.x. BUG=webm:1303 Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16	2016-10-20 19:47:14 -07:00
James Zern	a60dd5c83a	Merge "Fix warnings reported by -Wshadow: Part1: vpx_dsp directory"	2016-10-18 22:09:29 +00:00
Kaustubh Raste	8ff5af773a	Merge "Optimize sad_64width_x4d_msa function"	2016-10-18 07:46:02 +00:00
Kaustubh Raste	b7310e2aff	Optimize sad_64width_x4d_msa function Reduced HADD_UH_U32 macro calls Change-Id: Ie089b9a443de516646b46e8f72156aa826ca8cfa	2016-10-18 04:05:33 +00:00
Urvang Joshi	e084e05484	Fix warnings reported by -Wshadow: Part1: vpx_dsp directory While we are at it: - Rename some variables to more meaningful names - Reuse some common consts from a header instead of redefining them. Change-Id: I75c4248cb75aa54c52111686f139b096dc119328 (cherry picked from aomedia 09eea21)	2016-10-17 19:25:19 -07:00
James Zern	68cd3052ca	vpx_highbd_convolve_copy_neon: use multi reg loads for copy16/32/64 BUG=webm:1299 Change-Id: I5080d736bde7e487c80ef3d7024dda1e96a57eaf	2016-10-17 17:15:03 -07:00
Linfeng Zhang	9c8981c666	add vpx high bitdepth convolve8 NEON intrinsics optimization BUG=webm:1299 Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1	2016-10-17 15:23:54 -07:00
Linfeng Zhang	f910d14a1a	add vpx_highbd_convolve_{copy,avg}_neon() BUG=webm:1299 Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538	2016-10-13 15:21:14 -07:00
James Zern	1909270f65	Merge "cosmetics,*loopfilter_neon.c: s/tranpose/transpose/"	2016-10-13 07:12:51 +00:00
Kaustubh Raste	9e75c01353	Merge "Optimize vpx_mbpost_proc_across_ip_msa function"	2016-10-13 02:12:33 +00:00
Kaustubh Raste	99adf8b22e	Merge "Optimize vpx_get4x4sse_cs_msa function"	2016-10-13 02:12:00 +00:00
James Zern	fd270437f0	cosmetics,*loopfilter_neon.c: s/tranpose/transpose/ Change-Id: I267d6a9d715ddb6110f0881c2e820c37fc673fe1	2016-10-12 16:12:56 -07:00
Linfeng Zhang	01454ec485	[vpx highbd lpf NEON 6/6] vertical 16 BUG=webm:1300 Change-Id: I29d0b482d66f05e278325ddebcf108fbf0b6e222	2016-10-11 22:59:19 -07:00
Linfeng Zhang	27479775c4	[vpx highbd lpf NEON 5/6] horizontal 16 BUG=webm:1300 Change-Id: I21da32d6cfb8a1a6f58bc9756d17f48f13a59a12	2016-10-11 22:59:19 -07:00
Linfeng Zhang	251cbfbec8	[vpx highbd lpf NEON 4/6] vertical 8 BUG=webm:1300 Change-Id: If06b12bc081bab60059b100414dd7018f83ac62d	2016-10-11 22:59:19 -07:00
Linfeng Zhang	96c7206ede	[vpx highbd lpf NEON 3/6] horizontal 8 BUG=webm:1300 Change-Id: Ica2379e294be60b7f80fcfcec110dca4c3b59d81	2016-10-12 00:48:31 +00:00
Linfeng Zhang	57e4cbc632	Merge "[vpx highbd lpf NEON 2/6] vertical 4"	2016-10-10 16:57:55 +00:00
Linfeng Zhang	19046d9963	Merge "[vpx highbd lpf NEON 1/6] horizontal 4"	2016-10-10 16:56:23 +00:00
Kaustubh Raste	3da752fe00	Optimize vpx_mbpost_proc_across_ip_msa function Removed HADD_SW_S32 calculation Change-Id: I7384dc881451d197404d09beb7c27b222e1d6875	2016-10-10 18:03:28 +05:30
Kaustubh Raste	d05104b488	Optimize vpx_get4x4sse_cs_msa function Reuse CALC_MSE_B macro Change-Id: I39f0a92ac2dbb5fa8628df1a5d556cfdc42a3648	2016-10-10 16:31:57 +05:30
Kaustubh Raste	3c2f7eb339	Optimize vp9 loopfilter msa functions Updated code to process in 8bit as saturation/clipping takes care of overflow Removed unused macro Change-Id: I113df60286fb28b216df800d95b2d3695ef71440	2016-10-07 19:26:26 -07:00
Linfeng Zhang	49aa9b1f12	[vpx highbd lpf NEON 2/6] vertical 4 BUG=webm:1300 Change-Id: Ia33a9f2d6c7e2e6b3497ad6f1a09439a85b33983	2016-10-06 14:22:26 -07:00
Linfeng Zhang	7aa27bd62f	[vpx highbd lpf NEON 1/6] horizontal 4 BUG=webm:1300 Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40	2016-10-06 14:03:04 -07:00
James Zern	1e1caad165	vpx_dsp/idct*_neon.asm: simplify immediate loads mov supports 0-65535 Change-Id: I019de0d784836d7bd60e6b36f2cdeefb541cb3fd	2016-10-05 14:28:32 -07:00
James Zern	a6be7ba1aa	enable idct*_1_add_neon in high-bitdepth builds these are compatible as they only load one element of the input so the larger size of tran_low_t makes no difference in little endian builds. note the asm is incompatible with big-endian, but there are other points of failure there so currently it's considered unsupported. BUG=webm:1294 Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028	2016-10-05 11:14:25 -07:00
Angie Chiang	5d635365bb	Merge "Move highbd txfm input range check from 2d iht transform to 1d idct/iadst"	2016-10-04 16:57:37 +00:00
Kaustubh Raste	0a92dd7319	Merge "Fix vpx_plane_add_noise_msa functionality bit-mismatch"	2016-10-04 06:35:47 +00:00
Angie Chiang	5b073c695b	Move highbd txfm input range check from 2d iht transform to 1d idct/iadst This change will make the highbd txfm input range check more comprehensive. The 25-bit highbd input range is composed of 12 signal input bits + 7 bits for 2D forward transform amplification + 5 bits for 1D inverse transform amplification + 1 bit for contingency in rounding and quantizing. BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286 BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=651625 Change-Id: I04c0796edd7653f8d463fba5dc418132986131e7	2016-10-03 17:21:08 -07:00
James Zern	c6bc7499d9	Merge "cosmetics,*_neon.c: rm redundant return from void fns"	2016-10-03 22:40:42 +00:00
Kaustubh Raste	6922fc8230	Fix vpx_plane_add_noise_msa functionality bit-mismatch Change-Id: I04961afb592ae6a67fdcfd8c9066e920dd4b30e7	2016-10-03 18:15:59 +00:00
James Zern	50b9c467da	Merge "vpx_convolve8_neon,load/store*: correct param type"	2016-10-01 23:52:14 +00:00
James Zern	c449983c56	vpx_convolve8_neon,load/store*: correct param type stride/pitch in convolve is expressed with a ptrdiff_t Change-Id: Ia5a6732dc509f06ccf7035386fa8ae721b4b1a71	2016-10-01 11:03:29 -07:00
Martin Storsjo	9255328f27	Remove a stray END declaration in loopfilter_4_neon.asm Change-Id: Ic8c359a5677f9c663787aac74f530e886163bc69	2016-10-01 14:12:42 +03:00
Linfeng Zhang	da14d23e44	Merge "Refactor vpx lpf NEON files (step 2/2)"	2016-10-01 00:07:51 +00:00
Linfeng Zhang	edbca72a53	Merge "Refactor vpx lpf NEON files (step 1/2)"	2016-10-01 00:07:31 +00:00
James Zern	db80c23fd4	cosmetics,*_neon.c: rm redundant return from void fns + a couple of 'break's after a return Change-Id: Ia21f12ebcef98244feb923c17b689fc8115da015	2016-09-30 13:09:57 -07:00
James Zern	b6277a47c7	Merge changes from topic '8bit-hbd-idct' * changes: idct_neon.c: add missing rtcd include idct,msa/neon: exclude idct files from hbd build *rtcd_defs.pl: remove empty specialize calls	2016-09-30 19:36:08 +00:00
James Zern	1396d12103	idct_neon.c: add missing rtcd include + correct declarations as necessary BUG=webm:1294 Change-Id: I719602df9a56e79188a78e7f8b31257c6d3cc11d	2016-09-30 11:41:26 -07:00
James Zern	b51c4df93a	idct,msa/neon: exclude idct files from hbd build these functions are incompatible currently and unreferenced in rtcd, exclude them from the build. BUG=webm:1294 Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896	2016-09-30 11:32:47 -07:00
Linfeng Zhang	ca2fe7a8c7	Refactor vpx lpf NEON files (step 2/2) Change-Id: I0744407cd3361ff752bd7f6e654b70ab6b41a58f	2016-09-30 09:56:28 -07:00
Linfeng Zhang	4779f5308d	Refactor vpx lpf NEON files (step 1/2) Change-Id: I4016d096d46ca691f3b17199b259b7231e983cfb	2016-09-30 09:48:54 -07:00
Linfeng Zhang	8c744fd978	Merge "Unify loopfilter function names"	2016-09-30 15:58:08 +00:00
Linfeng Zhang	c435b7fbdd	Merge "Refine vpx convolve8 NEON intrinsics optimization"	2016-09-30 15:56:31 +00:00
Linfeng Zhang	bde905cba1	Merge "Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()"	2016-09-30 15:54:02 +00:00
James Zern	ed62d27c71	*rtcd_defs.pl: remove empty specialize calls add_proto adds a 'c' specialization Change-Id: I0ed0c2240d45264b0e0056ce7c8f63f4a00780bc	2016-09-29 20:38:26 -07:00
Linfeng Zhang	7f1f35183a	Unify loopfilter function names Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16(). Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual(). Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52	2016-09-29 16:25:42 -07:00
Linfeng Zhang	85a9e48d25	Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon() BUG=webm:1290 Change-Id: Ia27e58521eba5a4852b50381c56746fa5767f6d6	2016-09-29 16:19:39 -07:00
Johann Koenig	ad55b1d270	Merge changes Ia3e9122f,Id33eb6c8,I956bd8ce * changes: Remove vp8_clear_system_state vpx_dsp: clean up rtcd vp8: clean up rtcd	2016-09-29 23:16:45 +00:00
Linfeng Zhang	b3cb065ee4	Refine vpx convolve8 NEON intrinsics optimization BUG=webm:1290 Change-Id: I5d7fce62270f9d76ef9ce98b3d188ad11fb21873	2016-09-29 12:48:59 -07:00
Johann	7b5a348088	vpx_dsp: clean up rtcd Remove avx2+ssse3 specialization. Disabling ssse3 now automatically disables avx2. Change-Id: Id33eb6c85d1c4ee57128ebe45c995eb15cfcc765	2016-09-29 12:10:07 -07:00
James Zern	93c823e24b	vpx_dsp/get_prob: relocate den == 0 test to get_binary_prob(). the only other caller mode_mv_merge_probs() does its own test on 0. BUG=chromium:639712 Change-Id: I1178688706baeca2883f7aadbc254abb219a44ce	2016-09-28 17:42:49 -07:00
James Zern	7481edb33f	vpx_dsp/get_prob: make clip_prob branchless + inline the function directly as there was only one consumer (get_prob()) this is an attempt to reduce the amount of branches to workaround an amd bug. this change is mildly faster or neutral across x86-64, arm. http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf 665 Integer Divide Instruction May Cause Unpredictable Behavior BUG=chromium:639712 Suggested-by: Pascal Massimino <pascal.massimino@gmail.com> Change-Id: Ia91823aded79aab469dd68095d44300e8df04ed2	2016-09-28 11:51:46 -07:00
Johann	02fa245d15	mips: clean up wextra warnings Remove unused zbin variable: warning: unused parameter ‘zbin’ Use int for loop variables to avoid unsigned conversion: warning: comparison between signed and unsigned integer expressions Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad	2016-09-27 13:19:18 -07:00
Urvang Joshi	0aa3e2564f	Add compiler warning flag -Wextra and fix related warnings. Note: some of these warnings are enabled by a combination of -Wunused (added earlier) and -Wextra. Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16 Expands use of (void)x; on unused variables. AOM only supports one codec in codec_factory.h Does not include changes to HandleDecodeResult. AOM removed invalid_file_test.cc which does use the video parameter. Does not enable -Wextra yet. There are more issues to fix. BUG=webm:1069 Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf	2016-09-27 12:05:01 -07:00
Linfeng Zhang	b46243d7ff	Merge "Refactor lpf (size 4 and 8) NEON intrinsics optimization"	2016-09-26 16:11:12 +00:00
James Zern	deadda3dea	Merge "vpx_idct32x32_34_add_sse2: rm unneeded transposes"	2016-09-23 02:49:26 +00:00
James Zern	fdd1186f97	vpx_idct32x32_34_add_sse2: rm unneeded transposes this change is neutral to mildly positive across various x86-64 platforms Change-Id: I28fb5ae598fc1317b7a42c9a846ac5d57d104784	2016-09-21 19:49:25 -07:00
James Zern	e372bfd5ac	variance_neon: sync variance*() w/c,sse2 removes some unnecessary casts and adds a few explicit uint32 ones for larger sizes to quiet -Wshorten-64-to-32 warnings Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518	2016-09-21 18:04:45 -07:00
Linfeng Zhang	761e5ec2f6	Refactor lpf (size 4 and 8) NEON intrinsics optimization Also check in 8x8 8-bit transpose NEON intrinsics optimization transpose_u8_8x8() Change-Id: I32d321cf97ea21eab158ac4896990fc9a51681c4	2016-09-19 16:41:37 -07:00
James Zern	6acd061aad	variance_avx2: sync variance functions with c-code add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78	2016-09-19 16:19:29 -07:00
James Zern	aa0eb67bf7	loopfilter_mb_neon: remove unused load_8x8() quiets a -Wunused-function warning for arm targets Change-Id: I293a7e3d3d7d61d6af2fbedad5e8c25126c418b6	2016-09-17 11:00:31 -07:00
Linfeng Zhang	5d73639d8f	Merge "Refactor lpf (size 16) NEON intrinsics optimization"	2016-09-17 00:33:30 +00:00

721 Commits