Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.
BUG=webm:1424
Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values to the first pass, it uses
vld1_u32(), which typically does impose a 4-byte alignment. However, as
the first pass operates on user-supplied values we must prepare for
unaligned values anyway (and have, see mem_neon.h).
But for the local temporary values there is no stride and the load will
use vld1_u8, which does not require 4-byte alignment.
There are 3 temporary structures. In the C code, one is uint16_t. The
arm code saturates between passes but still passes the tests. If this
becomes an issue, new functions will be needed.
Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
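For illustration, a minimal sketch of the unaligned-load idea in the
spirit of mem_neon.h (the helper name and exact signature here are
hypothetical):

    #include <arm_neon.h>
    #include <string.h>

    /* Gather two potentially unaligned 4-byte rows. memcpy carries no
       alignment hint, unlike vld1_u32()/vld1_lane_u32() on a cast
       pointer. */
    static inline uint8x8_t load_unaligned_u8_sketch(const uint8_t *buf,
                                                     int stride) {
      uint32_t a;
      uint32x2_t a_u32;
      memcpy(&a, buf, 4);          /* row 0, any alignment */
      a_u32 = vdup_n_u32(a);
      memcpy(&a, buf + stride, 4); /* row 1 */
      a_u32 = vset_lane_u32(a, a_u32, 1);
      return vreinterpret_u8_u32(a_u32);
    }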
The sub-pixel variance uses a temp buffer, which guarantees width ==
stride. Take advantage of this for the 4x-wide blocks and avoid the
very costly lane loads.
Change-Id: Ia0c97eb8c29dc8dfa6e51a29dff9b75b3c6726f1
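As a sketch of what width == stride buys (name hypothetical): the four
4-byte rows of a 4x4 block sit back to back in the temp buffer, so one
Q-register load replaces four lane loads.

    #include <arm_neon.h>

    /* width == stride == 4: rows 0..3 are contiguous, so a single
       vld1q_u8 replaces four vld1_lane_u32 loads. */
    static inline uint8x16_t load_4x4_contiguous(const uint8_t *buf) {
      return vld1q_u8(buf);
    }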
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.
BUG=webm:1422
Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
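A hedged sketch of the sets-of-16 idea for the 4x4 case (function name
hypothetical; assumes the width == stride temp buffer noted above):

    #include <arm_neon.h>
    #include <stdint.h>

    static inline void variance_4x4_sketch(const uint8_t *src,
                                           const uint8_t *ref,
                                           unsigned int *sse, int *sum) {
      const uint8x16_t s = vld1q_u8(src); /* 4 rows x 4 cols, stride 4 */
      const uint8x16_t r = vld1q_u8(ref);
      /* vsubl_u8 wraps mod 2^16; reinterpreting as s16 recovers the
         signed difference because |src - ref| <= 255. */
      const int16x8_t d0 =
          vreinterpretq_s16_u16(vsubl_u8(vget_low_u8(s), vget_low_u8(r)));
      const int16x8_t d1 =
          vreinterpretq_s16_u16(vsubl_u8(vget_high_u8(s), vget_high_u8(r)));
      const int32x4_t sum32 = vpaddlq_s16(vaddq_s16(d0, d1));
      int32x4_t sse32 = vmull_s16(vget_low_s16(d0), vget_low_s16(d0));
      sse32 = vmlal_s16(sse32, vget_high_s16(d0), vget_high_s16(d0));
      sse32 = vmlal_s16(sse32, vget_low_s16(d1), vget_low_s16(d1));
      sse32 = vmlal_s16(sse32, vget_high_s16(d1), vget_high_s16(d1));
      /* Pairwise folds keep the reduction ARMv7-friendly. */
      {
        const int32x2_t s2 =
            vpadd_s32(vget_low_s32(sum32), vget_high_s32(sum32));
        const int32x2_t e2 =
            vpadd_s32(vget_low_s32(sse32), vget_high_s32(sse32));
        *sum = vget_lane_s32(vpadd_s32(s2, s2), 0);
        *sse = (unsigned int)vget_lane_s32(vpadd_s32(e2, e2), 0);
      }
    }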
Advise the compiler that the store is eventually going to a uint8_t
buffer. This helps avoid alignment hints, which would cause the memory
access to fail.
Originally added as a workaround for clang:
https://bugs.llvm.org//show_bug.cgi?id=24421
Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e
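A sketch of the workaround (helper name hypothetical): extract the lane
to a scalar and store through memcpy, so the destination keeps its
uint8_t type and the compiler cannot assume 4-byte alignment.

    #include <arm_neon.h>
    #include <string.h>

    static inline void store_u32_to_u8_sketch(uint8_t *buf,
                                              const uint32x2_t a) {
      const uint32_t v = vget_lane_u32(a, 0);
      memcpy(buf, &v, 4); /* no alignment assumption on buf */
    }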
Move the tran_low_t helper functions to a new file. Additional
load/store functions will be added here.
Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3
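A sketch of the kind of helper being collected here (the real names in
the new file may differ): tran_low_t is int32_t in high bit-depth
builds and int16_t otherwise, so only the HBD path needs to narrow.

    #include <arm_neon.h>
    #include <stdint.h>

    #if CONFIG_VP9_HIGHBITDEPTH
    typedef int32_t tran_low_t; /* as in vpx_dsp_common.h */
    #else
    typedef int16_t tran_low_t;
    #endif

    static inline int16x8_t load_tran_low_to_s16_sketch(
        const tran_low_t *buf) {
    #if CONFIG_VP9_HIGHBITDEPTH
      const int32x4_t v0 = vld1q_s32(buf);
      const int32x4_t v1 = vld1q_s32(buf + 4);
      return vcombine_s16(vmovn_s32(v0), vmovn_s32(v1)); /* 32 -> 16 */
    #else
      return vld1q_s16(buf);
    #endif
    }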
Read in a Q register. Works on blocks of 16 and larger.
Improvement of about 20% for 64x64. The smaller blocks are faster, but
don't have quite the same level of improvement. 16x32 is only about 5%.
BUG=webm:1422
Change-Id: Ie11a877c7b839e66690a48117a46657b2ac82d4b
Add support for everything except block sizes of 4.
Performance is better but numbers will improve again when the variance
optimizations land.
BUG=webm:1423
Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
When a neon version is available it will be called. This allows
decoupling the variance implementations and has no real downside. For
most configurations, the call will be #define'd to the neon
implementation.
Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712
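In concrete terms, for a static NEON configuration the generated
vpx_dsp_rtcd.h resolves each symbol at compile time, along the lines of
(illustrative line; the exact symbol set depends on configuration):

    #define vpx_variance8x8 vpx_variance8x8_neon

so the cross-call from dependent code costs nothing extra.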
When the width is equal to 8, process two rows at a time. This doubles
the speed of 8x4 and improves 8x8 by about 20%.
8x16 was using this technique already, but still improved a little bit
with the rewrite.
Also use this for vpx_get8x8var_neon.
BUG=webm:1422
Change-Id: Id602909afcec683665536d11298b7387ac0a1207
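A sketch of the two-rows-at-a-time idea (name hypothetical): for
width == 8, pack two consecutive rows into one Q register so each loop
iteration covers 16 pixels.

    #include <arm_neon.h>

    static inline uint8x16_t load_u8_8x2_sketch(const uint8_t *buf,
                                                int stride) {
      const uint8x8_t row0 = vld1_u8(buf);
      const uint8x8_t row1 = vld1_u8(buf + stride);
      return vcombine_u8(row0, row1); /* rows n and n+1 in one register */
    }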
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.
Compared with the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. The 8x16 is very
close, but the existing version appears to be faster.
BUG=webm:1422
Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
Replace with CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short pointer is cast to a byte pointer, any offset
operation on the byte pointer must be doubled. We do this by casting to
a short pointer first, adding the offset, then casting back to a byte
pointer.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
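The rule in code form (macro bodies follow their definitions in
vpx_dsp_common.h; the usage lines are illustrative):

    #include <stdint.h>

    #define CAST_TO_SHORTPTR(x) ((uint16_t *)(x))
    #define CAST_TO_BYTEPTR(x) ((uint8_t *)(x))

    /* Wrong: a byte-pointer offset moves by bytes, i.e. offset/2 pixels:
         buf8 += offset;
       Right: apply the offset in the uint16_t domain, then cast back:
         buf8 = CAST_TO_BYTEPTR(CAST_TO_SHORTPTR(buf8) + offset); */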
Similar issue to Change bc1c18e.
The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in the final stage of
pass 2 when the number of test iterations is raised from 1,000 to
1,000,000.
Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon() and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.
Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
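A minimal sketch of the fix (function name hypothetical): the wrapping
adds/subs in the affected stage become saturating ones, so out-of-range
intermediate sums clamp instead of overflowing int16_t.

    #include <arm_neon.h>

    static inline void butterfly_sat(const int16x8_t a, const int16x8_t b,
                                     int16x8_t *sum, int16x8_t *diff) {
      *sum = vqaddq_s16(a, b);  /* was vaddq_s16 */
      *diff = vqsubq_s16(a, b); /* was vsubq_s16 */
    }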
Most are cosmetic changes.
Speed is unchanged with clang 3.8 and about 5% faster with gcc 4.8.4.
Tried the strategy used in 8x8 and 16x16 (whose operation order is
similar to the C code); though speed gets better with gcc, it's worse
with clang.
Tried to remove store_in_output(), but speed gets worse.
Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() in high
bit-depth mode.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x.
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
Speed tests show a 25% gain for vpx_idct16x16_256_add_neon(), and
vpx_idct16x16_10_add_neon() tripled in speed.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
The speedup is pretty poor. I would be concerned, except the SSE2
improvement is worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
Relocate the assignment to 'in' outside of the for loop. This quiets a
spurious warning in Visual Studio builds since:
86e340c enable vpx_idct32x32_1024_add_neon in hbd builds
Also give the variable a more descriptive name.
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
After:
2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds
Reorder the INCLUDEs and fix the indentation of IF/ENDIFs. Remove
vpx_config.asm to avoid multiple symbol definitions in Windows builds,
and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
the output stride was a constant 8 in all cases, meaning the results
are stored contiguously; this allows the number of stores to be
reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
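A sketch of what the constant stride buys (names hypothetical): rows
land back to back, so the destination pointer simply advances and the
per-row address arithmetic disappears.

    #include <arm_neon.h>

    static inline void store_rows_contiguous(int16_t *out,
                                             const int16x8_t r0,
                                             const int16x8_t r1) {
      vst1q_s16(out, r0);     /* row n */
      vst1q_s16(out + 8, r1); /* row n+1 starts immediately after */
    }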
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
Replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). The combined function
is used in idct32_8_neon(), where the input is the correctly sized
output from the earlier stage.
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9