generic-library/vpx

Author	SHA1	Message	Date
James Zern	9480da21e8	Merge "Refine 8-bit 16x16 idct NEON intrinsics"	2017-01-09 23:52:29 +00:00
Johann Koenig	371a64bfe7	Merge "postproc: vpx_mbpost_proc_down_neon"	2017-01-09 19:53:15 +00:00
Johann Koenig	8a7847c2c9	Merge "Fix mips dspr2 idct32x32 functions for large coefficient input"	2017-01-09 19:47:47 +00:00
Johann Koenig	bf168b24f5	Merge "Fix mips dspr2 idct16x16 functions for large coefficient input"	2017-01-09 19:47:00 +00:00
Johann Koenig	08d0a7fd0f	Merge "Fix mips dspr2 idct8x8 functions for large coefficient input"	2017-01-09 19:46:18 +00:00
Johann Koenig	ab20869221	Merge "Fix mips dspr2 idct4x4 functions for large coefficient input"	2017-01-09 19:45:54 +00:00
Johann	c23970ec25	postproc: vpx_mbpost_proc_down_neon This was much more amenable to optimization than the across filter. Speedup of almost 2.5x BUG=webm:1320 Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4	2017-01-09 10:21:56 -08:00
Johann Koenig	9af97fb630	Merge "postproc: vpx_mbpost_proc_across_ip_neon"	2017-01-09 18:17:26 +00:00
Kaustubh Raste	50dd3eb62c	Fix mips dspr2 idct32x32 functions for large coefficient input Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2	2017-01-09 17:21:09 +05:30
Kaustubh Raste	c06991fce6	Fix mips dspr2 idct16x16 functions for large coefficient input Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a	2017-01-09 16:35:28 +05:30
Kaustubh Raste	24d804f79c	Fix mips dspr2 idct8x8 functions for large coefficient input Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61	2017-01-09 16:22:19 +05:30
Kaustubh Raste	afd2d797eb	Fix mips dspr2 idct4x4 functions for large coefficient input Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89	2017-01-09 15:28:30 +05:30
Linfeng Zhang	6abdd31555	Refine 8-bit 16x16 idct NEON intrinsics Speed test shows 25% gain on vpx_idct16x16_256_add_neon(), and vpx_idct16x16_10_add_neon() got tripled. Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541	2017-01-06 17:52:07 -08:00
Johann	4dca923454	postproc: vpx_mbpost_proc_across_ip_neon The speedup is pretty poor. I would be concerned except the SSE2 is worse: Existing SSE2 improvement: 22% New neon improvement: 35% BUG=webm:1320 Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62	2017-01-06 16:39:17 -08:00
Linfeng Zhang	2d12a52ff0	Merge "Add high bitdepth 8x8 idct NEON intrinsics"	2017-01-06 16:47:23 +00:00
Linfeng Zhang	911bb980b1	Clean DC only idct NEON intrinsics BUG=webm:1301 Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5	2016-12-28 13:51:44 -08:00
Linfeng Zhang	9b187954df	Add high bitdepth 8x8 idct NEON intrinsics BUG=webm:1301 Change-Id: I56e3bc3aab9214e2debac93796389a7194991084	2016-12-27 16:28:53 -08:00
Linfeng Zhang	6d5a3fe583	Clean idct 8x8 neon functions BUG=webm:1301 Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307	2016-12-21 14:24:17 -08:00
James Zern	a68b36c752	vpx_idct32x32_1024_add_neon: quiet uninitialized warning relocate the assignment to 'in' outside of the for loop. this quiets a spurious warning in visual studio builds since: `86e340c` enable vpx_idct32x32_1024_add_neon in hbd builds + give the variable a more descriptive name BUG=webm:1294 Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd	2016-12-19 12:49:44 -08:00
Linfeng Zhang	7e23f895ca	Merge "Clean hbd idct 4x4 neon functions and other"	2016-12-19 17:09:26 +00:00
Johann	41b0888a84	postproc: neon down and across macroblock filter Implement vpx_post_proc_down_and_across_mb_row in NEON. Runs about 6-7x faster than C. BUG=webm:1320 Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2	2016-12-14 15:11:28 -08:00
Linfeng Zhang	c8f25fa5c0	Clean hbd idct 4x4 neon functions and other BUG=webm:1301 Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342	2016-12-14 11:38:28 -08:00
James Zern	86e340c76e	enable vpx_idct32x32_1024_add_neon in hbd builds BUG=webm:1294 Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de	2016-12-12 19:28:35 -08:00
Linfeng Zhang	5d4aa325a6	Cosmetics by unifying dest_stride to stride in idct Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba	2016-12-12 15:13:22 -08:00
Johann	2c24f7178d	Move load_and_transpose to transpose_neon.h Allows for use outside the idcts without pulling in idct_neon.h Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf	2016-12-09 12:54:55 -08:00
James Zern	6defef4ab2	idct16x16_add_neon: fix arm visual studio builds after: `2d3d95f` enable vpx_idct16x16_256_add_neon in hbd builds reorder INCLUDEs and fix indent of IF/ENDIFs remove vpx_config.asm to avoid multiple symbol definitions in windows builds and shift idct_neon.asm.S to the top to allow use of CONFIG_VP9_HIGHBITDEPTH in the export list. Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029	2016-12-08 15:17:57 -08:00
Linfeng Zhang	174528de1e	Merge "Update idct NEON optimization to not use narrowing saturating shift"	2016-12-07 21:03:21 +00:00
James Zern	f16a0a1aa4	Merge "enable vpx_idct16x16_256_add_neon in hbd builds"	2016-12-07 20:26:44 +00:00
Linfeng Zhang	018a2adcb1	Update idct NEON optimization to not use narrowing saturating shift Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b	2016-12-07 10:25:09 -08:00
James Zern	2d3d95f7ac	enable vpx_idct16x16_256_add_neon in hbd builds BUG=webm:1294 Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1	2016-12-06 18:32:21 -08:00
James Zern	228c9940ea	Merge changes Ibad079f2,I7858a0a1 * changes: enable vpx_idct16x16_10_add_neon in hbd builds idct16x16,NEON: rm output_stride from pass1 fns	2016-12-07 01:40:28 +00:00
James Zern	8befcd0089	enable vpx_idct16x16_10_add_neon in hbd builds BUG=webm:1294 Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825	2016-12-06 16:09:19 -08:00
James Zern	af9d7aa9fb	idct16x16,NEON: rm output_stride from pass1 fns vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon: this was a constant 8 in all cases meaning the results are stored contiguously, this allows the number of stores to be reduced. Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e	2016-12-06 15:13:33 -08:00
Linfeng Zhang	cb339d628f	Refine 8-bit 8x8 idct NEON intrinsics Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd	2016-12-05 17:50:14 -08:00
Linfeng Zhang	a8eee97b43	Check in vpx_lpf_vertical_4_dual_neon() assembly This replaces its C version. Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a	2016-12-02 15:54:30 -08:00
James Zern	a7fa1314da	Merge changes I4afc130e,Iaa64d23f * changes: Add high bitdepth 4x4 idct NEON intrinsics Update idct x86 intrinsics to not use saturated add and sub	2016-12-02 04:01:28 +00:00
Linfeng Zhang	17a8cf5cc3	Add high bitdepth 4x4 idct NEON intrinsics Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b	2016-11-30 13:07:13 -08:00
Linfeng Zhang	264f6e70ec	Update idct x86 intrinsics to not use saturated add and sub Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4	2016-11-29 17:06:08 -08:00
James Zern	c6641782c3	idct16x16,NEON,cosmetics: normalize fn signatures + remove unused parameters from vpx_idct16x16_10_add_neon_pass2 Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4	2016-11-28 16:46:01 -08:00
James Zern	21a1abd8e3	enable vpx_idct32x32_135_add_neon in hbd builds BUG=webm:1294 Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4	2016-11-23 19:59:43 -08:00
James Zern	568d4b1d63	idct_neon: rename load_tran_low_to_s16 -> ...s16q BUG=webm:1294 Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b	2016-11-23 19:57:48 -08:00
James Zern	d757d7e998	Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2 * changes: Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test Refine 8-bit 4x4 idct NEON intrinsics Add idct speed test. Update partial_idct_test.cc to support high bitdepth	2016-11-24 03:31:25 +00:00
Jerome Jiang	97ec6291ee	Change C/MSA post proc to match SSE2. BUG=webm:1321 Change-Id: I719023375dc48cf7d8ed72188853f0f1ccc4ad7f	2016-11-23 10:42:11 -08:00
Linfeng Zhang	05e2b5a59f	Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction"	2016-11-22 23:20:53 +00:00
Linfeng Zhang	6cc76ec73f	Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10	2016-11-22 11:35:05 -08:00
Linfeng Zhang	974e81d184	Refine 8-bit 4x4 idct NEON intrinsics Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f	2016-11-22 11:26:03 -08:00
Kaustubh Raste	ecc5998bcf	Fix mips dspr2 build warning Change-Id: Ia8fb3ed124f01384e7896e309c9ff22c05b40719	2016-11-22 17:49:17 +05:30
Kaustubh Raste	a38e9f412d	Merge "Fix SingleLargeCoeff idct test"	2016-11-19 03:37:29 +00:00
James Zern	cbeae53e76	Merge "Clean horizontal intra prediction NEON optimization"	2016-11-19 01:29:37 +00:00
Jerome Jiang	de5fd00ec5	Change _xmm to _sse2 in deblocker assembly functions. Some cosmetic changes because xmm is an anachronism. Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7	2016-11-17 23:38:04 +00:00
Kaustubh Raste	c56e5dd620	Fix SingleLargeCoeff idct test Updated idct code to handle single large coefficient (-32768) Change-Id: Ia13ab1ab434a9a1b9954a5914088977a88841cc7	2016-11-17 11:41:07 +00:00
Jerome Jiang	5d48663e04	Merge "Change C and msa to match results from sse2."	2016-11-17 05:16:27 +00:00
Jerome Jiang	cb1b1b8fef	Change C and msa to match results from sse2. Re-enable the tests to check CvsAssembly. BUG=webm:1321 Change-Id: Id7f7d74b06c469fb6c8f5d04e91359e9cd9097a6	2016-11-16 17:05:26 -08:00
Linfeng Zhang	85c1ee434d	Add high bitdepth intra prediction NEON optimization (mode tm) BUG=webm:1316 Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b	2016-11-15 14:19:46 -08:00
Linfeng Zhang	a3128ad33a	Add high bitdepth intra prediction NEON optimization (h and v) BUG=webm:1316 Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46	2016-11-12 12:00:19 -08:00
James Zern	80f6b243a7	Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3 * changes: fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd partial_idct_test,NEON: add missing idct variants enable vpx_idct32x32_34_add_neon in hbd builds	2016-11-10 05:02:39 +00:00
Linfeng Zhang	40ab0424d4	Add high bitdepth intra prediction NEON optimization (mode d45 and d135) BUG=webm:1316 Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e	2016-11-09 12:04:04 -08:00
James Zern	738c8f23c6	enable vpx_idct32x32_34_add_neon in hbd builds replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is used in idct32_8_neon() where the input is the correctly sized output from the earlier stage. BUG=webm:1294 Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9	2016-11-08 17:03:36 -08:00
Johann	50b40f114c	Optimize idct32x32_135_add for NEON BUG=webm:1295 Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462	2016-11-08 22:06:07 +00:00
Linfeng Zhang	64a5a8fd6f	Merge "Add high bitdepth intra prediction NEON optimization (mode dc)"	2016-11-08 16:53:42 +00:00
Linfeng Zhang	d545c19afa	Rename vpx_highbd_idct8x8_10{}() to vpx_highbd_idct8x8_12{}() Also update its trigger threshold from 10 to 12. Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f	2016-11-07 09:07:55 -08:00
Linfeng Zhang	a9874961f0	Merge "Replace highbd_dct_const_round_shift with dct_const_round_shift"	2016-11-07 16:55:01 +00:00
Johann	e10c95dc83	Update vp9_fdct8x8_quant_ssse3 for highbitdepth Borrow transition functions from fdct.h nee vpx_quantize_b_sse2 BUG=webm:1304 Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1	2016-11-05 01:23:07 +00:00
Linfeng Zhang	04c3bf3c85	Replace highbd_dct_const_round_shift with dct_const_round_shift They are identical. Change-Id: I1ccaf03c81c3cbf88e82d77ffeb8204f5b063c61	2016-11-04 16:15:02 -07:00
Linfeng Zhang	32326c2f13	Merge "Cosmetics of inv_txfm.c"	2016-11-04 22:40:03 +00:00
Johann Koenig	900ec31bea	Merge "Extract high bit depth helper functions"	2016-11-04 21:03:17 +00:00
Linfeng Zhang	b68d8107cb	Cosmetics of inv_txfm.c Unify code of 8-bit and high bitdepth. Change-Id: I3fe441577af0249030ca3a1ef769eb9030711434	2016-11-04 13:24:41 -07:00
Johann	cf35ffc025	Extract high bit depth helper functions These can be used in the vp9 fdct as well. Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e	2016-11-04 18:13:51 +00:00
Martin Storsjo	34c35b6fb6	Add a missing END directive in idct_neon.asm This fixes building with MS armasm. Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03	2016-11-04 12:21:18 +02:00
Linfeng Zhang	1338c71dfb	Clean horizontal intra prediction NEON optimization Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05	2016-11-02 11:43:45 -07:00
Linfeng Zhang	1868582e7d	Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction Change-Id: I852616794244490123eb615ac750da50265f0fa5	2016-11-02 11:40:37 -07:00
Johann Koenig	5ac7a59a05	Merge "arm idct: move to-be-shared code to header"	2016-11-02 18:09:45 +00:00
Linfeng Zhang	3b74066b10	Add high bitdepth intra prediction NEON optimization (mode dc) BUG=webm:1316 Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f	2016-11-01 17:07:36 -07:00
Johann	bf8ab194ee	arm idct: move to-be-shared code to header Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e	2016-11-01 15:43:56 -07:00
James Zern	1b275ab898	Merge "idct32x32_1_add_neon: clear a couple conv warnings"	2016-11-01 22:34:59 +00:00
James Zern	9de91855ef	Merge changes I08af3a54,If5959a25,I6763e62e * changes: build/make/Android.mk: s/armv8/arm64/ build/make/Android.mk: fix armeabi-v7a build use .S suffix rather than .s for NEON asm	2016-11-01 21:43:13 +00:00
Linfeng Zhang	cc5f49767a	Refine 8-bit intra prediction NEON optimization (mode tm) Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c	2016-11-01 09:45:16 -07:00
James Zern	7625c803b3	idct32x32_1_add_neon: clear a couple conv warnings int16_t -> uint8_t Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76	2016-10-31 18:56:34 -07:00
James Zern	1ddb4c0362	use .S suffix rather than .s for NEON asm for compatibility with other build systems Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3	2016-10-31 16:39:05 -07:00
James Zern	410d947c5f	Merge "idct,NEON: add a tran_low_t->s16 load adapter"	2016-10-31 21:59:12 +00:00
James Zern	3ae25974fd	idct,NEON: add a tran_low_t->s16 load adapter enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the expansion can be avoided at all in this case remains a TODO. roughly matches sse2. BUG=webm:1294 Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036	2016-10-31 11:21:16 -07:00
Linfeng Zhang	a347118f3c	Refine 8-bit intra prediction NEON optimization (mode h and v) Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854	2016-10-31 10:33:44 -07:00
Linfeng Zhang	4ae9f5c092	Refine 8-bit intra prediction NEON optimization (mode d45 and d135) dst += stride behaving better with gcc/clang. Unroll loops. Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1	2016-10-27 14:24:50 -07:00
Linfeng Zhang	9c0680bd43	Merge "Refine 8-bit intra prediction NEON optimization (mode dc)"	2016-10-26 16:51:44 +00:00
Johann	9720b58aac	Optimize idct32x32_34_add for NEON Approximately 3 times faster than the 1024 version which was used previously. BUG=webm:1295 Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9	2016-10-25 15:43:58 -07:00
Linfeng Zhang	ce88b8f5c5	Refine 8-bit intra prediction NEON optimization (mode dc) dst += stride behaves better with gcc/clang. Expanding inline function dc_SIZExSIZE() saves instructions for vpx_dc_predictor_SIZExSIZE_neon(). Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d	2016-10-24 13:18:51 -07:00
James Zern	2e6a1976a0	Merge "remove idct32x32*_add_neon.asm"	2016-10-22 02:29:56 +00:00
James Zern	5d91752a98	Merge "vpx_highbd_convolve_copy_neon: use multi reg loads"	2016-10-22 02:28:15 +00:00
James Zern	9dbb3ad396	remove idct32x32*_add_neon.asm the intrinsics are neutral to ~20% faster on cros/android devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the r13 ndk. neutral results typically came with gcc-4.9 while larger positive gains were achieved with clang 3.8.x. BUG=webm:1303 Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16	2016-10-20 19:47:14 -07:00
James Zern	a60dd5c83a	Merge "Fix warnings reported by -Wshadow: Part1: vpx_dsp directory"	2016-10-18 22:09:29 +00:00
Kaustubh Raste	8ff5af773a	Merge "Optimize sad_64width_x4d_msa function"	2016-10-18 07:46:02 +00:00
Kaustubh Raste	b7310e2aff	Optimize sad_64width_x4d_msa function Reduced HADD_UH_U32 macro calls Change-Id: Ie089b9a443de516646b46e8f72156aa826ca8cfa	2016-10-18 04:05:33 +00:00
Urvang Joshi	e084e05484	Fix warnings reported by -Wshadow: Part1: vpx_dsp directory While we are at it: - Rename some variables to more meaningful names - Reuse some common consts from a header instead of redefining them. Change-Id: I75c4248cb75aa54c52111686f139b096dc119328 (cherry picked from aomedia 09eea21)	2016-10-17 19:25:19 -07:00
James Zern	68cd3052ca	vpx_highbd_convolve_copy_neon: use multi reg loads for copy16/32/64 BUG=webm:1299 Change-Id: I5080d736bde7e487c80ef3d7024dda1e96a57eaf	2016-10-17 17:15:03 -07:00
Linfeng Zhang	9c8981c666	add vpx high bitdepth convolve8 NEON intrinsics optimization BUG=webm:1299 Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1	2016-10-17 15:23:54 -07:00
Linfeng Zhang	f910d14a1a	add vpx_highbd_convolve_{copy,avg}_neon() BUG=webm:1299 Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538	2016-10-13 15:21:14 -07:00
James Zern	1909270f65	Merge "cosmetics,*loopfilter_neon.c: s/tranpose/transpose/"	2016-10-13 07:12:51 +00:00
Kaustubh Raste	9e75c01353	Merge "Optimize vpx_mbpost_proc_across_ip_msa function"	2016-10-13 02:12:33 +00:00
Kaustubh Raste	99adf8b22e	Merge "Optimize vpx_get4x4sse_cs_msa function"	2016-10-13 02:12:00 +00:00
James Zern	fd270437f0	cosmetics,*loopfilter_neon.c: s/tranpose/transpose/ Change-Id: I267d6a9d715ddb6110f0881c2e820c37fc673fe1	2016-10-12 16:12:56 -07:00
Linfeng Zhang	01454ec485	[vpx highbd lpf NEON 6/6] vertical 16 BUG=webm:1300 Change-Id: I29d0b482d66f05e278325ddebcf108fbf0b6e222	2016-10-11 22:59:19 -07:00
Linfeng Zhang	27479775c4	[vpx highbd lpf NEON 5/6] horizontal 16 BUG=webm:1300 Change-Id: I21da32d6cfb8a1a6f58bc9756d17f48f13a59a12	2016-10-11 22:59:19 -07:00
Linfeng Zhang	251cbfbec8	[vpx highbd lpf NEON 4/6] vertical 8 BUG=webm:1300 Change-Id: If06b12bc081bab60059b100414dd7018f83ac62d	2016-10-11 22:59:19 -07:00
Linfeng Zhang	96c7206ede	[vpx highbd lpf NEON 3/6] horizontal 8 BUG=webm:1300 Change-Id: Ica2379e294be60b7f80fcfcec110dca4c3b59d81	2016-10-12 00:48:31 +00:00
Linfeng Zhang	57e4cbc632	Merge "[vpx highbd lpf NEON 2/6] vertical 4"	2016-10-10 16:57:55 +00:00
Linfeng Zhang	19046d9963	Merge "[vpx highbd lpf NEON 1/6] horizontal 4"	2016-10-10 16:56:23 +00:00
Kaustubh Raste	3da752fe00	Optimize vpx_mbpost_proc_across_ip_msa function Removed HADD_SW_S32 calculation Change-Id: I7384dc881451d197404d09beb7c27b222e1d6875	2016-10-10 18:03:28 +05:30
Kaustubh Raste	d05104b488	Optimize vpx_get4x4sse_cs_msa function Reuse CALC_MSE_B macro Change-Id: I39f0a92ac2dbb5fa8628df1a5d556cfdc42a3648	2016-10-10 16:31:57 +05:30
Kaustubh Raste	3c2f7eb339	Optimize vp9 loopfilter msa functions Updated code to process in 8bit as saturation/clipping takes care of overflow Removed unused macro Change-Id: I113df60286fb28b216df800d95b2d3695ef71440	2016-10-07 19:26:26 -07:00
Linfeng Zhang	49aa9b1f12	[vpx highbd lpf NEON 2/6] vertical 4 BUG=webm:1300 Change-Id: Ia33a9f2d6c7e2e6b3497ad6f1a09439a85b33983	2016-10-06 14:22:26 -07:00
Linfeng Zhang	7aa27bd62f	[vpx highbd lpf NEON 1/6] horizontal 4 BUG=webm:1300 Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40	2016-10-06 14:03:04 -07:00
James Zern	1e1caad165	vpx_dsp/idct*_neon.asm: simplify immediate loads mov supports 0-65535 Change-Id: I019de0d784836d7bd60e6b36f2cdeefb541cb3fd	2016-10-05 14:28:32 -07:00
James Zern	a6be7ba1aa	enable idct*_1_add_neon in high-bitdepth builds these are compatible as they only load one element of the input so the larger size of tran_low_t makes no difference in little endian builds. note the asm is incompatible with big-endian, but there are other points of failure there so currently it's considered unsupported. BUG=webm:1294 Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028	2016-10-05 11:14:25 -07:00
Angie Chiang	5d635365bb	Merge "Move highbd txfm input range check from 2d iht transform to 1d idct/iadst"	2016-10-04 16:57:37 +00:00
Kaustubh Raste	0a92dd7319	Merge "Fix vpx_plane_add_noise_msa functionality bit-mismatch"	2016-10-04 06:35:47 +00:00
Angie Chiang	5b073c695b	Move highbd txfm input range check from 2d iht transform to 1d idct/iadst This change will make the highbd txfm input range check more comprehensive. The 25-bit highbd input range is composed of 12 signal input bits + 7 bits for 2D forward transform amplification + 5 bits for 1D inverse transform amplification + 1 bit for contingency in rounding and quantizing BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286 BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=651625 Change-Id: I04c0796edd7653f8d463fba5dc418132986131e7	2016-10-03 17:21:08 -07:00
James Zern	c6bc7499d9	Merge "cosmetics,*_neon.c: rm redundant return from void fns"	2016-10-03 22:40:42 +00:00
Kaustubh Raste	6922fc8230	Fix vpx_plane_add_noise_msa functionality bit-mismatch Change-Id: I04961afb592ae6a67fdcfd8c9066e920dd4b30e7	2016-10-03 18:15:59 +00:00
James Zern	50b9c467da	Merge "vpx_convolve8_neon,load/store*: correct param type"	2016-10-01 23:52:14 +00:00
James Zern	c449983c56	vpx_convolve8_neon,load/store*: correct param type stride/pitch in convolve is expressed with a ptrdiff_t Change-Id: Ia5a6732dc509f06ccf7035386fa8ae721b4b1a71	2016-10-01 11:03:29 -07:00
Martin Storsjo	9255328f27	Remove a stray END declaration in loopfilter_4_neon.asm Change-Id: Ic8c359a5677f9c663787aac74f530e886163bc69	2016-10-01 14:12:42 +03:00
Linfeng Zhang	da14d23e44	Merge "Refactor vpx lpf NEON files (step 2/2)"	2016-10-01 00:07:51 +00:00
Linfeng Zhang	edbca72a53	Merge "Refactor vpx lpf NEON files (step 1/2)"	2016-10-01 00:07:31 +00:00
James Zern	db80c23fd4	cosmetics,*_neon.c: rm redundant return from void fns + a couple of 'break's after a return Change-Id: Ia21f12ebcef98244feb923c17b689fc8115da015	2016-09-30 13:09:57 -07:00
James Zern	b6277a47c7	Merge changes from topic '8bit-hbd-idct' * changes: idct_neon.c: add missing rtcd include idct,msa/neon: exclude idct files from hbd build *rtcd_defs.pl: remove empty specialize calls	2016-09-30 19:36:08 +00:00
James Zern	1396d12103	idct_neon.c: add missing rtcd include + correct declarations as necessary BUG=webm:1294 Change-Id: I719602df9a56e79188a78e7f8b31257c6d3cc11d	2016-09-30 11:41:26 -07:00
James Zern	b51c4df93a	idct,msa/neon: exclude idct files from hbd build these functions are incompatible currently and unreferenced in rtcd, exclude them from the build. BUG=webm:1294 Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896	2016-09-30 11:32:47 -07:00
Linfeng Zhang	ca2fe7a8c7	Refactor vpx lpf NEON files (step 2/2) Change-Id: I0744407cd3361ff752bd7f6e654b70ab6b41a58f	2016-09-30 09:56:28 -07:00
Linfeng Zhang	4779f5308d	Refactor vpx lpf NEON files (step 1/2) Change-Id: I4016d096d46ca691f3b17199b259b7231e983cfb	2016-09-30 09:48:54 -07:00
Linfeng Zhang	8c744fd978	Merge "Unify loopfilter function names"	2016-09-30 15:58:08 +00:00
Linfeng Zhang	c435b7fbdd	Merge "Refine vpx convolve8 NEON intrinsics optimization"	2016-09-30 15:56:31 +00:00
Linfeng Zhang	bde905cba1	Merge "Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()"	2016-09-30 15:54:02 +00:00
James Zern	ed62d27c71	*rtcd_defs.pl: remove empty specialize calls add_proto adds a 'c' specialization Change-Id: I0ed0c2240d45264b0e0056ce7c8f63f4a00780bc	2016-09-29 20:38:26 -07:00
Linfeng Zhang	7f1f35183a	Unify loopfilter function names Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16(). Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual(). Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52	2016-09-29 16:25:42 -07:00
Linfeng Zhang	85a9e48d25	Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon() BUG=webm:1290 Change-Id: Ia27e58521eba5a4852b50381c56746fa5767f6d6	2016-09-29 16:19:39 -07:00
Johann Koenig	ad55b1d270	Merge changes Ia3e9122f,Id33eb6c8,I956bd8ce * changes: Remove vp8_clear_system_state vpx_dsp: clean up rtcd vp8: clean up rtcd	2016-09-29 23:16:45 +00:00
Linfeng Zhang	b3cb065ee4	Refine vpx convolve8 NEON intrinsics optimization BUG=webm:1290 Change-Id: I5d7fce62270f9d76ef9ce98b3d188ad11fb21873	2016-09-29 12:48:59 -07:00
Johann	7b5a348088	vpx_dsp: clean up rtcd Remove avx2+ssse3 specialization. Disabling ssse3 now automatically disables avx2. Change-Id: Id33eb6c85d1c4ee57128ebe45c995eb15cfcc765	2016-09-29 12:10:07 -07:00
James Zern	93c823e24b	vpx_dsp/get_prob: relocate den == 0 test to get_binary_prob(). the only other caller mode_mv_merge_probs() does its own test on 0. BUG=chromium:639712 Change-Id: I1178688706baeca2883f7aadbc254abb219a44ce	2016-09-28 17:42:49 -07:00
James Zern	7481edb33f	vpx_dsp/get_prob: make clip_prob branchless + inline the function directly as there was only one consumer (get_prob()) this is an attempt to reduce the amount of branches to workaround an amd bug. this change is mildly faster or neutral across x86-64, arm. http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf 665 Integer Divide Instruction May Cause Unpredictable Behavior BUG=chromium:639712 Suggested-by: Pascal Massimino <pascal.massimino@gmail.com> Change-Id: Ia91823aded79aab469dd68095d44300e8df04ed2	2016-09-28 11:51:46 -07:00
Johann	02fa245d15	mips: clean up wextra warnings Remove unused zbin variable: warning: unused parameter ‘zbin’ Use int for loop variables to avoid unsigned conversion: warning: comparison between signed and unsigned integer expressions Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad	2016-09-27 13:19:18 -07:00
Urvang Joshi	0aa3e2564f	Add compiler warning flag -Wextra and fix related warnings. Note: some of these warnings are enabled by a combination of -Wunused (added earlier) and -Wextra. Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16 Expands use of (void)x; on unused variables. AOM only supports one codec in codec_factory.h Does not include changes to HandleDecodeResult. AOM removed invalid_file_test.cc which does use the video parameter. Does not enable -Wextra yet. There are more issues to fix. BUG=webm:1069 Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf	2016-09-27 12:05:01 -07:00
Linfeng Zhang	b46243d7ff	Merge "Refactor lpf (size 4 and 8) NEON intrinsics optimization"	2016-09-26 16:11:12 +00:00
James Zern	deadda3dea	Merge "vpx_idct32x32_34_add_sse2: rm unneeded transposes"	2016-09-23 02:49:26 +00:00
James Zern	fdd1186f97	vpx_idct32x32_34_add_sse2: rm unneeded transposes this change is neutral to mildly positive across various x86-64 platforms Change-Id: I28fb5ae598fc1317b7a42c9a846ac5d57d104784	2016-09-21 19:49:25 -07:00
James Zern	e372bfd5ac	variance_neon: sync variance*() w/c,sse2 removes some unnecessary casts and adds a few explicit uint32 ones for larger sizes to quiet -Wshorten-64-to-32 warnings Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518	2016-09-21 18:04:45 -07:00
Linfeng Zhang	761e5ec2f6	Refactor lpf (size 4 and 8) NEON intrinsics optimization Also check in 8x8 8-bit transpose NEON intrinsics optimization transpose_u8_8x8() Change-Id: I32d321cf97ea21eab158ac4896990fc9a51681c4	2016-09-19 16:41:37 -07:00
James Zern	6acd061aad	variance_avx2: sync variance functions with c-code add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78	2016-09-19 16:19:29 -07:00
James Zern	aa0eb67bf7	loopfilter_mb_neon: remove unused load_8x8() quiets a -Wunused-function warning for arm targets Change-Id: I293a7e3d3d7d61d6af2fbedad5e8c25126c418b6	2016-09-17 11:00:31 -07:00
Linfeng Zhang	5d73639d8f	Merge "Refactor lpf (size 16) NEON intrinsics optimization"	2016-09-17 00:33:30 +00:00
Linfeng Zhang	8107368000	Refactor lpf (size 16) NEON intrinsics optimization Extract shared code so later lpf size 4 and 8 functions can reuse. Change-Id: Ibb43ef1fd8651bd2e32fcc4c56cf6fa7ca237401	2016-09-16 09:12:13 -07:00
James Zern	33aef48f29	vpx_subpixel_8t_intrin_avx2: tolerate unversioned clang assume __clang_major__==0 has the latest version of _mm256_broadcastsi128_si256. fixes builds with custom clang toolchains. BUG=b/30970831 Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6	2016-09-16 07:14:17 +00:00
clang-format	5f6d143b41	apply clang-format Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487	2016-09-15 15:07:53 -07:00
James Zern	4b0e78bfda	Merge "vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()"	2016-09-08 01:05:18 +00:00
Scott LaVarnway	309125b1e7	vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2() Change-Id: I140d93aebadb0eaf6220881e61a0451450081227	2016-09-07 05:58:29 -07:00
Johann Koenig	4d1540f8ce	Merge changes from topic 'Wundef' * changes: Enable -Wundef by default Define VP8_TEMPORAL_ALT_REF to !CONFIG_REALTIME_ONLY Remove CONFIG_DEBUG guards from assert() Remove unused function vpx_de_mblock Fix -Wundef warning for OUTPUT_FPF Fix -Wundef warning for __SANITIZE_ADDRESS__	2016-09-02 01:39:18 +00:00
Johann	24f534ac90	Remove unused function vpx_de_mblock vpx_config.h was not included so CONFIG_POSTPROC was never defined. Change-Id: I777de499823afa286734549a8e7f4a93e7ad97f3	2016-08-31 23:01:45 -07:00
Linfeng Zhang	bee7d837ab	Update NEON transpose functions. Unify coding style. Change-Id: I5826f40c02c882df7353391e0c9dd6cef6bd4b97	2016-08-31 14:58:40 -07:00
Linfeng Zhang	f7cbfed682	Update vpx_lpf_vertical_16_dual_neon() intrinsics Process 16 samples together. Change-Id: If6ee8e3377aa2786417f2fc411ba7d87ea8b6799	2016-08-30 11:17:33 -07:00
Linfeng Zhang	4916515511	Update vpx_lpf_horizontal_edge_16_neon() intrinsics Process 16 samples together. Change-Id: I9cfbe04c9d25d8b89f63f48f519e812746db754d	2016-08-27 14:47:48 -07:00
Johann Koenig	a70861c435	Merge "Remove halfpix specialization"	2016-08-26 21:28:01 +00:00
James Zern	3ddff4503a	add_noise,vpx_setup_noise: correct 'char_dist' type fixes SSE2/AddNoiseTest.CheckCvsAssembly/0 with -funsigned-char. visibly broken since: `0dc69c7` postproc : fix function parameters for noise functions. where the types diverged (char vs. int8) but likely the return changed in: `2ca24b0` postproc - move filling of noise buffer to vpx_dsp. when multiple implementations were merged. Change-Id: I176ca1f170217f05ba7872b0c4de63e41949e999	2016-08-24 21:46:26 -07:00
Johann	d393885af1	Remove halfpix specialization This function only exists as a shortcut to subpixel variance with predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical and both for "hv" Removing this allows the existing optimizations for the variance functions to be called. Instead of having only sse2 optimizations, this gives sse2, ssse3, msa and neon. BUG=webm:1273 Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6	2016-08-23 17:05:39 -07:00
Linfeng Zhang	f9efbad392	NEON asm of vpx_lpf_{horizontal,vertical}_8_dual_neon() Also expose the NEON intrinsics version. BUG=webm:1261, webm:1266. Change-Id: I8c4ae658467dcf66ebf7a75982b2ef712dbb4535	2016-08-16 08:50:57 -07:00
James Zern	dfcefe06fa	Merge "variance_impl_avx2: restore table layout"	2016-08-12 23:02:27 +00:00
James Zern	bd7cfb46fb	variance_impl_avx2: restore table layout disable clang-format for bilinear_filters_avx2 restores the row layout prior to: `099bd7f` vpx_dsp: apply clang-format but keeps the justification used by clang-format Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7	2016-08-12 11:52:53 -07:00
Linfeng Zhang	f09b5a3328	NEON intrinsics for 4 loopfilter functions New NEON intrinsics functions: vpx_lpf_horizontal_edge_8_neon() vpx_lpf_horizontal_edge_16_neon() vpx_lpf_vertical_16_neon() vpx_lpf_vertical_16_dual_neon() BUG=webm:1262, webm:1263, webm:1264, webm:1265. Change-Id: I7a2aff2a358b22277429329adec606e08efbc8cb	2016-08-12 09:58:17 -07:00
Johann Koenig	57f49db81f	Merge changes I6ef79702,Id332c641,I354b5d22,I84438013 * changes: Use common transpose for vpx_idct32x32_1024_add_neon Use common transpose for vpx_idct8x8_[12|64]_add_neon Use common transpose for vp9_iht8x8_add_neon Use common transpose for vpx_idct16x16_[10|256]_add_neon	2016-08-04 22:30:47 +00:00
Johann Koenig	17720b60bb	Merge "Remove armv6 target"	2016-08-04 22:21:13 +00:00
James Zern	7f7c888c14	Merge "correct break placement"	2016-08-04 22:19:30 +00:00
Johann	0325b95938	Use common transpose for vpx_idct32x32_1024_add_neon Change-Id: I6ef7970206d588761ebe80005aecd35365ec50ff	2016-08-04 20:13:18 +00:00
Johann	f4e4ce7549	Use common transpose for vpx_idct8x8_[12|64]_add_neon Change-Id: Id332c641f05336ef9a45e17493ff149fd0a168f0	2016-08-04 20:13:12 +00:00
Johann	8619203ddc	Use common transpose for vpx_idct16x16_[10|256]_add_neon Change-Id: I84438013f483e82084d33ba9a63c33273d35fcaa	2016-08-04 20:12:53 +00:00
Johann Koenig	b757d89ff9	Merge "Extract neon transpose for re-use"	2016-08-04 20:12:38 +00:00
James Zern	70a7885a65	correct break placement these should be placed within {}s when present Change-Id: Ia775fac5373603e77360398f19b07958fb43f476	2016-08-04 13:00:14 -07:00
Johann	d55724fae9	Remove armv6 target Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5	2016-08-04 12:55:06 -07:00
Johann	377cfa31f0	Extract neon transpose for re-use Change-Id: I5e1c7f4c80d1c6f7fd582ac468c6eaaa3603a06c	2016-08-04 19:04:25 +00:00
Johann	df69c751a7	Don't expand to Q register for 4x4 intrapred The code was expanding to Q registers so that vqrshn could be used, for vector quad round shift and narrow. If 4 values are added together, there is a shift by 2. If 8 values, a shift by 3. Since this accounts for any possibility of overflow, we can skip the narrowing shift. This allows keeping the values in D registers and casting the 16 bit value to 8 bits. Change-Id: I8d9cfa07176271f492c116ffa6a7b351af0b8751	2016-08-04 18:51:46 +00:00
Alex Converse	d089ac4dda	Resolve -Wshorten-64-to-32 warnings in prob.h. Change-Id: I1244ee908d81467f0fc8a8fce979fc8077a325b4	2016-08-02 15:40:23 -07:00
Alex Converse	3a04c9c9c4	Merge "Resolve -Wshorten-64-to-32 in variance."	2016-08-02 22:26:55 +00:00
Min Chen	407c2e2974	replace by VSTM/VLDM to reduce one of VST1/VLD1 Change-Id: I596567570580babb1a52925541d1fd1045c352f5	2016-07-28 23:01:38 +00:00
Alex Converse	c0241664aa	Resolve -Wshorten-64-to-32 in variance. The subtrahend is small enough to fit into uint32_t. Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f	2016-07-28 10:16:31 -07:00
clang-format	956af1d478	vpx_dsp/x86/quantize_sse2.c: apply clang-format post: `e429080` .clang-format: disable DerivePointerAlignment Change-Id: I21a0546668edb2b09660e216d4875a1d2ad24d53	2016-07-27 21:41:18 -07:00
clang-format	099bd7f07e	vpx_dsp: apply clang-format Change-Id: I3ea3e77364879928bd916f2b0a7838073ade5975	2016-07-25 14:14:19 -07:00
Ivan Krasin	91369fd9b7	Fix compilation error under Clang 4.0. The LLVM trunk has reached 4.0 and now __clang_major__ is not enough to distinguish between old XCode Clang and the new 'real' Clang. Using __apple_build_version__ allows to make this distinction. BUG=chromium:631144 Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9	2016-07-25 19:18:49 +00:00
Yury Gitman	3d3f51262c	Add VPX_SWAP macro Change-Id: I60e233eddef238ad918183392794084673f27d2d	2016-07-22 15:41:25 -07:00
James Zern	3d791194f8	vpx_plane_add_noise_c: normalize int types quiets signed/unsigned mismatch warning Change-Id: Iaabd7dfff110ba26056258457541f5635d2e85e6	2016-07-16 11:56:55 -07:00
Jim Bankoski	0dc69c70f7	postproc : fix function parameters for noise functions. Change-Id: I582b6307f28bfc987dcf8910379a52c6f679173c	2016-07-15 08:27:34 -07:00
James Bankoski	7eec1f31b5	Merge "postproc: noise style fixes."	2016-07-13 22:04:47 +00:00
Yaowu Xu	d6197b621d	Merge "Fix encoder crashes for odd size input"	2016-07-13 20:05:09 +00:00
Jim Bankoski	e736691a6d	postproc: noise style fixes. Change-Id: Ifdcb36b8e77b65faeeb10644256e175acb32275d	2016-07-13 12:39:01 -07:00
James Bankoski	e93f2fdb83	Merge "postproc - move filling of noise buffer to vpx_dsp."	2016-07-13 15:31:17 +00:00
Jim Bankoski	2ca24b0075	postproc - move filling of noise buffer to vpx_dsp. Change-Id: I63ba35dc0ae9286c9812367a531e01d79a4c1635	2016-07-13 07:35:25 -07:00
Jim Bankoski	b24373fec2	deblock: missing const on extern const. Change-Id: I0df08f7c431daf939e266f008bf5158b0c97358b	2016-07-13 07:27:29 -07:00
Jim Bankoski	6f424a768e	vp9_postproc.c missing extern. BUG=webm:1256 Change-Id: I5271e71bc53cce033fb906040643dcdd5ccb2381	2016-07-12 17:47:49 -07:00
Yaowu Xu	98431cde07	Fix encoder crashes for odd size input Change-Id: Id5c30c419282369cc8c3280d9a70b34a859a71d8	2016-07-12 11:11:26 -07:00
Jim Bankoski	88e6951465	deblock filter : moved from vp8 code branch The deblocking filters used in vp8 have been moved to vpx_dsp for use by both vp8 and vp9. Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d	2016-07-12 05:53:00 -07:00
Jingning Han	7c1fdf02cd	Merge "Support measure distortion in the pixel domain"	2016-07-07 18:09:20 +00:00
Jingning Han	e357b9efe0	Support measure distortion in the pixel domain Use pixel domain distortion metric in speed 0. This improves the compression performance by 0.3% for both low and high resolution test sets. Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d	2016-07-06 18:25:17 -07:00
James Zern	5afa3b9150	Merge "improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw"	2016-07-02 03:08:33 +00:00
James Zern	3197172405	Merge "Update vpx subpixel 1d filter ssse3 asm"	2016-07-02 03:08:17 +00:00
Johann	1b833d63d9	vpx_dsp: remove x86inc.asm distinction BUG=b:29583530 Change-Id: I397d77536b0d3cee0a92cdfe8b76bc4e434d0720	2016-06-29 18:55:58 -07:00
James Zern	3a6a81fc9a	Merge changes I9433d858,Iafd05637,If08ce6ca * changes: tests: remove redundant round() definition remove visual studio < 2010 workarounds configure: remove old visual studio support (<2010)	2016-06-29 23:07:16 +00:00
Linfeng Zhang	6b350766bd	Update vpx subpixel 1d filter ssse3 asm Speed test shows the new vertical filters have degradation on Celeron Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control which vertical filter code is activated. For now, simply activate the code without degradation on Celeron. Later there should be 2 sets of vertical filter ssse3 functions, with a jump table choosing between them based on CPU type. Change-Id: Iba2f1f2fe059a9d142c396d03a6b8d2d3b981e87	2016-06-29 13:48:41 -07:00
Yaowu Xu	63a37d16f3	Prevent negative variance Due to rounding, hbd variance may become negative. This commit put in check and clamp of negative values to 0. Change-Id: I610d9c8aa2d4eebe7bc5f2c5624a9e3cadad4c94	2016-06-29 11:08:17 -07:00
James Zern	c125f4a594	remove visual studio < 2010 workarounds BUG=b/29583530 Change-Id: Iafd05637eb65f4da54a9c857e79204a77646858a	2016-06-28 20:58:49 -07:00
James Zern	0afe5e405d	Merge "*.asm: normalize label format"	2016-06-28 19:22:10 +00:00
Yaowu Xu	d34b49d7b9	psnr.c: use int64_t for sum of differences Since the values can be negative. Change-Id: Idda69e9fb47bb34696aeb20170341a0191c5d85e	2016-06-28 09:53:11 -07:00
James Zern	f51f67602e	*.asm: normalize label format add a trailing ':', though it's optional with the tools we support, it's more common to use it to mark a label. this also quiets the orphan-labels warning with nasm/yasm. BUG=b/29583530 Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2	2016-06-27 19:46:57 -07:00
Yaowu Xu	7676defca9	Merge "Port metric computation changes from nextgenv2"	2016-06-27 19:18:00 +00:00
Min Chen	b2fb48cfcf	improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw Change-Id: I14c0c2e54d0b0584df88e9a3f0a256ec096bea6e	2016-06-27 17:50:45 +00:00
James Zern	cfd5e0221c	Revert "Update vpx subpixel 1d filter ssse3 asm" This reverts commit `1517fb74fd`. Fixes a segfault in windows x64 builds. Change-Id: I6a6959cd7e64a28376849a9f2b11fc852a7c1fbe	2016-06-25 11:37:20 -07:00
Yaowu Xu	003a9d20ad	Port metric computation changes from nextgenv2 Change-Id: I4aceffcdf7af59ffeb51984f0345c3a4c7e76a9f	2016-06-24 13:52:50 -07:00
Linfeng Zhang	bdeb5febe4	Merge "Update vpx subpixel 1d filter ssse3 asm"	2016-06-23 19:08:04 +00:00
Alex Converse	83db21b2fd	vpx_lpf_horizontal_4_sse2: Remove dead load. Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e	2016-06-22 18:17:41 -07:00
James Zern	527a9fea76	Merge "remove vp10"	2016-06-22 22:35:57 +00:00
Linfeng Zhang	1517fb74fd	Update vpx subpixel 1d filter ssse3 asm Speed test shows the new vertical filters have degradation on Celeron Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control which vertical filter code is activated. For now, simply activate the code without degradation on Celeron. Later there should be 2 sets of vertical filter ssse3 functions, with a jump table choosing between them based on CPU type. Change-Id: I37e3e9c5694737d9134a6bce6698d3e43f8fc962	2016-06-22 13:15:00 -07:00
Yaowu Xu	ef665996ae	Prevent negative variance Due to the rounding used in the computation, the HBD variance computation may produce slightly negative values. This commit adds clamping to make sure output variance values for 10 and 12 are non-negative. Change-Id: Id679aa55a4c201958c4c7d28cd8733b9246a71c8	2016-06-22 17:55:14 +00:00
Yaowu Xu	543ea3eb3e	Make type conversion explicit This fixes MSVC warnings. Change-Id: I675d8486230b2b74d7973d95720a4995c4750282	2016-06-20 12:05:29 -07:00
James Zern	67edc5e83b	remove vp10 development has moved to the nextgenv2 branch and a snapshot from here was used to seed aomedia BUG=b/29457125 Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0	2016-06-17 18:26:08 -07:00
Yaowu Xu	de3a8f23c8	vpx_dsp/quantize.c: fix ubsan warnings BUG=webm:1219 Change-Id: I0c80271c6b78adf40aa7a4cac9e6b431d56958cb	2016-06-16 21:46:14 +00:00
Yaowu Xu	e5e998a6eb	vpx_dsp/variance.c: change to use correct type This commit changes to use int64_t to represent the sum of pixel differences, which can be negative. This fixes a number of ubsan warnings. BUG=webm:1219 Change-Id: I885f245ae895ab92ca5f3b9848d37024b07aac98	2016-06-16 21:45:48 +00:00
Johann	c516dd67bc	neon hadamard 16x16 Runs about twice as fast as C BUG=webm:1027 Change-Id: I6760d99f4e22259439ca35d746194b12a81bfa71	2016-06-14 19:23:38 +00:00
Debargha Mukherjee	697bcef677	Add a couple of missing WRAPLOW checks To make coefficient checking consistent with the VP9 spec sections 8.7.1.6 and 8.7.1.1. Change-Id: I92e38e89a41d1e482317bb478c48ffa608d2d6ee	2016-06-09 12:58:27 -07:00
Debargha Mukherjee	c2ebd0e6da	Merge "Move range checks into WRAPLOW"	2016-06-06 16:28:24 +00:00
James Zern	e34e684059	Merge changes If31d36c8,I10b947e7 * changes: vpx_dsp,add_noise: remove mmx implementation vpx_dsp: remove mmx variance implementations	2016-06-04 00:56:06 +00:00
Debargha Mukherjee	aa90983696	Move range checks into WRAPLOW Provides more comprehensive coverage for --enable-coefficient-checking. The intent is to make the --enable-coefficient-checking option consistent with the VP9 spec. Change-Id: I12d0120756d17572ca2b2d7e6a2ab9d8071d8d58	2016-06-03 11:27:33 -07:00
Linfeng Zhang	b90166665f	Merge "Slow pshufb removal in 3 intra prediction functions."	2016-06-03 16:35:14 +00:00
James Zern	462e0ff88b	vpx_dsp,add_noise: remove mmx implementation a sse2 version exists, this is a reasonable modern baseline. Change-Id: If31d36c8412d25b53f41b4a93cf02f46802c0c33	2016-06-02 23:51:22 -07:00
James Zern	eea8ea88ab	vpx_dsp: remove mmx variance implementations there are sse2 equivalents for all remaining variance implementations Change-Id: I10b947e73fc0067688181f819b59e47966bec3d2	2016-06-02 23:46:16 -07:00
Linfeng Zhang	ad0646cb84	Slow pshufb removal in 3 intra prediction functions. Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3() and vpx_d207_predictor_4x4_ssse3() with created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2() and vpx_d207_predictor_4x4_sse2() respectively. It's mostly neutral or slightly worse than ssse3 in good cases and better than ssse3 in the bad cases (but still worse than using the mmx regs). Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd	2016-06-02 10:55:58 -07:00
Yaowu Xu	46ff1072b3	variance_avx2.c: UBSAN/IOC fix BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1222 Change-Id: Ifb3bedf9b4e1b007b21aebaa4beb9ba50424efef	2016-05-31 16:44:35 -07:00
Linfeng Zhang	0ba9b299e9	Merge "Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2"	2016-05-27 15:47:28 +00:00
Linfeng Zhang	4b5e462d08	Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2 Followed the code style of other lpf functions. These 2 functions put 2 rows of data in a single xmm register, so they have similar but not identical filter operations, and cannot share the same macros. Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc	2016-05-26 14:55:18 -07:00
Scott LaVarnway	9d24fe60f1	Merge "Code clean of sub_pixel_variance4xh -- 2"	2016-05-26 13:20:24 +00:00
Scott LaVarnway	a4f3751be5	Code clean of sub_pixel_variance4xh -- 2 Replace MMX with SSE2. Change-Id: Id8482d2589131f9427e7f36bc64413f058caf31f	2016-05-24 04:44:05 -07:00
James Zern	3fb55d24e8	Revert "Code clean of sub_pixel_variance4xh" This reverts commit `2468163e07`. causes valgrind errors for overread of buffer in SubpelVarianceTest Change-Id: I448e52c76f815ac199305b71f7d169f2bc167679	2016-05-19 23:37:27 -07:00
Yaowu Xu	d1f0f4cc63	Merge "Clarify integer value ranges"	2016-05-18 23:55:05 +00:00
James Zern	146ccd304f	Merge "Code clean of sub_pixel_variance4xh"	2016-05-18 23:18:35 +00:00
Johann Koenig	36b610d8c1	Merge "neon hadamard 8x8"	2016-05-18 20:11:16 +00:00
Yaowu Xu	a564b18d7f	Clarify integer value ranges This commit clarifies the integer value range for variables used in several variance functions, and also changes to use proper type conversion to reflect the value ranges. Change-Id: Ic3234b83a912ce1ad12d1b254f3378763e15cc5c	2016-05-18 10:25:12 -07:00
Scott LaVarnway	2468163e07	Code clean of sub_pixel_variance4xh Replace MMX with SSE2. Change-Id: Ia8fcba755952804e347d7d7736f57d1f90c988a0	2016-05-18 04:24:41 -07:00
Johann	9b54e812f7	neon hadamard 8x8 Runs about 30% faster than the C BUG=webm:1021 Change-Id: I6809d6d84c3077ab619c53298296950e976bdaba	2016-05-16 11:58:02 -07:00
Yaowu Xu	c1e4f5a80d	Merge "Change to use correct check for halfpel"	2016-05-13 01:27:47 +00:00
Linfeng Zhang	2f55beb355	Merge "remove mmx variance functions"	2016-05-11 22:21:23 +00:00
Yaowu Xu	17fae3ad0a	Change to use correct check for halfpel In motion estimation stage for subpel motion, subpel variance is computed use bilinear interpolation. The motion vector precision used is at 1/8 pel and three bits are used to represent the x and y subpel offsets. Based on this, the half pel check should be against 4, not 8. Change-Id: I1f56fa1fa3f2f5e19a20d27983efe628557f170e	2016-05-11 13:52:59 -07:00
Linfeng Zhang	d0ffae825d	remove mmx variance functions there are sse2 equivalents which is a reasonable modern baseline Removed mmx variance functions: vpx_get_mb_ss_mmx() vpx_get8x8var_mmx() vpx_get4x4var_mmx() vpx_variance4x4_mmx() vpx_variance8x8_mmx() vpx_mse16x16_mmx() vpx_variance16x16_mmx() vpx_variance16x8_mmx() vpx_variance8x16_mmx() Change-Id: Iffaf85344c6676a3dd337c0645a2dd5deb2f86a1	2016-05-11 12:39:42 -07:00
Linfeng Zhang	d0e687bf8c	remove mmx sad functions there are sse2 equivalents which is a reasonable modern baseline Change-Id: Ibbe536a5ad1c2cccef6bdcc75c13b3dde35a56ba	2016-05-11 10:50:04 -07:00
Jim Bankoski	da33728f48	vpx_dsp: Rename postproc.c add_noise. Change-Id: I4906d1b79a2951e659995202b9fa97e2ea5cfba0	2016-05-10 06:52:58 -07:00
Scott LaVarnway	c2c5297595	Merge "VPX: refactor vpx_idct16x16_1_add_sse2()"	2016-05-09 22:15:17 +00:00

721 Commits