generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	c39a05ff61	vpxdsp: [x86] add highbd_dc_left_predictor functions C vs SSE2 speed gains: _4x4 : ~6.49x _8x8 : ~10.82x _16x16 : ~7.61x _32x32 : ~5.29x BUG=webm:1411 Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b	2017-08-30 09:29:06 -07:00
Scott LaVarnway	f783e3a75d	vpxdsp: [x86] add highbd_dc_top_predictor functions C vs SSE2 speed gains: _4x4 : ~7.39x _8x8 : ~11.36x _16x16 : ~8.68x _32x32 : ~4.33x BUG=webm:1411 Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58	2017-08-29 12:53:30 -07:00
Scott LaVarnway	30d9a1916c	vpxdsp: [x86] add highbd_h_predictor functions C vs SSE2 speed gains: _4x4 : ~8.12x _8x8 : ~9.71x _16x16 : ~8.21x _32x32 : ~5.0x BUG=webm:1422 Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993	2017-08-28 17:31:18 -07:00
Marco Paniconi	3e069846b9	Merge "Revert "quantize avx: copy 32x32 implementation""	2017-08-25 18:20:31 +00:00
Marco Paniconi	8c42237bb2	Revert "quantize avx: copy 32x32 implementation" This reverts commit `f60d1dcd3d`. Reason for revert: <INSERT REASONING HERE> Failures in AVX/VP9QuantizeTest in nightly tests. Original change's description: > quantize avx: copy 32x32 implementation > > Ensure avx and ssse3 stay in sync by testing them against each other. > > Change-Id: I699f3b48785c83260825402d7826231f475f697c TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org Change-Id: Ibd38636212269328317dd0721be9d25452113d1c No-Presubmit: true No-Tree-Checks: true No-Try: true	2017-08-25 16:56:08 +00:00
Shiyou Yin	ece1989fa2	Merge "vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi."	2017-08-25 06:44:02 +00:00
Shiyou Yin	9e4647c7ab	vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi. Change-Id: Ia576a721df6312329b599c31cfe1fb1267a9f174	2017-08-25 01:58:49 +08:00
Johann	f60d1dcd3d	quantize avx: copy 32x32 implementation Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c	2017-08-24 10:42:34 -07:00
Johann	1787e7dbe0	quantize ssse3: copy implementation to intrinsics Still does not pass tests. Does match the previous assembly, although saving the sign before multiplying is dubious. Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a	2017-08-24 07:47:51 -07:00
Shiyou Yin	d080c92524	Merge "vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi."	2017-08-24 00:55:11 +00:00
Johann Koenig	f53b656207	Merge "quantize avx: copy implementation to intrinsics"	2017-08-23 21:14:13 +00:00
Scott LaVarnway	1aad50c092	Merge "vpx_dsp: get32x32var_avx2() cleanup"	2017-08-23 19:59:25 +00:00
Johann Koenig	dfafd10ef5	Merge "quantize neon: round dqcoeff towards zero"	2017-08-23 19:20:53 +00:00
Johann	7c27872164	quantize avx: copy implementation to intrinsics Adds an early exit based on ptest. Slightly slower than ssse3 in the full case because of the extra check, but potentially faster if lots of rows can be skipped. Very close in speed to the assembly. Can run in 32 bit, unlike the assembly. Allows reworking the function prototype to use structs. Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e	2017-08-23 09:19:16 -07:00
Johann	2a5aa98a35	quantize neon: round dqcoeff towards zero Add 1 if negative to get dqcoeff to round towards zero. 10-15% faster than converting to positive before shifting. Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d	2017-08-23 08:05:50 -07:00
Shiyou Yin	59e065b6ed	vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi. Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2	2017-08-23 15:14:15 +08:00
Johann	b9c1dcc5fa	quantize ssse3: copy style from sse2 Change-Id: I53f8a160e640c674ea035fc112e207b6dca42598	2017-08-22 14:25:27 -07:00
Johann	75752ab7c0	quantize sse2: copy opts from ssse3 Simplify eob calculations based on ssse3 implementation. General clean up and re-scoping. Change-Id: I48f282bf9bd28ee9bc2c7a6779be9d45b5a3a3ee	2017-08-22 13:01:44 -07:00
Johann Koenig	ab27b68693	Merge changes Icfb70687,I9a963e99,Ie8ac00ef,I1272917c * changes: quantize: ignore skip_block in arm quantize: ignore skip_block in x86 quantize fp: ignore skip_block in arm quantize fp: ignore skip_block in x86	2017-08-22 19:19:14 +00:00
James Zern	419ce36294	Merge "ppc: Add vpx_idct16x16_256_add_vsx"	2017-08-22 00:48:39 +00:00
Shiyou Yin	bff5aa9827	Merge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi."	2017-08-22 00:37:23 +00:00
Johann	2c56bb97f2	quantize: ignore skip_block in arm Change-Id: Icfb70687476b2edb25d255793ba325b261d40584	2017-08-21 14:37:50 -07:00
Johann	c02fdd0258	quantize: ignore skip_block in x86 Change-Id: I9a963e99f08761f0c8d6a305619270b2f1c4edf8	2017-08-21 14:37:03 -07:00
Johann	13eed991f9	Remove skip_block from quantize This condition is handled before this code is reached. The ssse3 version of the function has always crashed when attempting to handle the skip_block condition. Add assert() and comments regarding the usage of skip_block. Removing the parameter is a fairly involved process so leave it be for the moment. Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a	2017-08-21 09:49:04 -07:00
Scott LaVarnway	eab3f5e0cc	vpx_dsp: get32x32var_avx2() cleanup renamed to get32x16var_avx2() BUG=webm:1404 Change-Id: Icb8f3986c9c9c646e13a69430db7235fc7e1a036	2017-08-18 13:44:09 -07:00
Scott LaVarnway	2c5478e383	Merge "vpx_dsp: vpx_get16x16var_avx2() cleanup"	2017-08-18 20:30:59 +00:00
Scott LaVarnway	2f7497f341	vpx_dsp: vpx_get16x16var_avx2() cleanup BUG=webm:1404 Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a	2017-08-18 12:23:49 -07:00
Johann Koenig	1426f04e91	Merge "quantize: normalize intermediate types"	2017-08-18 16:00:28 +00:00
Shiyou Yin	7d82e57f5b	vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi. Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e	2017-08-18 09:06:49 +08:00
James Zern	bb15fd51be	highbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo 135 -> 34 fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12] Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682	2017-08-17 15:37:38 -07:00
Johann	7f602d6114	quantize: normalize intermediate types Despite abs_coeff being a positive value, all the other implementations treat it as signed which simplifies restoring the sign. HBD builds cast qcoeff to avoid a visual studio warning. Match vp9_quantize.c style of casting the entire expression. Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5	2017-08-17 12:34:28 -07:00
James Zern	e038d1610e	inv_txfm_sse2.h: correct idct/iadst prototypes fixes mismatch between prototypes and definitions Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652	2017-08-16 23:06:09 -07:00
Linfeng Zhang	f95686895b	Merge changes I08b562b6,Ia275940a,I51106e90 * changes: Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} Update highbd idct x86 optimizations. Update 32x32 idct sse2 and ssse3 optimizations.	2017-08-16 16:36:37 +00:00
Jerome Jiang	6b9c691daf	Merge "Clean up writing YUV files for debug purpose."	2017-08-15 18:28:54 +00:00
Jerome Jiang	a153080b55	Clean up writing YUV files for debug purpose. Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files. Delete some flags that can be enabled during build. To enable writing denoised YUV, use the following command line: CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure --enable-vp9-temporal-denoising For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP' Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528	2017-08-15 10:44:03 -07:00
Johann	77ed4414d6	quantize: silence unsigned overflow warning The result of the xor operation is unsigned. If coeff was negative, this results in an unsigned value - INT_MIN. Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0	2017-08-15 09:48:24 -07:00
Linfeng Zhang	d72e20b123	Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} BUG=webm:1412 Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f	2017-08-14 17:05:22 -07:00
Linfeng Zhang	69775d2f40	Update highbd idct x86 optimizations. BUG=webm:1412 Change-Id: Ia275940af7d7d8637e9a851a9e39d655bfbe4069	2017-08-14 16:59:50 -07:00
Linfeng Zhang	3f05a70c41	Update 32x32 idct sse2 and ssse3 optimizations. Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70	2017-08-14 16:59:31 -07:00
Linfeng Zhang	15193ce51f	Merge "Clean highbd idct x86 code with inline functions"	2017-08-10 20:25:18 +00:00
Johann Koenig	9bb8ce5efb	Merge "neon: vpx_quantize_b_32x32"	2017-08-10 15:42:49 +00:00
Johann Koenig	0b393ae505	Merge "quantize: copy ssse3 optimizations to intrinsics"	2017-08-10 15:42:20 +00:00
Linfeng Zhang	39da7fb786	Clean highbd idct x86 code with inline functions Created inline functions highbd_butterfly_cospi16_sse2() and highbd_butterfly_cospi16_sse4_1() BUG=webm:1412 Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a	2017-08-08 17:53:28 -07:00
Johann	93166c5e51	neon: vpx_quantize_b_32x32 With skip block the neon is about twice as fast as C. The neon has no shortcut for coeff < zbin so it always takes the same amount of time. Even if the C can take the shortcut, it is over twice as fast in neon. If it can't, that gap increases to over 10x. BUG=webm:1426 Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6	2017-08-08 14:05:18 -07:00
Johann	d52cb59729	quantize: copy ssse3 optimizations to intrinsics Fairly minor differences from sse2. pabsw and psignw are the big gains. Also re-uses some values in eob calculation to avoid an extra pcmp. Fixes test failures in HBD and OS X builds. Allows using it in 32bit builds, where it is about 40% faster than sse2. Substantially faster than the assembly for skip_block. 10-20% faster the rest of the time. Change-Id: If783bb3567e561e47667e10133b9c84414a334e2	2017-08-08 12:22:14 -07:00
Linfeng Zhang	853165ba39	Update 32x32 idct sse2 funcs, add partial case 135 Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a	2017-08-07 17:37:02 -07:00
Linfeng Zhang	d670678f26	Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx() in idct x86 code Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083	2017-08-04 15:33:37 -07:00
Linfeng Zhang	fa829e0e5a	Replace multiplication_and_add() with butterfly() in idct x86 code Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e	2017-08-04 15:33:34 -07:00
Linfeng Zhang	c9fb719ee1	Update butterfly() in idct x86 optimizations. Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701	2017-08-04 15:33:28 -07:00
Linfeng Zhang	7f20c3ac44	Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1 BUG=webm:1412 Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca	2017-08-04 15:31:17 -07:00
Linfeng Zhang	22b6dc9fdf	Update for loop increment of idct x86 functions Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4	2017-08-04 15:29:19 -07:00
Linfeng Zhang	0c61331244	Update high bitdepth 16x16 idct x86 code Prepare for high bitdepth 16x16 idct sse4.1 code. Just functions moving and renaming. BUG=webm:1412 Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a	2017-08-04 15:12:33 -07:00
Scott LaVarnway	c42517568d	vpx_dsp: merge avx2 variance files BUG=webm:1404 Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4	2017-08-04 07:49:30 -07:00
Linfeng Zhang	e921c7ba8d	Merge "Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function"	2017-08-04 01:16:35 +00:00
Scott LaVarnway	f6c6f37e0c	Merge "vpx_dsp: Use correct check for halfpel in"	2017-08-03 23:17:09 +00:00
Linfeng Zhang	563d58ab84	Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function BUG=webm:1412 Change-Id: I945f0fb6807b8948747243794dc7352b959221f7	2017-08-03 13:59:47 -07:00
Linfeng Zhang	6624f20785	Merge changes I76727df0,I66297d78,I1d000c6b * changes: Extract inlined 16x16 idct sse2 code into header file Add transpose_32bit_8x4() sse2 optimization Update x86 idct optimization	2017-08-03 20:51:02 +00:00
Scott LaVarnway	8334a48d3a	vpx_dsp: Use correct check for halfpel in vpx_sub_pixel_variance32xh_avx2() and vpx_sub_pixel_avg_variance32xh_avx2 see: `17fae3a` Change to use correct check for halfpel Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c	2017-08-03 06:57:40 -07:00
Linfeng Zhang	15a47db730	Extract inlined 16x16 idct sse2 code into header file Will be called by high bitdepth functions. Change-Id: I76727df00941b5a27adceaba8347f275475fcd8c	2017-08-02 16:17:43 -07:00
Linfeng Zhang	8c0ab7607e	Add transpose_32bit_8x4() sse2 optimization Change-Id: I66297d78b38db718cfe3ebb8ea972f5a72c17955	2017-08-02 16:15:58 -07:00
Scott LaVarnway	698e56f26c	Merge "vpxdsp: variance_impl_avx2.c cleanup"	2017-08-02 19:08:10 +00:00
Scott LaVarnway	632fe8286a	vpxdsp: variance_impl_avx2.c cleanup BUG=webm:1404 Change-Id: I8d8498009e5ef7bf1137e4ff16ec81738a020b02	2017-08-02 05:57:39 -07:00
Linfeng Zhang	6738ad7aaf	Update x86 idct optimization Move constant coefficients preparation into inline function. Change-Id: I1d000c6b161794c8828ff70768439b767e2afea1	2017-08-01 14:40:12 -07:00
Linfeng Zhang	c0490b52b1	Merge "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"	2017-08-01 21:39:39 +00:00
Johann Koenig	847394fe77	Merge "neon: vpx_quantize_b"	2017-08-01 16:44:31 +00:00
Linfeng Zhang	bf14d468c1	Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2 This replaces commit `aa1c4cd`, which has a bug and was reverted in commit `3c73e58`. The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d(). Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8	2017-07-31 16:36:13 -07:00
Johann	2d6b5df657	neon: vpx_quantize_b With skip block or coeff < zbin it is about twice as fast as C. If most coeff values are > zbin it is about 10-15x as fast as C. BUG=webm:1426 Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7	2017-07-31 10:38:46 -07:00
James Zern	78155b7ed5	highbd_inv_txfm_sse4: make << of neg. val a multiply left shifting a negative value is undefined; quiets a ubsan warning. this is applied to a constant, no change in the generated code. Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254	2017-07-30 12:48:28 -07:00
James Zern	d35b627340	Revert "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2" This reverts commit `aa1c4cd140`. This fails the following tests with extreme input coefficients: SSE2/InvTrans8x8DCT.CompareReference/0 SSE2/InvTrans8x8DCT.CompareReference/2 previously the optimized path was skipped in this range Change-Id: I9af015a46eba96208834a219fafd651d37556a80	2017-07-29 11:12:27 -07:00
Linfeng Zhang	75653b7032	Merge changes Ia0e20f5f,I28150789,I35df041b,I221dff34 * changes: Update vpx_idct16x16_10_add_sse2() Add vpx_idct16x16_38_add_sse2() Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2 Refactor highbd idct 4x4 and 8x8 x86 functions	2017-07-28 22:43:00 +00:00
James Zern	3c73e587d1	Revert "quantize ssse3: declare all variables" This reverts commit `03f5e300d6`. This causes test failures under OSX: SSSE3/VP9QuantizeTest.EOBCheck/0 SSSE3/VP9QuantizeTest.OperationCheck/0 Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b	2017-07-28 01:22:16 -07:00
Linfeng Zhang	5232e35bc2	Update vpx_idct16x16_10_add_sse2() Change-Id: Ia0e20f5fa47382af5785221eebb05212b40bd35c	2017-07-27 18:03:25 -07:00
Linfeng Zhang	7f4acf8700	Add vpx_idct16x16_38_add_sse2() Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898	2017-07-27 18:02:43 -07:00
Linfeng Zhang	aa1c4cd140	Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2 BUG=webm:1412 Change-Id: I35df041b757d42278ac7a5cdbd909e8ffcee1455	2017-07-27 18:02:36 -07:00
Linfeng Zhang	9c43d81bc2	Refactor highbd idct 4x4 and 8x8 x86 functions BUG=webm:1412 Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec	2017-07-27 18:01:03 -07:00
Johann Koenig	a83e1f1d53	Merge "quantize ssse3: declare all variables"	2017-07-27 21:18:35 +00:00
James Zern	1c666465af	inv_txfm_{sse2,ssse3}: clear conversion warnings visual studio reports tran_high_t (int64) -> short in calls to _mm_set1_epi16 Change-Id: Icb8d1baee77ad3d45edb1477a443d3e648f0b745	2017-07-25 20:13:49 -07:00
James Zern	62682ac8ad	highbd_idct_sse.c: clear conversion warnings visual studio reports tran_high_t (int64) -> int in calls to _mm_setr_epi32 Change-Id: Ic2247c8e3800991202151790d78bd94c4f4aed05	2017-07-25 20:11:09 -07:00
James Zern	85736e616e	vpx_variance16x16_sse2: correct cast order allow the right shift to operate on 64-bits, this matches the rest of the implementations previously: `b0f1ae147` vpx_get16x16var_avx2: correct cast order Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa	2017-07-25 16:45:40 -07:00
Alexandra Hájková	666c543f7b	ppc: Add vpx_idct16x16_256_add_vsx Change-Id: Ibc3f7965423fd91179f8d8e77c7ae3e6d7f80572	2017-07-25 12:34:15 +00:00
James Zern	b0f1ae1475	vpx_get16x16var_avx2: correct cast order allow the right shift to operate on 64-bits, this matches the rest of the implementations missed in: `6acd061aa` variance_avx2: sync variance functions with c-code Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617	2017-07-24 16:29:44 -07:00
Johann	4b9a848bb3	variance: call C comp_avg_pred Keep optimized code out of the reference implementation. This matches the style of the other sub calls. Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689	2017-07-18 20:22:53 +00:00
Johann	03f5e300d6	quantize ssse3: declare all variables Copy missing line from avx implementation. Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27	2017-07-18 12:32:57 -07:00
Johann	e381753926	sad4d neon: 64x[32,64] Rewrite 64x64. BUG=webm:1425 Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf	2017-07-12 13:26:39 +00:00
Johann	e1bde306c8	sad4d neon: 32x[16,32,64] Rewrite 32x32. Use half the accumulator registers. BUG=webm:1425 Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13	2017-07-12 13:25:18 +00:00
Johann	807ce8fb1e	sad4d neon: 16x[8,16,32] Rewrite 16x16. Use half the accumulator registers. BUG=webm:1425 Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2	2017-07-12 13:25:11 +00:00
Johann	8152b0904d	sad4d neon: 8x[4,8,16] BUG=webm:1425 Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4	2017-07-12 13:25:03 +00:00
Johann	dd4347e9ec	sad4d neon: 4x4, 4x8 BUG=webm:1425 Change-Id: I5081b5ce131821d590c53ac1206a94f50cb8b468	2017-07-12 03:38:03 +00:00
Johann	66a96fd3de	avg_neon: fix 4x4, update 8x8 4x4 was failing with a bus error. Most likely due to clang alignment hints on 32bit loads. Change-Id: Ib191ce0e6239fc55d85f10e4dbe15876e5052edb	2017-07-10 15:29:34 -07:00
Johann	87610ac45e	neon: consolidate horizontal adds Change-Id: Iaf9e88ff636ccf8f0ef310869c6827f3f205cca8	2017-07-10 15:29:13 -07:00
Johann Koenig	4b78c6e6f7	Merge "remove vp9_full_sad_search"	2017-07-10 20:42:40 +00:00
Johann	109faffe9b	remove vp9_full_sad_search This code is unused in vp9. Only vp8 still contains references to vpx_sad_NxMx[3\|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4. Remove the remaining sizes and all the highbitdepth versions. BUG=webm:1425 Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8	2017-07-10 11:20:35 -07:00
Johann Koenig	4e16f70703	Merge changes Id84d9780,Iaa6ea75b,I3362e0dd,I0020a49e,Ia42e4f36, ... * changes: sad neon: avg for 64x[32,64] sad neon: macroize 64xN definitions sad neon: avg for 32x[16,32,64] sad neon: macroize 32xN definitions sad neon: avg for 16x[8,16,32] sad neon: macroize 16xN definitions	2017-07-07 21:01:23 +00:00
Johann Koenig	6c375b9cd0	Merge "fdct neon: 32x32_rd"	2017-07-07 14:05:51 +00:00
Johann	e4e08556db	sad neon: avg for 64x[32,64] BUG=webm:1425 Change-Id: Id84d97807a6a0fbcc889c4dfe11929d54f85493d	2017-07-07 07:04:04 -07:00
Johann	6ae8f8dbe8	sad neon: macroize 64xN definitions Change-Id: Iaa6ea75b10e75784f31b1e08637eecf0dcb5cff9	2017-07-07 07:04:04 -07:00
Johann	67cffc1ef6	sad neon: avg for 32x[16,32,64] BUG=webm:1425 Change-Id: I3362e0dded3b46ca032caa7f44db42f324bc596d	2017-07-07 07:04:04 -07:00
Johann	b0d15713be	sad neon: macroize 32xN definitions Change-Id: I0020a49e77d27514375a03095d5821dc0aa7d128	2017-07-07 07:04:04 -07:00
Johann	527e0c9b1c	sad neon: avg for 16x[8,16,32] BUG=webm:1425 Change-Id: Ia42e4f36547c5fe12114fb58379e34bce82eb2f2	2017-07-07 07:04:04 -07:00
Johann	3c18acf452	sad neon: macroize 16xN definitions Change-Id: I5aea6ffbfa48eb1970afe3be54f0bba275d7fa58	2017-07-07 07:04:04 -07:00
Johann	d6423b3166	sad neon: macroize 8xN definitions Change-Id: I7b36a57e893c1795a37ba7994995bec7ff021409	2017-07-06 07:51:59 -07:00
Johann	63bdc574e5	sad neon: avg for 8x[4,8,16] BUG=webm:1425 Change-Id: If2ab51e3050e078b0011b174efe41fcb65a15f44	2017-07-06 07:43:09 -07:00
Johann	6bac3f80ee	sad neon: avg for 4x4 and 4x8 BUG=webm:1425 Change-Id: Ifc685a96cb34f7fd9243b4c674027480564b84fb	2017-07-06 07:12:47 -07:00
Johann	75b00592c7	fdct neon: 32x32_rd About 40% faster than the non-rd version. BUG=webm:1424 Change-Id: Ia99d14eb9532302eeaab8cd3e503395b0374b5a2	2017-07-06 06:30:50 -07:00
James Zern	a6531cbc54	Merge changes from topic 'missing-proto' * changes: fwd_txfm_msa.c: add missing vpx_dsp_rtcd.h vpx_convolve__msa.c: add missing vpx_dsp_rtcd.h loopfilter__msa.c: add missing vpx_dsp_rtcd.h	2017-07-05 20:00:25 +00:00
Johann Koenig	b6321025cd	Merge "partial fdct neon: maintain neon registers"	2017-07-05 19:12:38 +00:00
James Zern	fb135ff050	Merge changes I4ed1312f,Id2673eec * changes: ppc: Add vpx_idct8x8_64_add_vsx ppc: Add vpx_idct4x4_16_add_vsx	2017-07-02 02:38:39 +00:00
Alexandra Hájková	c757d6dde4	ppc: Add vpx_idct8x8_64_add_vsx Change-Id: I4ed1312f365509e0595dcc09890ecb050f6f2069	2017-07-01 12:55:47 -07:00
Alexandra Hájková	d8c277030c	ppc: Add vpx_idct4x4_16_add_vsx Change-Id: Id2673eece32027fb245919c7a5c81994a4a19fd8	2017-07-01 12:32:18 -07:00
James Zern	3dd993e4be	highbd_idct8x8_add_sse4: make << of neg. val a multiply left shifting a negative value is undefined; quiets a ubsan warning. this is applied to a constant, no change in the generated code. Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e	2017-07-01 11:56:56 -07:00
Johann	3ae458f2f3	partial fdct neon: maintain neon registers Finish the calulations in neon registers. This avoids a potentially expensive move from neon to gp and allows at least clang to store directly to memory. BUG=webm:1424 Change-Id: Idef25eec95f7610947167818e9194bde8b00d282	2017-07-01 09:29:38 -07:00
James Zern	a876d04072	fwd_txfm_msa.c: add missing vpx_dsp_rtcd.h + only expose compatible functions in high-bitdepth build quiets -Wmissing-prototypes warnings Change-Id: I8ef7db08a34c5c54b5cde6e732c0d70f4287c89a	2017-06-30 18:53:30 -07:00
James Zern	8710c6d884	vpx_convolve_*_msa.c: add missing vpx_dsp_rtcd.h quiets -Wmissing-prototypes warnings Change-Id: I1ab5b8ae4a62f54e0f9eb3fc81371c9b99972c30	2017-06-30 18:50:56 -07:00
James Zern	329dabf57e	loopfilter_*_msa.c: add missing vpx_dsp_rtcd.h + make some functions static quiets -Wmissing-prototypes warnings Change-Id: I2130e06142e71a004a1eb30e173feba4f6fe68a0	2017-06-30 18:50:52 -07:00
James Zern	27e37e1a8a	fwd_txfm_msa.c: correct vpx_fdct8x8_1_msa prototype this makes the function compatible with high-bitdepth and fixes test failures since: `5ac88162b` partial fdct test Change-Id: Ib630694608237f0c515948942e05dbea259ba338	2017-06-30 18:50:47 -07:00
Linfeng Zhang	1e3a93e72e	Merge changes I5d038b4f,I9d00d1dd,I0722841d,I1f640db7 * changes: Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1 sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4() Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h	2017-06-30 20:49:19 +00:00
Johann Koenig	89d3dc043e	Merge changes Id5beb35d,I2945fe54,Ib0f3cfd6,I78a2eba8 * changes: partial fdct neon: add 32x32_1 partial fdct neon: add 16x16_1 partial fdct neon: add 4x4_1 partial fdct neon: move 8x8_1 and enable hbd tests	2017-06-30 01:00:07 +00:00
Linfeng Zhang	c338f3635e	Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1 BUG=webm:1412 Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591	2017-06-29 17:19:34 -07:00
Linfeng Zhang	ee5cb8d87f	sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4() BUG=webm:1412 Change-Id: I9d00d1ddbd724fd5f825fd974c4cf46a9bca6cb3	2017-06-29 17:18:01 -07:00
Linfeng Zhang	0fa59a4baf	Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h Also clean highbd_inv_txfm_sse2.h BUG=webm:1412 Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b	2017-06-29 17:17:43 -07:00
Linfeng Zhang	9ac78ae35f	Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h BUG=webm:1412 Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d	2017-06-29 17:13:28 -07:00
James Zern	bd77931421	dct_partial_test,fwd_txfm: change << to * left shift of a negative number is undefined in C; quiets a ubsan warning Change-Id: Ib1624ad5326ac8e0eead9348468ef7fe5d4df9a4	2017-06-29 14:42:03 -07:00
Johann	9fe510c12a	partial fdct neon: add 32x32_1 Always return an int32_t. Since it needs to be moved to a register for shifting, this doesn't really penalize the smaller transforms. The values could potentially be summed and shifted in place. BUG=webm:1424 Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972	2017-06-28 15:37:44 -07:00
Johann	f310ddc470	partial fdct neon: add 16x16_1 For the 8x8_1, the highbd output fit nicely in the existing function. 12 bit input will overflow this implementation of 16x16_1. BUG=webm:1424 Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec	2017-06-28 15:37:44 -07:00
Johann	4959dd3eb3	partial fdct neon: add 4x4_1 BUG=webm:1424 Change-Id: Ib0f3cfd6116fc1f5a99acb8bfd76e25b90177ffc	2017-06-28 15:37:44 -07:00
Johann	cf75ab6ccd	partial fdct neon: move 8x8_1 and enable hbd tests The function was originally written with HBD in mind. Enable it and configure the tests. BUG=webm:1424 Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1	2017-06-28 15:37:43 -07:00
Johann Koenig	81e25512c3	Merge changes Ib454762d,I966650df,Ie126553e,I068f06c6,Icb72a94e * changes: sad neon: rewrite 64x64 and add 64x32 sad neon: rewrite 32x32, add 32x16 and 32x64 sad neon: rewrite 16x8, 16x16, add 16x32 sad neon: rewrite 8x8 and 8x16 sad neon: rewrite 4x4 and add 4x8	2017-06-28 22:37:00 +00:00
Johann Koenig	35f8515c3f	Merge "partial fdct test"	2017-06-28 22:34:53 +00:00
Johann	5ac88162b9	partial fdct test Test the _1 variant of the fdct, which simply sums the block and applies a modifying shift based on the block size. BUG=webm:1424 Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468	2017-06-28 20:32:20 +00:00
Johann	ad011aaab8	sad neon: rewrite 64x64 and add 64x32 BUG=webm:1425 Change-Id: Ib454762d1c61b05a98324fe81ad58c9e09784717	2017-06-28 12:21:34 -07:00
Johann	77a648885c	sad neon: rewrite 32x32, add 32x16 and 32x64 BUG=webm:1425 Change-Id: I966650df7e3face93e1e771634d1cc5458a35f85	2017-06-28 12:20:27 -07:00
Johann	469643757f	sad neon: rewrite 16x8, 16x16, add 16x32 BUG=webm:1425 Change-Id: Ie126553e5fffcdfaf3d82a85b368ac10ce9ab082	2017-06-28 12:16:00 -07:00
Johann	e40e78be24	sad neon: rewrite 8x8 and 8x16 BUG=webm:1425 Change-Id: I068f06c67b841f09ea07c04ada0c2f1706102138	2017-06-28 12:15:57 -07:00
Johann	46d8660ce3	sad neon: rewrite 4x4 and add 4x8 The previous implementation loaded 8 values (discarding half) BUG=webm:1425 Change-Id: Icb72a94e2557a4ee2db7091266ab58fd92f72158	2017-06-28 11:14:59 -07:00
Linfeng Zhang	0bb31a46a4	Update vpx_idct8x8_12_add_ssse3() Change-Id: I0f38801c391db87ddae168602a786a062cd34b1d	2017-06-26 14:57:41 -07:00
Linfeng Zhang	a76b6b232c	Update load_input_data() in x86 Split to load_input_data4() and load_input_data8(). Use pack with signed saturation instruction for high bitdepth. Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06	2017-06-26 13:38:33 -07:00
Linfeng Zhang	8253a27904	Add vpx_highbd_idct4x4_16_add_sse4_1() BUG=webm:1412 Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a	2017-06-23 14:30:12 -07:00
Linfeng Zhang	b8a4b5dd8d	Cosmetics, 8x8 idct SSE2 optimization Change-Id: Id21fa94fd323e36cd19a2d890bf4a0cafb7d964d	2017-06-23 14:30:12 -07:00
James Zern	88a302e743	Merge changes from topic 'missing-proto' * changes: onyxd_int.h: add missing prototypes onyxd.h: add vp8dx_references_buffer prototype vp[89],vpx_dsp: add missing includes vp8,encodeframe.h: correct prototypes vp8: add temporal_filter.h add picklpf.h add ethreading.h vp8,bitstream.h: add missing prototypes vp8: remove vp8_fast_quantize_b_mmx vp8,loopfilter_filters: make some functions static vp9_ratectrl: make adjust_gf_boost_lag_one_pass_vbr static vp9_encodeframe: make scale_part_thresh_sumdiff static vp9_alt_ref_aq: correct vp9_alt_ref_aq_create proto tiny_ssim: make some functions static	2017-06-23 05:44:24 +00:00
Johann Koenig	794a5ad713	Merge "fdct32x32 neon implementation"	2017-06-23 01:58:00 +00:00
Johann	e67660cf37	fdct32x32 neon implementation Almost 3x faster in constrained loop testing. Over 10x faster in HBD builds. BUG=webm:1424 Change-Id: I2b7f8453e1d4ada63cde729d8115d684c4a71ff9	2017-06-22 06:40:17 -07:00
James Zern	44418c659f	vp[89],vpx_dsp: add missing includes quiets -Wmissing-prototypes Change-Id: I841cfc019d592f2bc6b3fec5818051a31f4c53b5	2017-06-21 19:00:15 -07:00
Linfeng Zhang	466b667ff3	Clean vpx_idct16x16_256_add_sse2() Remove macro IDCT16 which is redundant with idct16_8col(). Change-Id: I783c5f4fda038a22d5ee5c2b22e8c2cdfb38432c	2017-06-21 13:47:15 -07:00
Linfeng Zhang	42522ce0b7	Update vpx_idct{8x8,16x16,32x32}_1_add_sse2() Change-Id: I365f8e53d9ccd028cef0f561d4de9e5916278609	2017-06-21 13:47:05 -07:00
Linfeng Zhang	2b43a1ee18	Clean 32x32 full idct sse2 and ssse3 code vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are code relocations, no new code. Change-Id: I5dac0e98cc411a4ce05660406921118986638d19	2017-06-21 13:46:49 -07:00
Linfeng Zhang	c7e4917e97	Clean 8x8 idct x86 optimization Create load_buffer_8x8() and write_buffer_8x8(). Change-Id: Ib26dd515d734a5402971c91de336ab481b213fdf	2017-06-15 14:30:00 -07:00
Linfeng Zhang	98967645a1	Remove vpx_idct8x8_64_add_ssse3() It's almost identical with vpx_idct8x8_64_add_sse2(), except little difference in instructions order. Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f	2017-06-15 14:09:33 -07:00
Linfeng Zhang	6da6a23291	Update high bitdepth load_input_data() in x86 BUG=webm:1412 Change-Id: Ibf9d120b80c7d3a7637e79e123cf2f0aae6dd78c	2017-06-13 16:53:53 -07:00
Linfeng Zhang	d6eeef9ee6	Clean array_transpose_{4X8,16x16,16x16_2) in x86 Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0	2017-06-13 16:50:44 -07:00
Linfeng Zhang	9c72e85e4c	Remove array_transpose_8x8() in x86 Duplicate of transpose_16bit_8x8() Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f	2017-06-13 16:50:44 -07:00

1 2 3 4 5 ...

980 Commits