generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	2f7497f341	vpx_dsp: vpx_get16x16var_avx2() cleanup BUG=webm:1404 Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a	2017-08-18 12:23:49 -07:00
Johann Koenig	1426f04e91	Merge "quantize: normalize intermediate types"	2017-08-18 16:00:28 +00:00
Shiyou Yin	7d82e57f5b	vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi. Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e	2017-08-18 09:06:49 +08:00
James Zern	bb15fd51be	highbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo 135 -> 34 fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12] Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682	2017-08-17 15:37:38 -07:00
Johann	7f602d6114	quantize: normalize intermediate types Despite abs_coeff being a positive value, all the other implementations treat it as signed which simplifies restoring the sign. HBD builds cast qcoeff to avoid a visual studio warning. Match vp9_quantize.c style of casting the entire expression. Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5	2017-08-17 12:34:28 -07:00
James Zern	e038d1610e	inv_txfm_sse2.h: correct idct/iadst prototypes fixes mismatch between prototypes and definitions Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652	2017-08-16 23:06:09 -07:00
Linfeng Zhang	f95686895b	Merge changes I08b562b6,Ia275940a,I51106e90 * changes: Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} Update highbd idct x86 optimizations. Update 32x32 idct sse2 and ssse3 optimizations.	2017-08-16 16:36:37 +00:00
Jerome Jiang	6b9c691daf	Merge "Clean up writing YUV files for debug purpose."	2017-08-15 18:28:54 +00:00
Jerome Jiang	a153080b55	Clean up writing YUV files for debug purpose. Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files. Delete some flags that can be enabled during build. To enable writing denoised YUV, use the following command line: CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure --enable-vp9-temporal-denoising For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP' Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528	2017-08-15 10:44:03 -07:00
Johann	77ed4414d6	quantize: silence unsigned overflow warning The result of the xor operation is unsigned. If coeff was negative, this results in an unsigned value - INT_MIN. Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0	2017-08-15 09:48:24 -07:00
Linfeng Zhang	d72e20b123	Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} BUG=webm:1412 Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f	2017-08-14 17:05:22 -07:00
Linfeng Zhang	69775d2f40	Update highbd idct x86 optimizations. BUG=webm:1412 Change-Id: Ia275940af7d7d8637e9a851a9e39d655bfbe4069	2017-08-14 16:59:50 -07:00
Linfeng Zhang	3f05a70c41	Update 32x32 idct sse2 and ssse3 optimizations. Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70	2017-08-14 16:59:31 -07:00
Linfeng Zhang	15193ce51f	Merge "Clean highbd idct x86 code with inline functions"	2017-08-10 20:25:18 +00:00
Johann Koenig	9bb8ce5efb	Merge "neon: vpx_quantize_b_32x32"	2017-08-10 15:42:49 +00:00
Johann Koenig	0b393ae505	Merge "quantize: copy ssse3 optimizations to intrinsics"	2017-08-10 15:42:20 +00:00
Linfeng Zhang	39da7fb786	Clean highbd idct x86 code with inline functions Created inline functions highbd_butterfly_cospi16_sse2() and highbd_butterfly_cospi16_sse4_1() BUG=webm:1412 Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a	2017-08-08 17:53:28 -07:00
Johann	93166c5e51	neon: vpx_quantize_b_32x32 With skip block the neon is about twice as fast as C. The neon has no shortcut for coeff < zbin so it always takes the same amount of time. Even if the C can take the shortcut, it is over twice as fast in neon. If it can't, that gap increases to over 10x. BUG=webm:1426 Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6	2017-08-08 14:05:18 -07:00
Johann	d52cb59729	quantize: copy ssse3 optimizations to intrinsics Fairly minor differences from sse2. pabsw and psignw are the big gains. Also re-uses some values in eob calculation to avoid an extra pcmp. Fixes test failures in HBD and OS X builds. Allows using it in 32bit builds, where it is about 40% faster than sse2. Substantially faster than the assembly for skip_block. 10-20% faster the rest of the time. Change-Id: If783bb3567e561e47667e10133b9c84414a334e2	2017-08-08 12:22:14 -07:00
Linfeng Zhang	853165ba39	Update 32x32 idct sse2 funcs, add partial case 135 Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a	2017-08-07 17:37:02 -07:00
Linfeng Zhang	d670678f26	Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx() in idct x86 code Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083	2017-08-04 15:33:37 -07:00
Linfeng Zhang	fa829e0e5a	Replace multiplication_and_add() with butterfly() in idct x86 code Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e	2017-08-04 15:33:34 -07:00
Linfeng Zhang	c9fb719ee1	Update butterfly() in idct x86 optimizations. Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701	2017-08-04 15:33:28 -07:00
Linfeng Zhang	7f20c3ac44	Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1 BUG=webm:1412 Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca	2017-08-04 15:31:17 -07:00
Linfeng Zhang	22b6dc9fdf	Update for loop increment of idct x86 functions Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4	2017-08-04 15:29:19 -07:00
Linfeng Zhang	0c61331244	Update high bitdepth 16x16 idct x86 code Prepare for high bitdepth 16x16 idct sse4.1 code. Just functions moving and renaming. BUG=webm:1412 Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a	2017-08-04 15:12:33 -07:00
Scott LaVarnway	c42517568d	vpx_dsp: merge avx2 variance files BUG=webm:1404 Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4	2017-08-04 07:49:30 -07:00
Linfeng Zhang	e921c7ba8d	Merge "Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function"	2017-08-04 01:16:35 +00:00
Scott LaVarnway	f6c6f37e0c	Merge "vpx_dsp: Use correct check for halfpel in"	2017-08-03 23:17:09 +00:00
Linfeng Zhang	563d58ab84	Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function BUG=webm:1412 Change-Id: I945f0fb6807b8948747243794dc7352b959221f7	2017-08-03 13:59:47 -07:00
Linfeng Zhang	6624f20785	Merge changes I76727df0,I66297d78,I1d000c6b * changes: Extract inlined 16x16 idct sse2 code into header file Add transpose_32bit_8x4() sse2 optimization Update x86 idct optimization	2017-08-03 20:51:02 +00:00
Scott LaVarnway	8334a48d3a	vpx_dsp: Use correct check for halfpel in vpx_sub_pixel_variance32xh_avx2() and vpx_sub_pixel_avg_variance32xh_avx2 see: `17fae3a` Change to use correct check for halfpel Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c	2017-08-03 06:57:40 -07:00
Linfeng Zhang	15a47db730	Extract inlined 16x16 idct sse2 code into header file Will be called by high bitdepth functions. Change-Id: I76727df00941b5a27adceaba8347f275475fcd8c	2017-08-02 16:17:43 -07:00
Linfeng Zhang	8c0ab7607e	Add transpose_32bit_8x4() sse2 optimization Change-Id: I66297d78b38db718cfe3ebb8ea972f5a72c17955	2017-08-02 16:15:58 -07:00
Scott LaVarnway	698e56f26c	Merge "vpxdsp: variance_impl_avx2.c cleanup"	2017-08-02 19:08:10 +00:00
Scott LaVarnway	632fe8286a	vpxdsp: variance_impl_avx2.c cleanup BUG=webm:1404 Change-Id: I8d8498009e5ef7bf1137e4ff16ec81738a020b02	2017-08-02 05:57:39 -07:00
Linfeng Zhang	6738ad7aaf	Update x86 idct optimization Move constant coefficients preparation into inline function. Change-Id: I1d000c6b161794c8828ff70768439b767e2afea1	2017-08-01 14:40:12 -07:00
Linfeng Zhang	c0490b52b1	Merge "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"	2017-08-01 21:39:39 +00:00
Johann Koenig	847394fe77	Merge "neon: vpx_quantize_b"	2017-08-01 16:44:31 +00:00
Linfeng Zhang	bf14d468c1	Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2 This replaces commit `aa1c4cd`, which has a bug and was reverted in commit `3c73e58`. The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d(). Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8	2017-07-31 16:36:13 -07:00
Johann	2d6b5df657	neon: vpx_quantize_b With skip block or coeff < zbin it is about twice as fast as C. If most coeff values are > zbin it is about 10-15x as fast as C. BUG=webm:1426 Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7	2017-07-31 10:38:46 -07:00
James Zern	78155b7ed5	highbd_inv_txfm_sse4: make << of neg. val a multiply left shifting a negative value is undefined; quiets a ubsan warning. this is applied to a constant, no change in the generated code. Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254	2017-07-30 12:48:28 -07:00
James Zern	d35b627340	Revert "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2" This reverts commit `aa1c4cd140`. This fails the following tests with extreme input coefficients: SSE2/InvTrans8x8DCT.CompareReference/0 SSE2/InvTrans8x8DCT.CompareReference/2 previously the optimized path was skipped in this range Change-Id: I9af015a46eba96208834a219fafd651d37556a80	2017-07-29 11:12:27 -07:00
Linfeng Zhang	75653b7032	Merge changes Ia0e20f5f,I28150789,I35df041b,I221dff34 * changes: Update vpx_idct16x16_10_add_sse2() Add vpx_idct16x16_38_add_sse2() Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2 Refactor highbd idct 4x4 and 8x8 x86 functions	2017-07-28 22:43:00 +00:00
James Zern	3c73e587d1	Revert "quantize ssse3: declare all variables" This reverts commit `03f5e300d6`. This causes test failures under OSX: SSSE3/VP9QuantizeTest.EOBCheck/0 SSSE3/VP9QuantizeTest.OperationCheck/0 Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b	2017-07-28 01:22:16 -07:00
Linfeng Zhang	5232e35bc2	Update vpx_idct16x16_10_add_sse2() Change-Id: Ia0e20f5fa47382af5785221eebb05212b40bd35c	2017-07-27 18:03:25 -07:00
Linfeng Zhang	7f4acf8700	Add vpx_idct16x16_38_add_sse2() Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898	2017-07-27 18:02:43 -07:00
Linfeng Zhang	aa1c4cd140	Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2 BUG=webm:1412 Change-Id: I35df041b757d42278ac7a5cdbd909e8ffcee1455	2017-07-27 18:02:36 -07:00
Linfeng Zhang	9c43d81bc2	Refactor highbd idct 4x4 and 8x8 x86 functions BUG=webm:1412 Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec	2017-07-27 18:01:03 -07:00
Johann Koenig	a83e1f1d53	Merge "quantize ssse3: declare all variables"	2017-07-27 21:18:35 +00:00
James Zern	1c666465af	inv_txfm_{sse2,ssse3}: clear conversion warnings visual studio reports tran_high_t (int64) -> short in calls to _mm_set1_epi16 Change-Id: Icb8d1baee77ad3d45edb1477a443d3e648f0b745	2017-07-25 20:13:49 -07:00
James Zern	62682ac8ad	highbd_idct_sse.c: clear conversion warnings visual studio reports tran_high_t (int64) -> int in calls to _mm_setr_epi32 Change-Id: Ic2247c8e3800991202151790d78bd94c4f4aed05	2017-07-25 20:11:09 -07:00
James Zern	85736e616e	vpx_variance16x16_sse2: correct cast order allow the right shift to operate on 64-bits, this matches the rest of the implementations previously: `b0f1ae147` vpx_get16x16var_avx2: correct cast order Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa	2017-07-25 16:45:40 -07:00
Alexandra Hájková	666c543f7b	ppc: Add vpx_idct16x16_256_add_vsx Change-Id: Ibc3f7965423fd91179f8d8e77c7ae3e6d7f80572	2017-07-25 12:34:15 +00:00
James Zern	b0f1ae1475	vpx_get16x16var_avx2: correct cast order allow the right shift to operate on 64-bits, this matches the rest of the implementations missed in: `6acd061aa` variance_avx2: sync variance functions with c-code Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617	2017-07-24 16:29:44 -07:00
Johann	4b9a848bb3	variance: call C comp_avg_pred Keep optimized code out of the reference implementation. This matches the style of the other sub calls. Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689	2017-07-18 20:22:53 +00:00
Johann	03f5e300d6	quantize ssse3: declare all variables Copy missing line from avx implementation. Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27	2017-07-18 12:32:57 -07:00
Johann	e381753926	sad4d neon: 64x[32,64] Rewrite 64x64. BUG=webm:1425 Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf	2017-07-12 13:26:39 +00:00
Johann	e1bde306c8	sad4d neon: 32x[16,32,64] Rewrite 32x32. Use half the accumulator registers. BUG=webm:1425 Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13	2017-07-12 13:25:18 +00:00
Johann	807ce8fb1e	sad4d neon: 16x[8,16,32] Rewrite 16x16. Use half the accumulator registers. BUG=webm:1425 Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2	2017-07-12 13:25:11 +00:00
Johann	8152b0904d	sad4d neon: 8x[4,8,16] BUG=webm:1425 Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4	2017-07-12 13:25:03 +00:00
Johann	dd4347e9ec	sad4d neon: 4x4, 4x8 BUG=webm:1425 Change-Id: I5081b5ce131821d590c53ac1206a94f50cb8b468	2017-07-12 03:38:03 +00:00
Johann	66a96fd3de	avg_neon: fix 4x4, update 8x8 4x4 was failing with a bus error. Most likely due to clang alignment hints on 32bit loads. Change-Id: Ib191ce0e6239fc55d85f10e4dbe15876e5052edb	2017-07-10 15:29:34 -07:00
Johann	87610ac45e	neon: consolidate horizontal adds Change-Id: Iaf9e88ff636ccf8f0ef310869c6827f3f205cca8	2017-07-10 15:29:13 -07:00
Johann Koenig	4b78c6e6f7	Merge "remove vp9_full_sad_search"	2017-07-10 20:42:40 +00:00
Johann	109faffe9b	remove vp9_full_sad_search This code is unused in vp9. Only vp8 still contains references to vpx_sad_NxMx[3\|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4. Remove the remaining sizes and all the highbitdepth versions. BUG=webm:1425 Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8	2017-07-10 11:20:35 -07:00
Johann Koenig	4e16f70703	Merge changes Id84d9780,Iaa6ea75b,I3362e0dd,I0020a49e,Ia42e4f36, ... * changes: sad neon: avg for 64x[32,64] sad neon: macroize 64xN definitions sad neon: avg for 32x[16,32,64] sad neon: macroize 32xN definitions sad neon: avg for 16x[8,16,32] sad neon: macroize 16xN definitions	2017-07-07 21:01:23 +00:00
Johann Koenig	6c375b9cd0	Merge "fdct neon: 32x32_rd"	2017-07-07 14:05:51 +00:00
Johann	e4e08556db	sad neon: avg for 64x[32,64] BUG=webm:1425 Change-Id: Id84d97807a6a0fbcc889c4dfe11929d54f85493d	2017-07-07 07:04:04 -07:00
Johann	6ae8f8dbe8	sad neon: macroize 64xN definitions Change-Id: Iaa6ea75b10e75784f31b1e08637eecf0dcb5cff9	2017-07-07 07:04:04 -07:00
Johann	67cffc1ef6	sad neon: avg for 32x[16,32,64] BUG=webm:1425 Change-Id: I3362e0dded3b46ca032caa7f44db42f324bc596d	2017-07-07 07:04:04 -07:00
Johann	b0d15713be	sad neon: macroize 32xN definitions Change-Id: I0020a49e77d27514375a03095d5821dc0aa7d128	2017-07-07 07:04:04 -07:00
Johann	527e0c9b1c	sad neon: avg for 16x[8,16,32] BUG=webm:1425 Change-Id: Ia42e4f36547c5fe12114fb58379e34bce82eb2f2	2017-07-07 07:04:04 -07:00
Johann	3c18acf452	sad neon: macroize 16xN definitions Change-Id: I5aea6ffbfa48eb1970afe3be54f0bba275d7fa58	2017-07-07 07:04:04 -07:00
Johann	d6423b3166	sad neon: macroize 8xN definitions Change-Id: I7b36a57e893c1795a37ba7994995bec7ff021409	2017-07-06 07:51:59 -07:00
Johann	63bdc574e5	sad neon: avg for 8x[4,8,16] BUG=webm:1425 Change-Id: If2ab51e3050e078b0011b174efe41fcb65a15f44	2017-07-06 07:43:09 -07:00
Johann	6bac3f80ee	sad neon: avg for 4x4 and 4x8 BUG=webm:1425 Change-Id: Ifc685a96cb34f7fd9243b4c674027480564b84fb	2017-07-06 07:12:47 -07:00
Johann	75b00592c7	fdct neon: 32x32_rd About 40% faster than the non-rd version. BUG=webm:1424 Change-Id: Ia99d14eb9532302eeaab8cd3e503395b0374b5a2	2017-07-06 06:30:50 -07:00
James Zern	a6531cbc54	Merge changes from topic 'missing-proto' * changes: fwd_txfm_msa.c: add missing vpx_dsp_rtcd.h vpx_convolve__msa.c: add missing vpx_dsp_rtcd.h loopfilter__msa.c: add missing vpx_dsp_rtcd.h	2017-07-05 20:00:25 +00:00
Johann Koenig	b6321025cd	Merge "partial fdct neon: maintain neon registers"	2017-07-05 19:12:38 +00:00
James Zern	fb135ff050	Merge changes I4ed1312f,Id2673eec * changes: ppc: Add vpx_idct8x8_64_add_vsx ppc: Add vpx_idct4x4_16_add_vsx	2017-07-02 02:38:39 +00:00
Alexandra Hájková	c757d6dde4	ppc: Add vpx_idct8x8_64_add_vsx Change-Id: I4ed1312f365509e0595dcc09890ecb050f6f2069	2017-07-01 12:55:47 -07:00
Alexandra Hájková	d8c277030c	ppc: Add vpx_idct4x4_16_add_vsx Change-Id: Id2673eece32027fb245919c7a5c81994a4a19fd8	2017-07-01 12:32:18 -07:00
James Zern	3dd993e4be	highbd_idct8x8_add_sse4: make << of neg. val a multiply left shifting a negative value is undefined; quiets a ubsan warning. this is applied to a constant, no change in the generated code. Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e	2017-07-01 11:56:56 -07:00
Johann	3ae458f2f3	partial fdct neon: maintain neon registers Finish the calulations in neon registers. This avoids a potentially expensive move from neon to gp and allows at least clang to store directly to memory. BUG=webm:1424 Change-Id: Idef25eec95f7610947167818e9194bde8b00d282	2017-07-01 09:29:38 -07:00
James Zern	a876d04072	fwd_txfm_msa.c: add missing vpx_dsp_rtcd.h + only expose compatible functions in high-bitdepth build quiets -Wmissing-prototypes warnings Change-Id: I8ef7db08a34c5c54b5cde6e732c0d70f4287c89a	2017-06-30 18:53:30 -07:00
James Zern	8710c6d884	vpx_convolve_*_msa.c: add missing vpx_dsp_rtcd.h quiets -Wmissing-prototypes warnings Change-Id: I1ab5b8ae4a62f54e0f9eb3fc81371c9b99972c30	2017-06-30 18:50:56 -07:00
James Zern	329dabf57e	loopfilter_*_msa.c: add missing vpx_dsp_rtcd.h + make some functions static quiets -Wmissing-prototypes warnings Change-Id: I2130e06142e71a004a1eb30e173feba4f6fe68a0	2017-06-30 18:50:52 -07:00
James Zern	27e37e1a8a	fwd_txfm_msa.c: correct vpx_fdct8x8_1_msa prototype this makes the function compatible with high-bitdepth and fixes test failures since: `5ac88162b` partial fdct test Change-Id: Ib630694608237f0c515948942e05dbea259ba338	2017-06-30 18:50:47 -07:00
Linfeng Zhang	1e3a93e72e	Merge changes I5d038b4f,I9d00d1dd,I0722841d,I1f640db7 * changes: Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1 sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4() Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h	2017-06-30 20:49:19 +00:00
Johann Koenig	89d3dc043e	Merge changes Id5beb35d,I2945fe54,Ib0f3cfd6,I78a2eba8 * changes: partial fdct neon: add 32x32_1 partial fdct neon: add 16x16_1 partial fdct neon: add 4x4_1 partial fdct neon: move 8x8_1 and enable hbd tests	2017-06-30 01:00:07 +00:00
Linfeng Zhang	c338f3635e	Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1 BUG=webm:1412 Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591	2017-06-29 17:19:34 -07:00
Linfeng Zhang	ee5cb8d87f	sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4() BUG=webm:1412 Change-Id: I9d00d1ddbd724fd5f825fd974c4cf46a9bca6cb3	2017-06-29 17:18:01 -07:00
Linfeng Zhang	0fa59a4baf	Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h Also clean highbd_inv_txfm_sse2.h BUG=webm:1412 Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b	2017-06-29 17:17:43 -07:00
Linfeng Zhang	9ac78ae35f	Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h BUG=webm:1412 Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d	2017-06-29 17:13:28 -07:00
James Zern	bd77931421	dct_partial_test,fwd_txfm: change << to * left shift of a negative number is undefined in C; quiets a ubsan warning Change-Id: Ib1624ad5326ac8e0eead9348468ef7fe5d4df9a4	2017-06-29 14:42:03 -07:00
Johann	9fe510c12a	partial fdct neon: add 32x32_1 Always return an int32_t. Since it needs to be moved to a register for shifting, this doesn't really penalize the smaller transforms. The values could potentially be summed and shifted in place. BUG=webm:1424 Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972	2017-06-28 15:37:44 -07:00
Johann	f310ddc470	partial fdct neon: add 16x16_1 For the 8x8_1, the highbd output fit nicely in the existing function. 12 bit input will overflow this implementation of 16x16_1. BUG=webm:1424 Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec	2017-06-28 15:37:44 -07:00
Johann	4959dd3eb3	partial fdct neon: add 4x4_1 BUG=webm:1424 Change-Id: Ib0f3cfd6116fc1f5a99acb8bfd76e25b90177ffc	2017-06-28 15:37:44 -07:00
Johann	cf75ab6ccd	partial fdct neon: move 8x8_1 and enable hbd tests The function was originally written with HBD in mind. Enable it and configure the tests. BUG=webm:1424 Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1	2017-06-28 15:37:43 -07:00
Johann Koenig	81e25512c3	Merge changes Ib454762d,I966650df,Ie126553e,I068f06c6,Icb72a94e * changes: sad neon: rewrite 64x64 and add 64x32 sad neon: rewrite 32x32, add 32x16 and 32x64 sad neon: rewrite 16x8, 16x16, add 16x32 sad neon: rewrite 8x8 and 8x16 sad neon: rewrite 4x4 and add 4x8	2017-06-28 22:37:00 +00:00
Johann Koenig	35f8515c3f	Merge "partial fdct test"	2017-06-28 22:34:53 +00:00
Johann	5ac88162b9	partial fdct test Test the _1 variant of the fdct, which simply sums the block and applies a modifying shift based on the block size. BUG=webm:1424 Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468	2017-06-28 20:32:20 +00:00
Johann	ad011aaab8	sad neon: rewrite 64x64 and add 64x32 BUG=webm:1425 Change-Id: Ib454762d1c61b05a98324fe81ad58c9e09784717	2017-06-28 12:21:34 -07:00
Johann	77a648885c	sad neon: rewrite 32x32, add 32x16 and 32x64 BUG=webm:1425 Change-Id: I966650df7e3face93e1e771634d1cc5458a35f85	2017-06-28 12:20:27 -07:00
Johann	469643757f	sad neon: rewrite 16x8, 16x16, add 16x32 BUG=webm:1425 Change-Id: Ie126553e5fffcdfaf3d82a85b368ac10ce9ab082	2017-06-28 12:16:00 -07:00
Johann	e40e78be24	sad neon: rewrite 8x8 and 8x16 BUG=webm:1425 Change-Id: I068f06c67b841f09ea07c04ada0c2f1706102138	2017-06-28 12:15:57 -07:00
Johann	46d8660ce3	sad neon: rewrite 4x4 and add 4x8 The previous implementation loaded 8 values (discarding half) BUG=webm:1425 Change-Id: Icb72a94e2557a4ee2db7091266ab58fd92f72158	2017-06-28 11:14:59 -07:00
Linfeng Zhang	0bb31a46a4	Update vpx_idct8x8_12_add_ssse3() Change-Id: I0f38801c391db87ddae168602a786a062cd34b1d	2017-06-26 14:57:41 -07:00
Linfeng Zhang	a76b6b232c	Update load_input_data() in x86 Split to load_input_data4() and load_input_data8(). Use pack with signed saturation instruction for high bitdepth. Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06	2017-06-26 13:38:33 -07:00
Linfeng Zhang	8253a27904	Add vpx_highbd_idct4x4_16_add_sse4_1() BUG=webm:1412 Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a	2017-06-23 14:30:12 -07:00
Linfeng Zhang	b8a4b5dd8d	Cosmetics, 8x8 idct SSE2 optimization Change-Id: Id21fa94fd323e36cd19a2d890bf4a0cafb7d964d	2017-06-23 14:30:12 -07:00
James Zern	88a302e743	Merge changes from topic 'missing-proto' * changes: onyxd_int.h: add missing prototypes onyxd.h: add vp8dx_references_buffer prototype vp[89],vpx_dsp: add missing includes vp8,encodeframe.h: correct prototypes vp8: add temporal_filter.h add picklpf.h add ethreading.h vp8,bitstream.h: add missing prototypes vp8: remove vp8_fast_quantize_b_mmx vp8,loopfilter_filters: make some functions static vp9_ratectrl: make adjust_gf_boost_lag_one_pass_vbr static vp9_encodeframe: make scale_part_thresh_sumdiff static vp9_alt_ref_aq: correct vp9_alt_ref_aq_create proto tiny_ssim: make some functions static	2017-06-23 05:44:24 +00:00
Johann Koenig	794a5ad713	Merge "fdct32x32 neon implementation"	2017-06-23 01:58:00 +00:00
Johann	e67660cf37	fdct32x32 neon implementation Almost 3x faster in constrained loop testing. Over 10x faster in HBD builds. BUG=webm:1424 Change-Id: I2b7f8453e1d4ada63cde729d8115d684c4a71ff9	2017-06-22 06:40:17 -07:00
James Zern	44418c659f	vp[89],vpx_dsp: add missing includes quiets -Wmissing-prototypes Change-Id: I841cfc019d592f2bc6b3fec5818051a31f4c53b5	2017-06-21 19:00:15 -07:00
Linfeng Zhang	466b667ff3	Clean vpx_idct16x16_256_add_sse2() Remove macro IDCT16 which is redundant with idct16_8col(). Change-Id: I783c5f4fda038a22d5ee5c2b22e8c2cdfb38432c	2017-06-21 13:47:15 -07:00
Linfeng Zhang	42522ce0b7	Update vpx_idct{8x8,16x16,32x32}_1_add_sse2() Change-Id: I365f8e53d9ccd028cef0f561d4de9e5916278609	2017-06-21 13:47:05 -07:00
Linfeng Zhang	2b43a1ee18	Clean 32x32 full idct sse2 and ssse3 code vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are code relocations, no new code. Change-Id: I5dac0e98cc411a4ce05660406921118986638d19	2017-06-21 13:46:49 -07:00
Linfeng Zhang	c7e4917e97	Clean 8x8 idct x86 optimization Create load_buffer_8x8() and write_buffer_8x8(). Change-Id: Ib26dd515d734a5402971c91de336ab481b213fdf	2017-06-15 14:30:00 -07:00
Linfeng Zhang	98967645a1	Remove vpx_idct8x8_64_add_ssse3() It's almost identical with vpx_idct8x8_64_add_sse2(), except little difference in instructions order. Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f	2017-06-15 14:09:33 -07:00
Linfeng Zhang	6da6a23291	Update high bitdepth load_input_data() in x86 BUG=webm:1412 Change-Id: Ibf9d120b80c7d3a7637e79e123cf2f0aae6dd78c	2017-06-13 16:53:53 -07:00
Linfeng Zhang	d6eeef9ee6	Clean array_transpose_{4X8,16x16,16x16_2) in x86 Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0	2017-06-13 16:50:44 -07:00
Linfeng Zhang	9c72e85e4c	Remove array_transpose_8x8() in x86 Duplicate of transpose_16bit_8x8() Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f	2017-06-13 16:50:44 -07:00
Linfeng Zhang	cbb991b6b8	Convert 8x8 idct x86 macros to inline functions Change-Id: Id59865fd6c453a24121ce7160048d67875fc67ce	2017-06-13 16:50:43 -07:00
Jerome Jiang	943f9ee25c	Merge "Merge skin detection code in vp8/9."	2017-06-08 16:36:00 +00:00
Johann Koenig	903375a48a	Merge "fdct16x16 neon optimization"	2017-06-08 15:19:36 +00:00
Jerome Jiang	658e854252	Merge skin detection code in vp8/9. BUG=webm:1438 Change-Id: Ie3dc034c7dbb498a0b088a767b1936ddeed4df14	2017-06-07 21:20:34 -07:00
Johann	eae7cf2368	fdct16x16 neon optimization Roughly 2x speedup. Since the only change for HBD is to store(), the improvement appears to hold there as well. BUG=webm:1424 Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19	2017-06-07 14:59:55 -07:00
James Zern	ff42e04f9c	Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}"	2017-06-06 23:52:39 +00:00
James Zern	4753c23983	Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx"	2017-06-06 02:19:41 +00:00
Johann Koenig	755b3daf90	Merge "comp_avg_pred neon: used by sub pixel avg variance"	2017-05-31 18:17:28 +00:00
Linfeng Zhang	30ea3ef283	Merge "Update vpx_highbd_idct4x4_16_add_sse2()"	2017-05-31 15:56:20 +00:00
Johann	f695b30ac2	comp_avg_pred neon: used by sub pixel avg variance BUG=webm:1423 Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804	2017-05-30 22:47:34 +00:00
Linfeng Zhang	45048dc9dc	Update vpx_highbd_idct4x4_16_add_sse2() BUG=webm:1412 Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84	2017-05-30 09:25:30 -07:00
Johann Koenig	b9649d2407	Merge "comp_avg_pred: alignment"	2017-05-30 16:21:05 +00:00
Johann	ea8b4a450d	comp_avg_pred: alignment x86 requires 16 byte alignment for some vector loads/stores. arm does not have the same requirement. The asserts are still in avg_pred_sse2.c. This just removes them from the common code. Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6	2017-05-30 07:46:43 -07:00
Johann	42ce25821d	remove DECLARE_ALIGNED from neon code Unlike x86 neon only requires type alignment when loading into vectors. Change-Id: I7bbbe4d51f78776e499ce137578d8c0effdbc02f	2017-05-26 10:41:57 -07:00
Johann	f3c97ed32e	subpel variance neon: reduce stack usage Unlike x86, arm does not impose additional alignment restrictions on vector loads. For incoming values to the first pass, it uses vld1_u32() which typically does impose a 4 byte alignment. However, as the first pass operates on user-supplied values we must prepare for unaligned values anyway (and have, see mem_neon.h). But for the local temporary values there is no stride and the load will use vld1_u8 which does not require 4 byte alignment. There are 3 temporary structures. In the C, one is uint16_t. The arm saturates between passes but still passes tests. If this becomes an issue new functions will be needed. Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1	2017-05-24 13:28:13 -07:00
Johann	d204c4bf01	Use vdup instead of vmov Change-Id: Idb6248c1429b55176bb3e9f4e8365ea0ed2be62a	2017-05-24 11:38:15 -07:00
Johann Koenig	de1a9c77a7	Merge changes Iaab2b9a1,Idfb458d3 * changes: sub pel avg variance neon: 4x block sizes sub pel variance neon: 4x block sizes	2017-05-24 18:33:53 +00:00
Johann Koenig	b11a37f540	Merge changes I31fa6ef8,I228c6f29 * changes: sub pel avg variance neon: add neon optimizations sub pel variance neon: normalize variable names	2017-05-24 18:32:02 +00:00
Alexandra Hájková	8bf6eaf433	ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64} Change-Id: I547d0099e15591655eae954e3ce65fdf3b003123	2017-05-24 13:27:09 +00:00
Linfeng Zhang	6444958f62	Update inv_txfm_sse2.h and inv_txfm_sse2.c Extract shared code into inline functions. Change-Id: Iee1e5a4bc6396aeed0d301163095c9b21aa66b2f	2017-05-23 14:54:46 -07:00
Johann	f6fcd3410d	sub pel avg variance neon: 4x block sizes BUG=webm:1423 Change-Id: Iaab2b9a183fdb54aae5f717aba95d90dc36a9e3b	2017-05-22 14:40:05 -07:00
Johann	188d58eaa9	sub pel variance neon: 4x block sizes Add optimizations for blocks of width 4 BUG=webm:1423 Change-Id: Idfb458d36db3014d48fbfbe7f5462aa6eb249938	2017-05-22 14:40:01 -07:00
Johann	9b0d306a2f	sub pel avg variance neon: add neon optimizations These are missing an optimized version of vpx_comp_avg_pred BUG=webm:1423 Change-Id: I31fa6ef842e98f7ff3ea079ffed51ae33178e2ed	2017-05-22 13:58:43 -07:00
Johann	e0d294c3af	sub pel variance neon: normalize variable names match vpx_dsp/variance.c variable names Change-Id: I228c6f296c183af147b079b7c8bcdf97bd09cf3a	2017-05-22 13:58:43 -07:00
Linfeng Zhang	27beada6d0	Merge "Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2"	2017-05-22 20:58:18 +00:00
Johann	67ac68e399	variance neon: assert overflow conditions Change-Id: I12faca82d062eb33dc48dfeb39739b25112316cd	2017-05-22 11:25:06 -07:00

1 2 3 4 5 ...

954 Commits