generic-library/vpx

Author	SHA1	Message	Date
Kaustubh Raste	339f4dcaee	mips msa optimize vpx_scaled_2d function Change-Id: I638507b360c71489ab0e87bd558d2719ad995333	2017-11-29 13:27:04 +05:30
Kyle Siefring	dd4cc5b596	Merge "Optimize AVX2 get16x16var and get32x16var functions"	2017-11-20 22:37:57 +00:00
Kyle Siefring	07a0bf038f	Optimize AVX2 get16x16var and get32x16var functions Change-Id: If8b91aaa883c01107f0ea3468139fa24cfb301d2	2017-11-17 13:55:49 -05:00
Jerome Jiang	ea14a1a965	Merge "vp9: Fix mem rel for non-ref for external buffer."	2017-11-17 00:31:16 +00:00
Scott LaVarnway	8e6022844f	vpx: [x86] add vpx_satd_avx2() SSE2 intrinsic vs AVX2 intrinsic speed gains: blocksize 16: ~1.33 blocksize 64: ~1.51 blocksize 256: ~3.03 blocksize 1024: ~3.71 Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58	2017-11-10 12:24:12 -08:00
Scott LaVarnway	8c7213bc00	Merge "vpx: [x86] add vp9_block_error_fp_avx2()"	2017-11-10 00:45:47 +00:00
Jerome Jiang	6246d8aa76	vp9: Fix mem rel for non-ref for external buffer. Release frame buffers for non-ref when the decoder is destroyed. Enable the non ref test. BUG=b/68819248 Change-Id: Id87ef3b0a62318f9812e927cd957c05c859047fa	2017-11-09 15:47:21 -08:00
Scott LaVarnway	62ab5e99c1	vpx: [x86] add vp9_block_error_fp_avx2() SSE2 asm vs AVX2 intrinsics speed gains: blocksize 16: ~1.00 blocksize 64: ~1.17 blocksize 256: ~1.67 blocksize 1024: ~1.81 Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e	2017-11-09 05:02:31 -08:00
Jerome Jiang	adbb4c4d32	Merge "vp9: Add nonref frame buffer test."	2017-11-09 04:41:10 +00:00
Jerome Jiang	a68bbcff29	vp9: Add nonref frame buffer test. The new test will run a SVC bitstream which has non ref frames. It checks the number of buffer acquired and released to make sure all external frame buffers are released. Add a new test bitstream: vp90-2-22-svc_1280x720_1.webm which has 400 frames in total, and 1 spatial layer and 2 temporal layers. There is one non ref frame every other frame. Disabled for now. Will be enabled with the fix. BUG=b/68819248 Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5	2017-11-08 18:41:33 -08:00
Kyle Siefring	b383a17fa4	Support building AVX-512 and implement sadx4 for AVX-512 The added AVX-512 support requires the subset of AVX-512 added in Skylake-X. Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7	2017-11-03 13:37:23 -04:00
Scott LaVarnway	3bf02ad74a	vpx: hadamard: use ptrdiff_t instead of int for stride Eliminates the following instruction for the x86 (64 bit) intrinsic code: movslq %esi,%rax Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae	2017-10-26 11:41:48 -07:00
Kyle Siefring	037e596f04	Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics"	2017-10-24 19:22:36 +00:00
Kyle Siefring	ae35425ae6	Optimize convolve8 SSSE3 and AVX2 intrinsics Changed the intrinsics to perform summation similar to the way the assembly does. The new code diverges from the assembly by preferring unsaturated additions. Results for haswell SSSE3 Horiz/Vert Size Speedup Horiz x4 ~32% Horiz x8 ~6% Vert x8 ~4% AVX2 Horiz/Vert Size Speedup Horiz x16 ~16% Vert x16 ~14% BUG=webm:1471 Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668	2017-10-24 10:39:48 -04:00
Scott LaVarnway	b58259ab55	Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"	2017-10-19 23:32:10 +00:00
Scott LaVarnway	55c126a5d7	vpx: [x86] add vpx_hadamard_16x16_avx2() This version is ~1.91x faster than the sse2 version. When highbitdepth is enabled, it is ~1.74x. Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd	2017-10-18 18:00:00 -07:00
Jerome Jiang	401e6d48bf	Merge "Add datarate test for vp8 ROI."	2017-10-18 19:39:26 +00:00
Jerome Jiang	bd6d82e881	Add datarate test for vp8 ROI. BUG=webm:1470 Change-Id: Icbc848837e64eacc49491dcc26b4c5802af2ee13	2017-10-18 11:19:59 -07:00
Jerome Jiang	ec2fced451	Merge "vp8: Enable use of ROI map."	2017-10-18 18:16:44 +00:00
Jerome Jiang	dbb8926b86	vp8: Enable use of ROI map. Disable cyclic refresh if ROI is used and add flag to properly handle the static_thresh deltas. Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI is used). Add an example in vpx_temporal_svc_encoder.c. Turned off by default. BUG=webm:1470 Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda	2017-10-17 15:23:03 -07:00
Linfeng Zhang	9336e01621	Merge changes I17fff122,Ic149e3cb * changes: Add 4 to 3 scaling SSSE3 optimization Test extreme inputs in frame scale functions	2017-10-17 16:03:29 +00:00
Linfeng Zhang	0d2e95193b	Merge "Generalize CheckScalingFiltering in ConvolveTest"	2017-10-17 16:03:07 +00:00
Shiyou Yin	3e2770de4f	Merge "vp8: [loongson] optimize dct with mmi"	2017-10-13 00:37:57 +00:00
Kyle Siefring	caa116c9be	Merge changes I38783d97,If5160c0c * changes: Extend 16 wide AVX2 convolve8 code to support averaging. Add AVX2 version of vpx_convolve8_avg.	2017-10-12 16:12:38 +00:00
Shiyou Yin	f70de09f2a	vp8: [loongson] optimize dct with mmi 1. vp8_short_fdct4x4_mmi 2. vp8_short_fdct8x4_mmi 3. vp8_short_walsh4x4_mmi Change-Id: I89a7df25cfd09fae309fac257ad8b6a3dc1c8acb	2017-10-12 08:50:04 +08:00
Shiyou Yin	bc4098a8e9	Merge "vp8: [loongson] optimize quantize with mmi"	2017-10-12 00:33:17 +00:00
Marco	72c69e14ad	Adjust threshold in datarate tests for 1 pass VBR Small increase in threshold for the 1 pass VBR datarate tests. Needed due to commit: <017257a Adjustment to scene detection and key frame> Change-Id: I28b3bd7db2192a8cc2bccc3cb0e3b8dbb910ca16	2017-10-11 11:48:36 -07:00
Linfeng Zhang	1fa3ec3023	Test extreme inputs in frame scale functions Change-Id: Ic149e3cb59be2ee0f98a3fcfd83226ad5ea30c99	2017-10-11 11:35:19 -07:00
Shiyou Yin	e8ed2bb762	vp8: [loongson] optimize quantize with mmi 1. vp8_fast_quantize_b_mmi 2. vp8_regular_quantize_b_mmi Change-Id: Ic6e21593075f92c1004acd67184602d2aa5d5646	2017-10-11 16:45:58 +08:00
Linfeng Zhang	54f7d68c5c	Generalize CheckScalingFiltering in ConvolveTest Let it test extreme inputs and all filter types. In the future ConvolveTest should test regular 8-bit functions in high bitdepth mode. Change-Id: I1042564d1d390589ca203070fe332c6da3315d75	2017-10-10 14:12:43 -07:00
Kyle Siefring	1b2f92ee8e	Extend 16 wide AVX2 convolve8 code to support averaging. Also adds vpx_convolve8_avg_horiz_avx2. Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf	2017-10-09 19:10:03 -04:00
Linfeng Zhang	e1ae3772da	Merge "Update vp9_scale_and_extend_frame_ssse3()"	2017-10-09 16:20:00 +00:00
Kyle Siefring	9ca06bcdd2	Add AVX2 version of vpx_convolve8_avg. vpx_convolve8_avg works by first running a normal horizontal filter then a vertical filter averages at the end. The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the horizontal step. vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code. Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983	2017-10-07 23:37:48 -04:00
James Zern	807248ec81	Merge "ppc: Add vpx_idct32x32_1024_add_vsx"	2017-10-07 19:08:26 +00:00
James Zern	107eb6a9d4	vp9_ethread_test: abort early/add more detailed output in the case compare_fp_stats fails report the 2 values and their index Change-Id: I927a832b7a1e24c392961093b7caee1134223def	2017-10-05 15:02:51 -07:00
Linfeng Zhang	b809442521	Update vp9_scale_and_extend_frame_ssse3() Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4	2017-10-04 12:32:30 -07:00
Linfeng Zhang	9a71811d98	Merge changes Id6a8c549,Ib1e0650b,Ic369dd86 * changes: Refactor x86/vpx_subpixel_8t_intrin_ssse3.c Add vpx_dsp/x86/mem_sse2.h Add transpose_8bit_{4x4,8x8}() x86 optimization	2017-10-04 16:15:14 +00:00
Jerome Jiang	ffa3a3c441	Merge "Fix image width alignment. Enable ImageSizeSetting test."	2017-10-04 14:48:03 +00:00
Linfeng Zhang	6543213e87	Refactor x86/vpx_subpixel_8t_intrin_ssse3.c Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac	2017-10-03 13:02:05 -07:00
Alexandra Hájková	fb7fc1dbda	ppc: Add vpx_idct32x32_1024_add_vsx Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d	2017-09-30 19:31:20 +00:00
Scott LaVarnway	3bbd62ed27	vpxdsp: [x86] add highbd_d135_predictor functions C vs SSE2 speed gains: _4x4 : ~1.81x C vs SSSE3 speed gains: _8x8 : ~1.96x _16x16 : ~1.88x _32x32 : ~2.02x BUG=webm:1411 Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0	2017-09-29 08:56:38 -07:00
Scott LaVarnway	4cae64c32c	vpxdsp: [x86] add highbd_d117_predictor functions C vs SSE2 speed gains: _4x4 : ~2.04x C vs SSSE3 speed gains: _8x8 : ~2.82x _16x16 : ~5.93x _32x32 : ~2.79x BUG=webm:1411 Change-Id: I31d949695991c067dac89d91e0bed3e666c94993	2017-09-28 14:45:28 -07:00
Jerome Jiang	5a40c8fde1	Fix image width alignment. Enable ImageSizeSetting test. BUG=b/64710201 Change-Id: I5465f6c6481d3c9a5e00fcab024cf4ae562b6b01	2017-09-28 11:25:24 -07:00
Scott LaVarnway	80992a746c	Merge "vpxdsp: [x86] add highbd_d153_predictor functions"	2017-09-27 20:40:21 +00:00
James Zern	690fa6bb6e	Merge "fix signed integer overflow of idct"	2017-09-27 19:39:11 +00:00
Linfeng Zhang	dbbbd44304	fix signed integer overflow of idct Exposed by fuzz test in high bitdepth. The bug is introduced in commit `64653fa`. BUG=webm:1466 Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5	2017-09-27 11:17:54 -07:00
Scott LaVarnway	19c45ccd43	vpxdsp: [x86] add highbd_d153_predictor functions C vs SSE2 speed gains: _4x4 : ~1.95x C vs SSSE3 speed gains: _8x8 : ~3.30x _16x16 : ~5.67x _32x32 : ~3.87x BUG=webm:1411 Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904	2017-09-27 11:01:11 -07:00
Linfeng Zhang	d203a91a09	Merge "Add vpx_scaled_2d_neon()"	2017-09-27 16:12:48 +00:00
Jerome Jiang	878464150b	Merge "Add unit test to expose vp8 bug when width is set odd."	2017-09-27 01:26:59 +00:00
Jerome Jiang	767503504f	Add unit test to expose vp8 bug when width is set odd. BUG=b/64710201 Change-Id: Ia518af5494a42e80949cf1165244fbed59606cf7	2017-09-26 17:40:13 -07:00
Linfeng Zhang	9d0d13e939	Add vpx_scaled_2d_neon() BUG=webm:1419 Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96	2017-09-26 09:22:39 -07:00
Linfeng Zhang	28762341ac	Merge changes Ib9105462,Idfac00ed,If8d8a0e2 * changes: cosmetics: NEON scaling code Refactor convolve NEON code Refactor convolve code	2017-09-26 16:10:46 +00:00
Scott LaVarnway	cf82f7276e	vpxdsp: [x86] add highbd_d45_predictor functions C vs SSSE3 speed gains: _4x4 : ~2.45x _8x8 : ~10.61x _16x16 : ~11.34x _32x32 : ~6.36x BUG=webm:1411 Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09	2017-09-22 05:20:12 -07:00
Scott LaVarnway	b85e391ac8	Merge "vpxdsp: [x86] add highbd_d63_predictor functions"	2017-09-20 11:39:28 +00:00
Linfeng Zhang	7c0529728a	cosmetics: NEON scaling code Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618	2017-09-19 16:39:17 -07:00
Scott LaVarnway	bc86e2c6a2	vpxdsp: [x86] add highbd_d63_predictor functions C vs SSE2 speed gains: _4x4 : ~2.94x C vs SSSE3 speed gains: _8x8 : ~8.69x _16x16 : ~6.32x _32x32 : ~5.33x BUG=webm:1411 Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4	2017-09-19 15:47:22 -07:00
Marco	ad31fe36a8	Add datarate test for frame_parallel_decoding mode off. Add datarate test, for both VBR and CBR mode, with the frame_parallel_decoding mode disabled (and error_resilience off). Change-Id: I54feec3248a68ecff4bef8d9a31bb1616fab77df	2017-09-15 11:38:38 -07:00
James Zern	90ed0d2f73	Merge "vp9_scale_test: add C config"	2017-09-15 00:27:58 +00:00
Hui Su	293734b755	Merge "VP9 level targeting: add a new AUTO mode"	2017-09-14 21:02:38 +00:00
James Zern	c24d911847	vp9_scale_test: add C config Change-Id: I9dfe8255d1c096d246bf9719729f57dbae779ffc	2017-09-14 13:08:04 -07:00
Hui Su	c3a6943c16	VP9 level targeting: add a new AUTO mode In the new AUTO mode, restrict the minimum alt-ref interval and max column tiles adaptively based on picture size, while not applying any rate control constraints. This mode aims to produce encodings that fit into levels corresponding to the source picture size, with minimum compression quality lost. However, the bitstream is not guaranteed to be level compatible, e.g., the average bitrate may exceed level limit. BUG=b/64451920 Change-Id: I02080b169cbbef4ab2e08c0df4697ce894aad83c	2017-09-14 16:20:29 +00:00
Shiyou Yin	5b558592f5	vp8: [loongson] optimize idctllm with mmi 1. vp8_short_idct4x4llm_mmi 2. vp8_short_inv_walsh4x4_mmi 3. vp8_dc_only_idct_add_mmi Change-Id: I616923681e79d78607a4988608fc39df77b093f4	2017-09-14 16:51:11 +08:00
Johann	eb4238ac70	Revert "Revert "quantize avx: copy 32x32 implementation"" This reverts commit `8c42237bb2`. Because ssse3 code is used for the reference, the qcoeff and dqcoeff reference buffers must be aligned. Original change's description: > quantize avx: copy 32x32 implementation > > Ensure avx and ssse3 stay in sync by testing them against each other. > > Change-Id: I699f3b48785c83260825402d7826231f475f697c Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06	2017-09-12 14:25:38 -07:00
Scott LaVarnway	d6c9bbc2b6	vpxdsp: [x86] add highbd_d207_predictor functions C vs SSE2 speed gains: _4x4 : ~2.31x C vs SSSE3 speed gains: _8x8 : ~4.73x _16x16 : ~10.88x _32x32 : ~4.80x BUG=webm:1411 Change-Id: I0bac29db261079181ddabc6814bd62c463109caf	2017-09-11 07:36:24 -07:00
James Zern	9a2dd7e67e	Merge changes I9ec438aa,I99c954ff * changes: Update convolve functions' assertions Add 2 to 1 scaling NEON optimization	2017-09-08 19:23:40 +00:00
James Zern	d7caee2170	vpx_scale_test.h: remove #if from inside macro fixes visual studio error Change-Id: I86206f17ca951b15e247c1b92561847d8c21ec7a	2017-09-08 00:06:25 -07:00
Shiyou Yin	43cbdc216d	Merge "vp8: [loongson] optimize sixtap predict with mmi"	2017-09-08 00:59:31 +00:00
Shiyou Yin	2c7b7424c5	Merge "vpxdsp: [loongson] optimize sad functions with mmi"	2017-09-08 00:55:14 +00:00
Linfeng Zhang	ef41c6286d	Update convolve functions' assertions So that 4 to 1 frame scaling can call them. Change-Id: I9ec438aa63b923ba164ad3c59d7ecfa12789eab5	2017-09-07 12:33:58 -07:00
Linfeng Zhang	71b38a144e	Add 2 to 1 scaling NEON optimization BUG=webm:1419 Change-Id: I99c954ffa50a62ccff2c4ab54162916141826d9b	2017-09-07 12:33:50 -07:00
Linfeng Zhang	d5d2cbcc75	Add ScaleFrameTest Move class VpxScaleBase to new file test/vpx_scale_test.h. Add new file test/vp9_scale_test.cc with ScaleFrameTest. BUG=webm:1419 Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1	2017-09-06 15:54:58 -07:00
Linfeng Zhang	7219f31904	Merge "Remove get_filter_base() and get_filter_offset() in convolve"	2017-09-06 22:39:15 +00:00
Linfeng Zhang	d331e7a1c0	Remove get_filter_base() and get_filter_offset() in convolve so that the convolve functions are independent of table alignment. Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee	2017-09-05 15:22:36 -07:00
Scott LaVarnway	bc4bcca3fd	vpxdsp: [x86] add highbd_dc_128_predictor functions C vs SSE2 speed gains: _4x4 : ~7.64x _8x8 : ~16.60x _16x16 : ~8.15x _32x32 : ~5.05x BUG=webm:1411 Change-Id: If165d419711cfda901bd428a05ca1560a009e62e	2017-09-05 07:57:42 -07:00
Shiyou Yin	0095213790	vp8: [loongson] optimize sixtap predict with mmi 1. vp8_sixtap_predict16x16_mmi 2. vp8_sixtap_predict8x8_mmi 3. vp8_sixtap_predict8x4_mmi 4. vp8_sixtap_predict4x4_mmi Change-Id: I186669d1a1d998a0f3ba3a548e25eee8b52c251b	2017-09-02 19:08:20 +00:00
Shiyou Yin	f4150163a2	vpxdsp: [loongson] optimize sad functions with mmi 1. vpx_sadWxH_c 2. vpx_sadWxH_avg_c 3. vpx_sadWxHx3_c 4. vpx_sadWxHx8_c 5. vpx_sadWxHx4d_c Change-Id: Ie13161e3d73a052ea6ea7bac9cfadf55598fea7a	2017-09-02 15:11:32 +00:00
James Zern	d49a1a5329	test,Android.mk: export gtest include path fixes test file builds Change-Id: Iaa725ad95d56cf77d9fef8994981a80102e9a966	2017-09-01 19:44:12 -07:00
clang-format	7587a97551	apply clang-format Change-Id: If4c3e8a396d0fcb304f407b44e28cac3219f038c	2017-09-01 01:24:03 -07:00
Peter Boström	9ab4d9df38	Prevent data race from low-pass filter. Makes main thread wait for the filter level to be picked to avoid a race between the LPF thread and update_reference_frames(). This also re-enables the failing tests under thread_sanitizer where this data race was detected. BUG=webm:1460 Change-Id: I7f5797142ea0200394309842ce3e91a480be4fbc	2017-08-31 18:37:55 -07:00
Scott LaVarnway	ab5704f02c	Merge "vpxdsp: [x86] add highbd_dc_left_predictor functions"	2017-08-31 21:34:27 +00:00
Jerome Jiang	297c110dcb	Merge "Revert "Re-enable disabled tests under TSan.""	2017-08-31 01:52:42 +00:00
Jerome Jiang	d7ba519b9f	Revert "Re-enable disabled tests under TSan." This reverts commit `df9ce12259`. Reason for revert: Re-enabled tests still fail tsan in high bitdepth. Original change's description: > Re-enable disabled tests under TSan. > > These tests point to an already-fixed bug, this should no longer have a > data race. > > BUG=webm:1049 > > Change-Id: Iaedc5db8df99362bdc501b70ff7fdebf8756fdb8 TBR=jzern@google.com,pbos@chromium.org,builds@webmproject.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: webm:1049 Change-Id: I232f1f7726bf795b301abfb2e07cad6756642e53	2017-08-30 23:44:21 +00:00
Scott LaVarnway	c39a05ff61	vpxdsp: [x86] add highbd_dc_left_predictor functions C vs SSE2 speed gains: _4x4 : ~6.49x _8x8 : ~10.82x _16x16 : ~7.61x _32x32 : ~5.29x BUG=webm:1411 Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b	2017-08-30 09:29:06 -07:00
Scott LaVarnway	2d0c11093e	Merge "vpxdsp: [x86] add highbd_dc_top_predictor functions"	2017-08-30 11:25:07 +00:00
Scott LaVarnway	f783e3a75d	vpxdsp: [x86] add highbd_dc_top_predictor functions C vs SSE2 speed gains: _4x4 : ~7.39x _8x8 : ~11.36x _16x16 : ~8.68x _32x32 : ~4.33x BUG=webm:1411 Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58	2017-08-29 12:53:30 -07:00
Peter Boström	2f5fb37dac	Merge "Re-enable disabled tests under TSan."	2017-08-29 15:42:39 +00:00
Scott LaVarnway	30d9a1916c	vpxdsp: [x86] add highbd_h_predictor functions C vs SSE2 speed gains: _4x4 : ~8.12x _8x8 : ~9.71x _16x16 : ~8.21x _32x32 : ~5.0x BUG=webm:1422 Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993	2017-08-28 17:31:18 -07:00
Peter Boström	df9ce12259	Re-enable disabled tests under TSan. These tests point to an already-fixed bug, this should no longer have a data race. BUG=webm:1049 Change-Id: Iaedc5db8df99362bdc501b70ff7fdebf8756fdb8	2017-08-28 16:24:38 -07:00
Marco Paniconi	3e069846b9	Merge "Revert "quantize avx: copy 32x32 implementation""	2017-08-25 18:20:31 +00:00
Marco Paniconi	8c42237bb2	Revert "quantize avx: copy 32x32 implementation" This reverts commit `f60d1dcd3d`. Reason for revert: <INSERT REASONING HERE> Failures in AVX/VP9QuantizeTest in nightly tests. Original change's description: > quantize avx: copy 32x32 implementation > > Ensure avx and ssse3 stay in sync by testing them against each other. > > Change-Id: I699f3b48785c83260825402d7826231f475f697c TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org Change-Id: Ibd38636212269328317dd0721be9d25452113d1c No-Presubmit: true No-Tree-Checks: true No-Try: true	2017-08-25 16:56:08 +00:00
Shiyou Yin	ece1989fa2	Merge "vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi."	2017-08-25 06:44:02 +00:00
Johann Koenig	6c21650c0e	Merge "quantize avx: copy 32x32 implementation"	2017-08-24 18:55:03 +00:00
Shiyou Yin	9e4647c7ab	vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi. Change-Id: Ia576a721df6312329b599c31cfe1fb1267a9f174	2017-08-25 01:58:49 +08:00
Johann Koenig	258122fdc6	Merge "quantize test: skip block was removed"	2017-08-24 17:43:10 +00:00
Johann	f60d1dcd3d	quantize avx: copy 32x32 implementation Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c	2017-08-24 10:42:34 -07:00
Johann	1787e7dbe0	quantize ssse3: copy implementation to intrinsics Still does not pass tests. Does match the previous assembly, although saving the sign before multiplying is dubious. Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a	2017-08-24 07:47:51 -07:00
Johann	92aafefa1e	quantize test: skip block was removed Change-Id: I1d93698bc27529b0544d79dd7b9fe37afa51ef87	2017-08-24 07:21:42 -07:00
Johann Koenig	2dc0a5132d	Merge "quantize test: set threshold for 32x32"	2017-08-24 14:04:29 +00:00
Shiyou Yin	d080c92524	Merge "vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi."	2017-08-24 00:55:11 +00:00
Johann	e89344d61a	quantize test: set threshold for 32x32 Change-Id: I77be617c7d7c64929dd51c6077322f4f8ad23897	2017-08-23 15:59:11 -07:00
Johann Koenig	f53b656207	Merge "quantize avx: copy implementation to intrinsics"	2017-08-23 21:14:13 +00:00
Johann	7c27872164	quantize avx: copy implementation to intrinsics Adds an early exit based on ptest. Slightly slower than ssse3 in the full case because of the extra check, but potentially faster if lots of rows can be skipped. Very close in speed to the assembly. Can run in 32 bit, unlike the assembly. Allows reworking the function prototype to use structs. Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e	2017-08-23 09:19:16 -07:00
Johann	e83d99d7b8	quantize fp: neon implementation About 4x faster when values are below the dequant threshold and 10x faster if everything needs to be calculated. Both numbers would improve if the division for dqcoeff could be simplified. BUG=webm:1426 Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2	2017-08-23 08:01:30 -07:00
Shiyou Yin	59e065b6ed	vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi. Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2	2017-08-23 15:14:15 +08:00
James Zern	419ce36294	Merge "ppc: Add vpx_idct16x16_256_add_vsx"	2017-08-22 00:48:39 +00:00
Shiyou Yin	bff5aa9827	Merge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi."	2017-08-22 00:37:23 +00:00
Johann	661efeca97	quantize test: test _fp_ version of quantize None of the x86 optimizations pass the tests. Change-Id: Ic67f2ba1977b657e68f2a13b0711fc5fcbafd909	2017-08-21 12:29:41 -07:00
Johann	13eed991f9	Remove skip_block from quantize This condition is handled before this code is reached. The ssse3 version of the function has always crashed when attempting to handle the skip_block condition. Add assert() and comments regarding the usage of skip_block. Removing the parameter is a fairly involved process so leave it be for the moment. Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a	2017-08-21 09:49:04 -07:00
Shiyou Yin	7d82e57f5b	vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi. Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e	2017-08-18 09:06:49 +08:00
Paul Wilkins	372336d1e5	Merge "Fix corrupt arf groups due to low "lag_in_frames""	2017-08-16 18:25:29 +00:00
Linfeng Zhang	f95686895b	Merge changes I08b562b6,Ia275940a,I51106e90 * changes: Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} Update highbd idct x86 optimizations. Update 32x32 idct sse2 and ssse3 optimizations.	2017-08-16 16:36:37 +00:00
paulwilkins	48110d0f79	Fix corrupt arf groups due to low "lag_in_frames" Having a very small value for "lag_in_frames" can result in corrupt arf groups including displayed frames that update the arf buffer and fake overlay frames that are not in fact overlays of real arfs but are nevertheless starved of bits. Leaving lag_in_frames at the default of 25 for these 5 frame two pass VBR tests should now give rise to a valid ARF coding pattern as follows:- K(ey), A(rf), N(ormal), N, N, O(verlay). This change is part of a response to BUG=webm:1454 where broken arf groups interacted badly with a change that corrects for large rate misses. However, it may still in some cases increase encode time by virtue of the fact that the unit test now codes a correct coding pattern with "hidden" ARF frames. Change-Id: Ifd0246a4c1d0be247247c754024d7a4ed5f66a6b	2017-08-16 14:07:24 +01:00
Johann Koenig	c59d1a4dc7	Merge changes I1f1edeaa,I89313cac * changes: quantize: silence unsigned overflow warning quantize test: quiet overflow warning	2017-08-15 17:37:59 +00:00
Johann	08cb7b5c68	quantize test: quiet overflow warning Promote the result of RandRange to signed Change-Id: I89313cace3bcbe9af96946bef00b6857fc48b128	2017-08-15 08:28:09 -07:00
Linfeng Zhang	d72e20b123	Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} BUG=webm:1412 Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f	2017-08-14 17:05:22 -07:00
Scott LaVarnway	fa85cf131c	vp9: strip temporal filter code when CONFIG_REALTIME_ONLY is enabled. BUG=webm:1446 Change-Id: Id547783ec75383966c40ab5cf6abb4a0f7984f52	2017-08-14 14:27:53 -07:00
Johann Koenig	ff184e482a	Merge changes I4b4beab1,I02f74dec * changes: quantize test: check skip_block quantize test: use negative input	2017-08-14 20:52:52 +00:00
Johann Koenig	45b39750d6	Merge "temporal filter test: adjust inputs and runtime"	2017-08-14 20:46:22 +00:00
Johann	c06d6649c5	temporal filter test: adjust inputs and runtime Use input with a narrow range because the filter only applies when the frames are similar. Run CompareReferenceRandom more times. Especially before narrowing the input range, the filter frequently did not apply. Change-Id: Ie249bedf6d0d33dfa5884611cb1835788e418b38	2017-08-14 17:24:11 +00:00
James Zern	746c0eab3b	disable SSSE3/VP9QuantizeTest* in hbd builds this test fails with the configuration similar to the assembly prior to: `d52cb5972` quantize: copy ssse3 optimizations to intrinsics BUG=webm:1458 Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf	2017-08-14 09:31:14 -07:00
Johann Koenig	9bb8ce5efb	Merge "neon: vpx_quantize_b_32x32"	2017-08-10 15:42:49 +00:00
Johann Koenig	0b393ae505	Merge "quantize: copy ssse3 optimizations to intrinsics"	2017-08-10 15:42:20 +00:00
Johann	357adb68b2	quantize test: check skip_block Not all sizes were tested previously. Only 4x4 and 32x32 Change-Id: I4b4beab1b92a810a097a7306de04cc9e0e260315	2017-08-08 14:21:58 -07:00
Johann	1092cc7f1a	quantize test: use negative input coeff contains signed values. Change-Id: I02f74decf30379a28122169ab3e844d0f3bd7d23	2017-08-08 14:19:56 -07:00
Johann	93166c5e51	neon: vpx_quantize_b_32x32 With skip block the neon is about twice as fast as C. The neon has no shortcut for coeff < zbin so it always takes the same amount of time. Even if the C can take the shortcut, it is over twice as fast in neon. If it can't, that gap increases to over 10x. BUG=webm:1426 Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6	2017-08-08 14:05:18 -07:00
Johann	d52cb59729	quantize: copy ssse3 optimizations to intrinsics Fairly minor differences from sse2. pabsw and psignw are the big gains. Also re-uses some values in eob calculation to avoid an extra pcmp. Fixes test failures in HBD and OS X builds. Allows using it in 32bit builds, where it is about 40% faster than sse2. Substantially faster than the assembly for skip_block. 10-20% faster the rest of the time. Change-Id: If783bb3567e561e47667e10133b9c84414a334e2	2017-08-08 12:22:14 -07:00
Linfeng Zhang	853165ba39	Update 32x32 idct sse2 funcs, add partial case 135 Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a	2017-08-07 17:37:02 -07:00
Linfeng Zhang	7f20c3ac44	Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1 BUG=webm:1412 Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca	2017-08-04 15:31:17 -07:00
Johann Koenig	cbb83ba4aa	Merge "quantize test: consolidate sizes"	2017-08-04 20:34:50 +00:00
Johann	9578a84205	quantize test: consolidate sizes Pass a max txfm size parameter and combine the base quantize test with the 32x32 test. Change-Id: I72ddf020fe6888e864ea9f3642ee2d9a8e48a04b	2017-08-04 12:45:32 -07:00
Linfeng Zhang	563d58ab84	Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function BUG=webm:1412 Change-Id: I945f0fb6807b8948747243794dc7352b959221f7	2017-08-03 13:59:47 -07:00
Yunqing Wang	6843e7c7f3	Merge "Force the bit exactness in the first pass"	2017-08-03 00:03:10 +00:00
Yunqing Wang	bfd0f41f9b	Force the bit exactness in the first pass Originally, for the purpose of keeping a fast first pass, the first-pass stats between row_mt_mode = 0 and row_mt_mode = 1 are not bit exact, but that difference is very small that doesn't cause a mismatch between the final bitstreams. However, if the encoder changes, this minor difference may cause a mismatch. Thus, this patch always forces the first pass to be bit exact. BUG=webm:1453 Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8	2017-08-02 15:58:39 -07:00
Johann	1059b5cc52	quantize test: add speed comparison Test some possible scenarios. Change-Id: I1a612e7153b31756be66390ceea55877856d5a33	2017-08-02 09:33:35 -07:00
Johann Koenig	847394fe77	Merge "neon: vpx_quantize_b"	2017-08-01 16:44:31 +00:00
Johann	2d6b5df657	neon: vpx_quantize_b With skip block or coeff < zbin it is about twice as fast as C. If most coeff values are > zbin it is about 10-15x as fast as C. BUG=webm:1426 Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7	2017-07-31 10:38:46 -07:00
Linfeng Zhang	75653b7032	Merge changes Ia0e20f5f,I28150789,I35df041b,I221dff34 * changes: Update vpx_idct16x16_10_add_sse2() Add vpx_idct16x16_38_add_sse2() Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2 Refactor highbd idct 4x4 and 8x8 x86 functions	2017-07-28 22:43:00 +00:00
James Zern	3c73e587d1	Revert "quantize ssse3: declare all variables" This reverts commit `03f5e300d6`. This causes test failures under OSX: SSSE3/VP9QuantizeTest.EOBCheck/0 SSSE3/VP9QuantizeTest.OperationCheck/0 Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b	2017-07-28 01:22:16 -07:00
Linfeng Zhang	7f4acf8700	Add vpx_idct16x16_38_add_sse2() Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898	2017-07-27 18:02:43 -07:00
Linfeng Zhang	9c43d81bc2	Refactor highbd idct 4x4 and 8x8 x86 functions BUG=webm:1412 Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec	2017-07-27 18:01:03 -07:00
Johann Koenig	a83e1f1d53	Merge "quantize ssse3: declare all variables"	2017-07-27 21:18:35 +00:00
Alexandra Hájková	666c543f7b	ppc: Add vpx_idct16x16_256_add_vsx Change-Id: Ibc3f7965423fd91179f8d8e77c7ae3e6d7f80572	2017-07-25 12:34:15 +00:00
Johann	af08fbb444	quantize test: promote RandRange() result to signed Avoid unsigned overflow warning: unsigned integer overflow: 19974 - 32703 cannot be represented in type 'unsigned int' Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12	2017-07-20 08:17:48 -07:00
Johann	c782f27ead	quantize test: lowbd functions do not pass in highbd qcoeff output looks OK but dqcoeff is no good. BUG=webm:1448 Change-Id: I07211db8a8b74f1f45fdd059852e2de0e5ee18fd	2017-07-20 08:17:48 -07:00
Johann	bde2e4aa36	quantize test: eob is output eob values are generated by the function. Change-Id: I8ce92100e83022bff99888a5a7e6ef378c49fda3	2017-07-19 14:17:19 -07:00
Johann	03f5e300d6	quantize ssse3: declare all variables Copy missing line from avx implementation. Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27	2017-07-18 12:32:57 -07:00
Johann	101981b736	quantize test: test sse2 and avx optimizations ssse3 does not pass either of the tests. avx 32x32 does not pass. Change-Id: I62c2e31336fd2327327afaa0da896ad79a3def44	2017-07-18 12:08:16 -07:00
Johann	c7ebe82253	quantize test: extend arrays Officially the quant structures are 8 elements, with one dc element and 7 repeated ac elements. The low bit depth optimizations take advantage of this to fill the xmm registers. The high bit depth version manually duplicates the values. If all the optimizations were unified, the structure sizes could be greatly reduced. Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae	2017-07-18 09:55:47 -07:00
Johann	cb61ba02f4	quantize test: restrict and correct input Use only valid values for quantize inputs. These were determined by looping over vp9_init_quantizer and looking for max and min values. This allows extending the test to the low bit depth functions which were not designed to handle all possible inputs but only valid inputs. Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e	2017-07-18 09:40:45 -07:00
James Zern	9223b947ca	Merge "fix 'make exampletest' w/CONFIG_REALTIME_ONLY"	2017-07-15 18:37:10 +00:00

2249 Commits