generic-library/vpx

Author	SHA1	Message	Date
paulwilkins	0149fb3d6b	Changes to exhaustive motion search. This change alters the nature and use of exhaustive motion search. Firstly any exhaustive search is preceded by a normal step search. The exhaustive search is only carried out if the distortion resulting from the step search is above a threshold value. Secondly the simple +/- 64 exhaustive search is replaced by a multi stage mesh based search where each stage has a range and step/interval size. Subsequent stages use the best position from the previous stage as the center of the search but use a reduced range and interval size. For example: stage 1: Range +/- 64 interval 4 stage 2: Range +/- 32 interval 2 stage 3: Range +/- 15 interval 1 This process, especially when it follows on from a normal step search, has shown itself to be almost as effective as a full range exhaustive search with step 1 but greatly lowers the computational complexity such that it can be used in some cases for speeds 0-2. This patch also removes a double exhaustive search for sub 8x8 blocks which also contained a bug (the two searches used different distortion metrics). For best quality in my test animation sequence this patch has almost no impact on quality but improves encode speed by more than 5X. Restricted use in good quality speeds 0-2 yields significant quality gains on the animation test of 0.2 - 0.5 db with only a small impact on encode speed. On most clips though the quality gain and speed impact are small. Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa	2015-11-13 10:16:31 +00:00
Geza Lore	5eefd3ebfd	Add AVX vectorized vp9_diamond_search_sad This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters. The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'. For the joint cost: - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3] For the component costs: - For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost) - For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even) These must hold, otherwise the AVX version of the function cannot be used. Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc	2015-11-11 14:03:47 +00:00
James Zern	30466f26b4	Revert "Add AVX vectorized vp9_diamond_search_sad" This reverts commit `f1342a7b07`. This breaks 32-bit builds: runtime error: load of misaligned address 0xf72fdd48 for type 'const __m128i' (vector of 2 'long long' values), which requires 16 byte alignment + _mm_set1_epi64x is incompatible with some versions of visual studio Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673	2015-11-06 13:15:01 -08:00
Geza Lore	f1342a7b07	Add AVX vectorized vp9_diamond_search_sad This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters. The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'. For the joint cost: - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3] For the component costs: - For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost) - For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even) These must hold, otherwise the AVX version of the function cannot be used. Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6	2015-11-05 10:02:17 +00:00
Geza Lore	965a8dea0b	Convert motion search config from AoS to SoA This is a prerequisite for vectorizing vp9_diamond_search_sad_c. Change-Id: I49cd9148782410ca8b16e8a468ca9e7c6d088410	2015-10-28 15:30:43 +00:00
Johann	c5f11912ae	Include vpx_dsp_common.h when using VPXMIN/MAX Change-Id: I2e387a06484a06301f3cd6600c4ba2f4335b61ee	2015-08-31 14:36:35 -07:00
James Zern	ff03d5448a	vp9_mcomp: make search functions private vp9_full_pixel_search() can be used as a replacement as it dispatches to all search methods Change-Id: I57fcb79c1362b569dc95237bdcc8390f54efd440	2015-08-28 18:54:10 -07:00
James Zern	5e16d397bd	vpx_dsp_common: add VPX prefix to MIN/MAX prevents redeclaration warnings; vp8 has its own define which will be resolved in a future commit Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c	2015-08-26 20:11:32 -07:00
Yunqing Wang	4bc6ae4342	Merge "Improve the second-level sub-pixel motion search"	2015-08-07 16:05:59 +00:00
Yunqing Wang	7418b176ce	Improve the second-level sub-pixel motion search Re-investigated the second-level sub-pixel motion search. Improved the way of choosing search points. Rewrote the second-level search code. At speed 0, the borg tests showed: 1. for stdhd set, Avg PSNR gain: 0.216%; Overall PSNR gain: 0.196%; SSIM gain: 0.206%. Only 1 out of 15 clips showed PSNR loss. 2. for derf set, Avg PSNR gain: 0.171%; Overall PSNR gain: 0.192%; SSIM gain: 0.207%. Only 3 out of 30 clips showed PSNR losses. Added a condition for third-point checking, so that fewer points are checked. Speed tests showed no speed loss (Avg 0.3% speedup at speed 0). Change-Id: I6284ebb3fa7ba63be8528184c49e06757211a7f1	2015-08-06 16:28:32 -07:00
Jingning Han	b4f2c567c8	Cosmetic - align format in vp9 Change-Id: I83ed3422f1f4009675ad2f5c4b7236bc7b83b30e	2015-08-06 15:56:11 -07:00
Yunqing Wang	726d1b841b	Minor adjustment in diagonal sub-pixel point checking Choose a different diagonal point to check when the two costs are the same, making it consistent with the way we choose the best mv. This slightly changes the encoding result, and the derflr set borg test at speed 0 shows 0.027% Overall PSNR gain, 0.024% Avg PSNR gain, and 0.043% SSIM gain. Change-Id: Ic8ee3a6767394866d159e4f9e1c777604dd73c17	2015-08-04 12:16:47 -07:00
Yunqing Wang	a3d22aa2a4	Small improvement in sub-pixel motion search If the current best mv (namely, the search center) is still the best mv after the first-level search, the second-level check is skipped. This patch doesn't change the bitstream. At speed 0, it speeds up the encoder by 1% - 2%. Change-Id: I054c91b884d3f7aef157436c061744562bd6506d	2015-08-04 12:06:21 -07:00
James Zern	aaa49f0485	vp9_mcomp: make search_step_table static Change-Id: I2552d8101cf49ed951782ab69adce407579700fc	2015-06-12 18:11:54 -07:00
James Zern	7ea431df98	vp9_mcomp: don't mark setup_center_error() inline this function is a bit too involved for the hint; avoids a -Winline warning Change-Id: Ib82e424764aa78b37ddb94116e2b009a6de31d35	2015-06-12 17:56:33 -07:00
Johann	eb88b172fe	Make vp9 subpixel match vp8 The only difference between the two was that the vp9 function allowed for every step in the bilinear filter (16 steps) while vp8 only allowed for half of those. Since all the call sites in vp9 (<< 1) the input, it only ever used the same steps as vp8. This will allow moving the subpel variance to vpx_dsp with the rest of the variance functions. Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75	2015-06-03 22:10:51 -07:00
Johann	dee70d355f	Merge "Move variance functions to vpx_dsp"	2015-05-26 23:02:11 +00:00
Johann	c3bdffb0a5	Move variance functions to vpx_dsp subpel functions will be moved in another patch. Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce	2015-05-26 12:01:52 -07:00
Jingning Han	96dba4902c	Fix integral projection motion search for frame resize This commit fixes the integral projection motion search crash when frame resize is used. It fixes issue 994. Change-Id: Ieeb52619121d7444f7d6b3d0cf09415f990d1506	2015-05-22 15:40:45 -07:00
Johann	1d7ccd5325	Relocate memory operations for common code With the sad functions, and hopefully the variance functions soon, moving to the vpx_dsp location, place the defines used in the reference C code in a common location. Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca	2015-05-13 11:41:15 -07:00
James Zern	fd3658b0e4	replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED this macro was used inconsistently and only differs in behavior from DECLARE_ALIGNED when an alignment attribute is unavailable. this macro is used with calls to assembly, while generic c-code doesn't rely on it, so in a c-only build without an alignment attribute the code will function as expected. Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79	2015-05-07 11:55:08 -07:00
Scott LaVarnway	8b17f7f4eb	Revert "Remove mi_grid_* structures." (see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6) For the test clip used, the decoder performance improved by ~2%. This is also an intermediate step towards adding back the mode_info streams. Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d	2015-04-21 11:16:45 -07:00
James Zern	e1ff83f4b0	vp9_full_search_sadx[38]: align sad arrays the sse4 code expects 16-byte aligned arrays; vp8 already had a similar change applied: `b2aa401` Align SAD output array to be 16-byte aligned Change-Id: I5e902035e5a87e23309e151113f3c0d4a8372226	2015-04-07 14:34:06 -07:00
Jingning Han	2cfddec332	Refactor column integral projection computation Move the scaling factor outside column projection. This avoids repeated calculation of the same scaling factor. Profiling shows that the percentage of vp9_int_pro_col_sse2 of overall cycles goes from 2.29% down to 1.88%. Change-Id: I5ac4e324ab2d7f33ba2de66dd2a12e04e04dfd66	2015-03-16 12:07:15 -07:00
Jingning Han	b03cf9317a	Fix 1-step refinement search table Change-Id: I32f0bcb40c6e7ba63bfae487739ededd0b6b2dde	2015-03-14 10:52:11 -07:00
Jingning Han	cce7020f2c	Use sdx4df to do 1-step refinement Change-Id: Ie0c3ef3ae3aedf049b1a296de607730b79c12672	2015-03-13 09:53:15 -07:00
Jingning Han	ba29125f7b	Reset src buffer only once in vp9_int_pro_motion_estimation Change-Id: I5c96b6a25f9df60da65b7af7c92a921b611746e3	2015-03-12 18:50:53 -07:00
Jingning Han	427cdf0a41	Reduce the number of full block SAD calls This commit uses a 6-point 1-step refine motion search in the integral projection based full pixel motion estimation, to replace the current 9-point search. It reduces runtime cost of speed -6 on some noisy clips, e.g., dark720p single thread 33314 b/f, 40.076 dB, 18231 ms -> 33307 b/f, 40.067 dB, 17768 ms The compression performance for rtc set remains unchanged. Change-Id: I194ea5a9ce52e5a10baeee36338633adc22f764c	2015-03-12 18:30:57 -07:00
Jingning Han	1ca4d51b2e	Refactor to remove GLOBAL_MOTION Make the vp9_int_pro_motion_estimation() function return zero motion vector if high bit depth is turned on, instead of removing it from compiled codes. Change-Id: Ia48f010eb590b2d517d5678c394110b326a1a95e	2015-03-11 15:53:15 -07:00
Jingning Han	fda0410822	Move pred_mv assign outside integral projection motion search Change-Id: I040b066fdce08e2f05115a22ea808715aa147779	2015-03-05 11:44:10 -08:00
Jingning Han	2deecdd5cb	Move integral projection motion search to vp9_mcomp.c Make it a general purpose fast motion estimation function, to be used in the mode search process. Change-Id: Ib354cb0e664dc61c30c0b2314297835ee75b157a	2015-03-04 10:30:15 -08:00
Johann	ba18609502	Remove unnecessary pointer check The original implementation had the following comment: // Ignore mv costing if mvsadcost is NULL However the current implementation does not allow for this. If x exists then nmvsadcost must not be null. This removes the only warning from -Wpointer-bool-conversion https://code.google.com/p/webm/issues/detail?id=894 Change-Id: I1a2cee340d7972d41e1bbbe1ec8dfbe917667085	2015-02-03 13:03:46 -08:00
James Zern	616b3a810f	vp9 asserts: fix compile warning string literal to int within an assert Change-Id: I76a173f96b9add5bf27c3f5ad5d72c6f30e51629	2014-12-05 16:20:42 -08:00
Peter de Rivaz	2c886953d1	Reinsert macro to fix issue 884. Change 72056 unfolded some macro definitions, but lost some alternative behaviour required for high bitdepth encodes. This causes the encoder to crash, see issue 884. Change-Id: I8ce4d73c9fe0a3c10ccb86fba210fabc8b2f0ccc	2014-12-02 13:45:26 -08:00
Yaowu Xu	1687c47bfd	change to call vp9_refining_search_sad() directly The function pointer in compressor instance does not change, so this commit changes to call the function directly. Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5	2014-11-17 11:30:17 -08:00
Jingning Han	e083f6bd08	Refactor sub-pixel motion search unit This commit unfolds the legacy macro definitions used in the sub-pixel motion search and refactors the operational flow for later optimizations. Change-Id: I3e3f770cad961d03d1a6eb0b2a0186cc77eaf2b8	2014-11-03 09:02:57 -08:00
Deb Mukherjee	9a29fdbae7	Merge "Rename highbitdepth functions to use highbd prefix"	2014-10-09 15:39:56 -07:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
Deb Mukherjee	d78dbff09a	Subpel search cleanups and enhancements - Some fixes to surface fit. - Returns variance function as cost rather than sad in the pattern search and diamond search functions. Only vp9_pattern_search_sad function used in bigdia search uses sad as integer 1-away costs. - Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+. Results: derf [Speed 3]: About +0.036% in coding efficiency without any discernible speed loss. derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency. derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency. Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63	2014-10-08 23:59:43 -07:00
Deb Mukherjee	4e9c0d2ad4	Adds two new subpel search methods One is a more aggressive version of the pruned subpel tree search where only a single halfpel candidate is searched. The search candidate is based on a surface fit result. The other is a method to obtain the subpel position at one shot based on the same surface fit. The methods have not been deployed in any speed setting yet. Change-Id: I34fef3f2e34f11396c9d1ba97f4be8c4ffca62d3	2014-09-29 12:51:20 -07:00
Deb Mukherjee	993d10a217	Adds various high bit-depth encode functions Change-Id: I6f67b171022bbc8199c6d674190b57f6bab1b62f	2014-09-25 01:50:36 -07:00
Deb Mukherjee	c94b17f4b2	Pruned subpel search for speed 3. Adds code to return an integer cost list for NSTEP search. Then uses it for pruned subpel search in speed 3. derf: -0.06% Speed on mobcal 720p increases from 10.28 fps to 10.65 fps. [Subject to further testing]. Change-Id: Ib591382d25b2c11bcaba9d3a27a93a9d1ab27a96	2014-09-23 11:27:58 -07:00
Deb Mukherjee	83c76118eb	Use bigdia search with pruned subpel search Improves function to return sad of integer pels by reusing integer pels already visited in the smallest scale. Turns on BIGDIA search for speed 4. Also, turns on the first version of the pruned subpel search at this speed. derf: -0.32% (speed 4) Speed seems to improve by at least 5% but subject to verification. Change-Id: Iaec8eaffd61d6237ac029e6a2a1b0a88b2a35271	2014-09-12 10:25:12 -07:00
Deb Mukherjee	04b100b23e	Updates vp9_pattern search to return integer sads Updates the vp9_pattern_search function to return integer one-away neighbors' sad values, for subsequent use in speeding up the sub-pel search. Also, removes code for the do_refine option which is not being used currently. Updates the integer and subpel functions to pass in a 5-element sad list for output or input. A new pruned sub-pel search algorithm is implemented that uses the sad returned from the integer pel search. But it is not deployed yet. Change-Id: Ifa9f5ad024b5b660570366d2bd900343e1891520	2014-08-28 06:49:58 -07:00
Yaowu Xu	6673d2f309	Remove an unused parameter in vp9_init_search_range() Change-Id: I3d9130e726a1299fd258f6dfe93315e2d12f76da	2014-07-11 10:32:39 -07:00
Yunqing Wang	75cd57503d	Refactor vp9_diamond_search_sad function Currently, vp9_diamond_search_sadx4() is only called when sse3 is enabled, which is improper since sse2 optimizations of the sdx4df functions are available. Changed to always use vp9_diamond_search_sadx4(). Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717	2014-07-10 09:19:03 -07:00
Yunqing Wang	30117a576d	Refactor refining_search_sad code There are sse2 optimizations of the sdx4df functions. Instead of calling vp9_refining_search_sadx4 only when sse3 is enabled, call it always. Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9	2014-07-09 16:50:11 -07:00
Yunqing Wang	a581da218e	Remove repetitive code in mcomp.c Deleted vp9_find_best_sub_pixel_comp_tree(), and combined it in vp9_find_best_sub_pixel_tree(). Change-Id: Ifb25763c8b19822df5537cc1daa76ce88dc3b056	2014-07-09 14:50:50 -07:00
Alex Converse	f60a1178c6	Cleanup motion search speed features. * Replace max_step_search_steps with constant MAX_MVSEARCH_STEPS * Fold (reduce_first_step_size + speed > 5) into reduce_first_step_size replacing uses of reduce_first_step_size that don't add the speed check with zero. Change-Id: Iae46395dbf3eaca138bf4d18b838a9e364b5a198	2014-07-07 10:08:45 -07:00
Dmitry Kovalev	4ff1a614f1	Adding MV_SPEED_FEATURES struct. Moving all motion vector related speed parameters from SPEED_FEATURES to MV_SPEED_FEATURES. Change-Id: I3e9af0039c7162f8671878c5920bce3cb256a84e	2014-06-12 14:15:27 -07:00
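
The multi-stage mesh search described in commit 0149fb3d6b can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the type names, function names, cost callback, and toy distortion function are all hypothetical, and only the three range/interval pairs and the "run only if the step-search distortion is above a threshold" rule come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; none are taken from the libvpx sources. */
typedef struct { int row, col; } Mv;
typedef struct { int range, interval; } MeshStage;
typedef unsigned int (*CostFn)(Mv mv, const void *ctx);

/* Stage ranges and intervals quoted in the commit message. */
static const MeshStage kExampleStages[3] = { { 64, 4 }, { 32, 2 }, { 15, 1 } };

/* Scan one square mesh centred on *best; update *best and *best_cost. */
static void mesh_stage(Mv *best, unsigned int *best_cost, MeshStage s,
                       CostFn cost, const void *ctx) {
  assert(s.range >= 0 && s.interval > 0);
  const Mv center = *best;
  for (int r = -s.range; r <= s.range; r += s.interval) {
    for (int c = -s.range; c <= s.range; c += s.interval) {
      const Mv cand = { center.row + r, center.col + c };
      const unsigned int d = cost(cand, ctx);
      if (d < *best_cost) {
        *best_cost = d;
        *best = cand;
      }
    }
  }
}

/* start/start_cost stand in for the result of the preceding step search.
 * The mesh stages run only when that distortion exceeds the threshold;
 * each stage is centred on the best position found so far. */
static Mv mesh_search_if_needed(Mv start, unsigned int start_cost,
                                unsigned int threshold,
                                const MeshStage *stages, size_t n_stages,
                                CostFn cost, const void *ctx,
                                unsigned int *out_cost) {
  Mv best = start;
  unsigned int best_cost = start_cost;
  if (start_cost > threshold) {
    for (size_t i = 0; i < n_stages; ++i) {
      mesh_stage(&best, &best_cost, stages[i], cost, ctx);
    }
  }
  *out_cost = best_cost;
  return best;
}

/* Toy distortion for experimentation: squared distance to a target. */
static unsigned int toy_cost(Mv mv, const void *ctx) {
  const Mv *t = (const Mv *)ctx;
  const int dr = mv.row - t->row;
  const int dc = mv.col - t->col;
  return (unsigned int)(dr * dr + dc * dc);
}
```

With these stages each mesh evaluates far fewer points than a full +/- 64 search at interval 1, which is the complexity reduction the commit message refers to. A real encoder would also clamp candidates to the allowed motion vector range and add a motion vector rate cost, both of which are omitted here.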

189 Commits
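
The three lookup-table properties listed in the "Add AVX vectorized vp9_diamond_search_sad" commits (f1342a7b07, 5eefd3ebfd) could be checked with a helper along these lines. This is a sketch under stated assumptions: the function name, the `int` element type, the `max_abs` bound, and the convention that each component pointer addresses the zero entry of its table (so negative indices are valid) are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of libvpx.
 * joint: joint cost table with 4 entries.
 * comp0, comp1: per-component cost tables, each pointing at index 0 of a
 *               table that is valid for indices -max_abs..max_abs.
 * Returns true when all three properties from the commit message hold. */
static bool sad_cost_tables_allow_vector_path(const int joint[4],
                                              const int *comp0,
                                              const int *comp1,
                                              int max_abs) {
  assert(max_abs >= 0);
  /* Joint cost: entries 1, 2 and 3 must be equal. */
  if (joint[1] != joint[2] || joint[2] != joint[3]) return false;
  for (int i = -max_abs; i <= max_abs; ++i) {
    /* Equal per-component cost. */
    if (comp0[i] != comp1[i]) return false;
    /* The cost function is even. */
    if (comp0[i] != comp0[-i]) return false;
  }
  return true;
}
```

A caller could use a check like this to fall back to the C implementation whenever the tables do not satisfy the properties the vectorized version relies on.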