generic-library/vpx

Author	SHA1	Message	Date
Yunqing Wang	4d2c376923	Early termination in encoding partition search In the partition search, the encoder checks all possible partitionings in the superblock's partition search tree. This patch proposed a set of criteria for partition search early termination, which effectively decided whether or not to terminate the search in current branch based on the "skippable" result of the quantized transform coefficients. The "skippable" information was gathered during the partition mode search, and no overhead calculations were introduced. This patch gives significant encoding speed gains without sacrificing the quality. Borg test results: 1. At speed 1, stdhd set: psnr: +0.074%, ssim: +0.093%; derf set: psnr: -0.024%, ssim: +0.011%; 2. At speed 2, stdhd set: psnr: +0.033%, ssim: +0.100%; derf set: psnr: -0.062%, ssim: +0.003%; 3. At speed 3, stdhd set: psnr: +0.060%, ssim: +0.190%; derf set: psnr: -0.064%, ssim: -0.002%; 4. At speed 4, stdhd set: psnr: +0.070%, ssim: +0.143%; derf set: psnr: -0.104%, ssim: +0.039%; The speedup ranges from several percent to 60+%. speed1 speed2 speed3 speed4 (1080p, 100f): old_town_cross: 48.2% 23.9% 20.8% 16.5% park_joy: 11.4% 17.8% 29.4% 18.2% pedestrian_area: 10.7% 4.0% 4.2% 2.4% (720p, 200f): mobcal: 68.1% 36.3% 34.4% 17.7% parkrun: 15.8% 24.2% 37.1% 16.8% shields: 45.1% 32.8% 30.1% 9.6% (cif, 300f) bus: 3.7% 10.4% 14.0% 7.9% deadline: 13.6% 14.8% 12.6% 10.9% mobile: 5.3% 11.5% 14.7% 10.7% Change-Id: I246c38fb952ad762ce5e365711235b605f470a66	2014-08-28 11:27:28 -07:00
Deb Mukherjee	bb2a9abb1e	Merge "Updates vp9_pattern search to return integer sads"	2014-08-28 09:38:56 -07:00
Deb Mukherjee	04b100b23e	Updates vp9_pattern search to return integer sads Updates the vp9_pattern_search function to return integer one-away neighbors' sad values, for subsequent use in speeding up the sub-pel search. Also, removes code for the do_refine option which is not being used currently. Updates the integer and subpel functions to pass in a 5-element sad list for output or input. A new pruned sub-pel search algorithm is implemented that uses the sad returned from the integer pel search. But it is not deployed yet. Change-Id: Ifa9f5ad024b5b660570366d2bd900343e1891520	2014-08-28 06:49:58 -07:00
Yaowu Xu	bcfb1ffb9d	Merge "add a new interp filter search strategy."	2014-08-26 17:30:42 -07:00
Yaowu Xu	1144fee3d5	add a new interp filter search strategy. This commit adds a new strategy to reduce the search for the optimal interpolation filter type. The encoder counts and stores how many times each filter type is selected and used for each of the reference frames. A filter type that is rarely used for all three reference frames is masked out to avoid computation. The impact on compression is negligible: -0.02% on derf, +0.02% on stdhd. Encoding time is reduced by 2~3%. Change-Id: Ibafa92291b51185de40da513716222db4b230383	2014-08-26 09:05:04 -07:00
Dmitry Kovalev	0082727cb7	Merge "Adding is_keyframe temp var."	2014-08-25 18:36:59 -07:00
Dmitry Kovalev	98c8eb85e6	Adding is_keyframe temp var. Change-Id: I5fec955c8b8f5a9b5027a0f92afb22d22770d84a	2014-08-21 17:41:03 -07:00
Dmitry Kovalev	45425f8c1e	Removing is_best_mode() function. Change-Id: Iccd7cec885e8aeb0e54613d888f9960c393cee0b	2014-08-21 11:32:33 -07:00
Dmitry Kovalev	f617889be7	Moving frame_is_boosted() to vp9_speed_features.c. Change-Id: I9261ded5fbba7a625d8224d91be296265a932410	2014-08-19 10:31:29 -07:00
Yunqing Wang	4d98b50be5	Merge "Add early termination in transform size search"	2014-08-18 19:00:24 -07:00
Yunqing Wang	ba70f16011	Add early termination in transform size search In the full-rd transform size search, we go through all transform sizes to choose the one with the best rd score. In this patch, an early termination is added to stop the search once we see that the smaller size won't give a better rd score than the larger size. Also, the search starts from the largest transform size, then goes down to the smallest size. A speed feature tx_size_search_breakout is added, which is turned off at speed 0, and on for other speeds. The transform size search is turned on at speed 1. Borg test results: 1. At speed 1, derf set: psnr gain: 0.618%, ssim gain: 0.377%; stdhd set: psnr gain: 0.594%, ssim gain: 0.162%; No noticeable speed change. 2. At speed 2, derf set: psnr loss: 0.157%, ssim loss: 0.175%; stdhd set: psnr loss: 0.090%, ssim loss: 0.101%; speed gain: ~4%. Change-Id: I22535cd2017b5e54f2a62bb6a38231aea4268b3f	2014-08-18 16:27:04 -07:00
Jingning Han	6a464eca05	Speed up mode search depending on relative ref frame position This commit enables the encoder to record the location of the center frame used to generate the alt reference frame. It then allows the encoder to skip checking prediction modes of other reference frame types when encoding this frame. The speed 3 runtime is reduced for the test sequences: bus at CIF 1000 kbps, 9791 ms -> 9446 ms, i.e., 3.5% speed-up, pedestrian at 1080p 2000 kbps, 184043 ms -> 175730 ms, i.e., 4.5% speed-up. No compression performance change observed. Change-Id: Iacfde3bcc1445964e7a241f239bd6ea11cb94bd1	2014-08-18 16:06:54 -07:00
Pengchong Jin	997db6fc3f	Merge "Add a speed feature to give the tighter search range"	2014-08-15 19:51:04 -07:00
Pengchong Jin	eca93642e2	Add a speed feature to give the tighter search range Add a speed feature to give the tighter partition search range. Before partition search, calculate the histogram of the partition sizes of the left, above and previous co-located blocks of the current block. If the variance of observed partition sizes is small enough, adjust the search range around the mean partition size, which will be tighter. The feature is currently turned on at speed 2. Experiments on sample youtube clips show on average the runtime is reduced by 3-7%. For hard stdhd clips: park_joy_1080p @ 15000kbps: 509251 ms -> 491953 ms (3.3%) pedestrian_area_1080p @ 2000kbps: 223941 ms -> 214226 ms (4.3%) The PSNR performance is changed: derf: -0.112% yt: -0.099% hd: -0.090% stdhd: -0.102% Change-Id: Ie205ec5325bf92ec5676c243e30ba9d0adca10f2	2014-08-15 16:14:20 -07:00
Dmitry Kovalev	dc35b40a67	Merge "Simplifying vp9_set_speed_features() function."	2014-08-15 15:31:43 -07:00
Yunqing Wang	28b1437d77	Remove an unused speed feature Removed disable_split_var_thresh, which is not used anymore. Change-Id: I50119b150442e1571157433b5effc6aae0dbe0fd	2014-08-15 14:10:27 -07:00
Yaowu Xu	5966586aef	Mask out H_PRED and V_PRED for 32x32 blocks Change-Id: I2847af5062b5fa320629fcabb9fa6b23ba3e5513	2014-08-14 10:52:10 -07:00
Yaowu Xu	4d6d061316	Set max_intra_bsize to 32x32 At --good and speed 3 or above for resolution less than 720p. This disables the tests for 64x64 intra prediction modes. Encoding time reduction is about 1%. Change-Id: Ib396e3d1417fece416e3f0fee929b128acbb130f	2014-08-14 10:51:44 -07:00
Jingning Han	ccef8842d2	Allow full coeff probability model and cost update This commit moves the simplified coefficient probability model and costing update to speed 4, and turns on chessboard pattern mode search for sub 720p sequences. The overall coding performance of speed 3 is improved: derf 0.889% stdhd 1.744% The speed 3 runtime for test sequences is improved: bus cif at 1000 kbps 9823 ms -> 9642 ms pedestrian 1080p 2000 kbps 189559 ms -> 183284 ms Change-Id: Iecbc7496a68f31fd49fb09f8dfd97c028d675a5d	2014-08-13 14:17:14 -07:00
Jingning Han	6e086548cb	Merge "Enable motion field based mode search skip"	2014-08-13 14:13:19 -07:00
Jingning Han	0daadeb60c	Enable motion field based mode search skip This commit allows the encoder to check the above and left neighbor blocks' reference frames and motion vectors. If they are all consistent, skip checking the NEARMV and ZEROMV modes. This is enabled in speed 3. The coding performance is improved: pedestrian area 1080p at 2000 kbps, from 74773 b/f, 41.101 dB, 198064 ms to 74795 b/f, 41.099 dB, 193078 ms park joy 1080p at 15000 kbps, from 290727 b/f, 30.640 dB, 609113 ms to 290558 b/f, 30.630 dB, 592815 ms Overall compression performance of speed 3 is changed derf -0.171% stdhd -0.168% Change-Id: I8d47dd543a5f90d7a1c583f74035b926b6704b95	2014-08-13 12:15:13 -07:00
Jim Bankoski	5c55202c6b	intra blocks disallowed inadvertently At speed 6 the smallest partitioning was 16x16 and biggest intra block was 8x8, essentially disallowing all intra blocks which produces ugly artifacts when revealing new video. Change-Id: I364042d4c64e09be0666ade64aac94d0a1b586cf	2014-08-12 16:22:32 -07:00
Dmitry Kovalev	cd1fbc67f9	Simplifying vp9_set_speed_features() function. Change-Id: I3e67230690b81ef54ef48ae26107fe7bc880ab8e	2014-08-08 16:29:24 -07:00
Dmitry Kovalev	91c2f1e45a	Moving pass from VP9_COMP to VP9EncoderConfig. We had a very complicated way to initialize cpi->pass from cfg->g_pass: switch (cfg->g_pass) { case VPX_RC_ONE_PASS: oxcf->mode = ONE_PASS_GOOD; break; case VPX_RC_FIRST_PASS: oxcf->mode = TWO_PASS_FIRST; break; case VPX_RC_LAST_PASS: oxcf->mode = TWO_PASS_SECOND_BEST; break; } cpi->pass = get_pass(oxcf->mode). Now pass is moved to VP9EncoderConfig and initialization is simple: switch (cfg->g_pass) { case VPX_RC_ONE_PASS: oxcf->pass = 0; break; case VPX_RC_FIRST_PASS: oxcf->pass = 1; break; case VPX_RC_LAST_PASS: oxcf->pass = 2; break; } Change-Id: I8f582203a4575f5e39b071598484a8ad2b72e0d9	2014-08-08 14:27:54 -07:00
Alex Converse	2be9ea610f	Use INTER_ALL for VAR based partitions for screencast material. This offers 25% more compression on my HD screencast testset. Change-Id: I85eaef95fd8f2e03e326443e9514482b2ee35cef	2014-08-05 15:23:50 -07:00
Jingning Han	ca2dcb7fed	Chessboard pattern partition search This commit enables a chessboard pattern constrained partition search for 720p and above resolutions. The scheme applies stricter partition search to alternate blocks based on their above/left neighboring blocks' partition range, as well as that of the collocated blocks in the previous frame. It is currently turned on at 16x16 block size level. The chessboard pattern is flipped per coding frame. The speed 3 runtime is reduced: park_joy_1080p, 652832 ms -> 607738 ms (7% speed-up) pedestrian_area_1080p, 215998 ms -> 200589 ms (8% speed-up) The compression performance is changed: hd -0.223% stdhd -0.295% Change-Id: I2d4d123ae89f7171562f618febb4d81789575b19	2014-07-30 10:32:41 -07:00
Jingning Han	54ad09586c	Enable chessboard inter prediction filter type search This commit enables a chessboard pattern prediction filter type search scheme for rate-distortion optimization speed-up. For the inferred motion vector modes, the encoder can re-use its above/left neighbor blocks' prediction filter type and skip a full test on all possible filter types. This operation is turned on/off alternately in a chessboard manner. It is turned on in speed 3. For test clip pedestrian 1080p, the runtime is reduced from 231500 ms -> 221700 ms. The compression performance is changed: derf: -0.147% yt: -0.134% hd: -0.079% stdhd: -0.220% Change-Id: I1912f278e7576c2dc632688e3ad7a257410c605a	2014-07-22 16:49:03 -07:00
Jingning Han	ffd948bbd5	Turn on adaptive pred filter scheme for sub8x8 below 720p For sequences of resolution below 720p, the encoder will check intra prediction modes and inter prediction modes from LAST_FRAME. This commit turns on adaptive prediction filter scheme for sub8x8 blocks, where inter prediction modes are enabled. For the test sequence bus at CIF, the speed 2 runtime goes down from 17879 ms to 16783 ms, i.e., 6% speed up. The compression performance of derf set is down by -0.128%. Change-Id: I01d5321a5ceab4e0666ac5be56c52d896c7a8d45	2014-07-21 16:22:56 -07:00
Yaowu Xu	51c60a891e	make default_interp_filter choice a speed feature This commit changed the hard-coded DEFAULT_INTERP_FILTER to a speed feature with the same default value: SWITCHABLE. Change-Id: I7f54f40f1bd3f5277841d04b85db7a84e47313f1	2014-07-16 14:28:51 -07:00
Yaowu Xu	faa686bb1b	Added a rt speed 12 We target this speed to achieve similar encoding speed and better compression than vp8 rt mode with cpu-used at -12. Change-Id: Ic1bb4371c81a17ea80e83459c1cbf4c09a3498e8	2014-07-15 16:46:22 -07:00
Jingning Han	b957439c87	Fix a potential invalid memory access in non-RD coding flow This commit fixes a potential out-of-boundary memory access due to the use of reuse_inter_pred_sby in the non-RD coding flow. It resolves the corresponding asan error. Change-Id: Iff605f5921230966990013541cd855d698810922	2014-07-11 15:50:43 -07:00
Yunqing Wang	a581da218e	Remove repetitive code in mcomp.c Deleted vp9_find_best_sub_pixel_comp_tree(), and combined it in vp9_find_best_sub_pixel_tree(). Change-Id: Ifb25763c8b19822df5537cc1daa76ce88dc3b056	2014-07-09 14:50:50 -07:00
Yunqing Wang	9bd3be69a4	Adjust full-pixel search method in real-time mode Use FAST_HEX in speed 5 and 6, which covers more points than FAST_DIAMOND and improves motion search quality. At speed 6, RTC set borg tests showed slight quality gain (psnr gain: 0.143%, ssim gain: 0.226%). No noticeable encoding speed change. Change-Id: Ifa62875d9a52ee382ec494f271382bb77d8c67bf	2014-07-09 12:56:25 -07:00
Jingning Han	f6bf614b2f	Merge "Re-design quantization process for 32x32 transform block"	2014-07-09 11:55:26 -07:00
Jingning Han	9ad1b9fc67	Re-design quantization process for 32x32 transform block This commit enables a new quantization process for 32x32 2D-DCT transform coefficient blocks. It improves the compression performance of speed 5 by 1.4%. The overall compression gains of speed 5 due to the new quantization scheme is 4.7%. It also includes the SSSE3 implementation of the 32x32 quantization process. Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b	2014-07-08 16:55:28 -07:00
Alex Converse	f60a1178c6	Cleanup motion search speed features. * Replace max_step_search_steps with constant MAX_MVSEARCH_STEPS * Fold (reduce_first_step_size + speed > 5) into reduce_first_step_size replacing uses of reduce_first_step_size that don't add the speed check with zero. Change-Id: Iae46395dbf3eaca138bf4d18b838a9e364b5a198	2014-07-07 10:08:45 -07:00
Yaowu Xu	92a6db7928	Added a speed feature controlling a motion search parameter This commit added a speed feature to control the step_param used in full pixel motion search. The intention is to reduce the search steps for high speed real time coding. Change-Id: I21d2f0105c2b647783a6688615da7fcf2b6d670b	2014-07-02 09:30:43 -07:00
Yaowu Xu	82fd084b35	Merge "Re-design quantization process"	2014-07-01 19:04:01 -07:00
Jingning Han	9ac2f66320	Re-design quantization process This commit re-designs the quantization process for transform coefficient blocks of size 4x4 to 16x16. It improves compression performance for speed 7 by 3.85%. The SSSE3 version for the new quantization process is included. The average runtime of the 8x8 block quantization is reduced from 285 cycles -> 255 cycles, i.e., over 10% faster. Change-Id: I61278aa02efc70599b962d3314671db5b0446a50	2014-07-01 17:00:07 -07:00
Yunqing Wang	f31ff029df	Elevate NEWMV mode checking threshold in real time The current threshold is kind of low, and in many cases NEWMV mode is checked but not picked as the best mode. This patch added a speed feature to increase the NEWMV threshold, so that NEWMV is checked less often during partition mode checking. This feature is enabled for speed 6 and 7. Rtc set borg tests showed: 1. Speed 6, overall psnr: -0.088%, ssim: -1.339%; Average speedup on rtc set is 11.1%. 2. Speed 7, overall psnr: -0.505%, ssim: -2.320% Average speedup on rtc set is 12.9%. Change-Id: I953b849eeb6e0d5a1f13eacba30c14204472c5be	2014-07-01 14:50:39 -07:00
Yunqing Wang	dee5782f93	Enable encode breakout in real time For real time speed 7, once encode breakout is on (i.e., the encoding setting --static-thresh=1), a proper encode breakout threshold is set to speed up the encoder. With --static-thresh=1, the RTC set borg test showed a slight overall psnr loss of 0.162%, but an ssim gain of 0.287%. The average speedup on the RTC set is 6%, and for some clips, the speedup can be 10+%. Change-Id: Id522d9ce779ff7c699936d13d0c47083de4afb85	2014-06-30 10:41:12 -07:00
Yunqing Wang	9d41313e4b	Decide the partitioning threshold from the variance histogram Before encoding a frame, calculate and store each 16x16 block's variance of source difference between last and current frame. Find partitioning threshold T for the frame from its variance histogram, and then use T to make partition decisions. Compared with fixed 16x16 partitioning, the rtc set test showed an overall psnr gain of 3.242%, and ssim gain of 3.751%. The best psnr gain is 8.653%. The overall encoding speed didn't change much. It got faster for some clips (for example, 12% speedup for vidyo1), and a little slower for others. Also, a minor modification was made in datarate unit test. Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c	2014-06-30 09:36:23 -07:00
Yaowu Xu	d0cb273e04	Allow encoder to set lpf level to 0 As a way to speed-up rtc encoding at speed 7. Change-Id: Ie36a010392cf7b741dc130df21a4e733622a75b7	2014-06-27 15:23:41 -07:00
Yaowu Xu	3f92b7b994	Added a new speed 7 in rt mode To experiment with different speed/quality compromises. Change-Id: Ia9d4b85243554d620498a327da37c356e752b07f	2014-06-27 13:29:09 -07:00
Jingning Han	5a3e3c6d3f	Adaptive txfm size selection depending on residual sse/variance This commit enables an adaptive transform size selection method for speed -6. It uses largest transform size when the sse is more than 4 times of variance, i.e., most energy is compacted in the DC coefficient. Otherwise, use the default TX_8X8. It improves the compression efficiency for rtc set of speed -6 by 0.8%, no speed change observed. Change-Id: Ie6ed1e728ff7bf88ebe940a60811361cdd19969c	2014-06-26 16:00:42 -07:00
Jingning Han	2aa50eafb2	Make non-RD intra mode search txfm size dependent This commit fixes the potential issue in the non-RD mode decision flow that only checks part of the block to estimate the cost. It was due to the use of fixed transform size, in replacing the largest transform block size. This commit enables per transform block cost estimation of the intra prediction mode in the non-RD mode decision. Change-Id: I14ff92065e193e3e731c2bbf7ec89db676f1e132	2014-06-25 18:52:18 -07:00
Yunqing Wang	bccc785f63	Merge "Reuse inter prediction result in real-time speed 6"	2014-06-25 08:18:33 -07:00
Yunqing Wang	0aae100076	Reuse inter prediction result in real-time speed 6 In real-time speed 6, no partition search is done. The inter prediction results obtained during mode picking can be reused in the following encoding process. A speed feature reuse_inter_pred_sby is added to only enable the reuse in speed 6. This patch doesn't change the encoding result. RTC set tests showed that the encoding speed gain is 2% - 5%. Change-Id: I3884780f64ef95dd8be10562926542528713b92c	2014-06-24 12:46:33 -07:00
Paul Wilkins	8160a26fa0	Fix some bugs in multi-arf Fix some bugs relating to the use of buffers in the overlay frames. Fix bug where a mid sequence overlay was propagating large partition and transform sizes into the subsequent frame because of :- sf->last_partitioning_redo_frequency > 1 and sf->tx_size_search_method == USE_LARGESTALL Change-Id: Ibf9ef39a5a5150f8cbdd2c9275abb0316c67873a	2014-06-24 13:07:48 +01:00
Jingning Han	48b8ce21f0	Merge "Allow key frame more flexibility in mode search"	2014-06-20 09:38:02 -07:00

95 Commits