generic-library/vpx

Author	SHA1	Message	Date
Marco	03e8f13337	vp9: Modification to adapt the ARF usage for 1 pass vbr. Add stats for past ARF usage, and use them to disable ARF usage based on some conditions. Overall improvement on ytlive set; reduces the regression on the problem clips for this feature. Only has an effect when sf->use_altref_onepass is enabled (currently off by default). Change-Id: I66267f227ea132dc86acb730e9882f85bead2cdb	2017-09-28 09:10:30 -07:00
Marco	999bd6ea84	vp9: Fix denoising condition when pickmode partition is used. When the superblock partition is based on the nonrd-pickmode, we need to avoid the denoising. Current condition was based on the speed level. This change is to make the condition at the superblock level, as the switch in partitioning may be done at sb level based on source_sad (e.g., in speed 6). Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04	2017-07-30 23:16:38 -07:00
Marco	0c9e2f4c15	vp9: Reuse motion from choose_partitioning in NEWMV search. When int_pro_motion_estimation is done for the superblock in choose_partitioning, use it to avoid the full_pixel_search for NEWMV mode, if bsize is >= 32x32. For speed > 7. Small/neutral change on RTC metrics. ~1-2% speedup on arm on high motion clip. Change-Id: I3cfe6833ff4bf75d4afa83eaf058ad45729de85b	2017-07-17 13:15:48 -07:00
James Zern	80b83c73ba	cosmetics,vp9/: normalize inv/fwd_txfm naming + vpx_dsp/, test/. itxfm -> inv_txfm, ftxfm -> fwd_txfm. Change-Id: I3aacdb65143576d64cfe5c9b14dd358c17c1fe7e	2017-07-06 18:35:44 -07:00
James Zern	8d1bda93f4	cosmetics,vp9/encoder: s/txm/txfm/. txfm is more commonly used as an abbreviation throughout the codebase. Change-Id: I86fd90ef132468f9da270091c05daa1f5a49ece2	2017-06-29 15:08:47 -07:00
Linfeng Zhang	d5de63d2be	Update highbd idct function arguments to use uint16_t dst. BUG=webm:1388 Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5	2017-05-03 13:59:16 -07:00
Marco	92ec0674fd	vp9: Reduce artifacts in non-rd pickmode for lighting changes. Add a low-variance high-sumdiff to the superblock content state and use it to limit the mv and bias some decisions in non-rd pickmode. Only affects speed >= 6. Reduces artifacts for lighting changes. Small/no difference in metrics on RTC set. Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b	2017-04-24 17:08:43 -07:00
Yunqing Wang	bca4564683	Make allow_exhaustive_searches feature no longer adaptive. A previous patch turned on the allow_exhaustive_searches feature only for FC_GRAPHICS_ANIMATION content. This patch further modified the feature by removing the exhaustive search limit, and made it no longer adaptive. As a result, the 2 counts that recorded the number of motion searches were removed, which helped achieve determinism in the row based multi-threading encoding. Tests showed that this patch didn't make the encoder much slower. Used exhaustive_searches_thresh for this speed feature, and removed allow_exhaustive_searches. Also, refactored the speed feature code to follow the general speed feature setting style. Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa	2017-04-21 11:14:02 -07:00
Marco	66c6b4d6fc	vp9: 1 pass: Move source sad computation into encodeframe loop. Refactor to split the 1 pass source sad computation into scene detection (currently used for VBR and screen-content mode) and superblock-based source sad computation (used in non-rd CBR mode). This allows the source sad computation for CBR mode to be multi-threaded. No change in compression. Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388	2017-03-27 11:11:05 -07:00
Yunqing Wang	bf43b4c4b4	Merge "Record the sum of tx block eobs in the partition block"	2017-03-20 23:20:12 +00:00
Marco	06c8713e89	vp9: Use sb content measure to bias against golden. For each superblock, keep track of how far from the current frame the last significant content change was, and use that (along with GF distance) to turn off GF search in non-rd pickmode. Only enabled for speed >= 8. avgPSNR on RTC/RTC_derf down by ~0.9/1.2. Speedup on mac: ~3-5%. Speedup on arm: 3.6% for VGA and 4.4% for HD. Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7	2017-03-20 12:42:26 -07:00
Yunqing Wang	9c2552a1c1	Record the sum of tx block eobs in the partition block. The sum of tx block eobs is needed in the machine learning based partition early termination. The eobs are first accumulated during tx search, and then the value associated with the best tx_size is copied to ctx for later use. After the sum of eobs is calculated correctly, re-enabled ml_partition_search_early_termination speed feature. Re-did the quality/speed test to check the impact of the fix. 1. Borg test BDRATE result: 4k set: PSNR: +0.183%; SSIM: +0.100%; hdres set: PSNR: +0.168%; SSIM: +0.256%; midres set: PSNR: +0.186%; SSIM: +0.326%; 2. Average speed gain result: 4k clips: 21%; hd clips: 26%; midres clips: 15%. The result is in line with the original result. Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda	2017-03-20 17:12:15 +00:00
James Zern	c09b290cea	vp9/encoder: fix segfault on win32 using vs < 2015. Shift the bsse[] member of the macroblock struct to the front to avoid an incorrect offset (0) to the upper half of bsse[0], which leads to a negative resulting in a crash. Restrict this to Visual Studio versions before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any potential cache impact on other platforms. https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code BUG=webm:1054 Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076	2017-03-10 17:37:17 -08:00
Ranjit Kumar Tulabandu	71061e9332	Row based multi-threading of encoding stage (Yunqing Wang). This patch implements row-based multi-threading within tiles in the encoding pass, and substantially speeds up the multi-threaded encoder in VP9. Speed tests at speed 1 on the STDHD (using 4 tiles) set show that the average speedups of the encoding pass (second pass in the 2-pass encoding) are 7% while using 2 threads, 16% while using 4 threads, 85% while using 8 threads, and 116% while using 16 threads. Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de	2017-02-15 00:49:34 +00:00
Ranjit Kumar Tulabandu	8b0c11c358	Multi-threading of first pass stats collection (yunqingwang) 1. Rebased the patch. Incorporated recent first pass changes. 2. Turned on the first pass unit test. Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee	2017-01-24 15:48:02 -08:00
Marco	219cdab676	vp9: Add feature to use block source_sad for realtime mode. Only for speed >= 7, and affects skipping of intra modes. Threshold is set low for now, needs to be tuned. Small/no difference in metrics on rtc clips. Change-Id: If9bdbd43f08d1f80407cdd2e9e5e96780dcd2424	2017-01-20 11:57:02 -08:00
paulwilkins	635ae8bdc1	Adjust coefficient optimization and tx_domain rd speed features. Previously Tx domain rd was used in all cases above speed 0. Coefficient optimization was only enabled for best and speed 0. This patch selectively sets these features at other speed settings based on block complexity. For the Netflix and HD sets in particular the quality gains are large compared to the speed hit. At speed 1 the average psnr gain in the NF set is > 2.5% with one clip coming in at 18% and some points almost 30%. Average gains for the lower resolution test sets are around 1%. The gains are biggest at low Q so some further optimization may be possible. Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576	2016-08-25 15:36:16 +01:00
Alex Converse	6554333b59	Refactor mv limits. Change-Id: Ifebdc9ef37850508eb4b8e572fd0f6026ab04987	2016-08-08 11:54:00 -07:00
clang-format	e0cc52db3f	vp9/encoder: apply clang-format Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2	2016-08-02 16:47:11 -07:00
JackyChen	f9c0587200	vp9: Encoding cycle reduction for speed 8. 1. Skip golden non-zeromv and newmv-last for bsize >= 16x16 if the temporal variance obtained from choose_partitioning is very low. 2. Skip horz and vert INTRA mode for speed 8. This change works best on the clips with little noise and with some motion (e.g. gips_motion which has > 5% speed up). PSNR drop is 1.78% on rtc test set, no obvious visual quality regression found. Change-Id: Ib43b5b20e67809d03c5a6890818ddff59e1fc94a	2016-06-13 09:33:22 -07:00
jackychen	bacc67f4a8	vp9: Skip some modes when variance is low for big blocks, for 1 pass real-time. Skip intra-mode and some inter-modes (newmv, nearmv, nearestmv) for golden frame if the variance obtained from choose_partitioning is very low. Only for 1 pass real-time CBR mode and bsize >= 32x32; it has ~2.5% speed up with less than 0.1% PSNR drop for rtc test set. No visual regression seen. Change-Id: I70efbc95a1007231ae36f02c5b2fbf6cd35077ad	2016-06-01 13:54:18 -07:00
Alex Converse	fac947df77	Restore previous motion search bit-error scale. The bit to error transformation got doubled as a result of going from 8-bit to 9-bit costs (change `d13385c`). Use defines to derive the scale numbers and comment some of the fields. derf: -0.023 BDRATE hevcmr: +0.067 BDRATE stdhd: +0.098 BDRATE (These are substantially smaller than the original gains from 8 to 9 bit costing.) Change-Id: I6a2b3b029b2f1415e4f90a05709b2333ec0eea9b	2016-02-09 13:20:25 -08:00
Marco	b39a599cef	vp9 non-rd mode: Modification for detected skin areas. If a superblock contains a lot of "skin" then force split of 64x64 partition, and make some adjustments in mode selection. This helps to reduce artifacts on moving face/skin areas at low bitrates. Little/no change in metrics: avgPSNR/SSIM down by ~0.12%. Small encoding time increase < 1%. Change-Id: Ic57f52148c3716f391419fab0530d916e4c1d186	2016-01-27 17:38:58 -08:00
paulwilkins	0149fb3d6b	Changes to exhaustive motion search. This change alters the nature and use of exhaustive motion search. Firstly, any exhaustive search is preceded by a normal step search. The exhaustive search is only carried out if the distortion resulting from the step search is above a threshold value. Secondly, the simple +/- 64 exhaustive search is replaced by a multi-stage, mesh-based search where each stage has a range and a step/interval size. Subsequent stages use the best position from the previous stage as the center of the search but use a reduced range and interval size. For example: stage 1: range +/- 64, interval 4; stage 2: range +/- 32, interval 2; stage 3: range +/- 15, interval 1. This process, especially when it follows on from a normal step search, has shown itself to be almost as effective as a full-range exhaustive search with step 1 but greatly lowers the computational complexity, such that it can be used in some cases for speeds 0-2. This patch also removes a double exhaustive search for sub-8x8 blocks which also contained a bug (the two searches used different distortion metrics). For best quality in my test animation sequence, this patch has almost no impact on quality but improves encode speed by more than 5X. Restricted use in good quality speeds 0-2 yields significant quality gains on the animation test of 0.2 - 0.5 dB with only a small impact on encode speed. On most clips, though, the quality gain and speed impact are small. Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa	2015-11-13 10:16:31 +00:00
Alex Converse	4ac5058afc	Give skip_txfm constants names. This is using a define instead of an enum to keep byte packing. Change-Id: I3abb07c8bfe377e19be4531b624af7b7b4207792	2015-07-31 10:08:08 -07:00
Alex Converse	dfe7fdae7d	Comment zcoeff_blk. Change-Id: Iefc2eb78e71472ecf51802ec59ff32caef4bd0f4	2015-07-29 16:53:33 -07:00
Scott LaVarnway	c06d56cc7d	VP9: Move ref_mvs[][] and mode_context[] from MB_MODE_INFO to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for both the decoder and encoder. (encoder has two MODE_INFO buffers) Change-Id: If006abb2224acaf326df3c2be09e77e967662107	2015-06-29 12:46:47 -07:00
Jingning Han	0c6d3a03e1	Account for chroma component costs in RTC mode decision. This commit allows the encoder to account for additional chroma plane costs in the mode decision process, if the current block potentially contains significant color change. It improves the visual quality at very low bit-rates. The compression performance of dark720p is improved by 12.39% in speed 6. For jimred at 150 kbps, the PSNR of V component (red) increased by 0.2 dB, at the expense of about 5% increase in encoding time. Note that for sequences where the chroma components are fairly consistent, the encoding time increase is negligible. On average the rtc set compression performance is improved by 1.172% in PSNR and 1.920% in SSIM. Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c	2015-02-04 09:45:14 -08:00
Jingning Han	d0f2377027	Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value."" This reverts commit `9946ee23e0`. Fix the ssse3 asm function. Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07	2014-12-22 10:09:25 -08:00
Paul Wilkins	9946ee23e0	Revert "Removal of legacy zbin_extra / zbin_oq_value." This reverts commit `e9b586e21b`. Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4	2014-12-19 15:02:58 +00:00
Paul Wilkins	e9b586e21b	Removal of legacy zbin_extra / zbin_oq_value. zbin extra / zbin_oq_value was widely passed around, hence removal touches a lot of code. Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5	2014-12-18 16:49:11 +00:00
Yunqing Wang	ad7586a9e1	vp9_ethread: move max/min partition size to mb struct. The min_partition_size and max_partition_size are set at the beginning while setting speed features, and then adjusted at SB level. Moving them to the mb struct ensures there is a local copy for each thread. Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4	2014-11-20 09:24:50 -08:00
Jingning Han	caaf63b2c4	Rework cut-off decisions in cyclic refresh aq mode. This commit removes the cyclic aq mode dependency on in_static_area and reworks the corresponding cut-off thresholds. It improves the compression performance of speed -5 by 1.47% in PSNR and 2.07% in SSIM, and the compression performance of speed -6 by 3.10% in PSNR and 5.25% in SSIM. Speed wise, about 1% faster in both settings at high bit-rates. Change-Id: I1ffc775afdc047964448d9dff5751491ba4ff4a9	2014-11-05 21:17:09 -08:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix. Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
Jingning Han	0a9f5fa146	Remove repeated header files from vp9_block.h. This commit removes the unused header file vp9_onyxc_int.h and the repeatedly included file vpx_ports/mem.h from vp9_block.h. Change-Id: I400b210bd1da48f1880bd50a8f4a6e2c690e15a1	2014-10-01 13:01:43 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests. Adds various high bitdepth transform functions and tests. Much of the change relates to using typedefs tran_low_t and tran_high_t for the final transform coefficients and intermediate stages of the transform computation, respectively, rather than fixed types int16_t/int. When the vp9_highbitdepth configure flag is off, these map to int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for the needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Jingning Han	d62d804e64	Speed up compound inter prediction mode check. This commit allows the encoder to store outcomes of single reference frame modes and compare them to decide if the inter prediction filter, forward transform, and quantization can be skipped. The compression performance of speed 3 is down: derf -0.364%, stdhd -0.198%. For test sequences, the speed 3 runtime is reduced: highway CIF 100 kbps, 51976 ms -> 45033 ms, 13% speed-up; stockholm 720p 1000 kbps, 71826 ms -> 67838 ms, 5.5% speed-up; pedestrian 1080p 2000 kbps, 154924 ms -> 150702 ms, 2.6% speed-up. Change-Id: I5aa26f918d2b4b5197a2c0afa2779319f1c88e44	2014-09-03 15:28:01 -07:00
Jingning Han	02e6ecdc4c	Extend block level sse to support multiple txfm blocks. This commit extends the sse and forward transform computation flag to support the case of 64x64 blocks, where there are 4 32x32 2D-DCT blocks. Change-Id: I86a3e805dfaa0f3abd812f590520c71aa0e40473	2014-08-29 08:29:34 -07:00
Jingning Han	2b1c6eacb9	Move mv cost table to VP9_COMP. The mv cost table set is maintained at frame level, hence moved to VP9_COMP. Change-Id: Icb3d0185d47443590bd11357de729aa4ba5c5e5e	2014-08-22 09:38:07 -07:00
Jingning Han	245e57c78e	Merge "Enable fast forward txfm and quant for rate-distortion search"	2014-08-11 17:56:48 -07:00
Jingning Han	9da4cd94f5	Merge "Extend skip_txfm flag into array to cover YUV planes"	2014-08-11 08:53:25 -07:00
Jingning Han	b4b09c9796	Enable fast forward txfm and quant for rate-distortion search. This commit enables the encoder to select the fast forward transform and quantization path according to the prediction residual sse/variance in the rate-distortion optimization scheme. Change-Id: Ief9fc3844fd4107166d401970e800c6e5ce2b5fe	2014-08-08 16:16:51 -07:00
Jingning Han	1a8d45f309	Extend skip_txfm flag into array to cover YUV planes Change-Id: Ieae182d72d625d0d3fd4ed7c7d24cb521a0f21b0	2014-08-05 15:42:12 -07:00
Jim Bankoski	63c1c3ee64	energy -> int to avoid unsigned / signed mismatch Change-Id: Idd1327852f0df0eab0ea3b33959f2b8292b77301	2014-08-04 12:07:26 -07:00
Jingning Han	9ac2f66320	Re-design quantization process. This commit re-designs the quantization process for transform coefficient blocks of size 4x4 to 16x16. It improves compression performance for speed 7 by 3.85%. The SSSE3 version for the new quantization process is included. The average runtime of the 8x8 block quantization is reduced from 285 cycles -> 255 cycles, i.e., over 10% faster. Change-Id: I61278aa02efc70599b962d3314671db5b0446a50	2014-07-01 17:00:07 -07:00
Yunqing Wang	9d41313e4b	Decide the partitioning threshold from the variance histogram. Before encoding a frame, calculate and store, for each 16x16 block, the variance of the source difference between the last and current frames. Find the partitioning threshold T for the frame from its variance histogram, and then use T to make partition decisions. Compared with fixed 16x16 partitioning, the rtc set test showed an overall psnr gain of 3.242%, and ssim gain of 3.751%. The best psnr gain is 8.653%. The overall encoding speed didn't change much. It got faster for some clips (for example, 12% speedup for vidyo1), and a little slower for others. Also, a minor modification was made in datarate unit test. Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c	2014-06-30 09:36:23 -07:00
Alex Converse	aeacaac574	Switch active map implementation to segment based. Change-Id: Ibb841a1fa4d08d164cf5461246ec290f582b1f80	2014-06-20 13:13:23 -07:00
Dmitry Kovalev	f80a346e0e	Merge "Replacing txfm_size with tx_size."	2014-06-12 13:07:11 -07:00
Dmitry Kovalev	4345d12d28	Replacing txfm_size with tx_size. Change-Id: Ifa6374e9db5919322733b656e0865f5f19ee6f2c	2014-06-12 11:57:26 -07:00
Jingning Han	ccba289f8d	Fast computation path for forward transform and quantization. This commit enables a fast-path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode; rd coding mode will follow next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1	2014-06-12 11:10:54 -07:00
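
Commit 0149fb3d6b describes the staged mesh search only in prose. Below is a minimal C sketch of that scheme under stated assumptions, not the libvpx code: `Mv`, `MeshStage`, `CostFn`, `mesh_search`, and `toy_cost` are hypothetical names, the cost callback stands in for the encoder's real distortion metric, and `start` is assumed to be the result of the preceding step search. The stage table reuses the example ranges and intervals from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Full-pixel motion vector (hypothetical type for this sketch). */
typedef struct {
  int row;
  int col;
} Mv;

/* One mesh stage: scan +/- range around the center every `interval` pixels. */
typedef struct {
  int range;
  int interval;
} MeshStage;

/* Hypothetical distortion callback standing in for the encoder's metric. */
typedef unsigned int (*CostFn)(int row, int col, void *ctx);

/* Example stages taken from the commit message. */
static const MeshStage kExampleMesh[3] = { { 64, 4 }, { 32, 2 }, { 15, 1 } };

/* `start` is assumed to come from the preceding step search. The mesh stages
 * run only if its cost is above `thresh`; each stage re-centers on the best
 * position found by the stage before it. */
static Mv mesh_search(Mv start, unsigned int thresh, const MeshStage *stages,
                      int num_stages, CostFn cost, void *ctx,
                      unsigned int *best_cost) {
  Mv best = start;
  unsigned int best_c = cost(start.row, start.col, ctx);
  if (best_c > thresh) {
    for (int s = 0; s < num_stages; ++s) {
      const Mv center = best;
      const int r = stages[s].range;
      const int step = stages[s].interval;
      assert(r >= 0 && step > 0);
      for (int dr = -r; dr <= r; dr += step) {
        for (int dc = -r; dc <= r; dc += step) {
          const unsigned int c = cost(center.row + dr, center.col + dc, ctx);
          if (c < best_c) { /* strict compare: ties keep the current best */
            best_c = c;
            best.row = center.row + dr;
            best.col = center.col + dc;
          }
        }
      }
    }
  }
  if (best_cost != NULL) *best_cost = best_c;
  return best;
}

/* Toy cost for demonstration only: squared distance to a target Mv. */
static unsigned int toy_cost(int row, int col, void *ctx) {
  const Mv *target = (const Mv *)ctx;
  const int dr = row - target->row;
  const int dc = col - target->col;
  return (unsigned int)(dr * dr + dc * dc);
}
```

With the three example stages, this sketch makes 33² + 33² + 31² = 3,139 cost evaluations after the initial one, versus 129² = 16,641 for a single ±64 search at interval 1. The trade-off is that a coarse first stage can step over a narrow minimum, which is one reason for running it after a normal step search rather than on its own.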

186 Commits
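
Commit 10783d4f3a explains that the transform code switches between two integer widths depending on the high-bitdepth configure flag. Here is a minimal C sketch of that type arrangement. The macro name `CONFIG_VP9_HIGHBITDEPTH` is an assumption about how the flag reaches the C code, and `round_shift` with its 14 fractional bits (`SKETCH_CONST_BITS`) is a hypothetical helper added only to show why the intermediate type must be wider than the coefficient type.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the configure flag reaches C code as this macro; default it
 * to 0 so the sketch builds on its own. */
#ifndef CONFIG_VP9_HIGHBITDEPTH
#define CONFIG_VP9_HIGHBITDEPTH 0
#endif

#if CONFIG_VP9_HIGHBITDEPTH
typedef int32_t tran_low_t;  /* final transform coefficients */
typedef int64_t tran_high_t; /* intermediate transform stages */
#else
typedef int16_t tran_low_t;  /* final transform coefficients */
typedef int32_t tran_high_t; /* intermediate transform stages */
#endif

/* Hypothetical: number of fractional bits in the fixed-point constants. */
#define SKETCH_CONST_BITS 14

/* Hypothetical helper: round a wide intermediate product back down to the
 * coefficient type, checking that the narrowing does not truncate. */
static tran_low_t round_shift(tran_high_t x) {
  const tran_high_t rounded =
      (x + ((tran_high_t)1 << (SKETCH_CONST_BITS - 1))) >> SKETCH_CONST_BITS;
  assert((tran_high_t)(tran_low_t)rounded == rounded);
  return (tran_low_t)rounded;
}
```

For example, scaling a coefficient of 100 by cos(π/4) in 14-bit fixed point means multiplying by 11585; the product 1,158,500 is far outside the int16_t range, so it is formed in tran_high_t and only the rounded result, 71, is narrowed back to tran_low_t.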