generic-library/vpx

Author	SHA1	Message	Date
Marco	999bd6ea84	vp9: Fix denoising condition when pickmode partition is used. When the superblock partition is based on the nonrd-pickmode, we need to avoid the denoising. Current condition was based on the speed level. This change is to make the condition at the superblock level, as the switch in partitioning may be done at sb level based on source_sad (e.g., in speed 6). Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04	2017-07-30 23:16:38 -07:00
Yunqing Wang	9c2552a1c1	Record the sum of tx block eobs in the partition block The sum of tx block eobs is needed in the machine learning based partition early termination. The eobs are first accumulated during tx search, and then the value associated with the best tx_size is copied to ctx for later use. After the sum of eobs is calculated correctly, re-enabled ml_partition_search_early_termination speed feature. Re-did the quality/speed test to check the impact of the fix. 1. Borg test BDRATE result: 4k set: PSNR: +0.183%; SSIM: +0.100%; hdres set: PSNR: +0.168%; SSIM: +0.256%; midres set: PSNR: +0.186%; SSIM: +0.326%; 2. Average speed gain result: 4k clips: 21%; hd clips: 26%; midres clips: 15%. The result is in line with the original result. Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda	2017-03-20 17:12:15 +00:00
Yunqing Wang	670101439f	Apply machine learning-based early termination in VP9 partition search This patch was based on Yang Xian's intern project code. Further modifications were done. 1. Moved machine-learning related parameters into the context structure. 2. Corrected the calculation of sum_eobs. 3. Removed unused parameters and calculations. 4. Made it work with multiple tiles. 5. Added a speed feature for the machine-learning based partition search early termination. 6. Re-organized the code. The patch was rebased to the top-of-tree. Borg test BDRATE result: 4k set: PSNR: +0.144%; SSIM: +0.043%; hdres set: PSNR: +0.149%; SSIM: +0.269%; midres set: PSNR: +0.127%; SSIM: +0.257%; Average speed gain result: 4k clips: 22%; hd clips: 23%; midres clips: 15%. Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac	2017-03-13 09:54:18 -07:00
Marco	131c1600a9	vp9 denoiser: Bias to last reference for temporal filter. Change-Id: I6a360a12e8da8cdcb8a779647512591612d64f31	2015-11-20 15:38:32 -08:00
James Zern	b09aa3ac54	vp9: add extern "C" to headers Change-Id: I1b6927ad820f99340985b094d415aaab14defaf4	2015-09-09 23:15:59 -07:00
Yunqing Wang	3b2e73b9a4	Remove tx cache and speed up tx size selection 1. The RD scores obtained during the tx size selection were stored in the tx cache, and used to help make the tx decision for the following frames. This wasn't used anymore in VP9 encoder. Recovered the related decision making code from 1.5+ years ago, and borg tests didn't show any quality gain. This patch removed it to lower the complexity. 2. An optimization was done after the above refactoring. If the tx_mode is not TX_MODE_SELECT, we only need to test the chosen tx size instead of all possible tx sizes. This gave a 1.5% average speed gain at speed 2, and a 1% average speed gain at speed 3. Change-Id: Id8cd650e066a8cef33829d8c15388a8138adc78c	2015-07-30 18:53:40 -07:00
Scott LaVarnway	c06d56cc7d	VP9: Move ref_mvs[][] and mode_context[] from MB_MODE_INFO to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for both the decoder and encoder. (encoder has two MODE_INFO buffers) Change-Id: If006abb2224acaf326df3c2be09e77e967662107	2015-06-29 12:46:47 -07:00
Yunqing Wang	edbd61e136	vp9_ethread: modify VP9_COMP structure This patch modified struct VP9_COMP. Created a struct ThreadData to include data that need to be copied for each thread. In multiple thread case, one thread processes one tile. all threads share one copy of VP9_COMP, (refer to VP9_COMP cpi in the code) but each thread has its own copy of ThreadData, (refer to ThreadData td in the code). Therefore, within the scope of encode_tiles(), both cpi and td need to be passed as function parameters. In single thread case, the FRAME_COUNTS pointer in ThreadData points to "counts" in VP9_COMMON. Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e	2014-11-24 17:57:38 -08:00
Jingning Han	caaf63b2c4	Rework cut-off decisions in cyclic refresh aq mode This commit removes the cyclic aq mode dependency on in_static_area and reworks the corresponding cut-off thresholds. It improves the compression performance of speed -5 by 1.47% in PSNR and 2.07% in SSIM, and the compression performance of speed -6 by 3.10% in PSNR and 5.25% in SSIM. Speed wise, about 1% faster in both settings at high bit-rates. Change-Id: I1ffc775afdc047964448d9dff5751491ba4ff4a9	2014-11-05 21:17:09 -08:00
Jingning Han	9f128b3ed9	Hybrid partition search for rtc coding mode This commit re-designs the recursive partition search scheme in rtc speed -5. It first checks if the current block is under cyclic refresh mode. If so, apply recursive partition search. Otherwise, perform sub-sampled pixel based partition selection. When the pre-selection finds the partition size should be 32x32 or above, use the partition size directly. Otherwise, apply partition search at nearby levels around the preset partition size. It is enabled in speed -5. The compression performance of rtc speed -5 is improved by 9.4%. Speed wise, the run-time goes slower from 1% to 10%. nik_720p, 1000 kbps: 33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms; vidyo1_720p, 1000 kbps: 16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms. Change-Id: I65adba352e3adc03bae50854ddaea1b421653c6c	2014-10-20 13:02:12 -07:00
Jingning Han	5d9cdac087	Move inter filter defs to vp9_filter.h Add comments on the use case of these definitions. Further reduce the scope of header file in vp9_context_tree.h. Change-Id: Ic4a7638e838d0ac441b64abfc56e57354c059d75	2014-10-07 12:16:37 -07:00
Jingning Han	5e32036b97	Reduce the scope of the header file used in vp9_context_tree.h Change-Id: I264ee35044a5973c7725daba7af870968353a3c1	2014-10-07 11:13:35 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests Adds various high bitdepth transform functions and tests. Much of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform coefficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map to int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Jingning Han	d62d804e64	Speed up compound inter prediction mode check This commit allows the encoder to store outcomes of single reference frame modes and compares them to decide if the inter prediction filter, forward transform, and quantization can be skipped. The compression performance of speed 3 is down: derf -0.364%, stdhd -0.198%. For test sequences, the speed 3 runtime is reduced: highway CIF 100 kbps, 51976 ms -> 45033 ms, 13% speed-up; stockholm 720p 1000 kbps, 71826 ms -> 67838 ms, 5.5% speed-up; pedestrian 1080p 2000 kbps, 154924 ms -> 150702 ms, 2.6% speed-up. Change-Id: I5aa26f918d2b4b5197a2c0afa2779319f1c88e44	2014-09-03 15:28:01 -07:00
Jingning Han	02e6ecdc4c	Extend block level sse to support multiple txfm blocks This commit extends the sse and forward transform computation flag to support the case 64x64 blocks where there are 4 32x32 2D-DCT blocks. Change-Id: I86a3e805dfaa0f3abd812f590520c71aa0e40473	2014-08-29 08:29:34 -07:00
Yunqing Wang	4d2c376923	Early termination in encoding partition search In the partition search, the encoder checks all possible partitionings in the superblock's partition search tree. This patch proposed a set of criteria for partition search early termination, which effectively decided whether or not to terminate the search in current branch based on the "skippable" result of the quantized transform coefficients. The "skippable" information was gathered during the partition mode search, and no overhead calculations were introduced. This patch gives significant encoding speed gains without sacrificing the quality. Borg test results: 1. At speed 1, stdhd set: psnr: +0.074%, ssim: +0.093%; derf set: psnr: -0.024%, ssim: +0.011%; 2. At speed 2, stdhd set: psnr: +0.033%, ssim: +0.100%; derf set: psnr: -0.062%, ssim: +0.003%; 3. At speed 3, stdhd set: psnr: +0.060%, ssim: +0.190%; derf set: psnr: -0.064%, ssim: -0.002%; 4. At speed 4, stdhd set: psnr: +0.070%, ssim: +0.143%; derf set: psnr: -0.104%, ssim: +0.039%; The speedup ranges from several percent to 60+%. speed1 speed2 speed3 speed4 (1080p, 100f): old_town_cross: 48.2% 23.9% 20.8% 16.5% park_joy: 11.4% 17.8% 29.4% 18.2% pedestrian_area: 10.7% 4.0% 4.2% 2.4% (720p, 200f): mobcal: 68.1% 36.3% 34.4% 17.7% parkrun: 15.8% 24.2% 37.1% 16.8% shields: 45.1% 32.8% 30.1% 9.6% (cif, 300f) bus: 3.7% 10.4% 14.0% 7.9% deadline: 13.6% 14.8% 12.6% 10.9% mobile: 5.3% 11.5% 14.7% 10.7% Change-Id: I246c38fb952ad762ce5e365711235b605f470a66	2014-08-28 11:27:28 -07:00
Jingning Han	1a8d45f309	Extend skip_txfm flag into array to cover YUV planes Change-Id: Ieae182d72d625d0d3fd4ed7c7d24cb521a0f21b0	2014-08-05 15:42:12 -07:00
Tim Kopp	9d337d34f2	s/CONFIG_DENOISING/CONFIG_VP9_TEMPORAL_DENOISING This should prevent confusion with the VP8 CONFIG_TEMPORAL_DENOISING and other flags. Change-Id: I1fe4e2977895b7966841d861ab74317ad875b6c8	2014-07-24 13:43:52 -07:00
Tim Kopp	03819ed9ab	VP9 Denoiser denoises after mode/bsize search In vp8, statistics are collected about the different modes as they are searched. This process is more complicated due to the variable block size. Fields were added to the PICK_MODE_CONTEXT struct to hold this information for each point in the search. The information is then taken from the appropriate part of the tree during denoising. Change-Id: I89261ab77ad637821287ae157dfdf694702b8e77	2014-07-15 08:43:43 -07:00
Jingning Han	ccba289f8d	Fast computation path for forward transform and quantization This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1	2014-06-12 11:10:54 -07:00
Dmitry Kovalev	81e03394d6	Replacing int_mv with MV. Change-Id: Icd7eea20e944e3e28e5eb20cdc088866a54d53b4	2014-05-19 11:43:07 -07:00
Dmitry Kovalev	f80bd43bf8	Removing unused members from PICK_MODE_CONTEXT struct. Change-Id: Ieb3bc037a2ae7791323a0f9cec04381ba9b0c795	2014-05-19 10:41:58 -07:00
Dmitry Kovalev	51545f5753	Moving PC_TREE from MACROBLOCK to VP9_COMP. Because PC_TREE is encoder-level data, not MACROBLOCK-level data. Change-Id: I4f620c0781acd3a2744860610117e74948e0b2b5	2014-05-16 10:17:13 -07:00
Dmitry Kovalev	ef003078e8	Renaming "onyx" to "encoder". Actual renames: vp9_onyx_if.c -> vp9_encoder.c vp9_onyx_int.h -> vp9_encoder.h Change-Id: I80532a80b118d0060518e6c6a0d640e3f411783c	2014-04-22 14:57:05 -07:00
Jim Bankoski	e890c2579b	add a context tree structure to encoder This patch sets up a quad_tree structure (pc_tree) for holding all of pick_mode_context data we use at any square block size during encoding or picking modes. That includes contexts for 2 horizontal and 2 vertical splits, one none, and pointers to 4 sub pc_tree nodes corresponding to split. It also includes a pointer to the current chosen partitioning. This replaces code that held an index for every level in the pick modes array including: sb_index, mb_index, b_index, ab_index. These were used as stateful indexes that pointed to the current pick mode contexts you had at each level stored in the following arrays: ab4x4_context[][][], sb8x4_context[][][], sb4x8_context[][][], sb8x8_context[][][], sb8x16_context[][][], sb16x8_context[][][], mb_context[][], sb32x16[][], sb16x32[], sb32_context[], sb32x64_context[], sb64x32_context[], sb64_context and the partitioning that had been stored in the following: b_partitioning, mb_partitioning, sb_partitioning, and sb64_partitioning. Prior to this patch, before doing an encode you had to set the appropriate index for your block size (switch statement), update it (up to 3 lookups for the index array value) and then make your call into a recursive function, at which point you'd have to call get_context, which then had to do a switch statement based on the block size, and then up to 3 lookups based upon the block size to find the context to use. With the new code the context for the block size is passed around directly, avoiding the extraneous switch statements and multi-dimensional array lookups that were listed above. At any level in the search all of the contexts are local to the pc_tree you are working on (in?). In addition, in most places code that used to call sub functions, check if the block size was 4x4 and index was > 0, and return no longer does so, preferring instead to call the right none function on the inside. Change-Id: I06e39318269d9af2ce37961b3f95e181b57f5ed9	2014-04-17 07:30:55 -07:00
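
The quad-tree described in commit e890c2579b can be illustrated roughly as follows. This is a minimal sketch, not the code from the repository: the type and function names (`PickModeContext`, `PcTree`, `pc_tree_alloc`, `pc_tree_free`, `pc_tree_count`) and the field layout are hypothetical, and the chosen partitioning is stored as a plain enum here, whereas the commit message describes a pointer to the current partitioning.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-block mode-decision state. */
typedef struct {
  int best_mode;
} PickModeContext;

/* Hypothetical partition labels: no split, 2 horizontal halves,
 * 2 vertical halves, or 4 square sub-blocks. */
typedef enum { PART_NONE, PART_HORZ, PART_VERT, PART_SPLIT } PartitionType;

/* One node per square block size; all contexts for that size are local
 * to the node, so no index arrays or switch statements are needed. */
typedef struct PcTree {
  int block_size;              /* side length in pixels (assumed field) */
  PartitionType partitioning;  /* currently chosen partitioning */
  PickModeContext none;        /* context for the unsplit block */
  PickModeContext horizontal[2];
  PickModeContext vertical[2];
  struct PcTree *split[4];     /* sub-nodes; all NULL at the smallest size */
} PcTree;

/* Release a node and every sub-node below it. Accepts NULL. */
void pc_tree_free(PcTree *node) {
  if (node == NULL) return;
  for (int i = 0; i < 4; ++i) pc_tree_free(node->split[i]);
  free(node);
}

/* Recursively allocate a tree from block_size down to min_size. */
PcTree *pc_tree_alloc(int block_size, int min_size) {
  PcTree *node = calloc(1, sizeof(*node));
  if (node == NULL) return NULL;
  node->block_size = block_size;
  node->partitioning = PART_NONE;
  if (block_size > min_size) {
    for (int i = 0; i < 4; ++i) {
      node->split[i] = pc_tree_alloc(block_size / 2, min_size);
      if (node->split[i] == NULL) {
        pc_tree_free(node); /* unwind a partially built tree */
        return NULL;
      }
    }
  }
  return node;
}

/* Count the nodes in the tree, including the root. */
int pc_tree_count(const PcTree *node) {
  if (node == NULL) return 0;
  int total = 1;
  for (int i = 0; i < 4; ++i) total += pc_tree_count(node->split[i]);
  return total;
}
```

With this layout a recursive search can pass the node for the current block size directly to the next level, which is the lookup saving the commit message describes.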

25 Commits