generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	b4b09c9796	Enable fast forward txfm and quant for rate-distortion search This commit enables encoder to select fast forward transform and quantization path according to the prediction residual sse/variance, in the rate-distortion optimization scheme. Change-Id: Ief9fc3844fd4107166d401970e800c6e5ce2b5fe	2014-08-08 16:16:51 -07:00
Alex Converse	5926e7c0e8	Remove unfinished VP9 alpha channel. Change-Id: Ic5d3a3a0dac10b49495771886a31e793bb78b5ca	2014-07-21 15:55:50 -07:00
Jingning Han	9ad1b9fc67	Re-design quantization process for 32x32 transform block This commit enables a new quantization process for 32x32 2D-DCT transform coefficient blocks. It improves the compression performance of speed 5 by 1.4%. The overall compression gains of speed 5 due to the new quantization scheme is 4.7%. It also includes the SSSE3 implementation of the 32x32 quantization process. Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b	2014-07-08 16:55:28 -07:00
Alex Converse	03c276ea17	Split vp9_rdopt into vp9_rdopt and vp9_rd. vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all other rd related routines. Anything used outside of making an rd optimal decision belongs in rd. Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b	2014-07-02 15:33:33 -07:00
Jingning Han	9ac2f66320	Re-design quantization process This commit re-designs the quantization process for transform coefficient blocks of size 4x4 to 16x16. It improves compression performance for speed 7 by 3.85%. The SSSE3 version for the new quantization process is included. The average runtime of the 8x8 block quantization is reduced from 285 cycles -> 255 cycles, i.e., over 10% faster. Change-Id: I61278aa02efc70599b962d3314671db5b0446a50	2014-07-01 17:00:07 -07:00
Jingning Han	ccba289f8d	Fast computation path for forward transform and quantization This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1	2014-06-12 11:10:54 -07:00
Yaowu Xu	c39a361b0f	vp9_quantizer.c: cleanup -wextra warnings Change-Id: If5a3c48a8c554018a5d63c1541a2900f15767a00	2014-05-14 09:37:45 -07:00
Dmitry Kovalev	ef003078e8	Renaming "onyx" to "encoder". Actual renames: vp9_onyx_if.c -> vp9_encoder.c vp9_onyx_int.h -> vp9_encoder.h Change-Id: I80532a80b118d0060518e6c6a0d640e3f411783c	2014-04-22 14:57:05 -07:00
Alex Converse	e6222b1a47	Fix the CONFIG_ALPHA build. Change-Id: Ib89fe34812c17cd6294ce3c38f87d43a79abb16f	2014-04-22 11:23:49 -07:00
Paul Wilkins	e434d08fd9	Remove old activity masking code. Delete code relating to the old VP8_TUNE_SSIM flag as this code does not currently work and is largely made redundant in VP9 by the various AQ modes. Change-Id: I71f28e1f680573d296422254489000678552b17b	2014-04-16 12:22:41 -07:00
Dmitry Kovalev	d1a396d8b9	Moving q_trans[] table to vp9_quantize.{c, h}. Change-Id: I1324c339815a47004ddccdaf651d24c60382b92f	2014-04-09 13:35:39 -07:00
Dmitry Kovalev	86f44a91f4	Renaming two members in MACROBLOCKD struct. Renames: mi_8x8 -> mi mode_info_stride -> mi_stride Change-Id: I66f3e5fd1e7b7f46f108af5bb711c5fd9493c1be	2014-04-01 17:46:40 -07:00
Dmitry Kovalev	7bcfa31750	Moving encoder quantization parameters into separate struct. Change-Id: I2a169535489aeda3943fb5a46ab53e7a12abaa36	2014-03-28 16:46:41 -07:00
Dmitry Kovalev	c9513e1dfb	Fixing include order in vp9_quantize.c Change-Id: Ic32eb103d0d7f98c0a16c4e7bdec117faf05df02	2014-02-28 11:30:51 -08:00
Dmitry Kovalev	f527c46f20	Cleaning up vp9_quantize.c. Change-Id: I9a38af32f16f196b83dd69755eafb9543edf5691	2014-02-28 10:11:31 -08:00
Paul Wilkins	8618c70683	A couple more V.S. warnings silenced. Change-Id: Ica1b583d69810182f621de757d2543b2a3b35566	2014-02-14 20:34:14 -08:00
Dmitry Kovalev	803a5c67dd	Merge "Encoder quantization cleanup."	2014-02-10 21:32:04 -08:00
Dmitry Kovalev	220b8f8644	Encoder quantization cleanup. Change-Id: I633205c95f0e81ce0589580501d0be4425a3cb8e	2014-02-03 14:57:28 -08:00
Dmitry Kovalev	70cde0af3d	Removing ENC_DEBUG. Change-Id: I101017621003314f000a454725ea13fc9db43177	2014-01-29 12:58:57 -08:00
Dmitry Kovalev	f00d157c12	Moving eob array to the encoder. In the decoder we don't need to save eobs, we can pass eob as an argument. That's why removing eob arrays from VP9Decompressor and TileWorkerData, and moving eob pointer from macroblockd_plane to macroblock_plane. Change-Id: I8eb919acc837acfb3abdd8319af63d1bbca8217a	2013-12-03 17:59:32 -08:00
Alex Converse	2360a5f093	Remove plane_block_idx. Its last remaining caller can be passed its results directly without any additional work. Also, it's not non-4:2:0 safe. Change-Id: Ia5089ba5f7f66c7617270483c619c9271aefd868	2013-12-02 18:33:50 -08:00
Dmitry Kovalev	d3a2e55af4	Removing qcoeff buffers from the decoder. We only need qcoeff buffers in the encoder. Reducing TileWorkerData struct and VP9Decompressor struct sizes by 24K. Change-Id: Id148868461f7ffa3d3dd634b371503ae9c57e207	2013-11-26 18:52:10 -08:00
Dmitry Kovalev	3f3d14e1d3	Moving q_index from MACROBLOCKD to MACROBLOCK. Moving because q_index is used only by encoder. Change-Id: I0b96175614ed4fd3d76ee56a0ba36258e1e896f6	2013-11-12 18:13:19 -08:00
Dmitry Kovalev	d54ff9dc3d	Cleaning up vp9_quantize_b_c() function. Change-Id: I42c75530a8c9cff68480657f074131e6b60d9fca	2013-11-05 17:41:56 -08:00
Dmitry Kovalev	065972f959	Adding const to vp9_quantize_b_{32x32,} parameters. Change-Id: I56f8c50ac382202f66040cd9cfaa05d889572fc7	2013-10-29 15:25:19 -07:00
Dmitry Kovalev	8253532c2d	Cleaning up vp9_regular_quantize_b_4x4. Passing scan & iscan as parameters, adding useful local variables. Change-Id: Ia2a87906941db9557350d273669ce5c3cdb7235d	2013-10-28 14:28:28 -07:00
Guillaume Martres	acf0d56f0b	Get rid of "this_mi", use "mi_8x8[0]" everywhere instead The only case where they were intentionally pointing to different structures was in mbgraph, and this didn't have the expected behavior because both of these pointers are used interchangeably through the code Change-Id: I979251782f90885fe962305bcc845bc05907f80c	2013-10-16 16:24:03 -07:00
Guillaume Martres	e55f60240a	Implement variance-based adaptive quantization This should be similar to what x264 does with --aq-mode 1. It works well with clips like parkjoy and touhou (http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv). At low bitrates, the segmentation signaling overhead may negate the benefits of this feature. (PGW) Default changed to feature OFF to allow provisional merge. Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b	2013-10-16 11:55:13 +01:00
Dmitry Kovalev	12d57a9409	Removing redundant 'extern' keyword. Change-Id: Ie51306689c0dc527a8aa12d3984389dd8f360dea	2013-09-24 15:13:09 -07:00
Scott LaVarnway	8fc95a1b11	Merge "New mode_info_context storage -- undo revert"	2013-09-13 08:56:20 -07:00
Scott LaVarnway	ac6093d179	New mode_info_context storage -- undo revert mode_info_context was stored as a grid of MODE_INFO structs. The grid now constists of pointers to MODE_INFO structs. The MODE_INFO structs are now stored as a stream (decoder only), eliminating unnecessary copies and is a little more cache friendly. Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d	2013-09-11 13:45:44 -04:00
Jingning Han	5d93feb6ad	Remove redundant condition check in 32x32 quant The c code implementation of 32x32 quantization does the zbin check of all coefficients prior to the quant/dequant loop, hence removing the redundant zbin check inside the loop. This only affects the c code version. SSSE3 version does not separate the zbin check out. Change-Id: Ic197a7d61d0b25fcac3cc092987651378cb56e4e	2013-09-10 12:04:33 -07:00
James Zern	c1913c9cf4	Merge "Revert "New mode_info_context storage""	2013-09-09 14:38:01 -07:00
James Zern	54a03e20dd	Revert "New mode_info_context storage" This reverts commit `dae17734ec` Encode crashes, leaks and increases integer overflow errors. Change-Id: I595aa2649bb8d0b6552ff91652837a74c103fda2	2013-09-09 13:37:01 -07:00
Jim Bankoski	e378566060	Merge "New mode_info_context storage"	2013-09-08 07:16:25 -07:00
Jingning Han	09bc942b47	Fix overflow issue in 16x16 quantization SSSE3 The 16x16 transform unit test suggested that the peak coefficient value can reach 32639. This could cause potential overflow issue in the SSSE3 implmentation of 16x16 block quantization. This commit fixes this issue by replacing addition with saturated addition. Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e	2013-09-06 21:06:10 -07:00
Scott LaVarnway	dae17734ec	New mode_info_context storage mode_info_context was stored as a grid of MODE_INFO structs. The grid now constists of a pointer to a MODE_INFO struct and a "in the image" flag. The MODE_INFO structs are now stored as a stream, eliminating unnecessary copies and is a little more cache friendly. For the test clips used, the decoder performance improved by ~4.3% (1080p) and ~9.7% (720p). Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p) and 5.9% (720p). Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256	2013-09-06 12:33:34 -04:00
Jingning Han	458c2833c0	Use saturated addition in SSSE3 of 32x32 quant The 32x32 forward transform can potentially reach peak coefficient value close to 32700, while the rounding factor can go upto 610. This could cause overflow issue in the SSSE3 implementation of 32x32 quantization process. This commit resolves this issue by replacing the addition operations with saturated addition operations in 32x32 block quantization. Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70	2013-09-05 12:49:12 -07:00
Jingning Han	abff678866	Fix overflow issue in SSSE3 32x32 quantization The 32x32 quantization process can potentially have the intermediate stacks over 16-bit range, thereby causing enc/dec mismatch. This commit fixes this overflow issue in the SSSE3 implementation, as well as the prototype, of 32x32 quantization. This fixes issue 607 from webm@googlecode. Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806	2013-08-29 11:00:54 -07:00
Dmitry Kovalev	569ca37d09	Moving plane_block_idx from vp9_blockd.h to vp9_quantize.c. Change-Id: Ib8af21f2e7f603c2fb407e5d15a3bba64b545b49	2013-08-19 16:44:10 -07:00
Dmitry Kovalev	b7616e387e	Moving segmentation struct from MACROBLOCKD to VP9_COMMON. VP9_COMMON is the right place to segmentatation struct because it has global segmentation parameters, not something specific to macroblock processing. Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708	2013-08-15 10:47:48 -07:00
Dmitry Kovalev	9d5885b0ab	Quantization code cleanup. Change-Id: I77b42418b852093f79260cbd880533a0bd86678f	2013-08-12 15:23:47 -07:00
Dmitry Kovalev	f1559bdeaf	Inlining 16 as a stride for BLOCK_OFFSET macro. Change-Id: I7f23d174eb089e5500f268a10db09648634c1b82	2013-08-09 16:40:05 -07:00
Ronald S. Bultje	1ff94fea56	Inline vp9_quantize() in xform_quant(). Cycle times: 4x4: 151 to 131 cycles (15% faster) 8x8: 334 to 306 cycles (9% faster) 16x16: 1401 to 1368 cycles (2.5% faster) 32x32: 7403 to 7367 cycles (0.5% faster) Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup. Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f	2013-07-15 17:30:57 -07:00
Dmitry Kovalev	c4ad3273c7	Moving segmentation related vars into separate struct. Adding segmentation struct to vp9_seg_common.h. Struct members are from macroblockd and VP9Common structs. Moving segmentation related constants and enums to vp9_seg_common.h. Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03	2013-07-11 11:57:57 -07:00
Ronald S. Bultje	c8defcfdee	Update quantize SSSE3 SIMD to cover 32x32 transform case also. Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to 2min10.1, i.e. a 2.3% overall speed increase. Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87	2013-07-01 11:36:33 -07:00
Ronald S. Bultje	7353ceab9d	Quantize (64-bit only, for now) SSSE3 SIMD. Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only, it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904	2013-07-01 11:36:07 -07:00
Ronald S. Bultje	af660715c0	Make coefficient skip condition an explicit RD choice. This commit replaces zrun_zbin_boost, a method of biasing non-zero coefficients following runs of zero-coefficients to be rounded towards zero, with an explicit skip-block choice in the RD loop. The logic is basically that if individual coefficients should be rounded towards zero (from a RD point of view), the trellis/optimize loop should take care of it. If whole blocks should be zero (from a RD point of view), a single RD check is much more efficient than a complete serialization of the quantization loop. Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim. SIMD for quantize will follow in a separate patch. Results for other test sets pending. Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4	2013-06-28 10:28:49 -07:00
Ronald S. Bultje	7a049be6bf	Inline quantize so idiv instruction gets removed from inner loop. Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min15.0 to 3min10.9, i.e. 2.1% faster overall. Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c	2013-06-27 09:01:04 -07:00
Yunqing Wang	b5bf7b13a8	Add two-pass quantization Optimized the quantization function by making it a two-pass process. The first pass does a quick checking of the transform coefficients against the base ZBIN, and only keep the good enough set of coefficients for quantization. A skipping check is added. If all coefficients are within the base ZBIN, no quantization is needed. The second pass is the actual quantization pass, which only processes the coefficient subset determined in first pass. This reduces the computation. Furthermore, an alternitive method is used for large transform size, which often has sparse nonzero quantized coefficients. Overall, the encoder speedup is about 4%. The quantization function itself gets 20% faster. Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22	2013-06-19 10:35:02 -07:00

1 2 3

110 Commits