generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	458c2833c0	Use saturated addition in SSSE3 of 32x32 quant The 32x32 forward transform can potentially reach peak coefficient value close to 32700, while the rounding factor can go upto 610. This could cause overflow issue in the SSSE3 implementation of 32x32 quantization process. This commit resolves this issue by replacing the addition operations with saturated addition operations in 32x32 block quantization. Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70	2013-09-05 12:49:12 -07:00
Jingning Han	abff678866	Fix overflow issue in SSSE3 32x32 quantization The 32x32 quantization process can potentially have the intermediate stacks over 16-bit range, thereby causing enc/dec mismatch. This commit fixes this overflow issue in the SSSE3 implementation, as well as the prototype, of 32x32 quantization. This fixes issue 607 from webm@googlecode. Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806	2013-08-29 11:00:54 -07:00
Dmitry Kovalev	569ca37d09	Moving plane_block_idx from vp9_blockd.h to vp9_quantize.c. Change-Id: Ib8af21f2e7f603c2fb407e5d15a3bba64b545b49	2013-08-19 16:44:10 -07:00
Dmitry Kovalev	b7616e387e	Moving segmentation struct from MACROBLOCKD to VP9_COMMON. VP9_COMMON is the right place to segmentatation struct because it has global segmentation parameters, not something specific to macroblock processing. Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708	2013-08-15 10:47:48 -07:00
Dmitry Kovalev	9d5885b0ab	Quantization code cleanup. Change-Id: I77b42418b852093f79260cbd880533a0bd86678f	2013-08-12 15:23:47 -07:00
Dmitry Kovalev	f1559bdeaf	Inlining 16 as a stride for BLOCK_OFFSET macro. Change-Id: I7f23d174eb089e5500f268a10db09648634c1b82	2013-08-09 16:40:05 -07:00
Ronald S. Bultje	1ff94fea56	Inline vp9_quantize() in xform_quant(). Cycle times: 4x4: 151 to 131 cycles (15% faster) 8x8: 334 to 306 cycles (9% faster) 16x16: 1401 to 1368 cycles (2.5% faster) 32x32: 7403 to 7367 cycles (0.5% faster) Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup. Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f	2013-07-15 17:30:57 -07:00
Dmitry Kovalev	c4ad3273c7	Moving segmentation related vars into separate struct. Adding segmentation struct to vp9_seg_common.h. Struct members are from macroblockd and VP9Common structs. Moving segmentation related constants and enums to vp9_seg_common.h. Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03	2013-07-11 11:57:57 -07:00
Ronald S. Bultje	c8defcfdee	Update quantize SSSE3 SIMD to cover 32x32 transform case also. Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to 2min10.1, i.e. a 2.3% overall speed increase. Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87	2013-07-01 11:36:33 -07:00
Ronald S. Bultje	7353ceab9d	Quantize (64-bit only, for now) SSSE3 SIMD. Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only, it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904	2013-07-01 11:36:07 -07:00
Ronald S. Bultje	af660715c0	Make coefficient skip condition an explicit RD choice. This commit replaces zrun_zbin_boost, a method of biasing non-zero coefficients following runs of zero-coefficients to be rounded towards zero, with an explicit skip-block choice in the RD loop. The logic is basically that if individual coefficients should be rounded towards zero (from a RD point of view), the trellis/optimize loop should take care of it. If whole blocks should be zero (from a RD point of view), a single RD check is much more efficient than a complete serialization of the quantization loop. Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim. SIMD for quantize will follow in a separate patch. Results for other test sets pending. Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4	2013-06-28 10:28:49 -07:00
Ronald S. Bultje	7a049be6bf	Inline quantize so idiv instruction gets removed from inner loop. Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min15.0 to 3min10.9, i.e. 2.1% faster overall. Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c	2013-06-27 09:01:04 -07:00
Yunqing Wang	b5bf7b13a8	Add two-pass quantization Optimized the quantization function by making it a two-pass process. The first pass does a quick checking of the transform coefficients against the base ZBIN, and only keep the good enough set of coefficients for quantization. A skipping check is added. If all coefficients are within the base ZBIN, no quantization is needed. The second pass is the actual quantization pass, which only processes the coefficient subset determined in first pass. This reduces the computation. Furthermore, an alternitive method is used for large transform size, which often has sparse nonzero quantized coefficients. Overall, the encoder speedup is about 4%. The quantization function itself gets 20% faster. Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22	2013-06-19 10:35:02 -07:00
Paul Wilkins	33ecd6ad54	Merge Scatter Scan experiment. Removal from under configure flag. A bit renaming Change-Id: I2213229dfe852001dfec16b149f47c52ce88f3aa	2013-05-23 13:09:27 +01:00
John Koleszar	679e4abdd5	Initial version of alpha channel support This is a mostly-working implementation of an extra channel in the bitstream. Configure with --enable-alpha to test. Notable TODOs: - Add extra channel to all mismatch tests, PSNR, SSIM, etc - Configurable subsampling - Variable number of planes (currently always uses all 4) - Loop filtering - Per-plane lossless quantizer - ARNR support This implementation just uses the same contents as the Y channel for the A channel, due to lack of content and general pain in playing back 4 channel content. A later patch will use the actual alpha channel passed in from outside the codec. Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592	2013-05-16 22:21:09 -07:00
Dmitry Kovalev	1e7cf5d174	Renaming Y1 and UV quant prefixes to y_ and uv_. Change-Id: If57e360c187a475fc90edb8c7170f498efcb31a5	2013-05-07 16:59:15 -07:00
Paul Wilkins	9afb6700c2	Adjust q range Skip Q values between the q.0 mode and a real q of 2.0 as these are not valuable from an RD perspective. Change-Id: I110c4858c57f97315953f4d88a2596d4764360df	2013-05-07 15:34:17 -07:00
Jingning Han	776c1482a3	Merge SB8X8 into the codebase Pull sb8x8 out of experimental list. verified via borg run tests. Fixed unit test failures. Change-Id: I12a4bbd17395930580c048ab68becad1ffe46e76	2013-05-07 09:08:25 -07:00
John Koleszar	6c622e2783	Merge "Separate transform and quant from vp9_encode_sb" into experimental	2013-05-03 17:19:01 -07:00
John Koleszar	4529c68b3b	Separate transform and quant from vp9_encode_sb This allows removing a large number of transform size specific functions, as well as supporting 444/alpha by routing all code through the subsampling-aware path. Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4	2013-05-03 12:14:50 -07:00
John Koleszar	f07733010b	Merge "Create common vp9_encode_sb{,y}" into experimental	2013-05-03 10:26:53 -07:00
Scott LaVarnway	3041cf8c8b	Merge "Reduced y_dequant, uv_dequant size" into experimental	2013-05-03 07:30:31 -07:00
John Koleszar	3f4e80634b	Create common vp9_encode_sb{,y} Creates a common encode (subtract, transform, quantize, optimize, inverse transform, reconstruct) function for all sb sizes, including the old 16x16 path. Change-Id: I964dff1ea7a0a5c378046a069ad83495f54df007	2013-05-02 14:02:03 -07:00
Scott LaVarnway	94ed11d89d	Reduced y_dequant, uv_dequant size Currently, only two values are used. Removed the unused values. Change-Id: Idc5b8be354d84ffc68df39ea3e45f9f50d977b35	2013-05-01 16:25:10 -04:00
Dmitry Kovalev	6e4ed2f0fe	Merge "Adding vp9_get_qindex function." into experimental	2013-05-01 12:04:21 -07:00
Ronald S. Bultje	d068d869b9	sb8x8 integration in rd loop. Work-in-progress, not yet ready for review. TODO items: - bitstream writing (encoder) and reading (decoder) - decoder reconstruction Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081	2013-04-30 16:13:20 -07:00
Dmitry Kovalev	3f6c6ffc86	Adding vp9_get_qindex function. Moving common code from encoder and decoder to vp9_get_qindex function. Also moving quant-related constants from vp9_onyxc_int.h to vp9_quant_common.h. Change-Id: I70c5bfbaa1c8bf00fde0bfc459d077f88b6d46c8	2013-04-30 13:39:50 -07:00
Dmitry Kovalev	5a5a1f25a8	Consistent names for quant-related functions and variables. Change-Id: I3a6d601e90e8740b9c26dd0afbfe9d467b75d367	2013-04-26 12:30:20 -07:00
John Koleszar	4b27eb1f18	Merge "quantize: make 4x4, 8x8 common with larger transforms" into experimental	2013-04-26 09:08:49 -07:00
John Koleszar	a672351af9	quantize: make 4x4, 8x8 common with larger transforms There were 4 variants of the quantize loop in vp9_quantize.c, now there is 1. Change-Id: Ic853393411214b32d46a6ba53769413bd14e1cac	2013-04-25 14:44:54 -07:00
Ronald S. Bultje	c849eaca59	Use b_width/height_log2 instead of mb_ where appropriate. Basic assumption: when talking about transform units, use b_; when talking about macroblock indices, use mb_. Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884	2013-04-25 14:20:59 -07:00
John Koleszar	15255eef82	Move dequant from BLOCKD to per-plane MACROBLOCKD This data can vary per-plane, but not per-block. Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a	2013-04-25 11:57:20 -07:00
John Koleszar	c7c98a7ffb	Move skip_block from BLOCK to MACROBLOCK This data is fixed at the MB level, so move it to the common part of MACROBLOCK. Change-Id: Idd8c87118e501cdf0a202bd84c28b502a8234edf	2013-04-23 17:43:43 -07:00
John Koleszar	5c649f67a2	Move quantizer data from BLOCK to MACROBLOCK Quantizers can vary per plane, but not per block. Move these values to the per-plane part of MACROBLOCK. Change-Id: I320a55e38b7b28b29aec751a4aca5ccd0c9b9326	2013-04-23 17:43:43 -07:00
John Koleszar	aa6a36b062	Merge "Convert coeff to per-plane MACROBLOCK data" into experimental	2013-04-23 17:41:59 -07:00
John Koleszar	138ec38cab	Convert coeff to per-plane MACROBLOCK data This commit moves the coeff storage from the MACROBLOCK struct to its per-plane part. The next commit will remove the coeff member from the BLOCK structure so that it is consistently accessed per-plane. Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be variable subsampling aware. Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455	2013-04-23 16:28:17 -07:00
Dmitry Kovalev	5de7e16ca2	Adding get_scan_{4x4, 8x8, 16x16} functions. Change-Id: Id4306ef6d65d4a3984aed50b775bdf48d4f6c438	2013-04-22 14:08:41 -07:00
Deb Mukherjee	0aa79be7d5	Removes the code_nonzerocount experiment This patch does not seem to give any benefits. Change-Id: I9d2b4091d6af3dfc0875f24db86c01e2de57f8db	2013-04-22 10:58:49 -07:00
Dmitry Kovalev	1ad7c1f250	Renaming y1dc_delta_q, uvdc_delta_q, uvac_delta_q fields from VP9Common. New names are y_dc_delta_q, uv_dc_delta_q, uv_ac_delta_q. Change-Id: I4acae1fc23a4697ce2c5a5becb8dc28ef0a4b552	2013-04-16 15:05:52 -07:00
Ronald S. Bultje	f551c2d1c0	Fix width/height switch-up in U/V SB quantize code. Change-Id: I697514efd6024e1b4153bbde58ae5e323b030981	2013-04-15 09:58:27 -07:00
Ronald S. Bultje	13e41ba440	Remove unused macroblock versions of reconstruction functions. More specifically, remove vp9_quantize_mb, vp9_optimize_mb, vp9_inverse_transform_mb* and vp9_transform_mb. Instead, use the generic _sb functions that take a size argument, and call them with BLOCK_SIZE_MB16X16. Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29	2013-04-11 12:27:15 -07:00
Dmitry Kovalev	0cef7234e1	Merge "Fixing upper case names." into experimental	2013-04-10 13:29:38 -07:00
Ronald S. Bultje	a3874850dd	Make SB coding size-independent. Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code gives identical encoder results before and after. There are a few macros for rectangular block sizes under the sbsegment experiment; this experiment is not yet functional and should not yet be used. Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728	2013-04-09 21:28:27 -07:00
Dmitry Kovalev	c34f6fcb54	Fixing upper case names. Renaming Y1dequant to y_dequant, UVdequant to uv_dequant, QIndex to qindex. Change-Id: I1c356e5f886deb3f8807dc212de9799b55b09d58	2013-04-09 10:46:57 -07:00
John Koleszar	05a79f2fbf	Move EOB to per-plane data Continue migrating data from BLOCKD/MACROBLOCKD to the per-plane structures. Change-Id: Ibbfa68d6da438d32dcbe8df68245ee28b0a2fa2c	2013-04-04 21:30:23 -07:00
John Koleszar	4c05a051ab	Move qcoeff, dqcoeff from BLOCKD to per-plane data Start grouping data per-plane, as part of refactoring to support additional planes, and chroma planes with other-than 4:2:0 subsampling. Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23	2013-04-04 16:30:57 -07:00
Ronald S. Bultje	d9094d8fd3	Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs. These are mostly just for experimental purposes. I saw small gains (in the 0.1% range) when playing with this on derf. Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc	2013-03-26 16:46:13 -07:00
Deb Mukherjee	fd18d5dffe	Modeling default coef probs with distribution Replaces the default tables for single coefficient magnitudes with those obtained from an appropriate distribution. The EOB node is left unchanged. The model is represeted as a 256-size codebook where the index corresponds to the probability of the Zero or the One node. Two variations are implemented corresponding to whether the Zero node or the One-node is used as the peg. The main advantage is that the default prob tables will become considerably smaller and manageable. Besides there is substantially less risk of over-fitting for a training set. Various distributions are tried and the one that gives the best results is the family of Generalized Gaussian distributions with shape parameter 0.75. The results are within about 0.2% of fully trained tables for the Zero peg variant, and within 0.1% of the One peg variant. The forward updates are optionally (controlled by a macro) model-based, i.e. restricted to only convey probabilities from the codebook. Backward updates can also be optionally (controlled by another macro) model-based, but is turned off by default. Currently model-based forward updates work about the same as unconstrained updates, but there is a drop in performance with backward-updates being model based. The model based approach also allows the probabilities for the key frames to be adjusted from the defaults based on the base_qindex of the frame. Currently the adjustment function is a placeholder that adjusts the prob of EOB and Zero node from the nominal one at higher quality (lower qindex) or lower quality (higher qindex) ends of the range. The rest of the probabilities are then derived based on the model from the adjusted prob of zero. Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9	2013-03-25 23:43:38 -07:00
Dmitry Kovalev	3edbc77ae3	Merge "Consistent usage of ROUND_POWER_OF_TWO macro." into experimental	2013-03-08 11:35:22 -08:00
Dmitry Kovalev	3603dfb62c	Consistent usage of ROUND_POWER_OF_TWO macro. Change-Id: I44660975e9985310d8c654c158ee7a61291b5a08	2013-03-07 12:24:35 -08:00

1 2

73 Commits