Encoding time of bus (speed 0), 50 frames @ 1500kbps, goes from
2min14.4 to 2min10.1, i.e. a 3.2% overall speed increase.
Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
Total encoding time for the first 50 frames of bus (speed 0) @ 1500kbps
goes from 2min34.8 to 2min14.4, i.e. a 13.2% overall speedup. The code
is x86-64 only; it needs some minor modifications to be 32-bit
compatible, because it uses 15 xmm registers, whereas 32-bit has only 8.
Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.
The logic is basically that if individual coefficients should be rounded
towards zero (from an RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from an RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.
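A minimal sketch of the kind of check this describes, assuming the usual
rate-distortion cost rd = lambda * rate + distortion; the names and the
lambda weighting are illustrative, not the actual vp9 code:

  #include <stdint.h>

  /* Hypothetical skip-block decision: compare the RD cost of coding the
   * block's coefficients against the cost of signalling a skip.  All
   * inputs are assumed to come from the surrounding RD loop. */
  static int should_skip_block(int64_t lambda,
                               int coeff_rate, int64_t coeff_dist,
                               int skip_rate, int64_t skip_dist) {
    const int64_t rd_coded = lambda * coeff_rate + coeff_dist;
    const int64_t rd_skip = lambda * skip_rate + skip_dist;
    return rd_skip <= rd_coded;  /* skip when it is no worse */
  }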
Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.
Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
3min15.0 to 3min10.9, i.e. 2.1% faster overall.
Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
Optimized the quantization function by making it a two-pass
process. The first pass does a quick check of the transform
coefficients against the base ZBIN and keeps only the
promising coefficients for quantization. A skip
check is added: if all coefficients are within the base ZBIN, no
quantization is needed. The second pass is the actual quantization
pass, which processes only the coefficient subset determined
in the first pass. This reduces the computation. Furthermore, an
alternative method is used for large transform sizes, which often
have sparse nonzero quantized coefficients.
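A sketch of the two-pass idea under simplifying assumptions (a single
zbin threshold, a plain q = (|c| * quant) >> 16 rule, raster rather than
scan order, n <= 32*32); the real function also handles rounding, zbin
boost and scan order:

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  static void quantize_two_pass(const int16_t *coeff, int n,
                                int16_t zbin, int32_t quant,
                                int16_t dequant, int16_t *qcoeff,
                                int16_t *dqcoeff, int *eob) {
    int candidates[32 * 32];
    int num = 0, c;

    memset(qcoeff, 0, n * sizeof(*qcoeff));
    memset(dqcoeff, 0, n * sizeof(*dqcoeff));
    *eob = 0;

    /* Pass 1: cheap screen against the base zbin. */
    for (c = 0; c < n; ++c)
      if (abs(coeff[c]) >= zbin) candidates[num++] = c;
    if (num == 0) return;  /* everything inside the dead zone: skip */

    /* Pass 2: quantize only the survivors (a sparse set for large
     * transforms). */
    for (c = 0; c < num; ++c) {
      const int i = candidates[c];
      const int sign = coeff[i] < 0 ? -1 : 1;
      const int q = (int)(((int64_t)abs(coeff[i]) * quant) >> 16);
      qcoeff[i] = (int16_t)(sign * q);
      dqcoeff[i] = (int16_t)(qcoeff[i] * dequant);
      if (q) *eob = i + 1;
    }
  }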
Overall, the encoder speedup is about 4%. The quantization function
itself gets 20% faster.
Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
This is a mostly-working implementation of an extra channel in the
bitstream. Configure with --enable-alpha to test. Notable TODOs:
- Add extra channel to all mismatch tests, PSNR, SSIM, etc
- Configurable subsampling
- Variable number of planes (currently always uses all 4)
- Loop filtering
- Per-plane lossless quantizer
- ARNR support
This implementation just uses the same contents as the Y channel
for the A channel, due to lack of test content and the general pain
of playing back 4-channel content. A later patch will use the actual
alpha channel passed in from outside the codec.
Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
Skip Q values between the Q0 (lossless) mode and a real Q of
2.0, as these are not valuable from an RD perspective.
Change-Id: I110c4858c57f97315953f4d88a2596d4764360df
This allows removing a large number of transform size specific functions,
as well as supporting 444/alpha by routing all code through the
subsampling-aware path.
Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4
Creates a common encode (subtract, transform, quantize, optimize,
inverse transform, reconstruct) function for all sb sizes, including
the old 16x16 path.
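The shape of that function, sketched with stand-in stage callbacks
operating in place on the residual buffer (none of these names are the
real vp9 identifiers):

  #include <stdint.h>

  typedef void (*stage_fn)(int16_t *buf, int bw, int bh);

  static void encode_block(const uint8_t *src, const uint8_t *pred,
                           uint8_t *recon, int16_t *resid,
                           int bw, int bh, int stride,
                           stage_fn fwd_txm, stage_fn quant,
                           stage_fn optimize, stage_fn inv_txm) {
    int r, c;

    /* subtract: residual = source - prediction */
    for (r = 0; r < bh; ++r)
      for (c = 0; c < bw; ++c)
        resid[r * bw + c] = src[r * stride + c] - pred[r * stride + c];

    fwd_txm(resid, bw, bh);   /* forward transform */
    quant(resid, bw, bh);     /* quantize          */
    optimize(resid, bw, bh);  /* trellis optimize  */
    inv_txm(resid, bw, bh);   /* inverse transform */

    /* reconstruct: recon = prediction + residual, clamped to 8 bits */
    for (r = 0; r < bh; ++r)
      for (c = 0; c < bw; ++c) {
        const int v = pred[r * stride + c] + resid[r * bw + c];
        recon[r * stride + c] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
      }
  }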
Change-Id: I964dff1ea7a0a5c378046a069ad83495f54df007
Work-in-progress, not yet ready for review. TODO items:
- bitstream writing (encoder) and reading (decoder)
- decoder reconstruction
Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
Moving common code from the encoder and decoder into the vp9_get_qindex
function.
Also moving quant-related constants from vp9_onyxc_int.h to
vp9_quant_common.h.
Change-Id: I70c5bfbaa1c8bf00fde0bfc459d077f88b6d46c8
Basic assumption: when talking about transform units, use b_; when
talking about macroblock indices, use mb_.
Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
Quantizers can vary per plane, but not per block. Move these values to
the per-plane part of MACROBLOCK.
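A sketch of that layout, with illustrative field names only (the real
per-plane struct carries more state):

  #include <stdint.h>

  /* Hypothetical per-plane container: quantizer tables live here
   * because they vary per plane but not per block within a plane. */
  struct macroblock_plane {
    const int16_t *zbin;         /* dead-zone thresholds  */
    const int16_t *round;        /* rounding offsets      */
    const int16_t *quant;        /* quantizer multipliers */
    const int16_t *quant_shift;  /* secondary shift table */
  };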
Change-Id: I320a55e38b7b28b29aec751a4aca5ccd0c9b9326
This commit moves the coeff storage from the MACROBLOCK struct to its
per-plane part. The next commit will remove the coeff member from the
BLOCK structure so that it is consistently accessed per-plane.
Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be
variable subsampling aware.
Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455
More specifically, remove vp9_quantize_mb*, vp9_optimize_mb*,
vp9_inverse_transform_mb* and vp9_transform_mb*. Instead, use the
generic _sb* functions that take a size argument, and call them with
BLOCK_SIZE_MB16X16.
Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code
gives identical encoder results before and after. There are a few
macros for rectangular block sizes under the sbsegment experiment; this
experiment is not yet functional and should not yet be used.
Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
Start grouping data per-plane, as part of refactoring to support
additional planes, and chroma planes with other-than 4:2:0
subsampling.
Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23
These are mostly just for experimental purposes. I saw small gains (in
the 0.1% range) when playing with this on derf.
Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
Replaces the default tables for single coefficient magnitudes with
those obtained from an appropriate distribution. The EOB node
is left unchanged. The model is represented as a 256-entry codebook
where the index corresponds to the probability of the Zero or the
One node. Two variations are implemented corresponding to whether
the Zero node or the One-node is used as the peg. The main advantage
is that the default prob tables will become considerably smaller and
manageable. Besides, there is substantially less risk of over-fitting
to a training set.
Various distributions are tried and the one that gives the best
results is the family of Generalized Gaussian distributions with
shape parameter 0.75. The results are within about 0.2% of fully
trained tables for the Zero peg variant, and within about 0.1% for
the One peg variant.
The forward updates are optionally (controlled by a macro)
model-based, i.e. restricted to only convey probabilities from the
codebook. Backward updates can also be optionally (controlled by
another macro) model-based, but are turned off by default. Currently
model-based forward updates work about the same as unconstrained
updates, but there is a drop in performance with backward updates
being model-based.
The model based approach also allows the probabilities for the key
frames to be adjusted from the defaults based on the base_qindex of
the frame. Currently the adjustment function is a placeholder that
adjusts the prob of EOB and Zero node from the nominal one at higher
quality (lower qindex) or lower quality (higher qindex) ends of the
range. The rest of the probabilities are then derived based on the
model from the adjusted prob of zero.
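As an illustration of how such a codebook could be generated, here is a
small numerical sketch, not the actual vp9 model code; the half-integer
bin edges and the example scale value are assumptions:

  #include <math.h>
  #include <stdio.h>

  #define SHAPE 0.75  /* generalized Gaussian shape parameter */

  /* Unnormalized density exp(-(|x|/alpha)^SHAPE); the normalization
   * constant cancels in the conditional probabilities below. */
  static double gg_density(double x, double alpha) {
    return exp(-pow(fabs(x) / alpha, SHAPE));
  }

  /* Midpoint-rule integration of the density over [lo, hi). */
  static double bin_mass(double lo, double hi, double alpha) {
    const int steps = 4000;
    const double dx = (hi - lo) / steps;
    double sum = 0.0;
    int i;
    for (i = 0; i < steps; ++i)
      sum += gg_density(lo + (i + 0.5) * dx, alpha) * dx;
    return sum;
  }

  int main(void) {
    const double alpha = 2.0;  /* example scale; a real codebook would
                                  sweep this to hit all 256 indices */
    const double total = bin_mass(0.0, 256.0, alpha);  /* capped tail */
    const double p_zero = bin_mass(0.0, 0.5, alpha) / total;
    const double p_one = bin_mass(0.5, 1.5, alpha) /
                         bin_mass(0.5, 256.0, alpha);
    printf("Zero node: %.0f/256, One node (given nonzero): %.0f/256\n",
           256.0 * p_zero, 256.0 * p_one);
    return 0;
  }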
Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
This also changes the RD search to take account of the correct block
index when searching (this is required for ADST positioning to work
correctly in combination with tx_select).
Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.
STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic programming approach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.
TODO:
Training the default probs/counts for nzcs
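For reference, the count being coded is just the number of nonzero
quantized coefficients per block; a minimal sketch (names assumed, not
the real vp9 functions):

  #include <stdint.h>

  /* Count nonzero quantized coefficients of one block, walking the
   * scan order; this nzc is coded up front instead of an EOB token. */
  static int count_nzc(const int16_t *qcoeff, const int16_t *scan,
                       int n) {
    int nzc = 0, i;
    for (i = 0; i < n; ++i)
      if (qcoeff[scan[i]] != 0) ++nzc;
    return nzc;
  }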
Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.
In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.
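The unified selection amounts to something like the following sketch,
assuming square transforms and a subsampled chroma predictor (names are
illustrative):

  enum tx_size { TX_4X4 = 0, TX_8X8, TX_16X16, TX_32X32 };
  static const int tx_wide[] = { 4, 8, 16, 32 };

  /* Largest transform that still fits the chroma predictor, capped at
   * the luma transform size. */
  static enum tx_size chroma_tx_size(enum tx_size luma_tx,
                                     int chroma_w, int chroma_h) {
    enum tx_size t = luma_tx;
    while (t > TX_4X4 &&
           (tx_wide[t] > chroma_w || tx_wide[t] > chroma_h))
      t = (enum tx_size)(t - 1);
    return t;
  }

For example, a 32x32 superblock with 4:2:0 subsampling has a 16x16
chroma predictor, so a 16x16 luma transform maps to a 16x16 (not 8x8)
chroma transform.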
Lastly, add a trellis optimize function for 32x32 transform blocks.
HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There are
a few negative points here and there that I might want to analyze
a little closer.
Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
The over-quant code was added in VP8 post
bitstream freeze to allow compression to lower
data rates.
In VP9 the real quantizer range has been greatly
extended anyway.
Change-Id: I5d384fa5e9a83ef75a3df34ee30627bd21901526
This commit changes the coding mode to lossless whenever the lowest
quantizer is chosen.
As expected, test results showed no difference for cif and std-hd
set where Q0 is rarely used. For yt and yt-hd set, Q0 is used for
a number of clips, where this commit helped a lot in the high end.
Average over all clips in the sets:
yt: 2.391% 1.017% 1.066%
hd: 1.937% 0.764% 0.787%
Change-Id: I9fa9df8646fd70cb09ffe9e4202b86b67da16765
Since there is no Y2, these values are always zero. This changes the
bitstream results slightly, hence a separate commit.
Change-Id: I2f838f184341868f35113ec77ca89da53c4644e0
Reverted part of change
I19981d1ef0b33e4e5732739574f367fe82771a84
that gives rise to an enc/dec mismatch.
As things stand, the memsets are still needed.
Change-Id: I9fa076a703909aa0c4da0059ac6ae19aa530db30
1. Added a bit in the frame header to indicate if a frame is encoded
in lossless mode, so the decoder does not make the decision based on Q0.
2. Minor changes to make sure that lossy coding works the same as when
the lossless experiment is not enabled.
3. Renamed function pointers for transforms to be consistent, using
the prefixes fwd_txm and inv_txm for forward and inverse respectively.
To encode in lossless mode, use "--lossless=1 --min-q=0 --max-q=0"
with vpxenc.
Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068
A couple of scalar optimizations speeding up quantization by about
1.6x. Overall encoder speedup is around 3%.
Change-Id: I19981d1ef0b33e4e5732739574f367fe82771a84
Remove the eob_max_offset markers and replace
them with the generic skip_block flag, which
indicates to the quantizer that all coeffs should
be set to 0 and the eob position set to 0.
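The contract this describes, sketched with a simplified signature (the
real quantizer takes many more arguments):

  #include <stdint.h>
  #include <string.h>

  static void quantize_with_skip(const int16_t *coeff, int n,
                                 int skip_block, int16_t *qcoeff,
                                 int16_t *dqcoeff, int *eob) {
    memset(qcoeff, 0, n * sizeof(*qcoeff));
    memset(dqcoeff, 0, n * sizeof(*dqcoeff));
    *eob = 0;
    if (skip_block) return;  /* all coeffs stay 0 and eob stays 0 */
    /* ... normal quantization of coeff[] would follow here ... */
    (void)coeff;
  }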
Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7
First step in simplifying the segment mode and
segment EOB flags into a simpler segment skip
flag that implies 0,0 mv and EOB at position 0.
Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5
Simplification to eliminate a number of very large data
structures. The zero-run zbin boosts for the different
transform sizes are now limited to a maximum run length
of 15 before they max out the boost.
Some further work still needs to be done to refactor, rationalize
and optimize the multiple quantizer functions.
The simplification, coupled with tweaks to the 16-element array
now used for all transform sizes, has minimal effect on quality.
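A sketch of the capped lookup (the table contents here are placeholders,
not the tuned values):

  #include <stdint.h>

  /* One 16-entry boost table now serves every transform size; the
   * zero-run length saturates at 15.  Values are placeholders. */
  static const int16_t zbin_boost[16] = { 0,  0,  0,  8,  8,  8, 10, 12,
                                          14, 16, 20, 24, 28, 32, 36, 40 };

  static int16_t boosted_zbin(int16_t zbin, int zero_run) {
    return (int16_t)(zbin + zbin_boost[zero_run < 15 ? zero_run : 15]);
  }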
Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).
Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.
Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
1 bit, or else they won't fit in int16_t (they are 17 bits). Because
of this, the RD error scoring does not right-shift the MSE score by
two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
also. This is currently a little hacky.
- FDCT and IDCT are double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
simply using the 16x16 luma ones. A future commit will add newly
generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
ADST is desired, transform-size selection can scale back to 16x16
or lower, and use an ADST at that level.
Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
block than for the rest (DWT pixel differences) of the block. Therefore,
RD error scoring isn't easily scalable between coefficient and pixel
domain. Thus, unfortunately, we need to compute the RD distortion in
the pixel domain until we figure out how to scale these appropriately.
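A sketch of how the error-scoring note above plays out (function and
flag names assumed):

  #include <stdint.h>

  /* Coefficient-domain SSE. For 4x4..16x16 the transform carries two
   * extra bits of scale, hence the >> 2; the 32x32 coefficients were
   * already scaled back by 1 bit in the FDCT (with the quantizer
   * halved to match), so their squared error needs no shift. */
  static int64_t coeff_error(const int16_t *coeff,
                             const int16_t *dqcoeff,
                             int n, int is_32x32) {
    int64_t sse = 0;
    int i;
    for (i = 0; i < n; ++i) {
      const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
      sse += d * d;
    }
    return is_32x32 ? sse : sse >> 2;
  }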
Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.
Change-Id: Ie3257bdf07bdb9c6e9476915e3a80183c8fa005a
Add support for gyp, which doesn't support multiple objects in the
same static library having the same basename.
Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc