Commit Graph

115 Commits

Author SHA1 Message Date
Ronald S. Bultje
c8defcfdee Update quantize SSSE3 SIMD to cover 32x32 transform case also.
Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
2min10.1, i.e. a 2.3% overall speed increase.

Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
2013-07-01 11:36:33 -07:00
Ronald S. Bultje
7353ceab9d Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01 11:36:07 -07:00
Ronald S. Bultje
af660715c0 Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
2013-06-28 10:28:49 -07:00
Ronald S. Bultje
7a049be6bf Inline quantize so idiv instruction gets removed from inner loop.
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
3min15.0 to 3min10.9, i.e. 2.1% faster overall.

Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
2013-06-27 09:01:04 -07:00
Yunqing Wang
b5bf7b13a8 Add two-pass quantization
Optimized the quantization function by making it a two-pass
process. The first pass does a quick checking of the transform
coefficients against the base ZBIN, and only keep the good
enough set of coefficients for quantization. A skipping
check is added. If all coefficients are within the base ZBIN, no
quantization is needed. The second pass is the actual quantization
pass, which only processes the coefficient subset determined
in first pass. This reduces the computation. Furthermore, an
alternitive method is used for large transform size, which often
has sparse nonzero quantized coefficients.

Overall, the encoder speedup is about 4%. The quantization function
itself gets 20% faster.

Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
2013-06-19 10:35:02 -07:00
Paul Wilkins
33ecd6ad54 Merge Scatter Scan experiment.
Removal from under configure flag.
A bit  renaming

Change-Id: I2213229dfe852001dfec16b149f47c52ce88f3aa
2013-05-23 13:09:27 +01:00
John Koleszar
679e4abdd5 Initial version of alpha channel support
This is a mostly-working implementation of an extra channel in the
bitstream. Configure with --enable-alpha to test. Notable TODOs:

 - Add extra channel to all mismatch tests, PSNR, SSIM, etc
 - Configurable subsampling
 - Variable number of planes (currently always uses all 4)
 - Loop filtering
 - Per-plane lossless quantizer
 - ARNR support

This implementation just uses the same contents as the Y channel
for the A channel, due to lack of content and general pain in
playing back 4 channel content. A later patch will use the actual
alpha channel passed in from outside the codec.

Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
2013-05-16 22:21:09 -07:00
Dmitry Kovalev
1e7cf5d174 Renaming Y1 and UV quant prefixes to y_ and uv_.
Change-Id: If57e360c187a475fc90edb8c7170f498efcb31a5
2013-05-07 16:59:15 -07:00
Paul Wilkins
9afb6700c2 Adjust q range
Skip Q values between the q.0 mode and a real q of
2.0 as these are not valuable from an RD perspective.

Change-Id: I110c4858c57f97315953f4d88a2596d4764360df
2013-05-07 15:34:17 -07:00
Jingning Han
776c1482a3 Merge SB8X8 into the codebase
Pull sb8x8 out of experimental list. verified via borg run tests.
Fixed unit test failures.

Change-Id: I12a4bbd17395930580c048ab68becad1ffe46e76
2013-05-07 09:08:25 -07:00
John Koleszar
6c622e2783 Merge "Separate transform and quant from vp9_encode_sb" into experimental 2013-05-03 17:19:01 -07:00
John Koleszar
4529c68b3b Separate transform and quant from vp9_encode_sb
This allows removing a large number of transform size specific functions,
as well as supporting 444/alpha by routing all code through the
subsampling-aware path.

Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4
2013-05-03 12:14:50 -07:00
John Koleszar
f07733010b Merge "Create common vp9_encode_sb{,y}" into experimental 2013-05-03 10:26:53 -07:00
Scott LaVarnway
3041cf8c8b Merge "Reduced y_dequant, uv_dequant size" into experimental 2013-05-03 07:30:31 -07:00
John Koleszar
3f4e80634b Create common vp9_encode_sb{,y}
Creates a common encode (subtract, transform, quantize, optimize,
inverse transform, reconstruct) function for all sb sizes, including
the old 16x16 path.

Change-Id: I964dff1ea7a0a5c378046a069ad83495f54df007
2013-05-02 14:02:03 -07:00
Scott LaVarnway
94ed11d89d Reduced y_dequant, uv_dequant size
Currently, only two values are used.  Removed the unused
values.

Change-Id: Idc5b8be354d84ffc68df39ea3e45f9f50d977b35
2013-05-01 16:25:10 -04:00
Dmitry Kovalev
6e4ed2f0fe Merge "Adding vp9_get_qindex function." into experimental 2013-05-01 12:04:21 -07:00
Ronald S. Bultje
d068d869b9 sb8x8 integration in rd loop.
Work-in-progress, not yet ready for review. TODO items:
- bitstream writing (encoder) and reading (decoder)
- decoder reconstruction

Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
2013-04-30 16:13:20 -07:00
Dmitry Kovalev
3f6c6ffc86 Adding vp9_get_qindex function.
Moving common code from encoder and decoder to vp9_get_qindex function.
Also moving quant-related constants from vp9_onyxc_int.h to
vp9_quant_common.h.

Change-Id: I70c5bfbaa1c8bf00fde0bfc459d077f88b6d46c8
2013-04-30 13:39:50 -07:00
Dmitry Kovalev
5a5a1f25a8 Consistent names for quant-related functions and variables.
Change-Id: I3a6d601e90e8740b9c26dd0afbfe9d467b75d367
2013-04-26 12:30:20 -07:00
John Koleszar
4b27eb1f18 Merge "quantize: make 4x4, 8x8 common with larger transforms" into experimental 2013-04-26 09:08:49 -07:00
John Koleszar
a672351af9 quantize: make 4x4, 8x8 common with larger transforms
There were 4 variants of the quantize loop in vp9_quantize.c, now
there is 1.

Change-Id: Ic853393411214b32d46a6ba53769413bd14e1cac
2013-04-25 14:44:54 -07:00
Ronald S. Bultje
c849eaca59 Use b_width/height_log2 instead of mb_ where appropriate.
Basic assumption: when talking about transform units, use b_; when
talking about macroblock indices, use mb_.

Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
2013-04-25 14:20:59 -07:00
John Koleszar
15255eef82 Move dequant from BLOCKD to per-plane MACROBLOCKD
This data can vary per-plane, but not per-block.

Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a
2013-04-25 11:57:20 -07:00
John Koleszar
c7c98a7ffb Move skip_block from BLOCK to MACROBLOCK
This data is fixed at the MB level, so move it to the common part
of MACROBLOCK.

Change-Id: Idd8c87118e501cdf0a202bd84c28b502a8234edf
2013-04-23 17:43:43 -07:00
John Koleszar
5c649f67a2 Move quantizer data from BLOCK to MACROBLOCK
Quantizers can vary per plane, but not per block. Move these values to
the per-plane part of MACROBLOCK.

Change-Id: I320a55e38b7b28b29aec751a4aca5ccd0c9b9326
2013-04-23 17:43:43 -07:00
John Koleszar
aa6a36b062 Merge "Convert coeff to per-plane MACROBLOCK data" into experimental 2013-04-23 17:41:59 -07:00
John Koleszar
138ec38cab Convert coeff to per-plane MACROBLOCK data
This commit moves the coeff storage from the MACROBLOCK struct to its
per-plane part. The next commit will remove the coeff member from the
BLOCK structure so that it is consistently accessed per-plane.

Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be
variable subsampling aware.

Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455
2013-04-23 16:28:17 -07:00
Dmitry Kovalev
5de7e16ca2 Adding get_scan_{4x4, 8x8, 16x16} functions.
Change-Id: Id4306ef6d65d4a3984aed50b775bdf48d4f6c438
2013-04-22 14:08:41 -07:00
Deb Mukherjee
0aa79be7d5 Removes the code_nonzerocount experiment
This patch does not seem to give any benefits.

Change-Id: I9d2b4091d6af3dfc0875f24db86c01e2de57f8db
2013-04-22 10:58:49 -07:00
Dmitry Kovalev
1ad7c1f250 Renaming y1dc_delta_q, uvdc_delta_q, uvac_delta_q fields from VP9Common.
New names are y_dc_delta_q, uv_dc_delta_q, uv_ac_delta_q.

Change-Id: I4acae1fc23a4697ce2c5a5becb8dc28ef0a4b552
2013-04-16 15:05:52 -07:00
Ronald S. Bultje
f551c2d1c0 Fix width/height switch-up in U/V SB quantize code.
Change-Id: I697514efd6024e1b4153bbde58ae5e323b030981
2013-04-15 09:58:27 -07:00
Ronald S. Bultje
13e41ba440 Remove unused macroblock versions of reconstruction functions.
More specifically, remove vp9_quantize_mb*, vp9_optimize_mb*,
vp9_inverse_transform_mb* and vp9_transform_mb*. Instead, use the
generic _sb* functions that take a size argument, and call them with
BLOCK_SIZE_MB16X16.

Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29
2013-04-11 12:27:15 -07:00
Dmitry Kovalev
0cef7234e1 Merge "Fixing upper case names." into experimental 2013-04-10 13:29:38 -07:00
Ronald S. Bultje
a3874850dd Make SB coding size-independent.
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code
gives identical encoder results before and after. There are a few
macros for rectangular block sizes under the sbsegment experiment; this
experiment is not yet functional and should not yet be used.

Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
2013-04-09 21:28:27 -07:00
Dmitry Kovalev
c34f6fcb54 Fixing upper case names.
Renaming Y1dequant to y_dequant, UVdequant to uv_dequant, QIndex to qindex.

Change-Id: I1c356e5f886deb3f8807dc212de9799b55b09d58
2013-04-09 10:46:57 -07:00
John Koleszar
05a79f2fbf Move EOB to per-plane data
Continue migrating data from BLOCKD/MACROBLOCKD to the per-plane
structures.

Change-Id: Ibbfa68d6da438d32dcbe8df68245ee28b0a2fa2c
2013-04-04 21:30:23 -07:00
John Koleszar
4c05a051ab Move qcoeff, dqcoeff from BLOCKD to per-plane data
Start grouping data per-plane, as part of refactoring to support
additional planes, and chroma planes with other-than 4:2:0
subsampling.

Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23
2013-04-04 16:30:57 -07:00
Ronald S. Bultje
d9094d8fd3 Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs.
These are mostly just for experimental purposes. I saw small gains (in
the 0.1% range) when playing with this on derf.

Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
2013-03-26 16:46:13 -07:00
Deb Mukherjee
fd18d5dffe Modeling default coef probs with distribution
Replaces the default tables for single coefficient magnitudes with
those obtained from an appropriate distribution. The EOB node
is left unchanged. The model is represeted as a 256-size codebook
where the index corresponds to the probability of the Zero or the
One node. Two variations are implemented corresponding to whether
the Zero node or the One-node is used as the peg. The main advantage
is that the default prob tables will become considerably smaller and
manageable. Besides there is substantially less risk of over-fitting
for a training set.

Various distributions are tried and the one that gives the best
results is the family of Generalized Gaussian distributions with
shape parameter 0.75. The results are within about 0.2% of fully
trained tables for the Zero peg variant, and within 0.1% of the
One peg variant.

The forward updates are optionally (controlled by a macro)
model-based, i.e. restricted to only convey probabilities from the
codebook. Backward updates can also be optionally (controlled by
another macro) model-based, but is turned off by default. Currently
model-based forward updates work about the same as unconstrained
updates, but there is a drop in performance with backward-updates
being model based.

The model based approach also allows the probabilities for the key
frames to be adjusted from the defaults based on the base_qindex of
the frame. Currently the adjustment function is a placeholder that
adjusts the prob of EOB and Zero node from the nominal one at higher
quality (lower qindex) or lower quality (higher qindex) ends of the
range. The rest of the probabilities are then derived based on the
model from the adjusted prob of zero.

Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
2013-03-25 23:43:38 -07:00
Dmitry Kovalev
3edbc77ae3 Merge "Consistent usage of ROUND_POWER_OF_TWO macro." into experimental 2013-03-08 11:35:22 -08:00
Dmitry Kovalev
3603dfb62c Consistent usage of ROUND_POWER_OF_TWO macro.
Change-Id: I44660975e9985310d8c654c158ee7a61291b5a08
2013-03-07 12:24:35 -08:00
Ronald S. Bultje
d3724abe9f Re-add support for ADST in superblocks.
This also changes the RD search to take account of the correct block
index when searching (this is required for ADST positioning to work
correctly in combination with tx_select).

Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
2013-03-07 11:19:10 -08:00
Deb Mukherjee
eb6ef2417f Coding con-zero count rather than EOB for coeffs
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.

STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic progrmaming apporach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.

TODO:
Training the default probs/counts for nzcs

Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
2013-03-07 07:20:30 -08:00
Ronald S. Bultje
111ca42133 Make superblocks independent of macroblock code and data.
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.

In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.

Lastly, add a trellis optimize function for 32x32 transform blocks.

HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's
a few negative points here and there that I might want to analyze
a little closer.

Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
2013-03-04 16:34:36 -08:00
Ronald S. Bultje
e8c74e2b70 Move eob from BLOCKD to MACROBLOCKD.
Consistent with VP8.

Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
2013-02-27 11:00:55 -08:00
Paul Wilkins
dbf4942046 Experimental removal of over quant code
The over quant code was added in VP8 post
bitstream freeze to allow compression to lower
data rates

In VP9 the real qualtizer range has been greatly
extended anyway.

Change-Id: I5d384fa5e9a83ef75a3df34ee30627bd21901526
2013-02-22 14:00:51 +00:00
Yaowu Xu
93d6b86cfd Use lossless for Q0
The commit changes the coding mode to lossless whenever the lowest
quantizer is choosen.

As expected, test results showed no difference for cif and std-hd
set where Q0 is rarely used. For yt and yt-hd set, Q0 is used for
a number of clips, where this commit helped a lot in the high end.

Average over all clips in the sets:
yt: 2.391% 1.017% 1.066%
hd: 1.937%  .764%  .787%

Change-Id: I9fa9df8646fd70cb09ffe9e4202b86b67da16765
2013-02-19 06:18:42 -08:00
Ronald S. Bultje
48598e30b1 Remove y2dc/ac Q delta values from the bitstream.
Since there is no Y2, these values are always zero. This changes the
bitstream results slightly, hence a separate commit.

Change-Id: I2f838f184341868f35113ec77ca89da53c4644e0
2013-02-15 14:06:30 -08:00
Ronald S. Bultje
46dff5d233 Remove some Y2-related code.
Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78
2013-02-15 14:06:25 -08:00
Paul Wilkins
56049d9488 Fixed encoder decoder mismatch.
Reverted part of change
I19981d1ef0b33e4e5732739574f367fe82771a84

That gives rise to an enc/dec mismatch.
As things stand the memsets are still needed.

Change-Id: I9fa076a703909aa0c4da0059ac6ae19aa530db30
2013-02-13 18:56:56 +00:00
Yaowu Xu
f01b08c96c Merge "enable bitstream lossless support" into experimental 2013-02-13 10:26:58 -08:00
Yaowu Xu
17db5d00be enable bitstream lossless support
1. Added a bit in frame header to  to indicate if a frame is encoded
in lossless mode, so decoder does not make the decision based on Q0
2. Minor changes to make sure that lossy coding works same as when
the lossless experiment is not enabled.
3. Renamed function pointers for transforms to be consistent, using
prefix fwd_txm and inv_txm for forward and inverse respectively

To encode in lossless mode, using "--lossless=1 --min-q=0 --max-q=0"
with vpxenc.

Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068
2013-02-13 09:24:39 -08:00
Christian Duvivier
0e4397f0cd Faster vp9_regular_quantize_b_8x8.
A couple of scalar optimizations speeding up quantization by about 1.6x. Overall encoder speedup is around 3%.

Change-Id: I19981d1ef0b33e4e5732739574f367fe82771a84
2013-02-12 15:55:58 -08:00
Paul Wilkins
93762ca9b2 Remove eob_max_offset markers.
Remove eob_max_offset markers and replace
with the generic skip_block flag to indicate
to the quantizer that all coeffs to be set to 0
and eob position set to 0;

Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7
2013-01-29 13:39:34 +00:00
Paul Wilkins
0ff9b033b0 Segment Skip Flag
First step in simplifying the segment mode and
segment EOB flags into a simpler segment skip
flag that implies 0,0 mv and EOB at position 0.

Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5
2013-01-28 17:28:04 +00:00
Paul Wilkins
8e2c03fbfd Simplify Zero bin and zero bin run code.
Simplification to eliminate a number of very large data
data structures. All zero run, zbin boosts for different
transform sizes are now limited to a maximum run length
of 15 before they max out the boost.

Some further work still needs be done to refactor, rationalize
and optimize the multiple quantizer functions.

The simplification coupled with tweaks to the 16 element array
now used for all transform sizes, has minimal effect on quality.

Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423
2013-01-28 13:21:10 +00:00
Ronald S. Bultje
aa2effa954 Merge tx32x32 experiment.
Change-Id: I615651e4c7b09e576a341ad425cf80c393637833
2013-01-10 08:23:59 -08:00
Ronald S. Bultje
4455036cfc Merge superblocks (32x32) experiment.
Change-Id: I0df99742029834a85c4933652b0587cf5b6b2587
2013-01-08 12:54:45 -08:00
Ronald S. Bultje
4cca47b538 Use standard integer types for pixel values and coefficients.
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).

Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da
2012-12-18 15:31:19 -08:00
Ronald S. Bultje
8986eb5c26 Give 4x4 scan and coef_band tables a _4x4 suffix.
This matches the names of tables for all other transform sizes.

Change-Id: Ia7681b7f8d34c97c27b0eb0e34d490cd0f8d02c6
2012-12-18 10:49:10 -08:00
Ronald S. Bultje
c456b35fdf 32x32 transform for superblocks.
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.

Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
  transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
  1 bit, or else they won't fit in int16_t (they are 17 bits). Because
  of this, the RD error scoring does not right-shift the MSE score by
  two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
  also. This is currently a little hacky.
- FDCT and IDCT is double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
  simply using the 16x16 luma ones. A future commit will add newly
  generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
  ADST is desired, transform-size selection can scale back to 16x16
  or lower, and use an ADST at that level.

Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
  block than for the rest (DWT pixel differences) of the block. Therefore,
  RD error scoring isn't easily scalable between coefficient and pixel
  domain. Thus, unfortunately, we need to compute the RD distortion in
  the pixel domain until we figure out how to scale these appropriately.

Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
2012-12-07 14:45:05 -08:00
Deb Mukherjee
0742b1e4ae Fixing 8x8/4x4 ADST for intra modes with tx select
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.

Rebase.

Change-Id: Ie3257bdf07bdb9c6e9476915e3a80183c8fa005a
2012-11-28 16:21:12 -08:00
Jim Bankoski
c67873989f fixed includes to be fully specified
Change-Id: Ia1cce221f8511561b9cbd8edb7726fbc286ff243
2012-11-28 10:53:17 -08:00
John Koleszar
fcccbcbb39 Add vp9_ prefix to all vp9 files
Support for gyp which doesn't support multiple objects in the same
static library having the same basename.

Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
2012-11-27 14:12:30 -08:00