Commit Graph

110 Commits

Author SHA1 Message Date
Jingning Han
b4b09c9796 Enable fast forward txfm and quant for rate-distortion search
This commit enables encoder to select fast forward transform and
quantization path according to the prediction residual sse/variance,
in the rate-distortion optimization scheme.

Change-Id: Ief9fc3844fd4107166d401970e800c6e5ce2b5fe
2014-08-08 16:16:51 -07:00
Alex Converse
5926e7c0e8 Remove unfinished VP9 alpha channel.
Change-Id: Ic5d3a3a0dac10b49495771886a31e793bb78b5ca
2014-07-21 15:55:50 -07:00
Jingning Han
9ad1b9fc67 Re-design quantization process for 32x32 transform block
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gains of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.

Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
2014-07-08 16:55:28 -07:00
Alex Converse
03c276ea17 Split vp9_rdopt into vp9_rdopt and vp9_rd.
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all
other rd related routines. Anything used outside of making an rd optimal
decision belongs in rd.

Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b
2014-07-02 15:33:33 -07:00
Jingning Han
9ac2f66320 Re-design quantization process
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.

The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.

Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
2014-07-01 17:00:07 -07:00
Jingning Han
ccba289f8d Fast computation path for forward transform and quantization
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.

It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.

In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.

Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
2014-06-12 11:10:54 -07:00
Yaowu Xu
c39a361b0f vp9_quantizer.c: cleanup -wextra warnings
Change-Id: If5a3c48a8c554018a5d63c1541a2900f15767a00
2014-05-14 09:37:45 -07:00
Dmitry Kovalev
ef003078e8 Renaming "onyx" to "encoder".
Actual renames:
  vp9_onyx_if.c -> vp9_encoder.c
  vp9_onyx_int.h -> vp9_encoder.h

Change-Id: I80532a80b118d0060518e6c6a0d640e3f411783c
2014-04-22 14:57:05 -07:00
Alex Converse
e6222b1a47 Fix the CONFIG_ALPHA build.
Change-Id: Ib89fe34812c17cd6294ce3c38f87d43a79abb16f
2014-04-22 11:23:49 -07:00
Paul Wilkins
e434d08fd9 Remove old activity masking code.
Delete code relating to the old VP8_TUNE_SSIM flag
as this code does not currently work and is largely made
redundant in VP9 by the various AQ modes.

Change-Id: I71f28e1f680573d296422254489000678552b17b
2014-04-16 12:22:41 -07:00
Dmitry Kovalev
d1a396d8b9 Moving q_trans[] table to vp9_quantize.{c, h}.
Change-Id: I1324c339815a47004ddccdaf651d24c60382b92f
2014-04-09 13:35:39 -07:00
Dmitry Kovalev
86f44a91f4 Renaming two members in MACROBLOCKD struct.
Renames:
  mi_8x8 -> mi
  mode_info_stride -> mi_stride

Change-Id: I66f3e5fd1e7b7f46f108af5bb711c5fd9493c1be
2014-04-01 17:46:40 -07:00
Dmitry Kovalev
7bcfa31750 Moving encoder quantization parameters into separate struct.
Change-Id: I2a169535489aeda3943fb5a46ab53e7a12abaa36
2014-03-28 16:46:41 -07:00
Dmitry Kovalev
c9513e1dfb Fixing include order in vp9_quantize.c
Change-Id: Ic32eb103d0d7f98c0a16c4e7bdec117faf05df02
2014-02-28 11:30:51 -08:00
Dmitry Kovalev
f527c46f20 Cleaning up vp9_quantize.c.
Change-Id: I9a38af32f16f196b83dd69755eafb9543edf5691
2014-02-28 10:11:31 -08:00
Paul Wilkins
8618c70683 A couple more V.S. warnings silenced.
Change-Id: Ica1b583d69810182f621de757d2543b2a3b35566
2014-02-14 20:34:14 -08:00
Dmitry Kovalev
803a5c67dd Merge "Encoder quantization cleanup." 2014-02-10 21:32:04 -08:00
Dmitry Kovalev
220b8f8644 Encoder quantization cleanup.
Change-Id: I633205c95f0e81ce0589580501d0be4425a3cb8e
2014-02-03 14:57:28 -08:00
Dmitry Kovalev
70cde0af3d Removing ENC_DEBUG.
Change-Id: I101017621003314f000a454725ea13fc9db43177
2014-01-29 12:58:57 -08:00
Dmitry Kovalev
f00d157c12 Moving eob array to the encoder.
In the decoder we don't need to save eobs, we can pass eob as an argument.
That's why removing eob arrays from VP9Decompressor and TileWorkerData,
and moving eob pointer from macroblockd_plane to macroblock_plane.

Change-Id: I8eb919acc837acfb3abdd8319af63d1bbca8217a
2013-12-03 17:59:32 -08:00
Alex Converse
2360a5f093 Remove plane_block_idx.
Its last remaining caller can be passed its results directly without any
additional work. Also, it's not non-4:2:0 safe.

Change-Id: Ia5089ba5f7f66c7617270483c619c9271aefd868
2013-12-02 18:33:50 -08:00
Dmitry Kovalev
d3a2e55af4 Removing qcoeff buffers from the decoder.
We only need qcoeff buffers in the encoder. Reducing TileWorkerData struct
and VP9Decompressor struct sizes by 24K.

Change-Id: Id148868461f7ffa3d3dd634b371503ae9c57e207
2013-11-26 18:52:10 -08:00
Dmitry Kovalev
3f3d14e1d3 Moving q_index from MACROBLOCKD to MACROBLOCK.
Moving because q_index is used only by encoder.

Change-Id: I0b96175614ed4fd3d76ee56a0ba36258e1e896f6
2013-11-12 18:13:19 -08:00
Dmitry Kovalev
d54ff9dc3d Cleaning up vp9_quantize_b_c() function.
Change-Id: I42c75530a8c9cff68480657f074131e6b60d9fca
2013-11-05 17:41:56 -08:00
Dmitry Kovalev
065972f959 Adding const to vp9_quantize_b_{32x32,} parameters.
Change-Id: I56f8c50ac382202f66040cd9cfaa05d889572fc7
2013-10-29 15:25:19 -07:00
Dmitry Kovalev
8253532c2d Cleaning up vp9_regular_quantize_b_4x4.
Passing scan & iscan as parameters, adding useful local variables.

Change-Id: Ia2a87906941db9557350d273669ce5c3cdb7235d
2013-10-28 14:28:28 -07:00
Guillaume Martres
acf0d56f0b Get rid of "this_mi", use "mi_8x8[0]" everywhere instead
The only case where they were intentionally pointing to different
structures was in mbgraph, and this didn't have the expected behavior
because both of these pointers are used interchangeably through the code

Change-Id: I979251782f90885fe962305bcc845bc05907f80c
2013-10-16 16:24:03 -07:00
Guillaume Martres
e55f60240a Implement variance-based adaptive quantization
This should be similar to what x264 does with --aq-mode 1.
It works well with clips like parkjoy and touhou
(http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv).
At low bitrates, the segmentation signaling overhead may negate the
benefits of this feature.

(PGW) Default changed to feature OFF to allow provisional merge.
Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b
2013-10-16 11:55:13 +01:00
Dmitry Kovalev
12d57a9409 Removing redundant 'extern' keyword.
Change-Id: Ie51306689c0dc527a8aa12d3984389dd8f360dea
2013-09-24 15:13:09 -07:00
Scott LaVarnway
8fc95a1b11 Merge "New mode_info_context storage -- undo revert" 2013-09-13 08:56:20 -07:00
Scott LaVarnway
ac6093d179 New mode_info_context storage -- undo revert
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now constists of pointers to MODE_INFO structs.  The
MODE_INFO structs are now stored as a stream (decoder only),
eliminating unnecessary copies and is a little more cache
friendly.

Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
2013-09-11 13:45:44 -04:00
Jingning Han
5d93feb6ad Remove redundant condition check in 32x32 quant
The c code implementation of 32x32 quantization does the zbin check
of all coefficients prior to the quant/dequant loop, hence removing
the redundant zbin check inside the loop. This only affects the
c code version. SSSE3 version does not separate the zbin check out.

Change-Id: Ic197a7d61d0b25fcac3cc092987651378cb56e4e
2013-09-10 12:04:33 -07:00
James Zern
c1913c9cf4 Merge "Revert "New mode_info_context storage"" 2013-09-09 14:38:01 -07:00
James Zern
54a03e20dd Revert "New mode_info_context storage"
This reverts commit dae17734ec

Encode crashes, leaks and increases integer overflow errors.

Change-Id: I595aa2649bb8d0b6552ff91652837a74c103fda2
2013-09-09 13:37:01 -07:00
Jim Bankoski
e378566060 Merge "New mode_info_context storage" 2013-09-08 07:16:25 -07:00
Jingning Han
09bc942b47 Fix overflow issue in 16x16 quantization SSSE3
The 16x16 transform unit test suggested that the peak coefficient
value can reach 32639. This could cause potential overflow issue
in the SSSE3 implmentation of 16x16 block quantization. This commit
fixes this issue by replacing addition with saturated addition.

Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
2013-09-06 21:06:10 -07:00
Scott LaVarnway
dae17734ec New mode_info_context storage
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now constists of a pointer to a MODE_INFO struct and
a "in the image" flag.  The MODE_INFO structs are now stored
as a stream, eliminating unnecessary copies and is a little
more cache friendly.

For the test clips used, the decoder performance improved
by ~4.3% (1080p) and ~9.7% (720p).

Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p)
and 5.9% (720p).

Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
2013-09-06 12:33:34 -04:00
Jingning Han
458c2833c0 Use saturated addition in SSSE3 of 32x32 quant
The 32x32 forward transform can potentially reach peak coefficient
value close to 32700, while the rounding factor can go upto 610.
This could cause overflow issue in the SSSE3 implementation of 32x32
quantization process.

This commit resolves this issue by replacing the addition operations
with saturated addition operations in 32x32 block quantization.

Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70
2013-09-05 12:49:12 -07:00
Jingning Han
abff678866 Fix overflow issue in SSSE3 32x32 quantization
The 32x32 quantization process can potentially have the intermediate
stacks over 16-bit range, thereby causing enc/dec mismatch. This commit
fixes this overflow issue in the SSSE3 implementation, as well as the
prototype, of 32x32 quantization.

This fixes issue 607 from webm@googlecode.

Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806
2013-08-29 11:00:54 -07:00
Dmitry Kovalev
569ca37d09 Moving plane_block_idx from vp9_blockd.h to vp9_quantize.c.
Change-Id: Ib8af21f2e7f603c2fb407e5d15a3bba64b545b49
2013-08-19 16:44:10 -07:00
Dmitry Kovalev
b7616e387e Moving segmentation struct from MACROBLOCKD to VP9_COMMON.
VP9_COMMON is the right place to segmentatation struct because it has
global segmentation parameters, not something specific to macroblock
processing.

Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708
2013-08-15 10:47:48 -07:00
Dmitry Kovalev
9d5885b0ab Quantization code cleanup.
Change-Id: I77b42418b852093f79260cbd880533a0bd86678f
2013-08-12 15:23:47 -07:00
Dmitry Kovalev
f1559bdeaf Inlining 16 as a stride for BLOCK_OFFSET macro.
Change-Id: I7f23d174eb089e5500f268a10db09648634c1b82
2013-08-09 16:40:05 -07:00
Ronald S. Bultje
1ff94fea56 Inline vp9_quantize() in xform_quant().
Cycle times:
4x4:    151 to  131 cycles (15% faster)
8x8:    334 to  306 cycles (9% faster)
16x16: 1401 to 1368 cycles (2.5% faster)
32x32: 7403 to 7367 cycles (0.5% faster)

Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.

Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
2013-07-15 17:30:57 -07:00
Dmitry Kovalev
c4ad3273c7 Moving segmentation related vars into separate struct.
Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
2013-07-11 11:57:57 -07:00
Ronald S. Bultje
c8defcfdee Update quantize SSSE3 SIMD to cover 32x32 transform case also.
Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
2min10.1, i.e. a 2.3% overall speed increase.

Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
2013-07-01 11:36:33 -07:00
Ronald S. Bultje
7353ceab9d Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01 11:36:07 -07:00
Ronald S. Bultje
af660715c0 Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
2013-06-28 10:28:49 -07:00
Ronald S. Bultje
7a049be6bf Inline quantize so idiv instruction gets removed from inner loop.
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
3min15.0 to 3min10.9, i.e. 2.1% faster overall.

Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
2013-06-27 09:01:04 -07:00
Yunqing Wang
b5bf7b13a8 Add two-pass quantization
Optimized the quantization function by making it a two-pass
process. The first pass does a quick checking of the transform
coefficients against the base ZBIN, and only keep the good
enough set of coefficients for quantization. A skipping
check is added. If all coefficients are within the base ZBIN, no
quantization is needed. The second pass is the actual quantization
pass, which only processes the coefficient subset determined
in first pass. This reduces the computation. Furthermore, an
alternitive method is used for large transform size, which often
has sparse nonzero quantized coefficients.

Overall, the encoder speedup is about 4%. The quantization function
itself gets 20% faster.

Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
2013-06-19 10:35:02 -07:00