Commit Graph

190 Commits

Author SHA1 Message Date
Dmitry Kovalev
61c33d0ad5 Removing plane_block_{width, height}_log2by4 functions.
Change-Id: I040b82b8e32aee272d10cbb021c7ba1c76343d7a
2013-08-07 17:06:33 -07:00
Jingning Han
debb9c68c8 Use low precision 32x32fdct for encodemb in speed1
The low precision 32x32 fdct has all the intermediate steps within
16-bit depth, hence allowing faster SSE2 implementation, at the
expense of larger round-trip error. It was used in the rate-distortion
optimization search loop only.

Using the low precision version, in replace of the high precision one,
affects the compression performance by about 0.7% (derf, stdhd) at
speed 0. For speed 1, it makes derf set down by only 0.017%.

Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
2013-08-07 15:34:12 -07:00
Dmitry Kovalev
d007446b3f Replacing long block size enum values with shorter ones (2).
Change-Id: I428c4d42212b757112e3acfe5b81314cfbb5fd6b
2013-08-05 10:51:02 -07:00
Dmitry Kovalev
680ec32d18 Adding is_inter_block function.
Using it instead of long unclear verbose check
"mbmi->ref_frame[0] != INTRA_FRAME".

Change-Id: I9c7b4b3797942fa962bf3ba7460fff3084beabe9
2013-08-02 16:25:33 -07:00
Dmitry Kovalev
ce8dedc353 Cleanup: removing unused function arguments.
Change-Id: I27471768980fc631916069f24bc7c482a5c9ca17
2013-08-01 13:41:38 -07:00
Jingning Han
a7c4de22e1 16x16 inverse 2D-DCT with DC only
This commit provides special handle on 16x16 inverse 2D-DCT, where
only DC coefficient is quantized to be non-zero value.

Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
2013-07-29 14:45:53 -07:00
Jingning Han
decb1b94de Merge "Shortcut 8x8/16x16 inverse 2D-DCT" 2013-07-29 11:04:07 -07:00
Ronald S. Bultje
118ccdcd30 Inverse dimension order in token_cost array.
This allows us to increment the position at the band-level only as
we go from one band to the next; more importantly, that allows us to
use an add instead of multiply instruction, and omit the instruction
altogether if the band doesn't change from one coef to the next, thus
being slightly faster (probably more noticeable on systems where a
multiply is expensive, like arm).

Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
2013-07-26 17:30:04 -07:00
Jingning Han
38fa487164 Shortcut 8x8/16x16 inverse 2D-DCT
This commit brought back the shortcut implementation of 8x8/16x16
inverse 2D-DCT. When the eob <= 10, it skips the inverse transform
operations on row 4:7/4:15 in the first round. For bus_cif at 1000
kbps, this provides about 2% speed-up at speed 0.

Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
2013-07-26 17:19:14 -07:00
Jingning Han
325e0aa650 Special handle on DC only inverse 8x8 2D-DCT
This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.

Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
2013-07-26 14:16:51 -07:00
Jingning Han
2f58faffa4 Make coeff_optimize initialized per-plane
This commit makes the initialization of trellis coeff optimization
a per-plane operation, thereby eliminating the redundant steps in
encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly
faster.

Change-Id: Iffe9faca6a109dafc0dd69dc7273cbdec19b17cd
2013-07-25 11:44:29 -07:00
Dmitry Kovalev
9139ee0908 Adding condition inside get_tx_type_{4x4, 8x8, 16x16}.
Adding plane type check condition because it was always used outside of
get_tx_type_{4x4, 8x8, 16x16}.

Change-Id: I02f0bbfee8063474865bd903eb25b54d26e07230
2013-07-24 12:55:45 -07:00
Jingning Han
666c266623 Merge "Unify the use of encode_b_args/optimize_block_args" 2013-07-23 18:08:50 -07:00
Jingning Han
ab77828b36 Unify the use of encode_b_args/optimize_block_args
The struct optimize_block_args is defined same as encode_b_args.
Remove this redundant definition, and use encode_b_args consistently.

Change-Id: I1703aeeb3bacf92e98a34f4355202712110173d9
2013-07-23 16:04:02 -07:00
Jingning Han
e9e2fe8ec3 Make xform_quant operations tx_type independent
The xform_quant() module is only used by inter modes, hence removing
the redundant switches therein conditioned on tx_type.

Change-Id: Ib87ce5b2f2e4cbf3ceb133a1108afa173c933a3f
2013-07-23 12:37:25 -07:00
Jingning Han
0359ad7f9a Skip inverse transform when eob is zero
When all the transform coefficients were quantized to zero, skip
the inverse transform operation. For bus_cif at 1000 kbps, the
runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up,
at speed 0.

Change-Id: Ic0a813fff5e28972d4888ee42d8747846a6c3cc6
2013-07-23 10:06:41 -07:00
Jim Bankoski
86a9dec73c clean up bw, bh
many structures use bw and bh and they have different meanings.   This cl attempts
to start this clean up and remove unneccessary 2 step look up log and then
shift operations...

also removed partition type multiple operation code in bitstream.c.

Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
2013-07-23 06:51:44 -07:00
Jingning Han
dd97c62ab8 Merge "Skip inter-coded block reconstruction in rd loop" 2013-07-16 09:03:38 -07:00
Ronald S. Bultje
1ff94fea56 Inline vp9_quantize() in xform_quant().
Cycle times:
4x4:    151 to  131 cycles (15% faster)
8x8:    334 to  306 cycles (9% faster)
16x16: 1401 to 1368 cycles (2.5% faster)
32x32: 7403 to 7367 cycles (0.5% faster)

Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.

Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
2013-07-15 17:30:57 -07:00
Ronald S. Bultje
6fb418741f Inline xform_quant() in encode_block_intra().
Also inline some of the block calculations to assist the compiler to
not do silly things like calculating the same offset (or converting
between raster/transform block offset or block, mi and pixel unit)
many, many, many times.

Cycle times:
4x4:     584 ->   505 cycles (16% faster)
8x8:    1651 ->  1560 cycles (6% faster)
16x16:  7897 ->  7704 cycles (2.5% faster)
32x32: 16096 -> 15852 cycles (1.5% faster)

Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.

Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
2013-07-15 16:00:42 -07:00
Jingning Han
043e0f9dad Skip inter-coded block reconstruction in rd loop
Skip the inverse transform and reconstruction of inter-mode coded
blocks in the rate-distortion optimization loop, when skip_encode_sb
feature is turned on. This provides about 1% speed-up at speed 0,
and 1.5% speed-up at speed 1. No performance change in both settings.

Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007
2013-07-15 11:32:14 -07:00
Jingning Han
faff6ed0fb Skip duplicate block encoding in the rd loop
This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7% speed-up) at 0.04dB
         performance loss.

This operation is currently turned on in settings of speed 1.

Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
2013-07-15 11:08:58 -07:00
Jingning Han
b9381b6faf Remove unnecessary tx_type branch in encode_block
The function encode_block is called only by inter-prediction modes,
hence removing the transform type branching there.

Change-Id: I34a3172e28ce2388835efd0f8781922211bff857
2013-07-11 09:11:35 -07:00
Ronald S. Bultje
5a73254918 Remove unnecessary memset(best_index, 0) from trellis/optimize.
First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to
2min11.6, i.e. 0.75% overall speedup.

Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e
2013-07-08 16:22:39 -07:00
Dmitry Kovalev
be77f6bbbf Removing redundant struct from union b_mode_info.
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
2013-07-02 16:51:57 -07:00
Ronald S. Bultje
3cc6eb7c00 Merge "Make get_coef_context() branchless." 2013-07-02 11:48:15 -07:00
Jingning Han
b91a1586a3 Calculate rd cost per transformed block
Compute the rate-distortion cost per transformed block, and cumulate
the cost through all blocks inside a partition. This allows encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.

Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
2013-07-02 09:58:46 -07:00
Ronald S. Bultje
26b6318de8 Make get_coef_context() branchless.
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
2013-07-01 16:34:10 -07:00
Ronald S. Bultje
7353ceab9d Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01 11:36:07 -07:00
Ronald S. Bultje
d00b8e5f82 Inline vp9_get_coef_context() (and remove vp9_ prefix).
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
2013-06-28 10:40:21 -07:00
Ronald S. Bultje
91d223bd5c Some minor optimizations for cost_coeffs().
Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 23426 -> 18134 cycles
32x32: 4906 -> 3664 cycles

Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.

Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
2013-06-28 10:29:02 -07:00
Jingning Han
fc1cfd8e32 Merge "Make intra predictor reference buffer configurable" 2013-06-26 19:02:02 -07:00
Jingning Han
861cb06c67 Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.

Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
2013-06-26 17:17:21 -07:00
Ronald S. Bultje
b5468155b6 Remove unused macro RDTRUNC_8x8 from encodemb.c.
Change-Id: I0c097567adab24215d807963ccb34810a2afe007
2013-06-26 15:34:01 -07:00
Dmitry Kovalev
87ee34aacb Removing unused code.
Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
functions.

Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
2013-06-25 10:17:19 -07:00
Ronald S. Bultje
25c588b1e4 Add subtract_block SSE2 version and unit test.
3% faster overall (3min35.0 to 3min28.5).

Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e
2013-06-21 09:35:37 -07:00
Jingning Han
7088426976 Merge "Make fdct32 computation flow within 16bit range" 2013-06-18 11:40:14 -07:00
Jingning Han
a41a4860c0 Make fdct32 computation flow within 16bit range
This commit makes use of dual fdct32x32 versions for rate-distortion
optimization loop and encoding process, respectively. The one for
rd loop requires only 16 bits precision for intermediate steps.
The original fdct32x32 that allows higher intermediate precision (18
bits) was retained for the encoding process only.

This allows speed-up for fdct32x32 in the rd loop. No performance
loss observed.

Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
2013-06-18 09:46:24 -07:00
Dmitry Kovalev
686b99741c Removing vp9_invtrans.{c, h} files.
Moving single function from vp9_invtrans.c to vp9_encodemb.c.

Change-Id: I26bf6bb90de342a3036c0dbfba78a7dd75a61fe7
2013-06-17 16:09:03 -07:00
John Koleszar
717d744a01 Fix use of get_uv_tx_size in loopfilter
Change the argument of get_uv_tx_size() to be an MBMI pointer, so that the
correct column's MBMI can be passed to the function.

Change-Id: Ied6b8ec33b77cdd353119e8fd2d157811815fc98
2013-06-10 11:40:57 -07:00
Ronald S. Bultje
6ef805eb9d Change ref frame coding.
Code intra/inter, then comp/single, then the ref frame selection.
Use contextualization for all steps. Don't code two past frames
in comp pred mode.

Change-Id: I4639a78cd5cccb283023265dbcc07898c3e7cf95
2013-06-06 17:28:09 -07:00
Jim Bankoski
5a88271b09 don't tokenize & encode tokens for blocks in UMV
This avoids encoding tokens for blocks that are entirely
in the UMV border. This changes the bitstream.

Change-Id: I32b4df46ac8a990d0c37cee92fd34f8ddd4fb6c9
2013-06-06 06:10:25 -07:00
Dmitry Kovalev
317d832d38 Merge "Adding plane_block_width and plane_block_height functions." into experimental 2013-05-31 15:28:45 -07:00
Deb Mukherjee
0048ec2329 Costing fixes related to trellis optimization
Migrates costing changes/fixes from the rebalance expt to the head
without the expt on.

Rebased.

Change-Id: I51677d62f77ed08aca8d21a4c9a13103eb8de93f
Results:
derfraw300: +0.126%
2013-05-31 13:56:32 -07:00
Dmitry Kovalev
120a878199 Adding plane_block_width and plane_block_height functions.
Change-Id: I02c17fb733c0f3c22dc3167c3d3182797415f1ae
2013-05-31 12:31:49 -07:00
Deb Mukherjee
b8b3f1a46d Balancing coef-tree to reduce bool decodes
This patch changes the coefficient tree to move the EOB to below
the ZERO node in order to save number of bool decodes.

The advantages of moving EOB one step down as opposed to two steps down
in the other parallel patch are: 1. The coef modeling based on
the One-node becomes independent of the tree structure above it, and
2. Fewer conext/counter increases are needed.

The drawback is that the potential savings in bool decodes will be
less, but assuming that 0s are much more predominant than 1's the
potential savings is still likely to be substantial.

Results on derf300: -0.237%

Change-Id: Ie784be13dc98291306b338e8228703a4c2ea2242
2013-05-29 16:25:52 -07:00
Jingning Han
6c97bba403 Merge "further clean-ups on intra4x4 coding" into experimental 2013-05-29 10:55:14 -07:00
Sami Pietila
88a4d4c510 Residual coding to cache energy class of tokens.
Proposal for tuning the residual coding by changing how the context
from previous tokens is calculated. Storing the energy class of previous
tokens instead of the token itself eases the critical path of
HW implementations.

Change-Id: I6d71d856b84518f6c88de771ddd818436f794bab
2013-05-29 15:21:01 +01:00
Jingning Han
4729a6f389 further clean-ups on intra4x4 coding
Removed one 4x4 prediction step that was unnessary in the rd loop.
Removed a unused modecosts estimate from encoder side.

Change-Id: I65221a52719d6876492996955ef04142d2752d86
2013-05-28 11:19:05 -07:00
Yaowu Xu
2b96ffe025 a few clean-ups
1. remove prediction mode conversion
2. unified bmode, same for key and non-key frame
3. set I4X4_PRED count for pdf to 0, as I4X4_PRED is no longer
coded ever. It is determined by ref_frame and block partition

Change-Id: If5b282957c24339b241acdb9f2afef85658fe47d
2013-05-27 13:53:56 -07:00
Paul Wilkins
33ecd6ad54 Merge Scatter Scan experiment.
Removal from under configure flag.
A bit  renaming

Change-Id: I2213229dfe852001dfec16b149f47c52ce88f3aa
2013-05-23 13:09:27 +01:00
Yaowu Xu
8ba92a0bed changes intra coding to be based on txfm block
This commit changed the encoding and decoding of intra blocks to be
based on transform block. In each prediction block, the intra coding
iterates thorough each transform block based on raster scan order.

This commit also fixed a bug in D135 prediction code.

TODO next:
The RD mode/txfm_size selection should take this into account when
computing RD values.

Change-Id: I6d1be2faa4c4948a52e830b6a9a84a6b2b6850f6
2013-05-22 11:53:19 +01:00
Yaowu Xu
232d90d8fd Generalized intra 4x4 encoding for all sizes
Change-Id: I1b86744fa247233c8df031b3f4b87b212c8dd094
2013-05-22 11:44:12 +01:00
John Koleszar
ddf13be8ef Merge "Initial version of alpha channel support" into experimental 2013-05-21 17:29:51 -07:00
Scott LaVarnway
ba48a11130 WIP: 4x4 idct/recon merge
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.

Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
2013-05-20 13:03:17 -04:00
John Koleszar
679e4abdd5 Initial version of alpha channel support
This is a mostly-working implementation of an extra channel in the
bitstream. Configure with --enable-alpha to test. Notable TODOs:

 - Add extra channel to all mismatch tests, PSNR, SSIM, etc
 - Configurable subsampling
 - Variable number of planes (currently always uses all 4)
 - Loop filtering
 - Per-plane lossless quantizer
 - ARNR support

This implementation just uses the same contents as the Y channel
for the A channel, due to lack of content and general pain in
playing back 4 channel content. A later patch will use the actual
alpha channel passed in from outside the codec.

Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
2013-05-16 22:21:09 -07:00
Scott LaVarnway
794a7bedbd WIP: 8x8 idct/recon merge
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.

Change-Id: Iacfd57324fbe2b7beca5d7f3dcae25c976e67f45
2013-05-16 13:52:15 -04:00
Scott LaVarnway
a272ff25cd WIP: 16x16 idct/recon merge
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.

Change-Id: Iea7976b22b1927d24b8004d2a3fddae7ecca3ba1
2013-05-15 13:16:02 -04:00
Scott LaVarnway
2cf0d4be12 WIP: 32x32 idct/recon merge
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.

Change-Id: I4ea09df0e162591e420d869b7431c2e7f89a8c1a
2013-05-14 15:54:17 -07:00
Paul Wilkins
e5f715201a Change to band calculation.
Change band calculation back to simpler model based
on the order in which coefficients are coded in scan order
not the absolute coefficient positions.

With the scatter scan experiment enabled the results were
appear broadly neutral on derf (-0.028) but up a little on std-hd +0.134).

Without the scatterscan experiment on the results were up derf as well.

Change-Id: Ie9ef03ce42a6b24b849a4bebe950d4a5dffa6791
2013-05-13 17:21:49 +01:00
John Koleszar
1fec23bef6 Use common get_uv_tx_size()
Use a single method for calculating the transform size of
non-luma planes.

Change-Id: I16ebd10e7944d7b9075ab79d15e6a5b5f9bab775
2013-05-08 20:48:32 -07:00
Jingning Han
776c1482a3 Merge SB8X8 into the codebase
Pull sb8x8 out of experimental list. verified via borg run tests.
Fixed unit test failures.

Change-Id: I12a4bbd17395930580c048ab68becad1ffe46e76
2013-05-07 09:08:25 -07:00
John Koleszar
4529c68b3b Separate transform and quant from vp9_encode_sb
This allows removing a large number of transform size specific functions,
as well as supporting 444/alpha by routing all code through the
subsampling-aware path.

Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4
2013-05-03 12:14:50 -07:00
John Koleszar
3f4e80634b Create common vp9_encode_sb{,y}
Creates a common encode (subtract, transform, quantize, optimize,
inverse transform, reconstruct) function for all sb sizes, including
the old 16x16 path.

Change-Id: I964dff1ea7a0a5c378046a069ad83495f54df007
2013-05-02 14:02:03 -07:00
John Koleszar
1f80a568d2 Make vp9_optimize_sb* common
Unify the various vp9_optimize_sb functions into one that handles all
transform sizes.

Change-Id: I48b642fbfb3e72cc2e0bcf1d0317a80a80547882
2013-04-30 21:34:58 -07:00
Ronald S. Bultje
d068d869b9 sb8x8 integration in rd loop.
Work-in-progress, not yet ready for review. TODO items:
- bitstream writing (encoder) and reading (decoder)
- decoder reconstruction

Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
2013-04-30 16:13:20 -07:00
Ronald S. Bultje
2dbaa4f4f4 Change above/left_context to use an 8x8 basis.
Output changes slightly because of a minor bug in (at least) the sb32x16
block2above tx16x16 tables that previously existed in vp9_blockd.c.

Change-Id: I624af28ac200a8322d64454cf05c79e9502968cc
2013-04-29 10:37:25 -07:00
Ronald S. Bultje
1a46b30ebe Grow MODE_INFO array to use an 8x8 basis.
Change-Id: I087e08e7909a406b71715b8525c104208daa6889
2013-04-26 11:57:17 -07:00
Ronald S. Bultje
c849eaca59 Use b_width/height_log2 instead of mb_ where appropriate.
Basic assumption: when talking about transform units, use b_; when
talking about macroblock indices, use mb_.

Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
2013-04-25 14:20:59 -07:00
John Koleszar
15255eef82 Move dequant from BLOCKD to per-plane MACROBLOCKD
This data can vary per-plane, but not per-block.

Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a
2013-04-25 11:57:20 -07:00
John Koleszar
4bd0f4f646 Remove BLOCK structure
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.

Change-Id: I593fb0715e74cd84b48facd1c9b18c3ae1185d4b
2013-04-25 11:33:17 -07:00
John Koleszar
aa6a36b062 Merge "Convert coeff to per-plane MACROBLOCK data" into experimental 2013-04-23 17:41:59 -07:00
John Koleszar
138ec38cab Convert coeff to per-plane MACROBLOCK data
This commit moves the coeff storage from the MACROBLOCK struct to its
per-plane part. The next commit will remove the coeff member from the
BLOCK structure so that it is consistently accessed per-plane.

Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be
variable subsampling aware.

Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455
2013-04-23 16:28:17 -07:00
John Koleszar
4f35e3e1c1 Merge "Move src_diff to per-plane MACROBLOCK data" into experimental 2013-04-23 16:24:08 -07:00
John Koleszar
cbd1315ac4 Move src_diff to per-plane MACROBLOCK data
First in a series of commits making certain MACROBLOCK members
addressable per-plane. This commit also refactors the block subtraction
functions vp9_subtract_b, vp9_subtract_sby_c, etc to be
loops-over-planes and variable subsampling aware.

Change-Id: I371d092b914ae0a495dfd852ea1a3d2467be6ec3
2013-04-23 12:18:51 -07:00
Dmitry Kovalev
5de7e16ca2 Adding get_scan_{4x4, 8x8, 16x16} functions.
Change-Id: Id4306ef6d65d4a3984aed50b775bdf48d4f6c438
2013-04-22 14:08:41 -07:00
Deb Mukherjee
f12509f640 Merge "Removes the code_nonzerocount experiment" into experimental 2013-04-22 11:53:14 -07:00
Deb Mukherjee
0aa79be7d5 Removes the code_nonzerocount experiment
This patch does not seem to give any benefits.

Change-Id: I9d2b4091d6af3dfc0875f24db86c01e2de57f8db
2013-04-22 10:58:49 -07:00
Deb Mukherjee
6ce718eb18 Merge "End of orientation zero group experiment" into experimental 2013-04-22 10:33:12 -07:00
Deb Mukherjee
70d9f116fd End of orientation zero group experiment
Adds an experiment that codes an end-of-orientation symbol
for every eligible zero encountered in scan order.

This cleans out various other sub-experiments that were part
of the origiinal patch, which will be later included if found
useful.

Results are slightly positive on all sets (0.1 - 0.2% range).

Change-Id: I57765c605fefc7fb9d1b57f1b356843602abefaf
2013-04-22 09:27:59 -07:00
John Koleszar
6d5ac8f2e1 reconinter: remove unnecessary functions, params
Removes the redundant dst pointers from vp9_build_inter_predictors_sb{y,uv}
and the remaining mb specific functions.

Change-Id: I7b6bf439d9394b85ea79b4fe61a3ffc1025720da
2013-04-22 08:20:54 -07:00
John Koleszar
d12376aa2c Move dst to per-plane MACROBLOCKD data
First in a series of commits moving the framebuffers pointers to
per-plane data, so that they can be indexed numerically rather than
by name.

Change-Id: I6e0d60fd4d51e6375c384eb7321776564df21775
2013-04-19 16:16:10 -07:00
John Koleszar
9ec0f658a1 Remove vp9_recon_mb{,y}
Use the common sb functions instead.

Change-Id: I4fa0a8ee3c6ada56271dd09bf895b97642f55858
2013-04-19 12:12:00 -07:00
Dmitry Kovalev
77f4697a13 Fixing member names inside TOKENVALUE and TOKENEXTRA structs.
Change-Id: I183ec5819d4d80966c92db36db75b8c3be0d381d
2013-04-18 16:18:08 -07:00
Jingning Han
f0b065e946 Merge "Make the use of pred buffers consistent in MB/SB" into experimental 2013-04-18 15:24:55 -07:00
Jingning Han
6f43ff5824 Make the use of pred buffers consistent in MB/SB
Use in-place buffers (dst of MACROBLOCKD) for  macroblock prediction.
This makes the macroblock buffer handling consistent with those of
superblock. Remove predictor buffer MACROBLOCKD.

Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df
2013-04-18 14:59:36 -07:00
Dmitry Kovalev
9087d6d470 Replacing VP9_COMBINEENTROPYCONTEXTS macro with function.
Change-Id: I3bbc31840af69481e1d9bb4427c9ee25abf82946
2013-04-16 15:30:28 -07:00
Ronald S. Bultje
340bc46f49 Remove subtract_mb* functions.
Use subtract_sb* instead.

Change-Id: I3f34140ab97061063a4452945347ef1fe37e13d1
2013-04-11 12:27:15 -07:00
Ronald S. Bultje
13e41ba440 Remove unused macroblock versions of reconstruction functions.
More specifically, remove vp9_quantize_mb*, vp9_optimize_mb*,
vp9_inverse_transform_mb* and vp9_transform_mb*. Instead, use the
generic _sb* functions that take a size argument, and call them with
BLOCK_SIZE_MB16X16.

Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29
2013-04-11 12:27:15 -07:00
Ronald S. Bultje
8fb5be48a6 Make usage of sb_type independent of literal values.
Change-Id: I0d12f9ef9d960df0172a1377f8e5236eb6d90492
2013-04-10 17:38:57 -07:00
Ronald S. Bultje
a3874850dd Make SB coding size-independent.
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code
gives identical encoder results before and after. There are a few
macros for rectangular block sizes under the sbsegment experiment; this
experiment is not yet functional and should not yet be used.

Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
2013-04-09 21:28:27 -07:00
John Koleszar
05a79f2fbf Move EOB to per-plane data
Continue migrating data from BLOCKD/MACROBLOCKD to the per-plane
structures.

Change-Id: Ibbfa68d6da438d32dcbe8df68245ee28b0a2fa2c
2013-04-04 21:30:23 -07:00
John Koleszar
4c05a051ab Move qcoeff, dqcoeff from BLOCKD to per-plane data
Start grouping data per-plane, as part of refactoring to support
additional planes, and chroma planes with other-than 4:2:0
subsampling.

Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23
2013-04-04 16:30:57 -07:00
Deb Mukherjee
fe9b5143ba Framework changes in nzc to allow more flexibility
The patch adds the flexibility to use standard EOB based coding
on smaller block sizes and nzc based coding on larger blocksizes.
The tx-sizes that use nzc based coding and those that use EOB based
coding are controlled by a function get_nzc_used().
By default, this function uses nzc based coding for 16x16 and 32x32
transform blocks, which seem to bridge the performance gap
substantially.

All sets are now lower by 0.5% to 0.7%, as opposed to ~1.8% before.

Change-Id: I06abed3df57b52d241ea1f51b0d571c71e38fd0b
2013-03-28 09:33:50 -07:00
Ronald S. Bultje
d9094d8fd3 Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs.
These are mostly just for experimental purposes. I saw small gains (in
the 0.1% range) when playing with this on derf.

Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
2013-03-26 16:46:13 -07:00
Ronald S. Bultje
3120dbddb1 Redo banding for all transforms.
Now that the first AC coefficient in both directions use the same DC
as their context, there no longer is a purpose in letting both have
their own band. Merging these two bands allows us to split bands for
some of the very high-frequency AC bands.

In addition, I'm redoing the banding for the 1D-ADST col/row scans. I
don't think the old banding made any sense at all (it merged the last
coefficient of the first row/col in the same band as the first two of
the second row/col), which was clearly an oversight from the band being
applied in scan-order (rather than in their actual position). Now,
coefficients at the same position will be in the same band, regardless
what scan order is used. I think this makes most sense for the purpose
of banding, which is basically "predict energy for this coefficient
depending on the energy of context coefficients" (i.e. pt).

After full re-training, together with previous patch, derf gains about
1.2-1.3%, and hd/stdhd gain about 0.9-1.0%.

Change-Id: I7a0cc12ba724e88b278034113cb4adaaebf87e0c
2013-03-26 16:46:13 -07:00
Ronald S. Bultje
790fb13215 Use above/left (instead of previous in scan-order) as token context.
Pearson correlation for above or left is significantly higher than for
previous-in-scan-order (absolute values depend on position in scan, but
in general, we gain about 0.1-0.2 by using either above or left; using
both basically just makes this even better). For eob branch skipping,
we continue to use the previous token in scan order.

This helps about 0.9% on derf after re-training on a limited data set.
Full re-training and results on larger-resolution clips are pending.

Note that this commit breaks trellis, so we can probably get further
gains out of it by fixing trellis at some later point.

Change-Id: Iead68e296fc3a105cca746b5e3da9555d6010cfe
2013-03-26 16:46:09 -07:00
Ronald S. Bultje
d3724abe9f Re-add support for ADST in superblocks.
This also changes the RD search to take account of the correct block
index when searching (this is required for ADST positioning to work
correctly in combination with tx_select).

Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
2013-03-07 11:19:10 -08:00
Deb Mukherjee
eb6ef2417f Coding con-zero count rather than EOB for coeffs
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.

STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic progrmaming apporach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.

TODO:
Training the default probs/counts for nzcs

Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
2013-03-07 07:20:30 -08:00
Ronald S. Bultje
111ca42133 Make superblocks independent of macroblock code and data.
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.

In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.

Lastly, add a trellis optimize function for 32x32 transform blocks.

HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's
a few negative points here and there that I might want to analyze
a little closer.

Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
2013-03-04 16:34:36 -08:00