The commit rationized and simplified the entropy context conversion
betwen MB using 8x8 transform and MB using 4x4 transform. The old version
had a number of weirdness in how 4x4 transform MB's context is used for
8x8 blocks other than the first 8x8 within a MB.
Test showed the change has a gain ~.1% for avg psnr, glb psnr and ssim on
the limited HD set.
Change-Id: I774536c416baa6845aa741f956d8a69fa40e5d47
For 8x8 transformed macroblock, the 2nd order transform is a 2x2 haar
transform, here there is only 4 coefficients total. A previous merge
changed these to 64, causing crashes when encoding with 8x8 transform
enabled. (i.e. when input video image size > 640x360 ) This commit
reverts them back to 4 and fixes the crashes.
Change-Id: I3290b81f8c0d32c7efec03093a61ea57736c0550
In summary, this commit encompasses a series of changes in attempt to
improve the 8x8 transform based coding to help overall compression
quality, please refer to the detailed commit history below for what
are the rationale underly the series of changes:
a. A frame level flag to indicate if 8x8 transform is used at all.
b. 8x8 transform is not used for key frames and small image size.
c. On inter coded frame, macroblocks using modes B_PRED, SPLIT_MV
and I8X8_PRED are forced to using 4x4 transform based coding, the
rest uses 8x8 transform based coding.
d. Encoder and decoder has the same assumption on the relationship
between prediction modes and transform size, therefore no signaling
is encoded in bitstream.
e. Mode decision process now calculate the rate and distortion scores
using their respective transforms.
Overall test results:
1. HD set
http://www.corp.google.com/~yaowu/no_crawl/t8x8/HD_t8x8_20120206.html
(avg psnr: 3.09% glb psnr: 3.22%, ssim: 3.90%)
2. Cif set:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif_t8x8_20120206.html
(avg psnr: -0.03%, glb psnr: -0.02%, ssim: -0.04%)
It should be noted here, as 8x8 transform coding itself is disabled
for cif size clips, the 0.03% loss is purely from the 1 bit/frame
flag overhead on if 8x8 transform is used or not for the frame.
---patch history for future reference---
Patch 1:
this commit tries to select transform size based on macroblock
prediction mode. If the size of a prediction mode is 16x16, then
the macroblock is forced to use 8x8 transform. If the prediction
mode is B_PRED, SPLITMV or I8X8_PRED, then the macroblock is forced
to use 4x4 transform. Tests on the following HD clips showed mixed
results: (all hd clips only used first 100 frames in the test)
http://www.corp.google.com/~yaowu/no_crawl/t8x8/hdmodebased8x8.htmlhttp://www.corp.google.com/~yaowu/no_crawl/t8x8/hdmodebased8x8_log.html
while the results are mixed and overall negative, it is interesting to
see 8x8 helped a few of the clips.
Patch 2:
this patch tries to hard-wire selection of transform size based on
prediction modes without using segmentation to signal the transform size.
encoder and decoder both takes the same assumption that all macroblocks
use 8x8 transform except when prediciton mode is B_PRED, I8X8_PRED or
SPLITMV. Test results are as follows:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cifmodebase8x8_0125.htmlhttp://www.corp.google.com/~yaowu/no_crawl/t8x8/hdmodebased8x8_0125log.html
Interestingly, by removing the overhead or coding the segmentation, the
results on this limited HD set have turn positive on average.
Patch 3:
this patch disabled the usage of 8x8 transform on key frames, and kept the
logic from patch 2 for inter frames only. test results on HD set turned
decidedly positive with 8x8 transform enabled on inter frame with 16x16
prediction modes: (avg psnr: .81% glb psnr: .82 ssim: .55%)
http://www.corp.google.com/~yaowu/no_crawl/t8x8/hdintermode8x8_0125.html
results on cif set still negative overall
Patch 4:
continued from last patch, but now in mode decision process, the rate and
distortion estimates are computed based on 8x8 transform results for MBs
with modes associated with 8x8 transform. This patch also fixed a problem
related to segment based eob coding when 8x8 transform is used. The patch
significantly improved the results on HD clips:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/hd8x8RDintermode.html
(avg psnr: 2.70% glb psnr: 2.76% ssim: 3.34%)
results on cif also improved, though they are still negative compared to
baseline that uses 4x4 transform only:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif8x8RDintermode.html
(avg psnr: -.78% glb psnr: -.86% ssim: -.19%)
Patch 5:
This patch does 3 things:
a. a bunch of decoder bug fixes, encodings and decodings were verified
to have matched recon buffer on a number of encodes on cif size mobile and
hd version of _pedestrian.
b. the patch further improved the rate distortion calculation of MBS that
use 8x8 transform. This provided some further gain on compression.
c. the patch also got the experimental work SEG_LVL_EOB to work with 8x8
transformed macroblock, test results indicates it improves the cif set
but hurt the HD set slightly.
Tests results on HD clips:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/HD_t8x8_20120201.html
(avg psnr: 3.19% glb psnr: 3.30% ssim: 3.93%)
Test results on cif clips:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif_t8x8_20120201.html
(avg psnr: -.47% glb psnr: -.51% ssim: +.28%)
Patch 6:
Added a frame level flag to indicate if 8x8 transform is allowed at all.
temporarily the decision is based on frame size, can be optimized later
one. This get the cif results to basically unchanged, with one bit per
frame overhead on both cif and hd clips.
Patch 8:
Rebase and Merge to head by PGW.
Fixed some suspect 4s that look like hey should be 64s in regard
to segmented EOB. Perhaps #defines would be bette.
Bulit and tested without T8x8 enabled and produces unchanged
output.
Patch 9:
Corrected misalligned code/decode of "txfm_mode" bit.
Limited testing for correct encode and decode with
T8x8 configured on derf clips.
Change-Id: I156e1405d25f81579d579dff8ab9af53944ec49c
This commit removed the macro CONFIG_I8X8, which was used to indicate
the 8x8 intra prediction experiment, made the change fully merged in.
Change-Id: Iafa4443781ce6e83f5591c12ba615a0e92ce0ea0
This commit enabled the usage of 8x8 intra prediction modes on inter
frames. There are a few TODO items related to this: 1)baseline entropy
need be calibrated; 2)cost of UV need to be done more properly rather
than using decision only relying on Y; 3)Threshold for allowing picking
8x8 intra prediction should be lowered to lower than the B_PRED.
Even with all the TODOs, tests showed consistent gain on derf set ~0.1%
(PSNR:0.08% and SSIM:0.14%). It is assumed that 8x8 intra prediction
will help more on large resolution clips, especially with above TODOs
addressed.
Change-Id: I398ada49dfc32575cfab962a569c2885111ae3ba
this commit is to add an variable in the macroblock level mode
info structure to track the transform size used in each MB, so
the information can be used later in the loop filter to change
how loop filter works on MBs with different transform sizes.
Change-Id: Id0eeaba6cc854c6d1be00ed8d237b3d9e250e447
Initial attempt at using new segment feature signaling
to indicate 4x4 or 8x8 transform.
needs --enable-experimental --enable-t8x8
Note this is work in progress.
Change-Id: Ib160d46a5d810307bfcbc79853ce1a65b5b870b7
No change to functionality or output.
Updates to the segment feature data structure now all done
through functions such as set_segdata() and get_segdata()
in seg_common.c.
The reason for this is to make changing the structures (if needed)
and debug easier.
In addition it provides a single location for subsequent addition
of range and validity checks. For example valid combination of
mode and reference frame.
Change-Id: I2e866505562db4e4cb6f17a472b25b4465f01add
Removal of configure #ifdefs so that segment features
always available. Removal of code supporting old
segment feature method.
Still a good deal of tidying up to do.
Change-Id: I397855f086f8c09ab1fae0a5f65d9e06d2e3e39f
Some correction for entropy impact of segment signaled (EOB and ref frame)
Other slight tweaks.
Derf VBR average gain now over 1% (best over 7%)
One YT test clip has gains of circa 30% (VBR)
There is still an issue with noisy clips where making the background static
and coded with 0,0 can have a negative effect, especially at low Q.
This is probably because of the loss of smoothing by fractional pixel filters.
Change-Id: I7a225613c98067b96f8fc7a7e36f95d465b2b834
This quite large check in includes the following:
Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
This is used as a basis for a simple segmentation for the normal frames
in a gf/arf group. This code also uses satd functions from Yaowu.
Adds functionality for coding the latest possible position of an EOB for
blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
Where the EOB position is 0 this acts like "skip" and the normal coding
of skip at the per mb level is disabled.
Added functions (seg_common.c) for setting and reading segment feature
elements. These may want to be optimized away at some point but while the
mecahnism is in a state of flux they provide a single location for making
changes and keep things a bit cleaner.
This is still proof of concept code. Currently the tested feature set:-
Quantizer,
Loop Filter level,
Reference frame,
Prediction Mode,
EOB end stop.
TBD:-
Add functions for setting and reading the feature data with range
and validity checking.
Handling of signed and unsigned feature data. At the moment all is assumed
to be signed and a sign bit is coded but many cannot be negative.
Correct handling of EOB feature with intra coded blocks.
Testing/trapping of legal/illegal ref frame and mode combinations.
Transform size switch plus merge and test with 8c8 DCT work
Merge and test with Sumans Segmenation coding optimizations
Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
Patch 1 to Patch 3 is an initial implementation of 8x8 intra prediction
modes, here are with the following assumptions:
a. 8x8 has 4 prediction modes DC, H, V and TM
b. UV 4x4 block use the same mode as corresponding 8x8 area
c. i8x8 modes are enabled for key frame only for now
Patch 4:
d. removed debug code from previous patches
Patch 5:
e. added stats code to collect entropy stats and further cleaned up
Patch 6:
f. changed mode stats code to collect finer stats of modes
Patch 7:
g. normalized i8x8 modes distribution to total at 256 (8bits).
Patch 8:
h. fixed a bug in decoder and removed debug printf output.
Patch 9:
i. more cleanups to address paul's comment
Patch 10:
j. messy rebase/merges to bring the commit up to date.
Tests on HD clips encoded with all key frame showing consistent gain
on all clips and all metrics:~0.5%(psnr) and 0.6%(ssim):
http://www.corp.google.com/~yaowu/no_crawl/i8x8hd_allkey_fixedq.html
To build and test, configure with:
--enable-experimental --enable-i8x8
Change-Id: I9813fe07ae48cab5fdb5d904bca022514ad01e7f
With this fix, the experimental branch now builds and encodes correctly
with the following two configure options respectively:
--enable-experimental --enable-t8x8
--enable-experimental
Change-Id: I3147c33c503fe713a85fd371e4f1a974805778bf
Optimized C-code of the following functions:
- vp8_tokenize_mb
- tokenize1st_order_b
- tokenize2nd_order_b
Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
depending on encoding parameters.
Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c
There were many instances in the code of vp8_coef_tokens and
vp8_coef_tokens-1, which was a preprocessor macro despite the naming
convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
respectively.
Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da
The dc_diff flag is used to skip loopfiltering. Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.
Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.
These symbols were identified by:
$ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
| sort | grep '^ *1 '
Change-Id: I59609f58ab65312012c047036ae1e0634f795779
Per John's previous change, shrink TOKENEXTRA from 20 to 8 bytes
original: b7b1e6fb
reverted: 41f4458a
Also drop unused field from vp8_extra_bit_struct
Update ARM ASM to deal with this change. In particular, Extra is signed
and needs to be sign-extended when loaded.
Change-Id: Ibd0ddc058432bc7bb09222d6ce4ef77e93a30b41
Change the size of structure elements to reduce memory utilization.
Removed the 'section' member entirely, as it is set but never read.
Change-Id: Iad043830392fb4168cb3cd6075fb0eb70c7f691c
This patch moves the scattered updates to the mb skip state
(mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent
changes to the quantizer exposed a bug where if a macroblock
could be coded as a skip but isn't, the encoder would run the
loopfilter but the decoder wouldn't, causing a reference buffer
mismatch.
The loopfilter is controlled by a flag called dc_diff. The decoder
looks at the number of decoded coefficients when setting this flag.
The encoder sets this flag based on the skip state, since any
skippable macroblock should be transmitted as a skip. The coefficient
optimization pass (vp8_optimize_b()) could change the coefficients
such that a block that was not a skip becomes one. The encoder was
not updating the skip state in this situation for intra coded blocks.
The underlying issue predates it, but this bug was recently triggered
by enabling trellis quantization on the Y2 block in commit dcd29e3,
and by changing the quantizer range control in commit 305be4e.
Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802
This patch reduces the size of the global tables maintained by the
tokenizer to 16k from 80k-96k. See issue #177.
Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
The main reason for the change was to reduce cycles in the token
decoder. (~1.5% gain for 32 bit) This layout should be more
cache friendly.
As a result of this change, the encoder had to be updated.
Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837
Note: dixie uses a similar layout
These copies occurred for each macroblock in the encoder and decoder.
Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result,
a large number compile errors had to be fixed.
Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
Replace the exponential search for optimal rounding during
quantization with a linear Viterbi trellis and enable it
by default when using --best.
Right now this operates on top of the output of the adaptive
zero-bin quantizer in vp8_regular_quantize_b() and gives a small
gain.
It can be tested as a replacement for that quantizer by
enabling the call to vp8_strict_quantize_b(), which uses
normal rounding and no zero bin offset.
Ultimately, the quantizer will have to become a function of lambda
in order to take advantage of activity masking, since there is
limited ability to change the quantization factor itself.
However, currently vp8_strict_quantize_b() plus the trellis
quantizer (which is lambda-dependent) loses to
vp8_regular_quantize_b() alone (which is not) on my test clip.
Patch Set 3:
Fix an issue related to the cost evaluation of successor
states when a coefficient is reduced to zero. With this
issue fixed, now the trellis search almost exactly matches
the exponential search.
Patch Set 2:
Overall, the goal of this patch set is to make "trellis"
search to produce encodings that match the exponential
search version. There are three main differences between
Patch Set 2 and 1:
a. Patch set 1 did not properly account for the scale of
2nd order error, so patch set 2 disable it all together
for 2nd blocks.
b. Patch set 1 was not consistent on when to enable the
the quantization optimization. Patch set 2 restore the
condition to be consistent.
c. Patch set 1 checks quantized level L-1, and L for any
input coefficient was quantized to L. Patch set 2 limits
the candidate coefficient to those that were rounded up
to L. It is worth noting here that a strategy to check
L and L+1 for coefficients that were truncated down to L
might work.
(a and b get trellis quant to basically match the exponential
search on all mid/low rate encodings on cif set, without
a, b, trellis quant can hurt the psnr by 0.2 to .3db at
200kbps for some cif clips)
(c gets trellis quant to match the exponential search
to match at Q0 encoding, without c, trellis quant can be
1.5 to 2db lower for encodings with fixed Q at 0 on most
derf cif clips)
Change-Id: Ib1a043b665d75fbf00cb0257b7c18e90eebab95e
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.
Change-Id: I236c05fade06589e417179c0444cb39b09e4200d