A smooth weighting scheme is used to put more weight
on the intra predictor samples near the left/top boundaries
and decaying it to favor the inter predictor samples more as
we move away from these boundaries in the direction of
prediction.
Results:
derflr: +0.609% with only this experiment
derflr: +3.901% with all experiments
Change-Id: Ic9dbe599ad6162fb05900059cbd6fc88b203a09c
In lossless coding, tx_size can be larger than 4x4 when transform
skipping is activated. Compared to regular vp9 lossless coding,
performance improvement for derf is about 5%; gain is larger
for screen content videos.
Change-Id: Ib20ece7e117f29fb91543612757302a2400110b4
This patch improves the non-transform coding mode. At this
point, the coding gain on screen content videos is about
12% for lossless, an 15% for lossy case.
1. Encode tx_skip flags with context. Y tx_skip flag context is
whether the prediction mode is inter or intra. UV flag context
is Y flag.
2. Transform skipping is less helpful when the Q-index is high.
So it is enabled only when the Q-index is smaller than a
threshold. Currently the threshold is set as 255 for intra blocks,
and 0 for inter blocks.
3. The shift of the prediction residue, when copying them to the
coeff buffer, is set as 3 when the Q-index is larger than a
threshold (currently set as 0), and 2 otherwise.
Change-Id: I372973c7518cf385f6e542b22d0f803016e693b0
Reimplements the supertx experiment from the playground branch.
Makes it work with other experiments.
Results:
With --enable-superttx
derflr: +0.958
With --enable-supertx --enable-ext-tx
derflr: +2.25%
With --enable-supertx --enable-ext-tx --enable-filterintra
derflr: +2.73%
Change-Id: I5012418ef2556bf2758146d90c4e2fb8a14610c7
Extends the ext-tx experiment to include regular and flipped
DST variants. A total of 9 transforms are thus possible for
each inter block with transform size <= 16x16.
In this patch currently only the four ADST_ADST variants
(flipped or non-flipped in both dimensions) are enabled
for inter blocks.
The gain with the ext-tx experiment grows to +1.12 on derflr.
Further experiments are underway.
Change-Id: Ia2ed19a334face6135b064748f727fdc9db278ec
Non-transform option is enabled in both intra and inter modes.
In lossless case, the average coding gain on screen content
clips is 11.3% in my test.
Change-Id: I2e8de515fb39e74c61bb86ce0f682d5f79e15188
Preliminary 64x64 transform implementation.
Includes all code changes.
All mismatches resolved.
Coding results for derf and stdhd are within noise. stdhd is slightly
higher, derf is slightly lower.
To be further refined.
Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b
When early termination is triggered, properly reset the rate cost
to invalid value to avoid potential ioc issue.
Change-Id: I3444390be2e49a34bb02cf8a74c33d5dbd96d88d
This commit fixes an ioc issue that will happen when the cumulative
variables are not in effective use. The fix discards these
redundant additions.
Change-Id: Idbac5bfb989c0cedc5f8a323effce938519b2457
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.
Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
- Some fixes to surface fit.
- Returns variance function as cost rather than sad in the
pattern search and diamond search functions. Only
vp9_pattern_search_sad function used in bigdia search
uses sad as integer 1-away costs.
- Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.
Results:
derf [Speed 3]: About +0.036% in coding efficiency without any
discernible speed loss.
derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.
Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
In model_rd_for_sb function, the spatial domain SSE and variance
are checked to see if transform coefficients are quantized to 0.
Besides that, this patch adds another set of thresholds that are
much more strict. These thresholds are used to conduct a partition
block level check to measure if all its TX blocks are skippable
for YUV planes. If it is true, x->skip is set for this partition
block, and thus its mode search is terminated.
This speeds up the encoding at very low prediction error case,
such as screen sharing application. This patch covers what
rd_encode_breakout_test() does, so that function is removed.
Borg test at speed 3 shows:
For stdhd set, psnr: +0.008%, ssim: +0.014%;
For derf set, psnr: +0.018%, ssim: +0.025%.
No noticeable speed change.
Change-Id: I4e5f15cf10016a282a68e35175ff854b28195944
Fixed an encoder crash. Set skip_txfm to 0 for cases that skip_txfm
isn't calculated. Put memcpy of skip_txfm at right place.
Change-Id: Ib3b6afc1b251a85b2a853c8138fb3393f48cfef6
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.
Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
This commit fixes a buffer pointer mis-use in store_coding_context.
The compression performance for stdhd set of speed 3 is improved by
0.097%. It fixes issue 869.
Change-Id: Idc59e22035eaf39f7133ca04174894374d647ff7
It is possible that the GOLDEN reference frame is not avaiable, in
which setting the predicted mv will be associated with a residual
value of INT_MAX. This commit checks this condition before
left shift and comparison with that of ALTREF frame, to avoid
overflow issue.
Change-Id: Ib98c3149dbdd016f2fe5beaafb13f67d469dd07c
This commit enables the encoder to skip split partition search if
the bigger block size has all non-zero quantized coefficients in low
frequency area and the total rate cost is below a certain threshold.
It logarithmatically scales the rate threshold according to the
current block size. For speed 3, the compression performance loss:
derf -0.093%
stdhd -0.066%
Local experiments show 4% - 20% encoding speed-up for speed 3.
blue_sky_1080p, 1500 kbps
51051 b/f, 35.891 dB, 67236 ms ->
50554 b/f, 35.857 dB, 59270 ms (12% speed-up)
old_town_cross_720p, 1500 kbps
14431 b/f, 36.249 dB, 57687 ms ->
14108 b/f, 36.172 dB, 46586 ms (19% speed-up)
pedestrian_area_1080p, 1500 kbps
50812 b/f, 40.124 dB, 100439 ms ->
50755 b/f, 40.118 dB, 96549 ms (4% speed-up)
mobile_calendar_720p, 1000 kbps
10352 b/f, 35.055 dB, 51837 ms ->
10172 b/f, 35.003 dB, 44076 ms (15% speed-up)
Change-Id: I412e34db49060775b3b89ba1738522317c3239c8