Disable SUPERTX mode when lossless mode is enabled because the largest
lossless transform size is 4X4.
Change-Id: I4167959a282728d62354119ced99ca29febabfd1
In lossless coding, tx_size can be larger than 4x4 when transform
skipping is activated. Compared to regular vp9 lossless coding,
performance improvement for derf is about 5%; gain is larger
for screen content videos.
Change-Id: Ib20ece7e117f29fb91543612757302a2400110b4
This patch improves the non-transform coding mode. At this
point, the coding gain on screen content videos is about
12% for lossless, an 15% for lossy case.
1. Encode tx_skip flags with context. Y tx_skip flag context is
whether the prediction mode is inter or intra. UV flag context
is Y flag.
2. Transform skipping is less helpful when the Q-index is high.
So it is enabled only when the Q-index is smaller than a
threshold. Currently the threshold is set as 255 for intra blocks,
and 0 for inter blocks.
3. The shift of the prediction residue, when copying them to the
coeff buffer, is set as 3 when the Q-index is larger than a
threshold (currently set as 0), and 2 otherwise.
Change-Id: I372973c7518cf385f6e542b22d0f803016e693b0
Reimplements the supertx experiment from the playground branch.
Makes it work with other experiments.
Results:
With --enable-superttx
derflr: +0.958
With --enable-supertx --enable-ext-tx
derflr: +2.25%
With --enable-supertx --enable-ext-tx --enable-filterintra
derflr: +2.73%
Change-Id: I5012418ef2556bf2758146d90c4e2fb8a14610c7
Extends the ext-tx experiment to include regular and flipped
DST variants. A total of 9 transforms are thus possible for
each inter block with transform size <= 16x16.
In this patch currently only the four ADST_ADST variants
(flipped or non-flipped in both dimensions) are enabled
for inter blocks.
The gain with the ext-tx experiment grows to +1.12 on derflr.
Further experiments are underway.
Change-Id: Ia2ed19a334face6135b064748f727fdc9db278ec
Non-transform option is enabled in both intra and inter modes.
In lossless case, the average coding gain on screen content
clips is 11.3% in my test.
Change-Id: I2e8de515fb39e74c61bb86ce0f682d5f79e15188
Extends quantizer range and some other fixes.
Change-Id: Ia9adf7848e772783365d6501efd07585bff80c15
stdhd: +0.166%, turns positive a little
derf: -0.036% (very few blocks use the 64x64 mode)
Preliminary 64x64 transform implementation.
Includes all code changes.
All mismatches resolved.
Coding results for derf and stdhd are within noise. stdhd is slightly
higher, derf is slightly lower.
To be further refined.
Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b
This commit refactors the rate distortion structure used in the
non-RD coding mode and saves a few RDCOST calculations.
Change-Id: I62c3416c300d2c5372f21b96d93a6b633a34ab3a
The existing speed features produce horrible encoding results, almost
30% worse than cpu-used=4, this commit adjust the speed features to
produce relatively resonable results to be within 3%-5% of cpu-used=4.
Change-Id: I0ca6ebafb33024d4a0cbcf04c78a4a00b8dd1ecf
Its functionality has been replaced with choose_partitioning and
threshold based control on split mode check.
Change-Id: Ic9bb321df06b524f5c38ea5874dc6f6a8f93c5e3
This speed feature has been deprecated in both yt and rtc coding
modes. This commit removes the related operations.
Change-Id: I079c79c6adafe45581af2ebf8b98faebcface1ce
This commit re-designs the recursive partition search scheme in
rtc speed -5. It first checks if the current block is under cyclic
refresh mode. If so, apply recursive partition search. Otherwise,
perform sub-sampled pixel based partition selection. When the
pre-selection finds the partition size should be 32x32 or above,
use the partition size directly. Otherwise, apply partition search
at nearby levels around the preset partition size.
It is enabled in speed -5. The compression performance of rtc
speed -5 is improved by 9.4%. Speed wise, the run-time goes slower
from 1% to 10%.
nik_720p, 1000 kbps
33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms
vidyo1_720p, 1000 kbps
16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms
Change-Id: I65adba352e3adc03bae50854ddaea1b421653c6c
Extend --auto-alt-ref from parameter so we can use it to
turn multi-arf on and off from the command line.
For now the range is 0-off, 1-on, 2-multi-arf on.
Rename play_alternate to enable_auto_arf
Change-Id: Id7b64407cfbe76ba0090a83b588a03e22a240386
All sad function that process above 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%
both of them gave and overall ~2.3% user level gain
Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/
(a92f987a6b7819ae5c62a429e126e1c26bdb1b71) on highbitdepth branch.
Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8
Currently, the tokens for a tile are stored immediately after its
preceding tile, which causes a dependency. This is unnecessary
since we always allocate enough memory for tokens. Removing
the dependency allows token writing done in parallel. This patch
doesn't change encoding result.
Change-Id: I7365a6e5e2c2833eb14377c37e1503c9d0f26543