Remove default cortex-a8 tuning.
Probably not even the dominant platform the library is being built for.
Add --cpu= option description to help. The option already exists.
Don't allow passing just --cpu as a no-op.
BUG=826
Change-Id: Iaa3f4f693ec78b18927b159b480daafeba0549c0
Under the experiment of CONFIG_LAST4_REF. On derflr testset, using
highbitdepth (HBD), in average PSNR,
(1) LAST2+LAST3+LAST4 obtained +0.361% against LAST2+LAST3;
(2) LAST2+LAST3+LAST4 obtained +1.567% against baesline.
Change-Id: Ic8b14272de6a569df2b54418fa72b505e1ed3aad
Under experiment CONFIG_LAST3_REF, which can only be turned on when
the experiment of CONFIG_MULTI_REF is on, i.e. LAST3_FRAME can only
be used when LAST2_FRAME is used. CONFIG_LAST3_REF would most likely
be combined with CONFIG_MULTI_REF once the performance improvement
is further confirmed.
On the testset of derflr, using Average PSNR metrics, with HighBitDepth
(HBD) on:
(1) LAST2 HBD obtained +0.579% against base HBD;
(2) LAST2 + LAST3 HBD obtained +0.591% against LAST2 HBD;
(3) LAST2 + LAST3 HBD obtained +1.173% against base HBD.
Change-Id: I1aa2b2e2d2c9834e5f8e61bf2d8818c7b1516669
Under the experiment CONFIG_MULTI_REF. Current version shows
LAST2 vs base in nextgen on the testset of derflr:
(1) 8-bit: Average PSNR +0.53%
(worst: students_cif: -0.247%; best: mobile_cif: 1.902%)
(2) 12-bit HBD: Average PSNR +0.63%
(worst: pamphlet_cif: -0.213%, best: mobile_cif: 2.101%)
More tuning on the reference frame context design and default
probs are being conducted. This version does not guarantee to
work with other experiments in nextgen. A separate CL will address
the working with all other experiments.
Change-Id: I7f40d2522517afc26ca389c995bad56989587f65
CONFIG_SR_MODE=1, enable SR mode
USE_POST_F=1, enable SR post filter
SR_USE_MULTI_F=1, enable SR post filter family
Not compatible with other experiments yet
Change-Id: I116f1d898cc2ff7dd114d7379664304907afe0ec
Adds an additional transform in the ext_tx experiment that
is a 2d DST1-DST1 combination.
To enable use --enable-ext-tx --enable-dst1.
This needs to be later extended to combine DST1 with DCT
or ADST.
Change-Id: I6d29f1b778ef8294bcfb6a512a78fc5eda20723b
There are 4 entropy tables to select for initial entropy table,
depending on the frame base q-index. The entropy tables are
trained with derf, yt, and stdhd sets. About 0.2% gain on
the following test sets:
derflr 0.227%
yt 0.277%
stdhd 0.233%
hevclr 0.221%
hevcmr 0.155%
hevchr 0.182%
Change-Id: I3fde846c47fc020e80c814897690b4cda1da569c
Change-Id: I460408372586c823974f945ed9fd8dcb0360fbaf
The wavelets implemented are 2/6, 5/3 and 9/7 each with
a lifting based scheme for even block sizes. The 9/7
one is a double implementation currently.
This is to start experiments with:
1. Replacing large transforms (32x32 and 64x64) with wavelets
or wavelet-dct hybrids that can hopefully localize errors better
spatially. (Will also need alternate entropy coder)
2. Super-resolution modes where the higher sub-bands may be
selectively skipped from being conveyed, while a smart
reconstruction recovers the lost frequencies.
The current patch includes two types of 32x32 and 64x64
transforms: one where only wavelets are used, and another
where a single level wavelet decomposition is followed
by a lower resolution dct on the low-low band.
Change-Id: I2d6755c4e6c8ec9386a04633dacbe0de3b0043ec
Skip flag is removed from the bitstream for all blocks coded in intra
mode. Very minor coding gain in derf and stdhd sets (0.048 and 0.1)
Change-Id: I79f03300f16d6fa84ce54405cafecab8a021cd7d
This commit makes the bit-stream syntax support fast selective tile
decoding in a large scale tile array. It reduces the computational
complexity of computing the target tile offset in the bit-stream
from quadratic to linear scale, while maintaining relatively small
stack space requirement (in the order of 1024 bytes instead of 1M
bytes). The overhead cost due to tile separation remains identical.
Change-Id: Id60c6915733d33a627f49e167c57d2534e70aa96
Now this is an on-going work on the re-work on the motion vector
references for all Inter coding modes. Currently implementation on
sub8x8 and BLOCK_8X8 is done. More work will be added along the way.
Essetial ideas include:
(1) Added new nearestmv algorithm through adaptive median search, out of
the four nearest neighbors: TOP, LEFT, TOPLEFT, and TOPRIGHT;
(2) Added new sheme for sub8x8 to obtain the mv ref candidates,
specially, mv ref candidates are obtained depending on the sub8x8 mode
of the current block as well as the sub8x8 mode of the neighboring
block;
(3) Added top right corner mv ref candidate whenever it is available;
The adding of the top right mv ref has showed potential in helping such
video clips as bridge_far_cif.
Change-Id: I573c04cd346ed7010f4ad87a6eaa6bab6e2caf9c
Fix the row tile boundary detection issues. This allows to use
more resources for parallel encoding/decoding when avaiable.
Change-Id: Ifda9f66d1d7c2567dd4e0a572a99a83f179b55f9
Adds a framework to incorporate a parameterized loop
postfilter in the coding loop after the application of the
standard deblocking loop filter.
The first version uses a straight bilateral filter
where the parameters conveyed are just spatial and
intensity gaussian variances.
Results on derflr:
+0.523% (only with this experiment)
+6.714% (with all expts other than intrabc)
Change-Id: I20d47285b4d25b8c6386ff8af2a75ff88ac2b69b
This experiment, referred as NEWMVREF, also merged with NEWMVREF_SUB8X8
and the latter one has been removed. Runborgs results show that:
(1) Turning on this experiment only, compared against the base:
derflf: Average PSNR 0.40%; Overall PSNR 0.40%; SSIM 0.35%
(2) Turning on all the experiments including this feature, compared against
that without this feature, on the highbitdepth case using 12-bit:
derflf: Average PSNR 0.33%; Overall PSNR 0.32%; SSIM 0.30%.
Now for highbitdepth using 12-bit, compared against base:
derflf: Average PSNR 11.12%; Overall PSNR 11.07%; SSIM 20.27%.
Change-Id: Ie61dbfd5a19b8652920d2c602201a25a018a87a6
This framework allows lower quantization bins to be shrunk down or
expanded to match closer the source distribution (assuming a generalized
gaussian-like central peaky model for the coefficients) in an
entropy-constrained sense. Specifically, the width of the bins 0-4 are
modified as a factor of the nominal quantization step size and from 5
onwards all bins become the same as the nominal quantization step size.
Further, different bin width profiles as well as reconstruction values
can be used based on the coefficient band as well as the quantization step
size divided into 5 ranges.
A small gain currently on derflr of about 0.16% is observed with the
same paraemters for all q values.
Optimizing the parameters based on qstep value is left as a TODO for now.
Results on derflr with all expts on is +6.08% (up from 5.88%).
Experiments are in progress to tune the parameters for different
coefficient bands and quantization step ranges.
Change-Id: I88429d8cb0777021bfbb689ef69b764eafb3a1de
For all sub8x8 partition, the mv ref has changed to its own nearest_mv
instead of the nearest_mv of the super 8x8 block:
--enable-newmvref-sub8x8: ~0.1% gain for derflr
Besides the above new experiment, code has been cleaned largely for the
sub8x8 motion search, such that the mv ref can be fairly easily changed to
use other options, e.g., the next step's global motion effort.
Change-Id: I8e3f4aaa8553ba8c445369692e079db5ce282593
Added palette coding option for Y channel only, key frame only.
Palette colors are obtained with k-means clustering algorithm.
Run-length coding is used to compress the color indices.
On screen_content:
--enable-palette + 5.75%
--enable-palette --enable-tx_skip +15.04% (was 13.3%)
On derflr:
--enable-palette - 0.03%
with all the other expriments + 5.95% (was 5.98%)
Change-Id: I6d1cf45c889be764d14083170fdf14a424bd31b5
Added a function to compute a motion field for a pair of buffers, for use in
finding an affine transform or homography.
Change-Id: Id5169cc811a61037e877dfd57fccaca89d93936f
COMPOUND_MODES experiment encodes separate MV modes for each frame in a compound
reference prediction. Added modes NEAREST_NEARESTMV, ZERO_ZEROMV,
NEW_NEWMV, NEAREST_NEARMV, NEAR_NEARESTMV, NEW_NEARESTMV, NEAR_NEWMV,
NEW_NEARMV, and NEAREST_NEWMV.
Also enhances the wedge-partition expt to work better with compound
modes.
Results:
derflr +0.227
All experiments on: derflr +5.218
Change-Id: I719e8a34826bf1f1fe3988dac5733a845a89ef2b
A smooth weighting scheme is used to put more weight
on the intra predictor samples near the left/top boundaries
and decaying it to favor the inter predictor samples more as
we move away from these boundaries in the direction of
prediction.
Results:
derflr: +0.609% with only this experiment
derflr: +3.901% with all experiments
Change-Id: Ic9dbe599ad6162fb05900059cbd6fc88b203a09c
Reimplements the supertx experiment from the playground branch.
Makes it work with other experiments.
Results:
With --enable-superttx
derflr: +0.958
With --enable-supertx --enable-ext-tx
derflr: +2.25%
With --enable-supertx --enable-ext-tx --enable-filterintra
derflr: +2.73%
Change-Id: I5012418ef2556bf2758146d90c4e2fb8a14610c7
Non-transform option is enabled in both intra and inter modes.
In lossless case, the average coding gain on screen content
clips is 11.3% in my test.
Change-Id: I2e8de515fb39e74c61bb86ce0f682d5f79e15188
Preliminary 64x64 transform implementation.
Includes all code changes.
All mismatches resolved.
Coding results for derf and stdhd are within noise. stdhd is slightly
higher, derf is slightly lower.
To be further refined.
Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation though the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also incudes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
Adds config parameter vp9_highbitdepth, to support highbitdepth profiles.
Also includes most vpx level high bit-depth functions. However
encode/decode in the highbitdepth profiles will not work until
the rest of the code is in place.
Change-Id: I34c53b253c38873611057a6cbc89a1361b8985a6
This commit adds a configure time option used to enable strict error
checking in decoder to make sure intermediate stage cofficients of
inverse transforms are within valid range of signed 16 bit integer.
For valid VP9 input streams, intermediate stage coefficients should
always stay within the range of a signed 16 bit integer. Coefficients
can go out of this range for invalid/corrupt VP9 streams. However,
strictly checking this range for every intermediate coefficient can
be a burden for decoder, therefore such validation is only enabled
with configure option --enable-coefficient-range-checking.
Change-Id: I47d47c8c4e48a922c3d223ca59064f51b3f0f5ed
the code tied to CONFIG_MULTIPLE_ARF was deleted in:
2611022 Clean out old CONFIG_MULTIPLE_ARF code.
Change-Id: Ie70bf047cde7e88d4b3996c8ff529e409bbe99e2