Runborgs results on derflr show consistent results between NEW_INTER
and the previous combination of NEWMVREF and COMPOUND_MODES.
Change-Id: Ieba239c4faa7f93bc5c05ad656a7a3b818b4fbfc
Fix the row tile boundary detection issues. This allows to use
more resources for parallel encoding/decoding when avaiable.
Change-Id: Ifda9f66d1d7c2567dd4e0a572a99a83f179b55f9
Besides code cleaning, this patch contains 3 fixes:
(1) Fixed the COMPOUND_MODES for the NEW_NEWMV mode;
(2) Fixed the joint search when the NEAR_FORNEWMV mode (in NEWMVREF)
is being evaluated;
(3) Fixed the WEDGE_PARTITION when the NEAR_FORNEWMV mode (in NEWMVREF)
is being evaluated.
(4) Adjusted the entropy probability value for NEAR_FORNEW mode.
On derflr turning on all 14 experiments (except for global-motion), the
average gain w.r.t. PSNR is +0.07%:
Maximum on bridge_far_cif: +1.02%
Minimum on hallmonitor_cif: -0.16%
Change-Id: I4c9c6ee24a981af7e655a629580641d9f9745f91
Use separate token probabilities and counters for non-transform
blocks (pixel domain) . Initial probabilities are trained with screen_content
clips. On screen_content, it improves coding performance by about
2% (from +16.4% to +18.45%).
The initial probabilities are not optimized for natural videos. So it should
not be used for natural videos. Set FOR_SCREEN_CONTENT as 0/1 to specify
whether or not to enable this patch.
Change-Id: Ifa361c94bb62aa4b783cbfa50de08c3fecae0984
Implements a first version of global motion where the
existing ZEROMV mode is converted to a translation only
global motion mode.
A lot of the code for supporting a rotation-zoom affine
model is also incorporated.
WIP.
Change-Id: Ia1288a8dfe82f89484d4e291780288388e56d91b
The 8x8 DCT uses a fast version whenever possible.
There was a mistake in the checking code which
meant sometimes the fast version was used when it
was not safe to do so.
Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7
In the case when there are only non-zero coefficients
in the first 4x4 block a special routine is called.
The highbitdepth optimized version of this routine
examined the wrong positions when deciding whether
to call an assembler or C inverse transform.
Change-Id: I62da663ca11775dadb66e402e42f4a1cb1927893
This is a combination of:
4a19fa6 Added sse2 acceleration for highbitdepth variance
c6f5d3b Fix high bit depth assembly function bugs
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.
The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected. For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.
Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
It is a minor change, but the essential idea is to use the mv of the
top right block as the nearmv for the bottom left partition in the
sub8x8 block. The change is under the experiment of NEWMVREF.
When all 13 experiments are on (except for INTRABC), the gain is +0.05%:
Worse on bowing_cif: -0.17%
Best on foreman_cif: +0.42%; and bridge_far_cif: +0.40%
The total 13 experiments achieved a gain of +6.97% against base.
Change-Id: I3a51d9e28b34b0943fe16a984d62bfb38304ebca
Do not treat first element (dc) differently.
on screen_content
tx-skip only: +16.4% (was +15.45%)
no significant impact on natrual videos
Change-Id: I79415a9e948ebbb4a69109311c10126d8a0b96ab
Changes include:
* Uses double for RD cost computation to guard against overflow
for large resolution frames.
* Use previous frame's filter level to code the level better.
* Change precision of the filter parameters.
* Allow spatial variance for x and y to be different
Change-Id: I1669f65eb0ab1e8519962954c92d59e04f1277b7
derflr: +0.556% (a little up from before)
Use raster scan order for non-transform blocks
+15.45% (+2.1%) on screen_content
no significant change on natural videos
Change-Id: I0e264cb69e8624540639302d131f7de9c31c3ba7
Adds an internal buffer in the encoder to store the deblocked
result to help speed up the search for the best bilateral filter.
Very small change in performance but a lot faster:
derflr: +0.518%
Change-Id: I5d37e016088e559c16317789cfb1c2f49334b2b9
+0.3% on 10-bit
+0.3% on 12-bit
With other high bit compatible experiments on 12-bit
+12.44% (+0.17) over 8-bit baseline
Change-Id: I40b4c382fa54ba4640d08d9d01950ea8c1200bc9
Adds a framework to incorporate a parameterized loop
postfilter in the coding loop after the application of the
standard deblocking loop filter.
The first version uses a straight bilateral filter
where the parameters conveyed are just spatial and
intensity gaussian variances.
Results on derflr:
+0.523% (only with this experiment)
+6.714% (with all expts other than intrabc)
Change-Id: I20d47285b4d25b8c6386ff8af2a75ff88ac2b69b
This patch allows the prediction residues of tx-skipped blocks
to use probs that are different from regular transfrom
coefficients for token entropy coding. Prediction residues are
assumed as in band 6.
The initial value of probs is obtained with stats from limited
tests. The statistic model for constrained token nodes has not
been optimized. The probs for token extra bits have not been
optimized. These can be future work.
Certain coding improvment is observed:
derflr with all experiments: +6.26% (+0.10%)
screen_content with palette: +22.48% (+1.28%)
Change-Id: I1c0d78178ee9f3655febb6f30cdaef8ee9f8e3cc
This experiment, referred as NEWMVREF, also merged with NEWMVREF_SUB8X8
and the latter one has been removed. Runborgs results show that:
(1) Turning on this experiment only, compared against the base:
derflf: Average PSNR 0.40%; Overall PSNR 0.40%; SSIM 0.35%
(2) Turning on all the experiments including this feature, compared against
that without this feature, on the highbitdepth case using 12-bit:
derflf: Average PSNR 0.33%; Overall PSNR 0.32%; SSIM 0.30%.
Now for highbitdepth using 12-bit, compared against base:
derflf: Average PSNR 11.12%; Overall PSNR 11.07%; SSIM 20.27%.
Change-Id: Ie61dbfd5a19b8652920d2c602201a25a018a87a6
The basic idea is to use a pixel’s neighboring colors as
context to predict its own color. Up to 4 neighbors are
considered here: left, left-above, above, right-above.
To reduce the number of contexts, the combination of any
4 (or less) colors are mapped to a reduced number of
patterns. For example, 1111, 2222, 3333, … , can be mapped
to the same pattern: AAAA. SImilarly, 1122, 1133, 2233, …,
can be mapped to the pattern AABB. In this way, the total
number of color contexts is reduced to 16.
This almost doubles the gain of palette coding on screen
content videos.
on screen_content
--enable-palette +14.2%
--enable-palette --enable-tx-skip +21.2%
on derflr
--enable-palette +0.12%
with all other experiments +6.16%
Change-Id: I560306dae216f2ac11a9214968c2ad2319fa1718
Also make changes to transmit palette-enabled flag using
neighbor blocks as context.
on screen_content
--enable-palette +7.35%
on derflr
with all other experiments +6.05%
Change-Id: Id6c2f726d21913d54a3f86ecfea474a4044c27f6
on screen_content
--enable-palette +6.74%
on derflr
with all other experiments +6.02%
(--enable-supertx --enable-copy-mode
--enable-ext-tx --enable-filterintra
--enable-tx64x64 --enable-tx-skip
--enable-interintra --enable-wedge-partition
--enable-compound-modes --enable-new-quant
--enable-palette)
Change-Id: Ib85049b4c3fcf52bf95efbc9d6aecf53d53ca1a3
This framework allows lower quantization bins to be shrunk down or
expanded to match closer the source distribution (assuming a generalized
gaussian-like central peaky model for the coefficients) in an
entropy-constrained sense. Specifically, the width of the bins 0-4 are
modified as a factor of the nominal quantization step size and from 5
onwards all bins become the same as the nominal quantization step size.
Further, different bin width profiles as well as reconstruction values
can be used based on the coefficient band as well as the quantization step
size divided into 5 ranges.
A small gain currently on derflr of about 0.16% is observed with the
same paraemters for all q values.
Optimizing the parameters based on qstep value is left as a TODO for now.
Results on derflr with all expts on is +6.08% (up from 5.88%).
Experiments are in progress to tune the parameters for different
coefficient bands and quantization step ranges.
Change-Id: I88429d8cb0777021bfbb689ef69b764eafb3a1de
For 444 videos, a single palette of 3-d colors is
generated for YUV. For 420 videos, there may be two
palettes, one for Y, and the other for UV.
Also fixed a bug when palette and tx-skip are both on.
on derflr
--enable-palette +0.00%
with all experiments +5.87% (was +5.93%)
on screen_content
--enable-palette +6.00%
--enable-palette --enable-tx_skip +15.3%
on screen_content 444 version
--enable-palette +6.76%
--enable-palette --enable-tx_skip +19.5%
Change-Id: I7287090aecc90eebcd4335d132a8c2c3895dfdd4
All stats look fine.
derflr: +0.912 with respect to 10-bit internal baseline
(Was +0.747% w.r.t. 8 bit)
+5.545 with respect to 8-bit baseline
Change-Id: I3c14fd17718a640ea2f6bd39534e0b5cbe04fb66