The commit rationized and simplified the entropy context conversion
betwen MB using 8x8 transform and MB using 4x4 transform. The old version
had a number of weirdness in how 4x4 transform MB's context is used for
8x8 blocks other than the first 8x8 within a MB.
Test showed the change has a gain ~.1% for avg psnr, glb psnr and ssim on
the limited HD set.
Change-Id: I774536c416baa6845aa741f956d8a69fa40e5d47
Added a bit to signify that the feature changed since
the last time we sent it, or not so that we don't need
to send all the databits for every feature change.
added config
Change-Id: I8d3064ce90d4500bf0d5c6b87c664e46138dfcac
Merged in most of the current common prediction changes
that were under the #if CONFIG_COMPRED option.
Change-Id: If4e6f61dbe7b86dd449f6effbe93b5eb7e893885
This commit merges the NEWNEAR experiment such that it
is effectively always on.
The fact that there were changes in the threading code again
highlights the need to strip out such features during the
bitstream development phase as trying to maintain this code
(especially as it is not being tested) slows the development cycle.
Change-Id: I8b34950a1333231ced9928aa11cd6d6459984b65
This function adds the common prediction modules, some data structures
and a config option but does not use them.
It also corrects a bug in clearing down the MODE_INFO border and introduces
a new element that indicates if an entry corresponds to an "in image" macro block
or is part of the border.
Change-Id: Ib69eec0876173ebe9d1de9df9537d0b2447702e0
This commit removed the macro CONFIG_I8X8, which was used to indicate
the 8x8 intra prediction experiment, made the change fully merged in.
Change-Id: Iafa4443781ce6e83f5591c12ba615a0e92ce0ea0
This patch introduces the concept of dual inter16x16 prediction. A
16x16 inter-predicted macroblock can use 2 references instead of 1,
where both references use the same mvmode (new, near/est, zero). In the
case of newmv, this means that two MVs are coded instead of one. The
frame can be encoded in 3 ways: all MBs single-prediction, all MBs dual
prediction, or per-MB single/dual prediction selection ("hybrid"), in
which case a single bit is coded per-MB to indicate whether the MB uses
single or dual inter prediction.
In the future, we can (maybe?) get further gains by mixing this with
Adrian's 32x32 work, per-segment dual prediction settings, or adding
support for dual splitmv/8x8mv inter prediction.
Gain (on derf-set, CQ mode) is ~2.8% (SSIM) or ~3.6% (glb PSNR). Most
gain is at medium/high bitrates, but there's minor gains at low bitrates
also. Output was confirmed to match between encoder and decoder.
Note for optimization people: this patch introduces a 2nd version of
16x16/8x8 sixtap/bilin functions, which does an avg instead of a
store. They may want to look and make sure this is implemented to
their satisfaction so we can optimize it best in the future.
Change-ID: I59dc84b07cbb3ccf073ac0f756d03d294cb19281
this commit is to add an variable in the macroblock level mode
info structure to track the transform size used in each MB, so
the information can be used later in the loop filter to change
how loop filter works on MBs with different transform sizes.
Change-Id: Id0eeaba6cc854c6d1be00ed8d237b3d9e250e447
This is an experiment to include a mv contribution from last frame to
nearest and near mv definition. Initial test showed some small though
consistent gain.
latest patch slightly better result ~.13%-~.18%.
TODO: the entropy used to encode the mode choice, i.e. the mv counts
based conditional distribution of modes should be re-collected to
reflect this change, it is expected that there is some further gain
from that.
Change-Id: Ief1e284a36d8aa56b49ae5b360c91419ec494fa4
Removal of CONFIGURE_SEGMENTATION ifdefs.
Removal of legacy support code fo the old coding mechanism.
Use local reference "xd" for MACROBLOCKD structure in
encode_frame_to_data_rate()
Moved call to choose_segmap_coding_method() out of encode
loop as the cost of segmentation is not properly accounted
in the loop anyway. If this is desirable in the future it
can be moved back. The use of this function to do all the
analysis and set the probabilities also removes the need
to track segment useage in threading code.
Change-Id: I85bc8fd63440e7176c73d26cb742698f9b70cade
Changed name and sense of segment_flag to "seg_id_predicted"
Added some additional comments and retested.
I also did some experimentation with a spatial prediction option
using a similar strategy to the temporal mode implemented.
This helps in some cases where temporal prediction is bad but
I suspect there is more overlap here with work on a larger scale
block structure and spatial correlation will likely be better
handled through that mechanism.
Next check in will remove #ifdefs and legacy mode code.
Change-Id: I3b382b65ed2a57bd7775ac0f3a01a9508a209cbc
This check in includes quite a lot of clean up and refactoring.
Most of the analysis and set up for the different coding options for the
segment map (currently simple distribution based coding or temporaly
predicted coding), has been moved to one location (the function
choose_segmap_coding_method() in segmenation.c). This code was previously
scattered around in various locations making integration with other
experiments and modification / debug more difficult.
Currently the functionality is as it was with the exception that the
prediction probabilities are now only transmitted when the temporal
prediction mode is selected.
There is still quite a bit more clean up work that will be possible
when the #ifdef is removed. Also at that time I may rename and alter
the sense of macroblock based variable "segment_flag" which indicates
(1 that the segmnet id is not predicted vs 0 that it is predicted).
I also intend to experiment with a spatial prediction mode that can be
used when coding a key frame segment map or in cases where temporal
prediction does not work well but there is spatial correlation.
In a later check in when the ifdefs have gone I may also move the call
to choose_segmap_coding_method() to just before where the bitsream is
packed (currently it is in vp8_encode_frame()) to further reduce the
possibility of clashes with other experiments and prevent it being called
on each itteration of the recode loop.
Change-Id: I3d4aba2a2826ec21f367678d5b07c1d1c36db168
Temporary check in to turn off other segment features
tests when #if CONFIG_T8X8 is set as the assignment of
MBs to differnt segments in each case will conflict.
The 8x8 code will be modified to use the new segment
feature method properly in a later check in.
Increase bits allowed for EOB end stop marker to 6 ready
for 8x8.
Change-Id: I4835bc8d3bf98e1775c3d247d778639c90b01f7f
Removal of configure #ifdefs so that segment features
always available. Removal of code supporting old
segment feature method.
Still a good deal of tidying up to do.
Change-Id: I397855f086f8c09ab1fae0a5f65d9e06d2e3e39f
This quite large check in includes the following:
Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
This is used as a basis for a simple segmentation for the normal frames
in a gf/arf group. This code also uses satd functions from Yaowu.
Adds functionality for coding the latest possible position of an EOB for
blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
Where the EOB position is 0 this acts like "skip" and the normal coding
of skip at the per mb level is disabled.
Added functions (seg_common.c) for setting and reading segment feature
elements. These may want to be optimized away at some point but while the
mecahnism is in a state of flux they provide a single location for making
changes and keep things a bit cleaner.
This is still proof of concept code. Currently the tested feature set:-
Quantizer,
Loop Filter level,
Reference frame,
Prediction Mode,
EOB end stop.
TBD:-
Add functions for setting and reading the feature data with range
and validity checking.
Handling of signed and unsigned feature data. At the moment all is assumed
to be signed and a sign bit is coded but many cannot be negative.
Correct handling of EOB feature with intra coded blocks.
Testing/trapping of legal/illegal ref frame and mode combinations.
Transform size switch plus merge and test with 8c8 DCT work
Merge and test with Sumans Segmenation coding optimizations
Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
This commit added a 3 bit index to the bitstream, the index is used to
look into the intra mode coding entropy context table. The commit uses
the mode stats to calculate the cost of transmitting modes using 8
possible entropy distributions, and selects the distribution that
provides the lowest cost to do the actual mode coding.
Initial test show this provides additional .2%~.3% gain over quantizer
adaptive intra mode coding. So the adaptive intra mode coding provides
a total of .5%(psnr) to .6% gain(ssim) combined for all-key-encoding
To build and test, configure with
--enable-experimental --enable-qimode
Change-Id: I7c41cd8bfb352bc1fe7c5da1848a58faea5ed74a
make intra mode coding entropy distribution adaptive to baseQindex, an
encoding test on hd clips with all key frame shows universal gain on
all clips in both .2%(psnr) and (ssim).3%.
To build and test, configure with
--enable-experimental --enable-qimode
Change-Id: Iaa69241b984d4fdd8baa6d77ee78c0140f5ac00a
Patch 1 to Patch 3 is an initial implementation of 8x8 intra prediction
modes, here are with the following assumptions:
a. 8x8 has 4 prediction modes DC, H, V and TM
b. UV 4x4 block use the same mode as corresponding 8x8 area
c. i8x8 modes are enabled for key frame only for now
Patch 4:
d. removed debug code from previous patches
Patch 5:
e. added stats code to collect entropy stats and further cleaned up
Patch 6:
f. changed mode stats code to collect finer stats of modes
Patch 7:
g. normalized i8x8 modes distribution to total at 256 (8bits).
Patch 8:
h. fixed a bug in decoder and removed debug printf output.
Patch 9:
i. more cleanups to address paul's comment
Patch 10:
j. messy rebase/merges to bring the commit up to date.
Tests on HD clips encoded with all key frame showing consistent gain
on all clips and all metrics:~0.5%(psnr) and 0.6%(ssim):
http://www.corp.google.com/~yaowu/no_crawl/i8x8hd_allkey_fixedq.html
To build and test, configure with:
--enable-experimental --enable-i8x8
Change-Id: I9813fe07ae48cab5fdb5d904bca022514ad01e7f
This data structure is now [Segment ID][Features]
rather than [Features][Segment_ID]
I propose as a separate modification to make the experimental
bit stream reflect this such that all the features for a segment
are coded together.
Change-Id: I581e4e3ca2033bdbdef3d9300977a8202f55b4fb
Some basic plumbing added for a range of segment level features.
MB_LVL_* changed to SEG_LVL_* to better reflect meaning.
Change-Id: Iac96da36990aa0e40afc0d86e990df337fd0c50b
for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)
Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used =-4
2.8% at --rt --cpu-used =-3
2.3% at --rt --cpu-used =-2
2.2% at --rt --cpu-used =-1
Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.
Make this change exclusively for x86 platforms.
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
With this fix, the experimental branch now builds and encodes correctly
with the following two configure options respectively:
--enable-experimental --enable-t8x8
--enable-experimental
Change-Id: I3147c33c503fe713a85fd371e4f1a974805778bf
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.
Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO.
This reduced the memory footprint by 518,400 bytes for 1080
resolutions. The decoder performance improved by ~4% for the
clip used and the encoder showed very small improvements. (0.5%)
This reduction was first mentioned to me by John K. and in a
later discussion by Yaowu.
This is WIP.
Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
The compiler produces better assembly when using int_mv
for assignments. The compiler shifts and ors the two 16bit
values when assigning MV.
Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
The dc_diff flag is used to skip loopfiltering. Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.
Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931