Prior to the added rounding, tests on randomly generated data showed
that forward-inverse transform round trip errors are about 3.02/block
for input range [-10,10] and 2.68/block for input range [-256, 255].
The added rounding reduced the errors to 0.031/block for input range
[-10,10] and 0.037/block for input range [-256, 255].
Maximum round trip error on for any pixel position is 1.
The average errors are calculated based on 100,000 blocks of randomly
with the specified ranges.
Paul mentioned in discussion that the change was not clear on why we
need change the rounding, so Patch 2 intends to make the rationale
obvious in code, it merged the two separate shifts into one, and the
two separate rounding factors into one. Patch 1 and 2 have same
numerical test results.
Change-Id: Ic5e2f5463de17253084d8b2398c4a210194b20de
Only encode sign bit for feature data that can have a sign.
Tweaks to the test segmentation rules so that it now actually gives
a net benefit on the derf set of about 0.4% though much higher
on some clips at the low end.
Change-Id: I8e61f1aebf41c9037db7e67e2f8975aa18a0c986
This quite large check in includes the following:
Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
This is used as a basis for a simple segmentation for the normal frames
in a gf/arf group. This code also uses satd functions from Yaowu.
Adds functionality for coding the latest possible position of an EOB for
blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
Where the EOB position is 0 this acts like "skip" and the normal coding
of skip at the per mb level is disabled.
Added functions (seg_common.c) for setting and reading segment feature
elements. These may want to be optimized away at some point but while the
mecahnism is in a state of flux they provide a single location for making
changes and keep things a bit cleaner.
This is still proof of concept code. Currently the tested feature set:-
Quantizer,
Loop Filter level,
Reference frame,
Prediction Mode,
EOB end stop.
TBD:-
Add functions for setting and reading the feature data with range
and validity checking.
Handling of signed and unsigned feature data. At the moment all is assumed
to be signed and a sign bit is coded but many cannot be negative.
Correct handling of EOB feature with intra coded blocks.
Testing/trapping of legal/illegal ref frame and mode combinations.
Transform size switch plus merge and test with 8c8 DCT work
Merge and test with Sumans Segmenation coding optimizations
Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
This commit added a 3 bit index to the bitstream, the index is used to
look into the intra mode coding entropy context table. The commit uses
the mode stats to calculate the cost of transmitting modes using 8
possible entropy distributions, and selects the distribution that
provides the lowest cost to do the actual mode coding.
Initial test show this provides additional .2%~.3% gain over quantizer
adaptive intra mode coding. So the adaptive intra mode coding provides
a total of .5%(psnr) to .6% gain(ssim) combined for all-key-encoding
To build and test, configure with
--enable-experimental --enable-qimode
Change-Id: I7c41cd8bfb352bc1fe7c5da1848a58faea5ed74a
make intra mode coding entropy distribution adaptive to baseQindex, an
encoding test on hd clips with all key frame shows universal gain on
all clips in both .2%(psnr) and (ssim).3%.
To build and test, configure with
--enable-experimental --enable-qimode
Change-Id: Iaa69241b984d4fdd8baa6d77ee78c0140f5ac00a
Patch 1 to Patch 3 is an initial implementation of 8x8 intra prediction
modes, here are with the following assumptions:
a. 8x8 has 4 prediction modes DC, H, V and TM
b. UV 4x4 block use the same mode as corresponding 8x8 area
c. i8x8 modes are enabled for key frame only for now
Patch 4:
d. removed debug code from previous patches
Patch 5:
e. added stats code to collect entropy stats and further cleaned up
Patch 6:
f. changed mode stats code to collect finer stats of modes
Patch 7:
g. normalized i8x8 modes distribution to total at 256 (8bits).
Patch 8:
h. fixed a bug in decoder and removed debug printf output.
Patch 9:
i. more cleanups to address paul's comment
Patch 10:
j. messy rebase/merges to bring the commit up to date.
Tests on HD clips encoded with all key frame showing consistent gain
on all clips and all metrics:~0.5%(psnr) and 0.6%(ssim):
http://www.corp.google.com/~yaowu/no_crawl/i8x8hd_allkey_fixedq.html
To build and test, configure with:
--enable-experimental --enable-i8x8
Change-Id: I9813fe07ae48cab5fdb5d904bca022514ad01e7f
This data structure is now [Segment ID][Features]
rather than [Features][Segment_ID]
I propose as a separate modification to make the experimental
bit stream reflect this such that all the features for a segment
are coded together.
Change-Id: I581e4e3ca2033bdbdef3d9300977a8202f55b4fb
Some basic plumbing added for a range of segment level features.
MB_LVL_* changed to SEG_LVL_* to better reflect meaning.
Change-Id: Iac96da36990aa0e40afc0d86e990df337fd0c50b
for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)
Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
Prepend idct function names with vp8_
so that under profiling they show up
associated with libvpx.
Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
The data that the simple horizontal loopfilter reads is aligned, treat
it accordingly.
For the vertical, we only use the bottom 4 bytes, so don't read in 16
(and incur the penalty for unaligned access).
This shows a small improvement on older processors which have a
significant penalty for unaligned reads.
postproc_mmx.c is unused
Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
Prepend . to local labels in assembly code. This
allows non unique labels within a file. Also
makes profiling information more informative
by keeping the function name with the loop name.
Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.
This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.
Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4
This reverts commit b5ea2fbc2c. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.
Conflicts:
vp8/encoder/onyx_if.c
vp8/encoder/ratectrl.c
Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
the neon code made several assumptions which were broken by a recent
change: https://review.webmproject.org/2676
update the code with new assumptions and guard them with a compile time
assert
Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used =-4
2.8% at --rt --cpu-used =-3
2.3% at --rt --cpu-used =-2
2.2% at --rt --cpu-used =-1
Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.
Make this change exclusively for x86 platforms.
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
With this fix, the experimental branch now builds and encodes correctly
with the following two configure options respectively:
--enable-experimental --enable-t8x8
--enable-experimental
Change-Id: I3147c33c503fe713a85fd371e4f1a974805778bf
allowing the compiler to inline this function. For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.
Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.
Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvenents in both.
Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920