Tests showed ~1.2% performance boost on the HD clip used.
Performance will vary based on material.
Change-Id: Icbcf1a828750d5b4ae5252bf596b3ef594042e8a
Interleaved vp8_find_near_mvs and vp8_mv_ref_probs.
2.5% to 4% performance improvement for the HD clips used.
Change-Id: Id888b667cf5ae2f0e19da18743140f055ff7de8d
The partial frame copy function used to copy an extra 8 lines above
and below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.
Define the "magic" fraction number for partial filtering in
loopfilter.h .
Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
Restructure if statement to clarify the error condition. Trigger the
error before clobbering pc-> variables.
Change-Id: Id01cab798a341ce9899078fdcec265a0e942a0b7
Global function pointers can not be defined in header files. Restructure
vpx_scale pointer configuration.
Change-Id: I6f568a263ad770d32f530abad6007f990fd1003a
Decode the mv mode with if-then-elses instead of traversing
the vp8_mv_ref_tree data structure. This will make it
easier to interleave vp8_find_near_mvs and vp8_mv_ref_probs.
Change-Id: I1e798d6ec40fcaeeff06ccc82f81201978d12f74
Prior to the added rounding, tests on randomly generated data showed
that forward-inverse transform round trip errors are about 3.02/block
for input range [-10,10] and 2.68/block for input range [-256, 255].
The added rounding reduced the errors to 0.031/block for input range
[-10,10] and 0.037/block for input range [-256, 255].
Maximum round trip error on for any pixel position is 1.
The average errors are calculated based on 100,000 blocks of randomly
with the specified ranges.
Paul mentioned in discussion that the change was not clear on why we
need change the rounding, so Patch 2 intends to make the rationale
obvious in code, it merged the two separate shifts into one, and the
two separate rounding factors into one. Patch 1 and 2 have same
numerical test results.
Change-Id: Ic5e2f5463de17253084d8b2398c4a210194b20de
Only encode sign bit for feature data that can have a sign.
Tweaks to the test segmentation rules so that it now actually gives
a net benefit on the derf set of about 0.4% though much higher
on some clips at the low end.
Change-Id: I8e61f1aebf41c9037db7e67e2f8975aa18a0c986
This quite large check in includes the following:
Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
This is used as a basis for a simple segmentation for the normal frames
in a gf/arf group. This code also uses satd functions from Yaowu.
Adds functionality for coding the latest possible position of an EOB for
blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
Where the EOB position is 0 this acts like "skip" and the normal coding
of skip at the per mb level is disabled.
Added functions (seg_common.c) for setting and reading segment feature
elements. These may want to be optimized away at some point but while the
mecahnism is in a state of flux they provide a single location for making
changes and keep things a bit cleaner.
This is still proof of concept code. Currently the tested feature set:-
Quantizer,
Loop Filter level,
Reference frame,
Prediction Mode,
EOB end stop.
TBD:-
Add functions for setting and reading the feature data with range
and validity checking.
Handling of signed and unsigned feature data. At the moment all is assumed
to be signed and a sign bit is coded but many cannot be negative.
Correct handling of EOB feature with intra coded blocks.
Testing/trapping of legal/illegal ref frame and mode combinations.
Transform size switch plus merge and test with 8c8 DCT work
Merge and test with Sumans Segmenation coding optimizations
Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
check to make sure that cx_data buffer has enough room before
writting to it, prior behavior did not which could result in a crash.
Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28
During the _pick only the Y plane is examined. In addition, data beyond
the borders of the frame is not read.
Change-Id: Ic549adfca70fc6e0b55f8aab0efe81f0afac89f9
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
NEON version of copyframeyonly, extendframeborders, copy_frame_func were
not working for plane stride < 128 and/or y_width < 128.
Change-Id: Id6c2e6c795274da0c90134b15c0d5f62d1b17a6c
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.
Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.
The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)
Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a