Tests showed ~1.2% performance boost on the HD clip used.
Performance will vary based on material.
Change-Id: Icbcf1a828750d5b4ae5252bf596b3ef594042e8a
Interleaved vp8_find_near_mvs and vp8_mv_ref_probs.
2.5% to 4% performance improvement for the HD clips used.
Change-Id: Id888b667cf5ae2f0e19da18743140f055ff7de8d
The partial frame copy function used to copy an extra 8 lines above
and below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.
Define the "magic" fraction number for partial filtering in
loopfilter.h .
Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
Restructure if statement to clarify the error condition. Trigger the
error before clobbering pc-> variables.
Change-Id: Id01cab798a341ce9899078fdcec265a0e942a0b7
Decode the mv mode with if-then-elses instead of traversing
the vp8_mv_ref_tree data structure. This will make it
easier to interleave vp8_find_near_mvs and vp8_mv_ref_probs.
Change-Id: I1e798d6ec40fcaeeff06ccc82f81201978d12f74
check to make sure that cx_data buffer has enough room before
writting to it, prior behavior did not which could result in a crash.
Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28
During the _pick only the Y plane is examined. In addition, data beyond
the borders of the frame is not read.
Change-Id: Ic549adfca70fc6e0b55f8aab0efe81f0afac89f9
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.
Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.
The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)
Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.
Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26
Previous entropy probs need to be saved (and restored) only when
current updates are not propagated.
Change-Id: Ie6ee0543066e30874e56258be0a6b7d2dd2fdb2b
Uninitialized data could be written to the first pass file when no
motion vectors are present in the frame.
Also fix a number of compiler warnings.
Change-Id: Icc9f53b6d33da9de4563d86d9fd591910473ea90
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).
This implementation is x86_64 only. We retain the previous
implementation for x86.
Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.
Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
vp8_find_near_mvs() is being called on all possible reference frames
but the data computed may be used if the loop exits early, which can
be due to x->skip beign set to 1.
Optimize this by call vp8_find_near_mvs() laziy only if it is going
to be used and not computed yet.
Change-Id: Iccdbd4c962a670c9f2c99b8aca8096042ca5dc98
Changes to the selection of Q limits for two pass
and two pass CQ mode.
Allowance made for Mode and motion vector costs.
Some refactoring of common code.
For Derf and YT sets CQ mode average improvement
circa 1% (SSIM and Global PSNR).
Some increased tendency to undershoot even when
user CQ not reached.
Patch2: Removed some test code accidentally merged.
Change-Id: Icf74d13af77437c08602571dc7a97e747cce5066
Sync with loopfilter thread just at the beginning of next frame encoding.
This returns control to application faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed imediatly
so we cannot delay the sync.
Change-Id: I288d97b5e331d41d6f5bb49d97986fa12ac6f066