This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.
Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
As Jim suggested, 1D array was used to store filter levels instead
of 2D array. This used shift_y in setup_mask directly, and saved
few cycles.
Change-Id: If61ab298784861f1806b1cd396d4e4e2e0f097b9
Implements scan order to band map with arrays in both the encoder
and decoder to remove conditional statements.
Encoding seems to be about 1% faster at speed 0, tested on football.
Decoding seems to be about 0.5-1% faster on a set of 25 videos.
Change-Id: Idb233ca0b9e0efd790e30880642e8717e1c5c8dd
This commit enables the dual buffer rate-distortion optimization
and encoding scheme. It stacks the original transform coefficients,
quantized levels, and reconstructed coefficients, in the rate-
distortion optimization search process, hence eliminates the need
to re-run residual generation, forward transform, and quantization
in the encoding stage.
Change-Id: I011bfad3a59a380a869ee552e91dae0394ec492e
Added loop filter mask checking, and made the caller function
ready for implementation of parallel loopfiltering in horizontal
direction.
Next, we need to go through the loopfilter functions (both c and
optimized versions), and provide 16-byte wide loopfiltering for
each filter type.
Change-Id: Ifef47e7ef9086ebc2fd6ca7ede8f27c9bbf79e66
Allocate memory space of dual buffer sets that store the coeff, qcoeff,
dqcoeff, and eobs. Connect the pointers of macroblock_plane and
macroblockd_plane to the actual buffer in use accordingly.
Change-Id: I2f0b5f482ca879fae39095013eaf8901db20a5a4
Make the macroblockd_plane contain dynamic buffer pointers instead
static pointers to the memory space allocated therein. The decoder
uses the buffer allocated in pbi, while encoder will use a dual
buffer approach for rate-distortion optimization search.
Change-Id: Ie6f24be2dcda35df7c15b4014e5ccf236fb3f76c
This commit fixes the assignment of mode_info pointer per tile. It
makes recognition of tiles in both row and column formats and properly
arrange the use of mode_info.
The bug was first introduced in
I6226456dd11f275fa991e4a7a930549da6675915
https://gerrit.chromium.org/gerrit/#/c/67492/
Change-Id: Ie12cd209f53241513728c461ee3d7b9599ddb860
Inlining set_contexts_on_border() into set_contexts(). The only difference
is the additional check that "has_eob != 0" in addition to
"xd->mb_to_right_edge < 0" and "xd->mb_to_right_edge < 0". If has_eob == 0
then memset does the right thing and works faster.
Change-Id: I5206f767d729f758b14c667592b7034df4837d0e
This patch continued the work done in "Rewrite loop_filter_info_n
struct"(commit:00dbd369c70270428d56da6d15ea5486fc821c52) to further
improve loopfilter function.
1. Instead of storing pointers to thresholds, store loopfilter
levels within 64x64 SB;
2. Since loopfilter levels are already calculated in setup_mask,
we don't need call build_lfi to look up them again. Just save
loopfilter levels in setup_mask.
3. Reorganized and simplified filter_block_plane().
Tests showed a ~0.8% decoder speedup.
Change-Id: I723c7779738bbc2afcb9afa2c6f78580ee6c3af7
This to make sure that prediction residue always get coded in lossless
mode.
This commit also fixed lossless unit test
Change-Id: I537726ee55328d4e4cf0a0196393a67e12bfcde1
The new expression is much more logical than previous one. Surprisingly
both expressions give exactly the same set of dependent values
-- have_top, have_left, have_right -- in vp9_predict_intra_block.
Change-Id: I63eb1b592b8c37883b3a0dbb1f3daa271e446109