Was always using sb_type of first column in a row of 8x8 units when
determining decoded block edges as a subcondition for loop filter
skipping.
Change-Id: Ib17554633a63a90b70cdaa7bed65db035a8ad9d8
Added condition to not to skip filtering of transform block edges when
the edge is also a decoding block edge.
Change-Id: Iaccb6206c4202b78e5dca3b89379556e0f4aba0c
This version of the loop filter supports non-4:2:0 subsampling and
a fourth plane, as well as changing the filtering order to be more
friendly to hardware implementations.
The filters are applied first to all vertical edges within the
64x64 SB, followed by the top horizontal edge and any internal
horizontal edges. Since filtering is applied on each 4x4 edge
serially, a dependency is created from filtering one block edge
to the next. It would be possible to remove this depencnecy by
building all filtering decisions from the unfiltered
reconstruction data.
Change-Id: I08f3e9683eb7bded8a76651cbc50fc0dfdd05fa7
This patch changes the coefficient tree to move the EOB to below
the ZERO node in order to save number of bool decodes.
The advantages of moving EOB one step down as opposed to two steps down
in the other parallel patch are: 1. The coef modeling based on
the One-node becomes independent of the tree structure above it, and
2. Fewer conext/counter increases are needed.
The drawback is that the potential savings in bool decodes will be
less, but assuming that 0s are much more predominant than 1's the
potential savings is still likely to be substantial.
Results on derf300: -0.237%
Change-Id: Ie784be13dc98291306b338e8228703a4c2ea2242
For 4x4 blocks valgrind points out the cache was uninitalized.
This resolves the issue by setting it.
Change-Id: I22733000da048643762813a84fbda66d8e4040d2
This commit makes clean-ups in the rate-distortion loop for 4x4,
4x8, and 8x4 block sizes for the use of iterative motion search.
Removed unnecessary use of bmi in handle_inter_mode.
Deprecated loop over labels in the 4x4/4x8/8x4 block rd search.
Change-Id: I71203dbb68b65e66f073b37abd90d82ef5ae6826
scales for second reference frame vars are unitialized if the
second ref frame is one of of those disallowed by refframeflags
Change-Id: I4ce42de391178c1699dcaede18c5f12c84993c61
Proposal for tuning the residual coding by changing how the context
from previous tokens is calculated. Storing the energy class of previous
tokens instead of the token itself eases the critical path of
HW implementations.
Change-Id: I6d71d856b84518f6c88de771ddd818436f794bab
Adding API to read/write uncompressed frame header bits (it is not final
yet). Separate functions to read/write uncompressed header. Moving
clr_type, error_resilient_mode, refresh_frame_context,
frame_parallel_decoding_mode, frame_context_idx from compressed partition
to uncompressed frame header.
Change-Id: Id3ed8a387980c652ae147549412f4ec24a0a5bd0
This commit pulls the iterative motion search for compound inter-
inter out from handle_inter_mode_ as a separate function. Hence,
it is applicable to 4x4/4x8/8x4 level compound inter search to be
enabled later.
Also edit the rd loop for 4x4 inter block sizes for cosmetic
purpose.
Change-Id: Ibc71a11cbe5a26cd52faba01026cf8446cf4d2b4
Removed one 4x4 prediction step that was unnessary in the rd loop.
Removed a unused modecosts estimate from encoder side.
Change-Id: I65221a52719d6876492996955ef04142d2752d86
1. remove prediction mode conversion
2. unified bmode, same for key and non-key frame
3. set I4X4_PRED count for pdf to 0, as I4X4_PRED is no longer
coded ever. It is determined by ref_frame and block partition
Change-Id: If5b282957c24339b241acdb9f2afef85658fe47d
This commit removes the use of bmi_ in the first-pass encoding by
forcing encode_intra4x4block_ to use DC_PRED, followed by DCT_DCT
only, as John suggested. This makes the need for bmi buffer only
up to 4 entries, instead of 16.
Change-Id: I3410007dfae789ee46a09ae20c39d3ce3c7954aa
Hardware implementation needs to load coeff probs based on the
transform size. For selectable transform size, moving these bits
earlier in the bitstream adds some delay giving time to preload
the probs and speeds up the decoding process.
Change-Id: I3bfc1f662ae6f219c9286fe9ae6310c7d8a63ea7
Also do per-partition motion vector referencing in <sb8x8 partitions,
and adjust mvref finding for sub8x8 partitions.
Change-Id: Id3ed1ed4d2a8910d11d327db6cc63b8eb79f941f
This code does not seem to be necessary anymore.
For the 1080p clip used, the decoder performance improved by
~2%.
Change-Id: I66bb0496d4998b0d6c6637c746b642b77bdbef88
1) Added an initialization of rd_tx_select_threshs[].
2) Made updating transform size counts to be consistent
Change-Id: Iaa9d6c6be825b0364c9d61a9802873d01356815c