This commit replaces the previous table based intra mode model
coding with a more balanced entropy coding system. It reduces the
decoder lookup table size by 1K bytes. The key frame compression
performance is about even on average. There are a few points where
the compression performance is improved by over 5%. Most test
points are fairly close to the lookup table approach.
Change-Id: I47154276c0a6a22ae87de8845bc2d494681b95f6
This commit allows the entropy coder for transform block partition
to account for its relative position with respect to the block size.
Change-Id: I2b5019c378bfb58c11b926fa50c0db1933f35852
Refactor the recursive transform block partition to reduce repeated
computation maximum transform block size per block.
Change-Id: Ib408c78dc6923fe7d337dc937e74f2701ac63859
If a block is coded in the intra modes, update the transform block
partition information as maximum block size.
Change-Id: I5ea440c700fc887ff2fe84fabde77a9d896d16f4
If a block has all coefficients quantized to zero, the codec will
assume that it uses largest transform block size.
Change-Id: I1a32527e50026e8e4759ad8de474189cd20e89c8
This commit refactors the transform block partition entropy
coding process to improve the encoding speed. There is no change
in the compression statistics.
Change-Id: I237466fd95c1b888df432babfa36e01f74240eef
This commit allows the encoder to account for the boundary block
information to estimate the transform block partitiion rate cost
in the rate-distortion optimization scheme.
Change-Id: Idb79cf936d96cdd15bcba27e47318295413a5f5d
Select the probability model for transform block partition coding
conditioned on the neighbor transform block sizes.
Change-Id: Ib701296e59009bad97dbd21d8dcd58bc5e552f39
It allows the decoder to recursively parse and use the transform
block size for inter coded blocks.
Change-Id: I12ceea48ab35501ac1a3447142deb2a334eff3b8
Make the bit-stream syntax elelment coding ready to support
variable transform coding block sizes.
Change-Id: I07ae4ab62d1ecd46c4a5ae45702fc14bd1d4b07d
This commit re-designs the bitstream syntax to support recursive
transform block partition. It disables the decoder vector unit
tests.
Change-Id: I6cac24c4f1e44f29ffcc9b87ba1167eeb32d1b69
The current multi-threaded tile decoder requires that the videoes
are encoded with frame_parallel_decoding_mode = 1. This requirement
is not necessary, and is better to be removed. This patch includes
the first part of the work.
Change-Id: Ic7695fb3cfe13f9022582c9f0edd2aa6e2e36d28
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
No more checking of corrupted reference frame as we skip
decoding any non-intra frame in case of frame corrupted.
Change-Id: I77d41bbb02fc5f61972740e2d411441eb6a17073
This will save a lot of memory for decoder due to removing of prev_mi,
but prev_mi is still needed in encoder. So this will increase a little bit
memory for encoder.
Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.
Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.
Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.
This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save these memories.
Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
The original implementation only allocates one segmentation map and this
works fine for serial decode. But for frame parallel decode, each thread
need to have its own segmentation map and the last frame segmentation map
should be provided from last frame decoding thread.
After finishing decoding a frame, thread need to serve the old segmentation
map that associate with the previous decoded frame. The thread also need to
use another segmentation map for decoding the current frame.
Change-Id: I442ddff36b5de9cb8a7eb59e225744c78f4492d8
A previous change, https://gerrit.chromium.org/gerrit/#/c/70632,
introduced a size validation for reference frames to insuare the
input stream is a valid VP9 stream. However, the logic requiring
all reference frames have valid size turned out to be too strict.
In this commit, we modify the validation to require one of the
reference frame has valid dimension. In addition, the decoder
reports error whenever it detects the use of reference frame
with invalid scalig ratio.
Change-Id: If8efc312244087556cfe00f1fcbdff811268ebad
Reimplementing sub8x8-reading of intra block modes in
read_intra_frame_mode_info() and read_intra_block_mode_info(). Code looks
more readable as well.
Change-Id: Ia42fc7d0dad708bc0c7a8bff1f8b37809b843f40
This commit removes the use of best_mv in the decoding process. This
variable can be replaced with nearest_mv. It saves a few cycles on
assigning the values for best_mv.
Change-Id: Ic183f9c1fb615c54efd7e6ccfedcf09d493435e4