In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder
to common;
3. Merged multi-threaded loopfilter code, and made encoder/
decoder call same function to reduce code duplication.
Encoder tests showed that 1% - 2% speedup was seen for good-quality
2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6%
speedup using 4 threads were seen for real-time mode(at speed 7).
Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
This commit adds encoder side control for vp9 to set color space info
in the output compressed bitstream.
It also amends the "vp9_encoder_params_get_to_decoder" test to verify
the correct color space information is passed from the encoder end to
decoder end.
Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650
This commit added a field to vpx_image_t for indicating color space,
the field is also added to YUV_BUFFER_CONFIG. This allows the color
space information pass through the decoder from input stream to the
output buffer.
The commit also updated compare_img() function with added verification
of matching color space to ensure the color space information to be
correctly passed from encode to decoder in compressed vp9 streams.
Change-Id: I412776ec83defd8a09d76759aeb057b8fa690371
Replaced "color space" with "color format" in comments where color
sampling format is concerned, so to differentiate from the concept
defined in COLOR_SPACE.
Change-Id: I8c935034c166b24307a99352dab1686531276bb8
This reverts commit 91471d6aad285ff10e7582e485d8adadd1986fe2.
Fixes the compile issues if post_proc is enabled.
Change-Id: Ib40a15ce2c194f9b5adfa65a17ab01ddf60f5a59
Fails to compile. Bad calls to vp9_alloc_frame_buffer
and vp9_realloc_frame_buffer in postproc.c
This reverts commit 399823b6f50fb7465f62822d1395e2192e7b07fc.
Change-Id: I29f0e173f8e185d3a303cfdb17813e1eccb51e3a
Add support for setting byte alignment on the Y, U, and V plane of the
reference buffers. The byte alignment must be a power of 2, from 32 to
1024. A value of 0 sets legacy alignment.
Change-Id: I7c1399622f7aa68e123646369216b32047dda73d
If decoding starts with intra-only frame, there is a possibility
of using uninitialized entropy context, what leads to undefined
behavior.
Change-Id: Icbb64b5b1bd1e5de2a4bfa2884e56bc0a20840af
No more checking of corrupted reference frame as we skip
decoding any non-intra frame in case of frame corrupted.
Change-Id: I77d41bbb02fc5f61972740e2d411441eb6a17073
This will save a lot of memory for decoder due to removing of prev_mi,
but prev_mi is still needed in encoder. So this will increase a little bit
memory for encoder.
Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.
Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.
Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
move them from VP9Worker::data[12] to allow the structure to be reused a
bit more naturally by the multi-threaded loopfilter.
Change-Id: I31b49c9e93ca744fd7f6d6ed8696671188fb2c1d
Function will jump to error handler when ref buffer is corrupted.
So "xd->corrupted |= ref_buffer->buf->corrupted;" is useless.
Change-Id: I35353a0637ad0dbb682454e040ef69fa68280bfa
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.
Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.
This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save these memories.
Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
don't bother decoding any further after receiving an earlier decode
error until a key/intra-only frame is encountered.
Change-Id: I381917b70d7a9e6f8d6de42e3d181bb113a4cec4