316 Commits

Author SHA1 Message Date
hkuang
be6aeadaf4 Try again to merge branch 'frame-parallel' into master branch.
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.

Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.

There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.

* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
 Conflicts:
       test/codec_factory.h
       test/decode_test_driver.cc
       test/decode_test_driver.h
       test/invalid_file_test.cc
       test/test-data.sha1
       test/test.mk
       test/test_vectors.cc
       vp8/vp8_dx_iface.c
       vp9/common/vp9_alloccommon.c
       vp9/common/vp9_entropymode.c
       vp9/common/vp9_loopfilter_thread.c
       vp9/common/vp9_loopfilter_thread.h
       vp9/common/vp9_mvref_common.c
       vp9/common/vp9_onyxc_int.h
       vp9/common/vp9_reconinter.c
       vp9/decoder/vp9_decodeframe.c
       vp9/decoder/vp9_decodeframe.h
       vp9/decoder/vp9_decodemv.c
       vp9/decoder/vp9_decoder.c
       vp9/decoder/vp9_decoder.h
       vp9/encoder/vp9_encoder.c
       vp9/encoder/vp9_pickmode.c
       vp9/encoder/vp9_rdopt.c
       vp9/vp9_cx_iface.c
       vp9/vp9_dx_iface.c

This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.

Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
2015-01-30 21:00:13 -08:00
Johann
a18da9760a Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch."
This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7

Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-23 08:42:02 -08:00
hkuang
bde04ce503 Merge branch 'frame-parallel' to enable frame parallel decode in master branch.
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.

* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""

 Conflicts:
       test/codec_factory.h
       test/decode_test_driver.cc
       test/decode_test_driver.h
       test/invalid_file_test.cc
       test/test-data.sha1
       test/test.mk
       test/test_vectors.cc
       vp8/vp8_dx_iface.c
       vp9/common/vp9_alloccommon.c
       vp9/common/vp9_entropymode.c
       vp9/common/vp9_loopfilter_thread.c
       vp9/common/vp9_loopfilter_thread.h
       vp9/common/vp9_mvref_common.c
       vp9/common/vp9_onyxc_int.h
       vp9/common/vp9_reconinter.c
       vp9/decoder/vp9_decodeframe.c
       vp9/decoder/vp9_decodeframe.h
       vp9/decoder/vp9_decodemv.c
       vp9/decoder/vp9_decoder.c
       vp9/decoder/vp9_decoder.h
       vp9/encoder/vp9_encoder.c
       vp9/encoder/vp9_pickmode.c
       vp9/encoder/vp9_rdopt.c
       vp9/vp9_cx_iface.c
       vp9/vp9_dx_iface.c

Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
2015-01-22 18:18:53 -08:00
Yunqing Wang
e76eaf05b1 vp9_ethread: add parallel loopfilter
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder
   to common;
3. Merged multi-threaded loopfilter code, and made encoder/
   decoder call same function to reduce code duplication.

Encoder tests showed that 1% - 2% speedup was seen for good-quality
2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6%
speedup using 4 threads were seen for real-time mode(at speed 7).

Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
2015-01-16 17:19:27 -08:00
Yaowu Xu
e94b415c34 Add encoder control for setting color space
This commit adds encoder side control for vp9 to set color space info
in the output compressed bitstream.

It also amends the "vp9_encoder_params_get_to_decoder" test to verify
the correct color space information is passed from the encoder end to
decoder end.

Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650
2015-01-14 10:17:14 -08:00
Yaowu Xu
6b223fcb58 Enable decoder to pass through color space info
This commit added a field to vpx_image_t for indicating color space,
the field is also added to YUV_BUFFER_CONFIG. This allows the color
space information pass through the decoder from input stream to the
output buffer.

The commit also updated compare_img() function with added verification
of matching color space to ensure the color space information to be
correctly passed from encode to decoder in compressed vp9 streams.

Change-Id: I412776ec83defd8a09d76759aeb057b8fa690371
2015-01-13 15:13:19 -08:00
Yaowu Xu
ecbca31a1d Fix comments and color format
Replaced "color space" with "color format" in comments where color
sampling format is concerned, so to differentiate from the concept
defined in COLOR_SPACE.

Change-Id: I8c935034c166b24307a99352dab1686531276bb8
2015-01-09 10:36:43 -08:00
James Zern
4d6838627d Merge "vp9: add per-tile longjmp error handling" 2015-01-08 15:53:37 -08:00
hkuang
aa5563cd41 Remove unnecessary init_macroblockd.
macroblockd are init again inside decode_tiles and decode_tiles_mt.

Change-Id: I1f42837864f095c319cdb24cec7d6aa6a3a4da50
2014-12-30 15:23:52 -08:00
James Zern
953dd1894d vp9: add per-tile longjmp error handling
this avoids longjmp'ing from another thread on error which will cause
undesired behavior

Change-Id: Ic9074ed8cc4243944bf2539d6e482f213f4e8c86
2014-12-19 11:50:04 -08:00
Frank Galligan
5fdd0f1fe0 Merge "Revert "Revert "Add support for setting byte alignment.""" 2014-12-16 15:14:17 -08:00
Yaowu Xu
b60ae45f36 Merge "Prevent decoder from using uninitialized entropy context." 2014-12-16 09:30:24 -08:00
Frank Galligan
c4f7079ad4 Revert "Revert "Add support for setting byte alignment.""
This reverts commit 91471d6aad285ff10e7582e485d8adadd1986fe2.

Fixes the compile issues if post_proc is enabled.

Change-Id: Ib40a15ce2c194f9b5adfa65a17ab01ddf60f5a59
2014-12-15 12:20:37 -08:00
Paul Wilkins
91471d6aad Revert "Add support for setting byte alignment."
Fails to compile. Bad calls to vp9_alloc_frame_buffer
and vp9_realloc_frame_buffer in postproc.c

This reverts commit 399823b6f50fb7465f62822d1395e2192e7b07fc.

Change-Id: I29f0e173f8e185d3a303cfdb17813e1eccb51e3a
2014-12-15 11:54:13 +00:00
Frank Galligan
399823b6f5 Add support for setting byte alignment.
Add support for setting byte alignment on the Y, U, and V plane of the
reference buffers. The byte alignment must be a power of 2, from 32 to
1024. A value of 0 sets legacy alignment.

Change-Id: I7c1399622f7aa68e123646369216b32047dda73d
2014-12-12 13:34:36 -08:00
hkuang
3c7a06c3cc Remove unnecessary dqcoeff memset.
dqcoeff is set to be 0 on initialization. And set back to 0 after being
used everytime.

Change-Id: I32b8e149bba40a8d707849f737a8e49a691f319c
2014-12-11 12:27:25 -08:00
Alexander Voronov
6c6a97814f Prevent decoder from using uninitialized entropy context.
If decoding starts with intra-only frame, there is a possibility
of using uninitialized entropy context, what leads to undefined
behavior.

Change-Id: Icbb64b5b1bd1e5de2a4bfa2884e56bc0a20840af
2014-12-11 20:44:19 +03:00
hkuang
d05cf10fe7 Add error handling for frame parallel decode and unit test for that.
Change-Id: I6e309e11f1641618d2424b7a2c0fe744b8974dec
2014-12-08 12:30:19 -08:00
hkuang
dde819599b Clean up the logic of handling corrupted frame.
No more checking of corrupted reference frame as we skip
decoding any non-intra frame in case of frame corrupted.

Change-Id: I77d41bbb02fc5f61972740e2d411441eb6a17073
2014-12-04 15:07:59 -08:00
hkuang
55577431ae Bind motion vectors with frame buffer structure.
This will save a lot of memory for decoder due to removing of prev_mi,
but prev_mi is still needed in encoder. So this will increase a little bit
memory for encoder.

Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
2014-10-31 17:01:08 -07:00
James Zern
01900edc40 Merge changes I8a9c9019,Ic7b2faa3,I44d42a50,I3f3a3924,I10747b32,I31b49c9e
* changes:
  add vp9_loop_filter_data_reset
  move LFWorkerData allocation to VP9LfSync
  vp9_loop_filter_frame_mt: remove pbi dependency
  vp9_loop_filter_frame_mt: pass planes directly
  vp9_loop_filter_frame_mt: pass VP9LfSync directly
  vp9: store TileWorkerData allocations separately
2014-10-24 11:43:51 -07:00
James Zern
01483677e5 add vp9_loop_filter_data_reset
Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275
2014-10-23 19:43:48 +02:00
Yunqing Wang
7c7e4d4eb8 vp9_ethread: allocate frame contexts outside VP9_COMMON struct
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.

Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
2014-10-22 15:03:12 -07:00
Hangyu Kuang
9ce3a7d76c Implement frame parallel decode for VP9.
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.

Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
2014-10-22 10:50:58 -07:00
James Zern
175c870efa vp9_loop_filter_frame_mt: remove pbi dependency
Change-Id: I44d42a5098305a2d050ce8ff3c76baf7798c48af
2014-10-16 18:55:44 +02:00
James Zern
69f11d2b9d vp9_loop_filter_frame_mt: pass planes directly
one less dependency on pbi

Change-Id: I3f3a392416d3523f4aea6682c3965885baf85197
2014-10-16 18:54:10 +02:00
James Zern
eb3fdfba09 vp9_loop_filter_frame_mt: pass VP9LfSync directly
a step towards removing the pbi dependency

Change-Id: I10747b325e81c172f5e67031ea5159159fc26e91
2014-10-16 17:27:57 +02:00
James Zern
ff3ae42d8c vp9: store TileWorkerData allocations separately
move them from VP9Worker::data[12] to allow the structure to be reused a
bit more naturally by the multi-threaded loopfilter.

Change-Id: I31b49c9e93ca744fd7f6d6ed8696671188fb2c1d
2014-10-16 17:27:57 +02:00
hkuang
cf608110fc Merge "Correct the format." 2014-10-14 13:45:11 -07:00
hkuang
c1b0d0da0b Correct the format.
Change-Id: I59a53b419adda3a609d50b2a82f5a4a54849752e
2014-10-14 11:35:26 -07:00
hkuang
c7f9c717de Remove unnecessary local variable.
Change-Id: I7b19b6061cec369825a0a0b7a485ca490dbc12ee
2014-10-13 14:05:42 -07:00
hkuang
3304d4e6ca Optimize the code to set the refernce frame right after reading the header.
Change-Id: I495cf4a366e06e3220ed132500b1ba1c8448f708
2014-10-09 16:32:36 -07:00
hkuang
ca27459c1a Merge "Remove unnecessary code." 2014-10-09 15:44:08 -07:00
hkuang
336e255236 Merge "Remove unnecessary scale check in set_ref." 2014-10-09 15:43:31 -07:00
hkuang
0e06c8ff36 Remove unnecessary code.
Function will jump to error handler when ref buffer is corrupted.
So "xd->corrupted |= ref_buffer->buf->corrupted;" is useless.

Change-Id: I35353a0637ad0dbb682454e040ef69fa68280bfa
2014-10-09 15:12:12 -07:00
Deb Mukherjee
1929c9b391 Rename highbitdepth functions to use highbd prefix
Uses highbd_ prefix convention consistently.

Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-10-09 14:40:40 -07:00
hkuang
15a3e5f742 Remove unnecessary scale check in set_ref.
Scale check has been done in read_inter_block_mode_info.

Change-Id: I6c86f93bd579109ed30ff13a04a30e35f5ae6fc5
2014-10-09 12:19:55 -07:00
Jingning Han
b66f7016c1 Take out repeated block width/height lookup functions
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.

Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
2014-10-07 12:33:07 -07:00
Alex Converse
a0befb93e7 Fix subsampling check for images 1 pixel wide/tall
Change-Id: I0e262ede7eb4a4ae0c86181922d744e542e93350
2014-10-02 11:02:57 -07:00
Deb Mukherjee
9ed23de13f Miscellaneous decoder changes for high bitdepth
Also includes yv12 config changes.

Change-Id: Iacf40d8bf486815b54c32a127ce3cd4516b7e44f
2014-09-29 11:27:45 -07:00
Deb Mukherjee
993d10a217 Adds various high bit-depth encode functions
Change-Id: I6f67b171022bbc8199c6d674190b57f6bab1b62f
2014-09-25 01:50:36 -07:00
hkuang
c70cea97ac Remove mi_grid_* structures.
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.

This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save these memories.

Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
2014-09-19 21:27:11 -07:00
Deb Mukherjee
0d3c3d3ce7 Adds high bitdepth convolve, interpred & scaling
Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3
2014-09-18 07:26:17 -07:00
Deb Mukherjee
5cd0aab81a Adds high bitdepth quantization functions
Adds various high bitdepth quantization functions.

Change-Id: I36fc0bf75a1bd15128ed271df8723de0ac134b0c
2014-09-16 14:55:37 -07:00
Deb Mukherjee
10783d4f3a Adds high bitdepth transform functions and tests
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.

Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
2014-09-11 19:56:33 -07:00
James Zern
7ee073e61d vp9: wait for key/intra-only frame after corruption
don't bother decoding any further after receiving an earlier decode
error until a key/intra-only frame is encountered.

Change-Id: I381917b70d7a9e6f8d6de42e3d181bb113a4cec4
2014-09-09 19:36:11 -07:00
James Zern
2215d2f135 Merge changes If8887e1d,I36bfc9c8,I3d1e6c42
* changes:
  vp9_dthread: simplify loop_filter_row_worker signature
  simplify vp9_loop_filter_worker signature
  vp9_decodeframe: simplify tile_work_hook signature
2014-09-09 16:50:28 -07:00
James Zern
f853117b87 vp9_decodeframe: simplify tile_work_hook signature
use the type names directly in the function declaration rather than
(void *arg1, void *arg2)

Change-Id: I3d1e6c42d384d8e628d7f2075fa561c2c5e20749
2014-09-08 19:53:46 -07:00
Alex Converse
b932c6c5dd BITSTREAM CLARIFICATION: Forbid referencing across color spaces.
Check image format of reference frames.

Change-Id: I7d8d7f097ba547839ff9cec3880bd15a4948ee06
2014-09-08 11:12:09 -07:00
James Zern
46faaeeffb Merge changes I7b9f40dc,I76e74f2e
* changes:
  vp9: correct context buffer resize check
  vp9: fail decode if block/frame refs are corrupt
2014-09-05 12:01:59 -07:00