Commit Graph

1244 Commits

Author SHA1 Message Date
paulwilkins
8ea7bafdaa Merge "Revised rd adjustment for variance." 2015-03-24 03:12:56 -07:00
paulwilkins
c0b71cf82f Merge "Experimental rd bias based on source vs recon variance." 2015-03-24 03:12:41 -07:00
paulwilkins
7e234b9228 Revised rd adjustment for variance.
Revised adjustment for rd based on source complexity.
Two cases:

1) Bias against low variance intra predictors
when the actual source variance is higher.

2) When the source variance is very low to give a slight
bias against predictors that might introduce false texture
or features.

The impact on metrics of this change across the test sets is
small and mixed.

derf -0.073%, -0.049%, -0.291%
std hd -0.093%, -0.1%, -0.557%
yt  +0.186%, +0.04%, - 0.074%
ythd +0.625%, + 0.563%, +0.584%

Medium to strong psycho-visual improvements in some
problem clips.

This feature and intra weight on GF group length now
turned on by default.

Change-Id: Idefc8b633a7b7bc56c42dbe19f6b2f872d73851e
2015-03-20 11:59:39 +00:00
paulwilkins
9a1ce7be7d Experimental rd bias based on source vs recon variance.
This experiment biases the rd decision based on the impact
a mode decision has on the relative spatial complexity of the
reconstruction vs the source.

The aim is to better retain a semblance of texture even if it
is slightly misaligned / wrong, rather than use a simple rd
measure that tends to favor use of a flat predictor if a perfect
match can't be found.

This improves the appearance of texture and visual quality
on specific test clips but is hidden under a flag and currently
off by default pending visual quality testing on a wider Yt set.

Change-Id: Idf6e754a8949bf39ed9d314c6f2daaa20c888aad
2015-03-20 11:57:36 +00:00
Adrian Grange
12d946df89 Restore first ref frame pointer to the correct value
The joint_motion_search function alternates prediction
between two reference frames. In order to reuse existing
code, a pointer to the appropriate reference frame is
written into xd->plane[0].pre[0], that the motion
estimation code assumes points to the reference frame.

If this first reference frame was scaled then the
pointer was incorrectly being reset to point to the
unscaled reference frame rather than the scaled
version.

Change-Id: I76f73a8d8f4f15c1f3a5e7e08a35140cdb7886ab
2015-03-19 16:17:31 -07:00
Adrian Grange
53c9ebe609 Move joint_motion_search & delete function prototype
Change-Id: I7fb3a78ed0e0bc940d8b4a57c470302f8369782f
2015-03-19 14:28:52 -07:00
Alex Converse
ad01d275e9 Merge "Don't inline cost_coeffs." 2015-03-05 13:54:44 -08:00
Adrian Grange
6e3be5c3b6 Merge "Fix valgrind memcpy memory overlaps warning" 2015-03-05 12:52:57 -08:00
Alex Converse
2eb113d00a Don't inline cost_coeffs.
It was tiny when it was orginally marked INLINE. Forcing this function
to be inlined prevents the compiler from inlining its much smaller
callers.

No measurable speed impact, 28320 byte smaller libvpx.a

Change-Id: I6bf4c917157d15cbadb3cd3e20a9e82d35dc7d6f
2015-03-05 12:39:02 -08:00
Adrian Grange
3807dd82ab Make encoder buffer allocation dynamic
Frame buffers are now allocated dynamically on-demand.

Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".

All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.

Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
2015-03-04 07:58:32 -08:00
Adrian Grange
852f62fde5 Fix valgrind memcpy memory overlaps warning
Change-Id: Id0bb162b48b891c5c849f0411ef2ac0aa4bbe261
2015-03-03 15:06:34 -08:00
Jingning Han
5041aa0fbe Fix ioc issue in block_rd_txfm
Force 64-bit precision in the intermediate steps.

Change-Id: I666113d9adcef8975da201d5aa1a13b783d09594
2015-02-12 12:51:39 -08:00
Adrian Grange
23ebacdb81 Auto-adaptive encoder frame resizing logic
Note: This feature is still in development.

Add an option for the encoder to decide the resolution
at which to encode each frame.

Each KF/GF/ARF goup is tested to see if it would be
better encoded at a lower resolution. At present, each
KF/GF/ARF is coded first at full-size and if the coded
size exceeds a threshold (twice target data rate) at
the maximum active Q then the entire group is encoded
at lower resolution.

This feature is enabled in vpxenc by setting:
  --resize-allowed=1

In addition, if the vpxenc command line also specifies
valid frame dimensions using:
  --resize-width=XXXX & --resize_height=YYYY
then *all* frames will be encoded at this resolution.

Change-Id: I13f341e0a82512f9e84e144e0f3b5aed8a65402b
2015-02-10 09:59:32 -08:00
hkuang
be6aeadaf4 Try again to merge branch 'frame-parallel' into master branch.
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.

Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.

There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.

* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
 Conflicts:
       test/codec_factory.h
       test/decode_test_driver.cc
       test/decode_test_driver.h
       test/invalid_file_test.cc
       test/test-data.sha1
       test/test.mk
       test/test_vectors.cc
       vp8/vp8_dx_iface.c
       vp9/common/vp9_alloccommon.c
       vp9/common/vp9_entropymode.c
       vp9/common/vp9_loopfilter_thread.c
       vp9/common/vp9_loopfilter_thread.h
       vp9/common/vp9_mvref_common.c
       vp9/common/vp9_onyxc_int.h
       vp9/common/vp9_reconinter.c
       vp9/decoder/vp9_decodeframe.c
       vp9/decoder/vp9_decodeframe.h
       vp9/decoder/vp9_decodemv.c
       vp9/decoder/vp9_decoder.c
       vp9/decoder/vp9_decoder.h
       vp9/encoder/vp9_encoder.c
       vp9/encoder/vp9_pickmode.c
       vp9/encoder/vp9_rdopt.c
       vp9/vp9_cx_iface.c
       vp9/vp9_dx_iface.c

This reverts commit a18da9760a.

Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
2015-01-30 21:00:13 -08:00
Jingning Han
9bdc0ae2b2 Format fixes in vp9_rd_pick_inter_mode_sb/sub8x8
Add parentheses to bit operations.

Change-Id: I095d601f0631d055adc4b3a8fde70c9cbae9e749
2015-01-23 11:48:58 -08:00
Johann
a18da9760a Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch."
This reverts commit bde04ce503

Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-23 08:42:02 -08:00
hkuang
bde04ce503 Merge branch 'frame-parallel' to enable frame parallel decode in master branch.
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.

* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""

 Conflicts:
       test/codec_factory.h
       test/decode_test_driver.cc
       test/decode_test_driver.h
       test/invalid_file_test.cc
       test/test-data.sha1
       test/test.mk
       test/test_vectors.cc
       vp8/vp8_dx_iface.c
       vp9/common/vp9_alloccommon.c
       vp9/common/vp9_entropymode.c
       vp9/common/vp9_loopfilter_thread.c
       vp9/common/vp9_loopfilter_thread.h
       vp9/common/vp9_mvref_common.c
       vp9/common/vp9_onyxc_int.h
       vp9/common/vp9_reconinter.c
       vp9/decoder/vp9_decodeframe.c
       vp9/decoder/vp9_decodeframe.h
       vp9/decoder/vp9_decodemv.c
       vp9/decoder/vp9_decoder.c
       vp9/decoder/vp9_decoder.h
       vp9/encoder/vp9_encoder.c
       vp9/encoder/vp9_pickmode.c
       vp9/encoder/vp9_rdopt.c
       vp9/vp9_cx_iface.c
       vp9/vp9_dx_iface.c

Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
2015-01-22 18:18:53 -08:00
Jingning Han
5c31fd5c6d Merge "Enable sub8x8 inter block search for RTC coding mode" 2015-01-02 10:00:35 -08:00
Jingning Han
dad89d5ca1 Enable sub8x8 inter block search for RTC coding mode
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.

Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
2014-12-24 17:40:31 -08:00
Jim Bankoski
b3c66f8a2f WIP: Remove giant value cost table
Change-Id: Iabe8a8868a747626c24bb13f1796f4c7827af367
2014-12-23 15:06:17 -08:00
Jim Bankoski
4b8c6d96ec Tokenization without huge tables.
Change-Id: Iff528c4b7528cc70320343b3a7ce07a92b024dfd
2014-12-22 08:42:52 -08:00
Paul Wilkins
2e39817f5e Merge "Improve motion detection for low complexity regions." 2014-12-18 08:38:21 -08:00
Jingning Han
01613aa753 Set second ref frame to be NONE in key frame coding
This commit explicitly set the second reference frame type to be
NONE in key frame coding mode. This fixes a subtle dependency of
reference motion vector used by next inter frame on mode_info
reset before key frame coding.

Change-Id: I5ff0359753fdc9992b0bfe889490f7a32d7d5f6a
2014-12-16 15:49:58 -08:00
Paul Wilkins
b6c75c5a8d Improve motion detection for low complexity regions.
Where there is very subtle motion, especially when combined
with low spatial complexity, the codec sometimes fails to quickly
pick up the ambient motion field.

Once it has been established though the field propagates well using
Nearest and Near MV.

This patch looks specifically at the case where the Nearest and Near
have not been established as non zero vectors and in this case
discounts the cost of searching for a new vector in the rd code.

This will almost certainly have some implications in terms of encode
speed but it should be possible to mitigate the impact in a subsequent
using first pass stats and the local spatial complexity.

Average results for test sets approximately neutral.

Change-Id: I44a29e20f11f7ab10f8c93ffbdc50183d9801524
2014-12-16 17:22:54 +00:00
Jingning Han
eefe869291 Simplify rate-distortion modeling function
Use left shift to replace one multiplication. The computation
outcome remains identical.

Change-Id: I1e1737af0a245de0d2a2bde10f0c171477199fc1
2014-12-15 11:51:16 -08:00
James Zern
72ece1308b vp9: move encoder-only member from common
allow_comp_inter_inter VP9_COMMON -> VP9_COMP

Change-Id: I6d9dc25d1cdd7e2ab62f5be69cd9fa883d21dbb6
2014-12-12 11:17:44 -08:00
hkuang
382f86f945 Improve the performance by caching the left_mi and right_mi in macroblockd.
This improve the deocde performance by ~2% on Nexus 7 2013.

Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3
2014-12-05 16:25:42 -08:00
Jingning Han
74ded4863e Enable conditional skip path in rd_pick_intra_sby_mode
These speed-up features for key frame coding are only turned on
in the settings of hybrid non-RD and RD mode decision. It provides
about 20% speed-up to the hybrid key frame coding at the expense
of certain compression performance loss. For vidyo1, the key frame
coding statistics are changed
9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us

Overall rtc set compression performance is down by -0.257%.

Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f
2014-12-05 09:36:09 -08:00
Yunqing Wang
edbd61e136 vp9_ethread: modify VP9_COMP structure
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. all threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.

In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.

Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
2014-11-24 17:57:38 -08:00
Yunqing Wang
379334c2d8 vp9_ethread: move filter_cache out of RD_OPT struct
Similar to mask_filter, the filter_cache in RD_OPT struct can be
moved out, and declared as a local variable since it is only
used in pick_inter_mode functions.

Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17
2014-11-20 13:44:16 -08:00
Yunqing Wang
b0efddd8e6 vp9_ethread: change mask_filter to a local variable
The mask_filter in RD_OPT struct is used to record rd result in
filter decision. It is only used in pick_inter_mode functions,
and is removed from the struct and declared as a local variable.

Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b
2014-11-20 09:41:49 -08:00
Yunqing Wang
70c9d2983b Revert "vp9_ethread: include a pointer to mb in VP9_COMP"
This reverts commit 6906d218dd.

Another way will be used to handle mb struct.

Change-Id: Ic1111a46b2b1ee00f8f9e3fcd4cf3eb6030b2dc4
2014-11-20 08:31:12 -08:00
Yunqing Wang
6906d218dd vp9_ethread: include a pointer to mb in VP9_COMP
Modified VP9_COMP struct to include MACROBLOCK *mb. This change
makes it feasible in multi-thread case to allocate a mb for each
thread.

Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e
2014-11-14 12:31:06 -08:00
Jingning Han
61966b1d10 Merge "Refactor vp9_update_rd_thresh_fact" 2014-10-31 08:55:28 -07:00
Jingning Han
f7b46d8c5e Refactor vp9_update_rd_thresh_fact
Reduce the scope of function parameters.

Change-Id: Ifef2cfb559908a97498ffdbd6ea53da1cd45a73c
2014-10-30 11:09:40 -07:00
Hui Su
66906da066 Merge "Combine vp9_encode_block_intra and encode_block_intra" 2014-10-30 11:02:31 -07:00
Jingning Han
9349a28e80 Enable mode search threshold update in non-RD coding mode
Adaptively adjust the mode thresholds after each mode search round
to skip checking less likely selected modes. Local tests indicate
5% - 10% speed-up in speed -5 and -6. Average coding performance
loss is -1.055%.

speed -5
vidyo1 720p 1000 kbps
16533 b/f, 40.851 dB, 12607 ms -> 16556 b/f, 40.796 dB, 11831 ms

nik 720p 1000 kbps
33229 b/f, 39.127 dB, 11468 ms -> 33235 b/f, 39.131 dB, 10919 ms

speed -6
vidyo1 720p 1000 kbps
16549 b/f, 40.268 dB, 10138 ms -> 16538 b/f, 40.212 dB, 8456 ms

nik 720p 1000 kbps
33271 b/f, 38.433 dB,  7886 ms -> 33279 b/f, 38.416 dB, 7843 ms

Change-Id: I2c2963f1ce4ed9c1cf233b5b2c880b682e1c1e8b
2014-10-29 10:55:34 -07:00
Hui Su
0928da3b6e Combine vp9_encode_block_intra and encode_block_intra
Change-Id: I79091fb677b64892ecca2fb466fde14602d8cdfc
2014-10-28 18:57:01 -07:00
Jingning Han
d56b3eb0cf Refactor encoder tile data structure
Make the common tile info as one element in the encoder tile data
struct.

Change-Id: I8c474b4ba67ee3e2c86ab164f353ff71ea9992be
2014-10-27 19:37:13 -07:00
Jingning Han
eee201c221 Tile based adaptive mode search in RD loop
Make the spatially adaptive mode search in rate-distortion
optimization loop inter tile independent. Experiments suggest that
this does not significantly change the coding staticstics.

Single tile, speed 3:
pedestrian_area 1080p 1500 kbps
59192 b/f, 40.611 dB, 101689 ms

blue_sky 1080p 1500 kbps
58505 b/f, 36.347 dB, 62458 ms

mobile_cal 720p 1000 kbps
13335 b/f, 35.646 dB, 45655 ms

as compared to 4 column tiles, speed 3:
pedestrian_area 1080p 1500 kbps
59329 b/f, 40.597 dB, 101917 ms

blue_sky 1080p 1500 kbps
58712 b/f, 36.320 dB, 62693 ms

mobile_cal 720p 1000 kbps
13191 b/f, 35.485 dB, 45319 ms

Change-Id: I35c6e1e0a859fece8f4145dec28623cbc6a12325
2014-10-24 10:00:27 -07:00
Yunqing Wang
330a6b2756 Merge "vp9_ethread: allocate frame contexts outside VP9_COMMON struct" 2014-10-22 17:10:39 -07:00
Yunqing Wang
7c7e4d4eb8 vp9_ethread: allocate frame contexts outside VP9_COMMON struct
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.

Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
2014-10-22 15:03:12 -07:00
Hangyu Kuang
9ce3a7d76c Implement frame parallel decode for VP9.
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.

Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
2014-10-22 10:50:58 -07:00
Jingning Han
94ecfa323f Reset rate cost value in rd mode search
When early termination is triggered, properly reset the rate cost
to invalid value to avoid potential ioc issue.

Change-Id: I3444390be2e49a34bb02cf8a74c33d5dbd96d88d
2014-10-17 09:33:59 -07:00
Jingning Han
ed100c0b00 Fix an ioc issue in super_block_uvrd
This commit fixes an ioc issue that will happen when the cumulative
variables are not in effective use. The fix discards these
redundant additions.

Change-Id: Idbac5bfb989c0cedc5f8a323effce938519b2457
2014-10-16 11:07:39 -07:00
Jingning Han
f3a5de816d Refactor super_block_uvrd function to remove goto statement
Use return value 0/1 as indicator of the validity of the rate-
distortion cost.

Change-Id: I6244126fbf03472cebcba4f177a6cd329fae4743
2014-10-14 09:58:11 -07:00
Jingning Han
69a09a70e9 Use speed feature variable in vp9_rd_pick_inter/intra_mode
Replace repeated fetch cpi->sf with a local sf pointer.

Change-Id: I5a55bba3e1c41fbdbc6ad5f078d2fa49dd95ee67
2014-10-13 16:15:00 -07:00
Jingning Han
3bdb6bfcee Fix vp9_rd_pick_inter/intra function types
The returned value is not used anywhere, hence changing the function
type into void.

Change-Id: I0ece49ed61e7aab6df01140135503ad41d4ef4a4
2014-10-13 16:00:46 -07:00
Jingning Han
811cef97c9 Refactor rate distortion cost structure
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.

Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
2014-10-13 14:27:16 -07:00
Jingning Han
a62acf3c0a Fix ActiveMapTest valgrind warning
This fixes a valgrind warning in the ActiveMapTest unit test
reported in issue 870.

Change-Id: Idf172ab0244ebefe630c3577e649bc9ba7c43d10
2014-10-11 22:36:58 -07:00