16161 Commits

Author SHA1 Message Date
Yi Luo
a4593f17ca HBD hybrid transform 4x4 SSE4.1 optimization
- Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Overall encoder speed improves ~4.5%-6%.
- Update bit-exact unit test against current C version.

Change-Id: If751c030612245b1c2470200c9570cf40d655504
2016-04-25 09:53:09 -07:00
Jingning Han
b4cbe54ed6 Merge "Fix out-of-bound memory access in loop filter" into nextgenv2 2016-04-25 16:13:04 +00:00
Jingning Han
221c09aa99 Merge "Refactor sub-pixel motion search" into nextgenv2 2016-04-25 16:12:51 +00:00
James Zern
0aa6435c45 vp10/rdopt: quiet unused variable warning
when CONFIG_REF_MV and CONFIG_EXT_INTER are enabled

Change-Id: I17fa2b5fe0e1878333099cc5fa2b1ee36636b4d3
2016-04-23 16:59:45 +00:00
Yue Chen
8ce563bf92 Merge "Fix EXT_INTER unit test failure in 32-bit builds" into nextgenv2 2016-04-23 16:51:08 +00:00
Jingning Han
004c7fa668 Fix out-of-bound memory access in loop filter
This commit fixes an out-of-bound memory access case in the
loop filter mask setting. This issue was introduced in

10232ed Refactor loopfilter level arrays to 2D.
https://chromium-review.googlesource.com/#/c/336645/

Change-Id: I7101a4a79b9ecfdd8ec5ef13a0b314cc95f48d12
2016-04-22 22:57:14 -07:00
Yue Chen
6daf1a460e Fix EXT_INTER unit test failure in 32-bit builds
Align new buffers that are used in interintra and wedgeinterinter prediction.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1196

Change-Id: I1ef49fdf13c79a22cf8a1737e3d3052da0a92dfe
2016-04-22 22:37:13 -07:00
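A minimal sketch of the alignment fix described in 6daf1a460e. It assumes the failure came from SIMD code that expects 16-byte-aligned buffers, which a 32-bit stack does not guarantee by default; the commit message does not state this. The names `is_aligned16` and `local_buffer_is_aligned` are hypothetical, and C11 `alignas` is used here in place of the project's own alignment macro.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) {
  return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical stand-in for a prediction routine with a scratch buffer.
 * Without alignas, a 32-bit build may place tmp_buf on a 4- or 8-byte
 * boundary; alignas(16) asks the compiler for a 16-byte boundary. */
static int local_buffer_is_aligned(void) {
  alignas(16) uint8_t tmp_buf[64 * 64];
  tmp_buf[0] = 0; /* touch the buffer so it is not optimized away */
  return is_aligned16(tmp_buf);
}
```

Calling `local_buffer_is_aligned()` should return 1 on both 32-bit and 64-bit builds with a C11 compiler.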
Jingning Han
cdf989adb7 Silence compiler above-boundary warnings
Change-Id: I6d806f92e8d38d5b0b01bc8e0fd97bd8839c84df
2016-04-22 15:49:34 -07:00
Jingning Han
77d451ecca Refactor sub-pixel motion search
Unify the rate cost used in the motion estimation process.

Change-Id: I8e52ca9f29eee3469553433302b62fb02a038919
2016-04-22 22:27:09 +00:00
Jingning Han
0dccc85c98 Replace left shift with multiplications
This avoids the potential risk in left shift of negative numbers.

Change-Id: I7aecb499ee6ce7342b172adc4741de5c6c107a24
2016-04-22 20:56:02 +00:00
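A minimal sketch of the idea behind 0dccc85c98. In C, left-shifting a negative signed value is undefined behavior, while multiplying by a power of two is well defined as long as the result does not overflow. The helper name `scale_by_pow2` is hypothetical and does not come from the source tree.

```c
#include <stdint.h>

/* Hypothetical helper: scales a signed value by 2^bits without ever
 * shifting a negative operand. */
static int32_t scale_by_pow2(int32_t value, int bits) {
  /* (1 << bits) shifts a positive constant, which is well defined for
   * 0 <= bits < 31; the multiplication then carries the sign. */
  return value * (1 << bits);
}
```

For example, `scale_by_pow2(-3, 4)` yields -48, where `-3 << 4` would be undefined.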
Jingning Han
3f6ec144e5 Fix an enc/dec mismatch issue in ext-inter experiment
This commit fixes an encoding decision process issue that could
trigger enc/dec mismatch in the ext-inter experiment.

Change-Id: I6f10d1fd2fd1aa04e51df04c39a65cf72ac66c42
2016-04-22 20:49:29 +00:00
Yi Luo
cf7f00691f Change hybrid transform function argument from TXFM_2D_CFG* to int
Unit tests show that manually developed SSE4.1 code would perform ~30%
better if the TXFM_2D_CFG configuration is set at a lower level. This
change only updates the function signature. There is no performance
impact.

Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
2016-04-21 18:37:21 -07:00
Alex Converse
8f2fa04181 Unbreak the non-var_tx build.
Change-Id: I76cc3d88122de42f035fbf6508bdf3fd7c995012
2016-04-21 13:27:19 -07:00
Debargha Mukherjee
53968c3917 Merge "Fix uninitialized blk_skip for VAR TX." into nextgenv2 2016-04-21 19:56:17 +00:00
Alex Converse
67058089b4 Merge "Move ZERO_TOKEN into the ANS coef tokenset." into nextgenv2 2016-04-21 18:08:04 +00:00
Angie Chiang
7d598d658c Merge "relax txfm test error constraint" into nextgenv2 2016-04-20 02:17:40 +00:00
Alex Converse
fcea1485bb Merge "Store ANS token CDFs in the FRAME_CONTEXT rather than in a global table." into nextgenv2 2016-04-19 23:29:14 +00:00
Alex Converse
3829cd2f2f Move ZERO_TOKEN into the ANS coef tokenset.
Change-Id: I87943e027437543ab31fa3ae1aa8b2de3a063ae5
2016-04-19 15:29:47 -07:00
Jingning Han
1a0352d18e Merge "Handle zero motion vector residual" into nextgenv2 2016-04-19 21:20:08 +00:00
Hui Su
1e02f2e8a4 Merge "Adjust optimize_b RD parameters" into nextgenv2 2016-04-19 21:18:50 +00:00
Hui Su
ee8c72d95a Merge "Enable optimize_b for intra blocks" into nextgenv2 2016-04-19 21:18:37 +00:00
Angie Chiang
aea0cc5041 Merge "Change the naming of txfm#d_test" into nextgenv2 2016-04-19 19:52:03 +00:00
Angie Chiang
218dfbd547 Change the naming of txfm#d_test
Change-Id: I151b18b38f7a000fb6e431cd42675ac4e7e9e3ca
2016-04-19 11:59:00 -07:00
Yue Chen
feb2184c4e Merge "Remove an unsuccessful adaption of overlap sizes in obmc experiment" into nextgenv2 2016-04-19 18:41:02 +00:00
hui su
ad59b08f76 Adjust optimize_b RD parameters
Coding gain:
lowres  0.44%
midres  0.24%
hdres   0.32%

Change-Id: Ie558203b2b2bf5c16cd49b114df3d696c4f35049
2016-04-19 09:54:08 -07:00
hui su
e43c21112d Enable optimize_b for intra blocks
Coding gain:
lowres  0.05%
midres  0.10%
hdres   0.18%

Change-Id: I508b150c02588f911a8ddddfe73c770f0819fe10
2016-04-19 09:50:45 -07:00
Alex Converse
6ca364606b Store ANS token CDFs in the FRAME_CONTEXT rather than in a global table.
This will facilitate bringing the zero node into the token set while
allowing its probability to vary independently.

Change-Id: I57b44c0fce44debb8e612021e44713b229d1b3cf
2016-04-19 09:39:48 -07:00
Alex Converse
ab759be8d9 Merge "Use an exponential growth approach for the ANS reversal buffer." into nextgenv2 2016-04-19 16:39:18 +00:00
Geza Lore
7aa95be980 Fix uninitialized blk_skip for VAR TX.
x->blk_skip used to be uninitialized (leftover from encoding the
previous block), if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when using cm->tx_mode != TX_MODE_SELECT.

This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.

Also fixed an edge effect where encode_block in encodemb.c used the
formal width of the block (without cropping at the right edge), to
look up blk_skip, while select_tx_block in rdopt.c used the cropped
width to set blk_skip.

Change-Id: I76d0f49ac5ab3ab54203573e0d7fcfcc1c6aa10d
2016-04-19 17:00:20 +01:00
Yaowu Xu
efc6aa0c97 Merge "Merge branch 'master' into nextgenv2" into nextgenv2 2016-04-19 15:43:58 +00:00
Geza Lore
8d64b53dc8 Revert "Fix uninitialized blk_skip for VAR TX."
This reverts commit e7b89d8835.
2016-04-19 15:41:56 +01:00
Geza Lore
e7b89d8835 Fix uninitialized blk_skip for VAR TX.
x->blk_skip used to be uninitialized (leftover from encoding the
previous block), if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when using cm->tx_mode != TX_MODE_SELECT.

This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.

Change-Id: If39062927446798c626fc93694b4e6a4f35fa5da
2016-04-19 14:22:48 +01:00
Jingning Han
ec2ffda599 Handle zero motion vector residual
This commit handles the zero motion vector residuals for single
and compound reference modes, respectively. It improves the coding
performance by 0.13% with no additional encoding complexity.

Change-Id: I16075a836025bd2746da2ff4698fb9261e4b08c1
2016-04-18 18:14:01 -07:00
Yi Luo
3cf1a082e0 Merge "Disable HBD 4x4 DCT_DCT HT test" into nextgenv2 2016-04-18 23:07:25 +00:00
Yue Chen
c0fd271932 Remove an unsuccessful adaption of overlap sizes in obmc experiment
This removes the adaptation, which was intended to reduce the size of
the overlapped region when the neighboring block is a non-skip one.
The width/height of the overlapping region is now fixed at half that
of the current block.

Performance improvement (lowres/midres): 0.111%/0.102%

Change-Id: Ife75dad9d4eb355c78a05178b50cc015c442884f
2016-04-18 15:27:59 -07:00
Yaowu Xu
ed04e82a04 Merge branch 'master' into nextgenv2
Conflicts:
	vp10/common/scan.c
	vp9/common/vp9_pred_common.c
	vp9/decoder/vp9_decoder.c

Change-Id: Id559d98ea676da15d60ed464ddb6c48d3eed1111
2016-04-18 15:15:05 -07:00
Jingning Han
2aa6117bda Refactor transform selection process
This commit re-arranges the transform type and size selection
process. It removes an unnecessary rate-distortion cost computation
step. Local experiments show that this speeds up the encoding
process by 6% for both the baseline and the ext-intra experiment.

Change-Id: Iab3b86a63a1e9e55548466791ed5d29a0575c1e7
2016-04-18 19:45:56 +00:00
Jingning Han
c5449d3eb7 Merge "Refactor rd_variance_adjustment function" into nextgenv2 2016-04-18 19:45:45 +00:00
Angie Chiang
caf066f845 Merge changes I67543d36,I763f2924 into nextgenv2
* changes:
  Reduce shift in txfm8x8
  Let txfm's constant bit be the same for each stage
2016-04-18 19:40:33 +00:00
Yi Luo
dd04329367 Disable HBD 4x4 DCT_DCT HT test
- HBD HT unit tests will be modified to test against new algorithm.

Change-Id: Iba58eeb21a45612685c93c98d7c846dab25e6638
2016-04-18 12:24:31 -07:00
Paul Wilkins
2e0841931c Merge "Adjustment to prediction decay." 2016-04-18 18:47:13 +00:00
Angie Chiang
d72560e10d Merge "Fit adst/dct's stage range into 32-bit in bd12" into nextgenv2 2016-04-18 18:40:28 +00:00
Angie Chiang
cf3ef18fc4 Merge "Remove double operation from tx_size selection" into nextgenv2 2016-04-18 18:11:36 +00:00
Yi Luo
a431a93cb1 Merge "Improvement on hybrid transform 4x4 DCT_DCT SSE4.1 optimization" into nextgenv2 2016-04-18 18:04:06 +00:00
Angie Chiang
6de4a77df3 Remove double operation from tx_size selection
This CL fixes the bug
rdopt.c:1687: choose_tx_size_from_rd: Assertion
`mbmi->tx_type == DCT_DCT' failed

It is caused by
1) mms register access before double operation
2) different compiler behaviors
code:
  int64_t a = INT64_MAX;
  double b = 1. * INT64_MAX;
  printf("a < b: %d\n", a < b);
result:
  a < b: 0

code:
  --target=x86-linux-gcc
  int64_t a = INT64_MAX;
  double b = 1. * INT64_MAX;
  printf("a < b: %d\n", a < b);
result:
  a < b: 1

I removed the double operation and tested it with the EXT_TX experiment.
The PSNR change is around 0.05%, which is considered noise level.

Change-Id: If8935c70c8603617fcfa8571accd30ccdda786a0
2016-04-18 11:00:13 -07:00
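A minimal sketch of why the comparison in 6de4a77df3 can differ between builds. An IEEE 754 double has a 53-bit significand, so INT64_MAX (2^63 - 1) is not representable and rounds up to 2^63 on conversion. A build that evaluates the comparison in a wider floating-point format can keep INT64_MAX exact on one side, which would explain the two results quoted above; that reading is an assumption, not something the commit states. The function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 when converting INT64_MAX to double rounds up to 2^63. */
static int int64_max_rounds_up(void) {
  /* volatile forces the value to be stored as a real 64-bit double. */
  volatile double d = (double)INT64_MAX;
  return d == 9223372036854775808.0; /* 2^63 */
}

/* Hypothetical integer-only comparison of two rate-distortion costs.
 * Staying in int64_t avoids any dependence on floating-point
 * evaluation width. */
static int is_better_rd(int64_t candidate, int64_t best) {
  return candidate < best;
}
```

Keeping the comparison in `int64_t`, as the commit does by removing the double operation, gives the same answer on every target.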
Jingning Han
c8312daad1 Refactor rd_variance_adjustment function
Compute the reconstruction variance in the prediction mode search.

Change-Id: Id9c7635a9c9f5383e61c0e427e95234211834301
2016-04-18 09:37:34 -07:00
Yue Chen
16a99e967c Merge "Optimization for EXT_INTER + OBMC combination" into nextgenv2 2016-04-17 18:54:33 +00:00
Yue Chen
321794c4d5 Optimization for EXT_INTER + OBMC combination
In the rd loop, evaluate obmc, with its mv copied from the regular
inter predictor, even when wedge interinter is better than regular inter
(previously this case forced allow_obmc = 0). The early termination
condition before this step is relaxed to avoid skipping too many obmc
predictions. The overhead rates are properly calculated for these tools.

The logic of the bitstream syntax:
(a single ref) the interintra flag is sent first, only if it is 0, we
send the obmc flag;
(compound refs) the obmc flag is sent first, only if it is 0, we send
the wedge interinter flag

Coding gain
lowres: 0.428% (2.287%->2.715%)

Change-Id: I5f3a34640b398e313cbf84235c9fe2073eb2173f
2016-04-15 17:03:20 -07:00
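The flag ordering described in 321794c4d5 can be sketched as follows. This is an illustration only: `FlagWriter`, `put_flag` and `write_inter_mode_flags` are hypothetical names, and raw bits stand in for the encoder's entropy-coded symbols.

```c
#include <stddef.h>

/* Hypothetical bit sink; the real encoder codes these flags with its
 * own entropy coder rather than storing raw bits. */
typedef struct {
  unsigned char bits[8];
  size_t count;
} FlagWriter;

static void put_flag(FlagWriter *w, int flag) {
  if (w->count < sizeof(w->bits))
    w->bits[w->count++] = (unsigned char)(flag != 0);
}

/* Writes the flags in the order given in the commit message. */
static void write_inter_mode_flags(FlagWriter *w, int is_compound,
                                   int use_interintra, int use_obmc,
                                   int use_wedge_interinter) {
  if (!is_compound) {
    /* Single reference: interintra first, obmc only if interintra is 0. */
    put_flag(w, use_interintra);
    if (!use_interintra) put_flag(w, use_obmc);
  } else {
    /* Compound references: obmc first, wedge interinter only if obmc is 0. */
    put_flag(w, use_obmc);
    if (!use_obmc) put_flag(w, use_wedge_interinter);
  }
}
```

With this ordering a block never signals both tools of a pair: once the first flag is 1, the second is not sent.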
Yi Luo
71fa2b2218 Merge "Fix an unaligned memory allocation in HT 4x4 speed test" into nextgenv2 2016-04-15 23:56:21 +00:00
Angie Chiang
e7f64756a1 Merge "remove redundant header" into nextgenv2 2016-04-15 22:44:33 +00:00