678 Commits

Author SHA1 Message Date
Jingning Han
9f35cafaa1 Merge "Replace hard coded values in mv_has_subpel" into nextgenv2 2016-05-03 19:25:19 +00:00
Jingning Han
113f8d8746 Replace hard coded values in mv_has_subpel
Change-Id: Id437740c2db1a3a56c1ad29d8b51bb763c044c1d
2016-05-03 09:08:06 -07:00
Debargha Mukherjee
3407785536 Refactoring and uv fix for wedge
lowres: -1.72%

Change-Id: I4c883097caac72fab8e01945454579891617145e
2016-05-03 08:02:08 -07:00
Jim Bankoski
fce3cee8dd Move vpx_add_plane from codec to vpx_dsp and dedup.
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2016-05-02 12:17:39 -07:00
Yue Chen
326975ada3 Merge "Bug fixes for obmc/ext-inter/ext-tile experiment" into nextgenv2 2016-05-02 18:09:08 +00:00
Yi Luo
9be7075f61 Merge "HBD hybrid transform 8x8 SSE4.1 optimization" into nextgenv2 2016-05-02 17:34:50 +00:00
Yue Chen
c1d473849e Bug fixes for obmc/ext-inter/ext-tile experiment
Fix 1: in ext-inter + obmc config, properly identify if the left
predictor used for obmc is a compound one in the case that the
neighbor uses wedgeinterinter pred and we will dump the ALTREF part.
This will fix the seg fault in unit test:
VP10/AltRefForcedKeyTestLarge.Frame1IsKey/0

Fix 2: in ext-tile + obmc experiment, handle the case that the
above block does not fit in the same row tile with the current one,
so as to prevent potential crashes.

Change-Id: I1c177d4f4ad15e10d11d8756e146496437753eea
2016-04-29 19:03:39 -07:00
Yi Luo
299c5fc202 HBD hybrid transform 8x8 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update bit-exact unit test against current C version.
- HBD encoder speed improves ~3.8%.

Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec
2016-04-29 17:04:52 -07:00
Debargha Mukherjee
88fe7871be Refactor wedge generation
Change-Id: I2ec4f562e28a4673477e20186f9d6167b24b76b8
2016-04-28 17:51:21 -07:00
Debargha Mukherjee
cf3ee22597 Merge "Make the backward updates work with bitshifts" into nextgenv2 2016-04-28 20:25:53 +00:00
Debargha Mukherjee
e4bf50b9b9 Make the backward updates work with bitshifts
Removes integer divides from backward updates for VP10.
Currently this is put in as part of the entropy experiment.
Coding efficiency change is in the noise level.

Change-Id: I5b3c0ab6169ee6d82d0ca1778e264fd4577cdd32
2016-04-28 11:51:18 -07:00
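As a rough illustration of divide-free adaptation (not the code from this change), the sketch below moves an 8-bit probability toward a target using only shifts and adds. The helper name `adapt_prob_shift` and the `rate` parameter are hypothetical and do not come from the VP10 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: move `prob` toward `target` by 1/2^rate of the
 * distance between them. Only shifts, adds and subtracts are used, so
 * no integer divide is needed. The two branches keep every shifted
 * value non-negative. */
static uint8_t adapt_prob_shift(uint8_t prob, uint8_t target, int rate) {
  assert(rate >= 0 && rate < 8);
  if (target >= prob) {
    return (uint8_t)(prob + ((target - prob) >> rate));
  }
  return (uint8_t)(prob - ((prob - target) >> rate));
}
```

With `rate = 0` the probability jumps straight to the target; larger rates adapt more slowly.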
Debargha Mukherjee
7ff7943455 Brings back near-near compound mode into ext-inter
lowres: improves by 0.1%

Change-Id: I245019916bf47c6e24bc8c3953b86715ab0193c9
2016-04-28 11:34:13 -07:00
Hui Su
338c9e704a Merge "ext-intra: completely remove floating point operations" into nextgenv2 2016-04-27 22:00:22 +00:00
Alex Converse
38dfee802f Merge "Fix vp10 txfm on MSVC 2015." into nextgenv2 2016-04-27 21:38:31 +00:00
Alex Converse
97673cb128 Fix vp10 txfm on MSVC 2015.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1187

Change-Id: Ied6d3d003ed6ab9cf4f03cdd1d0037ae755254f4
2016-04-27 19:40:02 +00:00
hui su
6e39af3697 ext-intra: completely remove floating point operations
No performance changes

Change-Id: Ia489041253423ddf8ebc7e2d41fbfb9e138109f0
2016-04-27 12:08:38 -07:00
Yue Chen
88bb103f75 Merge "Optimization for EXT_INTER + OBMC" into nextgenv2 2016-04-27 06:29:38 +00:00
Yue Chen
3ac12aecc5 Optimization for EXT_INTER + OBMC
Remove the restriction that the neighboring predictor cannot be
used in obmc prediction if it is an interintra or wedgeinterinter
block. The inter predictor of the interintra block, or the first
inter predictor(using LAST or GOLDEN frame) of the wedgeinterinter
block will be exploited in obmc prediction.

Coding gain: 0.248% (2.833%->3.081%) lowres

Change-Id: I4ac0368b9d2f2956f266b30c1ac97db8bafa0742
2016-04-26 16:50:10 -07:00
Debargha Mukherjee
e5b8a01fd5 Merge "Reduce intra transform set" into nextgenv2 2016-04-26 23:32:16 +00:00
Jingning Han
2f2448aec9 Merge "Rework motion vector precision limit" into nextgenv2 2016-04-26 21:31:10 +00:00
Hui Su
3f7a709676 Merge "ext-intra: get rid of some floating operations." into nextgenv2 2016-04-26 18:53:33 +00:00
Jingning Han
8678ab4c55 Rework motion vector precision limit
This commit enables 1/8 luma component motion vector precision
for all motion vector cases. It improves the compression performance
of lowres by 0.13% and hdres by 0.49%.

Change-Id: Iccfc85e8ee1c0154dfbd18f060344f1e3db5dc18
2016-04-26 10:14:26 -07:00
Debargha Mukherjee
8851acc5ed Reduce intra transform set
Reduce transform set for intra for 8x8 and smaller to 7 from 12.
Also fixes an issue with prob updates.

Encoder speed-up of about 8-10%.

Very little change in coding efficiency:
lowres: -2.996 (from -3.055 before)
midres: -2.482 (from -2.552 before)

Change-Id: I4ba50ff967521b33c748fe423bd92f7cf4105ebc
2016-04-26 10:10:55 -07:00
hui su
ad50c226e6 ext-intra: get rid of some floating operations.
No performance changes.

Change-Id: Idd4043090fec09e57520bc970ed2e39e6f7e1a5e
2016-04-25 14:44:42 -07:00
Debargha Mukherjee
022da62579 Merge "Clear X87 register state before using double." into nextgenv2 2016-04-25 21:42:23 +00:00
Yi Luo
333ff883e1 Merge "HBD hybrid transform 4x4 SSE4.1 optimization" into nextgenv2 2016-04-25 19:43:35 +00:00
Geza Lore
23c4116ebb Clear X87 register state before using double.
MMX and X87 floating point instructions cannot be mixed freely on
the 32 bit x86 architecture.

This fixes a lot of unit tests in the 32bit build with
--enable-ext-intra.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1196

Change-Id: I0e1c3565f4b9cb4fc2d716e94d9c40e68b36fac8
2016-04-25 10:30:20 -07:00
Alex Converse
d4fe243cdf Merge "Raise the probability resolution for rANS tokens to 10-bits per symbol" into nextgenv2 2016-04-25 17:11:16 +00:00
Yi Luo
a4593f17ca HBD hybrid transform 4x4 SSE4.1 optimization
- Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Overall encoder speed improves ~4.5%-6%.
- Update bit-exact unit test against current C version.

Change-Id: If751c030612245b1c2470200c9570cf40d655504
2016-04-25 09:53:09 -07:00
Jingning Han
b4cbe54ed6 Merge "Fix out-of-bound memory access in loop filter" into nextgenv2 2016-04-25 16:13:04 +00:00
Jingning Han
004c7fa668 Fix out-of-bound memory access in loop filter
This commit fixes an out-of-bound memory access case in the
loop filter mask setting. This issue was introduced in

10232ed Refactor loopfilter level arrays to 2D.
https://chromium-review.googlesource.com/#/c/336645/

Change-Id: I7101a4a79b9ecfdd8ec5ef13a0b314cc95f48d12
2016-04-22 22:57:14 -07:00
Yue Chen
6daf1a460e Fix EXT_INTER unit test failure in 32-bit builds
Align new buffers that are used in interintra and wedgeinterinter prediction.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1196

Change-Id: I1ef49fdf13c79a22cf8a1737e3d3052da0a92dfe
2016-04-22 22:37:13 -07:00
Alex Converse
1f57aa38cd Raise the probability resolution for rANS tokens to 10-bits per symbol
Change-Id: I397b5a9371c85d1df401d261143c985623e9def6
2016-04-22 15:48:11 -07:00
Yi Luo
cf7f00691f Change hybrid transform function argument from TXFM_2D_CFG* to int
Unit tests show that hand-written SSE4.1 code would perform ~30%
  better if the TXFM_2D_CFG configuration is set at a lower level. This
  change only updates the function signature. There is no performance
  impact.

Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
2016-04-21 18:37:21 -07:00
Alex Converse
67058089b4 Merge "Move ZERO_TOKEN into the ANS coef tokenset." into nextgenv2 2016-04-21 18:08:04 +00:00
Alex Converse
fcea1485bb Merge "Store ANS token CDFs in the FRAME_CONTEXT rather than in a global table." into nextgenv2 2016-04-19 23:29:14 +00:00
Alex Converse
3829cd2f2f Move ZERO_TOKEN into the ANS coef tokenset.
Change-Id: I87943e027437543ab31fa3ae1aa8b2de3a063ae5
2016-04-19 15:29:47 -07:00
Jingning Han
1a0352d18e Merge "Handle zero motion vector residual" into nextgenv2 2016-04-19 21:20:08 +00:00
Yue Chen
feb2184c4e Merge "Remove an unsuccessful adaption of overlap sizes in obmc experiment" into nextgenv2 2016-04-19 18:41:02 +00:00
Alex Converse
6ca364606b Store ANS token CDFs in the FRAME_CONTEXT rather than in a global table.
This will facilitate bringing the zero node into the token set while
allowing its probability to vary independently.

Change-Id: I57b44c0fce44debb8e612021e44713b229d1b3cf
2016-04-19 09:39:48 -07:00
Yaowu Xu
efc6aa0c97 Merge "Merge branch 'master' into nextgenv2" into nextgenv2 2016-04-19 15:43:58 +00:00
Jingning Han
ec2ffda599 Handle zero motion vector residual
This commit handles the zero motion vector residuals for single
and compound reference modes, respectively. It improves the coding
performance by 0.13% with no additional encoding complexity.

Change-Id: I16075a836025bd2746da2ff4698fb9261e4b08c1
2016-04-18 18:14:01 -07:00
Yue Chen
c0fd271932 Remove an unsuccessful adaption of overlap sizes in obmc experiment
We removed this adaption, which intended to reduce the size of
overlapped region if the neighboring block is a non-skip one. Thus,
now the width/height of the overlapping region is fixed as a half of
the current block.

Performance improvement (lowres/midres): 0.111%/0.102%

Change-Id: Ife75dad9d4eb355c78a05178b50cc015c442884f
2016-04-18 15:27:59 -07:00
Yaowu Xu
ed04e82a04 Merge branch 'master' into nextgenv2
Conflicts:
	vp10/common/scan.c
	vp9/common/vp9_pred_common.c
	vp9/decoder/vp9_decoder.c

Change-Id: Id559d98ea676da15d60ed464ddb6c48d3eed1111
2016-04-18 15:15:05 -07:00
Angie Chiang
caf066f845 Merge changes I67543d36,I763f2924 into nextgenv2
* changes:
  Reduce shift in txfm8x8
  Let txfm's constant bit be the same for each stage
2016-04-18 19:40:33 +00:00
Angie Chiang
d72560e10d Merge "Fit adst/dct's stage range into 32-bit in bd12" into nextgenv2 2016-04-18 18:40:28 +00:00
Yue Chen
16a99e967c Merge "Optimization for EXT_INTER + OBMC combination" into nextgenv2 2016-04-17 18:54:33 +00:00
Yue Chen
321794c4d5 Optimization for EXT_INTER + OBMC combination
In the rd loop, check the perf of obmc, whose mv is copied from regular
inter predictor, when wedge interinter is better than regular inter
(previously it will force allow_obmc = 0). The condition of the early
termination before this step is relaxed to avoid skipping too many obmc
predictions. The rates of the overhead are properly calculated for these tools.

The logic of the bitstream syntax:
(a single ref) the interintra flag is sent first, only if it is 0, we
send the obmc flag;
(compound refs) the obmc flag is sent first, only if it is 0, we send
the wedge interinter flag

Coding gain
lowres: 0.428% (2.287%->2.715%)

Change-Id: I5f3a34640b398e313cbf84235c9fe2073eb2173f
2016-04-15 17:03:20 -07:00
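The signaling order described in this message can be sketched as follows. This is an illustrative model only: the `BlockModeSketch` struct and `signal_flags` function are hypothetical names, and the flags are collected into an array rather than written to a real bitstream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block mode description used only for this sketch. */
typedef struct {
  int is_compound;          /* 1 if the block uses compound references */
  int use_interintra;       /* meaningful for single-reference blocks */
  int use_obmc;
  int use_wedge_interinter; /* meaningful for compound blocks */
} BlockModeSketch;

/* Stores the flags in signaling order in `out` and returns how many
 * flags were sent (1 or 2). */
static size_t signal_flags(const BlockModeSketch *b, int out[2]) {
  size_t n = 0;
  assert(b != NULL && out != NULL);
  if (!b->is_compound) {
    /* Single ref: interintra first; obmc only if interintra is 0. */
    out[n++] = b->use_interintra;
    if (!b->use_interintra) out[n++] = b->use_obmc;
  } else {
    /* Compound refs: obmc first; wedge interinter only if obmc is 0. */
    out[n++] = b->use_obmc;
    if (!b->use_obmc) out[n++] = b->use_wedge_interinter;
  }
  return n;
}
```

In this model the second flag is never sent when the first is 1, so the two tools in each pair are mutually exclusive at the syntax level.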
Jingning Han
4d503d1043 Remove duplicated TxfmFunc declarations
Change-Id: If3876610a1fbce0988cc21ea917596bbb467df93
2016-04-15 12:03:21 -07:00
Angie Chiang
0a715add2e Reduce shift in txfm8x8
Change-Id: I67543d365cbef3c3e113f01660ae8cb744cc556d
2016-04-14 19:12:22 -07:00