1784 Commits

Author SHA1 Message Date
Debargha Mukherjee
24a04f9048 Merge "Fix decoder crash with supertx" into nextgenv2 2016-06-07 09:46:48 +00:00
Angie Chiang
f67196b2ed Move #if out of TEST_P in vp10_fwd/inv_txfm2d_test.cc
Change-Id: I1d5b2408f27a1e277574c2238f1e49e884596309
2016-06-06 12:45:54 -07:00
Geza Lore
efda2831e5 Optimize wedge partition selection.
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.

Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.

Details are in wedge_utils.c.

Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.

ext-inter gains about ~4.5% speedup.

Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
2016-06-06 14:43:10 +01:00
Geza Lore
6c4306c27d Fix decoder crash with supertx
xd->plane[0].n4_h and xd->plane[0].n4_w are not set at that point
when using supertx.

While this fixes the immediate crash described in the referenced
bug report, there are still issues in the ref-mv experiment that
causes these tests to fail, so they are kept disabled.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1230

Change-Id: Ibf8ef02847a903f8d10e6be28e16694db10c75af
2016-06-06 09:58:11 +01:00
James Zern
e34e684059 Merge changes If31d36c8,I10b947e7
* changes:
  vpx_dsp,add_noise: remove mmx implementation
  vpx_dsp: remove mmx variance implementations
2016-06-04 00:56:06 +00:00
Linfeng Zhang
b90166665f Merge "Slow pshufb removal in 3 intra prediction functions." 2016-06-03 16:35:14 +00:00
Geza Lore
f19700fe52 Add 1D version of vpx_sum_squares_i16
Change-Id: I0d7bda2fe6f995a9e88a9f66540b4979b3f7fab1
2016-06-03 09:34:55 +01:00
Geza Lore
5a69ee0e11 Move template specializations into .cc from .h
Change-Id: I6d8775c1fa228fde25016a401e3c22a8e3da42f9
2016-06-03 09:34:55 +01:00
James Zern
462e0ff88b vpx_dsp,add_noise: remove mmx implementation
a sse2 version exists, this is a reasonable modern baseline.

Change-Id: If31d36c8412d25b53f41b4a93cf02f46802c0c33
2016-06-02 23:51:22 -07:00
James Zern
eea8ea88ab vpx_dsp: remove mmx variance implementations
there are sse2 equivalents for all remaining variance implementations

Change-Id: I10b947e73fc0067688181f819b59e47966bec3d2
2016-06-02 23:46:16 -07:00
Linfeng Zhang
ad0646cb84 Slow pshufb removal in 3 intra prediction functions.
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).

Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd
2016-06-02 10:55:58 -07:00
Alex Converse
380c4ee32d Merge "segmentation: Don't use uninitialized probability data." into nextgenv2 2016-06-01 17:50:37 +00:00
Yaowu Xu
6382727dc5 Fix UBSAN/IOC errors
1. test/dct16x16_test.cc
2. test/dct32x32_test.cc
3. test/fdct8x8_test.cc

BUG=webm:1225

Change-Id: I9c9315fbd65ddb3b44f688e01ba265fd22192198
2016-06-01 16:01:18 +00:00
Alex Converse
7a6cb59dbb segmentation: Don't use uninitialized probability data.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I17b76fcf0d8c191850350d5aa50dcc007b8b0cdc
2016-05-31 16:42:29 -07:00
James Zern
f6ac6cf5bd Merge "acm_random,Rand9Signed: correct cast" 2016-05-27 18:32:06 +00:00
Linfeng Zhang
2ab7b9a6c9 Merge "Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10." 2016-05-27 17:51:35 +00:00
James Zern
13d48c4267 acm_random,Rand9Signed: correct cast
convert the random value to int16 before subtracting 256 from it; quiets
a ubsan (sanitize=integer) warning

BUG=webm:1225

Change-Id: Ibc2c5a21f30e112bd6c180f7d6a033327c38d0df
2016-05-27 10:33:56 -07:00
Linfeng Zhang
af7fb17c09 Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10.
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.

Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.

Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.

Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
2016-05-27 09:51:16 -07:00
Linfeng Zhang
0ba9b299e9 Merge "Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2" 2016-05-27 15:47:28 +00:00
James Zern
5d237f0986 vp10_inv_txfm2d_test: fix memory leak
input_, ref_input_ and output_ were being allocated with new[] followed
by vpx_memalign, remove the former

Change-Id: Ia16d0f9b9317042a24445095ad3c284f4e7bb481
2016-05-26 20:04:59 -07:00
Linfeng Zhang
4b5e462d08 Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
Followed the code style of other lpf fuctions.
These 2 functions put 2 rows of data in a single xmm register,
so they have similar but not identical filter operations,
and cannot share the same macros.

Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
2016-05-26 14:55:18 -07:00
Scott LaVarnway
9d24fe60f1 Merge "Code clean of sub_pixel_variance4xh -- 2" 2016-05-26 13:20:24 +00:00
Yi Luo
469d002f4e Merge "Integrate HBD inverse HT flip types sse4.1 optimization" into nextgenv2 2016-05-25 21:35:14 +00:00
Marco
75d551783d vp9: Add datarate test for 1 pass VBR mode.
Existing tests are only for CBR mode.

Change-Id: Ie3b2cd46236457748e2650901d1a347a730f38af
2016-05-25 14:20:30 -07:00
Yi Luo
bfe4c0ae07 Integrate HBD inverse HT flip types sse4.1 optimization
- tx_size: 4x4, 8x8, 16x16.
- tx_type: FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST,
  ADST_FLIPADST, FLIPADST_ADST.
- Encoder speed improvement:
  park_joy_1080p_12: ~11%, crowd_run_1080p_12: ~7%.
- Add unit test cases for bit-exact against C.

Change-Id: Ia69d069031fa76c4625e845bfbfe7e6f6ed6e841
2016-05-25 12:32:10 -07:00
James Zern
008f27e70a Merge "add vp10 ActiveMap/ActiveMapRefreshTest" into nextgenv2 2016-05-25 19:05:02 +00:00
Yi Luo
28cdee448d HBD inverse HT 8x8 and 16x16 sse4.1 optimization
- Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoding speed improves ~27% on crowd_run_1080p_12.
- Merge 4x4, 8x8, 16x16 unit tests in one test file.

Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c
2016-05-24 12:55:30 -07:00
Scott LaVarnway
a4f3751be5 Code clean of sub_pixel_variance4xh -- 2
Replace MMX with SSE2.

Change-Id: Id8482d2589131f9427e7f36bc64413f058caf31f
2016-05-24 04:44:05 -07:00
Debargha Mukherjee
fb65f9b54b Merge "Add optimized vpx_blend_mask6" into nextgenv2 2016-05-23 23:43:52 +00:00
Geza Lore
a661bc87c4 Add optimized vpx_blend_mask6
This is to replace vp10/common/reconinter.c:build_masked_compound.
Functionality is equivalent, but the interface is slightly more
generic.

Total encoder speedup with ext-inter: ~7.5%

Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
2016-05-23 16:28:58 +01:00
Jingning Han
8c9f6c5531 Merge "Clear redundant condition check from vp10_ext_tile_test.cc" into nextgenv2 2016-05-20 22:10:41 +00:00
James Zern
e4bdbd3c0b Merge "Revert "Code clean of sub_pixel_variance4xh"" 2016-05-20 19:11:06 +00:00
Yaowu Xu
0924bcd824 Fix build when vp8 is disabled
Change-Id: Ie1765f086b10d0f7c4d72961d238dfe0d6056dc2
2016-05-20 11:33:07 -07:00
James Zern
3fb55d24e8 Revert "Code clean of sub_pixel_variance4xh"
This reverts commit 2468163e0770108f5216b65445ce05a8241bca21.

causes valgrind errors for overread of buffer in SubpelVarianceTest

Change-Id: I448e52c76f815ac199305b71f7d169f2bc167679
2016-05-19 23:37:27 -07:00
James Zern
84e3639454 Revert "Extend the external fb interface to allocate individual planes."
This reverts commit 6dd7f2b50a65373aa906d678cb5a29fb65531a55.

conversion warnings, crashes in 32-bit builds

Change-Id: I529ead34cd93c862dd07c9a29d8542dda2fc20ea
2016-05-19 23:33:51 -07:00
Jingning Han
7488ae014b Merge "Remove unused private variables from vp10_inv_txfm2d_test.cc" into nextgenv2 2016-05-20 01:23:25 +00:00
Daniele Castagna
04fdbdc5ca Merge "Extend the external fb interface to allocate individual planes." 2016-05-19 18:01:59 +00:00
Jingning Han
e816401a81 Clear redundant condition check from vp10_ext_tile_test.cc
Change-Id: I74e9df9e314e49b931c23a81d14f5a9e143b0b7d
2016-05-19 09:31:18 -07:00
Jingning Han
7d5ccccd47 Remove unused private variables from vp10_inv_txfm2d_test.cc
Change-Id: Ie933d754aca649bdf17cd679b9a31239bf413b63
2016-05-19 09:21:13 -07:00
Yi Luo
346d2449f0 Fix to conform Google's coding convention
- Confirm input coeff buffer is 16-byte aligned.
- sizeof() prefer variable name instead of type.
- Fix function name (Capital first letter then Pascal case).
- Long base class name uses a newline (with colon and 4 space indent).
- Remove a unnecessary reference function variable.
- Method declaration precedes variable declaration in class definition.

Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe
2016-05-18 18:15:53 -07:00
James Zern
146ccd304f Merge "Code clean of sub_pixel_variance4xh" 2016-05-18 23:18:35 +00:00
Daniele Castagna
6dd7f2b50a Extend the external fb interface to allocate individual planes.
Change-Id: I73e1b9ea6f4c76ae539e2b3292ee4c751d9c7de4
2016-05-18 16:20:18 -04:00
Johann Koenig
36b610d8c1 Merge "neon hadamard 8x8" 2016-05-18 20:11:16 +00:00
Angie Chiang
6f28581b26 Turn on flip in inverse txfm2d
Fix build failed
Reduce txfm test time

Change-Id: Ieaf6b27f3a272d06286f817f01230413fa8adcf6
2016-05-18 11:26:57 -07:00
Scott LaVarnway
2468163e07 Code clean of sub_pixel_variance4xh
Replace MMX with SSE2.

Change-Id: Ia8fcba755952804e347d7d7736f57d1f90c988a0
2016-05-18 04:24:41 -07:00
Yi Luo
1d307368a9 Integrate HBD row/column flip fwd txfm SSE4.1 optimization
- Integrate 5 flip transform types for each 4x4, 8x8, and 16x16
  block, for experiment, EXT_TX.
- Encoder speed improves about 12%-15%.
- Update the unit tests for bit-exact result against C.

Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
2016-05-18 03:48:01 +00:00
Yi Luo
ceabb00704 Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2 2016-05-16 21:15:08 +00:00
Johann
9b54e812f7 neon hadamard 8x8
Runs about 30% faster than the C

BUG=webm:1021

Change-Id: I6809d6d84c3077ab619c53298296950e976bdaba
2016-05-16 11:58:02 -07:00
hui su
cafbf63d30 Add level test for VP9
Change-Id: I99f50bdd5af3f64a029c2f5f6f5fb1ff45bad67e
2016-05-16 09:54:23 -07:00
Angie Chiang
fdaad9f673 Refactor and add flip unit test to vp10_inv_txfm2d_test.cc
Change-Id: I6aa75c66429a0178852cf8df88f16eaa8e36b629
2016-05-13 12:30:51 -07:00