309 Commits

Author SHA1 Message Date
Angie Chiang
ed8cd9a9b4 adapt_scan experiment
Performance improvement
        BDRate
lowres  0.921%
midres  0.730%
hdres   1.019%

Change-Id: I26208d6c0531937bff44de505b4ea355c7852802
2016-10-24 18:24:56 -07:00
hui su
9ff4134f54 Renaming in filter-intra sse4 code
Change-Id: Iff1786a92d164e6b9cfaf4a59ece79819494276f
2016-10-19 21:41:06 -07:00
hui su
5db9743fbb Seperate FILTER_INTRA from EXT_INTRA experiment
Prepare for the av1/nextgenv2 merge.

Coding gain (%):

               lowres     midres
ext-intra       0.69       0.97
filter-intra    0.67       0.83
both            1.05       1.48

Change-Id: Ia24d6fafb3e484c4f92192e0b7eee5e39f4f4ee6
2016-10-19 21:40:49 -07:00
Michael Bebenita
6048d05225 Bit accounting.
This patch adds bit account infrastructure to the bit reader API.
When configured with --enable-accounting, every bit reader API
function records the number of bits necessary to decoding a symbol.
Accounting symbol entries are collected in global accounting data
structure, that can be used to understand exactly where bits are
spent (http://aomanalyzer.org). The data structure is cleared and
reused each frame to reduce memory usage. When configured without
--enable-accounting, bit accounting does not incur any runtime
overhead.

All aom_read_xxx functions now have an additional string parameter
that specifies the symbol name. By default, the ACCT_STR macro is
used (which expands to __func__). For more precise accounting,
these should be replaced with more descriptive names.

Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
2016-10-19 04:34:29 +00:00
Debargha Mukherjee
078856a4df Merge "Simplify 8x16 and 16x8 inverse transform tests" into nextgenv2 2016-10-14 02:53:38 +00:00
Debargha Mukherjee
a720f4b3b5 Merge "Add sse2 forward and inverse 16x32 and 32x16 transforms" into nextgenv2 2016-10-14 02:49:20 +00:00
Yue Chen
a48764d05f Merge "Renamings for OBMC experiment" into nextgenv2 2016-10-14 01:33:00 +00:00
Yue Chen
cb60b185c7 Renamings for OBMC experiment
To get ready for pulling AV1 to nextgenv2
Replace the experimental flag by MOTION_VAR. Rename major variables.

Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
2016-10-13 15:51:22 -07:00
David Barker
4f803efac1 Simplify 8x16 and 16x8 inverse transform tests
Change-Id: Ie86aedfb1f3e0d9c0cf58d7183861a0ed0e8ccc8
2016-10-13 16:02:59 +01:00
David Barker
33231d4801 Add sse2 forward and inverse 16x32 and 32x16 transforms
Change-Id: I1241257430f1e08ead1ce0f31db8272b50783102
2016-10-13 14:01:22 +01:00
Yi Luo
fed8e1c06d Hybrid forward transform 32x32 AVX2 optimization
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.

- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2()
  But function replacement must go with the corresponding inverse txfm.

- No obvious user level time reduction due to 32x32 TX_TYPE selection.

- Zero high 128b YMM to avoid AVX-SSE transition penalties
  (fix 16x16 case).

- Added 32x32 AVX2 unit tests to verify bitexact.

- AVX2 optimization summary:
  On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
  C to AVX2: function level time reduction, ~86-89%.
  SSE2 to AVX2: function level time reduction, ~51%.

Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
2016-10-12 14:19:53 -07:00
Yaowu Xu
4960f7c3bd Merge "Added generic SIMD support for CLPF." into nextgenv2 2016-10-11 16:05:18 +00:00
Steinar Midtskogen
be668e92c3 Added generic SIMD support for CLPF.
Change-Id: Ie03f9a5b0a4c708a586532198d755a1e7509f149
2016-10-10 11:19:37 -07:00
David Barker
4d03d6fc6f Add sse2 forward / inverse 4x8 and 8x4 transforms
Change-Id: I89ed93fb20cf975c2b463cff58879521ceaa4163
2016-10-10 09:02:45 -07:00
Peter de Rivaz
1baecfeb03 Added sse2 inverse 8x16 and 16x8 transforms
Change-Id: I43628407b11e5c8e6af4df69f2acdc67ac827834
2016-10-06 11:23:14 -07:00
Alex Converse
0ad82c6edb Rename av1_ans_test to match aom/master.
Change-Id: I3b2137903a87a1f8169ff45e940575b917c26a6a
2016-09-26 13:15:41 -07:00
Alex Converse
080a2cccba Migrate bitwriter to the interface in aom/master
Change-Id: I73d46229f0feea43cbe933e51da997833cce032b
2016-09-21 11:17:08 -07:00
Geza Lore
1a800f6539 Add SSE2 versions of av1_fht8x16 and av1_fht16x8
Encoder speedup ~2% with ext-tx + rect-tx

Change-Id: Id56ddf102a887de31d181bde6d8ef8c4f03da945
2016-09-09 11:29:41 -07:00
Yaowu Xu
f883b42cab Port renaming changes from AOMedia
Cherry-Picked the following commits:
0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e6676 Replace "VPx" by "AVx"
5082a36 Change "Vpx" to "Avx"
7df44f1 Replace "Vp9" w/ "Av1"
967f722 Remove kVp9CodecId
828f30c Change "Vp8" to "AOM"
030b5ff AUTHORS regenerated
2524cae Add ref-mv experimental flag
016762b Change copyright notice to AOMedia form
81e5526 Replace vp9 w/ av1
9b94565 Add missing files
fa8ca9f Change "vp9" to "av1"
ec838b7  Convert "vp8" to "aom"
80edfa0 Change "VP9" to "AV1"
d1a11fb Change "vp8" to "aom"
7b58251 Point to WebM test data
dd1a5c8 Replace "VP8" with "AOM"
ff00fc0 Change "VPX" to "AOM"
01dee0b Change "vp10" to "av1" in source code
cebe6f0 Convert "vpx" to "aom"
17b0567 rename vp10*.mk to av1_*.mk
fe5f8a8 rename files vp10_* to av1_*

Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419
2016-08-31 18:19:03 -07:00
Yaowu Xu
c27fc14b02 Port folder renaming changes from AOM
Manually cherry-picked commits:
ceef058 libvpx->libaom part2
3d26d91 libvpx -> libaom
cfea7dd vp10/ -> av1/
3a8eff7 Fix a build issue for a test
bf4202e Rename vpx to aom

Change-Id: I1b0eb5a40796e3aaf41c58984b4229a439a597dc
2016-08-31 17:26:24 -07:00
Yaowu Xu
253c001f8f Port dering experiment from aom
Mannually cherry-picked:
1579133 Use OD_DIVU for small divisions in temporal_filter.
0312229 Replace divides by small values with multiplies.
9c48eec Removing divisions from od_dir_find8()
0950ed8 Merge "Port active map / cyclic refresh fixes to vp10."
efefdad Port active map / cyclic refresh fixes to vp10.
1eaf748 Port switch to 9-bit rate cost to aom.
0b1606e Only build deringing code when --enable-dering.
e2511e1 Deringing cleanup: don't hardcode the number of levels
8fe5c5d Rename dering_in to od_dering_in to sync with Daala
4eb1380 Makes second filters for 45-degree directions horizontal
7f4c3f5 Removes the superblock variance contribution to the threshold
3dc56f9 Simplifying arithmetic by using multiply+shift
cf2aaba Return 0 explicitly for OD_ILOG(0).
49ca22a Use the Daala implementation of OD_ILOG().
8518724 Fix compiler warning in od_dering.c.
485d6a6 Prevent multiple inclusion of odintrin.h.
51b7a99 Adds the Daala deringing filter as experimental

Note that a few of the changes were already in libvpx codebse.

Change-Id: I1c32ee7694e5ad22c98b06ff97737cd792cd88ae
2016-08-16 13:47:18 +00:00
James Zern
cc73e1fcd4 remove SVC
spatial/temporal scalability are not supported in VP10 currently.
+ remove the unused vp10/encoder/skin_detection.[hc]

this also enables DatarateTestLarge for VP10 which passes with no
experiments enabled. these were removed previously when only the SVC
tests should have been:
134710a Disable tests not applicable to VP10

Change-Id: I9ee7a0dd5ad3d8cc1e8fd5f0a90260fa43da387c
2016-08-09 18:42:20 -07:00
Yi Luo
57c4711b5c Optimization EXT_INTRA's filtered intra predictor (SSE4.1)
- Add unit tests to verify the bit-exact result.
- In speed test, function speed (for each mode/tx_size)
  improves about 23%~35%.
- On E5-2680, park_joy_1080p, 10 frames, --kf-max-dist=1,
  encoding time improves about 1%~2%.

Change-Id: Id89f313d44eea562c02e775a6253dc4df7e046a9
2016-08-08 10:02:36 -07:00
Yaowu Xu
a3cff08259 more cleanup of vp8 and vp9
Change-Id: Ic90ebe6136f4b75645ba699d49c0bcb3764ddccf
2016-08-03 12:20:33 -07:00
Yaowu Xu
5eee90730b Rename files with vp9_ prefix
Change-Id: I9c51ae3a2af698efe32288b807f881385e19822b
2016-07-29 16:45:08 +00:00
Yaowu Xu
3fa28d51af More vp8/vp9 clean up
Change-Id: I8101de20e873c19d03c7fd2977bc22003e395807
2016-07-28 18:22:47 -07:00
Yaowu Xu
3bd709fafe Remove vp8, vp9 folders
Change-Id: I09b8acd22d031ece52e1fee18b998349bf1cf06b
2016-07-28 14:33:21 +00:00
Yi Luo
b2663a8a67 HBD fast path quantization speed improvement
- HBD encoder speed improvement (SSE4.1):
  Enable CONFIG_VP9_HIGHBITDEPTH, on Xeon E5-2680,
  50 frames, park_joy_1080p, 12-bit,
  Encoding time reduces from 4846481 to 4177471 (ms)
- Add unit test to verify bit-exact and EOB calculation

Change-Id: I08e8ef3549ddad5ab36d86e78557df3b288537ea
2016-07-20 14:11:10 -07:00
Yaowu Xu
6fe07a207b Merge branch 'master' into nextgenv2
Change-Id: Ia3c0f2103fd997613d9f16156795028f89f63265
2016-07-14 16:05:48 -07:00
Geza Lore
ebc2d34cd9 Add SSE4.1 vpx_obmc_variance* implementations and cosmetics
Speedup for these functions: 4x
Also include some cosmetic changes to SAD functions

Change-Id: I344c32c795492507ae08742f52d035a13f583799
2016-07-12 21:04:46 -07:00
Geza Lore
bfa59b4a5f Improve vpx_blend_* functions.
- Made source buffers pointers to const.
- Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
  indicative that the function does alpha blending. The 6, or 6b
  suffix was misleading, as the max mask value (64) does not fit into
  6 bits.
- Added VPX_BLEND_* macros to use when needing to blend scalars.
- Use VPX_BLEND_A256 in combine_interintra to be more explicit about
  the operation being done.
- Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
  masks directly and apply them to all rows/columns
  (vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimzied
  horizontal version now falls back on the 2D version. This can be
  improved upon if it show up high enough in a profile.
- All vpx_blend_a64_* functions now support block sizes down to 1x1
  (ie: a single pixel). This is for usage convenience. The SSE4.1
  optimized versions fall back on the C implementation if
  w <= 2 or h <= 2. This can again be improved if it becomes hot code.

Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
2016-07-11 19:05:17 +01:00
Debargha Mukherjee
72ef6d7704 Refactor and clean up on blend_mask6
Change-Id: Ie9188471e7dc07ab9c95b22f258b1662e895c533
2016-07-08 15:02:57 -07:00
Geza Lore
e6f8c17ac5 Remove various testing utilities.
test/assertion_helpers.h
test/randomise.{cc,h}
test/snapshot.h

Modfiy blend_mask6_test.cc not to rely on these.

Change-Id: I88b8933fe0a729a606797e5cd421795a544c612d
2016-07-07 16:22:07 +01:00
Debargha Mukherjee
fabc0ed7ad Merge "Reinstate tests for wedge partition selection optimizations." into nextgenv2 2016-07-07 05:55:07 +00:00
Geza Lore
aacdf98c9a Add SSE4.1 vpx_obmc_sad* implementations.
Speedup for these functions: 4x

Change-Id: I21baa04f53c6ab308ea3edf3ebacc62970e97454
2016-07-06 19:46:13 +00:00
Geza Lore
2791d9db1e Reinstate tests for wedge partition selection optimizations.
This reinstates the tests from commit
efda2831e5f758b4f350679b5c55c0b9282449b0 with the appropriate
fixes for 32 bit x86 builds.

Change-Id: Ib331906c5b448ca964895ee9cbfd4266f67d1089
2016-07-06 15:09:46 +01:00
Yi Luo
f1a50db2d1 Merge "Convolution horizontal filter SSSE3 optimization" into nextgenv2 2016-06-20 20:06:02 +00:00
Yi Luo
229690a95c Convolution horizontal filter SSSE3 optimization
- Apply signal direction/4-pixel vertical/8-pixel vertical
  parallelism.
- Add unit test to verify the bit exact result.
- Overall encoding time improves ~24% on Xeon E5-2680 CPU.

Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59
2016-06-20 11:10:30 -07:00
Tom Finegan
5a9f21db54 Output frames in first pass for VPX_DL_REALTIME.
Since combining VPX_DL_REALTIME with VPX_RC_FIRST_PASS is basically
nonsense, ignore the user's pass setting when this happens and
behave as if the requested encode is a single pass encode.

BUG=webm:1233

Change-Id: I5ee4c4e5838c4ca6d24988890aae490b10826db2
2016-06-17 11:25:55 -07:00
James Zern
94e84bbc07 cosmetics,test.mk: fix a typo
Change-Id: Ib74a494e1cf50a356f51e8185e19ca66fcb896a2
2016-06-15 20:33:04 -07:00
James Zern
fba6f748e8 rename vp9_end_to_end_test.cc -> end_to_end_test.cc
this is shared between vp9/10

BUG=webm:1235

Change-Id: I2f44b15268a33453a1c1e0c691d4fc1fc12d0263
2016-06-15 18:30:22 -07:00
James Zern
2710f76692 vp9_end_to_end_test: enable in vp10-only builds
this file is shared between vp9 & vp10; this makes it available in the
presence of --disable-vp9

BUG=webm:1235

Change-Id: Iaf060c3c09afd2c7df69995b0c01589f78d4945e
2016-06-15 18:28:30 -07:00
Angie Chiang
95340fccb3 Revert "Optimize wedge partition selection."
This reverts commit efda2831e5f758b4f350679b5c55c0b9282449b0.

This commit causes segmentation fault at SSE2/SumSquares2DTest.RandomValues/0

Change-Id: I171937e4daf6f15323e8206418773deb03bd8c53
2016-06-09 19:17:37 -07:00
Geza Lore
efda2831e5 Optimize wedge partition selection.
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.

Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.

Details are in wedge_utils.c.

Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.

ext-inter gains about ~4.5% speedup.

Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
2016-06-06 14:43:10 +01:00
Geza Lore
5a69ee0e11 Move template specializations into .cc from .h
Change-Id: I6d8775c1fa228fde25016a401e3c22a8e3da42f9
2016-06-03 09:34:55 +01:00
James Zern
008f27e70a Merge "add vp10 ActiveMap/ActiveMapRefreshTest" into nextgenv2 2016-05-25 19:05:02 +00:00
Yi Luo
28cdee448d HBD inverse HT 8x8 and 16x16 sse4.1 optimization
- Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoding speed improves ~27% on crowd_run_1080p_12.
- Merge 4x4, 8x8, 16x16 unit tests in one test file.

Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c
2016-05-24 12:55:30 -07:00
Geza Lore
a661bc87c4 Add optimized vpx_blend_mask6
This is to replace vp10/common/reconinter.c:build_masked_compound.
Functionality is equivalent, but the interface is slightly more
generic.

Total encoder speedup with ext-inter: ~7.5%

Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
2016-05-23 16:28:58 +01:00
Yi Luo
ceabb00704 Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2 2016-05-16 21:15:08 +00:00
hui su
cafbf63d30 Add level test for VP9
Change-Id: I99f50bdd5af3f64a029c2f5f6f5fb1ff45bad67e
2016-05-16 09:54:23 -07:00