253 Commits

Author SHA1 Message Date
Jerome Jiang
b1dcaf7f1e Merge "Fix segmentation fault caused by denoiser working with spatial SVC." 2017-02-22 04:44:55 +00:00
Jerome Jiang
0d1e5a21c4 Fix segmentation fault caused by denoiser working with spatial SVC.
Re-enable the affected test.
BUG=webm:1374

Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
2017-02-21 09:38:28 -08:00
Yi Luo
1f8e8e5bf1 Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests
- In SSSE3 optimization, 16-bit addition and subtraction would
  overflow when input coefficient is 16-bit signed extreme values.
- Function-level speed becomes slower (unit ms):
  idct8x8_64: 284 -> 294
  idct8x8_12: 145 -> 158.

BUG=webm:1332

Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
2017-02-17 14:05:05 -08:00
Yi Luo
f62dcc9c33 Replace idct32x32_1024_add_ssse3 assembly with intrinsics
- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on
  i7-6700, no obvious user-level speed performance downgrade.
- Passed unit tests.

Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
2017-02-16 16:10:40 -08:00
Johann Koenig
a9b81da575 Merge "block error avx2: use tran_low_t" 2017-02-16 23:51:14 +00:00
Johann Koenig
06a82af0de Merge "correct bitdepth_conversion_sse2.h header guard" 2017-02-16 21:41:28 +00:00
Johann
6c2d732bf4 correct bitdepth_conversion_sse2.h header guard
Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519
2017-02-16 12:43:33 -08:00
Yi Luo
1cb44945fb Merge "Add idct32x32_135_add SSSE3 intrinsics" 2017-02-16 20:43:29 +00:00
Johann
2104454607 block error avx2: use tran_low_t
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
2017-02-16 12:39:02 -08:00
Yi Luo
72a43e2378 Add idct32x32_135_add SSSE3 intrinsics
- Replace the corresponding assembly code.
- No user level speed performance degrade.
- Unit tests passed.

Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
2017-02-16 11:29:34 -08:00
Johann
4682130b60 quantize_fp highbd ssse3: use tran_low_t for coeff
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
2017-02-16 07:40:56 -08:00
Johann
44600442dc bitdepth conversion: really use num elements
The previous implementation confused bit/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.

Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
2017-02-16 15:02:48 +00:00
Johann
327a02d77e Use 'packssdw' for loading tran_low_t values
This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.

Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
2017-02-14 22:39:49 +00:00
clang-format
4b402746ca apply clang-format
Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce
2017-02-14 12:45:52 -08:00
Yi Luo
bd86de1ac8 Replace idct32x32_34_add_ssse3 assembly with intrinsics
- No user-level speed performance change.
- Pass unit tests.

Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6
2017-02-14 10:38:36 -08:00
Yi Luo
ac04d11abc Replace idct8x8_12_add_ssse3 assembly code with intrinsics
- Performance achieves the same as assembly.
- Unit tests pass.

Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd
2017-02-08 10:07:45 -08:00
Johann
641fda79bb highbd x86: consolidate tran_low_t conversions
Create new helper files specifically for converting tran_low_t types.

Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905
2017-02-06 10:43:26 -08:00
Jingning Han
bb40844e32 Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT" 2017-02-02 22:18:32 +00:00
Johann Koenig
ce6318f254 Merge changes I43521ad3,I013659f6
* changes:
  satd highbd neon: use tran_low_t for coeff
  satd highbd sse2: use tran_low_t for coeff
2017-02-02 03:03:58 +00:00
Jingning Han
8f95389742 Add SSSE3 intrinsic 8x8 inverse 2D-DCT
The intrinsic version reduces the average cycles from 183 to 175.

Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03
2017-02-01 14:47:53 -08:00
Johann
2ba383474d satd highbd sse2: use tran_low_t for coeff
BUG=webm:1365

Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb
2017-02-01 11:55:16 -08:00
Johann
0f751ecee3 hadamard highbd ssse3: use tran_low_t for coeff
BUG=webm:1365

Change-Id: I374dfc08732932382043905f128e928b08cb4f57
2017-02-01 11:51:15 -08:00
Johann
2dac808dd1 hadamard highbd sse2: use tran_low_t for coeff
BUG=webm:1365

Change-Id: Ica414007d8412ceebfffa9e58e8416226a3fe934
2017-02-01 11:46:57 -08:00
Johann
dcfff3ccc8 quantize ssse3: remove unused pxor
Change-Id: Ifa22d77fd530827de0b32ae71810dc2213ab2937
2017-01-30 17:02:57 -08:00
Jingning Han
39fff1bea0 Rework 8x8 transpose SSSE3 for avg computation
Use same transpose process as inv_txfm_sse2 does.

Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c
2017-01-12 15:16:07 -08:00
Jingning Han
f65170ea84 Rework 8x8 transpose SSSE3 for inverse 2D-DCT
Use same transpose process as inv_txfm_sse2 does.

Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94
2017-01-12 15:13:18 -08:00
Jingning Han
9a780fa7db Rework forward 8x8 2D-DCT ssse3 implementation
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.

Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
2017-01-10 12:50:55 -08:00
Linfeng Zhang
c8f25fa5c0 Clean hbd idct 4x4 neon functions and other
BUG=webm:1301

Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-14 11:38:28 -08:00
Linfeng Zhang
264f6e70ec Update idct x86 intrinsics to not use saturated add and sub
Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4
2016-11-29 17:06:08 -08:00
Jerome Jiang
de5fd00ec5 Change *_xmm to *_sse2 in deblocker assembly functions.
Some cosmetic changes because xmm is an anachronism.

Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7
2016-11-17 23:38:04 +00:00
Linfeng Zhang
d545c19afa Rename vpx_highbd_idct8x8_10{*}() to vpx_highbd_idct8x8_12{*}()
Also update its trigger threshold from 10 to 12.

Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f
2016-11-07 09:07:55 -08:00
Linfeng Zhang
a9874961f0 Merge "Replace highbd_dct_const_round_shift with dct_const_round_shift" 2016-11-07 16:55:01 +00:00
Johann
e10c95dc83 Update vp9_fdct8x8_quant_ssse3 for highbitdepth
Borrow transition functions from fdct.h nee vpx_quantize_b_sse2

BUG=webm:1304

Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
2016-11-05 01:23:07 +00:00
Linfeng Zhang
04c3bf3c85 Replace highbd_dct_const_round_shift with dct_const_round_shift
They are identical.

Change-Id: I1ccaf03c81c3cbf88e82d77ffeb8204f5b063c61
2016-11-04 16:15:02 -07:00
Johann
cf35ffc025 Extract high bit depth helper functions
These can be used in the vp9 fdct as well.

Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e
2016-11-04 18:13:51 +00:00
Urvang Joshi
e084e05484 Fix warnings reported by -Wshadow: Part1: vpx_dsp directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
2016-10-17 19:25:19 -07:00
Linfeng Zhang
9c8981c666 add vpx high bitdepth convolve8 NEON intrinsics optimization
BUG=webm:1299

Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-17 15:23:54 -07:00
Linfeng Zhang
7f1f35183a Unify loopfilter function names
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().

Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
2016-09-29 16:25:42 -07:00
Urvang Joshi
0aa3e2564f Add compiler warning flag -Wextra and fix related warnings.
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.

Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16

Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h

Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.

Does not enable -Wextra yet. There are more issues to fix.

BUG=webm:1069

Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-09-27 12:05:01 -07:00
James Zern
fdd1186f97 vpx_idct32x32_34_add_sse2: rm unneeded transposes
this change is neutral to mildly positive across various x86-64
platforms

Change-Id: I28fb5ae598fc1317b7a42c9a846ac5d57d104784
2016-09-21 19:49:25 -07:00
James Zern
6acd061aad variance_avx2: sync variance functions with c-code
add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings

Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78
2016-09-19 16:19:29 -07:00
James Zern
33aef48f29 vpx_subpixel_8t_intrin_avx2: tolerate unversioned clang
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.

BUG=b/30970831

Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
2016-09-16 07:14:17 +00:00
clang-format
5f6d143b41 apply clang-format
Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487
2016-09-15 15:07:53 -07:00
James Zern
4b0e78bfda Merge "vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()" 2016-09-08 01:05:18 +00:00
Scott LaVarnway
309125b1e7 vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()
Change-Id: I140d93aebadb0eaf6220881e61a0451450081227
2016-09-07 05:58:29 -07:00
Johann
d393885af1 Remove halfpix specialization
This function only exists as a shortcut to subpixel variance with
predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical
and both for "hv"

Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.

BUG=webm:1273

Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
2016-08-23 17:05:39 -07:00
James Zern
bd7cfb46fb variance_impl_avx2: restore table layout
disable clang-format for bilinear_filters_avx2

restores the row layout prior to:
099bd7f vpx_dsp: apply clang-format
but keeps the justification used by clang-format

Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7
2016-08-12 11:52:53 -07:00
Alex Converse
c0241664aa Resolve -Wshorten-64-to-32 in variance.
The subtrahend is small enough to fit into uint32_t.

Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f
2016-07-28 10:16:31 -07:00
clang-format
956af1d478 vpx_dsp/x86/quantize_sse2.c: apply clang-format
post:
e429080 .clang-format: disable DerivePointerAlignment

Change-Id: I21a0546668edb2b09660e216d4875a1d2ad24d53
2016-07-27 21:41:18 -07:00
clang-format
099bd7f07e vpx_dsp: apply clang-format
Change-Id: I3ea3e77364879928bd916f2b0a7838073ade5975
2016-07-25 14:14:19 -07:00