Commit Graph

990 Commits

Author SHA1 Message Date
Linfeng Zhang
9351f96069 Add vp9_highbd_iht16x16_256_add_neon()
BUG=webm:1403

Change-Id: I2293c11666786be276909d48ee78dacb40a89e25
2018-03-13 17:39:23 -07:00
Linfeng Zhang
88c2386447 Add vp9_iht16x16_256_add_neon()
BUG=webm:1403

Change-Id: I1413cc3dfcb62143ba04fe9b0f8d8b010fdf69b6
2018-02-27 10:13:20 -08:00
Linfeng Zhang
3c6dc743aa Fix a bug in create_s16x4_neon()
This bug exposes when 2nd argument is negative, and the higher 32 bits
would be all 1s.

Change-Id: I189ee8cd3753fde00a34847e7a37cde2caa4ba72
2018-02-26 17:49:24 -08:00
Linfeng Zhang
167594414f Merge "Add vp9_highbd_iht8x8_16_add_neon()" 2018-02-23 01:42:59 +00:00
Kyle Siefring
dccb8b45bb Merge "Fold adds in 16->32-bit converts in SSE2/AVX2 fDCT" 2018-02-21 23:12:07 +00:00
Linfeng Zhang
29b6a30cd9 Add vp9_highbd_iht8x8_16_add_neon()
BUG=webm:1403

Change-Id: I11efb652f1aee371c71eee2d29e33793e4736832
2018-02-20 17:21:31 -08:00
Johann
c1435e321c remove deprecated 'register' keyword
Will be removed in C++17:
http://en.cppreference.com/w/cpp/language/storage_duration

Change-Id: Iadce5e2b974c707799fa939f3ff1c420fb79a871
2018-02-20 14:49:02 -08:00
Kyle Siefring
811b2e412e Fold adds in 16->32-bit converts in SSE2/AVX2 fDCT
Changes in the function size in bytes (in lieu of performance metrics)
                   Before    After    Diff
vpx_fdct32x32_avx2  29564 -> 28334   -1230
vpx_fdct32x32_sse2  38053 -> 36309   -1744

Change-Id: Ie0b3e6ed7c3f2e9ea45f9d6a1ce1e27d068cee6b
2018-02-10 14:25:24 -05:00
Linfeng Zhang
0f3edc6625 Update iadst NEON functions
Use scalar multiply. No impact on clang, but improves gcc compiling.

BUG=webm:1403

Change-Id: I4922e7e033d9e93282c754754100850e232e1529
2018-02-08 07:23:55 +00:00
Linfeng Zhang
3636330490 Add vp9_highbd_iht4x4_16_add_neon()
BUG=webm:1403

Change-Id: Id9833e985fb70958cf4bde38f8e6303ed83c12f9
2018-02-05 13:42:16 -08:00
James Zern
73d1236384 inv_txfm_vsx.c: make code c90 compatible
move for loop declarations to function scope

Change-Id: I84d92a1a6ca6c5ac30aacb0f55d87ca3aef4c98f
2018-02-01 19:40:28 -08:00
Linfeng Zhang
b14b616d96 Update vp9_iht8x8_64_add_neon()
Change-Id: Ie70ed8b9273df5e1fd06bc93cb469e80630941d2
2018-01-29 15:17:08 -08:00
Linfeng Zhang
884d1681f8 Clean dct_const_round_shift() related neon code
Change-Id: I8f4e0fc6ecb77b623519f2dd3cd2886f89218ddd
2018-01-29 10:23:24 -08:00
Linfeng Zhang
2654afc16c Merge "cosmetic: clean idct neon functions" 2018-01-29 17:34:11 +00:00
Scott LaVarnway
15b261d854 Merge "BUG FIX: sse2 subpel variance is not PIC compliant" 2018-01-24 22:54:42 +00:00
Linfeng Zhang
6248f0c91f cosmetic: clean idct neon functions
Change-Id: I9c7c52567850aded0437b13ba1260e94441bc49d
2018-01-24 13:55:15 -08:00
Scott LaVarnway
cb9f4dc105 BUG FIX: sse2 subpel variance is not PIC compliant
BUG=webm:1464

Change-Id: Ibc15bac54aaf509365bed5892a26a29972ad3540
2018-01-24 05:58:54 -08:00
Scott LaVarnway
b9e44842fc Merge "vp9_quantize_fp_avx2()" 2018-01-24 13:58:08 +00:00
Linfeng Zhang
231012fdab Add vp9_highbd_iht16x16_256_add_sse4_1()
BUG=webm:1413

Change-Id: I8d7eeae1bd219eb848c1a86071046a477f7a91af
2018-01-23 11:24:42 -08:00
Linfeng Zhang
8f50e06012 Add "vpx_" prefix to 2 idct x86 functions
Change-Id: I4f3052d8748e16b06e9155f8daf22f867dfaa7a3
2018-01-23 09:17:38 -08:00
Linfeng Zhang
6fea41abee Merge "Add vp9_highbd_iht8x8_64_add_sse4_1()" 2018-01-23 17:04:20 +00:00
Linfeng Zhang
9874ec07bd Add vp9_highbd_iht8x8_64_add_sse4_1()
BUG=webm:1413

Change-Id: Id9038226902b2d793fc6c17ac81bb104c1a18988
2018-01-18 15:49:44 -08:00
Scott LaVarnway
c7449b482c vp9_quantize_fp_avx2()
Started from vp9_quantize_fp_sse2 and tweaked to use avx2.

Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
2018-01-18 13:33:30 -08:00
Johann
97acbbb701 clang-format v5.0.0 vpx_dsp/
Remove comments above #define statements because they get
indented unnecessarily.
https://bugs.llvm.org/show_bug.cgi?id=35930

Add blank lines to prevent comments from being treated as
blocks.

Change-Id: I04dce21b2a10e13b8dc07411a0019c098f6dd705
2018-01-18 12:37:50 -08:00
Johann
f5b2dd2a66 adopt some clang 5.0.0 formatting
At least the changes that don't conflict with 4.0.1

Change-Id: I9b6a7c14dadc0738cd0f628a10ece90fc7ee89fd
2018-01-11 12:35:24 -08:00
Linfeng Zhang
e20ca4fead Add vp9_highbd_iht4x4_16_add_sse4_1()
BUG=webm:1413

Change-Id: I14930d0af24370a44ab359de5bba5512eef4e29f
2018-01-08 10:14:20 -08:00
Linfeng Zhang
7a41610581 Update dct_test.cc
Make 8-bit functions testing available in high bitdepth.

Change-Id: Ic030c75aa4c6b649c52426abb4bb2122882de0fe
2018-01-08 10:07:38 -08:00
Linfeng Zhang
867b593caa Update iadst4_sse2()
Change-Id: I21ff81df0d6898170a3b80b3b5220f9f3ac7f4e8
2017-12-28 16:47:57 -08:00
Johann
e4b3f03c64 add copyright to rtcd files
Allows them to pass the license check in chromium.

BUG=chromium:98319

Change-Id: Iefc1706152a549d8c4ae774c917596bf1c9492d8
2017-12-14 22:50:08 +00:00
Shiyou Yin
90ce21e519 Merge "vpx_dsp: [loongson] optimize variance v2." 2017-12-04 01:30:06 +00:00
Johann
bdbecea1ba explicitly label .text sections
nasm should infer .text but does not for windows:
https://bugzilla.nasm.us/show_bug.cgi?id=3392451

Change-Id: Ib195465e5f33405f5ff61c4cf88aa2a72640cacb
2017-12-01 14:33:04 -08:00
Shiyou Yin
298f5ca47d vpx_dsp: [loongson] optimize variance v2.
1. Delete unnecessary zero setting process.
2. Optimize the method of calculating SSE in vpx_varianceWxH.

Change-Id: I58890c6a2ed1543379acb48e03e620c144f6515f
2017-12-01 13:44:48 +08:00
Kaustubh Raste
8099220e6c Merge "mips msa optimize vpx_scaled_2d function" 2017-12-01 01:24:25 +00:00
Shiyou Yin
8d70aef05f Merge "vpx: [loongson] fix bug in var_filter_block2d_bil_16x" 2017-11-30 00:53:37 +00:00
Kyle Siefring
3ae909b0f9 Merge "Remove unnecessary includes of emmintrin_compat.h" 2017-11-29 19:14:45 +00:00
Kyle Siefring
a60da3a2eb Remove unnecessary includes of emmintrin_compat.h
Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9
2017-11-29 11:48:24 -05:00
Kaustubh Raste
339f4dcaee mips msa optimize vpx_scaled_2d function
Change-Id: I638507b360c71489ab0e87bd558d2719ad995333
2017-11-29 13:27:04 +05:30
Shiyou Yin
a0ca2a4079 vpx: [loongson] fix bug in var_filter_block2d_bil_16x
Which cause failed case:
1. MMI/VpxSubpelVarianceTest.Ref/6
2. MMI/VpxSubpelVarianceTest.Ref/7
3. MMI/VpxSubpelVarianceTest.ExtremeRef/6
4. MMI/VpxSubpelVarianceTest.ExtremeRef/7

Change-Id: I122ca20089e14ac324edd61295cf8f506e06afc8
2017-11-29 10:26:43 +08:00
Johann
bd990cad72 quantize x86: dedup some parts
Change-Id: I9f95f47bc7ecbb7980f21cbc3a91f699624141af
2017-11-27 13:09:21 -08:00
Kyle Siefring
dd4cc5b596 Merge "Optimize AVX2 get16x16var and get32x16var functions" 2017-11-20 22:37:57 +00:00
Kyle Siefring
07a0bf038f Optimize AVX2 get16x16var and get32x16var functions
Change-Id: If8b91aaa883c01107f0ea3468139fa24cfb301d2
2017-11-17 13:55:49 -05:00
Johann
3e3a568616 fwd txfm ssse3: use GLOBAL() for loading constants
Fixes a build issue when relocation is not allowed:
relocation R_X86_64_32 against '.rodata' can not be used when making a shared object

Change-Id: Ica3e90c926847bc384e818d7854f0030f4d69aa0
2017-11-15 13:01:44 -08:00
Scott LaVarnway
8e6022844f vpx: [x86] add vpx_satd_avx2()
SSE2 instrinsic vs AVX2 intrinsic speed gains:
blocksize   16: ~1.33
blocksize   64: ~1.51
blocksize  256: ~3.03
blocksize 1024: ~3.71

Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58
2017-11-10 12:24:12 -08:00
Scott LaVarnway
2387024f41 runtime error fix: bitdepth_conversion_avx2.h
Change-Id: I7364a157de39eb7137b599808474b8d46d19d376
2017-11-09 12:26:43 -08:00
Kyle Siefring
b383a17fa4 Support building AVX-512 and implement sadx4 for AVX-512
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.

Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
2017-11-03 13:37:23 -04:00
Scott LaVarnway
3bf02ad74a vpx: hadamard: use ptrdiff_t instead of int for stride
Eliminates the following instruction for the x86 (64 bit)
intrinsic code:

movslq %esi,%rax

Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
2017-10-26 11:41:48 -07:00
Kyle Siefring
037e596f04 Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics" 2017-10-24 19:22:36 +00:00
Kyle Siefring
ae35425ae6 Optimize convolve8 SSSE3 and AVX2 intrinsics
Changed the intrinsics to perform summation similiar to the way the assembly does.

The new code diverges from the assembly by preferring unsaturated additions.

Results for haswell

SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%

AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%

BUG=webm:1471

Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
2017-10-24 10:39:48 -04:00
Scott LaVarnway
512bf4e029 vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix
Use an intermediate buffer before storing to coeffs when
highbitdepth is enabled.

Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b
2017-10-23 08:49:32 -07:00
Scott LaVarnway
4906cea027 vpx: [x86] vpx_hadamard_16x16_avx2() improvements
~10% performance gain.  Fixed the cosmetics noted in the
previous commit.

Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13
2017-10-20 08:55:06 -07:00