Jingning Han
bb40844e32
Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"
2017-02-02 22:18:32 +00:00
Kaustubh Raste
5b10674b5c
Merge "Add mips msa sum_squares_2d_i16 function"
2017-02-02 08:09:21 +00:00
Johann Koenig
726556dde9
Merge "Remove neon assembly for idct 16x16 and 8x8"
2017-02-02 03:25:31 +00:00
Johann Koenig
ce6318f254
Merge changes I43521ad3,I013659f6
...
* changes:
satd highbd neon: use tran_low_t for coeff
satd highbd sse2: use tran_low_t for coeff
2017-02-02 03:03:58 +00:00
Jingning Han
8f95389742
Add SSSE3 intrinsic 8x8 inverse 2D-DCT
...
The intrinsic version reduces the average cycles from 183 to 175.
Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03
2017-02-01 14:47:53 -08:00
Johann Koenig
dc90501ba3
Merge changes I374dfc08,I7e15192e,Ica414007
...
* changes:
hadamard highbd ssse3: use tran_low_t for coeff
hadamard highbd neon: use tran_low_t for coeff
hadamard highbd sse2: use tran_low_t for coeff
2017-02-01 21:56:36 +00:00
Johann Koenig
f60171bb4f
Merge "deblock: annotate postproc parameters"
2017-02-01 19:57:29 +00:00
Johann
f8d744d91a
satd highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84
2017-02-01 11:55:47 -08:00
Johann
2ba383474d
satd highbd sse2: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb
2017-02-01 11:55:16 -08:00
Johann
0f751ecee3
hadamard highbd ssse3: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I374dfc08732932382043905f128e928b08cb4f57
2017-02-01 11:51:15 -08:00
Johann
1eb8a718bf
hadamard highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I7e15192ead3a3631755b386f102c979f06e26279
2017-02-01 11:50:46 -08:00
Johann
2dac808dd1
hadamard highbd sse2: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: Ica414007d8412ceebfffa9e58e8416226a3fe934
2017-02-01 11:46:57 -08:00
Johann Koenig
3bda634576
Merge "quantize ssse3: remove unused pxor"
2017-02-01 19:41:41 +00:00
Jingning Han
969957f9f2
Fix real-time compression regression in hbd mode
...
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.
The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.
The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.
BUG=webm:1365
Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
2017-01-31 23:17:09 -08:00
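A minimal sketch of why the coefficient type matters here, assuming the usual libvpx convention that tran_low_t is 32 bits wide in high bit-depth builds and 16 bits otherwise (the typedefs below are restated for illustration, and satd_ref is a hypothetical scalar reference, not the library's function). The later commits in this log filed under the same BUG=webm:1365 move the SIMD satd and hadamard paths to tran_low_t, which suggests, but does not confirm, that the element width was the source of the mismatch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: stands in for the build-time configuration flag. */
#ifndef CONFIG_VP9_HIGHBITDEPTH
#define CONFIG_VP9_HIGHBITDEPTH 1
#endif

/* Assumption: coefficient storage widens in high bit-depth builds. */
#if CONFIG_VP9_HIGHBITDEPTH
typedef int32_t tran_low_t;
#else
typedef int16_t tran_low_t;
#endif

/* Hypothetical scalar reference: sum of absolute transform coefficients.
 * Because it indexes through tran_low_t, it reads the buffer correctly
 * in either build configuration. */
static int satd_ref(const tran_low_t *coeff, int length) {
  int i;
  int satd = 0;
  for (i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}
```

A kernel that loads the same buffer as packed int16_t values would see each 32-bit coefficient as two 16-bit lanes, so its result would differ from this reference in a high bit-depth build.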
Johann
32f68cc58c
deblock: annotate postproc parameters
...
Clears a clang static analyzer warning where 'cols' is assumed to be
less than 0, preventing the for loop from executing.
The assembly already requires that the size be 8 or 16 (U/V or Y plane)
and cols is a multiple of 8.
Change-Id: Ica4612690ead1638c94cfe56b306e87f8ce644f9
2017-01-31 15:58:57 -08:00
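One common way to state such preconditions so that a static analyzer can see them is assert(). The sketch below is illustrative only: sum_row_blocks is a hypothetical function, and whether the commit annotates the parameters in exactly this form is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical row routine that walks a row in blocks of 8 pixels.
 * The assertions document what the callers already guarantee and let
 * the analyzer rule out the "cols <= 0, loop never runs" path. */
static int sum_row_blocks(const uint8_t *src, int cols) {
  int c, i;
  int total = 0;
  assert(cols >= 8);     /* at least one full block */
  assert(cols % 8 == 0); /* width is a multiple of 8 */
  for (c = 0; c < cols; c += 8) {
    for (i = 0; i < 8; ++i) total += src[c + i];
  }
  return total;
}
```

With NDEBUG defined the assertions compile away, so this form costs nothing in release builds.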
Kaustubh Raste
750e753134
Add mips msa sum_squares_2d_i16 function
...
average improvement ~4x-5x
Change-Id: I8d91b71d0677009be52b412e4f52b40b98573a53
2017-01-31 12:22:43 +00:00
Kaustubh Raste
df7e1fecc1
Add mips msa vpx_minmax_8x8 function
...
average improvement ~4x-5x
Change-Id: I83aee9977534fddb8a9b80d31af646c0b6b1a8c3
2017-01-31 10:00:43 +05:30
Johann
dcfff3ccc8
quantize ssse3: remove unused pxor
...
Change-Id: Ifa22d77fd530827de0b32ae71810dc2213ab2937
2017-01-30 17:02:57 -08:00
Kaustubh Raste
4ce20fb3f4
Add mips msa vpx_vector_var function
...
average improvement ~4x-5x
Change-Id: I2f63ef83d816052ca8dc42421e7e9d42f7a7af6b
2017-01-28 08:53:20 +00:00
Kaustubh Raste
407fad2356
Add mips msa vpx Integer projection row/col functions
...
average improvement ~4x-5x
Change-Id: I17c41383250282b39f5ecae0197ef1df7de20801
2017-01-27 11:11:42 +05:30
Kaustubh Raste
182ea677a0
Add mips msa vpx satd function
...
average improvement ~4x-5x
Change-Id: If8683d636fe2606d4ca1038e28185bca53bbe244
2017-01-24 10:44:22 +05:30
Johann
13234d3c43
Remove neon assembly for idct 16x16 and 8x8
...
Tested using test/partial_idct_test.cc:DISABLED_Speed
Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements
using the intrinsics:
<function>     <clang asm>  <gcc asm>  <clang intrin>  <gcc intrin>
idct16x16_256  1720ms       1703ms     1546ms          1554ms
idct16x16_10   1320ms       1247ms     518ms           488ms
idct16x16_1    107ms        108ms      64ms            68ms
idct8x8_64     924ms        931ms      866ms           989ms
idct8x8_12     826ms        824ms      519ms           514ms
idct8x8_1      172ms        166ms      110ms           125ms
idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics),
but as a counterexample idct16x16_10 goes from ~1300ms to ~500ms.
On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly
stable.
BUG=webm:1303
Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4
2017-01-19 12:27:31 -08:00
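The timings in the table above can be turned into speedup ratios with a one-line helper. This is only a sketch of the arithmetic; speedup_ratio is a hypothetical name and is not part of the test harness mentioned in the commit.

```c
/* Ratio of assembly time to intrinsics time for the same function and
 * compiler. A value above 1.0 means the intrinsics are faster; a value
 * below 1.0 means a regression. */
static double speedup_ratio(double asm_ms, double intrin_ms) {
  return asm_ms / intrin_ms;
}
```

For example, speedup_ratio(1320.0, 518.0) describes idct16x16_10 under clang, and speedup_ratio(931.0, 989.0) describes the idct8x8_64 regression under gcc, which comes out below 1.0.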
Kaustubh Raste
e0c0e65378
Add mips msa vpx hadamard functions
...
average improvement ~4x-5x
Change-Id: I167132d894c04fa85dda8dde7906ff9c61b3a65d
2017-01-19 14:44:03 +05:30
Jingning Han
b6fe63a505
Merge "Rework 8x8 transpose SSSE3 for avg computation"
2017-01-13 18:25:17 +00:00
Jingning Han
553e9e291f
Merge "Rework 8x8 transpose SSSE3 for inverse 2D-DCT"
2017-01-13 18:25:09 +00:00
Jingning Han
39fff1bea0
Rework 8x8 transpose SSSE3 for avg computation
...
Use same transpose process as inv_txfm_sse2 does.
Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c
2017-01-12 15:16:07 -08:00
Jingning Han
f65170ea84
Rework 8x8 transpose SSSE3 for inverse 2D-DCT
...
Use same transpose process as inv_txfm_sse2 does.
Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94
2017-01-12 15:13:18 -08:00
Johann Koenig
9f27d1f843
Merge "arm idct16x16: remove extra config guards"
2017-01-11 20:22:27 +00:00
Johann
68d0f46ec0
arm idct16x16: remove extra config guards
...
This file is guarded by HAVE_NEON_ASM in the .mk file now.
Change-Id: I513a621c234aa90ad52e426c8ed494d8a7d4b74a
2017-01-11 10:17:14 -08:00
Jingning Han
9a780fa7db
Rework forward 8x8 2D-DCT ssse3 implementation
...
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.
Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
2017-01-10 12:50:55 -08:00
James Zern
9480da21e8
Merge "Refine 8-bit 16x16 idct NEON intrinsics"
2017-01-09 23:52:29 +00:00
Johann Koenig
371a64bfe7
Merge "postproc: vpx_mbpost_proc_down_neon"
2017-01-09 19:53:15 +00:00
Johann Koenig
8a7847c2c9
Merge "Fix mips dspr2 idct32x32 functions for large coefficient input"
2017-01-09 19:47:47 +00:00
Johann Koenig
bf168b24f5
Merge "Fix mips dspr2 idct16x16 functions for large coefficient input"
2017-01-09 19:47:00 +00:00
Johann Koenig
08d0a7fd0f
Merge "Fix mips dspr2 idct8x8 functions for large coefficient input"
2017-01-09 19:46:18 +00:00
Johann Koenig
ab20869221
Merge "Fix mips dspr2 idct4x4 functions for large coefficient input"
2017-01-09 19:45:54 +00:00
Johann
c23970ec25
postproc: vpx_mbpost_proc_down_neon
...
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
2017-01-09 10:21:56 -08:00
Johann Koenig
9af97fb630
Merge "postproc: vpx_mbpost_proc_across_ip_neon"
2017-01-09 18:17:26 +00:00
Kaustubh Raste
50dd3eb62c
Fix mips dspr2 idct32x32 functions for large coefficient input
...
Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2
2017-01-09 17:21:09 +05:30
Kaustubh Raste
c06991fce6
Fix mips dspr2 idct16x16 functions for large coefficient input
...
Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a
2017-01-09 16:35:28 +05:30
Kaustubh Raste
24d804f79c
Fix mips dspr2 idct8x8 functions for large coefficient input
...
Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61
2017-01-09 16:22:19 +05:30
Kaustubh Raste
afd2d797eb
Fix mips dspr2 idct4x4 functions for large coefficient input
...
Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89
2017-01-09 15:28:30 +05:30
Linfeng Zhang
6abdd31555
Refine 8-bit 16x16 idct NEON intrinsics
...
Speed test shows 25% gain on vpx_idct16x16_256_add_neon(),
and vpx_idct16x16_10_add_neon() got tripled.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
2017-01-06 17:52:07 -08:00
Johann
4dca923454
postproc: vpx_mbpost_proc_across_ip_neon
...
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
2017-01-06 16:39:17 -08:00
Linfeng Zhang
2d12a52ff0
Merge "Add high bitdepth 8x8 idct NEON intrinsics"
2017-01-06 16:47:23 +00:00
Linfeng Zhang
911bb980b1
Clean DC only idct NEON intrinsics
...
BUG=webm:1301
Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5
2016-12-28 13:51:44 -08:00
Linfeng Zhang
9b187954df
Add high bitdepth 8x8 idct NEON intrinsics
...
BUG=webm:1301
Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-27 16:28:53 -08:00
Linfeng Zhang
6d5a3fe583
Clean idct 8x8 neon functions
...
BUG=webm:1301
Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307
2016-12-21 14:24:17 -08:00
James Zern
a68b36c752
vpx_idct32x32_1024_add_neon: quiet uninitialized warning
...
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c enable vpx_idct32x32_1024_add_neon in hbd builds
+ give the variable a more descriptive name
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
2016-12-19 12:49:44 -08:00
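The general pattern described above, moving an assignment out of a loop so the compiler can see that the variable is always initialized, can be sketched as follows. The function and variable names are hypothetical and the real vpx_idct32x32_1024_add_neon code is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical two-pass routine. The source pointer is assigned before
 * the loop, so no path through the loop body can read it uninitialized;
 * the end of the first pass switches it to the intermediate buffer. */
static int two_pass_sum(const int16_t *input, const int16_t *scratch, int n) {
  const int16_t *pass_src = input; /* assigned outside the for loop */
  int pass, i;
  int total = 0;
  for (pass = 0; pass < 2; ++pass) {
    for (i = 0; i < n; ++i) total += pass_src[i];
    pass_src = scratch; /* second pass reads the intermediate buffer */
  }
  return total;
}
```

Some compilers warn when a pointer is first assigned inside a conditional in the loop body, even if every execution does assign it; hoisting the first assignment removes that ambiguity without changing behavior.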
Linfeng Zhang
7e23f895ca
Merge "Clean hbd idct 4x4 neon functions and other"
2016-12-19 17:09:26 +00:00