James Zern
67cde46dd7
intrapred: specialize highbd 4x4 predictors
...
d207/d63/d45/d117/d135/d153
~9-45% better depending on the predictor on 32-bit ARM, similar range on
x86-64
this matches the non-highbitdepth implementation
BUG=webm:1316
Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
2017-03-24 20:45:36 -07:00
James Zern
e05f4cf8f4
intrapred: rename d63f to d63e
...
this is consistent with he/ve/d45e
Change-Id: I75641ae5667430b0ecd370db86fff6e666cb577d
2017-03-24 20:41:39 -07:00
James Zern
d45617c702
remove CONFIG_MISC_FIXES
...
this belonged to vp10 with the changes now migrated to av1.
Change-Id: Ie30ead3e7b71f465bc14136e1b6f156ea978c43f
2017-03-24 20:41:39 -07:00
Kaustubh Raste
8ee9b855a0
Merge "Fix mips msa fwd xform mismatch"
2017-03-23 07:44:16 +00:00
James Zern
f16ea6a6eb
Merge "vp9_rdopt: correct size to vpx_sum_squares_2d_i16"
2017-03-23 00:53:22 +00:00
James Zern
e097bb1d39
Merge "idct_neon: prefix non-static functions w/'vpx_'"
2017-03-22 19:30:11 +00:00
James Zern
5661cd8ff4
vp9_rdopt: correct size to vpx_sum_squares_2d_i16
...
the current implementations expect pixel size, not the block type
BUG=webm:1392
Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
2017-03-22 12:04:33 -07:00
James Zern
f91c3bb3ab
idct_neon: prefix non-static functions w/'vpx_'
...
Change-Id: I94fcdeae18468e6ef0cb7119b8142d982a048031
2017-03-22 11:49:23 -07:00
Kaustubh Raste
e45c1f55b4
Fix mips msa fwd xform mismatch
...
Change-Id: I32a6df11463144aa1a562256ee7d57a41fd678d6
2017-03-22 14:01:03 +05:30
Yi Luo
cb9b277b2f
Merge "Make butterfly_self() signature consistent with butterfly()"
2017-03-21 22:32:20 +00:00
Yi Luo
266868a40b
Make butterfly_self() signature consistent with butterfly()
...
- Refer to patch: 48fca113d
inv_txfm_ssse3,butterfly: fix win32 abi
compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
operations.
Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
2017-03-21 09:36:35 -07:00
James Zern
e0b4c4d1ae
Merge "Add vpx_highbd_idct32x32_1024_add_neon()"
2017-03-21 03:27:35 +00:00
James Zern
6d71d33d55
Merge "Add vpx_highbd_idct32x32_34_add_neon()"
2017-03-21 03:02:51 +00:00
James Zern
5da2e500d7
inv_txfm_sse2: clear conversion warning in hbd build
...
tran_high -> tran_low in return from dct_const_round_shift()
Change-Id: I2fe06c4b604823b1d1fe40a487017c3c2819a440
2017-03-17 01:16:38 -07:00
Linfeng Zhang
27530d484e
Add vpx_highbd_idct32x32_1024_add_neon()
...
BUG=webm:1301
Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
2017-03-17 00:27:46 -07:00
Linfeng Zhang
50b13f75b8
Add vpx_highbd_idct32x32_34_add_neon()
...
BUG=webm:1301
Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
2017-03-17 00:27:46 -07:00
James Zern
2882778310
Merge "Add vpx_highbd_idct32x32_135_add_neon()"
2017-03-17 07:26:52 +00:00
Linfeng Zhang
65e9fb65e8
Add vpx_highbd_idct32x32_135_add_neon()
...
BUG=webm:1301
Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
2017-03-16 22:37:55 -07:00
James Zern
68efc64b72
Merge "Clean vpx_idct32x32_1024_add_neon()"
2017-03-17 05:24:58 +00:00
Rafael de Lucena Valle
405b94c661
Add Hadamard for Power8
...
Change-Id: I3b4b043c1402b4100653ace4869847e030861b18
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-15 23:46:18 -03:00
Linfeng Zhang
e54231d613
Clean vpx_idct32x32_1024_add_neon()
...
Change-Id: I05921e16d6a3e4e7e5b00a90624735050a186636
2017-03-15 11:24:31 -07:00
Yi Luo
8440cc4817
Merge "Improve idct32x32_1024_add SSSE3 intrinsics performance"
2017-03-15 02:32:52 +00:00
Linfeng Zhang
c756eb01c8
Fix overflow issue in 32x32 idct NEON intrinsics
...
Similar issue as Change bc1c18e
.
The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in final stage of pass
2, when changing the test number from 1,000 to 1,000,000.
Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.
Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
2017-03-14 16:59:14 -07:00
Yi Luo
fedcf83f33
Improve idct32x32_1024_add SSSE3 intrinsics performance
...
- Function level speed improves ~12%.
Change-Id: I9b7dbddabf08c7d0f6b25264e6074d5ccbe39290
2017-03-14 14:04:08 -07:00
Linfeng Zhang
b0bfcc368c
Merge "Add vpx_highbd_idct32x32_135_add_c()"
2017-03-13 18:49:01 +00:00
James Zern
48fca113d1
inv_txfm_ssse3,butterfly: fix win32 abi compatibility
...
only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.
since:
07c48ccfe
Improve idct32x32_34_add SSSE3 intrinsics performance
BUG=webm:1384
Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
2017-03-10 19:57:17 -08:00
Yi Luo
327add990f
Improve idct32x32_135_add SSSE3 intrinsics performance
...
- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.
Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
2017-03-09 16:17:54 -08:00
Linfeng Zhang
77311e0dff
Update vpx_idct32x32_1024_add_neon()
...
Most are cosmetics changes.
Speed has no change with clang 3.8, and about 5% faster with gcc 4.8.4
Tried the strategy used in 8x8 and 16x16 (which operations' orders are
similar to the C code), though speed gets better with gcc, it's worse
with clang.
Tried to remove store_in_output(), but speed gets worse.
Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
2017-03-08 12:39:04 -08:00
Linfeng Zhang
48f5886605
Add vpx_highbd_idct32x32_135_add_c()
...
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.
BUG=webm:1301
Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-03-08 10:46:33 -08:00
Linfeng Zhang
c4e5c54d69
cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()
...
No speed changes and disassembly is almost identical.
Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f
2017-03-08 08:58:32 -08:00
Linfeng Zhang
3cf5c213f1
cosmetics,dsp/arm/: rename a variable
...
Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.
Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a
2017-03-08 08:55:41 -08:00
Yi Luo
07c48ccfe0
Improve idct32x32_34_add SSSE3 intrinsics performance
...
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.
Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
2017-03-01 11:14:48 -08:00
James Zern
47d6f16a04
get_prob(): rationalize int types
...
promote the unsigned int calculation to uint64_t rather than int64_t for
type consistency
Change-Id: Ic34dee1dc707d9faf6a3ae250bfe39b60bef3438
2017-02-24 15:36:52 -08:00
Jerome Jiang
b1dcaf7f1e
Merge "Fix segmentation fault caused by denoiser working with spatial SVC."
2017-02-22 04:44:55 +00:00
Yi Luo
6036a0d24f
Following SSSE3 intrinsics functions also work for HBD
...
- vpx_idct8x8_12_add_ssse3
vpx_idct8x8_64_add_ssse3
vpx_idct32x32_34_add_ssse3
vpx_idct32x32_135_add_ssse3
vpx_idct32x32_1024_add_ssse3
- turn on unit tests.
Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7
2017-02-21 12:37:53 -08:00
Jerome Jiang
0d1e5a21c4
Fix segmentation fault caused by denoiser working with spatial SVC.
...
Re-enable the affected test.
BUG=webm:1374
Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
2017-02-21 09:38:28 -08:00
Yi Luo
1f8e8e5bf1
Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests
...
- In SSSE3 optimization, 16-bit addition and subtraction would
overflow when input coefficient is 16-bit signed extreme values.
- Function-level speed becomes slower (unit ms):
idct8x8_64: 284 -> 294
idct8x8_12: 145 -> 158.
BUG=webm:1332
Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
2017-02-17 14:05:05 -08:00
James Zern
3e7025022e
Merge "Add vpx_highbd_idct16x16_10_add_neon()"
2017-02-17 20:29:37 +00:00
Yi Luo
f62dcc9c33
Replace idct32x32_1024_add_ssse3 assembly with intrinsics
...
- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on
i7-6700, no obvious user-level speed performance downgrade.
- Passed unit tests.
Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
2017-02-16 16:10:40 -08:00
Johann Koenig
a9b81da575
Merge "block error avx2: use tran_low_t"
2017-02-16 23:51:14 +00:00
Linfeng Zhang
0620081731
Add vpx_highbd_idct16x16_10_add_neon()
...
BUG=webm:1301
Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
2017-02-16 15:13:50 -08:00
James Zern
0f014c97e5
Merge "Fix mips vpx_post_proc_down_and_across_mb_row_msa function"
2017-02-16 23:02:10 +00:00
Johann Koenig
06a82af0de
Merge "correct bitdepth_conversion_sse2.h header guard"
2017-02-16 21:41:28 +00:00
Johann
6c2d732bf4
correct bitdepth_conversion_sse2.h header guard
...
Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519
2017-02-16 12:43:33 -08:00
Yi Luo
1cb44945fb
Merge "Add idct32x32_135_add SSSE3 intrinsics"
2017-02-16 20:43:29 +00:00
Johann
2104454607
block error avx2: use tran_low_t
...
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
2017-02-16 12:39:02 -08:00
Yi Luo
72a43e2378
Add idct32x32_135_add SSSE3 intrinsics
...
- Replace the corresponding assembly code.
- No user level speed performance degrade.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
2017-02-16 11:29:34 -08:00
Johann
4682130b60
quantize_fp highbd ssse3: use tran_low_t for coeff
...
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
2017-02-16 07:40:56 -08:00
Johann
44600442dc
bitdepth conversion: really use num elements
...
The previous implementation confused bit/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
2017-02-16 15:02:48 +00:00
Kaustubh Raste
fddf66b741
Fix mips vpx_post_proc_down_and_across_mb_row_msa function
...
Added fix to handle non-multiple of 16 cols case for size 16
Change-Id: If3a6d772d112077c5e0a9be9e612e1148f04338c
2017-02-16 13:17:00 +05:30