Commit Graph

639 Commits

Author SHA1 Message Date
Luca Barbato
0d9417de4a ppc: tm predictor 32x32
About 8x faster.

Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf
2017-04-19 19:49:26 -07:00
James Zern
a81f037f15 Merge changes I1f5a3752,I95123051,I3bb724e0,Ie81077fa,Ic80f3c05, ...
* changes:
  ppc: tm predictor 16x16
  ppc: tm predictor 8x8
  ppc: horizontal predictor 32x32
  ppc: horizontal predictor 16x16
  ppc: vertical intrapred 16x16 and 32x32
  configure: Workaround clang not enabling altivec on -mvsx
  configure: Match power*64* as ppc64
2017-04-20 02:45:45 +00:00
Linfeng Zhang
bf8a49abbd Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is casted to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.

BUG=webm:1388

Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
2017-04-19 12:13:49 -07:00
Luca Barbato
479443a570 ppc: tm predictor 16x16
About 10x faster.

Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06
2017-04-19 01:48:10 +02:00
Luca Barbato
c8f5a55df4 ppc: tm predictor 8x8
About 5x faster.

Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a
2017-04-19 01:48:09 +02:00
Luca Barbato
7b0e12934e ppc: horizontal predictor 32x32
About 5x faster.

Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8
2017-04-19 01:48:09 +02:00
Luca Barbato
a7a2d1653b ppc: horizontal predictor 16x16
About 10x faster.

Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2
2017-04-19 01:48:09 +02:00
Luca Barbato
7ad1faa6f8 ppc: vertical intrapred 16x16 and 32x32
Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab
2017-04-19 01:48:09 +02:00
Johann
9fa24f03b5 re-enable vpx_comp_avg_pred_sse2
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"

Also re-enable asserts on the C version.

BUG=webm:1390

Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
2017-04-17 08:40:43 -07:00
Johann
069b772915 sad avg: align intermediate buffer
comp_avg_pred has started declaring a requirement for aligned buffers.

BUG=webm:1390

Change-Id: Idaf6667498ea343e8d49b32bc9d8b9d0aa43ef5c
2017-04-17 14:26:33 +00:00
James Zern
4ba20da8b1 Merge "Add AVX2 optimization to copy/avg functions" 2017-04-15 00:26:08 +00:00
Yi Luo
aa5a941992 Add AVX2 optimization to copy/avg functions
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
2017-04-14 16:50:10 -07:00
Johann
eaa7cdf05d Disable vpx_comp_avg_pred_sse2
Failures on windows:
unknown file: error: SEH exception with code 0xc0000005 thrown in the
test body.

Alignment check errors on linux:
test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void
vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t
*, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed.

BUG=webm:1390

Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea
2017-04-14 08:43:06 -07:00
Johann
28a8622143 vpx_comp_avg_pred: sse2 optimization
Provides over 15x speedup for width > 8.

Due to smaller loads and shifting for width == 8 it gets about 8x
speedup.

For width == 4 it's only about 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.

BUG=webm:1390

Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
2017-04-13 08:44:52 -07:00
James Zern
04e9456567 Merge changes from topic 'Wshorten'
* changes:
  configure: enable -Wshorten-64-to-32 for hbd
  vp9_encodeframe: resolve -Wshorten-64-to-32 in hbd
  Resolve -Wshorten-64-to-32 in highbd variance.
2017-04-07 07:32:14 +00:00
James Zern
47b9a09120 Resolve -Wshorten-64-to-32 in highbd variance.
For 8-bit the subtrahend is small enough to fit into uint32_t.

This is the same that was done for:
c0241664a Resolve -Wshorten-64-to-32 in variance.

For 10/12-bit apply:
63a37d16f Prevent negative variance

Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
2017-04-05 17:34:02 -07:00
Linfeng Zhang
6fc2e57c2c Update 32x32 high bitdepth idct NEON optimization
Preparation of CONVERT_TO_BYTEPTR/SHORTPTR clean up.

BUG=webm:1388

Change-Id: I928d30a5698023bb90888d783cf81c51ec183760
2017-04-05 15:28:11 -07:00
James Zern
aefc1088a2 intrapred: sync highbd_d135_predictor w/d135_
previously:
05437805f intrapred/d135: flatten border results before storing

BUG=webm:1316

Change-Id: I3b8bd89117ad7f2f4560b57f7c148da781e86f85
2017-03-24 20:45:44 -07:00
James Zern
67cde46dd7 intrapred: specialize highbd 4x4 predictors
d207/d63/d45/d117/d135/d153

~9-45% better depending on the predictor on 32-bit ARM, similar range on
x86-64

this matches the non-highbitdepth implementation

BUG=webm:1316

Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
2017-03-24 20:45:36 -07:00
James Zern
e05f4cf8f4 intrapred: rename d63f to d63e
this is consistent with he/ve/d45e

Change-Id: I75641ae5667430b0ecd370db86fff6e666cb577d
2017-03-24 20:41:39 -07:00
James Zern
d45617c702 remove CONFIG_MISC_FIXES
this belonged to vp10 with the changes now migrated to av1.

Change-Id: Ie30ead3e7b71f465bc14136e1b6f156ea978c43f
2017-03-24 20:41:39 -07:00
Kaustubh Raste
8ee9b855a0 Merge "Fix mips msa fwd xform mismatch" 2017-03-23 07:44:16 +00:00
James Zern
f16ea6a6eb Merge "vp9_rdopt: correct size to vpx_sum_squares_2d_i16" 2017-03-23 00:53:22 +00:00
James Zern
e097bb1d39 Merge "idct_neon: prefix non-static functions w/'vpx_'" 2017-03-22 19:30:11 +00:00
James Zern
5661cd8ff4 vp9_rdopt: correct size to vpx_sum_squares_2d_i16
the current implementations expect pixel size, not the block type

BUG=webm:1392

Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
2017-03-22 12:04:33 -07:00
James Zern
f91c3bb3ab idct_neon: prefix non-static functions w/'vpx_'
Change-Id: I94fcdeae18468e6ef0cb7119b8142d982a048031
2017-03-22 11:49:23 -07:00
Kaustubh Raste
e45c1f55b4 Fix mips msa fwd xform mismatch
Change-Id: I32a6df11463144aa1a562256ee7d57a41fd678d6
2017-03-22 14:01:03 +05:30
Yi Luo
cb9b277b2f Merge "Make butterfly_self() signature consistent with butterfly()" 2017-03-21 22:32:20 +00:00
Yi Luo
266868a40b Make butterfly_self() signature consistent with butterfly()
- Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi
  compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
  operations.

Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
2017-03-21 09:36:35 -07:00
James Zern
e0b4c4d1ae Merge "Add vpx_highbd_idct32x32_1024_add_neon()" 2017-03-21 03:27:35 +00:00
James Zern
6d71d33d55 Merge "Add vpx_highbd_idct32x32_34_add_neon()" 2017-03-21 03:02:51 +00:00
James Zern
5da2e500d7 inv_txfm_sse2: clear conversion warning in hbd build
tran_high -> tran_low in return from dct_const_round_shift()

Change-Id: I2fe06c4b604823b1d1fe40a487017c3c2819a440
2017-03-17 01:16:38 -07:00
Linfeng Zhang
27530d484e Add vpx_highbd_idct32x32_1024_add_neon()
BUG=webm:1301

Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
2017-03-17 00:27:46 -07:00
Linfeng Zhang
50b13f75b8 Add vpx_highbd_idct32x32_34_add_neon()
BUG=webm:1301

Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
2017-03-17 00:27:46 -07:00
James Zern
2882778310 Merge "Add vpx_highbd_idct32x32_135_add_neon()" 2017-03-17 07:26:52 +00:00
Linfeng Zhang
65e9fb65e8 Add vpx_highbd_idct32x32_135_add_neon()
BUG=webm:1301

Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
2017-03-16 22:37:55 -07:00
James Zern
68efc64b72 Merge "Clean vpx_idct32x32_1024_add_neon()" 2017-03-17 05:24:58 +00:00
Rafael de Lucena Valle
405b94c661 Add Hadamard for Power8
Change-Id: I3b4b043c1402b4100653ace4869847e030861b18
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-15 23:46:18 -03:00
Linfeng Zhang
e54231d613 Clean vpx_idct32x32_1024_add_neon()
Change-Id: I05921e16d6a3e4e7e5b00a90624735050a186636
2017-03-15 11:24:31 -07:00
Yi Luo
8440cc4817 Merge "Improve idct32x32_1024_add SSSE3 intrinsics performance" 2017-03-15 02:32:52 +00:00
Linfeng Zhang
c756eb01c8 Fix overflow issue in 32x32 idct NEON intrinsics
Similar issue as Change bc1c18e.

The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in final stage of pass
2, when changing the test number from 1,000 to 1,000,000.

Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.

Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
2017-03-14 16:59:14 -07:00
Yi Luo
fedcf83f33 Improve idct32x32_1024_add SSSE3 intrinsics performance
- Function level speed improves ~12%.

Change-Id: I9b7dbddabf08c7d0f6b25264e6074d5ccbe39290
2017-03-14 14:04:08 -07:00
Linfeng Zhang
b0bfcc368c Merge "Add vpx_highbd_idct32x32_135_add_c()" 2017-03-13 18:49:01 +00:00
James Zern
48fca113d1 inv_txfm_ssse3,butterfly: fix win32 abi compatibility
only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.

since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance

BUG=webm:1384

Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
2017-03-10 19:57:17 -08:00
Yi Luo
327add990f Improve idct32x32_135_add SSSE3 intrinsics performance
- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.

Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
2017-03-09 16:17:54 -08:00
Linfeng Zhang
77311e0dff Update vpx_idct32x32_1024_add_neon()
Most are cosmetics changes.
Speed has no change with clang 3.8, and about 5% faster with gcc 4.8.4

Tried the strategy used in 8x8 and 16x16 (which operations' orders are
similar to the C code), though speed gets better with gcc, it's worse
with clang.

Tried to remove store_in_output(), but speed gets worse.

Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
2017-03-08 12:39:04 -08:00
Linfeng Zhang
48f5886605 Add vpx_highbd_idct32x32_135_add_c()
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.

BUG=webm:1301

Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-03-08 10:46:33 -08:00
Linfeng Zhang
c4e5c54d69 cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()
No speed changes and disassembly is almost identical.

Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f
2017-03-08 08:58:32 -08:00
Linfeng Zhang
3cf5c213f1 cosmetics,dsp/arm/: rename a variable
Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.

Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a
2017-03-08 08:55:41 -08:00
Yi Luo
07c48ccfe0 Improve idct32x32_34_add SSSE3 intrinsics performance
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.

Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
2017-03-01 11:14:48 -08:00