660 Commits

Author SHA1 Message Date
Linfeng Zhang
0178d974e5 Clean specializes of idct functions
Change-Id: I8bb660de47b5f97263ec381dc428db96e9c9a4b2
2017-05-02 18:01:19 -07:00
Linfeng Zhang
4412996d59 Clean add_protos of highbd idct functions
Change-Id: Ica51d780b92b316ce9112740c56cdf7670816371
2017-05-02 17:59:38 -07:00
Linfeng Zhang
a7a57d9756 Clean add_protos of idct functions
Change-Id: I6037525d92ec172810edab720389eb1865ed3b1a
2017-05-02 17:58:40 -07:00
Luca Barbato
d51d3934f5 ppc: Add convolve_avg
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 15:47:25 +02:00
Luca Barbato
63860ba7b8 ppc: Add convolve_copy
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-29 15:47:25 +02:00
Linfeng Zhang
51dc998f3a Update highbd convolve functions arguments to use uint16_t src/dst
BUG=webm:1388

Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
Luca Barbato
914b160fb5 ppc: h predictor 8x8
Slightly faster with the current compiler.

Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47
2017-04-19 19:57:51 -07:00
Luca Barbato
0b9be93205 ppc: d63 predictor 8x8
10x faster.

Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8
2017-04-19 19:57:51 -07:00
Luca Barbato
ee9325b0bd ppc: tm predictor 4x4
Slightly faster.

Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191
2017-04-19 19:57:51 -07:00
Luca Barbato
2904eb5800 ppc: h predictor 4x4
2x faster.

Change-Id: I0583dec353299c6797401b646099f18db4e0420d
2017-04-19 19:57:51 -07:00
Luca Barbato
58245d7050 ppc: dc predictor 8x8
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.

Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
2017-04-19 19:57:51 -07:00
Luca Barbato
6b4a65e8b1 ppc: d45 predictor 8x8
11x faster.

Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798
2017-04-19 19:57:51 -07:00
Luca Barbato
92e33c7b31 ppc: d63 predictor 32x32
About 10x faster.

Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
2017-04-19 19:57:51 -07:00
Luca Barbato
a5469a00a8 ppc: d63 predictor 16x16
About 18x faster.

Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4
2017-04-19 19:57:51 -07:00
Luca Barbato
cc868da526 ppc: d45 predictor 32x32
About 12x faster.

Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed
2017-04-19 19:57:51 -07:00
Luca Barbato
7a7dc9e624 ppc: d45 predictor 16x16
About 16x faster.

Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9
2017-04-19 19:57:51 -07:00
Luca Barbato
c08baa2900 ppc: dc predictor 32x32
10x and 5x faster.

Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf
2017-04-19 19:57:47 -07:00
Luca Barbato
22ca468c7c ppc: dc top and left predictor 32x32
6x faster.

Change-Id: I717995b4056e5579c68191d11b495372971fe1ae
2017-04-19 19:49:31 -07:00
Luca Barbato
ad9dea1f6d ppc: dc top and left predictor 16x16
13x faster.

Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6
2017-04-19 19:49:31 -07:00
Luca Barbato
d68d37872c ppc: dc_128 predictor 32x32
6x faster.

Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b
2017-04-19 19:49:31 -07:00
Luca Barbato
f9d20e6df2 ppc: dc_128 predictor 16x16
20x faster.

Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709
2017-04-19 19:49:31 -07:00
Luca Barbato
0d9417de4a ppc: tm predictor 32x32
About 8x faster.

Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf
2017-04-19 19:49:26 -07:00
James Zern
a81f037f15 Merge changes I1f5a3752,I95123051,I3bb724e0,Ie81077fa,Ic80f3c05, ...
* changes:
  ppc: tm predictor 16x16
  ppc: tm predictor 8x8
  ppc: horizontal predictor 32x32
  ppc: horizontal predictor 16x16
  ppc: vertical intrapred 16x16 and 32x32
  configure: Workaround clang not enabling altivec on -mvsx
  configure: Match power*64* as ppc64
2017-04-20 02:45:45 +00:00
Linfeng Zhang
bf8a49abbd Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is casted to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.

BUG=webm:1388

Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
2017-04-19 12:13:49 -07:00
Luca Barbato
479443a570 ppc: tm predictor 16x16
About 10x faster.

Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06
2017-04-19 01:48:10 +02:00
Luca Barbato
c8f5a55df4 ppc: tm predictor 8x8
About 5x faster.

Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a
2017-04-19 01:48:09 +02:00
Luca Barbato
7b0e12934e ppc: horizontal predictor 32x32
About 5x faster.

Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8
2017-04-19 01:48:09 +02:00
Luca Barbato
a7a2d1653b ppc: horizontal predictor 16x16
About 10x faster.

Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2
2017-04-19 01:48:09 +02:00
Luca Barbato
7ad1faa6f8 ppc: vertical intrapred 16x16 and 32x32
Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab
2017-04-19 01:48:09 +02:00
Johann
9fa24f03b5 re-enable vpx_comp_avg_pred_sse2
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"

Also re-enable asserts on the C version.

BUG=webm:1390

Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
2017-04-17 08:40:43 -07:00
Johann
069b772915 sad avg: align intermediate buffer
comp_avg_pred has started declaring a requirement for aligned buffers.

BUG=webm:1390

Change-Id: Idaf6667498ea343e8d49b32bc9d8b9d0aa43ef5c
2017-04-17 14:26:33 +00:00
James Zern
4ba20da8b1 Merge "Add AVX2 optimization to copy/avg functions" 2017-04-15 00:26:08 +00:00
Yi Luo
aa5a941992 Add AVX2 optimization to copy/avg functions
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
2017-04-14 16:50:10 -07:00
Johann
eaa7cdf05d Disable vpx_comp_avg_pred_sse2
Failures on windows:
unknown file: error: SEH exception with code 0xc0000005 thrown in the
test body.

Alignment check errors on linux:
test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void
vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t
*, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed.

BUG=webm:1390

Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea
2017-04-14 08:43:06 -07:00
Johann
28a8622143 vpx_comp_avg_pred: sse2 optimization
Provides over 15x speedup for width > 8.

Due to smaller loads and shifting for width == 8 it gets about 8x
speedup.

For width == 4 it's only about 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.

BUG=webm:1390

Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
2017-04-13 08:44:52 -07:00
James Zern
04e9456567 Merge changes from topic 'Wshorten'
* changes:
  configure: enable -Wshorten-64-to-32 for hbd
  vp9_encodeframe: resolve -Wshorten-64-to-32 in hbd
  Resolve -Wshorten-64-to-32 in highbd variance.
2017-04-07 07:32:14 +00:00
James Zern
47b9a09120 Resolve -Wshorten-64-to-32 in highbd variance.
For 8-bit the subtrahend is small enough to fit into uint32_t.

This is the same that was done for:
c0241664a Resolve -Wshorten-64-to-32 in variance.

For 10/12-bit apply:
63a37d16f Prevent negative variance

Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
2017-04-05 17:34:02 -07:00
Linfeng Zhang
6fc2e57c2c Update 32x32 high bitdepth idct NEON optimization
Preparation of CONVERT_TO_BYTEPTR/SHORTPTR clean up.

BUG=webm:1388

Change-Id: I928d30a5698023bb90888d783cf81c51ec183760
2017-04-05 15:28:11 -07:00
James Zern
aefc1088a2 intrapred: sync highbd_d135_predictor w/d135_
previously:
05437805f intrapred/d135: flatten border results before storing

BUG=webm:1316

Change-Id: I3b8bd89117ad7f2f4560b57f7c148da781e86f85
2017-03-24 20:45:44 -07:00
James Zern
67cde46dd7 intrapred: specialize highbd 4x4 predictors
d207/d63/d45/d117/d135/d153

~9-45% better depending on the predictor on 32-bit ARM, similar range on
x86-64

this matches the non-highbitdepth implementation

BUG=webm:1316

Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
2017-03-24 20:45:36 -07:00
James Zern
e05f4cf8f4 intrapred: rename d63f to d63e
this is consistent with he/ve/d45e

Change-Id: I75641ae5667430b0ecd370db86fff6e666cb577d
2017-03-24 20:41:39 -07:00
James Zern
d45617c702 remove CONFIG_MISC_FIXES
this belonged to vp10 with the changes now migrated to av1.

Change-Id: Ie30ead3e7b71f465bc14136e1b6f156ea978c43f
2017-03-24 20:41:39 -07:00
Kaustubh Raste
8ee9b855a0 Merge "Fix mips msa fwd xform mismatch" 2017-03-23 07:44:16 +00:00
James Zern
f16ea6a6eb Merge "vp9_rdopt: correct size to vpx_sum_squares_2d_i16" 2017-03-23 00:53:22 +00:00
James Zern
e097bb1d39 Merge "idct_neon: prefix non-static functions w/'vpx_'" 2017-03-22 19:30:11 +00:00
James Zern
5661cd8ff4 vp9_rdopt: correct size to vpx_sum_squares_2d_i16
the current implementations expect pixel size, not the block type

BUG=webm:1392

Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
2017-03-22 12:04:33 -07:00
James Zern
f91c3bb3ab idct_neon: prefix non-static functions w/'vpx_'
Change-Id: I94fcdeae18468e6ef0cb7119b8142d982a048031
2017-03-22 11:49:23 -07:00
Kaustubh Raste
e45c1f55b4 Fix mips msa fwd xform mismatch
Change-Id: I32a6df11463144aa1a562256ee7d57a41fd678d6
2017-03-22 14:01:03 +05:30
Yi Luo
cb9b277b2f Merge "Make butterfly_self() signature consistent with butterfly()" 2017-03-21 22:32:20 +00:00
Yi Luo
266868a40b Make butterfly_self() signature consistent with butterfly()
- Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi
  compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
  operations.

Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
2017-03-21 09:36:35 -07:00