Luca Barbato
92e33c7b31
ppc: d63 predictor 32x32
About 10x faster.
Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
2017-04-19 19:57:51 -07:00
Luca Barbato
a5469a00a8
ppc: d63 predictor 16x16
About 18x faster.
Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4
2017-04-19 19:57:51 -07:00
Luca Barbato
cc868da526
ppc: d45 predictor 32x32
About 12x faster.
Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed
2017-04-19 19:57:51 -07:00
Luca Barbato
7a7dc9e624
ppc: d45 predictor 16x16
About 16x faster.
Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9
2017-04-19 19:57:51 -07:00
Luca Barbato
c08baa2900
ppc: dc predictor 32x32
10x and 5x faster.
Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf
2017-04-19 19:57:47 -07:00
Luca Barbato
22ca468c7c
ppc: dc top and left predictor 32x32
6x faster.
Change-Id: I717995b4056e5579c68191d11b495372971fe1ae
2017-04-19 19:49:31 -07:00
Luca Barbato
ad9dea1f6d
ppc: dc top and left predictor 16x16
13x faster.
Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6
2017-04-19 19:49:31 -07:00
Luca Barbato
d68d37872c
ppc: dc_128 predictor 32x32
6x faster.
Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b
2017-04-19 19:49:31 -07:00
Luca Barbato
f9d20e6df2
ppc: dc_128 predictor 16x16
20x faster.
Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709
2017-04-19 19:49:31 -07:00
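The dc, dc_top/dc_left and dc_128 commits vectorize very simple block fills. As a rough scalar reference, assuming the usual VP9 definitions (rounded mean of both edges, rounded mean of one edge, or the constant 128), the predictors can be sketched as below. All function names are hypothetical; this is not the libvpx C code or the PowerPC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill a bs x bs block with a single value. */
static void fill_block(uint8_t *dst, ptrdiff_t stride, int bs, uint8_t v) {
  assert(bs > 0);
  for (int r = 0; r < bs; ++r) memset(dst + r * stride, v, (size_t)bs);
}

/* dc_128: no neighbors are used, the block is filled with mid-gray. */
static void dc_128_sketch(uint8_t *dst, ptrdiff_t stride, int bs) {
  fill_block(dst, stride, bs, 128);
}

/* dc_top / dc_left: rounded mean of a single edge (pass above or left). */
static void dc_edge_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                           const uint8_t *edge) {
  int sum = 0;
  for (int i = 0; i < bs; ++i) sum += edge[i];
  fill_block(dst, stride, bs, (uint8_t)((sum + bs / 2) / bs));
}

/* dc: rounded mean over the 2 * bs pixels of both edges. */
static void dc_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                      const uint8_t *above, const uint8_t *left) {
  int sum = 0;
  for (int i = 0; i < bs; ++i) sum += above[i] + left[i];
  fill_block(dst, stride, bs, (uint8_t)((sum + bs) / (2 * bs)));
}
```

Because every output pixel is the same value, the work is one reduction plus wide stores, which is why these predictors vectorize so well.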
Luca Barbato
0d9417de4a
ppc: tm predictor 32x32
About 8x faster.
Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf
2017-04-19 19:49:26 -07:00
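For reference, the TM (TrueMotion) predictor computes each pixel as left[r] + above[c] - above[-1], clipped to 8 bits. The following is a minimal scalar sketch assuming that definition; the names are hypothetical, and the ppc commits replace loops like this with vector code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* `above` must point one past the top-left neighbor so that above[-1]
 * is the top-left pixel. */
static void tm_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                                const uint8_t *above, const uint8_t *left) {
  assert(bs > 0);
  const int top_left = above[-1];
  for (int r = 0; r < bs; ++r) {
    for (int c = 0; c < bs; ++c)
      dst[c] = clip_pixel_sketch(left[r] + above[c] - top_left);
    dst += stride;
  }
}
```

The clipping step is the main difference from the DC-style predictors: the intermediate sum needs more than 8 bits before it is saturated back to a pixel.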
James Zern
a81f037f15
Merge changes I1f5a3752,I95123051,I3bb724e0,Ie81077fa,Ic80f3c05, ...
* changes:
ppc: tm predictor 16x16
ppc: tm predictor 8x8
ppc: horizontal predictor 32x32
ppc: horizontal predictor 16x16
ppc: vertical intrapred 16x16 and 32x32
configure: Workaround clang not enabling altivec on -mvsx
configure: Match power*64* as ppc64
2017-04-20 02:45:45 +00:00
Linfeng Zhang
bf8a49abbd
Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
Replace them with CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding the offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
2017-04-19 12:13:49 -07:00
Luca Barbato
479443a570
ppc: tm predictor 16x16
About 10x faster.
Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06
2017-04-19 01:48:10 +02:00
Luca Barbato
c8f5a55df4
ppc: tm predictor 8x8
About 5x faster.
Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a
2017-04-19 01:48:09 +02:00
Luca Barbato
7b0e12934e
ppc: horizontal predictor 32x32
About 5x faster.
Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8
2017-04-19 01:48:09 +02:00
Luca Barbato
a7a2d1653b
ppc: horizontal predictor 16x16
About 10x faster.
Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2
2017-04-19 01:48:09 +02:00
Luca Barbato
7ad1faa6f8
ppc: vertical intrapred 16x16 and 32x32
Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab
2017-04-19 01:48:09 +02:00
Johann
9fa24f03b5
re-enable vpx_comp_avg_pred_sse2
Buffers on 32-bit x86 builds are only guaranteed 8-byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"
Also re-enable asserts on the C version.
BUG=webm:1390
Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
2017-04-17 08:40:43 -07:00
Johann
069b772915
sad avg: align intermediate buffer
comp_avg_pred has started declaring a requirement for aligned buffers.
BUG=webm:1390
Change-Id: Idaf6667498ea343e8d49b32bc9d8b9d0aa43ef5c
2017-04-17 14:26:33 +00:00
James Zern
4ba20da8b1
Merge "Add AVX2 optimization to copy/avg functions"
2017-04-15 00:26:08 +00:00
Yi Luo
aa5a941992
Add AVX2 optimization to copy/avg functions
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
2017-04-14 16:50:10 -07:00
Johann
eaa7cdf05d
Disable vpx_comp_avg_pred_sse2
Failures on windows:
unknown file: error: SEH exception with code 0xc0000005 thrown in the
test body.
Alignment check errors on linux:
test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void
vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t
*, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed.
BUG=webm:1390
Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea
2017-04-14 08:43:06 -07:00
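The failing assertion checks that the low four address bits are zero, i.e. 16-byte alignment. A hypothetical sketch of that check next to a scalar compound average (rounded mean of pred and ref) is below; it assumes C11, where `_Alignas(16)` is one standard way to obtain such buffers. It is not the libvpx function, and it uses `uintptr_t` rather than the `intptr_t` in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the low 4 address bits are zero (16-byte aligned). */
static bool is_aligned_16(const void *p) {
  return ((uintptr_t)p & 0xf) == 0;
}

/* Rounded average of pred and ref; width * height outputs are written
 * contiguously. Like the asserting C version in the log, comp_pred and
 * pred are required to be 16-byte aligned. */
static void comp_avg_pred_sketch(uint8_t *comp_pred, const uint8_t *pred,
                                 int width, int height, const uint8_t *ref,
                                 int ref_stride) {
  assert(is_aligned_16(comp_pred));
  assert(is_aligned_16(pred));
  for (int i = 0; i < height; ++i) {
    for (int j = 0; j < width; ++j)
      comp_pred[j] = (uint8_t)((pred[j] + ref[j] + 1) >> 1);
    comp_pred += width;
    pred += width;
    ref += ref_stride;
  }
}
```

A buffer declared as `_Alignas(16) uint8_t out[16];` passes the check regardless of the platform's default stack alignment, which is the kind of fix the follow-up commits describe.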
Johann
28a8622143
vpx_comp_avg_pred: sse2 optimization
Provides a speedup of over 15x for width > 8.
For width == 8 it gets about an 8x speedup due to smaller loads and
shifting.
For width == 4 it's only about a 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
2017-04-13 08:44:52 -07:00
James Zern
04e9456567
Merge changes from topic 'Wshorten'
* changes:
configure: enable -Wshorten-64-to-32 for hbd
vp9_encodeframe: resolve -Wshorten-64-to-32 in hbd
Resolve -Wshorten-64-to-32 in highbd variance.
2017-04-07 07:32:14 +00:00
James Zern
47b9a09120
Resolve -Wshorten-64-to-32 in highbd variance.
For 8-bit the subtrahend is small enough to fit into uint32_t.
This is the same fix as in:
c0241664a Resolve -Wshorten-64-to-32 in variance.
For 10/12-bit apply:
63a37d16f Prevent negative variance
Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
2017-04-05 17:34:02 -07:00
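The arithmetic behind the 8-bit case: the subtrahend (sum * sum) / (w * h) is at most w * h * 255 * 255, which for a 64x64 block is 4096 * 65025 = 266,342,400, well under 2^32. The sketch below, with hypothetical names and assuming sse and sum already fit the shown types, contrasts the explicit narrowing cast with the clamp-to-zero approach used for 10/12-bit.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit: the product needs 64 bits, but the quotient fits in uint32_t,
 * so the narrowing cast is explicit and safe. */
static uint32_t variance_8bit_sketch(uint32_t sse, int sum, int w, int h) {
  assert(w > 0 && h > 0);
  return sse - (uint32_t)(((int64_t)sum * sum) / (w * h));
}

/* 10/12-bit: compute in int64_t and clamp a negative result to zero. */
static uint32_t variance_hbd_sketch(uint32_t sse, int64_t sum, int w, int h) {
  assert(w > 0 && h > 0);
  const int64_t var = (int64_t)sse - (sum * sum) / (w * h);
  return var >= 0 ? (uint32_t)var : 0;
}
```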
Linfeng Zhang
6fc2e57c2c
Update 32x32 high bitdepth idct NEON optimization
Preparation of CONVERT_TO_BYTEPTR/SHORTPTR clean up.
BUG=webm:1388
Change-Id: I928d30a5698023bb90888d783cf81c51ec183760
2017-04-05 15:28:11 -07:00
James Zern
aefc1088a2
intrapred: sync highbd_d135_predictor w/d135_
previously:
05437805f intrapred/d135: flatten border results before storing
BUG=webm:1316
Change-Id: I3b8bd89117ad7f2f4560b57f7c148da781e86f85
2017-03-24 20:45:44 -07:00
James Zern
67cde46dd7
intrapred: specialize highbd 4x4 predictors
d207/d63/d45/d117/d135/d153
~9-45% better depending on the predictor on 32-bit ARM, similar range on
x86-64
this matches the non-highbitdepth implementation
BUG=webm:1316
Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
2017-03-24 20:45:36 -07:00
James Zern
e05f4cf8f4
intrapred: rename d63f to d63e
this is consistent with he/ve/d45e
Change-Id: I75641ae5667430b0ecd370db86fff6e666cb577d
2017-03-24 20:41:39 -07:00
James Zern
d45617c702
remove CONFIG_MISC_FIXES
this belonged to vp10 with the changes now migrated to av1.
Change-Id: Ie30ead3e7b71f465bc14136e1b6f156ea978c43f
2017-03-24 20:41:39 -07:00
Kaustubh Raste
8ee9b855a0
Merge "Fix mips msa fwd xform mismatch"
2017-03-23 07:44:16 +00:00
James Zern
f16ea6a6eb
Merge "vp9_rdopt: correct size to vpx_sum_squares_2d_i16"
2017-03-23 00:53:22 +00:00
James Zern
e097bb1d39
Merge "idct_neon: prefix non-static functions w/'vpx_'"
2017-03-22 19:30:11 +00:00
James Zern
5661cd8ff4
vp9_rdopt: correct size to vpx_sum_squares_2d_i16
the current implementations expect pixel size, not the block type
BUG=webm:1392
Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
2017-03-22 12:04:33 -07:00
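To see why passing a block-type enum instead of the pixel size goes wrong, consider a hypothetical sum-of-squares sketch where `size` is the width/height in pixels. This is an illustration under that assumption, not the vpx_sum_squares_2d_i16 implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squares over a size x size block; `size` is in pixels (e.g. 32),
 * not an enum value describing the block or transform type. */
static uint64_t sum_squares_2d_sketch(const int16_t *src, int stride,
                                      int size) {
  assert(size > 0);
  uint64_t ss = 0;
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      const int32_t v = src[r * stride + c];
      ss += (uint64_t)(v * v);
    }
  }
  return ss;
}
```

A small enum value passed as `size` silently covers only a corner of the block, so the result is wrong without any crash.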
James Zern
f91c3bb3ab
idct_neon: prefix non-static functions w/'vpx_'
Change-Id: I94fcdeae18468e6ef0cb7119b8142d982a048031
2017-03-22 11:49:23 -07:00
Kaustubh Raste
e45c1f55b4
Fix mips msa fwd xform mismatch
Change-Id: I32a6df11463144aa1a562256ee7d57a41fd678d6
2017-03-22 14:01:03 +05:30
Yi Luo
cb9b277b2f
Merge "Make butterfly_self() signature consistent with butterfly()"
2017-03-21 22:32:20 +00:00
Yi Luo
266868a40b
Make butterfly_self() signature consistent with butterfly()
- Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi
compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
operations.
Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
2017-03-21 09:36:35 -07:00
James Zern
e0b4c4d1ae
Merge "Add vpx_highbd_idct32x32_1024_add_neon()"
2017-03-21 03:27:35 +00:00
James Zern
6d71d33d55
Merge "Add vpx_highbd_idct32x32_34_add_neon()"
2017-03-21 03:02:51 +00:00
James Zern
5da2e500d7
inv_txfm_sse2: clear conversion warning in hbd build
tran_high -> tran_low in return from dct_const_round_shift()
Change-Id: I2fe06c4b604823b1d1fe40a487017c3c2819a440
2017-03-17 01:16:38 -07:00
Linfeng Zhang
27530d484e
Add vpx_highbd_idct32x32_1024_add_neon()
BUG=webm:1301
Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
2017-03-17 00:27:46 -07:00
Linfeng Zhang
50b13f75b8
Add vpx_highbd_idct32x32_34_add_neon()
BUG=webm:1301
Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
2017-03-17 00:27:46 -07:00
James Zern
2882778310
Merge "Add vpx_highbd_idct32x32_135_add_neon()"
2017-03-17 07:26:52 +00:00
Linfeng Zhang
65e9fb65e8
Add vpx_highbd_idct32x32_135_add_neon()
BUG=webm:1301
Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
2017-03-16 22:37:55 -07:00
James Zern
68efc64b72
Merge "Clean vpx_idct32x32_1024_add_neon()"
2017-03-17 05:24:58 +00:00
Rafael de Lucena Valle
405b94c661
Add Hadamard for Power8
Change-Id: I3b4b043c1402b4100653ace4869847e030861b18
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-15 23:46:18 -03:00
Linfeng Zhang
e54231d613
Clean vpx_idct32x32_1024_add_neon()
Change-Id: I05921e16d6a3e4e7e5b00a90624735050a186636
2017-03-15 11:24:31 -07:00
Yi Luo
8440cc4817
Merge "Improve idct32x32_1024_add SSSE3 intrinsics performance"
2017-03-15 02:32:52 +00:00
Linfeng Zhang
c756eb01c8
Fix overflow issue in 32x32 idct NEON intrinsics
Similar issue to Change bc1c18e.
The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes a 16-bit overflow in the final stage of
pass 2 when the test count is raised from 1,000 to 1,000,000.
Use saturating add/sub in vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon() and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.
Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
2017-03-14 16:59:14 -07:00
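The difference between a wrapping and a saturating 16-bit add can be shown with a scalar model. On NEON, vqaddq_s16 and vqsubq_s16 perform saturating add/sub per int16 lane; the sketch below only models that per-lane behavior with hypothetical names and is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Widen to 32 bits so the true result is exact, then clamp to int16. */
static int16_t sat_add_s16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + b;
  if (s > INT16_MAX) s = INT16_MAX;
  if (s < INT16_MIN) s = INT16_MIN;
  return (int16_t)s;
}

static int16_t sat_sub_s16(int16_t a, int16_t b) {
  int32_t d = (int32_t)a - b;
  if (d > INT16_MAX) d = INT16_MAX;
  if (d < INT16_MIN) d = INT16_MIN;
  return (int16_t)d;
}
```

With a plain 16-bit add, 32000 + 1000 would wrap to a large negative value and flip the sign of the reconstructed residual; saturation pins it at 32767 instead.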