Commit Graph

776 Commits

Author SHA1 Message Date
Johann
eae7cf2368 fdct16x16 neon optimization
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.

BUG=webm:1424

Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
2017-06-07 14:59:55 -07:00
James Zern
ff42e04f9c Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}" 2017-06-06 23:52:39 +00:00
James Zern
4753c23983 Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx" 2017-06-06 02:19:41 +00:00
Johann Koenig
755b3daf90 Merge "comp_avg_pred neon: used by sub pixel avg variance" 2017-05-31 18:17:28 +00:00
Linfeng Zhang
30ea3ef283 Merge "Update vpx_highbd_idct4x4_16_add_sse2()" 2017-05-31 15:56:20 +00:00
Johann
f695b30ac2 comp_avg_pred neon: used by sub pixel avg variance
BUG=webm:1423

Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804
2017-05-30 22:47:34 +00:00
Linfeng Zhang
45048dc9dc Update vpx_highbd_idct4x4_16_add_sse2()
BUG=webm:1412

Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84
2017-05-30 09:25:30 -07:00
Johann Koenig
b9649d2407 Merge "comp_avg_pred: alignment" 2017-05-30 16:21:05 +00:00
Johann
ea8b4a450d comp_avg_pred: alignment
x86 requires 16 byte alignment for some vector loads/stores.

arm does not have the same requirement.

The asserts are still in avg_pred_sse2.c. This just removes them from
the common code.

Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6
2017-05-30 07:46:43 -07:00
Johann
42ce25821d remove DECLARE_ALIGNED from neon code
Unlike x86 neon only requires type alignment when loading into vectors.

Change-Id: I7bbbe4d51f78776e499ce137578d8c0effdbc02f
2017-05-26 10:41:57 -07:00
Johann
f3c97ed32e subpel variance neon: reduce stack usage
Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values to the first pass, it uses vld1_u32()
which typically does impose a 4 byte alignment. However, as the first
pass operates on user-supplied values we must prepare for unaligned
values anyway (and have, see mem_neon.h).

But for the local temporary values there is no stride and the load will
use vld1_u8 which does not require 4 byte alignment.

There are 3 temporary structures. In the C, one is uint16_t. The arm
saturates between passes but still passes tests. If this becomes an
issue new functions will be needed.

Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
2017-05-24 13:28:13 -07:00
Johann
d204c4bf01 Use vdup instead of vmov
Change-Id: Idb6248c1429b55176bb3e9f4e8365ea0ed2be62a
2017-05-24 11:38:15 -07:00
Johann Koenig
de1a9c77a7 Merge changes Iaab2b9a1,Idfb458d3
* changes:
  sub pel avg variance neon: 4x block sizes
  sub pel variance neon: 4x block sizes
2017-05-24 18:33:53 +00:00
Johann Koenig
b11a37f540 Merge changes I31fa6ef8,I228c6f29
* changes:
  sub pel avg variance neon: add neon optimizations
  sub pel variance neon: normalize variable names
2017-05-24 18:32:02 +00:00
Alexandra Hájková
8bf6eaf433 ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}
Change-Id: I547d0099e15591655eae954e3ce65fdf3b003123
2017-05-24 13:27:09 +00:00
Linfeng Zhang
6444958f62 Update inv_txfm_sse2.h and inv_txfm_sse2.c
Extract shared code into inline functions.

Change-Id: Iee1e5a4bc6396aeed0d301163095c9b21aa66b2f
2017-05-23 14:54:46 -07:00
Johann
f6fcd3410d sub pel avg variance neon: 4x block sizes
BUG=webm:1423

Change-Id: Iaab2b9a183fdb54aae5f717aba95d90dc36a9e3b
2017-05-22 14:40:05 -07:00
Johann
188d58eaa9 sub pel variance neon: 4x block sizes
Add optimizations for blocks of width 4

BUG=webm:1423

Change-Id: Idfb458d36db3014d48fbfbe7f5462aa6eb249938
2017-05-22 14:40:01 -07:00
Johann
9b0d306a2f sub pel avg variance neon: add neon optimizations
These are missing an optimized version of vpx_comp_avg_pred

BUG=webm:1423

Change-Id: I31fa6ef842e98f7ff3ea079ffed51ae33178e2ed
2017-05-22 13:58:43 -07:00
Johann
e0d294c3af sub pel variance neon: normalize variable names
match vpx_dsp/variance.c variable names

Change-Id: I228c6f296c183af147b079b7c8bcdf97bd09cf3a
2017-05-22 13:58:43 -07:00
Linfeng Zhang
27beada6d0 Merge "Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2" 2017-05-22 20:58:18 +00:00
Johann
67ac68e399 variance neon: assert overflow conditions
Change-Id: I12faca82d062eb33dc48dfeb39739b25112316cd
2017-05-22 11:25:06 -07:00
Linfeng Zhang
c167345ffb Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2
BUG=webm:1412

Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468
2017-05-22 11:24:21 -07:00
Johann
d217c87139 neon variance: special case 4x
The sub pixel variance uses a temp buffer which guarantees width ==
stride. Take advantage of this with the 4x and avoid the very costly
lane loads.

Change-Id: Ia0c97eb8c29dc8dfa6e51a29dff9b75b3c6726f1
2017-05-22 10:51:31 -07:00
Johann Koenig
e7cac13016 Merge changes Ib8dd96f7,Ie9854b77
* changes:
  neon variance: process 4x blocks
  use memcpy for unaligned neon stores
2017-05-22 17:48:33 +00:00
Johann Koenig
b5055002d7 Merge "neon 4 byte helper functions" 2017-05-19 17:11:30 +00:00
Johann Koenig
3c603eadb4 Merge "neon fdct: 4x4 implementation" 2017-05-19 17:08:58 +00:00
Johann
7b742da63e neon variance: process 4x blocks
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.

BUG=webm:1422

Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
2017-05-17 17:35:01 -07:00
Johann
2057d3ef75 use memcpy for unaligned neon stores
Advise the compiler that the store is eventually going to a uint8_t
buffer. This helps avoid getting alignment hints which would cause the
memory access to fail.

Originally added as a workaround for clang:
https://bugs.llvm.org//show_bug.cgi?id=24421

Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e
2017-05-17 12:11:31 -07:00
Johann
105503b839 neon fdct: 4x4 implementation
Approximately twice as fast as C implementation.

BUG=webm:1424

Change-Id: I3c0307fb08ddc23df42545cd089a78e2ed5c9d3f
2017-05-17 07:38:18 -07:00
Linfeng Zhang
18e8baa5c0 Add transpose_32bit_4x4() and rename transpose_4x4() for vpx_dsp/x86
Change-Id: Ib57377f6cf6573c04720d3cc5dea4285362b4220
2017-05-16 17:46:37 -07:00
Johann Koenig
2300e16675 Revert "Add visibility="protected" attribute for global variables referenced in asm files."
This reverts commit 0d88e15454.

Reason for revert: chromium builds are failing to locate vpx_rv during dlopen()

dlopen failed: cannot locate symbol "vpx_rv" referenced by "libstandalonelibwebviewchromium.so"

Original change's description:
> Add visibility="protected" attribute for global variables referenced in asm files.
>
> During aosp builds with binutils-2.27, we're seeing linker error
> messages of this form:
> libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
> symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
> object
>
> subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
> Other messages refer to symbol references from deblock_sse2.o and
> subpixel_sse2.o, also assembled from asm files.
>
> This change marks such symbols as having "protected" visibility. This
> satisfies the linker as the symbols are not preemptible from outside
> the shared library now, which I think is the original intent anyway.
>
> Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
>

TBR=jzern@google.com,johannkoenig@google.com,rahulchaudhry@chromium.org,builds@webmproject.org

Change-Id: I0c2ea375aa7ef5fda15b9d9e23e654bb315c941b
2017-05-16 15:54:33 -07:00
Johann
7498fe2e54 neon 4 byte helper functions
When data is guaranteed to be aligned, use helper functions which
assert that requirement.

Change-Id: Ic4b188593aea0799d5bd8eda64f9858a1592a2a3
2017-05-15 13:42:31 -07:00
Johann
1088b4f87c move neon load/stores to a new file
Move the tran_low_t helper functions to a new file. Additional
load/store functions will be added here.

Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3
2017-05-15 08:29:43 -07:00
Alexandra Hájková
bcbc3929ae ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx
Change-Id: Ic9639b1331d8c5cbc207c2a036891ff0137fc56f
2017-05-13 13:13:15 +00:00
Rahul Chaudhry
0d88e15454 Add visibility="protected" attribute for global variables referenced in asm files.
During aosp builds with binutils-2.27, we're seeing linker error
messages of this form:
libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
object

subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
Other messages refer to symbol references from deblock_sse2.o and
subpixel_sse2.o, also assembled from asm files.

This change marks such symbols as having "protected" visibility. This
satisfies the linker as the symbols are not preemptible from outside
the shared library now, which I think is the original intent anyway.

Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
2017-05-12 11:11:16 -07:00
James Zern
ac8f58f6ab Merge changes I1b54a7a5,I3028bdad,I59788cd9
* changes:
  ppc: Add get_mb_ss_vsx
  ppc: Add get4x4sse_cs_vsx
  ppc: Add comp_avg_pred_vsx
2017-05-12 15:24:59 +00:00
Luca Barbato
143b21e362 ppc: Add get_mb_ss_vsx
Change-Id: I1b54a7a5bb642e4b836d786ea1ae506eed025e3f
2017-05-12 17:23:00 +02:00
Luca Barbato
6d225eb5f9 ppc: Add get4x4sse_cs_vsx
Change-Id: I3028bdadf653665d18e781d28e9625f62804b3d8
2017-05-12 17:23:00 +02:00
Luca Barbato
a7f8bd451b ppc: Add comp_avg_pred_vsx
Change-Id: I59788cd98231e707239c2ad95ae54f67cfe24e10
2017-05-12 17:22:55 +02:00
Alexandra Hájková
f48532e271 ppc: Add vpx_sad64x32/64_vsx
Change-Id: I84e3705fa52f75cb91b2bab4abf5cc77585ee3e2
2017-05-12 16:10:16 +02:00
Alexandra Hájková
0b15bf1e54 ppc Add vpx_sad32x16/32/64_vsx
Change-Id: I3c4f9d595275669580413a71b3c3c810e7ddcacd
2017-05-12 16:10:11 +02:00
James Zern
a12ea1d5e9 Merge "ppc: Add vpx_sad16x8/16/32_vsx" 2017-05-12 13:33:51 +00:00
Alexandra Hájková
cc7f0c0f3e ppc: Add vpx_sad16x8/16/32_vsx
Change-Id: I60619d28fffd9809f93b1af510a50e1aa02519a9
2017-05-10 19:57:30 +00:00
Linfeng Zhang
764b3b8090 Update specializations of idct functions
Introduced append situation in Commit 0178d97 which could be
confusing. Clean a little bit and add some comments.

Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2
2017-05-10 12:51:18 -07:00
Johann Koenig
d713ec3c46 Merge changes I92eb4312,Ibb2afe4e
* changes:
  subpel variance neon: add mixed sizes
  sub pixel variance neon: use generic variance
2017-05-10 18:19:52 +00:00
Linfeng Zhang
f532504864 Clean 32x32 idct C code
Change-Id: I73b8104a9e7a70ffe827c1b7ff43618f24f5d7bd
2017-05-09 11:05:51 -07:00
Linfeng Zhang
ecd1eb2162 Update 4x4 idct sse2 functions
It's a bit faster to call idct4_sse2() in vpx_idct4x4_16_add_sse2()

Change-Id: I1513be7a895cd2fc190f4a8297c240b17de0f876
2017-05-08 16:16:52 -07:00
Johann
f7d1486f48 neon variance: process 16 values at a time
Read in a Q register. Works on blocks of 16 and larger.

Improvement of about 20% for 64x64. The smaller blocks are faster, but
don't have quite the same level of improvement. 16x32 is only about 5%

BUG=webm:1422

Change-Id: Ie11a877c7b839e66690a48117a46657b2ac82d4b
2017-05-08 18:48:55 +00:00
Johann Koenig
1814463864 Merge changes Id602909a,Ib0e85608
* changes:
  neon variance: process two rows of 8 at a time
  neon variance: add small missing sizes
2017-05-08 17:34:20 +00:00
Linfeng Zhang
2c3a2ad6f1 Merge changes I0cfe4117,I3581d80d,Ida62c941
* changes:
  Split dsp/x86/inv_txfm_sse2.c
  Update highbd idct functions arguments to use uint16_t dst
  Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
2017-05-08 16:15:57 +00:00
Johann
2346a6da4a subpel variance neon: add mixed sizes
Add support for everything except block sizes of 4.

Performance is better but numbers will improve again when the variance
optimizations land.

BUG=webm:1423

Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
2017-05-04 15:30:01 -07:00
Johann
19e1ec8359 sub pixel variance neon: use generic variance
When a neon version is available it will be called. This allows
decoupling the variance implementations and has no real downside. For
most configurations, the call will be #define'd to the neon
implementation.

Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712
2017-05-04 15:30:01 -07:00
Johann
462e29703c fdct 8x8 neon: minor comment cleanup
Simplify HBD/non distinction in test.

Document why transpose_neon.h is not used

Change-Id: I17659414206ddbb8c2f1ef0d9f4a17f1745d5a52
2017-05-04 15:14:23 -07:00
Johann
d6a7489dd5 neon variance: process two rows of 8 at a time
When the width is equal to 8, process two rows at a time. This doubles
the speed of 8x4 and improves 8x8 by about 20%.

8x16 was using this technique already, but still improved a little bit
with the rewrite.

Also use this for vpx_get8x8var_neon

BUG=webm:1422

Change-Id: Id602909afcec683665536d11298b7387ac0a1207
2017-05-04 08:59:46 -07:00
Johann
cb9133c72f neon variance: add small missing sizes
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.

When comparing the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. The 8x16 is very
close, but the existing version appears to be faster.

BUG=webm:1422

Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
2017-05-04 08:59:42 -07:00
Linfeng Zhang
2231669a83 Split dsp/x86/inv_txfm_sse2.c
Spin out highbd idct functions.

BUG=webm:1412

Change-Id: I0cfe4117c00039b6778c59c022eee79ad089a2af
2017-05-03 15:43:02 -07:00
Linfeng Zhang
d5de63d2be Update highbd idct functions arguments to use uint16_t dst
BUG=webm:1388

Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03 13:59:16 -07:00
Linfeng Zhang
081b39f2b7 Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
BUG=webm:1388

Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
2017-05-03 13:58:31 -07:00
Yi Luo
a3452996a1 High bit depth inter prediction horizontal/vertical filters AVX2
User level speed improvement on i7-6700, cpu-used=1,
  x86_64 Linux, bitrate, 1080p, 8Mbps, 4K, 16Mbps:
- Decoder:
  1080p: ~4%
  4K: ~5%
- Encoder:
  1080p: ~1%
  4K: ~3%

Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
2017-05-03 12:18:01 -07:00
Linfeng Zhang
a10a5cb356 Merge changes I8bb660de,Ica51d780,I6037525d
* changes:
  Clean specializes of idct functions
  Clean add_protos of highbd idct functions
  Clean add_protos of idct functions
2017-05-03 19:17:55 +00:00
Luca Barbato
e2ad89092d ppc: Add convolve8_vsx and convolve8_avg_vsx
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
2017-05-02 20:27:47 -07:00
Luca Barbato
e6ca81ee67 ppc: Add convolve8_avg_vert_vsx
Only the generic one again, speedups for 8x8 and larger blocks to
come later.

Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
2017-05-02 20:27:42 -07:00
Luca Barbato
a65f1771ad ppc: Add convolve8_vert
Only the generic one again, speedups for 8x8 and larger blocks
to come later.

Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
2017-05-02 20:27:33 -07:00
Luca Barbato
77772350f3 ppc: Add convolve8_horiz_avg
The 8x8 and larger blocks cases can be sped up further.

Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
2017-05-02 20:27:28 -07:00
Luca Barbato
08edb85bd0 ppc: Add convolve8_horiz
The 8x8 and larger blocks cases can be sped up further.

Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
2017-05-02 20:27:16 -07:00
Linfeng Zhang
0178d974e5 Clean specializes of idct functions
Change-Id: I8bb660de47b5f97263ec381dc428db96e9c9a4b2
2017-05-02 18:01:19 -07:00
Linfeng Zhang
4412996d59 Clean add_protos of highbd idct functions
Change-Id: Ica51d780b92b316ce9112740c56cdf7670816371
2017-05-02 17:59:38 -07:00
Linfeng Zhang
a7a57d9756 Clean add_protos of idct functions
Change-Id: I6037525d92ec172810edab720389eb1865ed3b1a
2017-05-02 17:58:40 -07:00
Luca Barbato
d51d3934f5 ppc: Add convolve_avg
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 15:47:25 +02:00
Luca Barbato
63860ba7b8 ppc: Add convolve_copy
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-29 15:47:25 +02:00
Linfeng Zhang
51dc998f3a Update highbd convolve functions arguments to use uint16_t src/dst
BUG=webm:1388

Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
Luca Barbato
914b160fb5 ppc: h predictor 8x8
Slightly faster with the current compiler.

Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47
2017-04-19 19:57:51 -07:00
Luca Barbato
0b9be93205 ppc: d63 predictor 8x8
10x faster.

Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8
2017-04-19 19:57:51 -07:00
Luca Barbato
ee9325b0bd ppc: tm predictor 4x4
Slightly faster.

Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191
2017-04-19 19:57:51 -07:00
Luca Barbato
2904eb5800 ppc: h predictor 4x4
2x faster.

Change-Id: I0583dec353299c6797401b646099f18db4e0420d
2017-04-19 19:57:51 -07:00
Luca Barbato
58245d7050 ppc: dc predictor 8x8
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.

Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
2017-04-19 19:57:51 -07:00
Luca Barbato
6b4a65e8b1 ppc: d45 predictor 8x8
11x faster.

Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798
2017-04-19 19:57:51 -07:00
Luca Barbato
92e33c7b31 ppc: d63 predictor 32x32
About 10x faster.

Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
2017-04-19 19:57:51 -07:00
Luca Barbato
a5469a00a8 ppc: d63 predictor 16x16
About 18x faster.

Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4
2017-04-19 19:57:51 -07:00
Luca Barbato
cc868da526 ppc: d45 predictor 32x32
About 12x faster.

Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed
2017-04-19 19:57:51 -07:00
Luca Barbato
7a7dc9e624 ppc: d45 predictor 16x16
About 16x faster.

Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9
2017-04-19 19:57:51 -07:00
Luca Barbato
c08baa2900 ppc: dc predictor 32x32
10x and 5x faster.

Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf
2017-04-19 19:57:47 -07:00
Luca Barbato
22ca468c7c ppc: dc top and left predictor 32x32
6x faster.

Change-Id: I717995b4056e5579c68191d11b495372971fe1ae
2017-04-19 19:49:31 -07:00
Luca Barbato
ad9dea1f6d ppc: dc top and left predictor 16x16
13x faster.

Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6
2017-04-19 19:49:31 -07:00
Luca Barbato
d68d37872c ppc: dc_128 predictor 32x32
6x faster.

Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b
2017-04-19 19:49:31 -07:00
Luca Barbato
f9d20e6df2 ppc: dc_128 predictor 16x16
20x faster.

Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709
2017-04-19 19:49:31 -07:00
Luca Barbato
0d9417de4a ppc: tm predictor 32x32
About 8x faster.

Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf
2017-04-19 19:49:26 -07:00
James Zern
a81f037f15 Merge changes I1f5a3752,I95123051,I3bb724e0,Ie81077fa,Ic80f3c05, ...
* changes:
  ppc: tm predictor 16x16
  ppc: tm predictor 8x8
  ppc: horizontal predictor 32x32
  ppc: horizontal predictor 16x16
  ppc: vertical intrapred 16x16 and 32x32
  configure: Workaround clang not enabling altivec on -mvsx
  configure: Match power*64* as ppc64
2017-04-20 02:45:45 +00:00
Linfeng Zhang
bf8a49abbd Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is casted to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.

BUG=webm:1388

Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
2017-04-19 12:13:49 -07:00
Luca Barbato
479443a570 ppc: tm predictor 16x16
About 10x faster.

Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06
2017-04-19 01:48:10 +02:00
Luca Barbato
c8f5a55df4 ppc: tm predictor 8x8
About 5x faster.

Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a
2017-04-19 01:48:09 +02:00
Luca Barbato
7b0e12934e ppc: horizontal predictor 32x32
About 5x faster.

Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8
2017-04-19 01:48:09 +02:00
Luca Barbato
a7a2d1653b ppc: horizontal predictor 16x16
About 10x faster.

Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2
2017-04-19 01:48:09 +02:00
Luca Barbato
7ad1faa6f8 ppc: vertical intrapred 16x16 and 32x32
Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab
2017-04-19 01:48:09 +02:00
Johann
9fa24f03b5 re-enable vpx_comp_avg_pred_sse2
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"

Also re-enable asserts on the C version.

BUG=webm:1390

Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
2017-04-17 08:40:43 -07:00
Johann
069b772915 sad avg: align intermediate buffer
comp_avg_pred has started declaring a requirement for aligned buffers.

BUG=webm:1390

Change-Id: Idaf6667498ea343e8d49b32bc9d8b9d0aa43ef5c
2017-04-17 14:26:33 +00:00
James Zern
4ba20da8b1 Merge "Add AVX2 optimization to copy/avg functions" 2017-04-15 00:26:08 +00:00
Yi Luo
aa5a941992 Add AVX2 optimization to copy/avg functions
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
2017-04-14 16:50:10 -07:00
Johann
eaa7cdf05d Disable vpx_comp_avg_pred_sse2
Failures on windows:
unknown file: error: SEH exception with code 0xc0000005 thrown in the
test body.

Alignment check errors on linux:
test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void
vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t
*, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed.

BUG=webm:1390

Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea
2017-04-14 08:43:06 -07:00