Linfeng Zhang
764b3b8090
Update specializations of idct functions
...
Commit 0178d97 introduced an append situation which could be confusing.
Clean up a little and add some comments.
Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2
2017-05-10 12:51:18 -07:00
Johann Koenig
d713ec3c46
Merge changes I92eb4312,Ibb2afe4e
...
* changes:
subpel variance neon: add mixed sizes
sub pixel variance neon: use generic variance
2017-05-10 18:19:52 +00:00
Linfeng Zhang
f532504864
Clean 32x32 idct C code
...
Change-Id: I73b8104a9e7a70ffe827c1b7ff43618f24f5d7bd
2017-05-09 11:05:51 -07:00
Linfeng Zhang
ecd1eb2162
Update 4x4 idct sse2 functions
...
It's a bit faster to call idct4_sse2() in vpx_idct4x4_16_add_sse2()
Change-Id: I1513be7a895cd2fc190f4a8297c240b17de0f876
2017-05-08 16:16:52 -07:00
Johann
f7d1486f48
neon variance: process 16 values at a time
...
Read in a Q register. Works on blocks of 16 and larger.
Improvement of about 20% for 64x64. The smaller blocks are faster, but
don't have quite the same level of improvement. 16x32 is only about 5%.
BUG=webm:1422
Change-Id: Ie11a877c7b839e66690a48117a46657b2ac82d4b
2017-05-08 18:48:55 +00:00
Johann Koenig
1814463864
Merge changes Id602909a,Ib0e85608
...
* changes:
neon variance: process two rows of 8 at a time
neon variance: add small missing sizes
2017-05-08 17:34:20 +00:00
Linfeng Zhang
2c3a2ad6f1
Merge changes I0cfe4117,I3581d80d,Ida62c941
...
* changes:
Split dsp/x86/inv_txfm_sse2.c
Update highbd idct functions arguments to use uint16_t dst
Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
2017-05-08 16:15:57 +00:00
Johann
2346a6da4a
subpel variance neon: add mixed sizes
...
Add support for everything except block sizes of 4.
Performance is better but numbers will improve again when the variance
optimizations land.
BUG=webm:1423
Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
2017-05-04 15:30:01 -07:00
Johann
19e1ec8359
sub pixel variance neon: use generic variance
...
When a neon version is available it will be called. This allows
decoupling the variance implementations and has no real downside. For
most configurations, the call will be #define'd to the neon
implementation.
Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712
2017-05-04 15:30:01 -07:00
Johann
462e29703c
fdct 8x8 neon: minor comment cleanup
...
Simplify HBD/non distinction in test.
Document why transpose_neon.h is not used
Change-Id: I17659414206ddbb8c2f1ef0d9f4a17f1745d5a52
2017-05-04 15:14:23 -07:00
Johann
d6a7489dd5
neon variance: process two rows of 8 at a time
...
When the width is equal to 8, process two rows at a time. This doubles
the speed of 8x4 and improves 8x8 by about 20%.
8x16 was using this technique already, but still improved a little bit
with the rewrite.
Also use this for vpx_get8x8var_neon
BUG=webm:1422
Change-Id: Id602909afcec683665536d11298b7387ac0a1207
2017-05-04 08:59:46 -07:00
Johann
cb9133c72f
neon variance: add small missing sizes
...
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.
When comparing the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. The 8x16 is very
close, but the existing version appears to be faster.
BUG=webm:1422
Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
2017-05-04 08:59:42 -07:00
Linfeng Zhang
2231669a83
Split dsp/x86/inv_txfm_sse2.c
...
Spin out highbd idct functions.
BUG=webm:1412
Change-Id: I0cfe4117c00039b6778c59c022eee79ad089a2af
2017-05-03 15:43:02 -07:00
Linfeng Zhang
d5de63d2be
Update highbd idct functions arguments to use uint16_t dst
...
BUG=webm:1388
Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03 13:59:16 -07:00
Linfeng Zhang
081b39f2b7
Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
...
BUG=webm:1388
Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
2017-05-03 13:58:31 -07:00
Yi Luo
a3452996a1
High bit depth inter prediction horizontal/vertical filters AVX2
...
User level speed improvement on i7-6700, cpu-used=1,
x86_64 Linux, bitrate, 1080p, 8Mbps, 4K, 16Mbps:
- Decoder:
1080p: ~4%
4K: ~5%
- Encoder:
1080p: ~1%
4K: ~3%
Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
2017-05-03 12:18:01 -07:00
Linfeng Zhang
a10a5cb356
Merge changes I8bb660de,Ica51d780,I6037525d
...
* changes:
Clean specializes of idct functions
Clean add_protos of highbd idct functions
Clean add_protos of idct functions
2017-05-03 19:17:55 +00:00
Luca Barbato
e2ad89092d
ppc: Add convolve8_vsx and convolve8_avg_vsx
...
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
2017-05-02 20:27:47 -07:00
Luca Barbato
e6ca81ee67
ppc: Add convolve8_avg_vert_vsx
...
Only the generic one again, speedups for 8x8 and larger blocks to
come later.
Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
2017-05-02 20:27:42 -07:00
Luca Barbato
a65f1771ad
ppc: Add convolve8_vert
...
Only the generic one again, speedups for 8x8 and larger blocks
to come later.
Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
2017-05-02 20:27:33 -07:00
Luca Barbato
77772350f3
ppc: Add convolve8_horiz_avg
...
The 8x8 and larger blocks cases can be sped up further.
Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
2017-05-02 20:27:28 -07:00
Luca Barbato
08edb85bd0
ppc: Add convolve8_horiz
...
The 8x8 and larger blocks cases can be sped up further.
Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
2017-05-02 20:27:16 -07:00
Linfeng Zhang
0178d974e5
Clean specializes of idct functions
...
Change-Id: I8bb660de47b5f97263ec381dc428db96e9c9a4b2
2017-05-02 18:01:19 -07:00
Linfeng Zhang
4412996d59
Clean add_protos of highbd idct functions
...
Change-Id: Ica51d780b92b316ce9112740c56cdf7670816371
2017-05-02 17:59:38 -07:00
Linfeng Zhang
a7a57d9756
Clean add_protos of idct functions
...
Change-Id: I6037525d92ec172810edab720389eb1865ed3b1a
2017-05-02 17:58:40 -07:00
Luca Barbato
d51d3934f5
ppc: Add convolve_avg
...
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 15:47:25 +02:00
Luca Barbato
63860ba7b8
ppc: Add convolve_copy
...
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-29 15:47:25 +02:00
Linfeng Zhang
51dc998f3a
Update highbd convolve functions arguments to use uint16_t src/dst
...
BUG=webm:1388
Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
Luca Barbato
914b160fb5
ppc: h predictor 8x8
...
Slightly faster with the current compiler.
Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47
2017-04-19 19:57:51 -07:00
Luca Barbato
0b9be93205
ppc: d63 predictor 8x8
...
10x faster.
Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8
2017-04-19 19:57:51 -07:00
Luca Barbato
ee9325b0bd
ppc: tm predictor 4x4
...
Slightly faster.
Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191
2017-04-19 19:57:51 -07:00
Luca Barbato
2904eb5800
ppc: h predictor 4x4
...
2x faster.
Change-Id: I0583dec353299c6797401b646099f18db4e0420d
2017-04-19 19:57:51 -07:00
Luca Barbato
58245d7050
ppc: dc predictor 8x8
...
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.
Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
2017-04-19 19:57:51 -07:00
Luca Barbato
6b4a65e8b1
ppc: d45 predictor 8x8
...
11x faster.
Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798
2017-04-19 19:57:51 -07:00
Luca Barbato
92e33c7b31
ppc: d63 predictor 32x32
...
About 10x faster.
Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
2017-04-19 19:57:51 -07:00
Luca Barbato
a5469a00a8
ppc: d63 predictor 16x16
...
About 18x faster.
Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4
2017-04-19 19:57:51 -07:00
Luca Barbato
cc868da526
ppc: d45 predictor 32x32
...
About 12x faster.
Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed
2017-04-19 19:57:51 -07:00
Luca Barbato
7a7dc9e624
ppc: d45 predictor 16x16
...
About 16x faster.
Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9
2017-04-19 19:57:51 -07:00
Luca Barbato
c08baa2900
ppc: dc predictor 32x32
...
10x and 5x faster.
Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf
2017-04-19 19:57:47 -07:00
Luca Barbato
22ca468c7c
ppc: dc top and left predictor 32x32
...
6x faster.
Change-Id: I717995b4056e5579c68191d11b495372971fe1ae
2017-04-19 19:49:31 -07:00
Luca Barbato
ad9dea1f6d
ppc: dc top and left predictor 16x16
...
13x faster.
Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6
2017-04-19 19:49:31 -07:00
Luca Barbato
d68d37872c
ppc: dc_128 predictor 32x32
...
6x faster.
Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b
2017-04-19 19:49:31 -07:00
Luca Barbato
f9d20e6df2
ppc: dc_128 predictor 16x16
...
20x faster.
Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709
2017-04-19 19:49:31 -07:00
Luca Barbato
0d9417de4a
ppc: tm predictor 32x32
...
About 8x faster.
Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf
2017-04-19 19:49:26 -07:00
James Zern
a81f037f15
Merge changes I1f5a3752,I95123051,I3bb724e0,Ie81077fa,Ic80f3c05, ...
...
* changes:
ppc: tm predictor 16x16
ppc: tm predictor 8x8
ppc: horizontal predictor 32x32
ppc: horizontal predictor 16x16
ppc: vertical intrapred 16x16 and 32x32
configure: Workaround clang not enabling altivec on -mvsx
configure: Match power*64* as ppc64
2017-04-20 02:45:45 +00:00
Linfeng Zhang
bf8a49abbd
Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
...
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
2017-04-19 12:13:49 -07:00
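The pointer rule described in the commit above can be sketched in C as follows. This is a minimal illustration, not the library's code: cast_to_shortptr(), cast_to_byteptr() and advance_samples() are hypothetical stand-ins for the CAST_TO_* macros, assumed here to be plain reinterpreting casts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CAST_TO_* macros named in the commit.
 * Assumption: they are plain reinterpreting casts with no address math. */
static uint16_t *cast_to_shortptr(uint8_t *p) { return (uint16_t *)(void *)p; }
static uint8_t *cast_to_byteptr(uint16_t *p) { return (uint8_t *)(void *)p; }

/* Advance a byte-typed handle to 16-bit samples by `offset` samples.
 * Casting to a short pointer first lets pointer arithmetic do the doubling. */
static uint8_t *advance_samples(uint8_t *p8, int offset) {
  return cast_to_byteptr(cast_to_shortptr(p8) + offset);
}
```

Moving a handle by 3 samples with advance_samples() moves the underlying byte address by 6, which is the doubling the commit message refers to.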
Luca Barbato
479443a570
ppc: tm predictor 16x16
...
About 10x faster.
Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06
2017-04-19 01:48:10 +02:00
Luca Barbato
c8f5a55df4
ppc: tm predictor 8x8
...
About 5x faster.
Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a
2017-04-19 01:48:09 +02:00
Luca Barbato
7b0e12934e
ppc: horizontal predictor 32x32
...
About 5x faster.
Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8
2017-04-19 01:48:09 +02:00
Luca Barbato
a7a2d1653b
ppc: horizontal predictor 16x16
...
About 10x faster.
Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2
2017-04-19 01:48:09 +02:00
Luca Barbato
7ad1faa6f8
ppc: vertical intrapred 16x16 and 32x32
...
Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab
2017-04-19 01:48:09 +02:00
Johann
9fa24f03b5
re-enable vpx_comp_avg_pred_sse2
...
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"
Also re-enable asserts on the C version.
BUG=webm:1390
Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
2017-04-17 08:40:43 -07:00
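The alignment requirement behind this commit can be sketched in C. The check mirrors the assertion quoted in the "Disable vpx_comp_avg_pred_sse2" entry; the function names and the rounded-average body are illustrative assumptions, not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when `p` sits on a 16-byte boundary: ((uintptr_t)p & 0xf) == 0. */
static int is_aligned16(const void *p) { return ((uintptr_t)p & 0xf) == 0; }

/* Round a pointer up to the next 16-byte boundary. The caller must have
 * over-allocated by at least 15 bytes. */
static uint8_t *align16_up(uint8_t *p) {
  return (uint8_t *)(((uintptr_t)p + 15) & ~(uintptr_t)15);
}

/* Scalar sketch of a rounded average of two predictions. The output buffer
 * is packed (stride == width) and must be 16-byte aligned. */
static void comp_avg_pred_sketch(uint8_t *comp_pred, const uint8_t *pred,
                                 int width, int height, const uint8_t *ref,
                                 int ref_stride) {
  int i, j;
  assert(is_aligned16(comp_pred));
  for (i = 0; i < height; ++i) {
    for (j = 0; j < width; ++j) {
      comp_pred[j] = (uint8_t)((pred[j] + ref[j] + 1) >> 1);
    }
    comp_pred += width;
    pred += width;
    ref += ref_stride;
  }
}
```

A stack buffer that is only 8-byte aligned would trip the assertion, which is why the callers and tests had to switch to explicitly aligned buffers.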
Johann
069b772915
sad avg: align intermediate buffer
...
comp_avg_pred has started declaring a requirement for aligned buffers.
BUG=webm:1390
Change-Id: Idaf6667498ea343e8d49b32bc9d8b9d0aa43ef5c
2017-04-17 14:26:33 +00:00
James Zern
4ba20da8b1
Merge "Add AVX2 optimization to copy/avg functions"
2017-04-15 00:26:08 +00:00
Yi Luo
aa5a941992
Add AVX2 optimization to copy/avg functions
...
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
2017-04-14 16:50:10 -07:00
Johann
eaa7cdf05d
Disable vpx_comp_avg_pred_sse2
...
Failures on windows:
unknown file: error: SEH exception with code 0xc0000005 thrown in the
test body.
Alignment check errors on linux:
test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void
vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t
*, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed.
BUG=webm:1390
Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea
2017-04-14 08:43:06 -07:00
Johann
28a8622143
vpx_comp_avg_pred: sse2 optimization
...
Provides over 15x speedup for width > 8.
Due to smaller loads and shifting for width == 8 it gets about 8x
speedup.
For width == 4 it's only about 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
2017-04-13 08:44:52 -07:00
James Zern
04e9456567
Merge changes from topic 'Wshorten'
...
* changes:
configure: enable -Wshorten-64-to-32 for hbd
vp9_encodeframe: resolve -Wshorten-64-to-32 in hbd
Resolve -Wshorten-64-to-32 in highbd variance.
2017-04-07 07:32:14 +00:00
James Zern
47b9a09120
Resolve -Wshorten-64-to-32 in highbd variance.
...
For 8-bit the subtrahend is small enough to fit into uint32_t.
This is the same that was done for:
c0241664a
Resolve -Wshorten-64-to-32 in variance.
For 10/12-bit apply:
63a37d16f
Prevent negative variance
Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
2017-04-05 17:34:02 -07:00
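The arithmetic behind the commit above can be sketched as follows, assuming the usual form variance = sse - sum^2 / n for a block of n pixels. The function names are hypothetical, and the clamp in the second function only illustrates what the "Prevent negative variance" title suggests.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit case: |sum| <= 255 * n, so sum^2 / n <= 255^2 * n. For a 64x64
 * block (n = 4096) that bound is 266,342,400, which fits in uint32_t even
 * though sum^2 itself needs 64 bits. */
static uint32_t variance_8bit(uint32_t sse, int sum, int n) {
  return sse - (uint32_t)(((int64_t)sum * sum) / n);
}

/* 10/12-bit case: keep the subtraction in 64 bits and clamp at zero, so an
 * inconsistent (sse, sum) pair cannot wrap to a huge unsigned value. */
static uint32_t variance_highbd(uint32_t sse, int64_t sum, int n) {
  const int64_t var = (int64_t)sse - (sum * sum) / n;
  return var >= 0 ? (uint32_t)var : 0;
}
```

Narrowing the subtrahend explicitly is what silences -Wshorten-64-to-32 in the 8-bit path without changing the result.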
Linfeng Zhang
6fc2e57c2c
Update 32x32 high bitdepth idct NEON optimization
...
Preparation of CONVERT_TO_BYTEPTR/SHORTPTR clean up.
BUG=webm:1388
Change-Id: I928d30a5698023bb90888d783cf81c51ec183760
2017-04-05 15:28:11 -07:00
James Zern
aefc1088a2
intrapred: sync highbd_d135_predictor w/d135_
...
previously:
05437805f
intrapred/d135: flatten border results before storing
BUG=webm:1316
Change-Id: I3b8bd89117ad7f2f4560b57f7c148da781e86f85
2017-03-24 20:45:44 -07:00
James Zern
67cde46dd7
intrapred: specialize highbd 4x4 predictors
...
d207/d63/d45/d117/d135/d153
~9-45% better depending on the predictor on 32-bit ARM, similar range on
x86-64
this matches the non-highbitdepth implementation
BUG=webm:1316
Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
2017-03-24 20:45:36 -07:00
James Zern
e05f4cf8f4
intrapred: rename d63f to d63e
...
this is consistent with he/ve/d45e
Change-Id: I75641ae5667430b0ecd370db86fff6e666cb577d
2017-03-24 20:41:39 -07:00
James Zern
d45617c702
remove CONFIG_MISC_FIXES
...
this belonged to vp10 with the changes now migrated to av1.
Change-Id: Ie30ead3e7b71f465bc14136e1b6f156ea978c43f
2017-03-24 20:41:39 -07:00
Kaustubh Raste
8ee9b855a0
Merge "Fix mips msa fwd xform mismatch"
2017-03-23 07:44:16 +00:00
James Zern
f16ea6a6eb
Merge "vp9_rdopt: correct size to vpx_sum_squares_2d_i16"
2017-03-23 00:53:22 +00:00
James Zern
e097bb1d39
Merge "idct_neon: prefix non-static functions w/'vpx_'"
2017-03-22 19:30:11 +00:00
James Zern
5661cd8ff4
vp9_rdopt: correct size to vpx_sum_squares_2d_i16
...
the current implementations expect pixel size, not the block type
BUG=webm:1392
Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
2017-03-22 12:04:33 -07:00
James Zern
f91c3bb3ab
idct_neon: prefix non-static functions w/'vpx_'
...
Change-Id: I94fcdeae18468e6ef0cb7119b8142d982a048031
2017-03-22 11:49:23 -07:00
Kaustubh Raste
e45c1f55b4
Fix mips msa fwd xform mismatch
...
Change-Id: I32a6df11463144aa1a562256ee7d57a41fd678d6
2017-03-22 14:01:03 +05:30
Yi Luo
cb9b277b2f
Merge "Make butterfly_self() signature consistent with butterfly()"
2017-03-21 22:32:20 +00:00
Yi Luo
266868a40b
Make butterfly_self() signature consistent with butterfly()
...
- Refer to patch 48fca113d, "inv_txfm_ssse3,butterfly: fix win32 abi
compatibility".
- Change four butterfly() calls to butterfly_self(), to simplify the
operations.
Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
2017-03-21 09:36:35 -07:00
James Zern
e0b4c4d1ae
Merge "Add vpx_highbd_idct32x32_1024_add_neon()"
2017-03-21 03:27:35 +00:00
James Zern
6d71d33d55
Merge "Add vpx_highbd_idct32x32_34_add_neon()"
2017-03-21 03:02:51 +00:00
James Zern
5da2e500d7
inv_txfm_sse2: clear conversion warning in hbd build
...
tran_high -> tran_low in return from dct_const_round_shift()
Change-Id: I2fe06c4b604823b1d1fe40a487017c3c2819a440
2017-03-17 01:16:38 -07:00
Linfeng Zhang
27530d484e
Add vpx_highbd_idct32x32_1024_add_neon()
...
BUG=webm:1301
Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
2017-03-17 00:27:46 -07:00
Linfeng Zhang
50b13f75b8
Add vpx_highbd_idct32x32_34_add_neon()
...
BUG=webm:1301
Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
2017-03-17 00:27:46 -07:00
James Zern
2882778310
Merge "Add vpx_highbd_idct32x32_135_add_neon()"
2017-03-17 07:26:52 +00:00
Linfeng Zhang
65e9fb65e8
Add vpx_highbd_idct32x32_135_add_neon()
...
BUG=webm:1301
Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
2017-03-16 22:37:55 -07:00
James Zern
68efc64b72
Merge "Clean vpx_idct32x32_1024_add_neon()"
2017-03-17 05:24:58 +00:00
Rafael de Lucena Valle
405b94c661
Add Hadamard for Power8
...
Change-Id: I3b4b043c1402b4100653ace4869847e030861b18
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-15 23:46:18 -03:00
Linfeng Zhang
e54231d613
Clean vpx_idct32x32_1024_add_neon()
...
Change-Id: I05921e16d6a3e4e7e5b00a90624735050a186636
2017-03-15 11:24:31 -07:00
Yi Luo
8440cc4817
Merge "Improve idct32x32_1024_add SSSE3 intrinsics performance"
2017-03-15 02:32:52 +00:00
Linfeng Zhang
c756eb01c8
Fix overflow issue in 32x32 idct NEON intrinsics
...
Similar issue as Change bc1c18e.
The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in final stage of pass
2, when changing the test number from 1,000 to 1,000,000.
Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon() and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.
Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
2017-03-14 16:59:14 -07:00
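The difference between wrapping and saturating 16-bit arithmetic, which is what the fix above relies on, can be shown with a scalar sketch. The helper names are hypothetical. On NEON, vqaddq_s16() and vqsubq_s16() are the vector saturating counterparts, but this sketch does not claim to mirror the commit's code.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating add: clamp to the int16_t range instead of wrapping. */
static int16_t sat_add16(int16_t a, int16_t b) {
  const int32_t s = (int32_t)a + b;
  if (s > INT16_MAX) return INT16_MAX;
  if (s < INT16_MIN) return INT16_MIN;
  return (int16_t)s;
}

/* Saturating subtract, same idea. */
static int16_t sat_sub16(int16_t a, int16_t b) {
  const int32_t d = (int32_t)a - b;
  if (d > INT16_MAX) return INT16_MAX;
  if (d < INT16_MIN) return INT16_MIN;
  return (int16_t)d;
}

/* Wrapping add, modelling a plain 16-bit vector add. The final conversion
 * is implementation-defined in C; mainstream compilers wrap modulo 2^16. */
static int16_t wrap_add16(int16_t a, int16_t b) {
  return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}
```

With inputs 30000 and 10000 the wrapping add flips sign while the saturating add pins the result at 32767, which keeps the error small instead of catastrophic.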
Yi Luo
fedcf83f33
Improve idct32x32_1024_add SSSE3 intrinsics performance
...
- Function level speed improves ~12%.
Change-Id: I9b7dbddabf08c7d0f6b25264e6074d5ccbe39290
2017-03-14 14:04:08 -07:00
Linfeng Zhang
b0bfcc368c
Merge "Add vpx_highbd_idct32x32_135_add_c()"
2017-03-13 18:49:01 +00:00
James Zern
48fca113d1
inv_txfm_ssse3,butterfly: fix win32 abi compatibility
...
only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.
since:
07c48ccfe
Improve idct32x32_34_add SSSE3 intrinsics performance
BUG=webm:1384
Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
2017-03-10 19:57:17 -08:00
Yi Luo
327add990f
Improve idct32x32_135_add SSSE3 intrinsics performance
...
- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.
Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
2017-03-09 16:17:54 -08:00
Linfeng Zhang
77311e0dff
Update vpx_idct32x32_1024_add_neon()
...
Most changes are cosmetic.
Speed is unchanged with clang 3.8 and about 5% faster with gcc 4.8.4.
Tried the strategy used in 8x8 and 16x16 (whose order of operations is
similar to the C code); though speed gets better with gcc, it's worse
with clang.
Tried to remove store_in_output(), but speed gets worse.
Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
2017-03-08 12:39:04 -08:00
Linfeng Zhang
48f5886605
Add vpx_highbd_idct32x32_135_add_c()
...
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.
BUG=webm:1301
Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-03-08 10:46:33 -08:00
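The eob-based selection this commit describes can be sketched as a small dispatcher. The enum and function names are hypothetical. The 135 threshold comes from the commit message. The thresholds 1 and 34 are assumptions inferred from the function-name suffixes elsewhere in this log (_1_, _34_, _1024_).

```c
#include <assert.h>

/* Which 32x32 inverse transform variant to run. Names are illustrative. */
typedef enum { IDCT32_1, IDCT32_34, IDCT32_135, IDCT32_1024 } Idct32Variant;

/* Pick the cheapest variant that still covers every nonzero coefficient.
 * eob is the end-of-block position, i.e. the count of coefficients up to
 * and including the last nonzero one in scan order. */
static Idct32Variant pick_idct32x32(int eob) {
  if (eob == 1) return IDCT32_1;      /* assumed: DC only */
  if (eob <= 34) return IDCT32_34;    /* assumed threshold */
  if (eob <= 135) return IDCT32_135;  /* threshold stated in the commit */
  return IDCT32_1024;                 /* full transform */
}
```

Adding the 135 tier means a block with, say, 100 coefficients no longer pays for the full 1024-coefficient transform.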
Linfeng Zhang
c4e5c54d69
cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()
...
No speed changes and disassembly is almost identical.
Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f
2017-03-08 08:58:32 -08:00
Linfeng Zhang
3cf5c213f1
cosmetics,dsp/arm/: rename a variable
...
Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.
Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a
2017-03-08 08:55:41 -08:00
Yi Luo
07c48ccfe0
Improve idct32x32_34_add SSSE3 intrinsics performance
...
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.
Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
2017-03-01 11:14:48 -08:00
James Zern
47d6f16a04
get_prob(): rationalize int types
...
promote the unsigned int calculation to uint64_t rather than int64_t for
type consistency
Change-Id: Ic34dee1dc707d9faf6a3ae250bfe39b60bef3438
2017-02-24 15:36:52 -08:00
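A sketch of the type choice described above, assuming a probability helper that scales num/den to 8 bits with rounding. The function name and the clamp to [1, 255] are illustrative assumptions, not a copy of get_prob().

```c
#include <assert.h>
#include <stdint.h>

/* Scale num/den to an 8-bit probability with rounding.
 * num * 256 can exceed 32 bits, so the product is widened first. Using
 * uint64_t (not int64_t) keeps every operand unsigned, so no signed and
 * unsigned values are mixed in the expression. */
static uint8_t get_prob_sketch(unsigned int num, unsigned int den) {
  uint64_t p;
  assert(den != 0);
  p = ((uint64_t)num * 256 + (den >> 1)) / den;
  if (p < 1) return 1;     /* assumed lower clamp */
  if (p > 255) return 255; /* assumed upper clamp */
  return (uint8_t)p;
}
```

For example, num = 1 and den = 2 gives (256 + 1) / 2 = 128.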
Jerome Jiang
b1dcaf7f1e
Merge "Fix segmentation fault caused by denoiser working with spatial SVC."
2017-02-22 04:44:55 +00:00
Yi Luo
6036a0d24f
Following SSSE3 intrinsics functions also work for HBD
...
- vpx_idct8x8_12_add_ssse3
vpx_idct8x8_64_add_ssse3
vpx_idct32x32_34_add_ssse3
vpx_idct32x32_135_add_ssse3
vpx_idct32x32_1024_add_ssse3
- turn on unit tests.
Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7
2017-02-21 12:37:53 -08:00
Jerome Jiang
0d1e5a21c4
Fix segmentation fault caused by denoiser working with spatial SVC.
...
Re-enable the affected test.
BUG=webm:1374
Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
2017-02-21 09:38:28 -08:00
Yi Luo
1f8e8e5bf1
Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests
...
- In SSSE3 optimization, 16-bit addition and subtraction would
overflow when input coefficient is 16-bit signed extreme values.
- Function-level speed becomes slower (unit ms):
idct8x8_64: 284 -> 294
idct8x8_12: 145 -> 158.
BUG=webm:1332
Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
2017-02-17 14:05:05 -08:00
James Zern
3e7025022e
Merge "Add vpx_highbd_idct16x16_10_add_neon()"
2017-02-17 20:29:37 +00:00
Yi Luo
f62dcc9c33
Replace idct32x32_1024_add_ssse3 assembly with intrinsics
...
- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on
i7-6700, no obvious user-level speed performance downgrade.
- Passed unit tests.
Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
2017-02-16 16:10:40 -08:00
Johann Koenig
a9b81da575
Merge "block error avx2: use tran_low_t"
2017-02-16 23:51:14 +00:00
Linfeng Zhang
0620081731
Add vpx_highbd_idct16x16_10_add_neon()
...
BUG=webm:1301
Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
2017-02-16 15:13:50 -08:00
James Zern
0f014c97e5
Merge "Fix mips vpx_post_proc_down_and_across_mb_row_msa function"
2017-02-16 23:02:10 +00:00
Johann Koenig
06a82af0de
Merge "correct bitdepth_conversion_sse2.h header guard"
2017-02-16 21:41:28 +00:00
Johann
6c2d732bf4
correct bitdepth_conversion_sse2.h header guard
...
Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519
2017-02-16 12:43:33 -08:00
Yi Luo
1cb44945fb
Merge "Add idct32x32_135_add SSSE3 intrinsics"
2017-02-16 20:43:29 +00:00
Johann
2104454607
block error avx2: use tran_low_t
...
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
2017-02-16 12:39:02 -08:00
Yi Luo
72a43e2378
Add idct32x32_135_add SSSE3 intrinsics
...
- Replace the corresponding assembly code.
- No user-level speed degradation.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
2017-02-16 11:29:34 -08:00
Johann
4682130b60
quantize_fp highbd ssse3: use tran_low_t for coeff
...
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
2017-02-16 07:40:56 -08:00
Johann
44600442dc
bitdepth conversion: really use num elements
...
The previous implementation confused bit/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
2017-02-16 15:02:48 +00:00
Kaustubh Raste
fddf66b741
Fix mips vpx_post_proc_down_and_across_mb_row_msa function
...
Added fix to handle non-multiple of 16 cols case for size 16
Change-Id: If3a6d772d112077c5e0a9be9e612e1148f04338c
2017-02-16 13:17:00 +05:30
Johann Koenig
b63e88e506
Merge "Use 'packssdw' for loading tran_low_t values"
2017-02-16 02:41:00 +00:00
Linfeng Zhang
106c342659
cosmetics,dsp/inv_txfm.c: reorder functions
...
Change-Id: Ie0f7689ebe230c68eadb22a32b14838c1a7543a6
2017-02-15 11:40:35 -08:00
Linfeng Zhang
81914ce68a
Add vpx_highbd_idct16x16_38_add_neon()
...
BUG=webm:1301
Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe
2017-02-15 09:12:02 -08:00
Linfeng Zhang
e07e74fb0f
Add vpx_highbd_idct16x16_38_add_c()
...
When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
2017-02-14 17:25:52 -08:00
Johann
327a02d77e
Use 'packssdw' for loading tran_low_t values
...
This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.
Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
2017-02-14 22:39:49 +00:00
Linfeng Zhang
429e652809
Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts
...
Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6
2017-02-14 13:08:41 -08:00
clang-format
4b402746ca
apply clang-format
...
Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce
2017-02-14 12:45:52 -08:00
Yi Luo
c1a90dc160
Merge "Replace idct32x32_34_add_ssse3 assembly with intrinsics"
2017-02-14 20:13:27 +00:00
Yi Luo
bd86de1ac8
Replace idct32x32_34_add_ssse3 assembly with intrinsics
...
- No user-level speed performance change.
- Pass unit tests.
Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6
2017-02-14 10:38:36 -08:00
Linfeng Zhang
de9ae32b93
Merge "Add vpx_highbd_idct16x16_256_add_neon()"
2017-02-14 01:15:34 +00:00
Linfeng Zhang
5ad4159ebb
Add vpx_highbd_idct16x16_256_add_neon()
...
BUG=webm:1301
Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
2017-02-13 15:50:33 -08:00
Johann
5ecde212a8
fdct8x8 highbd neon: use tran_low_t for output
...
Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba
2017-02-13 22:16:14 +00:00
Linfeng Zhang
016933ad48
Add vpx_highbd_idct{16x16,32x32}_1_add_neon()
...
and update vpx_highbd_idct8x8_1_add_neon()
BUG=webm:1301
Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4
2017-02-13 10:25:22 -08:00
James Zern
91f87e7513
Merge "Add vpx_idct16x16_38_add_neon()"
2017-02-11 03:42:36 +00:00
Linfeng Zhang
bc1c18e18c
Add vpx_idct16x16_38_add_neon()
...
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
2017-02-08 12:15:22 -08:00
Yi Luo
ac04d11abc
Replace idct8x8_12_add_ssse3 assembly code with intrinsics
...
- Performance matches the assembly.
- Unit tests pass.
Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd
2017-02-08 10:07:45 -08:00
Linfeng Zhang
cf76ee2cb7
Add vpx_idct16x16_38_add_c()
...
When eob is less than or equal to 38 for 16x16 idct, call this function.
Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087
2017-02-07 09:40:51 -08:00
Linfeng Zhang
66695533a8
Merge "Update 16x16 8-bit idct NEON intrinsics"
2017-02-07 16:52:40 +00:00
Johann
641fda79bb
highbd x86: consolidate tran_low_t conversions
...
Create new helper files specifically for converting tran_low_t types.
Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905
2017-02-06 10:43:26 -08:00
Jingning Han
bb40844e32
Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"
2017-02-02 22:18:32 +00:00
Kaustubh Raste
5b10674b5c
Merge "Add mips msa sum_squares_2d_i16 function"
2017-02-02 08:09:21 +00:00
Johann Koenig
726556dde9
Merge "Remove neon assembly for idct 16x16 and 8x8"
2017-02-02 03:25:31 +00:00
Johann Koenig
ce6318f254
Merge changes I43521ad3,I013659f6
...
* changes:
satd highbd neon: use tran_low_t for coeff
satd highbd sse2: use tran_low_t for coeff
2017-02-02 03:03:58 +00:00
Linfeng Zhang
e4985cf619
Update 16x16 8-bit idct NEON intrinsics
...
Remove redundant memory accesses.
Change-Id: I8049074bdba5f49eab7e735b2b377423a69cd4c8
2017-02-01 17:04:33 -08:00
Jingning Han
8f95389742
Add SSSE3 intrinsic 8x8 inverse 2D-DCT
...
The intrinsic version reduces the average cycles from 183 to 175.
Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03
2017-02-01 14:47:53 -08:00
Johann Koenig
dc90501ba3
Merge changes I374dfc08,I7e15192e,Ica414007
...
* changes:
hadamard highbd ssse3: use tran_low_t for coeff
hadamard highbd neon: use tran_low_t for coeff
hadamard highbd sse2: use tran_low_t for coeff
2017-02-01 21:56:36 +00:00
Johann Koenig
f60171bb4f
Merge "deblock: annotate postproc parameters"
2017-02-01 19:57:29 +00:00
Johann
f8d744d91a
satd highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84
2017-02-01 11:55:47 -08:00
Johann
2ba383474d
satd highbd sse2: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb
2017-02-01 11:55:16 -08:00
Johann
0f751ecee3
hadamard highbd ssse3: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I374dfc08732932382043905f128e928b08cb4f57
2017-02-01 11:51:15 -08:00
Johann
1eb8a718bf
hadamard highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I7e15192ead3a3631755b386f102c979f06e26279
2017-02-01 11:50:46 -08:00
Johann
2dac808dd1
hadamard highbd sse2: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: Ica414007d8412ceebfffa9e58e8416226a3fe934
2017-02-01 11:46:57 -08:00
Johann Koenig
3bda634576
Merge "quantize ssse3: remove unused pxor"
2017-02-01 19:41:41 +00:00
Jingning Han
969957f9f2
Fix real-time compression regression in hbd mode
...
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.
The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.
The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.
BUG=webm:1365
Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
2017-01-31 23:17:09 -08:00
Johann
32f68cc58c
deblock: annotate postproc parameters
...
Clears a clang static analyzer warning where 'cols' is assumed to be
less than 0, preventing the for loop from executing.
The assembly already requires that the size be 8 or 16 (U/V or Y plane)
and cols is a multiple of 8.
Change-Id: Ica4612690ead1638c94cfe56b306e87f8ce644f9
2017-01-31 15:58:57 -08:00
Kaustubh Raste
750e753134
Add mips msa sum_squares_2d_i16 function
...
average improvement ~4x-5x
Change-Id: I8d91b71d0677009be52b412e4f52b40b98573a53
2017-01-31 12:22:43 +00:00
Kaustubh Raste
df7e1fecc1
Add mips msa vpx_minmax_8x8 function
...
average improvement ~4x-5x
Change-Id: I83aee9977534fddb8a9b80d31af646c0b6b1a8c3
2017-01-31 10:00:43 +05:30
Johann
dcfff3ccc8
quantize ssse3: remove unused pxor
...
Change-Id: Ifa22d77fd530827de0b32ae71810dc2213ab2937
2017-01-30 17:02:57 -08:00
Kaustubh Raste
4ce20fb3f4
Add mips msa vpx_vector_var function
...
average improvement ~4x-5x
Change-Id: I2f63ef83d816052ca8dc42421e7e9d42f7a7af6b
2017-01-28 08:53:20 +00:00
Kaustubh Raste
407fad2356
Add mips msa vpx Integer projection row/col functions
...
average improvement ~4x-5x
Change-Id: I17c41383250282b39f5ecae0197ef1df7de20801
2017-01-27 11:11:42 +05:30
Kaustubh Raste
182ea677a0
Add mips msa vpx satd function
...
average improvement ~4x-5x
Change-Id: If8683d636fe2606d4ca1038e28185bca53bbe244
2017-01-24 10:44:22 +05:30
Johann
13234d3c43
Remove neon assembly for idct 16x16 and 8x8
...
Tested using test/partial_idct_test.cc:DISABLED_Speed
Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements
using the intrinsics:
<function> <clang asm> <gcc asm> <clang intrin> <gcc intrin>
idct16x16_256 1720ms 1703ms 1546ms 1554ms
idct16x16_10 1320ms 1247ms 518ms 488ms
idct16x16_1 107ms 108ms 64ms 68ms
idct8x8_64 924ms 931ms 866ms 989ms
idct8x8_12 826ms 824ms 519ms 514ms
idct8x8_1 172ms 166ms 110ms 125ms
idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics),
but as a counterexample idct16x16_10 goes from ~1300ms to ~500ms.
On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly
stable.
BUG=webm:1303
Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4
2017-01-19 12:27:31 -08:00
Kaustubh Raste
e0c0e65378
Add mips msa vpx hadamard functions
...
average improvement ~4x-5x
Change-Id: I167132d894c04fa85dda8dde7906ff9c61b3a65d
2017-01-19 14:44:03 +05:30
Jingning Han
b6fe63a505
Merge "Rework 8x8 transpose SSSE3 for avg computation"
2017-01-13 18:25:17 +00:00
Jingning Han
553e9e291f
Merge "Rework 8x8 transpose SSSE3 for inverse 2D-DCT"
2017-01-13 18:25:09 +00:00
Jingning Han
39fff1bea0
Rework 8x8 transpose SSSE3 for avg computation
...
Use same transpose process as inv_txfm_sse2 does.
Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c
2017-01-12 15:16:07 -08:00
Jingning Han
f65170ea84
Rework 8x8 transpose SSSE3 for inverse 2D-DCT
...
Use same transpose process as inv_txfm_sse2 does.
Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94
2017-01-12 15:13:18 -08:00
Johann Koenig
9f27d1f843
Merge "arm idct16x16: remove extra config guards"
2017-01-11 20:22:27 +00:00
Johann
68d0f46ec0
arm idct16x16: remove extra config guards
...
This file is guarded by HAVE_NEON_ASM in the .mk file now.
Change-Id: I513a621c234aa90ad52e426c8ed494d8a7d4b74a
2017-01-11 10:17:14 -08:00
Jingning Han
9a780fa7db
Rework forward 8x8 2D-DCT ssse3 implementation
...
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.
Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
2017-01-10 12:50:55 -08:00
James Zern
9480da21e8
Merge "Refine 8-bit 16x16 idct NEON intrinsics"
2017-01-09 23:52:29 +00:00
Johann Koenig
371a64bfe7
Merge "postproc: vpx_mbpost_proc_down_neon"
2017-01-09 19:53:15 +00:00
Johann Koenig
8a7847c2c9
Merge "Fix mips dspr2 idct32x32 functions for large coefficient input"
2017-01-09 19:47:47 +00:00
Johann Koenig
bf168b24f5
Merge "Fix mips dspr2 idct16x16 functions for large coefficient input"
2017-01-09 19:47:00 +00:00
Johann Koenig
08d0a7fd0f
Merge "Fix mips dspr2 idct8x8 functions for large coefficient input"
2017-01-09 19:46:18 +00:00
Johann Koenig
ab20869221
Merge "Fix mips dspr2 idct4x4 functions for large coefficient input"
2017-01-09 19:45:54 +00:00
Johann
c23970ec25
postproc: vpx_mbpost_proc_down_neon
...
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
2017-01-09 10:21:56 -08:00
Johann Koenig
9af97fb630
Merge "postproc: vpx_mbpost_proc_across_ip_neon"
2017-01-09 18:17:26 +00:00
Kaustubh Raste
50dd3eb62c
Fix mips dspr2 idct32x32 functions for large coefficient input
...
Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2
2017-01-09 17:21:09 +05:30
Kaustubh Raste
c06991fce6
Fix mips dspr2 idct16x16 functions for large coefficient input
...
Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a
2017-01-09 16:35:28 +05:30
Kaustubh Raste
24d804f79c
Fix mips dspr2 idct8x8 functions for large coefficient input
...
Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61
2017-01-09 16:22:19 +05:30
Kaustubh Raste
afd2d797eb
Fix mips dspr2 idct4x4 functions for large coefficient input
...
Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89
2017-01-09 15:28:30 +05:30
Linfeng Zhang
6abdd31555
Refine 8-bit 16x16 idct NEON intrinsics
...
Speed test shows a 25% gain on vpx_idct16x16_256_add_neon(),
and the speed of vpx_idct16x16_10_add_neon() tripled.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
2017-01-06 17:52:07 -08:00
Johann
4dca923454
postproc: vpx_mbpost_proc_across_ip_neon
...
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
2017-01-06 16:39:17 -08:00
Linfeng Zhang
2d12a52ff0
Merge "Add high bitdepth 8x8 idct NEON intrinsics"
2017-01-06 16:47:23 +00:00
Linfeng Zhang
911bb980b1
Clean DC only idct NEON intrinsics
...
BUG=webm:1301
Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5
2016-12-28 13:51:44 -08:00
Linfeng Zhang
9b187954df
Add high bitdepth 8x8 idct NEON intrinsics
...
BUG=webm:1301
Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-27 16:28:53 -08:00
Linfeng Zhang
6d5a3fe583
Clean idct 8x8 neon functions
...
BUG=webm:1301
Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307
2016-12-21 14:24:17 -08:00
James Zern
a68b36c752
vpx_idct32x32_1024_add_neon: quiet uninitialized warning
...
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c
enable vpx_idct32x32_1024_add_neon in hbd builds
+ give the variable a more descriptive name
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
2016-12-19 12:49:44 -08:00
Linfeng Zhang
7e23f895ca
Merge "Clean hbd idct 4x4 neon functions and other"
2016-12-19 17:09:26 +00:00
Johann
41b0888a84
postproc: neon down and across macroblock filter
...
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
2016-12-14 15:11:28 -08:00
Linfeng Zhang
c8f25fa5c0
Clean hbd idct 4x4 neon functions and other
...
BUG=webm:1301
Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-14 11:38:28 -08:00
James Zern
86e340c76e
enable vpx_idct32x32_1024_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de
2016-12-12 19:28:35 -08:00
Linfeng Zhang
5d4aa325a6
Cosmetics by unifying dest_stride to stride in idct
...
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
Johann
2c24f7178d
Move load_and_transpose to transpose_neon.h
...
Allows for use outside the idcts without pulling in idct_neon.h
Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
James Zern
6defef4ab2
idct16x16_add_neon: fix arm visual studio builds
...
after:
2d3d95f
enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
2016-12-08 15:17:57 -08:00
Linfeng Zhang
174528de1e
Merge "Update idct NEON optimization to not use narrowing saturating shift"
2016-12-07 21:03:21 +00:00
James Zern
f16a0a1aa4
Merge "enable vpx_idct16x16_256_add_neon in hbd builds"
2016-12-07 20:26:44 +00:00
Linfeng Zhang
018a2adcb1
Update idct NEON optimization to not use narrowing saturating shift
...
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
James Zern
2d3d95f7ac
enable vpx_idct16x16_256_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea
Merge changes Ibad079f2,I7858a0a1
...
* changes:
enable vpx_idct16x16_10_add_neon in hbd builds
idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089
enable vpx_idct16x16_10_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
James Zern
af9d7aa9fb
idct16x16,NEON: rm output_stride from pass1 fns
...
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases, meaning the results are stored
contiguously; this allows the number of stores to be reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
2016-12-06 15:13:33 -08:00
Linfeng Zhang
cb339d628f
Refine 8-bit 8x8 idct NEON intrinsics
...
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Linfeng Zhang
a8eee97b43
Check in vpx_lpf_vertical_4_dual_neon() assembly
...
This replaces its C version.
Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a
2016-12-02 15:54:30 -08:00
James Zern
a7fa1314da
Merge changes I4afc130e,Iaa64d23f
...
* changes:
Add high bitdepth 4x4 idct NEON intrinsics
Update idct x86 intrinsics to not use saturated add and sub
2016-12-02 04:01:28 +00:00
Linfeng Zhang
17a8cf5cc3
Add high bitdepth 4x4 idct NEON intrinsics
...
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
Linfeng Zhang
264f6e70ec
Update idct x86 intrinsics to not use saturated add and sub
...
Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4
2016-11-29 17:06:08 -08:00
James Zern
c6641782c3
idct16x16,NEON,cosmetics: normalize fn signatures
...
+ remove unused parameters from vpx_idct16x16_10_add_neon_pass2
Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4
2016-11-28 16:46:01 -08:00