Johann Koenig
e7cac13016
Merge changes Ib8dd96f7,Ie9854b77
...
* changes:
neon variance: process 4x blocks
use memcpy for unaligned neon stores
2017-05-22 17:48:33 +00:00
Johann
7b742da63e
neon variance: process 4x blocks
...
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.
BUG=webm:1422
Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
2017-05-17 17:35:01 -07:00
Johann
105503b839
neon fdct: 4x4 implementation
...
Approximately twice as fast as C implementation.
BUG=webm:1424
Change-Id: I3c0307fb08ddc23df42545cd089a78e2ed5c9d3f
2017-05-17 07:38:18 -07:00
Alexandra Hájková
bcbc3929ae
ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx
...
Change-Id: Ic9639b1331d8c5cbc207c2a036891ff0137fc56f
2017-05-13 13:13:15 +00:00
James Zern
ac8f58f6ab
Merge changes I1b54a7a5,I3028bdad,I59788cd9
...
* changes:
ppc: Add get_mb_ss_vsx
ppc: Add get4x4sse_cs_vsx
ppc: Add comp_avg_pred_vsx
2017-05-12 15:24:59 +00:00
Luca Barbato
143b21e362
ppc: Add get_mb_ss_vsx
...
Change-Id: I1b54a7a5bb642e4b836d786ea1ae506eed025e3f
2017-05-12 17:23:00 +02:00
Luca Barbato
6d225eb5f9
ppc: Add get4x4sse_cs_vsx
...
Change-Id: I3028bdadf653665d18e781d28e9625f62804b3d8
2017-05-12 17:23:00 +02:00
Luca Barbato
a7f8bd451b
ppc: Add comp_avg_pred_vsx
...
Change-Id: I59788cd98231e707239c2ad95ae54f67cfe24e10
2017-05-12 17:22:55 +02:00
Alexandra Hájková
f48532e271
ppc: Add vpx_sad64x32/64_vsx
...
Change-Id: I84e3705fa52f75cb91b2bab4abf5cc77585ee3e2
2017-05-12 16:10:16 +02:00
Alexandra Hájková
0b15bf1e54
ppc Add vpx_sad32x16/32/64_vsx
...
Change-Id: I3c4f9d595275669580413a71b3c3c810e7ddcacd
2017-05-12 16:10:11 +02:00
James Zern
a12ea1d5e9
Merge "ppc: Add vpx_sad16x8/16/32_vsx"
2017-05-12 13:33:51 +00:00
Alexandra Hájková
cc7f0c0f3e
ppc: Add vpx_sad16x8/16/32_vsx
...
Change-Id: I60619d28fffd9809f93b1af510a50e1aa02519a9
2017-05-10 19:57:30 +00:00
Linfeng Zhang
764b3b8090
Update specializations of idct functions
...
Introduced append situation in Commit 0178d97
which could be
confusing. Clean a little bit and add some comments.
Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2
2017-05-10 12:51:18 -07:00
Johann Koenig
d713ec3c46
Merge changes I92eb4312,Ibb2afe4e
...
* changes:
subpel variance neon: add mixed sizes
sub pixel variance neon: use generic variance
2017-05-10 18:19:52 +00:00
Johann Koenig
1814463864
Merge changes Id602909a,Ib0e85608
...
* changes:
neon variance: process two rows of 8 at a time
neon variance: add small missing sizes
2017-05-08 17:34:20 +00:00
Linfeng Zhang
2c3a2ad6f1
Merge changes I0cfe4117,I3581d80d,Ida62c941
...
* changes:
Split dsp/x86/inv_txfm_sse2.c
Update highbd idct functions arguments to use uint16_t dst
Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
2017-05-08 16:15:57 +00:00
Johann
2346a6da4a
subpel variance neon: add mixed sizes
...
Add support for everything except block sizes of 4.
Performance is better but numbers will improve again when the variance
optimizations land.
BUG=webm:1423
Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
2017-05-04 15:30:01 -07:00
Johann
cb9133c72f
neon variance: add small missing sizes
...
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.
When comparing the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. The 8x16 is very
close, but the existing version appears to be faster.
BUG=webm:1422
Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
2017-05-04 08:59:42 -07:00
Linfeng Zhang
d5de63d2be
Update highbd idct functions arguments to use uint16_t dst
...
BUG=webm:1388
Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03 13:59:16 -07:00
Yi Luo
a3452996a1
High bit depth inter prediction horizontal/vertical filters AVX2
...
User level speed improvement on i7-6700, cpu-used=1,
x86_64 Linux, bitrate, 1080p, 8Mbps, 4K, 16Mbps:
- Decoder:
1080p: ~4%
4K: ~5%
- Encoder:
1080p: ~1%
4K: ~3%
Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
2017-05-03 12:18:01 -07:00
Linfeng Zhang
a10a5cb356
Merge changes I8bb660de,Ica51d780,I6037525d
...
* changes:
Clean specializes of idct functions
Clean add_protos of highbd idct functions
Clean add_protos of idct functions
2017-05-03 19:17:55 +00:00
Luca Barbato
e2ad89092d
ppc: Add convolve8_vsx and convolve8_avg_vsx
...
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
2017-05-02 20:27:47 -07:00
Luca Barbato
e6ca81ee67
ppc: Add convolve8_avg_vert_vsx
...
Only the generic one again, speedups for 8x8 and larger blocks to
come later.
Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
2017-05-02 20:27:42 -07:00
Luca Barbato
a65f1771ad
ppc: Add convolve8_vert
...
Only the generic one again, speedups for 8x8 and larger blocks
to come later.
Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
2017-05-02 20:27:33 -07:00
Luca Barbato
77772350f3
ppc: Add convolve8_horiz_avg
...
The 8x8 and larger blocks cases can be sped up further.
Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
2017-05-02 20:27:28 -07:00
Luca Barbato
08edb85bd0
ppc: Add convolve8_horiz
...
The 8x8 and larger blocks cases can be sped up further.
Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
2017-05-02 20:27:16 -07:00
Linfeng Zhang
0178d974e5
Clean specializes of idct functions
...
Change-Id: I8bb660de47b5f97263ec381dc428db96e9c9a4b2
2017-05-02 18:01:19 -07:00
Linfeng Zhang
4412996d59
Clean add_protos of highbd idct functions
...
Change-Id: Ica51d780b92b316ce9112740c56cdf7670816371
2017-05-02 17:59:38 -07:00
Linfeng Zhang
a7a57d9756
Clean add_protos of idct functions
...
Change-Id: I6037525d92ec172810edab720389eb1865ed3b1a
2017-05-02 17:58:40 -07:00
Luca Barbato
d51d3934f5
ppc: Add convolve_avg
...
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 15:47:25 +02:00
Luca Barbato
63860ba7b8
ppc: Add convolve_copy
...
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-29 15:47:25 +02:00
Linfeng Zhang
51dc998f3a
Update highbd convolve functions arguments to use uint16_t src/dst
...
BUG=webm:1388
Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
Luca Barbato
914b160fb5
ppc: h predictor 8x8
...
Slightly faster with the current compiler.
Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47
2017-04-19 19:57:51 -07:00
Luca Barbato
0b9be93205
ppc: d63 predictor 8x8
...
10x faster.
Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8
2017-04-19 19:57:51 -07:00
Luca Barbato
ee9325b0bd
ppc: tm predictor 4x4
...
Slightly faster.
Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191
2017-04-19 19:57:51 -07:00
Luca Barbato
2904eb5800
ppc: h predictor 4x4
...
2x faster.
Change-Id: I0583dec353299c6797401b646099f18db4e0420d
2017-04-19 19:57:51 -07:00
Luca Barbato
58245d7050
ppc: dc predictor 8x8
...
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.
Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
2017-04-19 19:57:51 -07:00
Luca Barbato
6b4a65e8b1
ppc: d45 predictor 8x8
...
11x faster.
Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798
2017-04-19 19:57:51 -07:00
Luca Barbato
92e33c7b31
ppc: d63 predictor 32x32
...
About 10x faster.
Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
2017-04-19 19:57:51 -07:00
Luca Barbato
a5469a00a8
ppc: d63 predictor 16x16
...
About 18x faster.
Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4
2017-04-19 19:57:51 -07:00
Luca Barbato
cc868da526
ppc: d45 predictor 32x32
...
About 12x faster.
Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed
2017-04-19 19:57:51 -07:00
Luca Barbato
7a7dc9e624
ppc: d45 predictor 16x16
...
About 16x faster.
Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9
2017-04-19 19:57:51 -07:00
Luca Barbato
c08baa2900
ppc: dc predictor 32x32
...
10x and 5x faster.
Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf
2017-04-19 19:57:47 -07:00
Luca Barbato
22ca468c7c
ppc: dc top and left predictor 32x32
...
6x faster.
Change-Id: I717995b4056e5579c68191d11b495372971fe1ae
2017-04-19 19:49:31 -07:00
Luca Barbato
ad9dea1f6d
ppc: dc top and left predictor 16x16
...
13x faster.
Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6
2017-04-19 19:49:31 -07:00
Luca Barbato
d68d37872c
ppc: dc_128 predictor 32x32
...
6x faster.
Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b
2017-04-19 19:49:31 -07:00
Luca Barbato
f9d20e6df2
ppc: dc_128 predictor 16x16
...
20x faster.
Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709
2017-04-19 19:49:31 -07:00
Luca Barbato
0d9417de4a
ppc: tm predictor 32x32
...
About 8x faster.
Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf
2017-04-19 19:49:26 -07:00
Luca Barbato
479443a570
ppc: tm predictor 16x16
...
About 10x faster.
Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06
2017-04-19 01:48:10 +02:00
Luca Barbato
c8f5a55df4
ppc: tm predictor 8x8
...
About 5x faster.
Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a
2017-04-19 01:48:09 +02:00
Luca Barbato
7b0e12934e
ppc: horizontal predictor 32x32
...
About 5x faster.
Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8
2017-04-19 01:48:09 +02:00
Luca Barbato
a7a2d1653b
ppc: horizontal predictor 16x16
...
About 10x faster.
Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2
2017-04-19 01:48:09 +02:00
Luca Barbato
7ad1faa6f8
ppc: vertical intrapred 16x16 and 32x32
...
Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab
2017-04-19 01:48:09 +02:00
Johann
9fa24f03b5
re-enable vpx_comp_avg_pred_sse2
...
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"
Also re-enable asserts on the C version.
BUG=webm:1390
Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
2017-04-17 08:40:43 -07:00
James Zern
4ba20da8b1
Merge "Add AVX2 optimization to copy/avg functions"
2017-04-15 00:26:08 +00:00
Yi Luo
aa5a941992
Add AVX2 optimization to copy/avg functions
...
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
2017-04-14 16:50:10 -07:00
Johann
eaa7cdf05d
Disable vpx_comp_avg_pred_sse2
...
Failures on windows:
unknown file: error: SEH exception with code 0xc0000005 thrown in the
test body.
Alignment check errors on linux:
test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void
vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t
*, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed.
BUG=webm:1390
Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea
2017-04-14 08:43:06 -07:00
Johann
28a8622143
vpx_comp_avg_pred: sse2 optimization
...
Provides over 15x speedup for width > 8.
Due to smaller loads and shifting for width == 8 it gets about 8x
speedup.
For width == 4 it's only about 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
2017-04-13 08:44:52 -07:00
James Zern
e05f4cf8f4
intrapred: rename d63f to d63e
...
this is consistent with he/ve/d45e
Change-Id: I75641ae5667430b0ecd370db86fff6e666cb577d
2017-03-24 20:41:39 -07:00
James Zern
d45617c702
remove CONFIG_MISC_FIXES
...
this belonged to vp10 with the changes now migrated to av1.
Change-Id: Ie30ead3e7b71f465bc14136e1b6f156ea978c43f
2017-03-24 20:41:39 -07:00
Linfeng Zhang
27530d484e
Add vpx_highbd_idct32x32_1024_add_neon()
...
BUG=webm:1301
Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
2017-03-17 00:27:46 -07:00
Linfeng Zhang
50b13f75b8
Add vpx_highbd_idct32x32_34_add_neon()
...
BUG=webm:1301
Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
2017-03-17 00:27:46 -07:00
James Zern
2882778310
Merge "Add vpx_highbd_idct32x32_135_add_neon()"
2017-03-17 07:26:52 +00:00
Linfeng Zhang
65e9fb65e8
Add vpx_highbd_idct32x32_135_add_neon()
...
BUG=webm:1301
Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
2017-03-16 22:37:55 -07:00
Rafael de Lucena Valle
405b94c661
Add Hadamard for Power8
...
Change-Id: I3b4b043c1402b4100653ace4869847e030861b18
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-15 23:46:18 -03:00
Linfeng Zhang
48f5886605
Add vpx_highbd_idct32x32_135_add_c()
...
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.
BUG=webm:1301
Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-03-08 10:46:33 -08:00
Yi Luo
6036a0d24f
Following SSSE3 intrinsics functions also work for HBD
...
- vpx_idct8x8_12_add_ssse3
vpx_idct8x8_64_add_ssse3
vpx_idct32x32_34_add_ssse3
vpx_idct32x32_135_add_ssse3
vpx_idct32x32_1024_add_ssse3
- turn on unit tests.
Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7
2017-02-21 12:37:53 -08:00
James Zern
3e7025022e
Merge "Add vpx_highbd_idct16x16_10_add_neon()"
2017-02-17 20:29:37 +00:00
Yi Luo
f62dcc9c33
Replace idct32x32_1024_add_ssse3 assembly with intrinsics
...
- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on
i7-6700, no obvious user-level speed performance downgrade.
- Passed unit tests.
Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
2017-02-16 16:10:40 -08:00
Linfeng Zhang
0620081731
Add vpx_highbd_idct16x16_10_add_neon()
...
BUG=webm:1301
Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
2017-02-16 15:13:50 -08:00
Yi Luo
72a43e2378
Add idct32x32_135_add SSSE3 intrinsics
...
- Replace the corresponding assembly code.
- No user level speed performance degrade.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
2017-02-16 11:29:34 -08:00
Linfeng Zhang
81914ce68a
Add vpx_highbd_idct16x16_38_add_neon()
...
BUG=webm:1301
Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe
2017-02-15 09:12:02 -08:00
Linfeng Zhang
e07e74fb0f
Add vpx_highbd_idct16x16_38_add_c()
...
When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
2017-02-14 17:25:52 -08:00
Yi Luo
c1a90dc160
Merge "Replace idct32x32_34_add_ssse3 assembly with intrinsics"
2017-02-14 20:13:27 +00:00
Yi Luo
bd86de1ac8
Replace idct32x32_34_add_ssse3 assembly with intrinsics
...
- No user-level speed performance change.
- Pass unit tests.
Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6
2017-02-14 10:38:36 -08:00
Linfeng Zhang
de9ae32b93
Merge "Add vpx_highbd_idct16x16_256_add_neon()"
2017-02-14 01:15:34 +00:00
Linfeng Zhang
5ad4159ebb
Add vpx_highbd_idct16x16_256_add_neon()
...
BUG=webm:1301
Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
2017-02-13 15:50:33 -08:00
Johann
5ecde212a8
fdct8x8 highbd neon: use tran_low_t for output
...
Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba
2017-02-13 22:16:14 +00:00
Linfeng Zhang
016933ad48
Add vpx_highbd_idct{16x16,32x32}_1_add_neon()
...
and update vpx_highbd_idct8x8_1_add_neon()
BUG=webm:1301
Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4
2017-02-13 10:25:22 -08:00
James Zern
91f87e7513
Merge "Add vpx_idct16x16_38_add_neon()"
2017-02-11 03:42:36 +00:00
Linfeng Zhang
bc1c18e18c
Add vpx_idct16x16_38_add_neon()
...
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
2017-02-08 12:15:22 -08:00
Yi Luo
ac04d11abc
Replace idct8x8_12_add_ssse3 assembly code with intrinsics
...
- Performance achieves the same as assembly.
- Unit tests pass.
Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd
2017-02-08 10:07:45 -08:00
Linfeng Zhang
cf76ee2cb7
Add vpx_idct16x16_38_add_c()
...
When eob is less than or equal to 38 for 16x16 idct, call this function.
Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087
2017-02-07 09:40:51 -08:00
Jingning Han
bb40844e32
Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT"
2017-02-02 22:18:32 +00:00
Kaustubh Raste
5b10674b5c
Merge "Add mips msa sum_squares_2d_i16 function"
2017-02-02 08:09:21 +00:00
Johann Koenig
ce6318f254
Merge changes I43521ad3,I013659f6
...
* changes:
satd highbd neon: use tran_low_t for coeff
satd highbd sse2: use tran_low_t for coeff
2017-02-02 03:03:58 +00:00
Jingning Han
8f95389742
Add SSSE3 intrinsic 8x8 inverse 2D-DCT
...
The intrinsic version reduces the average cycles from 183 to 175.
Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03
2017-02-01 14:47:53 -08:00
Johann
f8d744d91a
satd highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84
2017-02-01 11:55:47 -08:00
Johann
2ba383474d
satd highbd sse2: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb
2017-02-01 11:55:16 -08:00
Johann
0f751ecee3
hadamard highbd ssse3: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I374dfc08732932382043905f128e928b08cb4f57
2017-02-01 11:51:15 -08:00
Johann
1eb8a718bf
hadamard highbd neon: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: I7e15192ead3a3631755b386f102c979f06e26279
2017-02-01 11:50:46 -08:00
Johann
2dac808dd1
hadamard highbd sse2: use tran_low_t for coeff
...
BUG=webm:1365
Change-Id: Ica414007d8412ceebfffa9e58e8416226a3fe934
2017-02-01 11:46:57 -08:00
Jingning Han
969957f9f2
Fix real-time compression regression in hbd mode
...
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.
The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.
The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.
BUG=webm:1365
Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
2017-01-31 23:17:09 -08:00
Kaustubh Raste
750e753134
Add mips msa sum_squares_2d_i16 function
...
average improvement ~4x-5x
Change-Id: I8d91b71d0677009be52b412e4f52b40b98573a53
2017-01-31 12:22:43 +00:00
Kaustubh Raste
df7e1fecc1
Add mips msa vpx_minmax_8x8 function
...
average improvement ~4x-5x
Change-Id: I83aee9977534fddb8a9b80d31af646c0b6b1a8c3
2017-01-31 10:00:43 +05:30
Kaustubh Raste
4ce20fb3f4
Add mips msa vpx_vector_var function
...
average improvement ~4x-5x
Change-Id: I2f63ef83d816052ca8dc42421e7e9d42f7a7af6b
2017-01-28 08:53:20 +00:00
Kaustubh Raste
407fad2356
Add mips msa vpx Integer projection row/col functions
...
average improvement ~4x-5x
Change-Id: I17c41383250282b39f5ecae0197ef1df7de20801
2017-01-27 11:11:42 +05:30
Kaustubh Raste
182ea677a0
Add mips msa vpx satd function
...
average improvement ~4x-5x
Change-Id: If8683d636fe2606d4ca1038e28185bca53bbe244
2017-01-24 10:44:22 +05:30
Kaustubh Raste
e0c0e65378
Add mips msa vpx hadamard functions
...
average improvement ~4x-5x
Change-Id: I167132d894c04fa85dda8dde7906ff9c61b3a65d
2017-01-19 14:44:03 +05:30
Johann
c23970ec25
postproc: vpx_mbpost_proc_down_neon
...
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
2017-01-09 10:21:56 -08:00
Johann Koenig
9af97fb630
Merge "postproc: vpx_mbpost_proc_across_ip_neon"
2017-01-09 18:17:26 +00:00
Johann
4dca923454
postproc: vpx_mbpost_proc_across_ip_neon
...
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
2017-01-06 16:39:17 -08:00
Linfeng Zhang
9b187954df
Add high bitdepth 8x8 idct NEON intrinsics
...
BUG=webm:1301
Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-27 16:28:53 -08:00
Johann
41b0888a84
postproc: neon down and across macroblock filter
...
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
2016-12-14 15:11:28 -08:00
James Zern
86e340c76e
enable vpx_idct32x32_1024_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de
2016-12-12 19:28:35 -08:00
Linfeng Zhang
5d4aa325a6
Cosmetics by unifying dest_stride to stride in idct
...
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
James Zern
f16a0a1aa4
Merge "enable vpx_idct16x16_256_add_neon in hbd builds"
2016-12-07 20:26:44 +00:00
James Zern
2d3d95f7ac
enable vpx_idct16x16_256_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea
Merge changes Ibad079f2,I7858a0a1
...
* changes:
enable vpx_idct16x16_10_add_neon in hbd builds
idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089
enable vpx_idct16x16_10_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
Linfeng Zhang
17a8cf5cc3
Add high bitdepth 4x4 idct NEON intrinsics
...
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
James Zern
21a1abd8e3
enable vpx_idct32x32_135_add_neon in hbd builds
...
BUG=webm:1294
Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
Linfeng Zhang
05e2b5a59f
Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction"
2016-11-22 23:20:53 +00:00
Jerome Jiang
de5fd00ec5
Change *_xmm to *_sse2 in deblocker assembly functions.
...
Some cosmetic changes because xmm is an anachronism.
Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7
2016-11-17 23:38:04 +00:00
Linfeng Zhang
85c1ee434d
Add high bitdepth intra prediction NEON optimization (mode tm)
...
BUG=webm:1316
Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b
2016-11-15 14:19:46 -08:00
Linfeng Zhang
a3128ad33a
Add high bitdepth intra prediction NEON optimization (h and v)
...
BUG=webm:1316
Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46
2016-11-12 12:00:19 -08:00
James Zern
80f6b243a7
Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3
...
* changes:
fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
partial_idct_test,NEON: add missing idct variants
enable vpx_idct32x32_34_add_neon in hbd builds
2016-11-10 05:02:39 +00:00
Linfeng Zhang
40ab0424d4
Add high bitdepth intra prediction NEON optimization (mode d45 and d135)
...
BUG=webm:1316
Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e
2016-11-09 12:04:04 -08:00
James Zern
738c8f23c6
enable vpx_idct32x32_34_add_neon in hbd builds
...
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
2016-11-08 17:03:36 -08:00
Johann
50b40f114c
Optimize idct32x32_135_add for NEON
...
BUG=webm:1295
Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 22:06:07 +00:00
Linfeng Zhang
64a5a8fd6f
Merge "Add high bitdepth intra prediction NEON optimization (mode dc)"
2016-11-08 16:53:42 +00:00
Linfeng Zhang
d545c19afa
Rename vpx_highbd_idct8x8_10{*}() to vpx_highbd_idct8x8_12{*}()
...
Also update its trigger threshold from 10 to 12.
Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f
2016-11-07 09:07:55 -08:00
Linfeng Zhang
1868582e7d
Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction
...
Change-Id: I852616794244490123eb615ac750da50265f0fa5
2016-11-02 11:40:37 -07:00
Linfeng Zhang
3b74066b10
Add high bitdepth intra prediction NEON optimization (mode dc)
...
BUG=webm:1316
Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-11-01 17:07:36 -07:00
James Zern
3ae25974fd
idct,NEON: add a tran_low_t->s16 load adapter
...
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.
BUG=webm:1294
Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
Johann
9720b58aac
Optimize idct32x32_34_add for NEON
...
Approximately 3 times faster than the 1024 version which was used
previously.
BUG=webm:1295
Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-25 15:43:58 -07:00
Linfeng Zhang
9c8981c666
add vpx high bitdepth convolve8 NEON intrinsics optimization
...
BUG=webm:1299
Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-17 15:23:54 -07:00
Linfeng Zhang
f910d14a1a
add vpx_highbd_convolve_{copy,avg}_neon()
...
BUG=webm:1299
Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-13 15:21:14 -07:00
Linfeng Zhang
01454ec485
[vpx highbd lpf NEON 6/6] vertical 16
...
BUG=webm:1300
Change-Id: I29d0b482d66f05e278325ddebcf108fbf0b6e222
2016-10-11 22:59:19 -07:00
Linfeng Zhang
27479775c4
[vpx highbd lpf NEON 5/6] horizontal 16
...
BUG=webm:1300
Change-Id: I21da32d6cfb8a1a6f58bc9756d17f48f13a59a12
2016-10-11 22:59:19 -07:00
Linfeng Zhang
251cbfbec8
[vpx highbd lpf NEON 4/6] vertical 8
...
BUG=webm:1300
Change-Id: If06b12bc081bab60059b100414dd7018f83ac62d
2016-10-11 22:59:19 -07:00
Linfeng Zhang
96c7206ede
[vpx highbd lpf NEON 3/6] horizontal 8
...
BUG=webm:1300
Change-Id: Ica2379e294be60b7f80fcfcec110dca4c3b59d81
2016-10-12 00:48:31 +00:00
Linfeng Zhang
49aa9b1f12
[vpx highbd lpf NEON 2/6] vertical 4
...
BUG=webm:1300
Change-Id: Ia33a9f2d6c7e2e6b3497ad6f1a09439a85b33983
2016-10-06 14:22:26 -07:00
Linfeng Zhang
7aa27bd62f
[vpx highbd lpf NEON 1/6] horizontal 4
...
BUG=webm:1300
Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40
2016-10-06 14:03:04 -07:00
James Zern
a6be7ba1aa
enable idct*_1_add_neon in high-bitdepth builds
...
these are compatible as they only load one element of the input so the
larger size of tran_low_t makes no difference in little endian builds.
note the asm is incompatible with big-endian, but there are other points of
failure there so currently it's considered unsupported.
BUG=webm:1294
Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028
2016-10-05 11:14:25 -07:00
James Zern
b6277a47c7
Merge changes from topic '8bit-hbd-idct'
...
* changes:
*idct*_neon.c: add missing rtcd include
idct,msa/neon: exclude idct files from hbd build
*rtcd_defs.pl: remove empty specialize calls
2016-09-30 19:36:08 +00:00
Linfeng Zhang
8c744fd978
Merge "Unify loopfilter function names"
2016-09-30 15:58:08 +00:00
James Zern
ed62d27c71
*rtcd_defs.pl: remove empty specialize calls
...
add_proto adds a 'c' specialization
Change-Id: I0ed0c2240d45264b0e0056ce7c8f63f4a00780bc
2016-09-29 20:38:26 -07:00
Linfeng Zhang
7f1f35183a
Unify loopfilter function names
...
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().
Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
2016-09-29 16:25:42 -07:00
Johann
7b5a348088
vpx_dsp: clean up rtcd
...
Remove avx2+ssse3 specialization. Disabling ssse3 now automatically
disables avx2.
Change-Id: Id33eb6c85d1c4ee57128ebe45c995eb15cfcc765
2016-09-29 12:10:07 -07:00
James Zern
4b0e78bfda
Merge "vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()"
2016-09-08 01:05:18 +00:00
Scott LaVarnway
309125b1e7
vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()
...
Change-Id: I140d93aebadb0eaf6220881e61a0451450081227
2016-09-07 05:58:29 -07:00
Johann
d393885af1
Remove halfpix specialization
...
This function only exists as a shortcut to subpixel variance with
predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical
and both for "hv"
Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.
BUG=webm:1273
Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
2016-08-23 17:05:39 -07:00
Linfeng Zhang
f9efbad392
NEON asm of vpx_lpf_{horizontal,vertical}_8_dual_neon()
...
Also expose the NEON intrinsics version.
BUG=webm:1261, webm:1266.
Change-Id: I8c4ae658467dcf66ebf7a75982b2ef712dbb4535
2016-08-16 08:50:57 -07:00
Linfeng Zhang
f09b5a3328
NEON intrinsics for 4 loopfilter functions
...
New NEON intrinsics functions:
vpx_lpf_horizontal_edge_8_neon()
vpx_lpf_horizontal_edge_16_neon()
vpx_lpf_vertical_16_neon()
vpx_lpf_vertical_16_dual_neon()
BUG=webm:1262, webm:1263, webm:1264, webm:1265.
Change-Id: I7a2aff2a358b22277429329adec606e08efbc8cb
2016-08-12 09:58:17 -07:00
Johann
d55724fae9
Remove armv6 target
...
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
2016-08-04 12:55:06 -07:00
Jim Bankoski
0dc69c70f7
postproc : fix function parameters for noise functions.
...
Change-Id: I582b6307f28bfc987dcf8910379a52c6f679173c
2016-07-15 08:27:34 -07:00
Jim Bankoski
88e6951465
deblock filter : moved from vp8 code branch
...
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.
Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
2016-07-12 05:53:00 -07:00
Jingning Han
7c1fdf02cd
Merge "Support measure distortion in the pixel domain"
2016-07-07 18:09:20 +00:00
Jingning Han
e357b9efe0
Support measure distortion in the pixel domain
...
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.
Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
2016-07-06 18:25:17 -07:00