Johann
d6a7489dd5
neon variance: process two rows of 8 at a time
When the width is equal to 8, process two rows at a time. This doubles
the speed of 8x4 and improves 8x8 by about 20%.
8x16 was using this technique already, but still improved a little bit
with the rewrite.
Also use this for vpx_get8x8var_neon.
BUG=webm:1422
Change-Id: Id602909afcec683665536d11298b7387ac0a1207
2017-05-04 08:59:46 -07:00
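A scalar sketch of the two-rows-per-iteration idea in the commit above. It assumes the 8-wide case fills one 16-lane vector with an 8-pixel row from each of two consecutive rows; plain C stands in for the NEON intrinsics, and `variance_w8_two_rows` is a hypothetical name. The last step uses the usual identity variance = SSE - sum^2 / N.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of an 8-wide variance kernel that consumes two
 * rows per iteration, the way a 16-lane vector would. */
static uint32_t variance_w8_two_rows(const uint8_t *src, int src_stride,
                                     const uint8_t *ref, int ref_stride,
                                     int h, uint32_t *sse) {
  int64_t sum = 0;
  uint64_t sq = 0;
  assert(h % 2 == 0); /* 8x4, 8x8 and 8x16 all have even heights */
  for (int r = 0; r < h; r += 2) {
    /* One "16-lane" step: lanes 0-7 from row r, lanes 8-15 from row r + 1. */
    for (int lane = 0; lane < 16; ++lane) {
      const int row = r + lane / 8;
      const int col = lane % 8;
      const int diff =
          src[row * src_stride + col] - ref[row * ref_stride + col];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
  }
  *sse = (uint32_t)sq;
  /* variance = SSE - sum^2 / N, with N = 8 * h pixels. */
  return *sse - (uint32_t)((sum * sum) / (8 * h));
}
```

Pairing rows this way keeps every vector lane busy for 8-wide blocks instead of leaving half of a 16-byte register idle.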
Johann
cb9133c72f
neon variance: add small missing sizes
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.
Compared with the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. For 8x16 the results
are very close, but the existing version appears to be faster.
BUG=webm:1422
Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
2017-05-04 08:59:42 -07:00
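A minimal sketch of producing mixed sizes from one generic helper, as the commit above describes. The helper, the `DEFINE_VARIANCE` macro and the `sketch_variance*` names are hypothetical, the loop is scalar C rather than NEON, and the instantiated sizes are examples rather than the exact list the commit added.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic helper: sum and SSE of differences over a w x h block. */
static void variance_helper(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride, int w, int h,
                            uint32_t *sse, int *sum) {
  assert(w > 0 && h > 0);
  *sse = 0;
  *sum = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int diff = src[r * src_stride + c] - ref[r * ref_stride + c];
      *sum += diff;
      *sse += (uint32_t)(diff * diff);
    }
  }
}

/* Each size becomes a thin wrapper: call the helper, then apply
 * variance = SSE - sum^2 / (W * H). */
#define DEFINE_VARIANCE(W, H)                                              \
  static uint32_t sketch_variance##W##x##H(const uint8_t *src,             \
                                           int src_stride,                 \
                                           const uint8_t *ref,             \
                                           int ref_stride, uint32_t *sse) { \
    int sum;                                                               \
    variance_helper(src, src_stride, ref, ref_stride, W, H, sse, &sum);    \
    return *sse - (uint32_t)(((int64_t)sum * sum) / ((W) * (H)));          \
  }

DEFINE_VARIANCE(16, 8)
DEFINE_VARIANCE(8, 16)
DEFINE_VARIANCE(32, 16)
```

Because each wrapper is one macro line, adding a missing size costs almost nothing; the trade-off, as the commit notes, is that a hand-tuned kernel can still win for a particular shape.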
Yi Luo
a24e1e8027
Merge "High bit depth inter prediction horizontal/vertical filters AVX2"
2017-05-04 15:43:21 +00:00
Linfeng Zhang
2231669a83
Split dsp/x86/inv_txfm_sse2.c
Spin out highbd idct functions.
BUG=webm:1412
Change-Id: I0cfe4117c00039b6778c59c022eee79ad089a2af
2017-05-03 15:43:02 -07:00
Linfeng Zhang
d5de63d2be
Update highbd idct functions arguments to use uint16_t dst
BUG=webm:1388
Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03 13:59:16 -07:00
Linfeng Zhang
081b39f2b7
Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
BUG=webm:1388
Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
2017-05-03 13:58:31 -07:00
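For context on the two idct cleanups above: high-bitdepth paths have passed `uint16_t` buffers through `uint8_t *` parameters by tagging the pointer. The macros below are written from memory in the style of libvpx's `CONVERT_TO_SHORTPTR`/`CONVERT_TO_BYTEPTR`, so treat their exact definitions as an assumption; `tag_highbd`, `add_one_tagged` and `add_one` are hypothetical and only contrast the tagged style with a signature that takes `uint16_t *` directly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions, in the style of libvpx's pointer-tagging macros. */
#define CONVERT_TO_SHORTPTR(x) ((uint16_t *)(((uintptr_t)(x)) << 1))
#define CONVERT_TO_BYTEPTR(x) ((uint8_t *)(((uintptr_t)(x)) >> 1))

/* Tagging only round-trips when the low address bit is zero, which holds
 * for any suitably aligned uint16_t buffer. */
static uint8_t *tag_highbd(uint16_t *p) {
  assert(((uintptr_t)p & 1) == 0);
  return CONVERT_TO_BYTEPTR(p);
}

/* Tagged style: the signature says uint8_t *, but the data is 16-bit. */
static void add_one_tagged(uint8_t *dest8, int n) {
  uint16_t *dest = CONVERT_TO_SHORTPTR(dest8); /* undo the tag before use */
  for (int i = 0; i < n; ++i) dest[i] += 1;
}

/* Direct style: the signature states the real element type. */
static void add_one(uint16_t *dest, int n) {
  for (int i = 0; i < n; ++i) dest[i] += 1;
}
```

The tagged pointer is never dereferenced, but it hides the element type from the compiler and from readers; taking `uint16_t *` removes the conversions at both ends.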
Hui Su
5048d6e7ee
Merge "vp9 level: add tentative max cpb values for high levels"
2017-05-03 20:51:03 +00:00
Hui Su
f701a44305
Merge "Adjust alt-ref selection in define_gf_group()"
2017-05-03 20:50:29 +00:00
Yi Luo
a3452996a1
High bit depth inter prediction horizontal/vertical filters AVX2
User-level speed improvement on i7-6700, cpu-used=1, x86_64 Linux,
bitrate: 1080p at 8 Mbps, 4K at 16 Mbps:
- Decoder:
  - 1080p: ~4%
  - 4K: ~5%
- Encoder:
  - 1080p: ~1%
  - 4K: ~3%
Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
2017-05-03 12:18:01 -07:00
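A scalar reference for what a high-bitdepth horizontal 8-tap filter computes, sketching the operation the AVX2 commit above accelerates. It assumes taps that sum to 128 (a 7-bit rounding shift), one filter for the whole block, and clamping to `(1 << bd) - 1`; `highbd_convolve_horiz_sketch` and the `SKETCH_*` constants are hypothetical, and subpixel position stepping is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FILTER_BITS 7 /* assumed: taps sum to 1 << 7 = 128 */
#define SKETCH_TAPS 8

static uint16_t clip_highbd(int v, int bd) {
  const int max = (1 << bd) - 1;
  return (uint16_t)(v < 0 ? 0 : (v > max ? max : v));
}

/* The caller must provide 3 valid pixels left of and 4 right of each row. */
static void highbd_convolve_horiz_sketch(const uint16_t *src, int src_stride,
                                         uint16_t *dst, int dst_stride,
                                         const int16_t *filter, int w, int h,
                                         int bd) {
  assert(bd == 8 || bd == 10 || bd == 12);
  src -= SKETCH_TAPS / 2 - 1; /* center tap 3 on the output pixel */
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      int sum = 0;
      for (int k = 0; k < SKETCH_TAPS; ++k) sum += src[x + k] * filter[k];
      /* Round to nearest, then clamp to the range of this bit depth. */
      const int rounded =
          (sum + (1 << (SKETCH_FILTER_BITS - 1))) >> SKETCH_FILTER_BITS;
      dst[x] = clip_highbd(rounded, bd);
    }
    src += src_stride;
    dst += dst_stride;
  }
}
```

The clamp is what distinguishes the high-bitdepth path: the same taps are used, but the valid range depends on `bd` rather than being fixed at 255.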
Linfeng Zhang
a10a5cb356
Merge changes I8bb660de,Ica51d780,I6037525d
* changes:
Clean specializes of idct functions
Clean add_protos of highbd idct functions
Clean add_protos of idct functions
2017-05-03 19:17:55 +00:00
James Zern
5599e4275a
Merge changes Ia5293d94,I90d481d3,Ia509d622,I54549b03,I89b635d6
* changes:
ppc: Add convolve8_vsx and convolve8_avg_vsx
ppc: Add convolve8_avg_vert_vsx
ppc: Add convolve8_vert
ppc: Add convolve8_horiz_avg
ppc: Add convolve8_horiz
2017-05-03 03:31:19 +00:00
Luca Barbato
e2ad89092d
ppc: Add convolve8_vsx and convolve8_avg_vsx
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
2017-05-02 20:27:47 -07:00
Luca Barbato
e6ca81ee67
ppc: Add convolve8_avg_vert_vsx
Only the generic version again; speedups for 8x8 and larger blocks
will come later.
Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
2017-05-02 20:27:42 -07:00
Luca Barbato
a65f1771ad
ppc: Add convolve8_vert
Only the generic version again; speedups for 8x8 and larger blocks
will come later.
Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
2017-05-02 20:27:33 -07:00
Luca Barbato
77772350f3
ppc: Add convolve8_horiz_avg
The 8x8 and larger block cases can be sped up further.
Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
2017-05-02 20:27:28 -07:00
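A scalar sketch of the 8-bit "avg" variant named in the commit above, assuming it filters exactly like the plain horizontal 8-tap convolve (taps summing to 128) and then averages the result with the pixel already in `dst`, rounding up. `convolve8_horiz_avg_sketch` is a hypothetical name; this is not the VSX code.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* The caller must provide 3 valid pixels left of and 4 right of each row. */
static void convolve8_horiz_avg_sketch(const uint8_t *src, int src_stride,
                                       uint8_t *dst, int dst_stride,
                                       const int16_t *filter, int w, int h) {
  assert(w > 0 && h > 0);
  src -= 3; /* center tap 3 on the output pixel */
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      int sum = 0;
      for (int k = 0; k < 8; ++k) sum += src[x + k] * filter[k];
      const uint8_t res = clip_u8((sum + 64) >> 7); /* round, shift by 7 */
      /* "avg": rounded average of the prediction and the existing dst. */
      dst[x] = (uint8_t)((dst[x] + res + 1) >> 1);
    }
    src += src_stride;
    dst += dst_stride;
  }
}
```

The averaging step is what compound prediction uses to blend a second predictor into the first; the non-avg variant would simply store `res`.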
Luca Barbato
08edb85bd0
ppc: Add convolve8_horiz
The 8x8 and larger block cases can be sped up further.
Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
2017-05-02 20:27:16 -07:00
Linfeng Zhang
0178d974e5
Clean specializes of idct functions
Change-Id: I8bb660de47b5f97263ec381dc428db96e9c9a4b2
2017-05-02 18:01:19 -07:00
Linfeng Zhang
4412996d59
Clean add_protos of highbd idct functions
Change-Id: Ica51d780b92b316ce9112740c56cdf7670816371
2017-05-02 17:59:38 -07:00
Linfeng Zhang
a7a57d9756
Clean add_protos of idct functions
Change-Id: I6037525d92ec172810edab720389eb1865ed3b1a
2017-05-02 17:58:40 -07:00
Johann Koenig
240a5a15ef
Merge "block error sse2: sum in 32 bits when possible"
2017-05-02 14:16:47 +00:00
Johann
cd94d5f68e
block error avx2: rename variables
Change-Id: I2b8a9253f2c3d1fd85304c2970ebe70213870fe9
2017-05-01 17:54:29 -07:00
Johann Koenig
b1a31f8066
Merge "block error avx2: sum in 32 bits when possible"
2017-05-02 00:52:59 +00:00
Marco Paniconi
1e112bce37
Merge "vp9: SVC: Early exit on golden ref in non-rd pickmode."
2017-05-01 21:04:52 +00:00
Linfeng Zhang
e8655d49f5
Merge "Clean vp9_highbd_build_inter_predictor() and highbd_inter_predictor()"
2017-05-01 19:54:40 +00:00
Johann Koenig
3d33a462b3
Merge "move vp9_error_intrin_avx2.c"
2017-05-01 19:52:36 +00:00
Kyle Siefring
760c214519
block error avx2: sum in 32 bits when possible
Add 31-bit pairs before unpacking in x86 block error code.
For AVX2 this provides only a very minor performance improvement.
BUG=webm:1210
Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
2017-05-01 12:51:33 -07:00
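A scalar sketch of the "sum in 32 bits when possible" idea from the two block error commits. It assumes 16-bit coefficients whose values and differences stay within ±32767: each square is then below 2^30, a pair sum (what `pmaddwd` produces per 32-bit lane) is below 2^31, and two pair sums still fit in an unsigned 32-bit lane, so widening to 64 bits can wait until four squares have been added. `block_error_sketch` and `sq` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sq(int32_t v) {
  assert(v >= -32767 && v <= 32767); /* keeps each square below 2^30 */
  return (uint32_t)(v * v);
}

/* Returns the sum of squared errors; *ssz receives the sum of squared coeffs. */
static int64_t block_error_sketch(const int16_t *coeff, const int16_t *dqcoeff,
                                  int n, int64_t *ssz) {
  uint64_t err = 0, sz = 0;
  assert(n % 4 == 0);
  for (int i = 0; i < n; i += 4) {
    /* Two pair sums (each < 2^31) added in 32 bits: the total is < 2^32. */
    uint32_t e32 = 0, s32 = 0;
    for (int p = i; p < i + 4; p += 2) {
      e32 += sq(coeff[p] - dqcoeff[p]) + sq(coeff[p + 1] - dqcoeff[p + 1]);
      s32 += sq(coeff[p]) + sq(coeff[p + 1]);
    }
    /* Widen ("unpack") to 64 bits only once per group of four. */
    err += e32;
    sz += s32;
  }
  *ssz = (int64_t)sz;
  return (int64_t)err;
}
```

Halving the number of 32-to-64-bit widenings is where the saving comes from; the bound on the inputs is what makes the narrower intermediate safe.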
James Zern
ee3df31d74
Merge "vpx_scale_test: fix segfault on alloc failure"
2017-05-01 19:22:22 +00:00
Marco
ae0215f945
vp9: SVC: Early exit on golden ref in non-rd pickmode.
For SVC 1-pass real-time, add a condition to skip the
golden (spatial) reference mode in non-rd pickmode:
golden is skipped if the SSE of the zeromv-last mode
is below a threshold. Also change the order in ref_mode_set_svc
to make sure golden zeromv is tested after last-nearest.
Speedup is ~3-4% with little/negligible quality loss.
Change-Id: I6cbe314a93210454ba2997945f714015f1b2fca3
2017-05-01 10:36:54 -07:00
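A sketch of the early-exit logic described in the commit above: once LAST ZEROMV has been evaluated, golden (spatial) reference modes are skipped if its SSE is below a threshold, which only works if golden modes come later in the search order. The enums, the mode order and the threshold parameter are hypothetical stand-ins for `ref_mode_set_svc` and the encoder's actual condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REF_LAST, REF_GOLDEN } RefFrame;
typedef enum { MODE_ZEROMV, MODE_NEARESTMV, MODE_NEWMV } PredMode;
typedef struct {
  RefFrame ref;
  PredMode mode;
} RefMode;

/* Hypothetical order: golden ZEROMV comes after LAST ZEROMV and LAST NEARESTMV. */
static const RefMode kSvcOrder[] = {
    {REF_LAST, MODE_ZEROMV},   {REF_LAST, MODE_NEARESTMV},
    {REF_GOLDEN, MODE_ZEROMV}, {REF_LAST, MODE_NEWMV},
    {REF_GOLDEN, MODE_NEWMV},
};

/* Returns how many modes get evaluated for a given LAST ZEROMV SSE. */
static int count_evaluated_modes(uint32_t sse_last_zeromv, uint32_t thresh) {
  /* The skip decision needs LAST ZEROMV's SSE, so it must be searched first. */
  assert(kSvcOrder[0].ref == REF_LAST && kSvcOrder[0].mode == MODE_ZEROMV);
  int skip_golden = 0, evaluated = 0;
  for (size_t i = 0; i < sizeof(kSvcOrder) / sizeof(kSvcOrder[0]); ++i) {
    if (kSvcOrder[i].ref == REF_GOLDEN && skip_golden) continue;
    ++evaluated; /* stands in for computing this mode's RD cost */
    if (kSvcOrder[i].ref == REF_LAST && kSvcOrder[i].mode == MODE_ZEROMV)
      skip_golden = sse_last_zeromv < thresh;
  }
  return evaluated;
}
```

With a low SSE the two golden entries are never evaluated, which is where the speedup comes from; with a high SSE the search is unchanged.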
Kyle Siefring
8394990b27
block error sse2: sum in 32 bits when possible
Add 31-bit pairs before unpacking in x86 block error code.
BUG=webm:1210
Change-Id: I5ca8c7f7775585a17fe09d6bbfc25e1f2955eb0a
2017-05-01 09:59:18 -07:00
Johann
2ff01aa1e4
move vp9_error_intrin_avx2.c
There is only one AVX2 implementation, so drop '_intrin'.
Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
2017-05-01 09:13:01 -07:00
James Zern
2930903d51
vpx_scale_test: fix segfault on alloc failure
Check the return value of ResetImage() before continuing.
Change-Id: Iff0b038f7b9761113b8cf33a511a5306640d1273
2017-04-29 13:12:53 -07:00
Luca Barbato
d51d3934f5
ppc: Add convolve_avg
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 15:47:25 +02:00
Luca Barbato
63860ba7b8
ppc: Add convolve_copy
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-29 15:47:25 +02:00
Johann Koenig
ef5918098d
Merge "Use uint32_t for accumulator"
2017-04-28 18:32:09 +00:00
Jerome Jiang
ce2e278059
Merge "vp9: Fix condition for disabling adaptive_rd_thresh."
2017-04-28 18:10:36 +00:00
Jerome Jiang
04de501229
vp9: Fix condition for disabling adaptive_rd_thresh.
Add speed constraints for disabling adaptive_rd_thresh when
row_mt_bit_exact is set.
Change-Id: I2445115c2f9a2e46b8a0966031a0fea488d4964e
2017-04-28 10:26:20 -07:00
Jerome Jiang
bea27a5809
Merge "Generalize vp9 sse2 denoiser test for other platforms."
2017-04-28 15:45:52 +00:00
Johann
657f3e9f14
Use uint32_t for accumulator
Be specific about the data type size.
Use convenience macro vp9_zero_array.
Change-Id: I5fadf7dbd408befb73820d85db0be4832e8cfcbd
2017-04-28 06:36:59 -07:00
Johann Koenig
94ebdba71d
Merge "vp9 temporal filter: sse4 implementation"
2017-04-28 13:22:41 +00:00
Jerome Jiang
26aebd77b8
Generalize vp9 sse2 denoiser test for other platforms.
Renamed to vp9_denoiser_test.
Change-Id: I0d8f4c94bcb81a60949a13d9fe839cee95d03f77
2017-04-27 22:47:41 -07:00
Yaowu Xu
0e8fea6c13
Merge "VP9: enable trellis for high bitdepth intra"
2017-04-28 00:16:56 +00:00
James Zern
ef15d38df0
Merge "webm_read_frame: avoid NULL dereference"
2017-04-27 21:47:10 +00:00
Johann
6dfeea6592
vp9 temporal filter: sse4 implementation
Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by a factor of 30.
Fix the call sites to use the RTCD function.
Delete the sse2 and mips implementations. They were based on a previous
implementation of the filter, which was changed in Dec 2015:
ece4fd5d22
BUG=webm:1378
Change-Id: I0818e767a802966520b5c6e7999584ad13159276
2017-04-26 22:03:05 -07:00
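A sketch of division by a small constant using multiply and shift, the general technique the SSE4 temporal filter commit mentions. With m = ceil(2^16 / d) and e = m * d - 2^16, (x * m) >> 16 equals floor(x / d) whenever x * e < 2^16. The divisors 4, 6 and 9 (pixel counts of a 3x3 window clipped at a corner, an edge and the interior) are an illustrative assumption, as are the names `div_magic` and `approx_div`; the actual SSE4 code may use different constants and input ranges.

```c
#include <assert.h>
#include <stdint.h>

/* m = ceil(2^16 / d): the "magic" multiplier for divisor d. */
static uint32_t div_magic(uint32_t d) {
  return ((1u << 16) + d - 1) / d;
}

/* floor(x / d) computed as (x * m) >> 16. For d = 4 the error term e is 0;
 * for d = 6 and d = 9 it is 2, so the result is exact while x < 32768. */
static uint32_t approx_div(uint32_t x, uint32_t m) {
  assert(x < 32768); /* exact range for the divisors used here */
  return (x * m) >> 16;
}
```

A multiply followed by a shift maps directly onto SIMD instructions, whereas there is no packed integer divide in SSE4, which is why the approximation pays off.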
Jerome Jiang
43e0e082d1
vp9: Don't force disabling of adaptive_rd_thresh for realtime.
Don't force disabling of adaptive_rd_thresh for realtime when
row_mt_bit_exact is set.
Row-based adaptive RD was made usable for REALTIME in CL
454882 (https://chromium-review.googlesource.com/c/454882).
Change-Id: Ief023414f0fd6eb86f299dd46ae58f4436875af5
2017-04-26 13:17:57 -07:00
Yunqing Wang
b68f14d0ed
Merge "Make the row based multi-threaded encoder deterministic"
2017-04-26 16:12:14 +00:00
Linfeng Zhang
54c4e0f7a5
Merge "Update highbd convolve functions arguments to use uint16_t src/dst"
2017-04-26 15:50:46 +00:00
Marco Paniconi
004fab120a
Merge "vp9: SVC: Adjust some speed settings for temporal layers."
2017-04-26 15:45:06 +00:00
Peter de Rivaz
66117b97c5
VP9: enable trellis for high bitdepth intra
BUG=webm:1409
Change-Id: I5236595aac1c09386c60ffe8ad621e01422ed5a7
2017-04-26 11:43:01 +01:00
hui su
d01c9febe9
vp9 level: add tentative max cpb values for high levels
Add tentative max CPB size values for levels 5.2 and up. Otherwise,
encoding fails when targeting these levels.
Change-Id: Ib7e0ba4b9836ea1ac900b6822543812843d48463
2017-04-25 18:03:55 -07:00
hui su
8069f31076
Adjust alt-ref selection in define_gf_group()
107de19698 changed the encoder alt-ref selection behavior. Assuming
min_gf_interval = max_gf_interval = 4, the frame order was
frm_1 arf_1 frm_2 frm_3 frm_4 frm_5 arf_2 before 107de19698, and
frm_1 arf_1 frm_2 frm_3 frm_4 arf_2 frm_5 after 107de19698.
This patch reverts that alt-ref placement change.
Change-Id: I93a4a65036575151286f004d455d4fcea88a1550
2017-04-25 18:03:47 -07:00