Commit Graph

17155 Commits

Author SHA1 Message Date
Luca Barbato
e2ad89092d ppc: Add convolve8_vsx and convolve8_avg_vsx
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
2017-05-02 20:27:47 -07:00
Luca Barbato
e6ca81ee67 ppc: Add convolve8_avg_vert_vsx
Only the generic one again, speedups for 8x8 and larger blocks to
come later.

Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
2017-05-02 20:27:42 -07:00
Luca Barbato
a65f1771ad ppc: Add convolve8_vert
Only the generic one again, speedups for 8x8 and larger blocks
to come later.

Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
2017-05-02 20:27:33 -07:00
Luca Barbato
77772350f3 ppc: Add convolve8_horiz_avg
The 8x8 and larger blocks cases can be sped up further.

Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
2017-05-02 20:27:28 -07:00
Luca Barbato
08edb85bd0 ppc: Add convolve8_horiz
The 8x8 and larger blocks cases can be sped up further.

Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
2017-05-02 20:27:16 -07:00
Luca Barbato
d51d3934f5 ppc: Add convolve_avg
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 15:47:25 +02:00
Luca Barbato
63860ba7b8 ppc: Add convolve_copy
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-29 15:47:25 +02:00
Johann Koenig
ef5918098d Merge "Use uint32_t for accumulator"
2017-04-28 18:32:09 +00:00
Jerome Jiang
ce2e278059 Merge "vp9: Fix condition for disabling adaptive_rd_thresh."
2017-04-28 18:10:36 +00:00
Jerome Jiang
04de501229 vp9: Fix condition for disabling adaptive_rd_thresh.
Add speed constraints for disabling adaptive_rd_thresh when
row_mt_bit_exact is set.

Change-Id: I2445115c2f9a2e46b8a0966031a0fea488d4964e
2017-04-28 10:26:20 -07:00
Jerome Jiang
bea27a5809 Merge "Generalize vp9 sse2 denoiser test for other platforms."
2017-04-28 15:45:52 +00:00
Johann
657f3e9f14 Use uint32_t for accumulator
Be specific about the data type size.

Use convenience macro vp9_zero_array.

Change-Id: I5fadf7dbd408befb73820d85db0be4832e8cfcbd
2017-04-28 06:36:59 -07:00
Johann Koenig
94ebdba71d Merge "vp9 temporal filter: sse4 implementation"
2017-04-28 13:22:41 +00:00
Jerome Jiang
26aebd77b8 Generalize vp9 sse2 denoiser test for other platforms.
Renamed to vp9_denoiser_test.

Change-Id: I0d8f4c94bcb81a60949a13d9fe839cee95d03f77
2017-04-27 22:47:41 -07:00
Yaowu Xu
0e8fea6c13 Merge "VP9: enable trellis for high bitdepth intra"
2017-04-28 00:16:56 +00:00
James Zern
ef15d38df0 Merge "webm_read_frame: avoid NULL dereference"
2017-04-27 21:47:10 +00:00
Johann
6dfeea6592 vp9 temporal filter: sse4 implementation
Approximates division using multiply and shift.

Speeds up both sizes (8x8 and 16x16) by 30 times.

Fix the call sites to use the RTCD function.

Delete sse2 and mips implementations. They were based on a previous
implementation of the filter. It was changed in Dec 2015:
ece4fd5d22

BUG=webm:1378

Change-Id: I0818e767a802966520b5c6e7999584ad13159276
2017-04-26 22:03:05 -07:00
Jerome Jiang
43e0e082d1 vp9: Don't force disabling of adaptive_rd_thresh for realtime.
Don't force disabling of adaptive_rd_thresh for realtime when
row_mt_bit_exact is set.

Row based adaptive rd is made usable in CL
454882 (https://chromium-review.googlesource.com/c/454882) for REALTIME.

Change-Id: Ief023414f0fd6eb86f299dd46ae58f4436875af5
2017-04-26 13:17:57 -07:00
Yunqing Wang
b68f14d0ed Merge "Make the row based multi-threaded encoder deterministic"
2017-04-26 16:12:14 +00:00
Linfeng Zhang
54c4e0f7a5 Merge "Update highbd convolve functions arguments to use uint16_t src/dst"
2017-04-26 15:50:46 +00:00
Marco Paniconi
004fab120a Merge "vp9: SVC: Adjust some speed settings for temporal layers."
2017-04-26 15:45:06 +00:00
Peter de Rivaz
66117b97c5 VP9: enable trellis for high bitdepth intra
BUG=webm:1409

Change-Id: I5236595aac1c09386c60ffe8ad621e01422ed5a7
2017-04-26 11:43:01 +01:00
Jerome Jiang
15ee8a8c45 Merge "Fix the decoder seg fault when frame is corrupted."
2017-04-26 00:09:29 +00:00
Jerome Jiang
997e54ea43 Merge "vp9: speed >= 8: Skip uv variance in model_rd_sb_y_large"
2017-04-26 00:09:22 +00:00
Marco
c614164cb6 vp9: SVC: Adjust some speed settings for temporal layers.
Make some speed setting changes for temporal enhancement layers,
and remove the switch in subpel_force_stop for the aggressive_base_mv
in non-rd pickmode.

Gain some 2-3% speed with little/negligible quality loss.

Change-Id: I3e2a7f80ff45f38c0a6ceb01b34dbca2f53edbf0
2017-04-25 16:27:01 -07:00
Jerome Jiang
69b0242e9a vp9: speed >= 8: Skip uv variance in model_rd_sb_y_large
For speed >= 8 and color_sensitivity not set, skip the transform
skipping test in UV planes.
Add a new condition to check noise level to skip chroma check
for speed >= 8 if y_sad is high.

1~2% speedup on ARM for speed 8.

Borg tests show neutral results in both rtc and rtc_derf.

Change-Id: Idecd3ff6e28c97757a43bb6f3a7082c85f72109c
2017-04-25 16:21:36 -07:00
Linfeng Zhang
51dc998f3a Update highbd convolve functions arguments to use uint16_t src/dst
BUG=webm:1388

Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
James Zern
0be513e8e8 webm_read_frame: avoid NULL dereference
block may be NULL with block_entry_eos or from return of GetBlock()

Change-Id: Ia0dd3ffa46305ee70efcdc55c05c2ad24efc993b
2017-04-25 12:34:23 -07:00
Marco
92ec0674fd vp9: Reduce artifact in non-rd pickmode for lighting changes.
Add a low-variance high-sumdiff to the superblock content state
and use it to limit the mv and bias some decisions in non-rd pickmode.
Only affects speed >= 6.

Reduces artifact for lighting changes.
Small/no difference in metrics on RTC set.

Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b
2017-04-24 17:08:43 -07:00
Yunqing Wang
10a497bd38 Make the row based multi-threaded encoder deterministic
This patch followed the allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve determinism in row
based multi-threaded encoding. With row-mt = 1 and multiple
threads, the adaptive feature in the encoder was disabled, which gave a
BDRate gain (at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed loss (7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.

Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
2017-04-24 16:28:27 -07:00
Yunqing Wang
c530208ae3 Merge "Make allow_exhaustive_searches feature no longer adaptive"
2017-04-24 17:41:10 +00:00
Marco Paniconi
b35f64241f Merge "vp9: SVC: fix condition for partition/skip threshold when denoising."
2017-04-21 21:28:17 +00:00
Yunqing Wang
bca4564683 Make allow_exhaustive_searches feature no longer adaptive
A previous patch turned on allow_exhaustive_searches feature only for
FC_GRAPHICS_ANIMATION content. This patch further modified the feature
by removing the exhaustive search limit, and made it no longer adaptive.
As a result, the 2 counts that recorded the number of motion searches
were removed, which helped achieve the determinism in the row based
multi-threading encoding. Tests showed that this patch didn't make
the encoder much slower.

Used exhaustive_searches_thresh for this speed feature, and removed
allow_exhaustive_searches. Also, refactored the speed feature code
to follow the general speed feature setting style.

Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa
2017-04-21 11:14:02 -07:00
Jerome Jiang
58fe1bde59 Merge "vp9: Non-rd pickmode: Avoid computation duplication."
2017-04-21 00:51:47 +00:00
Marco
5de0e9ed08 vp9: SVC: fix condition for partition/skip threshold when denoising.
The more aggressive settings should only be used when denoise_svc
condition is satisfied (which means top spatial layer).

Change-Id: Ia8e3515b27f31bf21b1976ca80a2fa826daece3a
2017-04-20 16:36:55 -07:00
Jerome Jiang
7ae1e321a1 vp9: Non-rd pickmode: Avoid computation duplication.
In non-rd pickmode (speed >= 5), avoid duplication of computations in
model_rd_for_sb_y when the speed feature use_simple_block_yrd is
enabled (or for high bitdepth build under certain conditions).

QVGA, VGA and HD have 1.23%, 2.68% and 1.7% speedup on ARM for speed 8,
respectively.

Encoding results are bitexact for speed >= 5.

Change-Id: I3f9130810c21439f5ad7e159e21cb2243dcd05f1
2017-04-20 16:20:59 -07:00
Jerome Jiang
25c1bada72 Fix the decoder seg fault when frame is corrupted.
BUG=webm:1399

Change-Id: I1e006e0260d9b56a4d2273659ca19b86c69c474b
2017-04-20 14:55:42 -07:00
Marco
29938b3a5a vp9: 1 pass SVC: Fix comment and condition for up-sampling reference.
No change in behavior.

Change-Id: I218fb30289091da623acb23324027435b8510d0e
2017-04-20 14:21:05 -07:00
Yunqing Wang
30ef50b522 Merge "Only allow allow_exhaustive_searches for FC_GRAPHICS_ANIMATION content"
2017-04-20 19:57:46 +00:00
Marco Paniconi
17559cd8b5 Merge "vp9: Re-enable SVC datarate tests."
2017-04-20 19:53:20 +00:00
Marco
85ca2e8a8b vp9: Re-enable SVC datarate tests.
Re-enable the SVC tests and wrap the non-zero expectation
for GetMismatchFrames inside #if CONFIG_VP9_DECODER.

Change-Id: I0e8a2d78b868c32f18fe597540f397d3a1b303b5
2017-04-20 12:08:08 -07:00
Marco
3134a52d26 vp9: SVC: Redefine the source downsample filter choice.
Rename the source downsampling filter, and define it
per spatial layer. Used in 1 pass CBR SVC.

Change-Id: I8135f2ab89c535c53429b9c58b586f746bb668c7
2017-04-20 10:17:13 -07:00
Luca Barbato
8975436466 ppc: Add the intra predictor tests
Change-Id: Idea15b916044ab3d8e74519337880a484ecfd87e
2017-04-19 20:21:40 -07:00
Luca Barbato
914b160fb5 ppc: h predictor 8x8
Slightly faster with the current compiler.

Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47
2017-04-19 19:57:51 -07:00
Luca Barbato
0b9be93205 ppc: d63 predictor 8x8
10x faster.

Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8
2017-04-19 19:57:51 -07:00
Luca Barbato
ee9325b0bd ppc: tm predictor 4x4
Slightly faster.

Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191
2017-04-19 19:57:51 -07:00
Luca Barbato
2904eb5800 ppc: h predictor 4x4
2x faster.

Change-Id: I0583dec353299c6797401b646099f18db4e0420d
2017-04-19 19:57:51 -07:00
Luca Barbato
58245d7050 ppc: dc predictor 8x8
Slightly faster. The other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.

Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
2017-04-19 19:57:51 -07:00
Luca Barbato
6b4a65e8b1 ppc: d45 predictor 8x8
11x faster.

Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798
2017-04-19 19:57:51 -07:00
Luca Barbato
92e33c7b31 ppc: d63 predictor 32x32
About 10x faster.

Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
2017-04-19 19:57:51 -07:00