16628 Commits

Author SHA1 Message Date
Jingning Han
39fff1bea0 Rework 8x8 transpose SSSE3 for avg computation
Use same transpose process as inv_txfm_sse2 does.

Change-Id: I2db05f0b254628a11f621c4c09abb89501ba6d3c
2017-01-12 15:16:07 -08:00
Jingning Han
f65170ea84 Rework 8x8 transpose SSSE3 for inverse 2D-DCT
Use same transpose process as inv_txfm_sse2 does.

Change-Id: Ic4827825bd174cba57a0a80e19bf458a648e7d94
2017-01-12 15:13:18 -08:00
Jingning Han
9a780fa7db Rework forward 8x8 2D-DCT ssse3 implementation
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.

Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
2017-01-10 12:50:55 -08:00
Marco
91fc730d83 vp9: 1 pass cbr: Adjustments to usage of gf_cbr_boost and aq=3 mode.
When aq=3 mode is on and the gf_cbr_boost is set: make sure golden frame
is always refreshed, and don't incorporate segment cost in qp setting
on the boosted golden frame.

Better performance on RTC set with gf_cbr_boost on,
for example with gf_cbr_boost=50, gains range from ~0.5% to 3%.

Change-Id: Ie811f5e4d444ff3320bd6e2c1745b2c4c09a8460
2017-01-10 09:42:06 -08:00
Jerome Jiang
299ef2f8eb Merge "vp9: Set less aggressive short_circuit_low_temp_var for HD at speed 8." 2017-01-10 00:51:09 +00:00
Jerome Jiang
198b834c97 vp9: Set less aggressive short_circuit_low_temp_var for HD at speed 8.
Quality improved by 1.866 and 0.386 for two noisy clips (dark720p and
marcooffice720p), respectively.

Change-Id: Ib33a7672ae9ca53da156208f7cd13f07b5543e44
2017-01-09 16:44:07 -08:00
Jerome Jiang
5b1a8ca5e8 Merge "Fix compile warnings for target=armv7-android-gcc" 2017-01-09 23:53:41 +00:00
James Zern
9480da21e8 Merge "Refine 8-bit 16x16 idct NEON intrinsics" 2017-01-09 23:52:29 +00:00
Marco Paniconi
62cce50d55 Merge "vp9: 1 pass cbr: Fix to qp clamping when gf_cbr_boost_pct is used." 2017-01-09 23:30:32 +00:00
Marco
35c4a13eb7 vp9: Fix comment in speed features.
Change-Id: I65d79c06b152922d725bf559adaa508f91cd5766
2017-01-09 13:05:31 -08:00
Marco
bea22782e9 vp9: 1 pass cbr: Fix to qp clamping when gf_cbr_boost_pct is used.
Avoid the qp-clamping on gf/alt frame if gf_cbr_boost_pct is set.

The change only affects CBR mode when gf_cbr_boost_pct is set.

Change-Id: I0655ed4f2b047c8ed1ed33a070c17960ad776704
2017-01-09 12:52:50 -08:00
Johann Koenig
371a64bfe7 Merge "postproc: vpx_mbpost_proc_down_neon" 2017-01-09 19:53:15 +00:00
Johann Koenig
cabc29ba24 Merge "Add mips dspr2 partial idct tests" 2017-01-09 19:49:02 +00:00
Johann Koenig
8a7847c2c9 Merge "Fix mips dspr2 idct32x32 functions for large coefficient input" 2017-01-09 19:47:47 +00:00
Johann Koenig
bf168b24f5 Merge "Fix mips dspr2 idct16x16 functions for large coefficient input" 2017-01-09 19:47:00 +00:00
Johann Koenig
08d0a7fd0f Merge "Fix mips dspr2 idct8x8 functions for large coefficient input" 2017-01-09 19:46:18 +00:00
Johann Koenig
ab20869221 Merge "Fix mips dspr2 idct4x4 functions for large coefficient input" 2017-01-09 19:45:54 +00:00
Johann Koenig
7b18202e74 Merge "Add mips dspr2 vp9 intrapred tests" 2017-01-09 19:39:13 +00:00
Johann
c23970ec25 postproc: vpx_mbpost_proc_down_neon
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x

BUG=webm:1320

Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
2017-01-09 10:21:56 -08:00
Johann Koenig
9af97fb630 Merge "postproc: vpx_mbpost_proc_across_ip_neon" 2017-01-09 18:17:26 +00:00
Marco Paniconi
ebe0b57c91 Merge "vp9: 1 pass cbr mode: increase threshold for gf_cbr_boost_pct usage." 2017-01-09 17:23:12 +00:00
Kaustubh Raste
6377f9d966 Add mips dspr2 partial idct tests
Change-Id: Idf4003ea6f9a2a42a9f26e156bee73697acb7a37
2017-01-09 17:30:16 +05:30
Kaustubh Raste
50dd3eb62c Fix mips dspr2 idct32x32 functions for large coefficient input
Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2
2017-01-09 17:21:09 +05:30
Kaustubh Raste
c06991fce6 Fix mips dspr2 idct16x16 functions for large coefficient input
Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a
2017-01-09 16:35:28 +05:30
Kaustubh Raste
24d804f79c Fix mips dspr2 idct8x8 functions for large coefficient input
Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61
2017-01-09 16:22:19 +05:30
Kaustubh Raste
afd2d797eb Fix mips dspr2 idct4x4 functions for large coefficient input
Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89
2017-01-09 15:28:30 +05:30
Kaustubh Raste
c6ccd1e939 Add mips dspr2 vp9 intrapred tests
Change-Id: I6be8c59ee220af0597bc2d7213f2779ac2e88db9
2017-01-09 14:11:57 +05:30
Linfeng Zhang
6abdd31555 Refine 8-bit 16x16 idct NEON intrinsics
Speed test shows 25% gain on vpx_idct16x16_256_add_neon(),
and vpx_idct16x16_10_add_neon() got tripled.

Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
2017-01-06 17:52:07 -08:00
Hui Su
c7e2bd6298 Merge "Add support for VP9 level targeting" 2017-01-07 00:55:41 +00:00
Johann
4dca923454 postproc: vpx_mbpost_proc_across_ip_neon
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%

BUG=webm:1320

Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
2017-01-06 16:39:17 -08:00
Marco
f1909d26f8 vp9: 1 pass cbr mode: increase threshold for gf_cbr_boost_pct usage.
Increase the boost threshold below which GOLDEN update will use same
rate correction factor as INTER_NORMAL.

Improves performance when gf_cbr_boost_pct is set (between 0 and 100)
in CBR mode.

Change-Id: I9f54cc18664786a100b13a416b7137ae03bd0cab
2017-01-06 15:37:10 -08:00
Jerome Jiang
316071d79c Merge "vp9: Enable more aggressive short circuit for speed 8." 2017-01-06 22:38:40 +00:00
Marco Paniconi
b632626ec0 Merge "vp9: Add some controls to sample encoder: vpx_temporal_svc_encoder" 2017-01-06 22:34:49 +00:00
Jerome Jiang
b87ebd7af8 Merge "vp9: Compute source sad for every superblock when partition copy is on." 2017-01-06 21:57:27 +00:00
Marco
bf5cdbdf9d vp9: Add some controls to sample encoder: vpx_temporal_svc_encoder
Add the gf boost and frame_parallel controls.
Set as default to off.

Change-Id: Id85fcb16a4fae97f51c09e9ebadb5cdcd510c2f5
2017-01-06 11:34:04 -08:00
Jerome Jiang
267e73446c vp9: Enable more aggressive short circuit for speed 8.
Set short_circuit_low_temp_var to 3 for speed 8 for all res.
No strong visual difference on all clips.

Change-Id: Ia6d9a314291ab1c14d5421bbdd769974083aeb2a
2017-01-06 10:23:34 -08:00
hui su
337ad83e58 Add support for VP9 level targeting
Constraints on encoder config:
-target_bandwidth is no larger than 80% of level bitrate limit
-target_bandwidth * (1 + max_over_shoot_pct) is no larger than
88% of level bitrate limit
-min_gf_interval is no smaller than level limit
-tile_columns is no larger than level limit

Constraints on rate control:
-current frame size plus previous three frames' size is no larger
than the CPB level limit
-current frame size is no larger than 50%/40%/20% of the CPB
level limit if it's a key/alt-ref/other frame.

Change-Id: I84d1a2d6d6e3c82bfd533b3309ce999cfaba2c8b
2017-01-06 10:07:31 -08:00
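
The level-targeting constraints listed in commit 337ad83e58 can be read as a handful of inequalities. The sketch below is one way to express them and is not the libvpx implementation: the struct names, field names, and both function names are hypothetical, and it assumes max_over_shoot_pct is given in percent (so 10 means 10%) and that frame sizes and the CPB limit are in the same unit (bits here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-level limits; names are illustrative, not libvpx's. */
typedef struct {
  int64_t max_bitrate;   /* level bitrate limit, bits per second */
  int64_t max_cpb_size;  /* CPB level limit, bits */
  int min_gf_interval;   /* smallest golden-frame interval the level allows */
  int max_tile_columns;  /* largest number of tile columns the level allows */
} LevelLimits;

/* Hypothetical subset of the encoder configuration. */
typedef struct {
  int64_t target_bandwidth; /* bits per second */
  int max_over_shoot_pct;   /* assumed to be a percentage */
  int min_gf_interval;
  int tile_columns;
} EncoderConfig;

typedef enum { FRAME_KEY, FRAME_ALT_REF, FRAME_OTHER } FrameKind;

/* Checks the four "constraints on encoder config" from the commit message. */
bool config_meets_level(const EncoderConfig *cfg, const LevelLimits *lim) {
  assert(cfg != NULL && lim != NULL);
  /* target_bandwidth <= 80% of the level bitrate limit */
  if (cfg->target_bandwidth * 100 > lim->max_bitrate * 80) return false;
  /* target_bandwidth * (1 + overshoot) <= 88% of the level bitrate limit */
  if (cfg->target_bandwidth * (100 + cfg->max_over_shoot_pct) >
      lim->max_bitrate * 88)
    return false;
  if (cfg->min_gf_interval < lim->min_gf_interval) return false;
  if (cfg->tile_columns > lim->max_tile_columns) return false;
  return true;
}

/* Checks the two "constraints on rate control" for a single frame. */
bool frame_size_meets_level(int64_t frame_bits, const int64_t prev_bits[3],
                            FrameKind kind, const LevelLimits *lim) {
  assert(prev_bits != NULL && lim != NULL);
  /* 50% / 40% / 20% of the CPB limit for key / alt-ref / other frames */
  const int pct = kind == FRAME_KEY ? 50 : (kind == FRAME_ALT_REF ? 40 : 20);
  /* current frame plus the previous three frames must fit in the CPB limit */
  const int64_t window =
      frame_bits + prev_bits[0] + prev_bits[1] + prev_bits[2];
  if (window > lim->max_cpb_size) return false;
  return frame_bits * 100 <= lim->max_cpb_size * pct;
}
```

Integer arithmetic is used throughout so the comparisons against the 80% and 88% bounds are exact. A real encoder would more likely adjust the offending setting than reject it, which this sketch does not attempt.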
Jerome Jiang
afc8c4836f vp9: Compute source sad for every superblock when partition copy is on.
The source sad could be used to copy the partition without going into
choose_partitioning function to speed up vp9 encoding. Computing source
sad takes little time. Speed test on Android and Linux shows little
encoding time gain (less than 1.4%).

Turned off for now since partition copy is turned off.

Change-Id: I61c9d5b8f22329760cb29a4ee30a7f9c232ce8d3
2017-01-06 17:59:02 +00:00
Linfeng Zhang
2d12a52ff0 Merge "Add high bitdepth 8x8 idct NEON intrinsics" 2017-01-06 16:47:23 +00:00
Linfeng Zhang
90f889a56d Merge "Clean DC only idct NEON intrinsics" 2017-01-06 01:16:19 +00:00
Jerome Jiang
72746c079d vp9: Set short circuit to level 3 for VGA for speed 8.
Also change threshold_32x32 to 5/8*thresholds[1] to reduce the quality
regression on VGA clips.

Change-Id: Ia1590e91e7cb22be78d5b85013387bb1be4272e3
2017-01-04 11:28:31 -08:00
Marco Paniconi
1ca1515dd3 Merge "vp9: 1 pass cbr: allow noise estimation down to 360p." 2017-01-04 17:24:08 +00:00
Marco
768b1f7281 vp9: 1 pass cbr: allow noise estimation down to 360p.
Also adjust some thresholds for noise level setting.

Change-Id: I7e03d7057ef2061c9447728deb9c6aff5d3da4b7
2017-01-03 16:26:22 -08:00
Marco
63a8257fb7 vp9: SVC unittests: fix to use y4m source.
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.

Also cast the update of bits_in_buffer_model_, as this can
go negative now due to the buffer underrun.
This fixes the issue in #1352.

BUG=webm:1350
BUG=webm:1352

Change-Id: Ibd4ef23921daf09e5c15b000aca904aa4573599c
2017-01-03 15:29:04 -08:00
Yunqing Wang
99c573f018 Merge "Fix for out of range motion vector bug in joint motion search" 2017-01-03 17:46:15 +00:00
Ranjit Kumar Tulabandu
b67e1f701f Fix for out of range motion vector bug in joint motion search
Clamped the initial mv in vp9_refining_search_8p_c.

BUG=webm:1354

Change-Id: I47d302b350937e3e6e52e95c983b5fb0b4c64fba
2017-01-03 09:12:32 -08:00
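
Commit b67e1f701f says only that the initial mv is clamped before the refining search. As a rough illustration of that kind of fix, and not the actual libvpx code, the sketch below clamps a starting motion vector to a search window before any candidate positions are derived from it. The Mv and MvLimits types and the clamp_initial_mv name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical motion vector and search-window types. */
typedef struct {
  int row;
  int col;
} Mv;

typedef struct {
  int row_min, row_max;
  int col_min, col_max;
} MvLimits;

/* Returns v limited to the closed range [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
  assert(lo <= hi);
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Forces the starting vector inside the window, so that a search which
 * begins at *mv never starts from an out-of-range position. */
void clamp_initial_mv(Mv *mv, const MvLimits *lim) {
  assert(mv != NULL && lim != NULL);
  mv->row = clamp_int(mv->row, lim->row_min, lim->row_max);
  mv->col = clamp_int(mv->col, lim->col_min, lim->col_max);
}
```

A vector already inside the window is left unchanged, so the clamp affects only out-of-range starting points.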
Yunqing Wang
ecdb6a00c2 Merge "Make sub-pixel mv search's return value consistent with the return type" 2016-12-29 19:16:01 +00:00
Yunqing Wang
c96a8dcb5b Merge "Bug fix to avoid random crashes during ARNR filtering" 2016-12-29 17:24:24 +00:00
Gabriel Marin
e6b9609fc0 Merge "Remove superfluous conditional on 'shortcut'" 2016-12-29 06:03:43 +00:00
Linfeng Zhang
911bb980b1 Clean DC only idct NEON intrinsics
BUG=webm:1301

Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5
2016-12-28 13:51:44 -08:00