
17734 Commits

Author SHA1 Message Date
Shiyou Yin
9e4647c7ab vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi.
Change-Id: Ia576a721df6312329b599c31cfe1fb1267a9f174
2017-08-25 01:58:49 +08:00
Shiyou Yin
d080c92524 Merge "vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi." 2017-08-24 00:55:11 +00:00
Marco Paniconi
30c261b1eb Merge "vp9: SVC: Skip NEWMV for small blocks for (0, 0) base_mv." 2017-08-23 23:09:33 +00:00
Johann Koenig
f53b656207 Merge "quantize avx: copy implementation to intrinsics" 2017-08-23 21:14:13 +00:00
Marco
c9ff7b6637 vp9: SVC: Skip NEWMV for small blocks for (0, 0) base_mv.
For SVC encoding:
average speedup ~1.5%, with small ~0.57 loss in avgPSNR metrics.

Change-Id: Icebce6f6ef4e819d7dfcf8db898c583167351de4
2017-08-23 13:08:27 -07:00
Scott LaVarnway
1aad50c092 Merge "vpx_dsp: get32x32var_avx2() cleanup" 2017-08-23 19:59:25 +00:00
Johann Koenig
dfafd10ef5 Merge "quantize neon: round dqcoeff towards zero" 2017-08-23 19:20:53 +00:00
Johann
7c27872164 quantize avx: copy implementation to intrinsics
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.

Very close in speed to the assembly.

Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.

Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
2017-08-23 09:19:16 -07:00
Johann
2a5aa98a35 quantize neon: round dqcoeff towards zero
Add 1 if negative to get dqcoeff to round towards zero.

10-15% faster than converting to positive before shifting.

Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
2017-08-23 08:05:50 -07:00
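
The rounding trick described in 2a5aa98a35 ("Add 1 if negative") can be illustrated in scalar C. This is only a sketch of the idea, not the NEON code from the commit: the function names are hypothetical, and it assumes that right-shifting a negative signed value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Plain arithmetic shift: rounds toward negative infinity (-3 >> 1 == -2). */
static int32_t half_floor(int32_t x) { return x >> 1; }

/* Add 1 to negative inputs first so the shift rounds toward zero,
 * matching C's integer division x / 2. */
static int32_t half_toward_zero(int32_t x) {
  /* Assumption: >> on a negative signed value is an arithmetic shift. */
  assert((-1 >> 1) == -1);
  return (x + (x < 0)) >> 1;
}
```

For example, half_floor(-3) gives -2 while half_toward_zero(-3) gives -1, the same as -3 / 2 in C. The alternative mentioned in the commit message, converting to a positive value before shifting, needs an absolute value and a sign restore around the shift.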
Johann
e83d99d7b8 quantize fp: neon implementation
About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.

Both numbers would improve if the division for dqcoeff could be
simplified.

BUG=webm:1426

Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
2017-08-23 08:01:30 -07:00
Shiyou Yin
59e065b6ed vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi.
Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2
2017-08-23 15:14:15 +08:00
Marco Paniconi
0207f17144 Merge "vp9: Condition lighting change detection on CBR mode." 2017-08-22 22:52:05 +00:00
Johann Koenig
103e4e50a8 Merge changes I53f8a160,I48f282bf
* changes:
  quantize ssse3: copy style from sse2
  quantize sse2: copy opts from ssse3
2017-08-22 22:27:56 +00:00
Marco
a31461c853 vp9: Condition lighting change detection on CBR mode.
This feature is used for the CBR RTC encoding mode
at speed >= 6. This change will exclude it for VBR mode.

For speed 6 live encoding (VBR):
avgPSNR/SSIM metrics on ytlive set up by ~1% (a few clips up by 2-3%).
No change in speed.

Change-Id: I1a0dd94c334f7df309ab5a48d477d7e25355b798
2017-08-22 14:59:37 -07:00
Johann
b9c1dcc5fa quantize ssse3: copy style from sse2
Change-Id: I53f8a160e640c674ea035fc112e207b6dca42598
2017-08-22 14:25:27 -07:00
Johann Koenig
7f2993f5e4 Merge "quantize: capture skip block early" 2017-08-22 20:03:02 +00:00
Johann
75752ab7c0 quantize sse2: copy opts from ssse3
Simplify eob calculations based on ssse3 implementation.

General clean up and re-scoping.

Change-Id: I48f282bf9bd28ee9bc2c7a6779be9d45b5a3a3ee
2017-08-22 13:01:44 -07:00
Johann Koenig
ab27b68693 Merge changes Icfb70687,I9a963e99,Ie8ac00ef,I1272917c
* changes:
  quantize: ignore skip_block in arm
  quantize: ignore skip_block in x86
  quantize fp: ignore skip_block in arm
  quantize fp: ignore skip_block in x86
2017-08-22 19:19:14 +00:00
Johann
7a178a5631 quantize: capture skip block early
This should probably be handled before vp9_regular_quantize_b_4x4 even
gets called.

Fixes an assert resulting from removing skip_block from the quantize
functions.

BUG=webm:1459

Change-Id: I7f52b53f959b4654b3d4517ebda31a678f4d0fde
2017-08-22 12:10:55 -07:00
James Zern
419ce36294 Merge "ppc: Add vpx_idct16x16_256_add_vsx" 2017-08-22 00:48:39 +00:00
Shiyou Yin
bff5aa9827 Merge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi." 2017-08-22 00:37:23 +00:00
Johann
2c56bb97f2 quantize: ignore skip_block in arm
Change-Id: Icfb70687476b2edb25d255793ba325b261d40584
2017-08-21 14:37:50 -07:00
Johann
c02fdd0258 quantize: ignore skip_block in x86
Change-Id: I9a963e99f08761f0c8d6a305619270b2f1c4edf8
2017-08-21 14:37:03 -07:00
Johann
b527b47312 quantize fp: ignore skip_block in arm
Change-Id: Ie8ac00efa826eead2a227726a1add816e04ff147
2017-08-21 14:34:48 -07:00
Johann
7b13d99b98 quantize fp: ignore skip_block in x86
Change-Id: I1272917c49cf6e6710e52c36535b2fc8c8dced78
2017-08-21 14:33:41 -07:00
Johann
661efeca97 quantize test: test _fp_ version of quantize
None of the x86 optimizations pass the tests.

Change-Id: Ic67f2ba1977b657e68f2a13b0711fc5fcbafd909
2017-08-21 12:29:41 -07:00
Johann
13eed991f9 Remove skip_block from quantize
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.

Add assert() and comments regarding the usage of skip_block.

Removing the parameter is a fairly involved process so leave it be for
the moment.

Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
2017-08-21 09:49:04 -07:00
Scott LaVarnway
eab3f5e0cc vpx_dsp: get32x32var_avx2() cleanup
renamed to get32x16var_avx2()

BUG=webm:1404

Change-Id: Icb8f3986c9c9c646e13a69430db7235fc7e1a036
2017-08-18 13:44:09 -07:00
Scott LaVarnway
2c5478e383 Merge "vpx_dsp: vpx_get16x16var_avx2() cleanup" 2017-08-18 20:30:59 +00:00
Scott LaVarnway
2f7497f341 vpx_dsp: vpx_get16x16var_avx2() cleanup
BUG=webm:1404

Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a
2017-08-18 12:23:49 -07:00
Johann Koenig
1426f04e91 Merge "quantize: normalize intermediate types" 2017-08-18 16:00:28 +00:00
Shiyou Yin
7d82e57f5b vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi.
Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e
2017-08-18 09:06:49 +08:00
James Zern
bb15fd51be highbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo
135 -> 34

fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12]

Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682
2017-08-17 15:37:38 -07:00
Johann
7f602d6114 quantize: normalize intermediate types
Despite abs_coeff being a positive value, all the other implementations
treat it as signed which simplifies restoring the sign.

HBD builds cast qcoeff to avoid a visual studio warning. Match
vp9_quantize.c style of casting the entire expression.

Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5
2017-08-17 12:34:28 -07:00
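
Why a signed abs_coeff "simplifies restoring the sign" (7f602d6114) can be seen in a small scalar sketch. The helper names below are hypothetical and this is not the libvpx source; it assumes 32-bit coefficients and an arithmetic right shift for negative values.

```c
#include <assert.h>
#include <stdint.h>

/* 0 for non-negative inputs, -1 (all bits set) for negative ones.
 * Assumption: >> on a negative signed value is an arithmetic shift. */
static int32_t sign_mask(int32_t v) { return v >> 31; }

/* Branchless absolute value: (v ^ 0) - 0 == v, (v ^ -1) - (-1) == -v. */
static int32_t abs_from_sign(int32_t v, int32_t sign) {
  assert(v != INT32_MIN); /* -INT32_MIN is not representable */
  return (v ^ sign) - sign;
}

/* The same expression re-applies the sign to a non-negative magnitude. */
static int32_t restore_sign(int32_t magnitude, int32_t sign) {
  return (magnitude ^ sign) - sign;
}
```

Because the magnitude stays in a signed type, the xor-and-subtract step works in both directions without casts: with sign = sign_mask(-5), abs_from_sign(-5, sign) yields 5 and restore_sign(5, sign) yields -5 again.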
James Zern
e038d1610e inv_txfm_sse2.h: correct idct*/iadst* prototypes
fixes mismatch between prototypes and definitions

Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652
2017-08-16 23:06:09 -07:00
Paul Wilkins
f64e14047d Merge "Prevent parameters that can cause invalid ARF groups." 2017-08-16 18:25:57 +00:00
Paul Wilkins
372336d1e5 Merge "Fix corrupt arf groups due to low "lag_in_frames"" 2017-08-16 18:25:29 +00:00
Linfeng Zhang
f95686895b Merge changes I08b562b6,Ia275940a,I51106e90
* changes:
  Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}
  Update highbd idct x86 optimizations.
  Update 32x32 idct sse2 and ssse3 optimizations.
2017-08-16 16:36:37 +00:00
paulwilkins
b814e2d898 Prevent parameters that can cause invalid ARF groups.
Having a very low "lag_in_frames" value could cause the encoder to create
incorrect / corrupt ARF groups including displayed frames that update the
ARF buffer and false overlay frames that are coded at low rate but are not
actually overlays of a real ARF frame.

This is linked to a reported unit test "slow down" where the chosen parameters
(lag of 3 frames) gave rise to such "broken" ARF group(s).

See also BUG=webm:1454

Change-Id: If52d0236243ed5552537d1ea9ed3fed8c867232c
2017-08-16 14:33:59 +01:00
paulwilkins
48110d0f79 Fix corrupt arf groups due to low "lag_in_frames"
Having a very small value for "lag_in_frames" can result in
corrupt arf groups including displayed frames that update
the arf buffer and fake overlay frames that are not in fact
overlays of real arfs but are nevertheless starved of bits.

Leaving lag_in_frames at the default of 25 for these 5 frame two
pass VBR tests should now give rise to a valid ARF coding pattern
as follows: K(ey), A(rf), N(ormal), N, N, O(verlay).

This change is part of a response to BUG=webm:1454 where broken
arf groups interacted badly with a change that corrects for large rate
misses. However, it may still in some cases increase encode time by
virtue of the fact that the unit test now codes a correct coding pattern
with "hidden" ARF frames.

Change-Id: Ifd0246a4c1d0be247247c754024d7a4ed5f66a6b
2017-08-16 14:07:24 +01:00
Paul Wilkins
0472382dbe Merge "Fix for encoder slowdown (for speeds >= 3)" 2017-08-16 13:01:38 +00:00
paulwilkins
e15be3025b Fix for encoder slowdown (for speeds >= 3)
Some clips in the nightly unit test exhibit a significant encoder slowdown,
which appears to bisect to Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a.

The above change allowed for emergency iterations of the recode loop and
adjustment of the Q range if there is a large rate miss.

This patch disables the above adaptation for cases of cpu_speed >= 3 or more
specifically where cpi->sf.recode_loop >= ALLOW_RECODE_KFARFGF.

For speeds >= 3 the code does not currently run a dummy bit pack operation
inside the recode loop. Without this dummy pack operation there is no
up-to-date estimate of the current frame's size to use as a basis for assessing
the requirement for a recode. In practice it was using the previous frame's size
(or 0 for the first frame), which could cause odd behavior.

If we require the emergency rate correction added in Change-Id: I6923.. for
the higher speed settings, it will be necessary to enable the dummy pack,
which will in turn hurt encode speed.

BUG=webm:1454

Change-Id: I4fb3c6062ca9508325a6f31582f8e80f1a9b126f
2017-08-16 10:56:52 +01:00
Jerome Jiang
6b9c691daf Merge "Clean up writing YUV files for debug purpose." 2017-08-15 18:28:54 +00:00
Marco Paniconi
14437d0fa6 Merge "vp9: Denoiser fix: use correct bsize for skin detection." 2017-08-15 17:53:08 +00:00
Jerome Jiang
a153080b55 Clean up writing YUV files for debug purpose.
Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files.
Delete some flags that can be enabled during build.

To enable writing denoised YUV, use the following command line:
CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure --enable-vp9-temporal-denoising

For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'

Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528
2017-08-15 10:44:03 -07:00
Johann Koenig
c59d1a4dc7 Merge changes I1f1edeaa,I89313cac
* changes:
  quantize: silence unsigned overflow warning
  quantize test: quiet overflow warning
2017-08-15 17:37:59 +00:00
Marco
e9ccc6fe79 vp9: Denoiser fix: use correct bsize for skin detection.
Change-Id: I9d201fa3a4b00ebd147b57ed519fab8d59b0a802
2017-08-15 10:02:19 -07:00
Johann
77ed4414d6 quantize: silence unsigned overflow warning
The result of the xor operation is unsigned. If coeff was negative,
this results in an unsigned value - INT_MIN.

Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0
2017-08-15 09:48:24 -07:00
Scott LaVarnway
7e8357d664 Merge "vp9: strip temporal filter code" 2017-08-15 15:35:33 +00:00
Johann
08cb7b5c68 quantize test: quiet overflow warning
Promote the result of RandRange to signed

Change-Id: I89313cace3bcbe9af96946bef00b6857fc48b128
2017-08-15 08:28:09 -07:00