Commit Graph

17667 Commits

Author SHA1 Message Date
James Zern
746c0eab3b disable SSSE3/VP9QuantizeTest* in hbd builds
This test fails with a configuration similar to that of the assembly prior to:
d52cb5972 quantize: copy ssse3 optimizations to intrinsics

BUG=webm:1458

Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf
2017-08-14 09:31:14 -07:00
Johann
e022ce84ac Rename vp8 quantize file
BUG=webm:1457

Change-Id: Ie8fae018ad8417724fde087055b90228850d631d
2017-08-11 10:44:36 -07:00
Jerome Jiang
d48be6ad73 Merge "vp9 SVC: Fix the denoiser frame buffer management." 2017-08-11 00:54:35 +00:00
Jerome Jiang
0f8ebddec4 vp9 SVC: Fix the denoiser frame buffer management.
Change the denoiser frame buffer management for SVC to more generally
handle the layer patterns in SVC (where last is not always refreshed).

This change is only for SVC with denoising and is bitexact.

Change-Id: Ic2b146a924cdf6e7114609158afa3d4880fe3fae
2017-08-10 16:56:46 -07:00
Linfeng Zhang
15193ce51f Merge "Clean highbd idct x86 code with inline functions" 2017-08-10 20:25:18 +00:00
Johann Koenig
9bb8ce5efb Merge "neon: vpx_quantize_b_32x32" 2017-08-10 15:42:49 +00:00
Johann Koenig
0b393ae505 Merge "quantize: copy ssse3 optimizations to intrinsics" 2017-08-10 15:42:20 +00:00
Linfeng Zhang
39da7fb786 Clean highbd idct x86 code with inline functions
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()

BUG=webm:1412

Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
2017-08-08 17:53:28 -07:00
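A cospi16 butterfly takes two inputs, scales their sum and their difference by cos(pi/4) in Q14 fixed point, and rounds the results back to integers. The scalar C sketch below shows that computation. It assumes the C reference idct's DCT_CONST_BITS = 14 and cospi_16_64 = 11585. The function names are hypothetical, and this is not the SSE2/SSE4.1 code from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define DCT_CONST_BITS 14
/* cos(pi/4) in Q14: round(0.70710678 * 16384) = 11585. */
#define COSPI_16_64 11585

/* Round to nearest and drop DCT_CONST_BITS fractional bits.
 * Right-shifting a negative value is implementation-defined in C;
 * mainstream compilers shift arithmetically. */
static int32_t round_shift_q14(int64_t v) {
  return (int32_t)((v + ((int64_t)1 << (DCT_CONST_BITS - 1))) >>
                   DCT_CONST_BITS);
}

/* Hypothetical scalar cospi16 butterfly:
 *   out0 = round((in0 + in1) * cos(pi/4))
 *   out1 = round((in0 - in1) * cos(pi/4)) */
static void butterfly_cospi16_sketch(int32_t in0, int32_t in1,
                                     int32_t *out0, int32_t *out1) {
  *out0 = round_shift_q14(((int64_t)in0 + in1) * COSPI_16_64);
  *out1 = round_shift_q14(((int64_t)in0 - in1) * COSPI_16_64);
}
```

With in0 = 100 and in1 = 50, the sketch yields 106 and 35, which are 150 and 50 scaled by about 0.7071. The sketch uses 64-bit intermediates so that large high-bitdepth inputs cannot overflow.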
Marco Paniconi
68805583e9 Merge "vp9: Partition logic adjustment for speed 6 feature." 2017-08-08 23:08:10 +00:00
Johann
93166c5e51 neon: vpx_quantize_b_32x32
With skip block the NEON version is about twice as fast as C.

The NEON version has no shortcut for coeff < zbin, so it always takes the
same amount of time. Even when the C code can take the shortcut, the NEON
version is over twice as fast. When it can't, the gap grows to over 10x.

BUG=webm:1426

Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
2017-08-08 14:05:18 -07:00
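The "shortcut for coeff < zbin" refers to the dead zone. A coefficient whose magnitude falls below the zero-bin threshold quantizes to zero, so scalar code can skip the multiply for it, while SIMD code processes every lane the same way. The C sketch below is a minimal dead-zone quantizer with a skip_block early exit. It assumes a single zbin/round/quant/dequant value for all coefficients and a Q16 quant multiplier. Its name and simplified arithmetic are hypothetical and do not reproduce libvpx's vpx_quantize_b_c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified dead-zone quantizer (not libvpx's code).
 * Returns eob: one past the last nonzero quantized coefficient. */
static int quantize_sketch(const int16_t *coeff, int n, int skip_block,
                           int zbin, int round, int quant, int dequant,
                           int32_t *qcoeff, int32_t *dqcoeff) {
  int eob = 0;
  memset(qcoeff, 0, (size_t)n * sizeof(*qcoeff));
  memset(dqcoeff, 0, (size_t)n * sizeof(*dqcoeff));
  if (skip_block) return 0; /* whole block is skipped: all zeros */

  for (int i = 0; i < n; ++i) {
    const int c = coeff[i];
    const int abs_c = abs(c);
    if (abs_c < zbin) continue; /* shortcut: inside the dead zone */

    /* Q16 multiply: quant = 32768 halves the rounded magnitude. */
    int64_t q = ((int64_t)(abs_c + round) * quant) >> 16;
    if (q > INT16_MAX) q = INT16_MAX;

    qcoeff[i] = (int32_t)(c < 0 ? -q : q);
    dqcoeff[i] = qcoeff[i] * dequant;
    if (q != 0) eob = i + 1;
  }
  return eob;
}
```

The scalar code performs the multiply only for coefficients outside the dead zone. Its run time therefore depends on the content, while a branch-free SIMD version takes the same time for every block.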
Johann
d52cb59729 quantize: copy ssse3 optimizations to intrinsics
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.

Fixes test failures in HBD and OS X builds.

Allows using it in 32bit builds, where it is about 40% faster than sse2.

Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.

Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
2017-08-08 12:22:14 -07:00
Marco
427de67e63 vp9: Partition logic adjustment for speed 6 feature.
When adapt_partition_source_sad is enabled (currently only at
speed 6 for resolution <= 360p): use lower subsize (8x8 instead of 16x16)
for nonrd_select_partition on 32X32 blocks.

And force avoiding rectangular partition checks in
nonrd_pick_partition for speed >= 6.

Small increase ~0.5 in metrics for speed 6 on rtc_derf,
no change in speed.

Change-Id: Id751bc8f7573634571b2d6f5e29627cd5cebccae
2017-08-08 11:31:27 -07:00
Linfeng Zhang
853165ba39 Update 32x32 idct sse2 funcs, add partial case 135
Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a
2017-08-07 17:37:02 -07:00
Linfeng Zhang
d670678f26 Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx()
in idct x86 code

Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083
2017-08-04 15:33:37 -07:00
Linfeng Zhang
fa829e0e5a Replace multiplication_and_add() with butterfly() in idct x86 code
Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e
2017-08-04 15:33:34 -07:00
Linfeng Zhang
c9fb719ee1 Update butterfly() in idct x86 optimizations.
Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701
2017-08-04 15:33:28 -07:00
Linfeng Zhang
7f20c3ac44 Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1
BUG=webm:1412

Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca
2017-08-04 15:31:17 -07:00
Linfeng Zhang
22b6dc9fdf Update for loop increment of idct x86 functions
Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4
2017-08-04 15:29:19 -07:00
Linfeng Zhang
0c61331244 Update high bitdepth 16x16 idct x86 code
Prepare for high bitdepth 16x16 idct sse4.1 code.
Only moves and renames functions.

BUG=webm:1412

Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
2017-08-04 15:12:33 -07:00
Johann Koenig
cbb83ba4aa Merge "quantize test: consolidate sizes" 2017-08-04 20:34:50 +00:00
Johann
9578a84205 quantize test: consolidate sizes
Pass a max txfm size parameter and combine the base quantize
test with the 32x32 test.

Change-Id: I72ddf020fe6888e864ea9f3642ee2d9a8e48a04b
2017-08-04 12:45:32 -07:00
Scott LaVarnway
c42517568d vpx_dsp: merge avx2 variance files
BUG=webm:1404

Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4
2017-08-04 07:49:30 -07:00
Kaustubh Raste
39e8b8dac6 Fix mips dspr2 6 tap filter clobber list
Change-Id: Ib7c07e6ce00a5c7e59113b16e6661a8369f9e646
2017-08-04 10:56:56 +05:30
Linfeng Zhang
e921c7ba8d Merge "Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function" 2017-08-04 01:16:35 +00:00
Scott LaVarnway
f6c6f37e0c Merge "vpx_dsp: Use correct check for halfpel in" 2017-08-03 23:17:09 +00:00
Linfeng Zhang
563d58ab84 Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function
BUG=webm:1412

Change-Id: I945f0fb6807b8948747243794dc7352b959221f7
2017-08-03 13:59:47 -07:00
Linfeng Zhang
6624f20785 Merge changes I76727df0,I66297d78,I1d000c6b
* changes:
  Extract inlined 16x16 idct sse2 code into header file
  Add transpose_32bit_8x4() sse2 optimization
  Update x86 idct optimization
2017-08-03 20:51:02 +00:00
Scott LaVarnway
8334a48d3a vpx_dsp: Use correct check for halfpel in
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2

see:
17fae3a Change to use correct check for halfpel

Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
2017-08-03 06:57:40 -07:00
Yunqing Wang
6843e7c7f3 Merge "Force the bit exactness in the first pass" 2017-08-03 00:03:10 +00:00
Linfeng Zhang
15a47db730 Extract inlined 16x16 idct sse2 code into header file
Will be called by high bitdepth functions.

Change-Id: I76727df00941b5a27adceaba8347f275475fcd8c
2017-08-02 16:17:43 -07:00
Linfeng Zhang
8c0ab7607e Add transpose_32bit_8x4() sse2 optimization
Change-Id: I66297d78b38db718cfe3ebb8ea972f5a72c17955
2017-08-02 16:15:58 -07:00
Yunqing Wang
bfd0f41f9b Force the bit exactness in the first pass
Originally, to keep the first pass fast, the first-pass stats for
row_mt_mode = 0 and row_mt_mode = 1 were not bit exact, but the difference
was small enough that it did not cause a mismatch between the final
bitstreams. However, if the encoder changes, this minor difference may
cause a mismatch. Thus, this patch always forces the first pass to be bit
exact.

BUG=webm:1453

Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8
2017-08-02 15:58:39 -07:00
Johann Koenig
787970a625 Merge "quantize test: add speed comparison" 2017-08-02 21:16:35 +00:00
Marco
b9577e07fc vp8: Drop due to overshoot for non-screen content.
For 1 pass CBR mode:
Apply the logic for dropping (and re-adjusting rate control)
due to large overshoot to the case of non-screen content when
drop_frames_allowed is enabled.

For the non-screen content case: add additional condition that
rate correction factor is close to minimum state, and flag to
constrain the frequency of the dropping.

Also handle the case of temporal layers and multi-res encoding.
Add some flags/counters to the layer context for temporal layers.
For multi-res: drop due to overshoot is checked on lowest stream,
and if overshoot is detected we force drops on all upper streams
for that frame.

This feature is to avoid large frame sizes on big content
changes following a low-content period.

No change in behavior for screen_content_mode = 2.

Change-Id: I797ab236cbbf3b15cad439e9a227fbebced632e6
2017-08-02 13:12:48 -07:00
Scott LaVarnway
698e56f26c Merge "vpxdsp: variance_impl_avx2.c cleanup" 2017-08-02 19:08:10 +00:00
Johann
1059b5cc52 quantize test: add speed comparison
Test some possible scenarios.

Change-Id: I1a612e7153b31756be66390ceea55877856d5a33
2017-08-02 09:33:35 -07:00
Scott LaVarnway
632fe8286a vpxdsp: variance_impl_avx2.c cleanup
BUG=webm:1404

Change-Id: I8d8498009e5ef7bf1137e4ff16ec81738a020b02
2017-08-02 05:57:39 -07:00
shiyou yin
0e87b16022 Merge "loongson mmi configuration patch." 2017-08-02 01:08:43 +00:00
Linfeng Zhang
6738ad7aaf Update x86 idct optimization
Move constant coefficients preparation into inline function.

Change-Id: I1d000c6b161794c8828ff70768439b767e2afea1
2017-08-01 14:40:12 -07:00
Linfeng Zhang
c0490b52b1 Merge "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2" 2017-08-01 21:39:39 +00:00
Johann Koenig
847394fe77 Merge "neon: vpx_quantize_b" 2017-08-01 16:44:31 +00:00
Paul Wilkins
3be14200fc Merge "Respond more rapidly to excessive local overshoot." 2017-08-01 08:58:36 +00:00
Marco Paniconi
c22b17dcef Merge "vp9: Adjust noise estimation for 360p." 2017-08-01 02:48:13 +00:00
Marco
5d6c1c2d8f vp9: Adjust noise estimation for 360p.
Change-Id: Ib76875232491b14f7114061e8e913e87004427a0
2017-07-31 17:12:58 -07:00
Linfeng Zhang
bf14d468c1 Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2
This replaces commit aa1c4cd, which had a bug and was reverted in
commit 3c73e58.

The bug was caused by rounding -step1[5] in highbd_idct8x8_12_half1d().

Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
2017-07-31 16:36:13 -07:00
James Zern
78e2da3e42 Merge "highbd_inv_txfm_sse4: make << of neg. val a multiply" 2017-07-31 22:43:41 +00:00
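In C, left-shifting a negative signed integer is undefined behavior (C99 and C11, section 6.5.7). That is presumably why this change replaced the shift with a multiply. The sketch below shows the pattern. The helper name is hypothetical, and the sketch does not show the exact lines the change touched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. "x << bits" is undefined for negative x, so
 * multiply by 2^bits instead; the result matches the intended shift
 * whenever the product fits in int32_t (bits assumed in 0..30). */
static int32_t scale_by_pow2(int32_t x, int bits) {
  return x * ((int32_t)1 << bits);
}
```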
Johann
2d6b5df657 neon: vpx_quantize_b
With skip block or coeff < zbin it is about twice as fast as C.

If most coeff values are > zbin it is about 10-15x as fast as C.

BUG=webm:1426

Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
2017-07-31 10:38:46 -07:00
YinShiyou
2758de5cb2 loongson mmi configuration patch.
enable loongson mmi optimization: ../configure --enable-mmi

Change-Id: I7792c3adeac1d5b573917d7857bba6c1cc05fea5
2017-07-31 17:29:36 +00:00
Marco Paniconi
ebb023deb6 Merge "Revert "Revert "vp9: Speed feature to adapt partition based on source_sad.""" 2017-07-31 14:58:15 +00:00
Marco
999bd6ea84 vp9: Fix denoising condition when pickmode partition is used.
When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. The current condition was based on
the speed level. This change makes the condition apply at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).

Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04
2017-07-30 23:16:38 -07:00