1341 Commits

Author SHA1 Message Date
Yunqing Wang
770c6663d6 Merge "Changes to facilitate row based multi-threading of ARNR filtering" 2017-02-01 22:04:15 +00:00
Ranjit Kumar Tulabandu
359a6796da Changes to facilitate row based multi-threading of ARNR filtering
Change-Id: I2fd72af00afbbeb903e4fe364611abcc148f2fbb
2017-02-01 13:03:52 -08:00
Johann
bfd62cdaff vp9_rdopt: declare 'c' closer to use
Clears up static clang analysis warning regarding a dead store. Only
declare 'c' when it will be used.

Change-Id: I1ac0fc7f94bc44da63938c63cd1efcd6b95e0eb3
2017-02-01 19:58:24 +00:00
Jingning Han
969957f9f2 Fix real-time compression regression in hbd mode
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.

The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.

The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.

BUG=webm:1365

Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
2017-01-31 23:17:09 -08:00
Debargha Mukherjee
e6446b4b60 Refactor uv tx size with lookup arrays
Change-Id: Ife6a3d301c5faaba89d16d188d638631083511f7
2016-08-31 13:15:38 -07:00
paulwilkins
635ae8bdc1 Adjust coefficient optimization and tx_domain rd speed features.
Previously Tx domain rd was used in all cases above speed 0.
Coefficient optimization was only enabled for best and speed 0.

This patch selectively sets these features at other speed settings
based on block complexity.

For the Netflix and HD sets in particular the quality gains are
large compared to the speed hit. At speed 1 the average PSNR
gain in the NF set is > 2.5% with one clip coming in at 18%
and some points almost 30%. Average gains for the lower
resolution test sets are around 1%.

The gains are biggest at low Q so some further optimization
may be possible.

Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576
2016-08-25 15:36:16 +01:00
Yunqing Wang
a413dbe594 Fix another motion vector out of range bug
This patch fixed a motion vector out of range bug:
vpxenc: ../libvpx/vp9/encoder/vp9_mcomp.c:69:
 mv_cost: Assertion `mv->col >= -((1 << (11 + 1 + 2)) - 1) &&
 mv->col < ((1 << (11 + 1 + 2)) - 1)' failed.

For blocks that returned without having full-pixel search, the original
MV limits were not restored, which caused the failure. Moved the set
MV limit function down to fix the bug.
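
As a small worked example of the bound in the assertion quoted above, the expression `(1 << (11 + 1 + 2)) - 1` evaluates to 16383. The helper below is a minimal sketch of that range check; the names `SKETCH_MV_BOUND` and `sketch_mv_component_ok` are hypothetical and are not taken from libvpx.

```c
#include <stdbool.h>

/* Bound copied from the assertion text: (1 << (11 + 1 + 2)) - 1 == 16383. */
#define SKETCH_MV_BOUND ((1 << (11 + 1 + 2)) - 1)

/* Hypothetical helper: true when one MV component passes the quoted check.
 * Note the check is asymmetric: -BOUND is accepted, +BOUND is rejected. */
bool sketch_mv_component_ok(int v) {
  return v >= -SKETCH_MV_BOUND && v < SKETCH_MV_BOUND;
}
```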

Change-Id: Id7d798fc7214e95c6e4846c588f0233fcf1a4223
2016-08-12 09:27:58 -07:00
Alex Converse
6554333b59 Refactor mv limits.
Change-Id: Ifebdc9ef37850508eb4b8e572fd0f6026ab04987
2016-08-08 11:54:00 -07:00
Yunqing Wang
2fb826c4d5 Fix a motion vector out of range bug
This patch fixed a motion vector (MV) out of range bug, which was caused
by not restoring the original values of the MV min/max thresholds after
the sub8x8 full pixel motion search. It occurred rarely and was seen only
while encoding a 4k clip for 200 frames.
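
The save-and-restore pattern described here can be sketched as follows. This is an illustration under assumptions, not the libvpx code: `SketchMvLimits`, `sketch_narrow_and_search` and `sketch_search_block` are hypothetical names, and the "search" only narrows the window so that the restore is observable.

```c
/* Hypothetical stand-in for the encoder's MV search window. */
typedef struct {
  int col_min, col_max, row_min, row_max;
} SketchMvLimits;

/* Hypothetical search step: narrows the window in place and may
 * return early (0) without searching, leaving the window modified. */
int sketch_narrow_and_search(SketchMvLimits *lim, int skip_search) {
  lim->col_min /= 2;
  lim->col_max /= 2;
  lim->row_min /= 2;
  lim->row_max /= 2;
  if (skip_search) return 0; /* early exit path */
  return 1;
}

/* Save the limits first and restore them on every exit path, so the
 * caller never sees the narrowed window. */
int sketch_search_block(SketchMvLimits *lim, int skip_search) {
  const SketchMvLimits saved = *lim;
  const int searched = sketch_narrow_and_search(lim, skip_search);
  *lim = saved;
  return searched;
}
```

Restoring in the caller rather than inside the search routine keeps the early-return path from skipping the restore.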

BUG=webm:1271

Change-Id: Ibc4e0de80846f297431923cef8a0c80fe8dcc6a5
2016-08-05 15:23:05 -07:00
Yaowu Xu
7a79fa1362 Fix msvc compiler warnings
MSVC 2013 complained about using a 32-bit shift where a 64-bit shift
should be used.
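
A minimal sketch of the kind of change that avoids this class of warning, assuming the usual cause (an `int` shifted and only then widened to 64 bits). The function name `sketch_shift64` is hypothetical; the commit message does not show the actual code.

```c
#include <stdint.h>

/* Widen the operand before shifting so the shift itself is done in
 * 64 bits. Writing (int64_t)(value << bits) instead would shift in
 * 32 bits first and lose the high bits. Assumes value >= 0. */
int64_t sketch_shift64(int value, int bits) {
  return (int64_t)value << bits;
}
```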

Change-Id: I7a2b165d1a92d3c0a91dd4511b27aba7709b5e55
2016-08-03 18:33:06 -07:00
clang-format
e0cc52db3f vp9/encoder: apply clang-format
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
2016-08-02 16:47:11 -07:00
Alex Converse
335cf67d8b Fix 64 to 32 narrowing warning.
- Solves potential integer overflow on 12-bit
- Fixes Visual Studio build

Change-Id: I26dd660451bbab23040e4123920d59e82585795c
2016-07-27 12:40:23 -07:00
Alex Converse
d6c5ef4557 Only consider visible 4x4s in pixel domain error.
BDRATE change
derf144: -0.327
lowres: -0.048
midres: -0.125
hdres: -0.238

Change-Id: I789aba9870b5c2952373a7dd4fc8ed45590c3c54
2016-07-25 21:44:06 +00:00
Scott LaVarnway
c969b2b02b VP9: get_pred_context_switchable_interp() -- encoder side
Change-Id: I7217c90d5cf38c51b76759a2dc4f10070f3a40ac
2016-07-21 11:47:51 -07:00
Scott LaVarnway
2e93fcf893 Merge "vp9_rd_pick_intra_mode_sb(): set interp_filter to" 2016-07-11 22:31:06 +00:00
Scott LaVarnway
ed7786869a vp9_rd_pick_intra_mode_sb(): set interp_filter to
SWITCHABLE_FILTERS.  This is a partial fix for the build
issues with Change 357240.

Change-Id: I4e507c196175bae729a4f1397878ec8776b0146c
2016-07-09 09:47:34 -07:00
Jingning Han
2f28f9072e Enable coeff optimization for intra modes
This further improves the coding performance by
lowres 0.3%
midres 0.5%
hdres  0.6%

Change-Id: I6a03b6da210b9cbc261474bad4a103e0ba021c68
2016-07-07 12:25:41 -07:00
Jingning Han
62aa642d71 Enable uniform quantization with trellis optimization in speed 0
This commit allows the inter prediction residual to use uniform
quantization followed by trellis coefficient optimization in
speed 0. It improves the coding performance by

lowres 0.79%
midres 1.07%
hdres  1.44%

Change-Id: I46ef8cfe042a4ccc7a0055515012cd6cbf5c9619
2016-07-07 12:25:33 -07:00
Jingning Han
541eb78994 Refactor coeff_cost() function
Move the operations that update the context buffers outside this
function. The coeff_cost() takes all input as const value and returns
the coefficient cost.

This prepares for the next coefficient optimization CLs.

Change-Id: I850eec6e5470b91ea84646ff26b9231b09f70a0c
2016-07-07 18:09:39 +00:00
Jingning Han
e357b9efe0 Support measure distortion in the pixel domain
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.
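
For illustration, a pixel domain distortion metric can be as simple as the sum of squared errors between source and reconstructed pixels. The sketch below assumes 8-bit samples and a plain SSE; the name `sketch_pixel_sse` and its signature are hypothetical, and the commit message does not state which exact metric the encoder uses.

```c
#include <stdint.h>

/* Sum of squared differences between a source block and its
 * reconstruction, each addressed with its own row stride. */
uint64_t sketch_pixel_sse(const uint8_t *src, int src_stride,
                          const uint8_t *recon, int recon_stride,
                          int width, int height) {
  uint64_t sse = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      const int d = src[r * src_stride + c] - recon[r * recon_stride + c];
      sse += (uint64_t)(d * d);
    }
  }
  return sse;
}
```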

Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
2016-07-06 18:25:17 -07:00
Jingning Han
14011f037d Remove txfrm_block_to_raster_xy() from vp9 encoder
The transform block row and column positions are always available
outside the callees. There is no need to re-compute these values
again. This approach has been used by the decoder. This commit
removes txfrm_block_to_raster_xy() function.

Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108
2016-07-04 18:41:47 -07:00
Scott LaVarnway
74bb78df82 Merge "VP9: handle_inter_mode()... Use interp_filter" 2016-06-29 11:41:52 +00:00
Scott LaVarnway
feb7e9a372 VP9: handle_inter_mode()... Use interp_filter
only if above/left is inter.

Change-Id: I0cc1f926425c021c84536df8271e9ee5f3f87caf
2016-06-28 14:09:59 -07:00
James Zern
ca88d22f39 s/UINT32_MAX/UINT_MAX/
provides better toolchain compatibility

Change-Id: I8561a6de668a68ff54fe3886a4ee6300f0ae9c04
2016-06-25 12:15:51 -07:00
James Zern
b34705f64f Merge "cosmetics: Beautify whitespaces and line wrapping" 2016-06-24 21:51:01 +00:00
Yury Gitman
67611119b5 cosmetics: Beautify whitespaces and line wrapping
Change-Id: I9afa02cae671bd3527cf344695e53d0cc767f549
2016-06-24 10:18:06 -07:00
Yaowu Xu
7738bcb350 Rationalize type to avoid integer out of range
BUG=webm:1250

Change-Id: Id5bb2762ca1bf996ba4f9a60eec977a7994c1d94
2016-06-24 13:58:02 +00:00
Yaowu Xu
b3933e2d3c Merge "Fix ubsan warnings: vp9/encoder/vp9_mcomp.c" 2016-06-22 00:12:58 +00:00
Yaowu Xu
87bf1a149c Fix ubsan warnings: vp9/encoder/vp9_mcomp.c
This commit fixes a number of ubsan warnings in HBD build.

BUG=webm:1219

Change-Id: I05f0fd0ef50e93db4ba34205005c54af1ed32acc
2016-06-21 15:37:59 -07:00
hui su
a5af392aae Add a hardware compatibility feature
This commit adds an encoder workaround to support better
compatibility with a non-compliant hardware vp9 profile 2 decoder.

The known issue with this decoder is:
The decoder assumes a wrong value, 127, instead of the correct
values of 511 and 2047 for 10 and 12 bit, respectively, for any
assumed top-left corner pixel in the UV planes. Such an assumed
top-left corner pixel is used for INTRA prediction when a real
decoded/reconstructed pixel is not available, e.g. when it is
located inside the row above the top row or inside the column
to the left of the leftmost column of a video image.
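
The three values quoted above fit the pattern 2^(bit depth - 1) - 1. The sketch below only reproduces that arithmetic; the formula is inferred from the numbers in this message rather than taken from the libvpx source, and `sketch_corner_value` is a hypothetical name.

```c
/* Assumed formula that reproduces the quoted values:
 * 8 bit -> 127, 10 bit -> 511, 12 bit -> 2047. */
int sketch_corner_value(int bit_depth) {
  return (1 << (bit_depth - 1)) - 1;
}
```

Under this reading, the non-compliant decoder behaves as if the bit depth were always 8 when it fills in the missing corner pixel.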

Change-Id: Ic15a938a3107e1b85e96cb7903a5c4220986b99d
2016-06-21 10:33:57 -07:00
Scott LaVarnway
ba962a5f37 VP9: Eliminate up_available and left_available
Use above_mi and left_mi instead.

Change-Id: I0b50e232c31d11da30aa2fb6f91a695aaf725e0c
2016-03-30 04:47:39 -07:00
Julia Robson
74a679de6f Port "cost_coeff speed improvements" to vp9.
About a 5% faster overall encode (perf cycles) at speed zero!

Change-Id: Iaf013ba75884415cd824e98349f654ffb1c3ef33
2016-02-26 14:47:18 -08:00
Jingning Han
d642294b1c Fix tsan error in VP9 sub8x8 intra mode search
This commit fixes issue 1141. The issue was triggered in multi-tile
encoding. The change properly saves and restores the block context
information in the real-time mode selection process. It removes
several redundant memcpy operations in sub8x8 intra block mode search.

Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010
2016-02-16 11:24:09 -08:00
Jingning Han
f032c7eaed Merge "Account for sub8x8 block skip mode cost in RD decision" 2016-02-08 19:40:01 +00:00
Jingning Han
203bdd20fb Account for sub8x8 block skip mode cost in RD decision
Make this consistent with regular block size rate-distortion
optimization. It improves the compression performance:
derf    0.055%
hevcmr  0.129%

Change-Id: I112fe734f592c21bc7aa6efb7e3f269c4214ee7b
2016-02-08 10:18:51 -08:00
Jingning Han
ac6d40ece8 Clean up in vp9_rd_pick_inter_mode_sb
Use local variable.

Change-Id: I0d3df36cf4536958a0cda422f6c30da50f0e0bbf
2016-02-08 10:15:02 -08:00
Jingning Han
bcce658d31 Use precise rate cost estimate for skip block mode
It improves the compression performance of VP9 by 0.1% across all
test sets. No speed change is observed.

Change-Id: I59338c5c9e67bae22188f35fc3afbfe2a6bba6b0
2016-02-03 11:09:16 -08:00
Alex Converse
d13385cee7 Switch to 9-bit rate cost constants built on a 256 probability denominator.
-.220 BDRATE derf: https://x20web.corp.google.com/~aconverse/results/cost256_derf.html
-.675 BDRATE hevcmr: https://x20web.corp.google.com/~aconverse/results/cost256_hevcmr.html

Change-Id: Ifb1646d8ce65ffe0eff9953a911b1b88735b335f
2016-01-27 19:34:30 +00:00
Alex Converse
4326cffa65 Merge "Tie the bit cost scale to a define." 2016-01-21 19:17:56 +00:00
Scott LaVarnway
5232326716 VP9: Eliminate MB_MODE_INFO
Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185
2016-01-19 16:40:20 -08:00
Alex Converse
269428e35c Tie the bit cost scale to a define.
This is a pure-refactor in preparation to potentially raise the bit-cost
resolution.

Verified at good speed 0 and rt speed -6.

Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
2016-01-15 15:59:31 -08:00
Scott LaVarnway
a85e552d95 VP9: Remove decoder args from find_mv_refs_idx()
The decoder does not use this function.

Change-Id: Ie67f909c0f4108ef286789c70df867d4b960a780
2016-01-13 13:30:40 -08:00
Yaowu Xu
9cac17d157 Enable encoder to avoid 8x4 or 4x8 partitions
This commit enables encoder to avoid 8x4 and 4x8 partitions for
scaled reference frames when libvpx is configured and built with
--enable-better-hw-compatibility

Change-Id: I02ad65c386f5855f4325d72570c49164ed52f413
2016-01-07 09:53:14 -08:00
Yaowu Xu
650a2d7628 Fix a typo
Change-Id: I12de2dd5e5f375551804166188d76a9ad8067b41
2016-01-07 09:29:34 -08:00
Jingning Han
27bbfd652d Fix sub8x8 motion search on scaled reference frame
This commit makes the sub8x8 block rate-distortion optimization
scheme use precise motion compensated prediction to compute the rd
cost. It fixes a potential buffer overflow issue related to sub8x8
motion search on scaled reference frame.

Change-Id: I4274992ef4f54eaacfde60db045e269c13aaa2de
2015-12-11 10:08:51 -08:00
Alex Converse
b1fcd1751e Fix unsigned overflow in rd_variance_adjustment.
Found with clang -fsanitize=integer

Change-Id: I2538e7483cb2d5f06bceecbd3326bdd88bfecfa1
2015-11-19 15:00:59 -08:00
paulwilkins
0149fb3d6b Changes to exhaustive motion search.
This change alters the nature and use of exhaustive motion search.

Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.

Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.

For example:
  stage 1: Range +/- 64 interval 4
  stage 2: Range +/- 32 interval 2
  stage 3: Range +/- 15 interval 1

This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
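
The staged search described above can be sketched as follows, using the range and interval values from the example. This is an illustration only: the types, the function names and the toy cost function (squared distance to a target position passed through `ctx`) are hypothetical, and the real encoder uses a distortion metric on pixel data instead.

```c
typedef struct { int row, col; } SketchMv;
typedef unsigned int (*SketchCostFn)(SketchMv mv, const void *ctx);
typedef struct { int range, interval; } SketchMeshStage;

/* Stage values taken from the example in the commit message. */
static const SketchMeshStage kSketchStages[3] = {
  { 64, 4 }, { 32, 2 }, { 15, 1 }
};

/* Toy cost for demonstration: squared distance to a target MV. */
unsigned int sketch_sq_dist_cost(SketchMv mv, const void *ctx) {
  const SketchMv *target = (const SketchMv *)ctx;
  const int dr = mv.row - target->row;
  const int dc = mv.col - target->col;
  return (unsigned int)(dr * dr + dc * dc);
}

/* Each stage scans a grid around the best position of the previous
 * stage, with a smaller range and a finer interval. */
SketchMv sketch_mesh_search(SketchMv start, SketchCostFn cost,
                            const void *ctx) {
  SketchMv best = start;
  unsigned int best_cost = cost(start, ctx);
  for (int s = 0; s < 3; ++s) {
    const SketchMv center = best; /* fixed for the whole stage */
    const int range = kSketchStages[s].range;
    const int step = kSketchStages[s].interval;
    for (int r = -range; r <= range; r += step) {
      for (int c = -range; c <= range; c += step) {
        const SketchMv cand = { center.row + r, center.col + c };
        const unsigned int cand_cost = cost(cand, ctx);
        if (cand_cost < best_cost) {
          best_cost = cand_cost;
          best = cand;
        }
      }
    }
  }
  return best;
}

/* Run the mesh search only when the step search result is poor,
 * i.e. its cost is above the threshold. */
SketchMv sketch_refine(SketchMv step_best, unsigned int threshold,
                       SketchCostFn cost, const void *ctx) {
  if (cost(step_best, ctx) <= threshold) return step_best;
  return sketch_mesh_search(step_best, cost, ctx);
}
```

With these stages the search evaluates 33 x 33 + 33 x 33 + 31 x 31 = 3139 positions, compared with 129 x 129 = 16641 for a full +/- 64 search at step 1.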

This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).

For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.

Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 dB with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.

Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
2015-11-13 10:16:31 +00:00
hui su
6ab6ac450b Use accurate bit cost for uv_mode in UV intra mode RD selection
On derflr, +0.1% for VP10; however, -0.03% on VP9.

Change-Id: I09c724232ede74254043d61d3cadc506256af0af
2015-11-06 14:45:43 -08:00
Geza Lore
aa8f85223b Optimize vp9_highbd_block_error_8bit assembly.
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.

Two further optimizations are applied:

1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
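
The second optimization can be sketched in plain C as a fast 32-bit pass with a 64-bit fallback. This is a sketch under assumptions: the commit message does not say how the assembly detects a possible overflow, so the sticky wrap flag used here is a substitute technique, and the names `sketch_block_error` and `sketch_block_error_64` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Slow path: exact accumulation in 64 bits. */
uint64_t sketch_block_error_64(const int16_t *coeff, const int16_t *dqcoeff,
                               size_t n) {
  uint64_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    const int32_t d = (int32_t)coeff[i] - (int32_t)dqcoeff[i];
    sum += (uint64_t)((int64_t)d * d);
  }
  return sum;
}

/* Fast path: accumulate in 32 bits and record whether any add wrapped.
 * Only if a wrap was seen is the whole sum recomputed in 64 bits. */
uint64_t sketch_block_error(const int16_t *coeff, const int16_t *dqcoeff,
                            size_t n) {
  uint32_t sum32 = 0;
  int wrapped = 0;
  for (size_t i = 0; i < n; ++i) {
    const int32_t d = (int32_t)coeff[i] - (int32_t)dqcoeff[i];
    const uint32_t ud = (uint32_t)(d < 0 ? -d : d); /* |d| <= 65535 */
    const uint32_t sq = ud * ud;                    /* fits in 32 bits */
    sum32 += sq;            /* unsigned add wraps modulo 2^32 */
    wrapped |= (sum32 < sq); /* result below an operand means a wrap */
  }
  return wrapped ? sketch_block_error_64(coeff, dqcoeff, n) : sum32;
}
```

The fallback costs a second pass over the block, which pays off only because the wrap case is rare, as the message notes.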

The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.

Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
2015-10-21 12:30:40 +01:00
Geza Lore
0134764fa6 Optimization of 8bit block error for high bitdepth
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.

Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
2015-10-08 14:05:25 -07:00