1347 Commits

Author SHA1 Message Date
Alex Converse
15dac923b9 Merge "Narrow cat6_high_cost tables to uint16_t" 2017-03-03 23:45:39 +00:00
Alex Converse
bcd12de6c3 Narrow cat6_high_cost tables to uint16_t
Saves 2688 bytes of rodata.

Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56
2017-03-03 23:09:12 +00:00
Vignesh Venkatasubramanian
5881601488 vp9: Rename new_mt to row_mt
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level paralellism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.

Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
2017-02-27 09:43:26 -08:00
Johann
904b957ae9 consolidate block_error functions
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implemention was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.

In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.

Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
2017-02-24 05:25:26 +00:00
Ranjit Kumar Tulabandu
5127e58dab Structured the mode ordering code to avoid redundant memcpy
Change-Id: I4f5d6b54018bd1928cd9e5e42619e6f55b334803
2017-02-16 14:12:33 +00:00
Ranjit Kumar Tulabandu
71061e9332 Row based multi-threading of encoding stage
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.

Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.

Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
2017-02-15 00:49:34 +00:00
Yunqing Wang
770c6663d6 Merge "Changes to facilitate row based multi-threading of ARNR filtering" 2017-02-01 22:04:15 +00:00
Ranjit Kumar Tulabandu
359a6796da Changes to facilitate row based multi-threading of ARNR filtering
Change-Id: I2fd72af00afbbeb903e4fe364611abcc148f2fbb
2017-02-01 13:03:52 -08:00
Johann
bfd62cdaff vp9_rdopt: declare 'c' closer to use
Clears up static clang analysis warning regarding a dead store. Only
declare 'c' when it will be used.

Change-Id: I1ac0fc7f94bc44da63938c63cd1efcd6b95e0eb3
2017-02-01 19:58:24 +00:00
Jingning Han
969957f9f2 Fix real-time compression regression in hbd mode
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.

The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.

The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.

BUG=webm:1365

Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
2017-01-31 23:17:09 -08:00
Debargha Mukherjee
e6446b4b60 Refactor uv tx size with lookup arrays
Change-Id: Ife6a3d301c5faaba89d16d188d638631083511f7
2016-08-31 13:15:38 -07:00
paulwilkins
635ae8bdc1 Adjust coefficient optimization and tx_domain rd speed features.
Previously Tx domain rd was used in all cases above speed 0.
Coefficient optimization was only enabled for best and speed 0.

This patch selectively sets these features at other speed settings
based on block complexity.

For the Netflix and HD sets in particular the quality gains are
large compared to the speed hit. At speed 1 the average psnr
gain in the NF set  is > 2.5% with one clip coming in at 18%
and some points almost 30%.  Average gains for the lower
resolution test sets are around 1%.

The gains are biggest at low Q so some further optimization
may be possible.

Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576
2016-08-25 15:36:16 +01:00
Yunqing Wang
a413dbe594 Fix another motion vector out of range bug
This patch fixed a motion vector out of range bug:
vpxenc: ../libvpx/vp9/encoder/vp9_mcomp.c:69:
 mv_cost: Assertion `mv->col >= -((1 << (11 + 1 + 2)) - 1) &&
 mv->col < ((1 << (11 + 1 + 2)) - 1)' failed.

For blocks that returned without having full-pixel search, the original
MV limits were not restored, which caused the failure. Moved the set
MV limit function down to fix the bug.

Change-Id: Id7d798fc7214e95c6e4846c588f0233fcf1a4223
2016-08-12 09:27:58 -07:00
Alex Converse
6554333b59 Refactor mv limits.
Change-Id: Ifebdc9ef37850508eb4b8e572fd0f6026ab04987
2016-08-08 11:54:00 -07:00
Yunqing Wang
2fb826c4d5 Fix a motion vector out of range bug
This patch fixed a motion vector(MV) out of range bug, which was caused
by not restoring the original values of the MV min/max thresholds after
the sub8x8 full pixel motion search. It occurred rarely and only was seen
while encoding a 4k clip for 200 frames.

BUG=webm:1271

Change-Id: Ibc4e0de80846f297431923cef8a0c80fe8dcc6a5
2016-08-05 15:23:05 -07:00
Yaowu Xu
7a79fa1362 Fix msvc compiler warnings
MSVC 2013 complained about using 32 shift where 64 bit shift should be
used.

Change-Id: I7a2b165d1a92d3c0a91dd4511b27aba7709b5e55
2016-08-03 18:33:06 -07:00
clang-format
e0cc52db3f vp9/encoder: apply clang-format
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
2016-08-02 16:47:11 -07:00
Alex Converse
335cf67d8b Fix 64 to 32 narrowing warning.
- Solves potential integer overflow on 12-bit
- Fixes Visual Studio build

Change-Id: I26dd660451bbab23040e4123920d59e82585795c
2016-07-27 12:40:23 -07:00
Alex Converse
d6c5ef4557 Only consider visible 4x4s in pixel domain error.
BDRATE change
derf144: -0.327
lowres: -0.048
midres: -0.125
hdres: -0.238

Change-Id: I789aba9870b5c2952373a7dd4fc8ed45590c3c54
2016-07-25 21:44:06 +00:00
Scott LaVarnway
c969b2b02b VP9: get_pred_context_switchable_interp() -- encoder side
Change-Id: I7217c90d5cf38c51b76759a2dc4f10070f3a40ac
2016-07-21 11:47:51 -07:00
Scott LaVarnway
2e93fcf893 Merge "vp9_rd_pick_intra_mode_sb(): set interp_filter to" 2016-07-11 22:31:06 +00:00
Scott LaVarnway
ed7786869a vp9_rd_pick_intra_mode_sb(): set interp_filter to
SWITCHABLE_FILTERS.  This is a partial fix for the build
issues with Change 357240.

Change-Id: I4e507c196175bae729a4f1397878ec8776b0146c
2016-07-09 09:47:34 -07:00
Jingning Han
2f28f9072e Enable coeff optimization for intra modes
This further improves the coding performance by
lowres 0.3%
midres 0.5%
hdres  0.6%

Change-Id: I6a03b6da210b9cbc261474bad4a103e0ba021c68
2016-07-07 12:25:41 -07:00
Jingning Han
62aa642d71 Enable uniform quantization with trellis optimization in speed 0
This commit allows the inter prediction residual to use uniform
quantization followed by trellis coefficient optimization in
speed 0. It improves the coding performance by

lowres 0.79%
midres 1.07%
hdres  1.44%

Change-Id: I46ef8cfe042a4ccc7a0055515012cd6cbf5c9619
2016-07-07 12:25:33 -07:00
Jingning Han
541eb78994 Refactor coeff_cost() function
Move the operations that update the context buffers outside this
function. The coeff_cost() takes all input as const value and returns
the coefficient cost.

This makes preparation for the next coefficient optimization CLs.

Change-Id: I850eec6e5470b91ea84646ff26b9231b09f70a0c
2016-07-07 18:09:39 +00:00
Jingning Han
e357b9efe0 Support measure distortion in the pixel domain
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.

Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
2016-07-06 18:25:17 -07:00
Jingning Han
14011f037d Remove txfrm_block_to_raster_xy() from vp9 encoder
The transform block row and column positions are always available
outside the callees. There is no need to re-compute these values
again. This approach has been used by the decoder. This commit
removes txfrm_block_to_raster_xy() function.

Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108
2016-07-04 18:41:47 -07:00
Scott LaVarnway
74bb78df82 Merge "VP9: handle_inter_mode()... Use interp_filter" 2016-06-29 11:41:52 +00:00
Scott LaVarnway
feb7e9a372 VP9: handle_inter_mode()... Use interp_filter
only if above/left is inter.

Change-Id: I0cc1f926425c021c84536df8271e9ee5f3f87caf
2016-06-28 14:09:59 -07:00
James Zern
ca88d22f39 s/UINT32_MAX/UINT_MAX/
provides better toolchain compatibility

Change-Id: I8561a6de668a68ff54fe3886a4ee6300f0ae9c04
2016-06-25 12:15:51 -07:00
James Zern
b34705f64f Merge "cosmetics: Beautify whitespaces and line wrapping" 2016-06-24 21:51:01 +00:00
Yury Gitman
67611119b5 cosmetics: Beautify whitespaces and line wrapping
Change-Id: I9afa02cae671bd3527cf344695e53d0cc767f549
2016-06-24 10:18:06 -07:00
Yaowu Xu
7738bcb350 Rationalize type to avoid integer out of range
BUG=webm:1250

Change-Id: Id5bb2762ca1bf996ba4f9a60eec977a7994c1d94
2016-06-24 13:58:02 +00:00
Yaowu Xu
b3933e2d3c Merge "Fix ubsan warnings: vp9/encoder/vp9_mcomp.c" 2016-06-22 00:12:58 +00:00
Yaowu Xu
87bf1a149c Fix ubsan warnings: vp9/encoder/vp9_mcomp.c
This commit fixes a number of ubsan warnings in HBD build.

BUG=webm:1219

Change-Id: I05f0fd0ef50e93db4ba34205005c54af1ed32acc
2016-06-21 15:37:59 -07:00
hui su
a5af392aae Add a hardware compatibility feature
This commit adds an encoder workaround to support better
compatibility with a non-compliant hardware vp9 profile 2 decoder.

The known issue with this decoder is:
The decoder assumes a wrong value, 127 instead of the correct
value of 511 and 2047, for any assumed top-left corner pixel in
UV planes for 10 and 12 bit, respectively. Such assumed
top-left corner pixel is used for INTRA prediction when a real
decoded/reconstructed pixel is not avalable, e.g. when it is
located inside the row above the top row or inside the column
left to the leftest column of a video image.

Change-Id: Ic15a938a3107e1b85e96cb7903a5c4220986b99d
2016-06-21 10:33:57 -07:00
Scott LaVarnway
ba962a5f37 VP9: Eliminate up_available and left_available
Use above_mi and left_mi instead.

Change-Id: I0b50e232c31d11da30aa2fb6f91a695aaf725e0c
2016-03-30 04:47:39 -07:00
Julia Robson
74a679de6f Port "cost_coeff speed improvements" to vp9.
About a 5% faster overall encode (perf cycles) at speed zero!

Change-Id: Iaf013ba75884415cd824e98349f654ffb1c3ef33
2016-02-26 14:47:18 -08:00
Jingning Han
d642294b1c Fix tsan error in VP9 sub8x8 intra mode search
This commit fixes issue 1141. The issue was triggered in multi-tile
encoding. The change properly saves and restores the block context
information in the real-time mode selection process. It removes
several redundant memcpy operations in sub8x8 intra block mode search.

Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010
2016-02-16 11:24:09 -08:00
Jingning Han
f032c7eaed Merge "Account for sub8x8 block skip mode cost in RD decision" 2016-02-08 19:40:01 +00:00
Jingning Han
203bdd20fb Account for sub8x8 block skip mode cost in RD decision
Make this consistent with regular block size rate-distortion
optimization. It improves the compression performance:
derf    0.055%
hevcmr  0.129%

Change-Id: I112fe734f592c21bc7aa6efb7e3f269c4214ee7b
2016-02-08 10:18:51 -08:00
Jingning Han
ac6d40ece8 Clean up in vp9_rd_pick_inter_mode_sb
Use local variable.

Change-Id: I0d3df36cf4536958a0cda422f6c30da50f0e0bbf
2016-02-08 10:15:02 -08:00
Jingning Han
bcce658d31 Use precise rate cost estimate for skip block mode
It improves the compression performance of VP9 by 0.1% across all
test sets. No speed change is observed.

Change-Id: I59338c5c9e67bae22188f35fc3afbfe2a6bba6b0
2016-02-03 11:09:16 -08:00
Alex Converse
d13385cee7 Switch to 9-bit rate cost constants built on a 256 probability denominator.
-.220 BDRATE derf: https://x20web.corp.google.com/~aconverse/results/cost256_derf.html
-.675 BDRATE hevcmr: https://x20web.corp.google.com/~aconverse/results/cost256_hevcmr.html

Change-Id: Ifb1646d8ce65ffe0eff9953a911b1b88735b335f
2016-01-27 19:34:30 +00:00
Alex Converse
4326cffa65 Merge "Tie the bit cost scale to a define." 2016-01-21 19:17:56 +00:00
Scott LaVarnway
5232326716 VP9: Eliminate MB_MODE_INFO
Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185
2016-01-19 16:40:20 -08:00
Alex Converse
269428e35c Tie the bit cost scale to a define.
This is a pure-refactor in preparation to potentially raise the bit-cost
resolution.

Verified at good speed 0 and rt speed -6.

Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
2016-01-15 15:59:31 -08:00
Scott LaVarnway
a85e552d95 VP9: Remove decoder args from find_mv_refs_idx()
The decoder does not use this function.

Change-Id: Ie67f909c0f4108ef286789c70df867d4b960a780
2016-01-13 13:30:40 -08:00
Yaowu Xu
9cac17d157 Enable encoder to avoid 8x4 or 4x8 partitions
This commit enables encoder to avoid 8x4 and 4x8 partitions for
scaled reference frames when libvpx is configured and built with
--enable-better-hw-compatibility

Change-Id: I02ad65c386f5855f4325d72570c49164ed52f413
2016-01-07 09:53:14 -08:00
Yaowu Xu
650a2d7628 Fix a typo
Change-Id: I12de2dd5e5f375551804166188d76a9ad8067b41
2016-01-07 09:29:34 -08:00