Commit Graph

16948 Commits

Author SHA1 Message Date
paulwilkins
76550dfdc0 Bug in scale_sse_threshold()
The function scale_sse_threshold() returns a threshold scaled
if necessary for use with 10 and 12 bit from an 8 bit baseline.

SSE error values would be expected to rise for the 10 and 12
bit cases where there are more bits of precision.

Hence the threshold used for the test should also be scaled up.

Change-Id: I4009c98b6eecd1bf64c3c38aaa56598e0136b03d
2017-02-15 10:45:03 +00:00
paulwilkins
945ccfee59 Additional first pass stats.
Added counts that split the intra coded blocks into low and high variance.

Change-Id: Ic540144b34d5141659081bb22f7ee16fd6861f14
2017-02-15 10:44:37 +00:00
Paul Wilkins
7635ee0f37 Merge "Aggressive VBR method." 2017-02-15 10:37:02 +00:00
James Zern
1cd926d665 vpx_temporal_svc_encoder.sh: remove FUNCNAME bashism
replace with an explicit output file prefix that matches the function
name

Change-Id: I7f6a4105adb34327b1099a5fbf132aa8d1ad5b90
2017-02-14 23:44:00 -08:00
Johann Koenig
61927ba4ac Merge "vp9 fdct higbd neon: connect existing highbd calls" 2017-02-15 01:33:00 +00:00
Linfeng Zhang
e07e74fb0f Add vpx_highbd_idct16x16_38_add_c()
When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.

BUG=webm:1301

Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
2017-02-14 17:25:52 -08:00
Yunqing Wang
f2c1aea118 Merge "Row based multi-threading of encoding stage" 2017-02-15 00:54:10 +00:00
Ranjit Kumar Tulabandu
71061e9332 Row based multi-threading of encoding stage
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.

Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.

Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
2017-02-15 00:49:34 +00:00
Linfeng Zhang
615566aa81 Merge "Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts" 2017-02-15 00:46:29 +00:00
Johann
86fed469ec vp8_dx_iface: remove unused 'else' condition
Clears up static clang analysis warning regarding a dead store.

Change-Id: If4fe7a9a7f94c6e2001d46136944f90712e543b4
2017-02-15 00:05:41 +00:00
Johann
327a02d77e Use 'packssdw' for loading tran_low_t values
This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.

Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
2017-02-14 22:39:49 +00:00
Johann
3e7aa8fda9 vp9 fdct higbd neon: connect existing highbd calls
Change-Id: Ia8f822bd6e70b3911bc433a5a750bfb6f9a3a75c
2017-02-14 22:11:49 +00:00
Johann Koenig
9c2bb7f342 Merge "quantize_fp highbd neon: use tran_low_t for coeff" 2017-02-14 21:28:23 +00:00
Linfeng Zhang
429e652809 Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts
Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6
2017-02-14 13:08:41 -08:00
clang-format
4b402746ca apply clang-format
Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce
2017-02-14 12:45:52 -08:00
James Zern
f670628ca5 .clang-format: update to 3.9.1
Change-Id: Ia51f2201df897651067d09122075953382b59139
2017-02-14 12:39:54 -08:00
Yi Luo
c1a90dc160 Merge "Replace idct32x32_34_add_ssse3 assembly with intrinsics" 2017-02-14 20:13:27 +00:00
Yi Luo
bd86de1ac8 Replace idct32x32_34_add_ssse3 assembly with intrinsics
- No user-level speed performance change.
- Pass unit tests.

Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6
2017-02-14 10:38:36 -08:00
Johann
2b24aa87d9 quantize_fp highbd neon: use tran_low_t for coeff
Change-Id: I90fd815f15884490ad138f35df575a00d31e8c95
2017-02-14 10:26:10 -08:00
Johann
25301a84a8 vp8 onyx_if: assert divide by zero
Clears up static clang analysis warning regarding divide by zero.

Trying to explain to the compiler how it's impossible to avoid
incrementing num_blocks at least once is difficult.

Change-Id: Ibaae43be572e5cd7a689b440dcd341c17d33443b
2017-02-14 04:27:31 +00:00
Johann Koenig
eeb288d568 Merge "Remove UNINITIALIZED_IS_SAFE" 2017-02-14 03:02:51 +00:00
Linfeng Zhang
de9ae32b93 Merge "Add vpx_highbd_idct16x16_256_add_neon()" 2017-02-14 01:15:34 +00:00
Johann
8a1fb40273 Remove UNINITIALIZED_IS_SAFE
Where clang static analysis or gcc -Wmaybe-uninitialized warns of
uninitialized values, assign 0 to ints, MB_MODE_COUNT to
MB_PREDICTION_MODE, and B_MODE_COUNT to B_PREDICTION_MODE.

Assert that the modes have been changed from the invalid value by
the end of the function.

Change-Id: Ib11e1ffb08f0a6fe4b6c6729dc93b83b1c4b6350
2017-02-14 00:56:08 +00:00
Linfeng Zhang
5ad4159ebb Add vpx_highbd_idct16x16_256_add_neon()
BUG=webm:1301

Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
2017-02-13 15:50:33 -08:00
Johann Koenig
4526ec7907 Merge "fdct8x8 highbd neon: use tran_low_t for output" 2017-02-13 23:11:30 +00:00
Johann
5ecde212a8 fdct8x8 highbd neon: use tran_low_t for output
Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba
2017-02-13 22:16:14 +00:00
Yunqing Wang
318ca07657 The bitstream bit match test in multi-threaded encoder
While the new-mt mode is enabled(namely, allowing to use row-based
multi-threading in encoder), several speed features that adaptively
adjust encoding parameters during encoding would cause mismatch
between single-thread encoded bitstream and multi-thread encoded
bitstream. This patch provides a set_control API to disable these
features, so that the bit match bitstream is obtained in the unit
test.

Change-Id: Ie9868bafdfe196296d1dd29e0dca517f6a9a4d60
2017-02-13 13:02:26 -08:00
Yunqing Wang
e7db593a46 Merge "Minor code style refactoring" 2017-02-13 21:01:41 +00:00
James Zern
45664383f1 Merge "cosmetics,vp9_ratectrl: apply clang-format" 2017-02-13 21:01:18 +00:00
James Zern
7a48bfab47 Merge "vpx_usec_timer_elapsed: use 64-bit math" 2017-02-13 21:00:33 +00:00
Yunqing Wang
f024518387 Minor code style refactoring
Change-Id: I20107693d0a87e08a10520bfb573ff3dcef69fdb
2017-02-13 12:59:01 -08:00
James Zern
3c4ea94210 cosmetics,vp9_ratectrl: apply clang-format
broken since:
c3f095c8b Merge "Fix to avoid abrupt relaxation of max qindex in recode path"
5f21aba4b Fix to avoid abrupt relaxation of max qindex in recode path

the original change pre-dated the addition of .clang-format

Change-Id: If5e399d9a805bcad9147360b13b36fbc8c560a7c
2017-02-13 11:29:39 -08:00
Linfeng Zhang
016933ad48 Add vpx_highbd_idct{16x16,32x32}_1_add_neon()
and update vpx_highbd_idct8x8_1_add_neon()

BUG=webm:1301

Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4
2017-02-13 10:25:22 -08:00
paulwilkins
ce7b38459a Aggressive VBR method.
VBR method that allows a wider Q range for the first normal frame
in each ARF group and then centers the min - max range for the rest of
the arf group on the chosen Q value for that first frame.

This allows for quite rapid adjustment of the active Q range even if the
initial estimate is poor.

In some cases where the ARF frames themselves are tending to
undershoot but the normal frames are overshooting this can still give
net undershoot. This can be corrected by allowing a larger Q delta for
arf frames but is usually is a sign that the allocation to the arfs was to
high.

Change-Id: Icec87758925d8f7aeb2dca29aac0ff9496237469
2017-02-13 15:42:11 +00:00
James Zern
91f87e7513 Merge "Add vpx_idct16x16_38_add_neon()" 2017-02-11 03:42:36 +00:00
Marco
22dcfa80aa vp9: Non-rd mode: use simple block_yrd for 8 bit high bitdepth builds
Temporary fix until optimization work for block_yrd is completed.
This essentially reverts back to the state before the change:
https://chromium-review.googlesource.com/c/433821/

Compression loss is about ~5-6% on RTC set.
Speed-up (from using this simple/model-based block_yrd) over the low
bitdepth builds (which uses more complex block_yrd) is ~5% on 720p.

Change-Id: Ie0af9eb0d111e5595f587870c44f08317403b8d8
2017-02-10 10:15:35 -08:00
James Zern
943f9c0356 vpx_usec_timer_elapsed: use 64-bit math
this prevents a rollover when tv_sec is a long:
signed integer overflow: 2776 * 1000000 cannot be represented in type
'long'

Change-Id: I03dc4476ee122b02e2856dad28358a20cf16a9f8
2017-02-09 19:28:59 -08:00
Paul Wilkins
c3f095c8b3 Merge "Fix to avoid abrupt relaxation of max qindex in recode path" 2017-02-09 17:17:55 +00:00
Paul Wilkins
82b88a7fd0 Merge "Fix for max qindex calculation of a gf interval" 2017-02-09 17:17:44 +00:00
Linfeng Zhang
bc1c18e18c Add vpx_idct16x16_38_add_neon()
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.

Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
2017-02-08 12:15:22 -08:00
Yi Luo
ac04d11abc Replace idct8x8_12_add_ssse3 assembly code with intrinsics
- Performance achieves the same as assembly.
- Unit tests pass.

Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd
2017-02-08 10:07:45 -08:00
Linfeng Zhang
0fefc6873a Merge "Add vpx_idct16x16_38_add_c()" 2017-02-08 17:20:19 +00:00
Johann Koenig
b73f99745b Merge "block_error_fp highbd sse2: use tran_low_t for coeff" 2017-02-07 23:26:10 +00:00
Marco Paniconi
71f5314993 Merge "vp9: Denoiser speed-up: increase partition and ac skip thresholds." 2017-02-07 22:25:00 +00:00
Yunqing Wang
b106abe570 Merge "Row based multi-threading of ARNR filtering stage" 2017-02-07 19:55:41 +00:00
Marco Paniconi
259e835b1b Merge "vp9: Adjust rate_err threshold for setting active_worst factor." 2017-02-07 19:25:47 +00:00
Marco
1a5482d4d8 vp9: Denoiser speed-up: increase partition and ac skip thresholds.
Add factor to increase varianace partition and ac skip thresholds,
under certain conditions (noise level and sum_diff), to increase
denoiser speed.

Change-Id: I7671140ef3598bf5f114a72623d68792bcd7b77b
2017-02-07 10:33:13 -08:00
Linfeng Zhang
cf76ee2cb7 Add vpx_idct16x16_38_add_c()
When eob is less than or equal to 38 for 16x16 idct, call this function.

Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087
2017-02-07 09:40:51 -08:00
Marco
3c2f076ad0 vp9: Adjust rate_err threshold for setting active_worst factor.
Only affects 1 pass vbr.
Small improvement on ytlive set.

Change-Id: I09a7456fe658fbea82ece1035cf683bd8bd8bd14
2017-02-07 09:38:16 -08:00
Linfeng Zhang
66695533a8 Merge "Update 16x16 8-bit idct NEON intrinsics" 2017-02-07 16:52:40 +00:00