This commit removes the cyclic aq mode dependency on
in_static_area and reworks the corresponding cut-off thresholds.
It improves the compression performance of speed -5 by 1.47% in
PSNR and 2.07% in SSIM, and the compression performance of speed
-6 by 3.10% in PSNR and 5.25% in SSIM. Speed wise, about 1% faster
in both settings at high bit-rates.
Change-Id: I1ffc775afdc047964448d9dff5751491ba4ff4a9
This will save the memory and improve the decode speed due to
removing unnecessary memset of big prev_mi array for
all the key frames.
Decoding a all key frames 1080p video shows speed improve around 2%.
Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10
This commit makes the RTC coding mode to conditionally skip the
reference frame mode search, when the predicted motion vector of
the current reference frame gives more than two times sum of
absolute difference compared to that of other reference frames.
It reduces the runtim by 1% - 4% for speed -5 and -6. The average
compression performance is improved by about 0.1% in both settings.
It is of particular benefit to light change scenarios. The
compression performance of test clip mmmovingvga.y4m is improved by
6.39% and 15.69% at high bit rates for speed -5 and -6, respectively.
Speed -5
vidyo1 16555 b/f, 40.818 dB, 12422 ms ->
16552 b/f, 40.804 dB, 12100 ms
nik 33211 b/f, 39.138 dB, 11341 ms ->
33228 b/f, 39.139 dB, 11023 ms
mmmoving 33263 b/f, 40.935 dB, 13508 ms ->
33256 b/f, 41.068 dB, 12861 ms
Speed -6
vidyo1 16541 b/f, 40.227 dB, 8437 ms ->
16540 b/f, 40.220 dB, 8216 ms
nik 33272 b/f, 38.399 dB, 7610 ms ->
33267 b/f, 38.414 dB, 7490 ms
mmmoving 33255 b/f, 40.555 dB, 7523 ms ->
33257 b/f, 40.975 dB, 7493 ms
Change-Id: Id2aef76ef74a3cba5e9a82a83b792144948c6a91
This commit unfolds the legacy macro definitions used in the
sub-pixel motion search and refactors the operational flow for
later optimizations.
Change-Id: I3e3f770cad961d03d1a6eb0b2a0186cc77eaf2b8
The current logic was allowing for disabling golden refresh only
for two pass svc encoding. This change disables it as long as
more than 1 layer encoding is used (for example temporal layers under 1pass CBR).
Change-Id: I4dc5204a7ad365c821ec7963e93b59da82e1826b
A recent change has introduced big quality drops for speed 7 and 12
for --rt mode. The change reverted the big drop and improved quality
by 9.5% for speed 7 and 13.4% for speed 12.
Change-Id: I07b82e3bb6002a73af486a083458c88877bdad01
This will save a lot of memory for decoder due to removing of prev_mi,
but prev_mi is still needed in encoder. So this will increase a little bit
memory for encoder.
Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
This commit makes the inter prediction buffer system to support
hybrid partition search. It reduces the runtime of speed -5 by
about 3%. No compression performance change.
vidyo1 720p 1000 kbps
11831 ms -> 11497 ms
nik 720p 1000 kbps
10919 ms -> 10645 ms
Change-Id: I5b2da747c6395c253cd074d3907f5402e1840c36
Combined vp9_denoiser_8xM_sse2 and vp9_denoiser_4xM_sse2 into one
function vp9_denoiser_NxM_sse2_small and passed the bitexact testing.
Changed the name of the function vp9_denoiser_64_32_16xM_sse2 to
vp9_denoiser_NxM_sse2_big.
Change-Id: Ib22478df585994dd347ebae04202c0b701e7f451
This commit changes to allow the usage of golden reference frame in
VP9 CBR mode to improve quality. VP9 supports potentially up to 8
reference buffers, it has reference buffers available for this
purpose. This was not possible in VP8 as golden and alt-ref buffers
were used for temporal scalability purpose in CBR mode in WebRTC.
For frames that update golden frame, there can be a quality boost.
The amount of allowed bitrate boost can be controlled via parameter
rc_max_inter_bitrate_pct. The inital value of the boost ratior is
currently based on over_shoot_pct. Further experiments will work
out the adaption of this boost value.
Change-Id: I0c5f010c8fd8b7b598f69779c1b30e5b2ac30a4d
Added code to relax the active maximum Q in response
to extreme local overshoot to reduce bandwidth peaks.
The impact is small in metrics terms, but it this helps reduce
bandwidth spikes and overall overshoot in a number of
clips in our tests sets (especially the YT test set).
In particular this should help prevent very big spikes where a clip
is mainly easy but has a short hard section. In such a case a choice
of maximum Q for the clip as a whole may allow us to hit the overall
target rate but give some extreme spikes. The chunked encoding in YT
mitigates this problem but it can show up where a longer clip is
coded as a single chunk.
Change-Id: I213d09950ccb8489d10adf00fda1e53235b39203
The zero motion vector was effectively used in the subsampled pixel
based variance calculation. This commit makes it directly use zero
mv to generate prediction.
Change-Id: Ica83dc843e9f8da2f89c3ef451e50f16214c0def
0 means that golden boost is off, and uses average frame target rate,
a non-zero number means the percentage of boost over average frame
bitrate is given initially to golden frames in CBR mode.
Change-Id: If4334fe2cc424b65ae0cce27f71b5561bf1e577d
The point at which frames are scaled to their
coded dimensions is moved into the re-code loop.
This is in preparation for a further patch that
will add logic into the re-code loop to reduce
the coded frame size if the encoder is struggling
to hit the target data rate at the native frame
size.
Change-Id: Ie4131f5ec6fb93148879f6ce96123296442bf2d1
Add second level arf Q adjustment when using dual arfs
in constant Q mode.
Previously in constant Q mode enabling dual arf hurt by ~5%
but with this change the average benefit is ~1-1.5% with some
mid range data points up ~10%.
Note however that it still hurts on some clips including
some very low motion show content.
Change-Id: I5b7789a2f42a6127d9e801cc010c20a7113bdd9b
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.
Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
The initialization of this_mode_pred does not work when the ref_frame
loop ever goes beyond LAST_FRAME. This commit fixes the subtle issue
and allows potentially expanding the loop to test GOLDEN_FRAME.
Change-Id: Ibbd427a22160d1d9eacb8ed0c87f88d6cef9c0f3
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.
Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
This commit refactors the rate distortion structure used in the
non-RD coding mode and saves a few RDCOST calculations.
Change-Id: I62c3416c300d2c5372f21b96d93a6b633a34ab3a
The existing speed features produce horrible encoding results, almost
30% worse than cpu-used=4, this commit adjust the speed features to
produce relatively resonable results to be within 3%-5% of cpu-used=4.
Change-Id: I0ca6ebafb33024d4a0cbcf04c78a4a00b8dd1ecf
Its functionality has been replaced with choose_partitioning and
threshold based control on split mode check.
Change-Id: Ic9bb321df06b524f5c38ea5874dc6f6a8f93c5e3
This speed feature has been deprecated in both yt and rtc coding
modes. This commit removes the related operations.
Change-Id: I079c79c6adafe45581af2ebf8b98faebcface1ce
This commit re-designs the recursive partition search scheme in
rtc speed -5. It first checks if the current block is under cyclic
refresh mode. If so, apply recursive partition search. Otherwise,
perform sub-sampled pixel based partition selection. When the
pre-selection finds the partition size should be 32x32 or above,
use the partition size directly. Otherwise, apply partition search
at nearby levels around the preset partition size.
It is enabled in speed -5. The compression performance of rtc
speed -5 is improved by 9.4%. Speed wise, the run-time goes slower
from 1% to 10%.
nik_720p, 1000 kbps
33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms
vidyo1_720p, 1000 kbps
16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms
Change-Id: I65adba352e3adc03bae50854ddaea1b421653c6c
Extend --auto-alt-ref from parameter so we can use it to
turn multi-arf on and off from the command line.
For now the range is 0-off, 1-on, 2-multi-arf on.
Rename play_alternate to enable_auto_arf
Change-Id: Id7b64407cfbe76ba0090a83b588a03e22a240386
All sad function that process above 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%
both of them gave and overall ~2.3% user level gain
Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
Currently, the tokens for a tile are stored immediately after its
preceding tile, which causes a dependency. This is unnecessary
since we always allocate enough memory for tokens. Removing
the dependency allows token writing done in parallel. This patch
doesn't change encoding result.
Change-Id: I7365a6e5e2c2833eb14377c37e1503c9d0f26543
When early termination is triggered, properly reset the rate cost
to invalid value to avoid potential ioc issue.
Change-Id: I3444390be2e49a34bb02cf8a74c33d5dbd96d88d
Delete gfboost_qadjust() and move Q based adjustment
into calc_frame_boost(). Also remove clamping. Making
the adjustment here means that it influences not just the
boost level but also the selection of the GF/ARF interval.
This change gives a small average gain in PSNR but
larger gains in SSIM, especially for harder std-hd set (1.5%)
Change-Id: I3aa81b8feccaeff93d915e19fb9cf5cd64c86327
This commit fixes an ioc issue that will happen when the cumulative
variables are not in effective use. The fix discards these
redundant additions.
Change-Id: Idbac5bfb989c0cedc5f8a323effce938519b2457
This removes an unnecessary restriction that causes
a problem (noticed by AWG) when the forced key frame
interval is set to a very small value, such as 10. In this case
we were being forced to code minimal length GF groups.
Change-Id: I76ef5861a09638ff51f61fea02359554184ada53
We encode a empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
Change remerged.
Change-Id: I9efab38bba7da86e056fbe8f663e711c5df38449
This reverts commit 452dc21500.
This change has introduced a significant quality regression on content
with forced key frames. (e.g. the YT and yt-hd set). It is most
noticeable in static content where the kf bits dominate. Here, despite
key frames being apparently coded at the same Q, there is a drop in all
metrics of ~20% (e.g clXR and BFa0).
Change-Id: Iba14cc61778c0846fa0a59c33c55a9fc49512cb4
Compare the estimated rate and distortion to the thresholds scaled
according to the operating block size and determine if further
split partition search will be run. The compression performance of
speed -5 is changed by -0.074%. The encoding speed is 10% - 15%
faster.
vidyo1 720p
16545 b/f, 40.492 dB, 11475 ms -> 16535 b/f, 40.486 dB, 10100 ms
nik720p
16624 b/f, 36.310 dB, 10071 ms -> 16617 b/f, 36.313 dB, 8346 ms
Change-Id: Ic9197ab5761279ae55d2fb7813b2af0e0db497b8
Reduce the intra_cost_penalty for non-rd mode,
and some updates to VAR_BASED_PARTITION.
Visual tests show some improvement at Speed 6, for RTC clips.
Change-Id: If9090daf7aed14906a32d931a538ab544bbca606
This commit replaces the use of copy_partitioning with
choose_partitioning based on the sse of subsamped pixels, which
provides significantly better coding performance and runs at
similar speed, as compared to copy_partitioning. It improves rtc
speed 5 coding performance by 3%.
Change-Id: I52d3682a12dce0147f5e52383a594fc242ca3228
We encode a empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
Change-Id: I60b680314e33a60b4093cafc296465ee18169c19
Move the point at which input frames are scaled
into the recode loop. This will allow us to change
the coded frame size dynamically in response
to previous attempts to encode the frame at a
higher resolution.
A following patch will implement a scheme for
resizing the frame in the recode loop.
Change-Id: I6a59c02d6ac1626512edad6de8b60063b79433e6
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.
Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
Add back clamp which ensures that the Q adaptation
is turned off when the over_shoot_pct and under_shoot_pct
parameters are set to 100.
Change-Id: Id0161b114d39a3029cd3eb28020caab0c3914922
Allow min and maxQ to creep when the undershoot
or overshoot exceeds thresholds controlled by the
command line under_shoot_pct and over_shoot_pct
values.
Default is 100%,100% which ~disables adaptation.
Derf results for example undershoot% / overshoot%:-
Head:- Mean abs (%rate error) = 14.4%
This check in:-
25%/25% - Mean abs (%rate error) = 6.7%
PSNR hit -1% SSIM -0.1%
5% / 5% - Mean abs (%rate error) = 2.2%
PSNR hit -3.3% SSIM - 1.1%
Most of the remaining error and most of the quality hit is
at extreme data rates. The adaptation code still has an
exception for material that is in effect static so that we
don't over adjust and over spend on YT slide show type
content.
(Rebase of If25a2449a415449c150acff23df713e9598d64c9
to resolve a auto-merge error)
Change-Id: Iec4e1613ef0d067454751d8220edb7058dfbd816
Allow min and maxQ to creep when the undershoot
or overshoot exceeds thresholds controlled by the
command line under_shoot_pct and over_shoot_pct
values.
Default is 100%,100% which ~disables adaptation.
Derf results for example undershoot% / overshoot%:-
Head:- Mean abs (%rate error) = 14.4%
This check in:-
25%/25% - Mean abs (%rate error) = 6.7%
PSNR hit -1% SSIM -0.1%
5% / 5% - Mean abs (%rate error) = 2.2%
PSNR hit -3.3% SSIM - 1.1%
Most of the remaining error and most of the quality hit is
at extreme data rates. The adaptation code still has an
exception for material that is in effect static so that we
don't over adjust and over spend on YT slide show type
content.
Change-Id: If25a2449a415449c150acff23df713e9598d64c9
- Some fixes to surface fit.
- Returns variance function as cost rather than sad in the
pattern search and diamond search functions. Only
vp9_pattern_search_sad function used in bigdia search
uses sad as integer 1-away costs.
- Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.
Results:
derf [Speed 3]: About +0.036% in coding efficiency without any
discernible speed loss.
derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.
Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
In model_rd_for_sb function, the spatial domain SSE and variance
are checked to see if transform coefficients are quantized to 0.
Besides that, this patch adds another set of thresholds that are
much more strict. These thresholds are used to conduct a partition
block level check to measure if all its TX blocks are skippable
for YUV planes. If it is true, x->skip is set for this partition
block, and thus its mode search is terminated.
This speeds up the encoding at very low prediction error case,
such as screen sharing application. This patch covers what
rd_encode_breakout_test() does, so that function is removed.
Borg test at speed 3 shows:
For stdhd set, psnr: +0.008%, ssim: +0.014%;
For derf set, psnr: +0.018%, ssim: +0.025%.
No noticeable speed change.
Change-Id: I4e5f15cf10016a282a68e35175ff854b28195944
For input source with size that is not multiple of 8, the size is
rounded to 8 and saved in width or height, the original source sizes
are saved in crop_width and crop_height. This commit corrects the
computation of bottom and right extension amounts to use the orignal
sizes, hence crop_width and crop_height.
In addition, this commit also adds the missed initialization for
uv_crop_width and uv_crop_height.
This addresses issue #834
Change-Id: I084543ca7645a4964b88f7cf8ff668f517d3a39b
The concept:
There's too much noise in source pixels for variance and at low bitrate
the reconstructed looks nothing like the source so we have problems
getting good partitionings with either. This skirts the issue by using
a box blur scaled down version for variance calculations. To compare
against source_var_ moved keyframe to be rd based like source_var.
Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
Fixed an encoder crash. Set skip_txfm to 0 for cases that skip_txfm
isn't calculated. Put memcpy of skip_txfm at right place.
Change-Id: Ib3b6afc1b251a85b2a853c8138fb3393f48cfef6
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.
Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
Add comments on the use case of these definitions. Further reduce
the scope of header file in vp9_context_tree.h.
Change-Id: Ic4a7638e838d0ac441b64abfc56e57354c059d75
This commit fixes a buffer pointer mis-use in store_coding_context.
The compression performance for stdhd set of speed 3 is improved by
0.097%. It fixes issue 869.
Change-Id: Idc59e22035eaf39f7133ca04174894374d647ff7
This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are
only 16x16 blocks in denoiser, while in VP9, there are 13 different
block sizes.
By adding this SSE2 code, the improvement of encoder speed is around
20%(using C code vs using SSE2 code), vary for different clips.
The unit test for VP9 denoiser is to confirm that the SSE2 code is
bit-exact with the C code. The unit test covers all block size.
Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
Adjustments to the GF interval choice and minimum boost.
Adjustment to the calculation of 2 pass worst q.
Compared to 09/29 head there is metrics hit on derf of
(-0.123%,-0.191%)
Compared to the September 29 head and a baseline on
September 18 baseline the accuracy of the VBR rate control
measured on the derf set is as follows:-
Mean error % / Mean abs(error %)
Sept 18 baseline (-7.0% / 14.76%)
Sept 29 head (-15.7%, 19.8%)
This check in (-1.5% / 14.4%)
The mean undershoot is reduced slightly but the
worst case overshoot on e.g. harbour/highway is
increased. This will be addressed in a later patch.
Change-Id: Iffd9b0ab7432a131c98fbaaa82d1e5b40be72b58
It is possible that the GOLDEN reference frame is not avaiable, in
which setting the predicted mv will be associated with a residual
value of INT_MAX. This commit checks this condition before
left shift and comparison with that of ALTREF frame, to avoid
overflow issue.
Change-Id: Ib98c3149dbdd016f2fe5beaafb13f67d469dd07c
This commit adds proper initialization of segment id for variance AQ
mode in non-rd coding path. It fixes the enc/dec mismatch issue of
rt=7 with --aq-mode=1, as reported in issue #816
Change-Id: I02fa41b96345bf2e66077d5ea553f85ba800f7bb
This commit enables the encoder to skip split partition search if
the bigger block size has all non-zero quantized coefficients in low
frequency area and the total rate cost is below a certain threshold.
It logarithmatically scales the rate threshold according to the
current block size. For speed 3, the compression performance loss:
derf -0.093%
stdhd -0.066%
Local experiments show 4% - 20% encoding speed-up for speed 3.
blue_sky_1080p, 1500 kbps
51051 b/f, 35.891 dB, 67236 ms ->
50554 b/f, 35.857 dB, 59270 ms (12% speed-up)
old_town_cross_720p, 1500 kbps
14431 b/f, 36.249 dB, 57687 ms ->
14108 b/f, 36.172 dB, 46586 ms (19% speed-up)
pedestrian_area_1080p, 1500 kbps
50812 b/f, 40.124 dB, 100439 ms ->
50755 b/f, 40.118 dB, 96549 ms (4% speed-up)
mobile_calendar_720p, 1000 kbps
10352 b/f, 35.055 dB, 51837 ms ->
10172 b/f, 35.003 dB, 44076 ms (15% speed-up)
Change-Id: I412e34db49060775b3b89ba1738522317c3239c8
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation though the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also incudes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
Miscellaneous bug-fixes for high bitdepth functionality.
With this patch, high bit-depth profiles become mostly functional,
except for an intermittent assert failure issue that is being
tracked.
Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c
This commit removes unused header file vp9_onyxc_int.h and repeatedly
included file vpx_ports/mem.h from vp9_block.h
Change-Id: I400b210bd1da48f1880bd50a8f4a6e2c690e15a1
Block transform skipping was implemented based on DCT's energy
conservation property. Modified the thresholds using zero bin
parameters. AC and DC coefficients were checked separately to
allow better identifying of skippable blocks.
Borg test at speed 3 showed:
stdhd set: psnr gain: 0.153%, ssim gain: 0.051%;
derf set: psnr gain: 0.023%, ssim gain: 0.036%
For most test clips, the encoding speedup is 1% - 2%.
parkrun(720p): 7.5% speedup, park_joy(1080p): 3.5% speedup.
Change-Id: If28eb81113a077414f5ca7b021c14f9069b373bb
For regular inter frames, if the distance from GOLDEN_FRAME is larger
than 2 and if the predicted motion vector of LAST_FRAME gives lower
sse than that of GOLDEN_FRAME, skip the GOLDE_FRAME mode checking in
the rate-distortion optimization. It provides about 5% speed-up at
expense of -0.137% and -0.230% performance down for speed 3. Local
experiment results:
pedestrian 1080p 2000 kbps
66712 b/f, 40.908 dB, 113688 ms ->
66768 b/f, 40.911 dB, 108752 ms
blue_sky 1080p 2000 kbps
51054 b/f, 35.894 dB, 70406 ms ->
51051 b/f, 35.891 dB, 67236 ms
old_town_cross 720p 1500 kbps
14412 b/f, 36.252 dB, 60690 ms ->
14431 b/f, 36.249 dB, 57346 ms
Change-Id: Idfcafe7f63da7a4896602fc60bd7093f0f0d82ca
When calculating delta in VP8 denoiser, since the block size is fixed to 16x16,
the divisor is 256, which is the number of the pixel.
But in VP9, the block size varies, the divisor should correspond to the block
size.
Change-Id: Ibdc1e5d23ba8c788b0d0dc6d406bcdfc34c1b142
One is a more aggressive version of the pruned subpel tree
search where only a single halfpel candidate is searched.
The search candidate is based on a surface fit result.
The other is a method to obtain the subpel position at one
shot based on the same surface fit.
The methods have not been deployed in any speed setting yet.
Change-Id: I34fef3f2e34f11396c9d1ba97f4be8c4ffca62d3
This commit enables the encoder to skip checking ALTREF inter modes
in ARF coding, if the predicted motion vectors suggest that the
GOLDEN_FRAME provides higher prediction accuracy than ALTREF_FRAME.
It improves the speed 3 encoding speed by about 5%, at the expense
of compression performance loss -0.041% and -0.225% for derf and
stdhd, respectively.
pedestrian_area 1080p 2000 kbps
66705 b/f, 40.909 dB, 118738 ms ->
66732 b/f, 40.908 dB, 113688 ms
old_town_cross 720p 1500 kbps
14427 b/f, 36.256 dB, 62746 ms ->
14412 b/f, 36.252 dB, 60690 ms
blue_sky 1080p 1500 kbps
51026 b/f, 35.897 dB, 73310 ms ->
50921 b/f, 35.893 dB, 70406 ms
bus CIF 1000 kbps
21301 b/f, 34.841 dB, 7326 ms ->
21248 b/f, 34.837 dB, 7196 ms
Change-Id: I76cf88b4d655e1ee3c0cb03c8a5745493040e8d2
This patch re-enabled the feature in Pengchong's patch
(commit 1286126073). Originally, it
was turned on while use_lastframe_partitioning > 0(not used anymore).
Now it was added as a feature, and turned on while speed >= 2.
As described in the original patch, this feature helps speed up the
slideshows in YouTube.
Change-Id: I1b0f18d65da1ee1c8d1e117dabba910c5207c471
Simplified the code and removed some code that was not used anymore.
This patch didn't change encoding result.
Change-Id: I7e54a74c8f35a6726dfc8a1c55b337448b7ea124
The rd_thresholds are adaptively changed based on best mode tested.
It was only changed for the same block size, this commit makes the
adaptation for similar block sizes too. The commit also made minor
adjustment and code cleanups.
The impact on encoding time for _ped:
118089 ms -> 111927 ms
The impact on compression:
derf: -0.339%
stdhd: -0.303%
Change-Id: I8817fed1102350497f2ec631849e43f753878e5d
Adds code to return an integer cost list for NSTEP search. Then
uses it for pruned subpel search in speed 3.
derf: -0.06%
Speed on mobcal 720p increaes from 10.28 fps to 10.65 fps.
[Subject to further testing].
Change-Id: Ib591382d25b2c11bcaba9d3a27a93a9d1ab27a96
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.
This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save these memories.
Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
vpx_svc_parameters_t contains id, resolution and min/max qp for each spatial layer.
In this change we will use extra config to send min/max qp and scaling factors, then calculate layer resolution inside encoder.
Change-Id: Ib673303266605fe803c3b067284aae5f7a25514a
Substantial restructuring of the way we estimate
the rate of decay in prediction quality and determine
the arf interval and amount of boost used.
Also other changes to support moving to a lower first pass
Q which exposes some new features and allows us to better
distinguish genuinely static blocks from low motion or noisy
blocks.
Net gains now visible on all the test sets with std-hd PSNR up
1.87%. There are still some bad outlier cases but most of these
are low motion or slide show type content where the metrics
are already high at any given rate. The best + case is up by
more than 10%.
Change-Id: I18e25170053bdf3188f493ff8062f48a74515815
The ARF frame should always be the same size as the
native resolution of the input frames.
It will be scaled to the required resolution at
encode time.
Change-Id: I0afe858129aa6ef65b1648f43476331715346896
This commit makes the encoder to use non-zero mode threshold for
NEARESTMV modes. The runtime for test clips of speed 3 is reduced
by about 1%.
pedestrian 1080p 2000 kbps, 143239 ms -> 141989 ms
bus CIF 1000 kbps, 7835 ms -> 7749 ms
The compression performance change is about -0.02% for both derf
and stdhd.
Change-Id: Ib71808922c41ae2997100cb7c561f68dcebfa08e
If the partition block is skippable, which means no coefficients
for Y, U, and V planes, its skip flag is set to 1. No quality
change (verified by borg tests), and no noticeable speed change.
Change-Id: I9231f720f8dd6364384cf05aa148ca24d75450f1
This commit enforces ARF validation check for compound inter modes.
It avoids potential access to ARF in the encoding process if it
is not allowed.
Change-Id: I055fec946b5d19d97937dc9001e1e564923e2439
The valid reference frame check in sub8x8 rate-distortion
optimization search has been included in the ref_frame_skip_mask
scheme. This commit removes the later further validation checks
that are not in effect.
Change-Id: I853b477c44037d3dc0afec6cbfce08a96c597a75
This commit replaces the best_ref_index table fetch with the use
of best_mbmode in vp9_rd_pick_inter_mode_sub8x8.
Change-Id: I882ee9ee6a8c0e61befcca1f4dba6d2ea8de8f13
Call to vp9_rc_get_second_pass_params() moved from
Pass2Encode() to earlier in vp9_get_compressed_data(),
to ensure that two pass stats and parameters are
available before decisions such as frame scaling.
Change-Id: If21537f0073919b04696a7d5e9aac78e23d76f39
When a reference frame type is not in the frame buffer, the mode
search threshold will be set to INT_MAX, so as to effectively
turn off the mode entries in the rate-distortion optimization loop
that involves this reference frame type. This operation is now
integrated in the ref_frame_skip_mask scheme. This commit hence
removes the redundant mode search threshold setting.
Change-Id: Ib18f45da611afda2af275201efd367df7f5101ab
This commit unifies the reference frame control in the rate-
distortion optimization search loop of sub8x8 block size to remove
the control dependency on mode search order.
Change-Id: I3a174099f71a7cc176ede9fd60e2374243ae9232
Improves function to return sad of integer pels by reusing integer
pels already visited in the smallest scale.
Turns on BIGDIA search for speed 4. Also, turns on the
first version of the pruned subpel search at this speed.
derf: -0.32% (speed 4)
Speed seems to improve by at least 5% but subject to verification.
Change-Id: Iaec8eaffd61d6237ac029e6a2a1b0a88b2a35271
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
The variable best_inter_rd is effectively not in use in the rate-
distortion mode search loops of both regular block sizes and sub8x8
block sizes.
Change-Id: I178f909f8c9629772e13adc6257908653b2adf31
The speed feature that skips compound inter prediction modes was
subsumed by other speed features and effectively was not in use.
This commit removes it.
Change-Id: I22b0c71a8ddd15d93b25d86fa63a1dce2ba6a1a9
Integrate intra mode mask speed feature with the mode_skip_mask
scheme. Move it outside the mode search loop in the
vp9_rd_pick_inter_mode_sb function.
Change-Id: I7738fea749bfdc08ad05d7f2524feb8ff67568d9
This speed feature is used in real-time setting only. Remove the
related condition check in the rate-distortion optimization search
loop.
Change-Id: Iaacc1e268214634e6f95c5048c28a60cec6c42fc
Refactor overlay frame speed-up related function. Make it unified
with the ref_frame_skip_mask system and Move it out of the mode
search loop.
Change-Id: I0dde9baf44354f6ba00b4679cba02fa6a30c7316
This commit refactor the rate-distortion optimization search for
regular block sizes to remove the speed feature dependency on mode
search order.
Change-Id: Ied033ee484c2957e17baa7b6450b720fe7dd0e7d
This issue is found when the denoising mode is set to kDenoiserOnYUVAggressive.
Updated the C code to make it the same with SSE version.
I also changed several lines in VP9 denoiser for the code style.
Change-Id: I640d48cf946fe8c6a400e6e252107501d1e226d3
This commit fixes a bug related to skipping intra mode checking, by
using a separate variable to store the best prediction error from
inter mode. It avoids unintentionally overwriting intra mode
rate-distortion cost, and hence affecting other speed features.
Change-Id: I99e12993339c84c8b4f597996b372012e5858fae
Assigning selected reference frame pointer is done in the
encode_superblock function. No need to do this at the end of
rate-distortion optimization search.
Change-Id: I33fcede0fd304b4a4c4deef2d126d79546a9c070
This commit refactors the vp9_rd_pick_inter_mode_sb function to
remove the intra mode early termination dependency on the mode
search order.
Change-Id: If6ac49aa7c530c7b9a5bd31b0ab84db83e192bec
This commit allows the encoder to find current best prediction mode
state using best_mbmode, instead of fetching from the static mode
search table via best_mode_index.
Change-Id: Ibefeab83aed33a49c2be03e83f09153856ca4271
The use of use_lastframe_partitioning is totally removed in good-
quality encoding. Its usage in real-time encoding needs to be
evaluated to see if it can be removed too.
The Borg tests at speed 4 showed:
stdhd set: 0.220% psnr gain, 0.166% ssim gain;
derf set: 0.329% psnr gain, 0.476% ssim gain.
Speed test on selected clips showed 1.54% speedup.(Worst case:
pedestrian_area_1080p25.y4m, speed loss: 1.5%)
Change-Id: I1c844d329b0b5678558439b887297c1be7ddab00
The speedup in rd_pick_partition() function makes it possible
to drop use_lastframe_partitioning feature. By doing that, we
achieve good PSNR gain with small speed loss. Also, this makes
encoding loop less complicated. The code cleanup patch will
follow.
Borg tests showed:
1. At speed 2,
stdhd set: 0.201% PSNR gain, 0.133% SSIM gain;
derf set: 0.262% PSNR gain, 0.276% SSIM gain.
2. At speed 3,
stdhd set: 0.139% PSNR gain, 0.109% SSIM gain;
derf set: 0.447% PSNR gain, 0.442% SSIM gain.
The average speed loss over selected test clips is within 1%
with the worst case of 4%.
Change-Id: Icfd2ded7869372b585a6972855d933b3d0280d90
The rate costs calculated for inter modes are not precise in some
cases, which causes NEWMV is chosen instead of NEARESTMV, NEARMV,
and ZEROMV. This patch added checks for these cases, and corrected
the mode decisions.
Borg tests at speed 3 showed:
1. stdhd set: 0.102% PSNR gain and 0.088% SSIM gain.
2. derf set: 0.147% PSNR gain and 0.132% SSIM gain.
No speed change.
Change-Id: I35d17684b89ad4734fb610942d707899146426db
This commit turns on adaptive motion search for ARF coding, in
addition to other normal inter frame coding. It improves the
average compression efficiency:
stdhd 0.1%
derf 0.04%
For the test sequences, the speed 3 runtime is reduced:
pedestrian 1080p 2000 kbps, 149932 ms -> 144580 ms, (3.3% speed-up)
bus CIF 1000 kbps, 8050 ms -> 7895 ms, (1.9%)
highway CIF 100 bkps, 45033 ms -> 44078 ms, (2.2%)
Change-Id: I5228565b609f99e8ae04f6140a2bf2b64a831d21
This is to keep the same with VP8 denoiser.
If motion magnitude is small,
make denoiser more aggressive.
Change-Id: I942a6e2f2ed9aec6f0c4c1f9e5fa47066cadcc0c
When the first try of denoising turns out to be too much,
we will use a softer filter by adopting an adjustment to
make the result closer to original pixel (as in VP8 denoiser).
The old code made the adjustment in the wrong direction.
Change-Id: I84e28fa9e01eef47c5a37d5a2e6d3d378a06786b
This commit allows the encoder to store outcomes of single reference
frame modes and compares them to decide if the inter prediction
filter, forward transform, and quantization can be skipped.
The compression performance of speed 3 is down
derf -0.364%
stdhd -0.198%
For test sequences, the speed 3 runtime is reduced
highway CIF 100 kbps, 51976 ms -> 45033 ms, 13% speed-up
stockholm 720p 1000 kbps, 71826 ms -> 67838 ms, 5.5% speed-up
pedestrian 1080p 2000 kbps, 154924 ms -> 150702 ms, 2.6% speed-up
Change-Id: I5aa26f918d2b4b5197a2c0afa2779319f1c88e44
intra_super_block_yrd() and inter_super_block_yrd() are largely same,
this commit merges them into one to reduce code duplication.
Change-Id: I64d7042a5b099345627cf55663010c185b25ec37
From 3 to 2, which seems to be slightly positive on compression for
all test sets, also reduces encoding time by 2%-5%, varying on the
test clips.
Change-Id: If045417bd27311700c919b4a335eff0dc1130ae0
This commit removes the special case for key frame, as transform size
decision is controlled by the appropriate speed feature for all lossy
coding modes: tx_size_search_method.
Change-Id: I9677171e3f2432ec23705f7c5ea8170dd4562fae
This commit allows the encoder to skip check on compound inter
modes in the rate-distortion optimization loop, if the reference
frame bias signs are the same.
Change-Id: Ib753e6bb11cbdd338aee69dbe2b649671f75a6b0
Adds config parameter vp9_highbitdepth, to support highbitdepth profiles.
Also includes most vpx level high bit-depth functions. However
encode/decode in the highbitdepth profiles will not work until
the rest of the code is in place.
Change-Id: I34c53b253c38873611057a6cbc89a1361b8985a6
It's built based on current spatial svc code.
We only support one spatial two temporal layers at this time.
Change-Id: I1fdc8584354b910331e626bfae60473b3b701ba1
This commit skips the compound inter mode prediction check in the
rate-distortion optimization loop for ARF coding. It reduces the
runtime for certain test clips at speed 3, at no compression
performance change:
bus CIF 1000 kbps, 8260 ms -> 8090 ms, 1.8% speed-up
stockholm 720p 1000 kbps, 74453 ms -> 71826 ms, 2.9% speed-up
No visible speed-up for pedestrian area 1080p at 2000 kbps.
Change-Id: Ic68aa56837159b726563b784e2e3729e846465ad
Use unsigned int type to store the sse in the pixel domain. The
precision is sufficient to handle sse of block size up to 64x64.
The transform domain version however needs int64_t, since there is
a transfer gain applied in the forward transformation that might
cause unsigned int overflow.
Change-Id: Ifef97c38597e426262290f35341fbb093cf0a079
This commit allows encoder to skip intra coding mode test, when
the known inter residual is less than the source variance. It
reduces the runtime of speed 3 for test clips:
bus cif 1000 kbps: 8587 ms -> 8260 ms, 3.8% speed-up
pedestrian 1080p 2000 kbps: 161381 ms -> 155241 ms, 3.7% speed-up.
The compression performance is down by
derf -0.36%
stdhd -0.25%
Change-Id: I75ce1e035b4da2153cb1ac14111d1a07c05a735d
This commit extends the sse and forward transform computation flag
to support the case 64x64 blocks where there are 4 32x32 2D-DCT
blocks.
Change-Id: I86a3e805dfaa0f3abd812f590520c71aa0e40473
In the partition search, the encoder checks all possible
partitionings in the superblock's partition search tree.
This patch proposed a set of criteria for partition search
early termination, which effectively decided whether or
not to terminate the search in current branch based on the
"skippable" result of the quantized transform coefficients.
The "skippable" information was gathered during the
partition mode search, and no overhead calculations were
introduced.
This patch gives significant encoding speed gains without
sacrificing the quality.
Borg test results:
1. At speed 1,
stdhd set: psnr: +0.074%, ssim: +0.093%;
derf set: psnr: -0.024%, ssim: +0.011%;
2. At speed 2,
stdhd set: psnr: +0.033%, ssim: +0.100%;
derf set: psnr: -0.062%, ssim: +0.003%;
3. At speed 3,
stdhd set: psnr: +0.060%, ssim: +0.190%;
derf set: psnr: -0.064%, ssim: -0.002%;
4. At speed 4,
stdhd set: psnr: +0.070%, ssim: +0.143%;
derf set: psnr: -0.104%, ssim: +0.039%;
The speedup ranges from several percent to 60+%.
speed1 speed2 speed3 speed4
(1080p, 100f):
old_town_cross: 48.2% 23.9% 20.8% 16.5%
park_joy: 11.4% 17.8% 29.4% 18.2%
pedestrian_area: 10.7% 4.0% 4.2% 2.4%
(720p, 200f):
mobcal: 68.1% 36.3% 34.4% 17.7%
parkrun: 15.8% 24.2% 37.1% 16.8%
shields: 45.1% 32.8% 30.1% 9.6%
(cif, 300f)
bus: 3.7% 10.4% 14.0% 7.9%
deadline: 13.6% 14.8% 12.6% 10.9%
mobile: 5.3% 11.5% 14.7% 10.7%
Change-Id: I246c38fb952ad762ce5e365711235b605f470a66
Updates the vp9_pattern_search function to return integer one-away
neighbors' sad values, for subsequent use in speeding up the
sub-pel search. Also, removes code for the do_refine option
which is not being used currently.
Updates the integer and subpel functions to pass in a 5-element
sad list for output or input.
A new pruned sub-pel search algorithm is implemented that uses
the sad returned from the integer pel search. But it is not
deployed yet.
Change-Id: Ifa9f5ad024b5b660570366d2bd900343e1891520
This commit re-work the operation flow related to prediction
residual generation and the rate-distortion modeling. It saves one
call for model_rd_for_sb.
Change-Id: Icaf96c0ff09c903637ed5283448afe01d798195f
The value of switchable rate has been stored in a local variable.
This change skips the second call to vp9_get_switchable_rate() by
reusing the local variable.
Change-Id: Ib7d3fef7621cc4bde94c6d6e6b3a71f1fd4559f2
Check the mode and motion vector cost. If it is already above
the existing best rate-distortion cost, skip the rest check process
on this mode.
Change-Id: Ie065cebdfda2a3be3be18b8e8b43dc29aaa8c179
This commit makes the rate distortion modeling run in the unit of
maximum transform block size. No compression/speed change observed.
It is for the use of later fast forward transform purpose.
Change-Id: Ibaaedb69c765e8d0c5d5012f0ec07f36fd9f68fd
This commit addes a new strategy to reduce the search for optimal
interpolation filter type. The encoder counts and store how many each
filter type is selected and used for each of the reference frames.
A filter type that is rarely used for all three reference frames is
masked out to avoid computation.
The impact on compression is neglectible:
-0.02% on derf
+0.02% on stdhd
Encoding time is seen to reduce by 2~3%.
Change-Id: Ibafa92291b51185de40da513716222db4b230383