Initial patch to remove get_zbin_mode_boost() and
cpi->zbin_mode_boost.
For now sets a dummy value of 0 for zbin extra pending
a further clean up patch.
Change-Id: I64a1e1eca2d39baa8ffb0871b515a0be05c9a6af
The alternate reference frame is disabled in non-RD mode. No need
to keep the related entries in the THR_MODES array.
Change-Id: I53386f4bb1c6284f582801f27246c5edf55bc24b
In RTC coding mode, the alternate reference frame modes and compound
inter prediction modes are disabled. This commit reworks the
related mode search threshold update process to skip interacting
with these coding modes. It provides about 1.5% speed-up for speed
-6 on average.
vidyo1
16551 b/f, 40.451 dB, 6261 ms -> 16550 b/f, 40.459 dB, 6190 ms
nik720p
33316 b/f, 38.795 dB, 6335 ms -> 33310 b/f, 38.798 dB, 6237 ms
mmmoving
33265 b/f, 41.055 dB, 7176 ms -> 33267 b/f, 41.064 dB, 7084 ms
dark720
33329 b/f, 39.729 dB, 11235 ms -> 33331 b/f, 39.733 dB, 10731 ms
Change-Id: If2a4090a371cd28f579be219c013b972d7d9b97f
This commit removes undefined value options of cpu-used for VP9 and
changed vpxenc prompt to reflect the usable range of [-8,8]
Change-Id: Ib80fef3dbb6ec9aabac45ed13e8ab6fbaf94f55e
Use a temporary variable to store the transform size associated
with the best intra mode and restore the mode_info if the overall
best mode is intra mode.
Change-Id: I2606e0061ad32f91b095462902b1eb734b128eea
This commit explicitly set the second reference frame type to be
NONE in key frame coding mode. This fixes a subtle dependency of
reference motion vector used by next inter frame on mode_info
reset before key frame coding.
Change-Id: I5ff0359753fdc9992b0bfe889490f7a32d7d5f6a
Where there is very subtle motion, especially when combined
with low spatial complexity, the codec sometimes fails to quickly
pick up the ambient motion field.
Once it has been established though the field propagates well using
Nearest and Near MV.
This patch looks specifically at the case where the Nearest and Near
have not been established as non zero vectors and in this case
discounts the cost of searching for a new vector in the rd code.
This will almost certainly have some implications in terms of encode
speed but it should be possible to mitigate the impact in a subsequent
using first pass stats and the local spatial complexity.
Average results for test sets approximately neutral.
Change-Id: I44a29e20f11f7ab10f8c93ffbdc50183d9801524
Change 72141 introduced a new use of vp9_avg_4x4.
This call needs to switch to using vp9_highbd_avg_4x4
when performing high bitdepth encodes.
Change-Id: I6a8ba4b62f8a75d0a917b365a55245e2f0438ea1
When multiple intra modes are tested, the previous mode info
update process may overwrite the selected best intra mode and make
the final selection use an inter mode. This commit fixes this
issue by moving the mode_info reset outside the intra mode search
loop.
Change-Id: I15ed4288a6b3cb0832104a5e6d5d9a25cd1a5b2b
If vp9_pick_inter_mode works properly, it should at least check
one coding mode and hence get best_tx_size assigned a valid value.
There is no need to initialize best_tx_size with a legitimate
value before starting the mode search.
Change-Id: Ic0496cd89672ea9c2c512a9bd1da952190af9cba
Make the variable reduction_fac log2 based and explicitly use
right shift when computing intra_cost_penalty.
Change-Id: I208f1fb879a02debb3b3fc64f9fd06260dcf1c86
Fails to compile. Bad calls to vp9_alloc_frame_buffer
and vp9_realloc_frame_buffer in postproc.c
This reverts commit 399823b6f5.
Change-Id: I29f0e173f8e185d3a303cfdb17813e1eccb51e3a
Add support for setting byte alignment on the Y, U, and V plane of the
reference buffers. The byte alignment must be a power of 2, from 32 to
1024. A value of 0 sets legacy alignment.
Change-Id: I7c1399622f7aa68e123646369216b32047dda73d
This commit fixes a bug in the PICK_MODE_CONTEXT index for
horizontal partition case. The compression performance change
is less than 0.01% level, since most blocks are selected to
use square block size in RTC coding mode.
Change-Id: I67effc18ae8795fccdd82a55f4efc609fa5cb3e1
For key frame under variance source partition: 4x4 prediction blocks
may be selected when variance of 8x8 block is very high (threshold is set fairly high for now).
Testing on some RTC clips shows this helps to reduce some ringing artifacts on key frame.
Encoded key frame size increases about ~10%. Key frame PSNR increases about ~0.1-0.2dB.
Change-Id: I56e203fac32ea6ef69897fb3ea269c59cb50d174
This commit explicitly uses the bit shift operation instead of
division for computing block variance.
Change-Id: Id19c0ff27dd1d1ae4aceee6657e1aad0d406bd74
This commit refactors the choose_partitioning function. It removes
redundant memset calls and makes the encoder to calculate
variance value per block only when it is needed. It reduces the
average runtime cost of choose_partitioning by 60%. Overall it
reduces speed -6 runtime by 2-5%.
Change-Id: I951922c50d901d0fff77a3bafc45992179bacef9
Replace error_resilient flag with use_prev_frame_mvs in
vp9_pick_inter_mode reference motion vector search selection.
This effectively turns off the simplified ref mv search in the
settings of frame resizing, even if error-resilient mode is off.
Change-Id: I7fed814ee7bc0cb419a03b846e0fc2de46ba7686
Update the frame motion vector only if previous frame motion vector
is needed for next frame reference motion vector.
Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f
This commit enables the use of sub8x8 blocks in RTC key frame
encoding. It requires the block size to be preset and will decide
the coding mode and encode the bit-stream.
Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b
Rename set_modeinfo_offsets as set_mode_info_offsets, to be more
consistent with naming convention.
Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d
The later encoding process will take the top-left block's
mode_info for pre-determined block size.
Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5
The restructure moves the decision into the rd pick
modes loop and makes a decision based at the 16x16
block level instead of only the 64x64 level.
This gives finer granularity and better visual results
on the clips I have tested. Metrics results are worse
than the old AQ2 especially for PSNR and this mode
now falls between AQ0 and AQ1 in terms of visual
impact and metrics results.
Further tuning of this to follow.
It should be noted that if there are multiple iterations
of the recode loop the segment for a MB could change
in each loop if the previous loop causes a change in the
complexity / variance bin of the block. Also where a block
gets a delta Q this will alter the rd multiplier for this block
in subsequent recode iterations and frames where the
segmentation is applied.
Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7
the flag in the header wasn't being set based on the encoder
configuration in non-intra only mode
broken since:
fbc2fbf Adding oxcf temp variable.
Change-Id: Ib4cff9901889824bc4e68d7f0f6deb1e41df2f53
The initial reset of this_rdc in vp9_pick_inter_mode is not needed,
since it will be re-assign when used.
Change-Id: Ic0e12d741cbab292fc214c1eabb48b129af7839b
Compare the current best mode rate-distortion cost with the skip
threshold to decide if performing motion search.
Change-Id: Ia071824f8dd3b7db485f424692a485a2da6a1a9f
These speed-up features for key frame coding are only turned on
in the settings of hybrid non-RD and RD mode decision. It provides
about 20% speed-up to the hybrid key frame coding at the expense
of certain compression performance loss. For vidyo1, the key frame
coding statistics are changed
9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us
Overall rtc set compression performance is down by -0.257%.
Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f
When block size is below 16x16, the encoder swap from non-RD to
RD mode for key frame coding. This largely brough back the key
frame compression performance. For vidyo1 at 1000 kbps, the key
frame coding statistics are changed
9978F, 34.183 dB, 36807 us -> 9838F, 35.020 dB, 61677 us
As compared to the full RD case
7187F, 34.930 dB, 214470 us
The overall rtc set coding performance (single key frame setting)
is improved by 1.5%.
Change-Id: I78a4ecf025d7b24ec911e85be94e01da05e77878
Change 72193 made the encoder behave differently
when configured with and without high bitdepth.
This change means the same algorithm is used for both.
Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
Currently, VP9 supports column-tile encoding, which allows a frame
to be encoded in multiple column tiles independently. The number of
column tiles are set by encoder option "--tile-columns". This
provides a way to encode a frame in parallel.
Based on previous set of patches, this patch implemented the tile-
based multi-threaded encoder. Each thread processes one or more
tiles.
Usage:
For HD clips:
--tile-columns=2 --threads=1/2/3/4
While using 4 threads, tests showed that the encoder achieved
2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at
realtime speed 5.
Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4
Change 71789 renamed CONFIG_VP9_HIGH to CONFIG_VP9_HIGHBITDEPTH.
However, one use of CONFIG_VP9_HIGH was missed.
Change-Id: I0ebb9c71380c6d810a25708d15471abf9533e695
For key frame at speed 6: enable the non-rd mode selection in speed setting
and use the (non-rd) variance_based partition.
Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames),
mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16.
Loss in key frame quality (~0.6-0.7dB) compared to rd coding,
but speeds up key frame encoding by at least 6x.
Average PNSR/SSIM metrics over RTC clips go down by ~1-2% for speed 6.
Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
This commit reworks the ONE_LOOP_REDUCED coefficient probability
model update process. It allows model update for every coefficient
across the spectrum at a coarser resolution, instead of performing
precise update only for certain subset of probability models.
The overall runtime remains nearly same (<1% change) for speed -6.
The compression performance is improved by 7.5% in PSNR for speed
-5 and 4.57% for speed -6, respectively.
Change-Id: Ifb17136382ee7e39a9f34ff4a4f09a753125c8d1
Change 72056 unfolded some macro definitions,
but lost some alternative behaviour required for
high bitdepth encodes.
This causes the encoder to crash, see issue 884.
Change-Id: I8ce4d73c9fe0a3c10ccb86fba210fabc8b2f0ccc
Also removes some spurious changes in common/vp9_blockd.h which
was introduced by a rebase issue between nextgen and master branches.
Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
(cherry picked from commit 005d80cd05)
(cherry picked from commit 08d2f54800)
(cherry picked from commit 4230c2306c)
AQ2 modified to use mb_av_energy in defining variance
thresholds used alongside complexity when defining the
segment to be used for an SB64.
Slight improvements in metrics (ssim and PSNR).
Change-Id: Idb9cb73f7d9c4f7118cd7e84ac77b0f25cacbf81
Incorporate segment delta-q into estimated bits.
This generally improves the rate control under cyclic refresh (aq=3) mode.
Change-Id: I1dc60fb230e7d08357fae18909d8ed27bf58e037
This patch greatly increase the strength of AQ1.
Visual tests show strong gains on many clips but their is a big
hit on psnr.
SSIM is more mixed with some winners and losers.
Change-Id: Idaa5d3b41d8576096bfa000b62bc531c3d8bf6a1
Each tile's tok starting address is calculated before the encoding
process. These addresses are stored so that the same calculation
won't be done again in packing bit stream.
Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50
When the golden frame is boosted, the rate correction factor is not
correlated well with other inter frames even in CBR mode. This commit
changes to use GF specific rate_correction_factor when gf_cbr_boost
is greater than 20%.
Change-Id: I6312c1564387bcacc11f4c5e8a9cfdc781b5c3ab
This commit allows the encoder to increase the mode test kick-off
thresholds if the previous best mode renders all zero quantized
coefficients, thereby saving motion search runs when possible.
The compression performance of speed -5 and -6 is down by -0.446%
and 0.591%, respectively. The runtime of speed -6 is improved by
10% for many test clips.
vidyo1, 1000 kbps
16578 b/f, 40.316 dB, 7873 ms -> 16575 b/f, 40.262 dB, 7126 ms
nik720p, 1000 kbps
33311 b/f, 38.651 dB, 7263 ms -> 33304 b/f, 38.629 dB, 6865 ms
dark720p, 1000 kbps
33331 b/f, 39.718 dB, 13596 ms -> 33324 b/f, 39.651 dB, 12000 ms
mmoving, 1000 kbps
33263 b/f, 40.983 dB, 7566 ms -> 33259 b/f, 40.978 dB, 7531 ms
Change-Id: I7591617ff113e91125ec32c9b853e257fbc41d90
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. all threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.
In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.
Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
The intra mode penalty is covered by intra_cost_penalty. This
commit removes the other intra cost threshold, provided that the
constant 50 is negligible in normal rate-distortion cost.
Change-Id: I9b8b7483c43b9a41741622e7057def1f7d51bb72
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.
The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected. For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.
Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
(cherry picked from commit 454342d4e7)
This commit makes a non-RD coding mode decision process for key
frame coding. It can be optionally turned on in speed -6 and above.
Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5
This commit allows more aggressive decision to skip forward
transform and quantization for luma component in RTC coding mode.
The chroma components remains going through the normal coding
routine, since they are not included in the non-RD mode search
process.
It reduces the runtime cost by 2% - 10%. In speed -6,
vidyo1 1000 kbps
16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms
nik720p 1000 kbps
33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms
dark720p 1000 kbps
33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms
The compression performance of speed -6 is improved by 0.44% in
PSNR and 1.31% in SSIM.
Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7
In AQ1 a rate adjustment was applied for blocks coded with a
deltaq. This tends to skew the partition selection and cause
rate overshoot.
For example, consider a 64x64 super block where some but not all
sub blocks are in a low q segment and some are in a high q segment.
The choice of Q when considering large partition and transform sizes
is defined by the lowest sub block segment id (currently this implies the
lowest Q). If some parts of the larger partition are very hard this will
cause a high rate component.
The correct behavior here is for the rd code to discard the large partition
choice and break down to sub blocks where some have low and some
have high Q. However the rate correction factor above mask the high
cost of coding at a larger partition size.
Change-Id: Ie077edd0b1b43c094898f481df772ea280b35960
Make the midpoint variance used in AQ mode 1 segmentation
depend on the overall complexity of the frame in two pass.
Change-Id: I452814ec57f7a32352e41bb250e78066abe952dd
Add an additional restriction to bit/complexity based
segmentation based on spatial variance.
Only lower Q when both the number of bits spent
in the initial encoding pass and the spatial complexity are
below a threshold. This will prevent the low Q segments
being used just because there is a surfeit of bits.
Small metrics gains especially opsnr.
derf ~0.2% std-hd ~0.3%
Change-Id: I6a8496d466d673f9b0e2b2ca6304ea7b6d8e1cce
This is the first of a series of patches to restructure and
improve AQ mode 1 (variance based AQ).
Change-Id: Idcf693131a3ea2459dcfd957a54a65b971fa4a2a
Similar to mask_filter, the filter_cache in RD_OPT struct can be
moved out, and declared as a local variable since it is only
used in pick_inter_mode functions.
Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17
Correct calculation of number of mbs in two pass code when
frame resizing is enabled. Always use initial number of mbs if
scaling is enabled, as this is what was used in the first pass.
Change-Id: I49a4280ab5a8b1000efcc157a449a081cbb6d410
The mask_filter in RD_OPT struct is used to record rd result in
filter decision. It is only used in pick_inter_mode functions,
and is removed from the struct and declared as a local variable.
Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b
The max_partition_size and max_partition_size are set at the
beginning while setting speed features, and then adjusted at
SB level. Moving them to mb struct ensures there is a local
copy for each thread.
Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/[34] fails post the
merge of commit#ffa06b37. This commit adds reset of rc tracking info
when frame is dropped, and fixes the causes of the bad interaction
between the tests and the previous commit.
Change-Id: I848acfd9fcb336359662274325190f94aac76eae
This commit reworks the forward transform and quantization process
for 8x8 block coding. It combines the two operations in a single
function to save a store/load stage of the original transform
coefficients. Overall the speed -6 is slightly faster (around 1%
range). The compression performance of speed -6 is improved by
3.4%.
Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
In rare cases, the interaction between rate correction factor and Q
choices may cause severe oscillating frame sizes that are way off
target bandwidth. This commit adds tracking of rate control results
for last two frames, and use the information to prevent oscillating
Q choices.
Change-Id: I9a6d125a15652b9bcac0e1fec6d7a1aedc4ed97e
vp9_quantize_fp is the quantization process used by rtc coding
mode. This commit adds a sse2 implementation of it. The
implementation is modified based on vp9_quantize_b_sse2. No speed
difference from ssse3 version.
Change-Id: I24949c5b27df160b4f35117d28858d269454e64a
Current setting had active_worst_quality set too high (close to worst_quality)
for first frame(s) following first key frame. This changes that to be somewhat
more aggressive in allowing active_worst_quality to be lower following key frame.
Also remove the 4/5 reduction in active_worst for key frame as
this should be set by the user qp_max setting.
Change-Id: I0530b3ddcc85c00e3eb7568de1b14a31206c4a4c
The function pointer in compressor instance does not change, so this
commit changes to call the function directly.
Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5
This commit adds a check condition to the prediction buffering
operation used in the rtc coding mode. This resolves a unit test
warning in example/vpx_tsvc_encoder_vp9_mode_7.
Change-Id: I9fd50d5956948b73b53bd8fc5a16ee66aff61995
These 2 members in RD_OPT were moved to TileDataEnc struct
already, and therefore were removed here.
Change-Id: I22fee3b67f96e473a58e194a7edc76dbd48bfa04
Several frame counters in encoder are updated at SB level. Combine
those counters and put them in a separate struct, which allows us
to allocate one copy for each thread.
Change-Id: I00366296a13c0ada4d8fa12f5e07728388b6cab7
Modified VP9_COMP struct to include MACROBLOCK *mb. This change
makes it feasible in multi-thread case to allocate a mb for each
thread.
Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e
Two members in struct CYCLIC_REFRESH
int64_t projected_rate_sb;
int64_t projected_dist_sb;
are updated at the superblock level, which makes them shared data
in the multi-thread situation, and requires extra work to handle
them. However, those values are updated and used immediately, and
therefore can be removed. This patch cleaned up the code and
removed the two members.
Change-Id: I2c6ee4552bf49fb63ce590cdb47f9723974fffb1
Prepare for the introduction of frame-size change
logic into the recode loop.
Separated the speed dependent features into
separate static and dynamic parts, the latter being
those features that are dependent on the frame size.
Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
Add extra vp9_clear_system_state() calls to fix
double / mmx issue introduced into first pass
code for 32 bit builds.
Change-Id: I84cd2986b80d83650a091ab25c43755efeb82e03
Rate correction factor is used to correct the estimated rate for any
given quantizer, and feeds into rate control for quantizer selection.
We make use of the actual bits used to calculate this rate correction
factor with an adjustment limit to prevent over-adjustment.
This commit adapts the adjustment limit to the difference between the
estimated bits and the actual bits, allows the adjustment limit to vary
between 0.125 (when estimate is close to actual) and 0.625 (when there
is >10X factor off between estimated and actual bits). By doing this,
the commit appears to have largely corrected two observed issues:
1. Adjustment is too slow when the actual bits used is way off from
estimate due to the small adjustment limit.
2. Extreme oscillating quantizer choices due to the feedback loop.
Change-Id: I4ee148d2c9d26d173b6c48011313ddb07ce2d7d6
This commit makes the speed -6 and above use the reconstructed
boundary pixels for precise intra prediction. This allows more
intra prediction modes to be tested in the non-RD coding process.
Enabling horizontal and vertical intra prediction modes can
improve the speed -6 compression performance for rtc set
by 0.331%.
Change-Id: I3a99f9d12c6af54de2bdbf28c76eab8e0905f744
I0c5f010 changed to allow update golden reference buffer in CBR mode,
this commit changes the use of rate_correction_factor for those frames
to be aligned with the new usage. This commit attempts to solve two
issues:
a. Initialization of rate correction factor for Golden Frame
Prior to this patch, even the regular inter frame has been update
the rate correction factor based on content and encoding results,
the first golden frame would still use the ininitialized value
that can be way off.
b. Allowing rate correction factor update to be slightly faster
Prior to this patch, when the rate correction factor is off, the
update to the factor is too slow, the factor could not get close
to a semi-correct value even after many frames.
The commit helps all clips in psnr/ssim metric, but especially to
a few clip in RTC set that rate correction was way off. For example
thaloundeskmtgvga gained about .5dB for both overall/average psnr.
Change-Id: I0be5c41691be57891d824505348b64be87fa3545
This commit integrates the non-RD mode decision process and the
encoding process into a single recursion scheme.
Change-Id: I6a7e72a0b84d567554801ebbe01ec75d54c1f77d
This patch was to fix the vpxdec fuzzing3 test failure. When an
error occurs, setjmp() is invoked, which calls the decoder
removing routine. In multiple thread situation, other threads
could try to access the frame context memory that is already
deallocated, thus causing a segfault.
An invalid unit test was added for this issue.
Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952
The aim of this patch is to apply a positive weighting to
frames that have a significant number of blocks that are
of low spatial complexity and are dark. The rationale behind
this is that artifacts tend to be more visible in such frames.
In this patch the weight is only applied in regard to the distribution
of bits between frames. Hence if all the frames share similar
characteristics (as is the case for most of our short test clips) there
will be little or no net effect.
However, the effect can be seen on some longer form test content.
For example Tears of steel baseline test:
2323.09 Kbit/s opsnr 39.915 ssim 74.729
With this patch:-
2213.34 Kbit/s opsnr 39.963 ssim 74.808
(Sligtly better metrics and about 5% smaller)
The weighting may well need some further tuning along side changes
to the aq modes.
Change-Id: Ieced379bca03938166ab87b2b97f55d94948904c