This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain(at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses(7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
Change it to row based array to avoid the slow down cause by sync.
row-mt on, speed 8, 2 threads: ~4% speedup for VGA on ARM benefited
from adaptive_rd_threshold.
Change-Id: I887e65a53af20a6c4f48d293daaee09dab3512cf
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.
Measured a 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
(Yunqing)
This patch added the missing initialization in temporal filter.
Borg test BDRate results:
PSNR: -0.019%(lowres); -0.013%(hdres);
SSIM: -0.001%(lowres); -0.010%(hdres).
Other q values gave comparable but no better results.
Change-Id: I7ad0c18b39e6f558342688e2fe1e12fdb133ce9b
The bit to error transformation got doubled as a result of going from
8-bit to 9-bit costs (change d13385c).
Use defines to derive the scale numbers and comment some of the fields.
derf: -0.023 BDRATE
hevcmr: +0.067 BDRATE
stdhd: +0.098 BDRATE
(These are substantially smaller than than the original gains from 8 to
9 bit costing.)
Change-Id: I6a2b3b029b2f1415e4f90a05709b2333ec0eea9b
In inter mode search skip all modes except NEARESTMV and DC_PRED.
10% less encode latency for large frames using the chromium remoting_perftests.
+0.313% BDRATE on the screencast set at speed -6.
Change-Id: Ib97a39dd8bcdeab545509e0e02d78ce7033f8c63
This is a pure-refactor in preparation to potentially raise the bit-cost
resolution.
Verified at good speed 0 and rt speed -6.
Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
While turning on "--aq_mode=3", the quantizers are updated by each
thread. Fixed the me consts initialization function to make sure
that the correct thread data are updated.
Change-Id: Ied27bb7bae76fc3fa2cda4f8c35ac0b46271bef4
Frame buffers are now allocated dynamically on-demand.
Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".
All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.
Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.
Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. all threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.
In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.
Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
Similar to mask_filter, the filter_cache in RD_OPT struct can be
moved out, and declared as a local variable since it is only
used in pick_inter_mode functions.
Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17
The mask_filter in RD_OPT struct is used to record rd result in
filter decision. It is only used in pick_inter_mode functions,
and is removed from the struct and declared as a local variable.
Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b
These 2 members in RD_OPT were moved to TileDataEnc struct
already, and therefore were removed here.
Change-Id: I22fee3b67f96e473a58e194a7edc76dbd48bfa04
Several frame counters in encoder are updated at SB level. Combine
those counters and put them in a separate struct, which allows us
to allocate one copy for each thread.
Change-Id: I00366296a13c0ada4d8fa12f5e07728388b6cab7
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.
Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all
other rd related routines. Anything used outside of making an rd optimal
decision belongs in rd.
Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b