This commit refactors the rate distortion structure used in the
non-RD coding mode and saves a few RDCOST calculations.
Change-Id: I62c3416c300d2c5372f21b96d93a6b633a34ab3a
Its functionality has been replaced with choose_partitioning and
threshold based control on split mode check.
Change-Id: Ic9bb321df06b524f5c38ea5874dc6f6a8f93c5e3
This speed feature has been deprecated in both yt and rtc coding
modes. This commit removes the related operations.
Change-Id: I079c79c6adafe45581af2ebf8b98faebcface1ce
This commit re-designs the recursive partition search scheme in
rtc speed -5. It first checks if the current block is under cyclic
refresh mode. If so, apply recursive partition search. Otherwise,
perform sub-sampled pixel based partition selection. When the
pre-selection finds the partition size should be 32x32 or above,
use the partition size directly. Otherwise, apply partition search
at nearby levels around the preset partition size.
It is enabled in speed -5. The compression performance of rtc
speed -5 is improved by 9.4%. Speed wise, the run-time goes slower
from 1% to 10%.
nik_720p, 1000 kbps
33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms
vidyo1_720p, 1000 kbps
16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms
Change-Id: I65adba352e3adc03bae50854ddaea1b421653c6c
Currently, the tokens for a tile are stored immediately after its
preceding tile, which causes a dependency. This is unnecessary
since we always allocate enough memory for tokens. Removing
the dependency allows token writing done in parallel. This patch
doesn't change encoding result.
Change-Id: I7365a6e5e2c2833eb14377c37e1503c9d0f26543
Compare the estimated rate and distortion to the thresholds scaled
according to the operating block size and determine if further
split partition search will be run. The compression performance of
speed -5 is changed by -0.074%. The encoding speed is 10% - 15%
faster.
vidyo1 720p
16545 b/f, 40.492 dB, 11475 ms -> 16535 b/f, 40.486 dB, 10100 ms
nik720p
16624 b/f, 36.310 dB, 10071 ms -> 16617 b/f, 36.313 dB, 8346 ms
Change-Id: Ic9197ab5761279ae55d2fb7813b2af0e0db497b8
Reduce the intra_cost_penalty for non-rd mode,
and some updates to VAR_BASED_PARTITION.
Visual tests show some improvement at Speed 6, for RTC clips.
Change-Id: If9090daf7aed14906a32d931a538ab544bbca606
This commit replaces the use of copy_partitioning with
choose_partitioning based on the sse of subsamped pixels, which
provides significantly better coding performance and runs at
similar speed, as compared to copy_partitioning. It improves rtc
speed 5 coding performance by 3%.
Change-Id: I52d3682a12dce0147f5e52383a594fc242ca3228
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.
Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
The concept:
There's too much noise in source pixels for variance and at low bitrate
the reconstructed looks nothing like the source so we have problems
getting good partitionings with either. This skirts the issue by using
a box blur scaled down version for variance calculations. To compare
against source_var_ moved keyframe to be rd based like source_var.
Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.
Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
This commit adds proper initialization of segment id for variance AQ
mode in non-rd coding path. It fixes the enc/dec mismatch issue of
rt=7 with --aq-mode=1, as reported in issue #816
Change-Id: I02fa41b96345bf2e66077d5ea553f85ba800f7bb
This commit enables the encoder to skip split partition search if
the bigger block size has all non-zero quantized coefficients in low
frequency area and the total rate cost is below a certain threshold.
It logarithmatically scales the rate threshold according to the
current block size. For speed 3, the compression performance loss:
derf -0.093%
stdhd -0.066%
Local experiments show 4% - 20% encoding speed-up for speed 3.
blue_sky_1080p, 1500 kbps
51051 b/f, 35.891 dB, 67236 ms ->
50554 b/f, 35.857 dB, 59270 ms (12% speed-up)
old_town_cross_720p, 1500 kbps
14431 b/f, 36.249 dB, 57687 ms ->
14108 b/f, 36.172 dB, 46586 ms (19% speed-up)
pedestrian_area_1080p, 1500 kbps
50812 b/f, 40.124 dB, 100439 ms ->
50755 b/f, 40.118 dB, 96549 ms (4% speed-up)
mobile_calendar_720p, 1000 kbps
10352 b/f, 35.055 dB, 51837 ms ->
10172 b/f, 35.003 dB, 44076 ms (15% speed-up)
Change-Id: I412e34db49060775b3b89ba1738522317c3239c8
This patch re-enabled the feature in Pengchong's patch
(commit 1286126073). Originally, it
was turned on while use_lastframe_partitioning > 0(not used anymore).
Now it was added as a feature, and turned on while speed >= 2.
As described in the original patch, this feature helps speed up the
slideshows in YouTube.
Change-Id: I1b0f18d65da1ee1c8d1e117dabba910c5207c471
Simplified the code and removed some code that was not used anymore.
This patch didn't change encoding result.
Change-Id: I7e54a74c8f35a6726dfc8a1c55b337448b7ea124
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.
This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save these memories.
Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
The use of use_lastframe_partitioning is totally removed in good-
quality encoding. Its usage in real-time encoding needs to be
evaluated to see if it can be removed too.
The Borg tests at speed 4 showed:
stdhd set: 0.220% psnr gain, 0.166% ssim gain;
derf set: 0.329% psnr gain, 0.476% ssim gain.
Speed test on selected clips showed 1.54% speedup.(Worst case:
pedestrian_area_1080p25.y4m, speed loss: 1.5%)
Change-Id: I1c844d329b0b5678558439b887297c1be7ddab00
This commit removes the special case for key frame, as transform size
decision is controlled by the appropriate speed feature for all lossy
coding modes: tx_size_search_method.
Change-Id: I9677171e3f2432ec23705f7c5ea8170dd4562fae
In the partition search, the encoder checks all possible
partitionings in the superblock's partition search tree.
This patch proposed a set of criteria for partition search
early termination, which effectively decided whether or
not to terminate the search in current branch based on the
"skippable" result of the quantized transform coefficients.
The "skippable" information was gathered during the
partition mode search, and no overhead calculations were
introduced.
This patch gives significant encoding speed gains without
sacrificing the quality.
Borg test results:
1. At speed 1,
stdhd set: psnr: +0.074%, ssim: +0.093%;
derf set: psnr: -0.024%, ssim: +0.011%;
2. At speed 2,
stdhd set: psnr: +0.033%, ssim: +0.100%;
derf set: psnr: -0.062%, ssim: +0.003%;
3. At speed 3,
stdhd set: psnr: +0.060%, ssim: +0.190%;
derf set: psnr: -0.064%, ssim: -0.002%;
4. At speed 4,
stdhd set: psnr: +0.070%, ssim: +0.143%;
derf set: psnr: -0.104%, ssim: +0.039%;
The speedup ranges from several percent to 60+%.
speed1 speed2 speed3 speed4
(1080p, 100f):
old_town_cross: 48.2% 23.9% 20.8% 16.5%
park_joy: 11.4% 17.8% 29.4% 18.2%
pedestrian_area: 10.7% 4.0% 4.2% 2.4%
(720p, 200f):
mobcal: 68.1% 36.3% 34.4% 17.7%
parkrun: 15.8% 24.2% 37.1% 16.8%
shields: 45.1% 32.8% 30.1% 9.6%
(cif, 300f)
bus: 3.7% 10.4% 14.0% 7.9%
deadline: 13.6% 14.8% 12.6% 10.9%
mobile: 5.3% 11.5% 14.7% 10.7%
Change-Id: I246c38fb952ad762ce5e365711235b605f470a66
Add a speed feature to give the tighter partition search
range. Before partition search, calculate the histogram
of the partition sizes of the left, above and previous
co-located blocks of the current block. If the variance of
observed partition sizes is small enough, adjust the search
range around the mean partition size, which will be tigher.
The feature is currently turned on at speed 2. Experiments on
sample youtube clips show on average the runtime is reduced
by 3-7%.
For hard stdhd clips:
park_joy_1080p @ 15000kbps: 509251 ms -> 491953 ms (3.3%)
pedestrian_area_1080p @ 2000kbps: 223941 ms -> 214226 ms (4.3%)
The PSNR performance is changed:
derf: -0.112%
yt: -0.099%
hd: -0.090%
stdhd:-0.102%
Change-Id: Ie205ec5325bf92ec5676c243e30ba9d0adca10f2
In the encoder, current_video_frame is used in a couple of places to
decide encoding strategy, this commit replaces with more appropriate
variables.
Change-Id: I3d3d8d8e2ea02c489e4639b9d4c446a63e357d29
The function is called only once, right after all stats counters are
reset to 0. Therefore all the computations have zero effect on return
values. This commmit to removed those effectless code.
Change-Id: I50d27c0802547921fa36c60aa4bd92d76247f595
We had a very complicated way to initialize cpi->pass from
cfg->g_pass:
switch (cfg->g_pass) {
case VPX_RC_ONE_PASS:
oxcf->mode = ONE_PASS_GOOD;
break;
case VPX_RC_FIRST_PASS:
oxcf->mode = TWO_PASS_FIRST;
break;
case VPX_RC_LAST_PASS:
oxcf->mode = TWO_PASS_SECOND_BEST;
break;
}
cpi->pass = get_pass(oxcf->mode).
Now pass is moved to VP9EncoderConfig and initialization is simple:
switch (cfg->g_pass) {
case VPX_RC_ONE_PASS:
oxcf->pass = 0;
break;
case VPX_RC_FIRST_PASS:
oxcf->pass = 1;
break;
case VPX_RC_LAST_PASS:
oxcf->pass = 2;
break;
}
Change-Id: I8f582203a4575f5e39b071598484a8ad2b72e0d9
Fix the interaction between active map and reuse_inter_pred_sby. The
reuse_inter_pred_sby feature expects inter predictors to already be
built, but blocks with active map on skip this step.
Change-Id: Ibb2bf0d228f678935d82a0ede9cb0919ab7c8878
This commit integrates the fast transform and quantization process
into skip_recode scheme in the rate-distortion optimization loop.
Previously the fast transform and quantization process was only
enabled for non-RD coding flow.
Change-Id: Ib7db4d39b7033f1495c75897271f769799198ba8
This patch allows the encoder to directly split the block
in partition search, therefore skip searching NONE. It
computes a score which measures whether 16x16 motion vectors
from the first pass in the current block are consistent with
each others. If they are inconsistent and we have enough Q
to encode, split the block directly, and skip searching NONE.
This feature is under flag CONFIG_FP_MB_STATS. In speed 2,
it further gives a speedup of 3-8% on sample yt clips as
compared to the previous version under the same flag. Overall,
the features under the flag will give 7-15% on typical yt
clips at up to 6000kbps data rate. The speedup at very high
data rate is not significant.
For hard stdhd clips:
park_joy_1080p @ 15000kbps: 504541ms -> 506293ms (-0.35%)
pedestrian_area_1080p @ 2000kbps: 326610ms -> 290090ms (+11.2%)
The compression performance using the features under the flag:
derf: -0.068%
yt: -0.189%
hd: -0.318%
stdhd:-0.183%
To use the feature, set CONFIG_FP_MB_STATS and turn on
cpi->use_fp_mb_stats.
Change-Id: Iad58a2966515c8861aa9eb211565b1864048d47f
Re-organize the one-byte structure for 16x16 first pass
block. Add bits to indicate motion vector directions.
Change-Id: Id10754ba343dfc712c7fed5bcc85c67fa0bbcb89
This patch allows the encoder to skip the search for partition
SPLIT, HORZ, VERT after the search for partition NONE is done
in RD optimization. It uses the first pass block-wise statistics
to make the decision. If all 16x16 blocks in the current partition
have zero motions and small residues from the frist pass statistics,
and it has small difference variance, further partition search is
skipped.
For speed 2 setting, experiments on general youtube clips show that
the speedup varies from 1% - 10%, 5% on average. On the performance
side in PSNR, derf 0.004%, yt -0.059%, hd -0.106%, stdhd 0.032%.
For hard stdhd clips:
park_joy_1080p, 502952 ms -> 503307 ms (-0.07%)
pedestrian_area_1080p, 227049 ms -> 220531 ms (+3%)
This feature is under the compilation flag CONFIG_FP_MB_STATS and
it is off in current setting.
Change-Id: I554537e9242178263b65ebe14a04f9c221b58bae
Remove the variable that indicates the relative block index. This
is explicitly covered by the use of pc_tree.
Change-Id: Ib13142582fff926c85e375bde656aa050add8350
This commit enables a chessboard pattern constrained partition
search for 720p and above resolutions. The scheme applies stricter
partition search to alternative blocks based on its above/left
neighboring blocks' partition range, as well as that of the
collocated blocks in the previous frame. It is currently turned
on at 16x16 block size level. The chessboard pattern is flipped
per coding frame.
The speed 3 runtime is reduced:
park_joy_1080p, 652832 ms -> 607738 ms (7% speed-up)
pedestrian_area_1080p, 215998 ms -> 200589 ms (8% speed-up)
The compression performance is changed:
hd -0.223%
stdhd -0.295%
Change-Id: I2d4d123ae89f7171562f618febb4d81789575b19
This commit replace the repetitive retrieve of max and min allowed
partition from speed_feature with local variables max_size and
min_size.
Change-Id: Ib06f11f16615e4876e4dd5fb6a968c6bf5f7b216
The get_chessboard_index() used to call the entire VP9_COMMON
struct pointer to retrieve the chessboard pattern index. This cl
makes it call the frame index directly.
Change-Id: I3cad9d209ea2e77a358085a04fe1ff0ddec5ba03
The partition search for 4x4 blocks takes unnecessary steps to
reconstruct pixels and an extra partition type update. This commit
removes such operations. No visible compression/speed difference.
Thanks to Yue (yuec@) for finding this issue.
Change-Id: I3f83824aa3fd3717d63be0b280fa57258939a70a
For gcc, when libvpx config option debug is disabled, added the
flag -DNDEBUG to disable the assertions in libvpx for some speedup.
Change-Id: Ifcb7b9e8ef5cbe5d07a24407b53b9a2923f596ee
The bug sets the wrong pointer to the first pass mb stats
if the encoder does the re-coding in the second pass.
Change-Id: I8a11f45dd7dceb38de814adec24cecccae370d00
In vp8, statistics are collected about the different modes as they are searched.
This process is more complicated due to the variable block size. Fields were
added to the PICM_MODE_CONTEXT struct to hold this information for each point in
the search. The information is then taken from the appropriate part of the tree
during denoising.
Change-Id: I89261ab77ad637821287ae157dfdf694702b8e77
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all
other rd related routines. Anything used outside of making an rd optimal
decision belongs in rd.
Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.
The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.
Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
Encoder still uses SWITCHABLE as default via DEFAULT_INTERP_FILTER,
but does not override the default if it is not SWITCHABLE.
Change-Id: I3c0f6653bd228381a623a026c66599b0a87d01d5