Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by 30 times.
Fix the call sites to use the RTCD function.
Delete sse2 and mips implementation. They were based on a previous
implementation of the filter. It was changed in Dec 2015:
ece4fd5d22
BUG=webm:1378
Change-Id: I0818e767a802966520b5c6e7999584ad13159276
Don't force disabling of adaptive_rd_thresh for realtime when
row_mt_bit_exact is set.
Row based adaptive rd is made usable in CL
454882(https://chromium-review.googlesource.com/c/454882) for REALTIME.
Change-Id: Ief023414f0fd6eb86f299dd46ae58f4436875af5
Make some speed setting changes for temporal enhancement layers,
and remove the switch in subpel_force_stop for the aggressive_base_mv
in non-rd pickmode.
Gain some 2-3% speed with little/negligible quality loss.
Change-Id: I3e2a7f80ff45f38c0a6ceb01b34dbca2f53edbf0
For speed >= 8 and color_sensitivity not set, skip the transform
skipping test in UV planes.
Add a new condition to check noise level to skip chroma check
for speed >= 8 if y_sad is high.
1~2% speedup on ARM for speed 8.
Borg tests show neutral results in both rtc and rtc_derf.
Change-Id: Idecd3ff6e28c97757a43bb6f3a7082c85f72109c
Add a low-variance high-sumdiff to the superblock content state
and use it to limit the mv and bias some decisions in non-rd pickmode.
Only affects speed >= 6.
Reduces artifact for lighting changes.
Small/no difference in metrics on RTC set.
Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b
This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain(at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses(7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
A previous patch turned on allow_exhaustive_searches feature only for
FC_GRAPHICS_ANIMATION content. This patch further modified the feature
by removing the exhaustive search limit, and made it no longer adaptive.
As a result, the 2 counts that recorded the number of motion searches
were removed, which helped achieve the determinism in the row based
multi-threading encoding. Tests showed that this patch didn't cause
the encoder much slower.
Used exhaustive_searches_thresh for this speed feature, and removed
allow_exhaustive_searches. Also, refactored the speed feature code
to follow the general speed feature setting style.
Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa
The more aggressive settings should only be used when denoise_svc
condition is satisfied (which means top spatial layer).
Change-Id: Ia8e3515b27f31bf21b1976ca80a2fa826daece3a
In non-rd pickmode (speed >= 5), avoid duplication of computations in
model_rd_for_sb_y when the speed feature use_simple_block_yrd is
enabled (or for high bitdepth build under certain conditions).
QVGA, VGA and HD have 1.23%, 2.68% and 1.7% speedup on ARM for speed 8,
respectively.
Encoding results are bitexact for speed >= 5.
Change-Id: I3f9130810c21439f5ad7e159e21cb2243dcd05f1
The allow_exhaustive_searches feature improves the encoding quality
of FC_GRAPHICS_ANIMATION content a lot. For non-FC_GRAPHICS_ANIMATION
content, the quality test result is almost neutral. This patch makes
this feature to be used only for FC_GRAPHICS_ANIMATION content.
The motivation of doing that is to make this feature no longer adaptive,
which will be implemented in the following patch.
Change-Id: Ic911df6dd757402b6480789cc247801e99840369
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is casted to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
The scaling filter with zero shift will give sub-sampling for
2x downsampling. Allow for a phase shift to get an averaging filter.
Usage is for source scaling in 1 pass SVC mode for 1:2 downscale.
Reduces aliasing in downsampled image.
Keep the phase to 0/off for now.
Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
This condiiton is not needed as key_frame should set the refresh
of the reference frames, but good to have for clarity in condition.
Change-Id: Icf9838e7e4f0ff5cf0a9562ae3b5d6c7e6f78702
This reverts commit 863f860bfc.
This causes encoder / decoder mismatches in various
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests
BUG=webm:1408
Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14
Fix to avoid getting stuck at very low Q even
though content is changing, which can happen for --min-q=0.
Fix is to more aggressively increase active_worst_quality
when detecting significant rate_deviation at very low Q.
Change will only affect 1 pass VBR for --min-q < 4, so no
change in ytlive metrics for --min-q >= 4.
Change-Id: I4dd77dd7c08a30a4390da0ff2c8bda6fccfa76d7
Useful for SVC, where the top layer enhancement frames may
not update any reference buffers, as is the case for the
patterns in the 1 pass CBR SVC when #temporal_layers > 1.
~3% encoder speedup for SVC patterns with temporal layers
in 1 pass CBR mode.
Updated the SVC datarate tests for the mismatch frames.
Set the frame-dropper off in some tests with #temporal_layers > 1
so we can correctly set #mismatch frames. Adjusted rate target
threshold for tests where frame-dropper was turned off.
Change-Id: Ia0c142f02100be0fed61cd2049691be9c59d6793
The MV unit test revealed an integer overflow issue in vp9_mcomp.c.
This was caused if the MV was very large. In mv_err_cost(), when
mv->row = 8184, mv->col = 8184 and ref_mv is 0, mv_cost = 34363
and error_per_bit = 132412, causing the overflow.
BUG=webm:1406
Change-Id: I35f8299f22f9bee39cd9153d7b00d0993838845e
Set adaptive_rd_thresh to 2 when simple block yrd is not used.
Fix regression caused by computing y sad without
int_pro_motion_estimation on low res motion clips.
Overall 0.07% quality loss on rtc_derf.
Change only affects low res on speed 8.
Change-Id: Ic6a188a56529f1034d6431005fb4b0e24e8a7e27
For speed 5, 1 pass CBR: Don't use the nonrd_pick_partition
on the segment, rather use choose_partitioning followed by
nonrd_select_partition (as is done on base segment).
Little/no quality loss on RTC and RTC_derf (< 0.3%),
speedup of at least 5%.
Change-Id: I5273d5f950e60adf5e437b4ca8c4f63964641e83
If the noise estimation is avoided due to large motion,
the last_source for denoising should still be updated.
Change-Id: I67155ea7dbe9ac2785978e64a27bdafd7d57aac0
To reduce refresh on partial super-blocks on boundary,
for noisy input. Reduces some artifacts on noisy input.
Change-Id: I10b5808a296874e08c7f378b3df58466591d8dbe
Edit
Move the condition for effectively disabling the denoising
for speed 5 into the vp9_denoiser_denoise().
This is cleaner, and also moving the condition into vp9_denoiser_denoise
will keep the denoiser buffer updated with the current source.
This allows for more consistent behavior if speed is changed midstream.
Change-Id: Ia001f591c56e454bf724c3ae73c024badb183ef8
To prevent the motion vector out of range bug, added a motion vector unit
test in VP9. In the 4k video encoding, always forced to use extreme motion
vectors and also encouraged to use INTER modes. In the decoding, checked if
the motion vector was valid, and also checked the encoder/decoder mismatch.
The tests showed that this unit test could reveal the issue we saw before.
Change-Id: I0a880bd847dad8a13f7fd2012faf6868b02fa3b4