Don't add include files to the archive. Avoids build failures for
Windows such as:
the input file 'libvpx_g.a(x86_abi_support.asm.o)' has no sections
Change-Id: If9c8e70c0ec913b7ad7dd6a08d4fa19011114ad2
No need to specify default behaviour. The original change introducing nasm:
7be093ea4d
mentions requiring 2.09, which was the first release to default to this behaviour:
http://www.nasm.us/doc/nasmdoc2.html
"The -Ox mode is recommended for most uses, and is the default since NASM 2.09."
Change-Id: Ia914c4deede5aa447277b5189bb4fcf7e54c338d
nasm does not accept x64
yasm has accepted (and appears to prefer) win64 at least as far back as
1.0.0:
http://yasm.tortall.net/releases/Release1.0.0.html
Change-Id: Ied881b1df0570da256b1bd7e131e7817e47f768f
Set num_inter_modes based on ref_mode_set_svc, which is a
smaller set than ref_mode_set (which may use alt-ref).
No change in behavior.
Change-Id: I31169bb09028db230552c6fca0a86959d1ade692
1. Delete unnecessary zero setting process.
2. Optimize the method of calculating SSE in vpx_varianceWxH.
Change-Id: I58890c6a2ed1543379acb48e03e620c144f6515f
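For context on the SSE calculation mentioned above, here is a minimal scalar sketch of how a WxH variance is commonly derived from the sum and the sum of squared errors. The helper names block_sum_sse and block_variance are hypothetical and this is not the optimized libvpx code; it only illustrates the relation variance = SSE - sum^2 / (W * H).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: accumulates the sum of differences and the sum of
 * squared differences (SSE) for a w x h block in a single pass. */
static void block_sum_sse(const uint8_t *src, int src_stride,
                          const uint8_t *ref, int ref_stride, int w, int h,
                          int *sum, uint32_t *sse) {
  int r, c;
  *sum = 0;
  *sse = 0;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = src[c] - ref[c];
      *sum += diff;
      *sse += (uint32_t)(diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
}

/* Returns the block variance scaled by the pixel count and writes the SSE
 * to *sse: variance = SSE - sum^2 / (w * h). */
static uint32_t block_variance(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride, int w,
                               int h, uint32_t *sse) {
  int sum;
  assert(w > 0 && h > 0);
  block_sum_sse(src, src_stride, ref, ref_stride, w, h, &sum, sse);
  return *sse - (uint32_t)(((int64_t)sum * sum) / (w * h));
}
```

For a 2x2 block with source pixels 1, 2, 3, 4 against an all-zero reference, the sum is 10 and the SSE is 30, so the result is 30 - 100 / 4 = 5.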
Avoids duplicate computation of UV predictor.
Bit-exact when static_threshold is zero.
Small/neutral difference on RTC set with nonzero static_threshold
(since UV predictor won't be skipped with this change).
Small speed gain, ~1-2%, at speed 8.
Change-Id: Iba8d22a307768b391e29d63c9826aac5a4d9c285
This is only meant for testing. Along with --enable-experimental and
--enable-spatial-svc, require VPX_TEST_SPATIAL_SVC to be defined rather
than bumping the encoder ABI.
Change-Id: I7f34d9f60300fa31ccf22e1a4aa619392c391b2e
For 1 pass cbr SVC: GOLDEN is the spatial reference,
better not to check for encoder_breakout on this reference.
Small positive ~0.075% (mostly neutral) gain in avgPSNR/SSIM metrics.
No observed change in encoder speed.
Change-Id: Ib337f16d6771105bf06384c6a23ad047fc690418
For the case when the number of temporal layers > 1,
the buffer levels (starting/optimal_buffer_level,
and maximum_buffer_size) were not scaled properly.
In vp9_update_layer_context_change_config():
when setting the layer-buffer levels, fix is to scale
the layer-target_bandwidth by the target_bandwidth
(which is the full stream bandwidth) instead of the
spatial_layer_target.
This is needed because prior to the call
vp9_update_layer_context_change_config(), set_rc_buffer_sizes()
is called which sets the buffer levels based on target bandwidth
(which is the full bandwidth for the SVC stream).
This fix properly sets the layer-buffer levels based on the
layer-bandwidth, and leads to better rate targeting.
Small/neutral change in avgPSNR/SSIM metrics on RTC set.
Change-Id: Ic0f4f7f3487c37b9a9adb4781ae5edfed7140a57
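The layer-buffer scaling described above can be sketched as follows. This is a simplified illustration, not the code in vp9_update_layer_context_change_config(); the function name scale_layer_buffer_level is hypothetical. It assumes the stream buffer level was already derived from the full stream bandwidth, so the layer level is obtained by the ratio of layer bandwidth to full stream bandwidth.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scales a buffer level that was computed for the full
 * SVC stream down to a single layer, using the ratio of the layer's target
 * bandwidth to the full stream target bandwidth. */
static int64_t scale_layer_buffer_level(int64_t stream_buffer_level,
                                        int layer_target_bandwidth,
                                        int target_bandwidth) {
  double bw_ratio;
  assert(target_bandwidth > 0);
  bw_ratio = (double)layer_target_bandwidth / target_bandwidth;
  return (int64_t)(stream_buffer_level * bw_ratio);
}
```

With a stream buffer level of 600000, a layer bandwidth of 200 and a full stream bandwidth of 800, the layer buffer level comes out at 150000.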
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates function pointer casts to make libvpx CFI-safe.
[1] https://www.chromium.org/developers/testing/control-flow-integrity
Change-Id: I7e08522d195a43c88cda06fa20414426c8c4372c
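As an illustration of what eliminating a function pointer cast can look like, the sketch below replaces a cast with a thin wrapper whose signature matches the pointer type exactly. All names here (Counter, WorkerHook, increment_counter, increment_counter_hook, run_hook) are hypothetical and not taken from libvpx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker state. */
typedef struct {
  int count;
} Counter;

/* Generic hook type: every callee must have exactly this signature. */
typedef int (*WorkerHook)(void *data);

/* Typed implementation. Calling it through a cast such as
 * (WorkerHook)increment_counter would compile, but the pointer type would no
 * longer match the function type, which CFI indirect call checking rejects. */
static int increment_counter(Counter *counter) { return ++counter->count; }

/* CFI-safe alternative: a wrapper with the exact WorkerHook signature. The
 * conversion is applied to the data pointer, not to the function pointer. */
static int increment_counter_hook(void *data) {
  assert(data != NULL);
  return increment_counter((Counter *)data);
}

/* Indirect call site: hook always points at a function of matching type. */
static int run_hook(WorkerHook hook, void *data) { return hook(data); }
```

Passing increment_counter_hook to run_hook with a zero-initialized Counter returns 1 and leaves count at 1.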
For reference frames: enable scale partition for
superblocks with low source sad or if bsize on lower-resoln
is at least 32x32.
Keep feature disabled for base temporal layer.
Small regression in avgPSNR/SSIM metrics, ~0.5-1%.
Speedup ~2-3% on mac for SVC (3 spatial/3 temporal layers) at speed 7.
Change-Id: I5987eb7763845b680059128b538bb5188be0cca5
When allow_partition_search_skip is set the two pass code
can optionally skip the partition search in the rd loop if the image
appears static (based on selection of 0,0 motion).
Unfortunately 0,0 motion does not necessarily mean that there are
no meaningful changes or that motion or intra modes will not be selected
in the second pass.
Disabling "allow_partition_search_skip" may hurt the encode speed a little
for a small number of clips but can have a big impact on compression.
The most notable example of this in our test sets is "bridge_close_cif"
where this change gives gains of 18%, 12% and 16% in opsnr, ssim and
psnr-hvs.
Change-Id: I765e288b5c0cd82bce00a148e7653a21e9203024
Enable partition copy on boundary and scale blocks along the boundary.
Rename copy_partition_svc to scale_partition_svc.
Do not copy if the block crosses the boundary.
Change-Id: I37a04d48f11b15c4ea67facd7631193ec2f62150
Fixes a build issue when relocation is not allowed:
relocation R_X86_64_32 against '.rodata' can not be used when making a shared object
Change-Id: Ica3e90c926847bc384e818d7854f0030f4d69aa0
Removal of parameters to and code in calc_frame_boost() that is no
longer required.
No change to results from previous patch.
Change-Id: Ic92da35613fdc247d22fddf24d09679fc5329017
The decay accumulator clause covers similar ground to the
new clause that tests the accumulated second reference error
so it has been removed to reduce complexity.
Change-Id: I4ec1cce32d72bd4ee463ad7def2831a68447d525
Add a clause to the breakout test for alt ref groups that
examines the size of the accumulated second reference
frame error compared to the cost of intra coding.
This clause causes a reduction in the average group length for many
clips. Alongside the change to the group length the minimum
boost is increased.
On balance the results are positive for psnr and psnr-hvs
but negative for ssim/fast ssim for the smaller image formats.
Strong gains on some harder clips (e.g. ducks take off (midres) ~20%,
husky (lowres) 6-17%). Most of the negative cases are lower motion
clips. A subsequent patch will hopefully help with those.
Change-Id: Ic1f5dbb9153d5089e58b1540470e799f91a65dc4
Fix/cleanup the conditioning for usage of the reuse-lowres
partition feature.
Replace the non-reference condition with the top temporal
layer, and put this condition in the speed feature.
This prevents doing update_partition_svc() on every
VGA frame, instead it will now only do update for VGA in
the top temporal layer frames.
Also this makes it easier to test/enable this feature
for lower layer temporal frames.
Change-Id: Ia897afbc6fe5c84c5693e310bcaa6a87ce017be5
For new VP9 only content type adjust the rate distortion and ARF
filter based on the relative spatial variance of the source and
reconstruction.
In regards to the RD loop the method favors modes where the
reconstruction variance is similar to the source variance. However it
is currently only applied to regions where the source variance is quite
low.
For very low variance blocks it applies a further bias against intra
coding and large prediction block sizes (the latter in particular limit
the usefulness of the loop filter).
The final part of this change is to lower the strength of the ARF
filter for blocks where the source has very low spatial variance, to
encourage some low amplitude texture or noise to pass through
the filter.
This change improves the retention of film grain and fine noise /
texture in spatially flat regions, but causes a significant
drop in PSNR on many clips. This is to be expected because similar
but misaligned noise or texture will give a lower PSNR than a flat
noise free reconstruction. However, it is worth noting that most clips
show a strong gain in FAST SSIM.
The features are enabled on the vpxenc command line by setting
--tune-content=film.
VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.
Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a
Removed three parameters that are no longer needed in calls
to calc_arf_boost() and associated minor changes.
No impact on encode results.
Change-Id: Ieaf31d0d2e1990b99cf69647170145a1bbfbb9fb
For choose_partitioning (speed >= 6): avoid computation
of minmax variance for non-reference frames in SVC.
Existing condition only avoided this for speed >= 8.
Combine that existing logic with non-reference condition.
Small speedup (~0.5-1%) for 3 layer SVC,
neutral change on avgPSNR/SSIM metrics.
Change-Id: I3e9f3a1af0647b15e475cf170d9402908d672ee5
Release frame buffers for non-ref when the decoder is destroyed.
Enable the non ref test.
BUG=b/68819248
Change-Id: Id87ef3b0a62318f9812e927cd957c05c859047fa
For SVC with 3 spatial layers:
Add feature to copy/upscale partition from middle spatial layer
to the upper/highest resolution, when superblock sad is not high.
Enabled for speed >= 7 and only for non-reference frames.
Speedup ~3-4%, small loss in avgPSNR/SSIM of ~1%.
Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d
For a chosen interval "i" the existing arf boost calculation examined frames
+/- (i-1) frames from the current location in the second pass.
This change checks to make sure that the forward search does not extend
beyond the next key frame in the event that the distance to the next key
frame is < (i - 1).
Small metrics gains on all our test sets but these are localized to a few clips
(e.g. midres set psnr-hvs sintel -2.59% but overall average was only -0.185%)
Change-Id: I26fc9ce582b6d58fa1113a238395e12ad3123cf6
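The forward search limit described above amounts to taking the smaller of (i - 1) and the distance to the next key frame. A minimal sketch, using the hypothetical helper name arf_forward_frames (not a libvpx function):

```c
#include <assert.h>

/* Hypothetical helper: number of frames to examine ahead of the current
 * position for a chosen interval, limited so that the search does not run
 * past the next key frame. */
static int arf_forward_frames(int interval, int frames_to_key) {
  int wanted = interval - 1;
  assert(frames_to_key >= 0);
  if (wanted < 0) wanted = 0;
  return wanted < frames_to_key ? wanted : frames_to_key;
}
```

For an interval of 16, a key frame 40 frames away leaves the search at 15 frames, while a key frame 5 frames away limits it to 5.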
The new test will run a SVC bitstream which has non ref frames.
It checks the number of buffers acquired and released to make sure all
external frame buffers are released.
Add a new test bitstream:
vp90-2-22-svc_1280x720_1.webm
which has 400 frames in total, and 1 spatial layer and 2 temporal layers.
There is one non ref frame every other frame.
Disabled for now. Will be enabled with the fix.
BUG=b/68819248
Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5
Added command line control of Corpus VBR.
The new corpus vbr mode is a variant of standard
VBR (end-usage=0) where the complexity distribution
mid point is passed in rather than calculated for a specific
clip or chunk.
The new variant is enabled by setting a new command line
parameter --corpus-complexity to a non-zero value. Omitting
this parameter or setting it to 0 will cause the codec to use
standard vbr mode.
The correct value for a given corpus needs to be derived
experimentally using a training set such that the average
rate for the corpus is close to the target value.
For example, using our low res test set with upper and lower
vbr limits of 50%-150% and a corpus complexity value of 650
gives a similar average data rate across the set to using standard
vbr. However, with the corpus mode easier clips will be allocated
fewer bits and harder clips more bits rather than having the same
rate target for all.
Change-Id: I03f0fc8c6fb0ee32dc03720fea6a3f1949118589
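The selection between the two modes described above can be pictured with the sketch below. The function complexity_midpoint and its parameters are hypothetical; the only behaviour taken from the description is that a value of 0 selects standard VBR (mid point calculated from the clip) and a non-zero value is used as the mid point directly.

```c
#include <assert.h>

/* Hypothetical helper: chooses the mid point of the complexity distribution.
 * A corpus_complexity of 0 means standard VBR, where the mid point is the
 * value calculated for the clip; any other value is passed through. */
static double complexity_midpoint(int corpus_complexity,
                                  double clip_mean_error) {
  assert(corpus_complexity >= 0);
  return corpus_complexity != 0 ? (double)corpus_complexity : clip_mean_error;
}
```

With a corpus complexity of 650 the clip-derived value is ignored; with 0 the clip-derived value is returned unchanged.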
For nonrd_pickmode: if early_term is set there should be
no need to include UV in rdcost (when color_sensitivity is set).
Neutral change on RTC and RTC_derf metrics, for speed >= 5.
No change for ytlive metrics.
Very small speed gain (~0.5%) on some clips with strong color content.
Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07
Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
For real-time encoding, 1 pass with non-zero lag-in-frames.
Added speed feature to control the feature.
Enabled for speed >=6 for now, under VBR mode.
avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
some clips up by ~3-5%, some clips neutral gain, average gain
across clips is ~1%.
Small/negligible decrease in speed.
Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d
Even though frame_size is calculated in uint64_t, it winds up in an int
size value.
This was exposed with the msan test because the memset is called with
(int)frame_size, leading to a segfault.
Change-Id: I7fd930360dca274adb8f3e43e5e6785204808861
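The class of problem described above (a size computed in 64 bits that is later narrowed to int) can be guarded against with an explicit range check before the value is used. The helper below is a sketch with a hypothetical name, checked_frame_size, and is not the fix that was applied.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes width * height in 64 bits and only stores
 * the result if it is representable as an int. Returns 1 on success and 0
 * if the dimensions are invalid or the size would not fit. */
static int checked_frame_size(int width, int height, int *size_out) {
  uint64_t frame_size;
  assert(size_out != NULL);
  if (width <= 0 || height <= 0) return 0;
  /* Both factors are at most INT_MAX, so the 64-bit product cannot wrap. */
  frame_size = (uint64_t)width * (uint64_t)height;
  if (frame_size > (uint64_t)INT_MAX) return 0;
  *size_out = (int)frame_size;
  return 1;
}
```

A caller would only pass the size on to memset() or an allocator when the function returns 1.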
Change type of sum_square_error from int64_t to uint32_t.
Change type of sum_error from int64_t to int32_t.
This reduces the stack usage from ~131K to ~87K.
BUG=b/68362457
Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
swap '{' and c-style comments removing a few redundant ones along the
way; covers most leftovers from the clang-tidy run against an
x86_64-linux config.
Change-Id: I67a45596f80a12389faca49c5be440875092a7df
Changed the intrinsics to perform summation similar to the way the assembly does.
The new code diverges from the assembly by preferring unsaturated additions.
Results for haswell:
SSSE3
  Horiz/Vert  Size  Speedup
  Horiz       x4    ~32%
  Horiz       x8    ~6%
  Vert        x8    ~4%
AVX2
  Horiz/Vert  Size  Speedup
  Horiz       x16   ~16%
  Vert        x16   ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
Set adaptive_row_thresh_mt = 1 at speed >= 7,
for svc when multi-threading is used with row-mt.
This allows the adaptive_rd_thresh feature to be used
in the nonrd-pickmode.
~1-2% speedup for SVC encoding with small quality
loss (< 0.6%) on RTC set.
Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc
Disable cyclic refresh if ROI is used and add flag to properly handle
the static_thresh deltas.
Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI
is used).
Add an example in vpx_temporal_svc_encoder.c. Turned off by default.
BUG=webm:1470
Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda
Note this change will trigger the different C version on SSSE3 and
generate different scaled output.
Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().
Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
Small increase in sad_thresh1, avoids some false
detection of possible scene changes within lag.
Small improvement in few clips on ytlive, otherwise neutral change.
Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b
Expose the threshold for setting key frame on cut,
and increase it for speed 5.
Also small adjustment to min_thresh.
No change in overall metrics or fps.
Small quality improvement and lower encode time on scene cuts.
Change-Id: I36e06ff3b26b6c29aede39c23fce454525fc9026
Small increase in threshold for the 1 pass VBR datarate tests.
Needed due to commit:
<017257a Adjustment to scene detection and key frame>
Change-Id: I28b3bd7db2192a8cc2bccc3cb0e3b8dbb910ca16
The initial allocation of bits in the two pass code to each frame
should be within the min max limits on the command line. However,
when forming an ARF group the cost of the ARF is shared by frames
in that group such that the residual bits for a frame could drop below
the min value. This change prevents the minimum being re-applied
after the cost of the ARF has been deducted as this may otherwise
cause low rate sections to overshoot their target.
Test runs comparing to a baseline run with min and max section pct
0-2000% vs one closer to the YT use case (50-150%) suggest that
this fix not only results in better rate control but also gives a better
rd outcome.
For example the HD set vs 0-2000% baseline (opsnr, ssim).
Old code (50-150): +0.751, +1.099
New code (50-150): +0.241, -0.009
Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
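The ordering described above (apply the min/max limits to the initial allocation, then deduct the ARF cost without re-applying the minimum) can be sketched as below. The function frame_target_after_arf and its parameters are hypothetical and the arithmetic is deliberately simplified; it is not the two pass code itself.

```c
#include <assert.h>

/* Hypothetical helper: limits the initial per-frame allocation to
 * [min_bits, max_bits], then deducts this frame's share of the ARF cost.
 * The minimum is intentionally not re-applied after the deduction, so the
 * result may fall below min_bits (but never below zero). */
static int frame_target_after_arf(int initial_bits, int min_bits,
                                  int max_bits, int arf_share) {
  int target = initial_bits;
  assert(min_bits <= max_bits);
  if (target < min_bits) target = min_bits;
  if (target > max_bits) target = max_bits;
  target -= arf_share;
  return target > 0 ? target : 0;
}
```

With an initial allocation of 1000 bits, limits of 800 to 2000 and an ARF share of 300, the frame ends up with 700 bits, below the minimum, rather than being raised back to 800.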
Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.
Change-Id: I1042564d1d390589ca203070fe332c6da3315d75
For 1 pass vbr: use higher threshold on avg_sad
and force key frame under scene cut detection if
above the threshold. Allow it for speed >= 6 for now,
since it does not use the full nonrd_pickmode partition
(as in speed 5).
Improves quality somewhat on scene cut frames.
Neutral on overall metrics and fps for speed 6 on
ytlive set.
Change-Id: I12626f7627419ca14f9d0d249df86c7104438162
Change to the bit allocation within a GF/ARF group.
Normal VBR and CQ mode allocate bits to a GF/ARF group based on the mean
complexity score of the frames in that group but then share bits evenly between
the "normal" frames in that group regardless of the individual frame complexity
scores (with the exception of the middle and last frames).
This patch alters the behavior for the experimental "Corpus VBR" mode such that
the allocation is always based on the individual complexity scores.
Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
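To illustrate the difference from even sharing, the sketch below divides a group's bits in proportion to each frame's individual complexity score. The function allocate_by_complexity is hypothetical and ignores the special handling of the middle and last frames mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: splits group_bits across n frames in proportion to
 * their individual complexity scores. Falls back to an even split if all
 * scores are zero. */
static void allocate_by_complexity(int64_t group_bits, const double *scores,
                                   int n, int64_t *frame_bits) {
  double total = 0.0;
  int i;
  assert(n > 0);
  for (i = 0; i < n; ++i) total += scores[i];
  for (i = 0; i < n; ++i) {
    frame_bits[i] = total > 0.0
                        ? (int64_t)((double)group_bits * (scores[i] / total))
                        : group_bits / n;
  }
}
```

For a group of 4000 bits and two frames with scores 1.0 and 3.0, the frames receive 1000 and 3000 bits, where an even split would give each 2000.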
This patch makes further changes to support an experimental
corpus wide VBR mode that uses a corpus complexity
number as the midpoint of the distribution used to allocate bits
within a clip, rather than some average error score derived from the
clip itself.
At the moment the midpoint number is hard wired for testing and
the mode is enabled or disabled through a #ifdef. Ultimately this
would need to be controlled by command line parameters.
Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873
vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages at the end.
The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.
vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.
Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
This reverts commit 9311ef18b4.
Reason for revert:
Notice small regression in some clips.
Will revisit in another change.
Original change's description:
> Speed >=5 real-time: add TM intra mode for high_source_sad.
>
> Small/neutral change in metrics or speed for ytlive.
> Some improvement in quality on frames with big content change.
>
> Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
For 1 pass vbr: increase min_thresh slightly, and also add
condition on golden/arf update for using full nonrd_pick_partition.
Reduces possible false detection for scene cut detection.
Neutral/small change in metrics or speed for speed 5.
Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76
Small/neutral change in metrics or speed for ytlive.
Some improvement in quality on frames with big content change.
Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
Lower SAD threshold to select non_rd pickmode partition
at superblock level more often.
Small gain in metrics, small/negligible decrease in speed.
Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc
For 1 pass vbr speed >= 6: when REFERENCE_PARTITION is selected,
avoid doing the full nonrd_pickmode based partition.
No change in overall metrics or speed.
Reduces encode times on scene cuts by 10-20%.
Change-Id: I0310b1610cc1c83793a509e0a9059840e8f18308
For 1 pass vbr mode:
On no-show_frame/ARF: instead of skipping alt_ref_frame
completely in mode testing, allow for checking (0, 0) on alt_ref.
Small gain in metrics, ~0.18%, no change in speed.
Change-Id: I32a3c24faca64ab70dd5091071a0dc301db7dd1e
For 1 pass vbr: when significant content/scene change is detected
(high_source_sad = 1) reduce/turnoff the additional qdelta on the
active_worst_quality. This helps somewhat to reduce the occurrence
of large frame sizes and large encode times.
Allow it only when use_altref_onepass is enabled.
Neutral/no change on metrics.
Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea
For speed 6 real-time mode: use adapt_partition
on ARF frame instead of REFERENCE_PARTITION (which is slower).
This requires enabling compute_source_sad_onepass for no-show_frames.
Speedup of ~3-5% on some clips that heavily use ARF,
small loss (~0.2%) in quality on ytlive set.
Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536
Speedup compared with the version calling vpx_scaled_2d_neon():
~1.7x in general
~2.8x for BILINEAR filter
BUG=webm:1419
Change-Id: I8f0a54c2013e61ea086033010f97c19ecf47c7c6
Scale 3x3 block instead of 16x16 block in each loop. Disabled by
default.
Benefits:
1. Reduced number of different phase_scaler from 16 to 3.
Optimization code will be smaller and faster.
2. Maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)
BUG=webm:1419
Change-Id: I59a1f7496d89a1b090498c935d30cfcf1d0c282b
For real-time mode. Move the switch to fixed partition
for is_src_frame_alt_ref so all speeds may use it
if use_altref_onepass is set.
Improves metrics by ~2% for ytlive set at speed 4
(where use_altref_onepass is currently used).
Change-Id: I033240386598c9dbd0364da89ccbcca64bc663ee
Only has effect when sf->use_altref_onepass is enabled,
as in that case scene detection is skipped for non-show frame
and so high_source_sad does not get reset to 0.
No change in metrics or speed.
Change-Id: I421f066d239341449c18826089e1810b9fc5967f
Add stats for past ARF usage, and use it to disable
ARF usage based on some conditions.
Overall improvement on ytlive set, reduces the regression
on the problem clips for this feature.
Only affects when sf->use_altref_onepass is enabled
(currently off by default).
Change-Id: I66267f227ea132dc86acb730e9882f85bead2cdb
This reverts commit 535b7b915a.
This is actually used in CBR to reset the rate control if high source sad is detected.
Original change's description:
> Remove the speed condition on scene detection in 1 pass code.
>
> Scene detection is used for VBR mode and for screen_content mode.
>
> It was also enabled for CBR mode via the speed condition,
> but currently the analysis in the scene detection is not used
> in CBR mode (similar computations are done locally at superblock level
> when the source_sad feature is enabled).
>
> For 1 pass code.
> No change in behavior. Small speed gain, ~0.5%.
>
> Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: Ib4e6b02047f75632503e7b0fc870af97fa9291c3
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Scene detection is used for VBR mode and for screen_content mode.
It was also enabled for CBR mode via the speed condition,
but currently the analysis in the scene detection is not used
in CBR mode (similar computations are done locally at superblock level
when the source_sad feature is enabled).
For 1 pass code.
No change in behavior. Small speed gain, ~0.5%.
Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
'iter' parameter is being checked for NULL in every call to
decoder_get_frame which is quite pointless because it is always
going to be NULL unless the application changed it. The code works
as described only because vp9_get_raw_frame returns -1 on all
subsequent calls after the first.
Change-Id: Ic736b9e8fe36fc1430fc11d6a9b292be02497248
* changes:
Remove the unnecessary cast of (int16_t)cospi_{1...31}_64
Remove the unnecessary upcasts of (int)cospi_{1...31}_64
Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
Add the condition frames_since_golden > 0 to the
early exit check for ARF usage in nonrd_pickmode.
This improves quality of first frame following ARF, where
frames_since_golden = 0.
Small/neutral gain in metrics for speed 6, neutral change in speed.
Only affects when USE_ALTREF_FOR_ONE_PASS is enabled.
Change-Id: I82e73e6ff6fc849e5ca5448563cb8a0515fe0cdc
A new bug was introduced in a80bdfd "Change sinpi_{1,2,3,4}_9 from
tran_high_t to int16_t". Reverted the change in this file.
BUG=webm:1450
Failed test C/TransHT.AccuracyCheck/26.
Change-Id: Id001f57aad811803ef7d367d2b2bc008d8499991
Modify simple_block_yrd condition in nonrd_pickmode for SVC:
allow it to be used also on base temporal_layer, only when
spatial_layer > 1 and block size < 32x32.
Speedup of ~2% for 3 layer SVC, with little/negligible
loss in quality.
Change-Id: I7734bdae51cf51f22b96f6b2b27da20ea1d84344
Fix the setting to frames_till_gf_update_due, and
adjust the limit value.
Only affects when USE_ALTREF_FOR_ONE_PASS is enabled.
Neutral change to metrics and speed for ytlive.
Change-Id: I266d9a00b36221bc8602fa2746d4e8a8f7d4dfae
Only when USE_ALT_REF_ONE_PASS is enabled (off by default).
Force fixed partition to 64x64 when is_src_alt_ref_frame is true,
and don't force early exit for some modes in nonrd_pickmode
for ARF noshow frames.
Small gain ~0.2% on ytlive metrics for speed 6.
Neutral speed difference.
Change-Id: I27eb6622d0453c09a06ccdc3b16368762474d11d
Add datarate test, for both VBR and CBR mode, with the
frame_parallel_decoding mode disabled (and error_resilience off).
Change-Id: I54feec3248a68ecff4bef8d9a31bb1616fab77df
In the new AUTO mode, restrict the minimum alt-ref interval and max column
tiles adaptively based on picture size, while not applying any rate control
constraints.
This mode aims to produce encodings that fit into levels corresponding to
the source picture size, with minimal loss in compression quality. However, the
bitstream is not guaranteed to be level compatible, e.g., the average bitrate
may exceed level limit.
BUG=b/64451920
Change-Id: I02080b169cbbef4ab2e08c0df4697ce894aad83c
Removed inline for GP load-store in case of (__mips_isa_rev >= 6)
Created one define LD_V for vector load and ST_V for vector store
Change-Id: Ifec3570fa18346e39791b0dd622892e5c18bd448
Also add column headings so that the output can still be parsed if the
set of headers changes later.
Change-Id: I4beaf266521e093db4acf5f715b18fdfb7e3d1cd
This reverts commit 8c42237bb2.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
Scale 3x3 block instead of 16x16 block in each loop.
Benefits:
1. Reduced number of different phase_scaler from 16 to 3. Optimization code
will be smaller and faster.
2. The maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)
BUG=webm:1419
Change-Id: Ibb9242a629ddb03e1ff93b859bece738255e698c
The intra mode rd penalty was implemented as a rate penalty.
Code was added to scale the penalty according to block size but
this was not done correctly for the SB level or sub 8x8.
The code did a weird double scaling in regard to bit depth that
has been removed. Given that it is a rate penalty the bit depth
should not matter.
This bug fix improves average metrics on our standard test
sets by about 0.1%
Change-Id: I7cf81b66aad0cda389fe234f47beba01c7493b1e
Move class VpxScaleBase to new file test/vpx_scale_test.h.
Add new file test/vp9_scale_test.cc with ScaleFrameTest.
BUG=webm:1419
Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1
This header doesn't build on g++ v6 as it's a C and not C++ header
(_Atomic is not a keyword in C++11). Since the C and C++ invocations
cannot be guaranteed to point to the same underlying atomic_int
implementation, remove support for them and use compiler intrinsics
instead.
BUG=webm:1461
Change-Id: Ie1cd6759c258042efc87f51f036b9aa53e4ea9d5
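As a sketch of what using compiler intrinsics instead of the C11 atomics header can look like, the wrappers below use the __atomic builtins available in GCC and Clang. This assumes one of those compilers; the wrapper names atomic_load_acquire and atomic_store_release are hypothetical, and other toolchains would need their own equivalents.

```c
/* Assumes GCC or Clang, which provide the __atomic_* builtins. These work
 * on plain int objects, so the same declarations compile as C and as C++
 * without relying on the _Atomic keyword. */

/* Hypothetical wrapper: atomic load with acquire ordering. */
static int atomic_load_acquire(int *p) {
  return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

/* Hypothetical wrapper: atomic store with release ordering. */
static void atomic_store_release(int *p, int value) {
  __atomic_store_n(p, value, __ATOMIC_RELEASE);
}
```

A writer thread would publish progress with atomic_store_release() and a reader thread would observe it with atomic_load_acquire().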
Makes main thread wait for the filter level to be picked to avoid a race
between the LPF thread and update_reference_frames(). This also
re-enables the failing tests under thread_sanitizer where this data race
was detected.
BUG=webm:1460
Change-Id: I7f5797142ea0200394309842ce3e91a480be4fbc
Fixes issue on iPad Pro 10.5 (and probably other places) where threads
are not properly synchronized. On x86 this data race was benign: load
and store instructions are atomic there, and the accesses were atomic in
practice as the program hasn't been observed to be miscompiled.
Such guarantees are not made outside x86, and real problems manifested
where libvpx reliably reproduced a broken bitstream for even just the
initial keyframe. This was detected in WebRTC where this device started
using multithreading (as its CPU count is higher than earlier devices,
where the problem did not manifest as single-threading was used in
practice).
This issue was not detected under thread-sanitizer bots as mutexes were
conditionally used under this platform to simulate the protected read
and write semantics that were in practice provided on x86 platforms.
This change also removes several mutexes, so encoder/decoder state is
lighter-weight after this change and we do not need to initialize so
many mutexes (this was done even on non-thread-sanitizer platforms where
they were unused).
Change-Id: If41fcb0d99944f7bbc8ec40877cdc34d672ae72a
Neutral on rtc set for speed 8. Neutral on ytlive for speed 5.
Saves some computation cycles but no speed gain observed on Pixel.
Change-Id: I34c4642cd543aa89c5b9c4bff6b7113577c64c91
This reverts commit df9ce12259.
Reason for revert:
Re-enabled tests still fail tsan in high bitdepth.
Original change's description:
> Re-enable disabled tests under TSan.
>
> These tests point to an already-fixed bug, this should no longer have a
> data race.
>
> BUG=webm:1049
>
> Change-Id: Iaedc5db8df99362bdc501b70ff7fdebf8756fdb8
TBR=jzern@google.com,pbos@chromium.org,builds@webmproject.org
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: webm:1049
Change-Id: I232f1f7726bf795b301abfb2e07cad6756642e53
Rev d147771 fixed the test failure. So remove the resolution condition
for using source_sad in speed 6.
BUG=webm:1452
Change-Id: I1efba97e1ef5bd4de5f886299f6fcb907187abcd
Enable adapt_partition for vbr mode for speed 6.
This allows the usage of the pickmode-based partition
(used in speed 5), but only selectively for superblocks
with high source sad, otherwise the faster variance based
partition scheme is used.
For speed 6 on ytlive set: avgPSNR/SSIM metrics up by ~0.6%,
several clips up by ~1.5%. Small/negligible decrease in speed.
Change-Id: I12f3efef6b3e059391de330fdbe5a44c2587f1f8
For SVC at speed >= 7: only use the improved mv search
on base spatial layer, if top layer resolution is above 640x360.
~2.3% speedup
Small/negligible loss in avgPSNR metrics on rtc set.
Change-Id: Iaef75a57ebf1c248931bc1aa28d20b7fecac1851
This reverts commit f60d1dcd3d.
Reason for revert:
Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org
Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
For speeds < 7, increase threshold that controls the split
of 16x16->8x8 blocks, for resolutions 720p and higher.
Minor change for speed 5 (since it uses reference partition scheme
which only uses variance partition as first step).
For speed 6: ~0.5% increase in avgPSNR/SSIM metrics on ytlive set.
No change in speed.
Change-Id: I5126580973201538d8ca26a9256b93c4d11d685b
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.
Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
Add 1 if negative to get dqcoeff to round towards zero.
10-15% faster than converting to positive before shifting.
Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
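The rounding trick described above can be shown in scalar form for a divide by two. The helper name half_toward_zero is hypothetical, and the sketch assumes that >> on a negative signed value is an arithmetic shift, which is implementation-defined in C but is the behaviour of GCC and Clang.

```c
#include <stdint.h>

/* Hypothetical helper: halves v, rounding towards zero. A plain arithmetic
 * right shift rounds towards negative infinity, so 1 is added first when v
 * is negative. Assumes >> on negative values is an arithmetic shift. */
static int32_t half_toward_zero(int32_t v) { return (v + (v < 0)) >> 1; }
```

For example, -5 becomes -4 before the shift and yields -2, matching -5 / 2 in C, whereas -5 >> 1 alone would give -3.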
About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.
Both numbers would improve if the division for dqcoeff could be
simplified.
BUG=webm:1426
Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
This feature is used for the CBR RTC encoding mode
at speed >= 6. This change will exclude it for VBR mode.
For speed 6 live encoding (VBR):
avgPSNR/SSIM metrics on ytlive set up by ~1% (few clips up by 2-3%).
No change in speed.
Change-Id: I1a0dd94c334f7df309ab5a48d477d7e25355b798
* changes:
quantize: ignore skip_block in arm
quantize: ignore skip_block in x86
quantize fp: ignore skip_block in arm
quantize fp: ignore skip_block in x86
This should probably be handled before vp9_regular_quantize_b_4x4 even
gets called.
Fixes an assert resulting from removing skip_block from the quantize
functions.
BUG=webm:1459
Change-Id: I7f52b53f959b4654b3d4517ebda31a678f4d0fde
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.
Add assert() and comments regarding the usage of skip_block.
Removing the parameter is a fairly involved process so leave it be for
the moment.
Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
Despite abs_coeff being a positive value, all the other implementations
treat it as signed which simplifies restoring the sign.
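A sketch of why a signed magnitude simplifies restoring the sign. The helper below is hypothetical and only illustrates the xor/subtract idiom, not the exact code in any of the implementations:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reapply the sign of coeff to a non-negative
 * magnitude, keeping every operand signed. */
static int32_t restore_sign(int32_t magnitude, int32_t coeff) {
  const int32_t sign = -(coeff < 0); /* -1 (all ones) if negative, else 0 */
  assert(magnitude >= 0);
  /* For sign == -1 this is ~magnitude + 1, i.e. -magnitude.
   * For sign == 0 the value passes through unchanged. */
  return (magnitude ^ sign) - sign;
}
```

With signed operands the xor and the subtraction both stay in int, so no cast is needed around the individual steps.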
HBD builds cast qcoeff to avoid a visual studio warning. Match
vp9_quantize.c style of casting the entire expression.
Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5
Having a very low "lag_in_frames" value could cause the encoder to create
incorrect / corrupt ARF groups including displayed frames that update the
ARF buffer and false overlay frames that are coded at low rate but are not
actually overlays of a real ARF frame.
This is linked to a reported unit test "slow down" where the chosen parameters
(lag of 3 frames) gave rise to such "broken" ARF group(s).
See also BUG=webm:1454
Change-Id: If52d0236243ed5552537d1ea9ed3fed8c867232c
Having a very small value for "lag_in_frames" can result in
corrupt arf groups including displayed frames that update
the arf buffer and fake overlay frames that are not in fact
overlays of real arfs but are nevertheless starved of bits.
Leaving lag_in_frames at the default of 25 for these 5 frame two
pass VBR tests should now give rise to a valid ARF coding pattern
as follows:- K(ey), A(rf), N(ormal), N, N, O(verlay).
This change is part of a response to BUG=webm:1454 where broken
arf groups interacted badly with a change that corrects for large rate
misses. However, it may still in some cases increase encode time by
virtue of the fact that the unit test now codes a correct coding pattern
with "hidden" ARF frames.
Change-Id: Ifd0246a4c1d0be247247c754024d7a4ed5f66a6b
Some clips in nightly unit test exhibiting significant encoder slowdown which
appears to bisect to Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a.
The above change allowed for emergency iterations of the recode loop and
adjustment of the Q range if there is a large rate miss.
This patch disables the above adaptation for cases of cpu_speed >= 3 or more
specifically where cpi->sf.recode_loop >= ALLOW_RECODE_KFARFGF.
For speeds >= 3 the code does not currently run a dummy bit pack operation
inside the recode loop. Without this dummy pack operation there is no up to
date estimate of the current frame's size to use as a basis for assessing the
requirement for a recode. In practice it was using the previous frame's size (or 0
for the first frame) which could cause odd behavior.
If we require the emergency rate correction added in Change-Id: I6923.. for
the higher speed settings it will be necessary to enable the dummy pack
which will in turn hurt encode speed.
BUG=webm:1454
Change-Id: I4fb3c6062ca9508325a6f31582f8e80f1a9b126f
Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files.
Delete some flags that can be enabled during build.
To enable writing denoised YUV, use the following command line:
CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure
--enable-vp9-temporal-denoising
For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'
Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528
The result of the xor operation is unsigned. If coeff was negative,
this results in an unsigned value - INT_MIN.
Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0
Actual frame size and bitrate are all 0 when using the SVC sample encoder
with sl = 1 because the stats are set in parse_superframe_index, which
does not calculate them properly when sl = 1 since there is no superframe.
Use pkt->data.frame.sz instead when sl = 1.
Change-Id: I93f5e98a4c779e32b007e1564ba5396af9e34ad6
Use input with a narrow range because the filter only applies when the
frames are similar.
Run CompareReferenceRandom more times. Especially before narrowing the
input range, the filter frequently did not apply.
Change-Id: Ie249bedf6d0d33dfa5884611cb1835788e418b38
this test fails with the configuration similar to the assembly prior to:
d52cb5972 quantize: copy ssse3 optimizations to intrinsics
BUG=webm:1458
Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf
Change the denoiser frame buffer management for SVC to more generally
handle the layer patterns in SVC (where last is not always refreshed).
This change is only for SVC with denoising and is bitexact.
Change-Id: Ic2b146a924cdf6e7114609158afa3d4880fe3fae
Testing of 4k videos encoded with a fixed arbitrary chunking interval
uncovered a bug whereby if a chunk ends 1 frame before a real scene cut,
the next chunk may be encoded with two consecutive key frames at the start
with the first being assigned 0 bits.
This fix ensures that where there is a key frame group of length 1 it is
assigned at least 1 frame's worth of bits, not 0.
See also patch Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a
which by virtue of allowing fast adaptation of Q made this bug more visible.
BUG=webm:1456
Change-Id: Ic9e016cb66d489b829412052273238975dc6f6ab
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()
BUG=webm:1412
Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
With skip block the neon is about twice as fast as C.
The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.
BUG=webm:1426
Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
When adapt_partition_source_sad is enabled (currently only at
speed 6 for resoln <= 360p): use lower subsize (8x8 instead of 16x16)
for nonrd_select_partition on 32X32 blocks.
And force avoiding rectangular partition checks in
nonrd_pick_partition for speed >= 6.
Small increase ~0.5 in metrics for speed 6 on rtc_derf,
no change in speed.
Change-Id: Id751bc8f7573634571b2d6f5e29627cd5cebccae
Prepare for high bitdepth 16x16 idct sse4.1 code.
Just functions moving and renaming.
BUG=webm:1412
Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2
see:
17fae3a Change to use correct check for halfpel
Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
Enable fast adaptation of Q when there is a large overshoot
for the #ifdef AGGRESSIVE_VBR test case.
AGGRESSIVE_VBR is not currently enabled by default.
Change-Id: I7240bb6589795964b6b0b66df4468e4f21504e0f
Originally, for the purpose of keeping a fast first pass, the first-pass
stats between row_mt_mode = 0 and row_mt_mode = 1 are not bit exact, but
the difference is so small that it doesn't cause a mismatch between the
final bitstreams. However, if the encoder changes, this minor difference
may cause a mismatch. Thus, this patch always forces the first pass to
be bit exact.
BUG=webm:1453
Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8
For 1 pass CBR mode:
Apply the logic for dropping (and re-adjusting rate control)
due to large overshoot to the case of non-screen content when
drop_frames_allowed is enabled.
For the non-screen content case: add additional condition that
rate correction factor is close to minimum state, and flag to
constrain the frequency of the dropping.
Also handle the case of temporal layers and multi-res encoding.
Add some flags/counters to the layer context for temporal layers.
For multi-res: drop due to overshoot is checked on lowest stream,
and if overshoot is detected we force drops on all upper streams
for that frame.
This feature is to avoid large frame sizes on big content
changes following low content period.
No change in behavior for screen_content_mode = 2.
Change-Id: I797ab236cbbf3b15cad439e9a227fbebced632e6
This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.
The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().
Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
With skip block or coeff < zbin it is about twice as fast as C.
If most coeff values are > zbin it is about 10-15x as fast as C.
BUG=webm:1426
Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. Current condition was based on
the speed level. This change is to make the condition at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).
Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04
This reverts commit c9266b8547.
Disable source_sad when resolution > 1080P. The test should
pass now.
BUG=webm:1452
Change-Id: I72dde88e66590ff9e41da5e5dd83f5550a83f082
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
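A small illustration of the kind of rewrite that quiets this warning, under the assumption that the constant is a negated power of two. The helper name is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper: -(2^n) without shifting a negative value. */
static int neg_pow2(int n) {
  assert(n >= 0 && n < 30);
  /* (-1 << n) is undefined in C. Shifting the positive 1 and negating
   * the result is well defined for the range checked above. */
  return -(1 << n);
}
```

When n is a compile-time constant both forms typically fold to the same immediate, which is consistent with there being no change in the generated code.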
Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
This reverts commit aa1c4cd140.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
previously the optimized path was skipped in this range
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
Move the source_sad feature to speed 6 (from speed 7), and
add speed feature to switch from the variance-based partition
to reference_partition (which uses nonrd-pickmode for bsize selection)
if source_sad is high.
Currently used only for speed 6 for resoln <= 360p.
About 4-5% improvement on 360p in RTC set.
Some speed slowdown, but still ~30% faster than speed 5.
Change-Id: Ib0330ee5fe9fdd2608aed91359a2a339d967491c
This reverts commit 03f5e300d6.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
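A sketch of the cast order, assuming a 16x16 block (256 pixels, hence a shift by 8). The function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tail of a 16x16 variance: sse - sum^2 / 256. */
static uint32_t variance_16x16_from_sums(uint32_t sse, int sum) {
  /* Widen before multiplying so that both the product and the right
   * shift operate on 64 bits. Narrow only after the shift, once the
   * quotient fits in 32 bits. */
  return sse - (uint32_t)(((int64_t)sum * sum) >> 8);
}
```

With sum = 65280 (every 8-bit difference at 255) the square is 4261478400, which does not fit in a signed 32-bit int; casting after the shift instead of before the multiply keeps the intermediate intact.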
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
For 8-bit the subtrahend is small enough to fit into uint32_t.
For 10/12-bit apply:
63a37d16f Prevent negative variance
previously:
47b9a0912 Resolve -Wshorten-64-to-32 in highbd variance.
c0241664a Resolve -Wshorten-64-to-32 in variance.
Change-Id: I181c85f0b9a03da37c2e8b89482d48aa3dbc0aee
Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'
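One way to avoid this class of warning, sketched with a hypothetical helper, is to do the subtraction in a signed type when the difference may legitimately be negative:

```c
#include <assert.h>

/* Hypothetical helper: difference of two unsigned quantities that may
 * come out negative. */
static int signed_diff(unsigned int a, unsigned int b) {
  /* a - b in unsigned arithmetic wraps when b > a, which the unsigned
   * overflow sanitizer reports. Converting each operand first keeps the
   * subtraction in signed int. */
  assert(a <= 0x7fffffffu && b <= 0x7fffffffu);
  return (int)a - (int)b;
}
```

For the values in the warning, signed_diff(19974, 32703) yields -12729 instead of a wrapped unsigned result.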
Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12
Backend specific optimization for PPC VSX reads 16 bytes, whereas arm neon /
sse2 only reads <= 8 bytes. Although the extra bytes read are never actually
used, that does not justify reading past the end of the buffer. Fixed by
allocating more when building for VSX. This is reported by asan.
Also note - PPC does have an instruction that loads 64-bit content from memory - lxsdx
loads one 64-bit doubleword (whereas lxvd2x loads two 64-bit doublewords) from
memory. However, we only have "vec_vsx_ld" builtins that map to lxvd2x, and no
builtins for lxsdx. The only way to access lxsdx is through inline assembly,
which does not fit well in the original paradigm.
Refer:
vsx:
vpx_tm_predictor_4x4_vsx @ third_party/libvpx/git_root/vpx_dsp/ppc/intrapred_vsx.c
neon:
vpx_tm_predictor_4x4_neon @ third_party/libvpx/git_root/vpx_dsp/arm/intrapred_neon_asm.asm
sse2:
tm_predictor_4x4 @ third_party/libvpx/git_root/vpx_dsp/x86/intrapred_sse2.asm
BUG=b/63112600
Tested:
asan tests passed.
Change-Id: I5f74b56e35c05b67851de8b5530aece213f2ce9d
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.
Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
0.007% regression on rtc and 0.004% gain on rtc_derf.
1 thread on QVGA,VGA and HD has ~0.2% speed regression while 2 threads has
~0.2% speed gain on Google Pixel.
Change-Id: Ia4a6ec904df670d7001e35e070b01e34149d23dc
Officially the quant structures are 8 elements, with one dc element and
7 repeated ac elements. The low bit depth optimizations take advantage
of this to fill the xmm registers. The high bit depth version manually
duplicates the values.
If all the optimizations were unified, the structure sizes could be
greatly reduced.
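The layout described above can be sketched as follows. The function name and the use of int16_t are assumptions for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fill routine for one 8-element quantizer row:
 * element 0 holds the dc value, elements 1..7 repeat the ac value. */
static void fill_quant_row(int16_t row[8], int16_t dc, int16_t ac) {
  int i;
  row[0] = dc;
  for (i = 1; i < 8; ++i) row[i] = ac;
}
```

Eight 16-bit elements are 128 bits, so a row laid out this way can be loaded into an xmm register in one go, whereas a two-element { dc, ac } pair has to be duplicated by hand.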
Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae
Use only valid values for quantize inputs. These were determined by
looping over vp9_init_quantizer and looking for max and min values.
This allows extending the test to the low bit depth functions which were
not designed to handle all possible inputs but only valid inputs.
Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e
To fix valgrind issues with SVC tests.
SVC encoding uses prune_evenmore, which is causing an uninitialized value.
Will re-enable later when the issue is resolved.
Change-Id: I257ff878cf78197ddd813db056582a4d5fe94f44
When content_state_sb is set to LowVarHighSumdiff, don't reset
it to VeryHighSad. Visually better on clips with strong lighting changes.
Small/negligible change in RTC metrics and speed.
Change-Id: I20c383e3c4cf8d1149de5f9260449c0b7cf7c6aa
When int_pro_motion_estimation is done for superblock in
choose_partitioning, use it to avoid the full_pixel_search
for NEWMV mode, if bsize is >= 32X32.
For speed > 7.
Small/neutral change on RTC metrics.
~1-2% speedup on arm on high motion clip.
Change-Id: I3cfe6833ff4bf75d4afa83eaf058ad45729de85b
Although the low bitdepth functions are identical (excepting the need
for larger intermediate values) they do not pass these tests. This
improves the error output to aid debugging.
Simplify buffer usage with Buffer and removing unnecessarily aligned
variables.
eob is a single element and never written using aligned instructions.
BUG=webm:1426
Change-Id: Ic95789a135cf1e8a3846d85270f2b818f6ec7e35
Reduces memory usage, and speeds up encoding for some difficult clips.
No impact on output or metrics.
Ported from aomedia patch:
https://aomedia-review.googlesource.com/c/14501
Change-Id: I26ec69af8336f9e80da486a1cfbfc89a3596954d
This reintroduces the fix:
https://chromium-review.googlesource.com/c/422807/
and later reverted here:
https://chromium-review.googlesource.com/c/447843/
BUG=webm:1355
This time behind a compile time flag :
configure --disable-always_adjust_bpm
configure --enable-always_adjust_bpm
This should make side by side testing easier and let users of the
lib pick which way they want to go.
Change-Id: I7d7b37b83015dc001810af84c132cbc1e71ba8d6
For fixed pattern SVC: keep track of denoised last_frame buffer
for base temporal layer, and if alt_ref is updated on middle/upper
temporal layers, force an update to denoised last_frame buffer.
This allows for improved denoising on top temporal layers.
Change-Id: Icbd08566027d4d2eabc024d3b7a0d959d2f8c18b
This code is unused in vp9. Only vp8 still contains references to
vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.
Remove the remaining sizes and all the highbitdepth versions.
BUG=webm:1425
Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
Denoiser is used in real-time mode which does not use alt-ref.
Reduce memory usage when denoiser is enabled.
Change-Id: I54ba3bcaeeb1818bbdf718ef90e97d4897ff793d
* changes:
sad neon: avg for 64x[32,64]
sad neon: macroize 64xN definitions
sad neon: avg for 32x[16,32,64]
sad neon: macroize 32xN definitions
sad neon: avg for 16x[8,16,32]
sad neon: macroize 16xN definitions
this has been set to max since:
f5c36a5ce VP9: turn on tile-columns and frame-parallel-mode by default
~v1.4.0
Change-Id: Ic796fc05abe73a58700ec50e3f8e72d3462898ec
If the content_state for a superblock is set to HighSad,
use that to bias some decisions in variance partition and
nonrd pickmode: use int_pro_motion for sad computation in
choose_partitioning, and set large_block in pickmode based
on the content_state_sb.
Only affects speed >= 7.
Improvement for high motion content.
Small gain (~1%) in RTC metrics.
Speedup of ~5 for high motion clip on android (speed 8, 1 thread).
Change-Id: I5774c4854f012b89c8e969f6129b60988c2ce11c
this has been on by default since:
f5c36a5ce VP9: turn on tile-columns and frame-parallel-mode by default
~v1.4.0
Change-Id: I52017ab0157feaf429dce3d9e1af8a53bb5c1b65
the file was empty after the struct removal. the only remaining use was
within vp9_dx_iface, but the wrapper became unnecessary after the
removal of frame_parallel_decode.
BUG=webm:1395
Change-Id: I515ab585d701e77d388d12b2802d844c424f9bcd
This patch attempts to address a bug reported for 4K video.
https://b.corp.google.com/issues/62215394
In this instance a perfect storm of a moderate complexity section
followed by a much easier section where a CGI overlay helped to
suppress film grain noise, followed by a much harder and very grainy
section at the end, cause a massive local rate spike that pushed a chunk
over the upper allowed rate limit.
This patch detects cases where the rate for a frame is much higher than
expected and allows, in this special case, for rapid adjustment of the active
Q range.
For the example chunk in the bug report the target rate was 18Mb/s and the
observed rate was over 37 Mb/s with a surge for the last few frames to over
100Mb/s. This patch brings the overall chunk rate right back down to ~18.2 Mbit/s
and almost completely eliminates the rate spike at the end. (See graphs appended
to bug report)
Also see I108da7ca42f3bc95c5825dd33c9d84583227dac1 which fixes a bug
unearthed during testing of this patch and also has a bearing on high rate
encodes such as 4K.
This patch does have a negative impact on some metrics. Most notably there are
clips in our standard test set where it hurts global psnr (though in many cases it
conversely helps SSIM, FAST SSIM and PSNR-HVS). It is also worth noting that
the clips (and data rates) where there is a big metric impact, are almost all cases
where there is currently a significant overshoot vs the target rate and overall rate
accuracy is greatly improved.
Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a
Local application of:
https://github.com/google/googletest/pull/1066
Suppress unsigned overflow instrumentation in the LCG
The rest of the (covered) codebase is already integer overflow clean.
TESTED=gtest_shuffle_test goes from fail to pass with -fsanitize=integer
Change-Id: I8a6db02a7c274160adb08b7dfd528b87b5b53050
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
Finish the calculations in neon registers. This avoids a potentially
expensive move from neon to gp and allows at least clang to store
directly to memory.
BUG=webm:1424
Change-Id: Idef25eec95f7610947167818e9194bde8b00d282
this makes the function compatible with high-bitdepth and fixes test
failures since:
5ac88162b partial fdct test
Change-Id: Ib630694608237f0c515948942e05dbea259ba338
testing::Range does not include the end parameter in the set of values.
also adjust the start to 2 as the single threaded case is already
covered in another instantiation
Change-Id: Iae3bf3ed4363dd434eccfa5ad4e3c5e553fbee60
For nonrd_pickmode: add condition for checking
intra mode if the sb content state is VeryHighSad.
Reduces artifacts when there is a sudden change in content.
Metrics on RTC/RTC_derf neutral (small gain).
No speed loss observed.
Change-Id: I07006d28fd2dc06c1d06b07630102b0fece50c40
the last frame_worker_owner, row and col references were removed in:
131bd06e6 remove vp9_dthread.c
BUG=webm:1395
Change-Id: Ia7fb2e8782b12a58d2a2263849d20a8abf06aef6
and the related prototypes in vp9_dthread.h. the last references were
removed in:
09dabc58d VP9_COMMON: rm frame_parallel_decode
vp9_dx_iface.c still uses FrameWorkerData
BUG=webm:1395
Change-Id: Ica8e98ae776fc0105f1fbbed9e0a729808980810
creating a thread associated with the sole worker isn't necessary when
only execute() is being used after the removal of frame_parallel_decode.
BUG=webm:1395
Change-Id: I2255ce72607321e5708bc82a632dc6825d4eff5c
Add a method to acm_random.h to generate ranges of values
Add a way to call that method to buffer.h
Adjust dct_[partial_]test.cc to use it.
Change-Id: I8c23ae9d27612c28f050b0e44c41cb4ad2494086
this field has been 0 since:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
BUG=webm:1395
Change-Id: I15448e9401e15329b54c6878dda033b17be5ec6b
VPX_CODEC_USE_FRAME_THREADING was made a no-op in:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
and the tests in this file have been disabled since:
6ab0870d4 disable VP9MultiThreadedFrameParallel tests
BUG=webm:1395
Change-Id: I2c7a250acb65cf9522cf8a7bb724bb92070e41c6
this was made a no-op in:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
and the test hitting this branch has been disabled since:
6ab0870d4 disable VP9MultiThreadedFrameParallel tests
rename the test to VP9MultiThreaded to exercise the tile-based threading
BUG=webm:1395
Change-Id: I35564a75eb5a7d7f7ccb923133b1b07295201f4c
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.
The values could potentially be summed and shifted in place.
BUG=webm:1424
Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
For the 8x8_1, the highbd output fit nicely in the existing function. 12
bit input will overflow this implementation of 16x16_1.
BUG=webm:1424
Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
The function was originally written with HBD in mind. Enable it and
configure the tests.
BUG=webm:1424
Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
* changes:
sad neon: rewrite 64x64 and add 64x32
sad neon: rewrite 32x32, add 32x16 and 32x64
sad neon: rewrite 16x8, 16x16, add 16x32
sad neon: rewrite 8x8 and 8x16
sad neon: rewrite 4x4 and add 4x8
Test the _1 variant of the fdct, which simply sums the block and applies
a modifying shift based on the block size.
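A rough C sketch of what such a DC-only transform computes. The generic signature and the shift parameter are assumptions; the actual per-size scaling is not given here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic DC-only forward transform: sum the whole block,
 * then scale by a size-dependent shift supplied by the caller. */
static int32_t fdct_dc_only(const int16_t *input, int stride, int size,
                            int shift) {
  int32_t sum = 0;
  int r, c;
  for (r = 0; r < size; ++r)
    for (c = 0; c < size; ++c) sum += input[r * stride + c];
  /* Assumes an arithmetic right shift for negative sums. */
  return sum >> shift;
}
```

A test for such a function only needs to compare the single output value against the reference for random input blocks.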
BUG=webm:1424
Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468
Existing logic was only affecting resolutions above 720p.
Needs more testing for reducing subpel for speed >= 8.
No change on RTC metrics.
Change-Id: I2f4bf9f25891614aafa9a86aa5a5063a3ccfce4d
This could save some cycles since skin detection is used in multiple
places in vp9.
1~2% speed up on ARM.
Change-Id: I86b731945f85215bbb0976021cd0f2040ff2687c
Split to load_input_data4() and load_input_data8().
Use pack with signed saturation instruction for high bitdepth.
Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
this normalizes these tests with the regular variance ones both in
implementation and test list output
Change-Id: I387aea81456f94b8223b8fb2a28cab94bc1aa9d5
Use the scene detection for CBR mode, and use it to reset the
rate control if large source sad is detected and rate
correction factor/QP is at minimum state.
Avoids large frame sizes after big content change following
low content period.
Only affects CBR mode for 1 pass at speeds 5, 6, 7.
Change-Id: I56dd853478cd5849b32db776e9221e258998d874
Fix misplaced cast that caused an overflow and incorrect rate adaptation
behavior for high data rates. This in particular will have affected 4k encodes
but could also have come into play for some higher rate 1080p cases.
In our standard test sets the quality impact is small though several high rate
clips show improved rate accuracy. This can also impact the number of recode
loop hits and on one problem 4k clip the encode time for speeds 0 and 1 was
reduced by >25%
Change-Id: I108da7ca42f3bc95c5825dd33c9d84583227dac1
Use it to limit NEWMV early exit in nonrd pickmode
Small change in RTC metrics, has some improvement
for high motion clips.
Change-Id: I1d89fd955e1b3486d5fb07f4472eeeecd553f67f
this is consistent with other threaded tests and ensures gtest_filters
meant to operate on these pick them up
Change-Id: I99ce53720553a22c4b9905a2882273c2be2c031b
and vp8_fast_quantize_b_impl_mmx; this was never enabled in rtcd
an sse2 version exists so there isn't much reason to keep a mmx
implementation around.
Change-Id: I8b3ee7f46ba194ffa0d0a6225a0f299f2a4dea90
use an int to quiet an unsigned rollover warning similar to:
25110f283 Fix an ubsan warning: vp9_quantizer.c
Change-Id: Iedecb79a17249bc18f10c0920f88cf704920f12b
Adjust the threshold for turning off cyclic refresh for high motion,
and avoid testing golden in nonrd pickmode for speed >= 8 if
golden refresh was long ago.
No change/neutral on RTC metrics.
Change-Id: I40959b8d9637f3553e7458bbabd8c6024c2c09c0
vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are
code relocations, no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
'in' is used for the reference fdct. 'coeff' is input to the idct being
tested and 'dst[16]' is output
Fixes a segfault on unaligned memory access on x86.
Change-Id: I3691b1380ed49986897dd89a63ce63a80a0e0962
this was deleted in:
98967645a Remove vpx_idct8x8_64_add_ssse3()
but this was merged in:
9e03eedf6 Merge changes Ib26dd515,Ie60dabc3
after:
a92991133 Merge "dct tests: run all possible sizes in one test"
which added a new reference
Change-Id: I8da4a6c80d27b237a378ff15eead1daab89e7e25
Don't override max_gf_interval if it's not specified. It will
be assigned with a default value in vp9_rc_set_gf_interval_range().
BUG=b/62803416
Change-Id: Ide46ce00279ed076865fc54ce98c55a994f0c798
Sample encoder change: reduce max-intra-rate to 1000 and
buf-initial to 600. Parameters affect target size of key frame.
Change-Id: I2be6bc2927f5fa74e19e1efa3fb574d23a503300
Sample encoder change: reduce max-intra-rate to 1500 and
buf-initial to 700. Parameters affect target size of key frame.
Change-Id: I01e238378b63eeef28dfc2178baadffcd3cc7561
Adjust some parameters in sample encoder: vpx_temporal_svc_encoder.
Parameters adjusted to set lower QP for initial key frame,
and allow for larger target size on subsequent key frames.
Change-Id: I092ad968e5b51b9f495dadb6ee96e810663c910e
Modify fdct4x4_test.cc to support all size combinations. This does not
add any new tests and in fact fails a few. There were minimal changes
made to the tests so it's not entirely surprising that some of the
larger 12 bit transforms are failing since it was initially only used
for 4x4.
In follow up patches the tests in fdct8x8_test.cc, dct16x16_test.cc and
dct32x32_test.cc will be evaluated and moved to dct_test.cc.
BUG=webm:1424
Change-Id: I72a23430f457d7fae8c91e706adc0e77c25abc8f
Set the base_mv_aggressive for temporal enhancement layers (TL > 0).
Under the aggressive mode, skip the NEWMV depending on the
SSE of the base_mv. Also reduce the subpel motion to 1/2 under
aggressive mode if base_mv is good.
Speedup ~3% with small/negligible loss in quality on RTC.
Affects speed >= 6.
Change-Id: I89341b279cad6da2a04b76d5e726016191dacdb8
It's almost identical to vpx_idct8x8_64_add_sse2(), apart from a small
difference in instruction order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
This was ported from the greedy version in AV1, written by Dake He
(dkhe@google.com).
See:
https://aomedia.googlesource.com/aom/+/master/av1/encoder/encodemb.c#137
Greedy version is disabled by default, but can be picked by setting
USE_GREEDY_OPTIMIZE_B to 1.
To be enabled by default later.
This is both faster and better in terms of compression.
Compression Improvement:
------------------------
lowres: -0.119
midres: -0.064
hdres: -0.405
Speed Improvement:
------------------
(Based on encode time of 3 videos of different difficulties at
3 different target bitrates)
With --cpu-used=0: 0.38% to 5.55% faster
With --cpu-used=1: 0.24% to 2.79% faster
With --cpu-used=2: 0.29% to 1.46% faster
Change-Id: Ia7a23b3b244ad8eb253ac9e43cd03c5e021d2635
* changes:
Update high bitdepth load_input_data() in x86
Clean array_transpose_{4X8,16x16,16x16_2} in x86
Remove array_transpose_8x8() in x86
Convert 8x8 idct x86 macros to inline functions
some build systems have trouble with duplicate basenames.
vpx_dsp/skin_detection.[hc] were added in:
658e85425 Merge skin detection code in vp8/9.
BUG=webm:1438
Change-Id: Ieaa70b40bda409ec23e6d179b47a930ac6243b05
Set subpel prune_evenmore only for non_reference frames,
instead of all TL > 0 frames. Gain some quality back at
cost of small speed loss (~1-2%).
Change only affects SVC encoding at speed >= 7.
Change-Id: I5b9f51e51dccfd7050521a66996176b0415ca3f9
the check for error correction being disabled was overriding the data
length checks. this avoids returning incorrect information (width /
height) for the decoded frame which could result in inconsistent sizes
returned to an application causing it to read beyond the bounds of
the frame allocation.
BUG=webm:1443
BUG=b/62458770
Change-Id: I063459674e01b57c0990cb29372e0eb9a1fbf342
min_gf_interval should be no less than min_altref_distance + 1,
as the encoder may produce bitstream with alt-ref distance being
min_gf_interval - 1.
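The constraint can be written as a simple lower bound. This sketch clamps the value; whether the encoder clamps or rejects the configuration is not stated here, and the function name is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper: enforce min_gf_interval >= min_altref_distance + 1. */
static int bound_min_gf_interval(int min_gf_interval,
                                 int min_altref_distance) {
  const int lower = min_altref_distance + 1;
  return min_gf_interval < lower ? lower : min_gf_interval;
}
```

With min_altref_distance = 3, a requested min_gf_interval of 2 becomes 4, so the smallest alt-ref distance the encoder can then produce (min_gf_interval - 1) is 3.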
BUG=b/38450599
Change-Id: Ifb733daa643ebc668d1b23e1ce92db94b66dabe8
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.
BUG=webm:1424
Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
Keep the 1/4subpel for all frames, use SUBPEL_TREE_PRUNED_EVENMORE
for all temporal enhancement layer frames.
Change-Id: Ibc681acbb6fc75b7b3c57fc483fcb11d591dfc9a
It is initialized to be { INT_MAX, 0, ... } in ffe0f9b.
No effect on encoders.
Make it consistent with other initializations.
BUG=webm:1440
Change-Id: Ie2a180d93626b55914c8c4255e466a1986d2b922
visual studio will warn if a 32-bit shift is implicitly converted to 64.
in this case integer storage is enough for the result.
since:
f3a9ae5ba Fix ubsan failure in vp9_mcomp.c.
Change-Id: I7e0e199ef8d3c64e07b780c8905da8c53c1d09fc
For SVC 1 pass non-rd mode:
Force subpel search off for SVC for non-reference frames
under motion threshold.
Add flag to svc context to indicate if the frame is not used
as a reference.
Little/no quality loss, ~2% speedup.
Change-Id: Ic433c44b514d19d08b28f80ff05231dc943b28e9
Speed >=8: for resolutions above CIF, and for low motion content,
set subpel_search_method to SUBPEL_TREE_PRUNED_EVENMORE.
Small speed gain (~2%) on vga clips,
RTC metrics up by ~2-3% on average.
Change-Id: Ie26ba0264589652f92dfe74308740debf94cf0cc
x86 requires 16 byte alignment for some vector loads/stores.
arm does not have the same requirement.
The asserts are still in avg_pred_sse2.c. This just removes them from
the common code.
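For reference, an alignment check of the kind that belongs in the x86-specific file might look like the following. The helper name is hypothetical and is not taken from avg_pred_sse2.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true when p sits on a 16 byte boundary, as
 * required by aligned 128-bit vector loads/stores on x86. */
static int is_aligned16(const void *p) {
  return ((uintptr_t)p & 15) == 0;
}
```

An x86 implementation would then assert(is_aligned16(ptr)) before using aligned loads, while the common code and the arm paths carry no such requirement.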
Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6
Split vp8/vp9 implementations on yv12_copy_frame_c.
Remove high-bitdepth code from vp8_yv12_extend_frame_borders_c.
Clean up vp8 code usage in vp9.
BUG=webm:1435
Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d
Fix the condition on usage of source_sad for temporal layers.
Fix allows it to be used for the case of 1 temporal layer.
Change-Id: I02b1b0ade67a7889d1b93cee66d27c0951131fc3
Adjust the max_copied_frame setting for temporal layers.
Keep the same setting for non-SVC at speed 8.
This change also enables copy_partition for non-SVC at speed 7,
but with smaller value of max_copied_frame (=2).
~2% speedup for SVC speed 7, 3 layers, with little/no quality loss.
Change-Id: Ic65ac9aad764ec65a35770d263424b2393ec6780
Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values to the first pass, it uses vld1_u32()
which typically does impose a 4 byte alignment. However, as the first
pass operates on user-supplied values we must prepare for unaligned
values anyway (and have, see mem_neon.h).
But for the local temporary values there is no stride and the load will
use vld1_u8 which does not require 4 byte alignment.
There are 3 temporary structures. In the C, one is uint16_t. The arm
saturates between passes but still passes tests. If this becomes an
issue new functions will be needed.
Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
The sub pixel variance uses a temp buffer which guarantees width ==
stride. Take advantage of this with the 4x and avoid the very costly
lane loads.
Change-Id: Ia0c97eb8c29dc8dfa6e51a29dff9b75b3c6726f1
For aq-mode=3: refactor the condition for turning off
the refresh. Add some adjustments for high motion content.
No/little change in RTC metrics, only affects high motion case.
Change-Id: I7da8eabfb0e61db014be4562806f72ee5ef4a43b
When temporal layers are used, only allow for copy partition
on the top temporal enhancement layer frames.
Change-Id: I5472abdc0f9f6c8dafa75a7a84c615e08ae22af8
Only affects speed 8.
Make changes to copy partition to fix a bug in setting microblock
offset. Avg PSNR shows 0.02% gain on rtc_derf and 0.08% loss on rtc.
Change-Id: I61c3e5914dde645331344388e7437e5638acd4f3
The modified error was a derivative of the "coded_error"
that was used to allocate bits between different frames on the
assumption that the allocation should be linear in terms of this
modified error. I.e. a frame with double the modified error score
should all things being equal get double the number of bits. The
code also included upper and lower caps derived from input
VBR parameters.
This patch improves the initial calculation of the clip mean error
(now called "mean_mod_score" as it is no longer a prediction error)
used as the midpoint for the rate distribution function and normalizes
the output "modified scores" scores such that 1.0 indicates a frame
in the middle of the distribution. The VBR upper and lower caps are
then applied directly to a frame's normalized score.
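A minimal sketch of the normalize-then-cap idea follows. It assumes a plain ratio to the clip mean as the normalization, which may not match the actual distribution function; the function and parameter names are hypothetical.

```c
#include <assert.h>

/* Assumption: normalization is a simple ratio, so a frame whose score
 * equals the clip mean maps to 1.0. The VBR caps are then applied to the
 * normalized value directly. */
double normalized_mod_score(double frame_score, double mean_mod_score,
                            double min_cap, double max_cap) {
  double score;
  assert(mean_mod_score > 0.0 && min_cap <= max_cap);
  score = frame_score / mean_mod_score;
  if (score < min_cap) score = min_cap;
  if (score > max_cap) score = max_cap;
  return score;
}
```

Under this assumption a frame with twice the mean score returns 2.0, and so would be allocated twice the bits of a mid-distribution frame unless a cap intervenes.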
This refactoring is intended to make it easier to drop in alternative
distribution functions or to base the rate allocation on a corpus wide
midpoint (rather than the clip mean).
Change-Id: I4fb09de637e93566bfc4e022b2e7d04660817195
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.
BUG=webm:1422
Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
Advise the compiler that the store is eventually going to a uint8_t
buffer. This helps avoid getting alignment hints which would cause the
memory access to fail.
Originally added as a workaround for clang:
https://bugs.llvm.org//show_bug.cgi?id=24421
Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e
Add PartialIDctTest::PrintDiff() to help debugging.
In RunQuantCheck, try all combinations of +/-mask_ input for 4x4 idct.
Update PartialIDctTest::InitInput().
Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4
Most existing first pass stats are stored in a form normalized to a
macro-block scale. However the error scores for intra / inter etc were
stored as frame level values but mainly used as MB level values.
This change fixes that. Normalized per MB values make comparisons
between different formats easier and in any case this is usually what is
wanted.
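As an illustration of why per MB values compare well across formats, the sketch below divides a frame level score by the number of 16x16 macroblocks. The helper names and the rounding-up of partial macroblocks are assumptions made for this example.

```c
#include <assert.h>

/* Assumption: a macroblock is 16x16 and partial blocks at the frame edge
 * count as whole macroblocks. */
int num_macroblocks(int width, int height) {
  assert(width > 0 && height > 0);
  return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Convert a frame level score into a per macroblock score. */
double per_mb_score(double frame_score, int width, int height) {
  return frame_score / num_macroblocks(width, height);
}
```

A CIF frame (352x288, 396 macroblocks) and a 4K frame (3840x2160, 32400 macroblocks) with the same per MB complexity then produce the same normalized value, even though their frame level totals differ by a factor of about 82.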
Any change in results should be limited to slight differences in rounding.
*** Change after patch 8 +2 requiring new approval.
Final pre-submit testing showed one 4K clip with above expected change.
Investigation showed this was due to a value used to test for ultra low intra
complexity in key frame detection. This was a per frame not per MB value but
also did not scale with frame size. Replacement with a small per MB value
(based on original per frame value and cif frame size) resolved the KF detection
problem.
Also converted kf_group_error_left to a double in line with other error values
to reduce rounding problems in KF group bit allocation
All clips and sets now show nominal (or 0) change as expected.
Change-Id: Ic2d57980398c99ade2b7380e3e6ca6b32186901f
This reverts commit 0d88e15454.
Reason for revert: chromium builds are failing to locate vpx_rv during dlopen()
dlopen failed: cannot locate symbol "vpx_rv" referenced by "libstandalonelibwebviewchromium.so"
Original change's description:
> Add visibility="protected" attribute for global variables referenced in asm files.
>
> During aosp builds with binutils-2.27, we're seeing linker error
> messages of this form:
> libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
> symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
> object
>
> subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
> Other messages refer to symbol references from deblock_sse2.o and
> subpixel_sse2.o, also assembled from asm files.
>
> This change marks such symbols as having "protected" visibility. This
> satisfies the linker as the symbols are not preemptible from outside
> the shared library now, which I think is the original intent anyway.
>
> Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
>
TBR=jzern@google.com,johannkoenig@google.com,rahulchaudhry@chromium.org,builds@webmproject.org
Change-Id: I0c2ea375aa7ef5fda15b9d9e23e654bb315c941b
This reverts commit 3704807805.
Reason for revert: Does not look to be the cause of the test failures.
Original change's description:
> Revert "vp8: Real-time mode: reduce mode_check_freq thresh for speed 10."
>
> This reverts commit 4a7424adba.
>
> Reason for revert: <INSERT REASONING HERE>
> Possibly causing test failures in roll into chromium.
>
> Original change's description:
> > vp8: Real-time mode: reduce mode_check_freq thresh for speed 10.
> >
> > Reduces quality regression at speed 10 for real-time mode.
> >
> > Change-Id: I9f624bea9ca262dab32ce9de7d6d91175d6becc8
> >
>
> TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
> # Not skipping CQ checks because original CL landed > 1 day ago.
>
> Change-Id: I1defcb74e78a5a3bd29b7d1b21a96a79fa26a457
>
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: I13d86a2a68b8aa8c0c7465e6e58cff0e00bc7862
This reverts commit 4a7424adba.
Reason for revert: Possibly causing test failures in roll into chromium.
Original change's description:
> vp8: Real-time mode: reduce mode_check_freq thresh for speed 10.
>
> Reduces quality regression at speed 10 for real-time mode.
>
> Change-Id: I9f624bea9ca262dab32ce9de7d6d91175d6becc8
>
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Change-Id: I1defcb74e78a5a3bd29b7d1b21a96a79fa26a457
Move the tran_low_t helper functions to a new file. Additional
load/store functions will be added here.
Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3
During aosp builds with binutils-2.27, we're seeing linker error
messages of this form:
libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
object
subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
Other messages refer to symbol references from deblock_sse2.o and
subpixel_sse2.o, also assembled from asm files.
This change marks such symbols as having "protected" visibility. This
satisfies the linker as the symbols are not preemptible from outside
the shared library now, which I think is the original intent anyway.
Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
Increase the partition and acskip thresholds for temporal
enhancement layers.
~1-2% speedup, with negligible loss in quality.
Change-Id: Id527398a05855298ad9ddac10ada972482415627
For SVC 1 pass non-rd pickmode, the interpolation filter for the
upsampling of the golden (spatial) reference was not being explicitly
set and instead was taking whatever value was set in the previous
mode/block (which would be either EIGHTTAP or EIGHTTAP_SMOOTH).
Fix it to the default EIGHTTAP for now, to be updated/selected
adaptively in a later change.
Minor adjustment to rate targeting thresholds in datarate unittests.
Change-Id: I52085048674072c6cfb7163e11e9a2658d773826
A more detailed explanation of the experimentation
leading to this change can be found in:-
https://docs.google.com/a/google.com/document/d/13lsYhxgPyxUHvEess6wg9nikaonIZKY9Ak_Lpafv5Mo/edit?usp=sharing
This change gives gains across all our standard test sets for
overall psnr, ssim, fast ssim and psnr-HVS.
Values expressed as % reduction in bitrate.
Low res set -0.257, -0.192, -0.173, -0.101
Mid res set -0.233, -0.336, -0.367, -0.139
High res set -0.999, -1.039, -1.111, -0.567
Netflix 2K set -0.734, -0.174, -0.389, -0.820
Netflix 4K set -0.814, -0.485, -0.796, -0.839
Change-Id: Ie981fb3c895c9dfcfc8682640d201a86375db5c8
Speed up for speed 0.
Reduce 10+% of encoding time for hdres in speed 0,
with less than 0.1% PSNR loss.
Compute total difference of previous and current frame context probability
model. If the diff is less than the threshold, skip recoding the frame.
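The skip test described above can be sketched as a sum of absolute differences compared against a threshold. Treating the frame context as a flat array of 8-bit probabilities is an assumption made for this example, and `should_skip_recode` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 when the total change between the previous and current
 * probability models is below `threshold`, meaning a recode is unlikely
 * to pay off. Assumption: probabilities are stored as a flat byte array. */
int should_skip_recode(const uint8_t *prev_probs, const uint8_t *cur_probs,
                       size_t count, int threshold) {
  int total_diff = 0;
  size_t i;
  assert(prev_probs != NULL && cur_probs != NULL);
  for (i = 0; i < count; ++i) {
    total_diff += abs((int)cur_probs[i] - (int)prev_probs[i]);
  }
  return total_diff < threshold;
}
```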
Borg test (positive number means performance loss):
         lowres   midres   hdres
PSNR:    0.030    0.032    0.065
Local speed test: bitrate set at 1200
                 blue_sky   pedestrian   rush_hour
Encoding time:   -10.0%     -16.5%       -16.5%
Change-Id: I4e2d200ea3115d48b2c3e890143596b31b8ef9e9
Commit 0178d97 introduced an append situation which could be
confusing. Clean it up a little and add some comments.
Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2
When the noise estimate is forced off due to large motion,
reset the counter and set smaller window for next estimate.
Change-Id: Ifa4ec95396134173a00d48353ad52f1b6a40c217
Add option in SVC to set the filter type and phase for
the frame level downsampling filters.
For 3 spatial layers: set downsampling filter type to bilinear
and set phase to 8, for lowest spatial layer.
Change-Id: Id81f4b1ba93db19c1cd37b6a46d1281a2c61bc43
Makes more sense to call the corresponding partial idct C function
instead of the full idct C function as the reference.
Change-Id: Ibb7681dd063edd6307ba582c10c26c4c6a4b78c6
Base the condition on the resolution of the spatial layer.
And remove restriction on scaling factor.
Change-Id: Iad00177ce364279d85661654bff00ce7f48a672e
Read in a Q register. Works on blocks of 16 and larger.
Improvement of about 20% for 64x64. The smaller blocks are faster, but
don't have quite the same level of improvement. 16x32 is only about 5%
BUG=webm:1422
Change-Id: Ie11a877c7b839e66690a48117a46657b2ac82d4b
For lowest spatial layer, in 3 layer SVC, set the
downsampling filtertype to get averaging filter.
Needed for reducing aliasing on low-res layer,
small increase in overall encoder time.
Change-Id: Ia31460123bd91b72eca49b46dd924b9f226d4563
An intended behavior change disabling exhaustive searches in speed
feature causes VP9/DatarateTestVP9LargeDenoiser.4threads test failure.
Change the threshold to make it pass.
BUG=webm:1429
Change-Id: Ibcbe2314c6b2525799894f5d7204fc8eb4ec2a1e
Adjust thresholds for noise estimation, for resolutions above VGA.
Tends to push cleaner/low noise clips to LowLow state.
No change in RTC metrics.
Change-Id: I739ca6b797d0a60ccd1c6c6a2775269b1f007e5e
Set noise level to kLowLow for high motion low res clips.
Change the normalization in noise metric for low res.
Reduce the initial time-window for all resolutions.
Change-Id: Iaed39dbb50b205cd9c735dc5b84822304fb01987
Add support for everything except block sizes of 4.
Performance is better but numbers will improve again when the variance
optimizations land.
BUG=webm:1423
Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
When a neon version is available it will be called. This allows
decoupling the variance implementations and has no real downside. For
most configurations, the call will be #define'd to the neon
implementation.
Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712
When the width is equal to 8, process two rows at a time. This doubles
the speed of 8x4 and improves 8x8 by about 20%.
8x16 was using this technique already, but still improved a little bit
with the rewrite.
Also use this for vpx_get8x8var_neon
BUG=webm:1422
Change-Id: Id602909afcec683665536d11298b7387ac0a1207
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.
When comparing the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. The 8x16 is very
close, but the existing version appears to be faster.
BUG=webm:1422
Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
Add 31bit pairs before unpacking in x86 block error code
AVX2 code provides a very minor performance improvement.
BUG=webm:1210
Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
For SVC 1 pass real-time: add condition to skip the
golden (spatial) reference mode in non-rd pickmode.
Condition is to skip golden if the sse of zeromv-last mode
is below threshold. And change order in ref_mode_set_svc
to make sure golden zeromv is tested after last-nearest.
Speedup ~3-4% with little/negligible quality loss.
Change-Id: I6cbe314a93210454ba2997945f714015f1b2fca3
Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by 30 times.
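A generic sketch of replacing a division with a multiply and a shift is shown below. The 16-bit reciprocal, the rounding choice and the name `approx_div` are assumptions for illustration, not the constants used by this change.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate x / d as (x * m) >> 16, where m is a 16-bit fixed point
 * reciprocal of d, rounded up. With this choice of m the result equals
 * the exact quotient whenever x * d < 65536, which the assert enforces. */
uint32_t approx_div(uint32_t x, uint32_t d) {
  uint32_t multiplier;
  assert(d != 0 && (uint64_t)x * d < 65536u);
  multiplier = 65536u / d + 1u;
  return (x * multiplier) >> 16;
}
```

In vector code the multiplier would typically be precomputed per divisor, so each lane needs only one multiply and one shift instead of an integer divide.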
Fix the call sites to use the RTCD function.
Delete sse2 and mips implementation. They were based on a previous
implementation of the filter. It was changed in Dec 2015:
ece4fd5d22
BUG=webm:1378
Change-Id: I0818e767a802966520b5c6e7999584ad13159276
Don't force disabling of adaptive_rd_thresh for realtime when
row_mt_bit_exact is set.
Row based adaptive rd is made usable in CL
454882(https://chromium-review.googlesource.com/c/454882) for REALTIME.
Change-Id: Ief023414f0fd6eb86f299dd46ae58f4436875af5
Add tentative max cpb size values for levels 5.2 and up. Otherwise
encoding will fail when targeting for these levels.
Change-Id: Ib7e0ba4b9836ea1ac900b6822543812843d48463
107de19698 changes the encoder alt-ref selection behavior. Assuming
min_gf_interval = max_gf_interval = 4, the frame order would be
frm_1 arf_1 frm_2 frm_3 frm_4 frm_5 arf_2 before 107de19698;
frm_1 arf_1 frm_2 frm_3 frm_4 arf_2 frm_5 after 107de19698.
This patch reverts such alt-ref placement change.
Change-Id: I93a4a65036575151286f004d455d4fcea88a1550
Make some speed setting changes for temporal enhancement layers,
and remove the switch in subpel_force_stop for the aggressive_base_mv
in non-rd pickmode.
Gain some 2-3% speed with little/negligible quality loss.
Change-Id: I3e2a7f80ff45f38c0a6ceb01b34dbca2f53edbf0
For speed >= 8 and color_sensitivity not set, skip the transform
skipping test in UV planes.
Add a new condition to check noise level to skip chroma check
for speed >= 8 if y_sad is high.
1~2% speedup on ARM for speed 8.
Borg tests show neutral results in both rtc and rtc_derf.
Change-Id: Idecd3ff6e28c97757a43bb6f3a7082c85f72109c
Add a low-variance high-sumdiff to the superblock content state
and use it to limit the mv and bias some decisions in non-rd pickmode.
Only affects speed >= 6.
Reduces artifact for lighting changes.
Small/no difference in metrics on RTC set.
Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b
This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain (at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses (7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
A previous patch turned on allow_exhaustive_searches feature only for
FC_GRAPHICS_ANIMATION content. This patch further modified the feature
by removing the exhaustive search limit, and made it no longer adaptive.
As a result, the 2 counts that recorded the number of motion searches
were removed, which helped achieve the determinism in the row based
multi-threading encoding. Tests showed that this patch didn't make
the encoder much slower.
Used exhaustive_searches_thresh for this speed feature, and removed
allow_exhaustive_searches. Also, refactored the speed feature code
to follow the general speed feature setting style.
Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa
The more aggressive settings should only be used when denoise_svc
condition is satisfied (which means top spatial layer).
Change-Id: Ia8e3515b27f31bf21b1976ca80a2fa826daece3a
In non-rd pickmode (speed >= 5), avoid duplication of computations in
model_rd_for_sb_y when the speed feature use_simple_block_yrd is
enabled (or for high bitdepth build under certain conditions).
QVGA, VGA and HD have 1.23%, 2.68% and 1.7% speedup on ARM for speed 8,
respectively.
Encoding results are bitexact for speed >= 5.
Change-Id: I3f9130810c21439f5ad7e159e21cb2243dcd05f1
Re-enable the SVC tests, wrap the non-zero expectation
in GetMismatchFrames around #if CONFIG_VP9_DECODER.
Change-Id: I0e8a2d78b868c32f18fe597540f397d3a1b303b5
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.
Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
The allow_exhaustive_searches feature improves the encoding quality
of FC_GRAPHICS_ANIMATION content a lot. For non-FC_GRAPHICS_ANIMATION
content, the quality test result is almost neutral. This patch restricts
this feature to FC_GRAPHICS_ANIMATION content.
The motivation of doing that is to make this feature no longer adaptive,
which will be implemented in the following patch.
Change-Id: Ic911df6dd757402b6480789cc247801e99840369
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.
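The rule can be sketched as follows. The two macros here are local stand-ins defined as plain casts, which is an assumption for this example; `advance_samples` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the macros named above, assumed to be plain casts. */
#define CAST_TO_SHORTPTR(x) ((uint16_t *)((void *)(x)))
#define CAST_TO_BYTEPTR(x) ((uint8_t *)((void *)(x)))

/* Advance a byte pointer that really addresses 16-bit samples by
 * `num_samples` samples: cast to short ptr, add the offset, cast back. */
uint8_t *advance_samples(uint8_t *byte_ptr, ptrdiff_t num_samples) {
  assert(byte_ptr != NULL);
  return CAST_TO_BYTEPTR(CAST_TO_SHORTPTR(byte_ptr) + num_samples);
}
```

Adding the offset on the short ptr moves by `2 * num_samples` bytes, which is the doubling the rule asks for, without a hand-written factor of two at each call site.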
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
The scaling filter with zero shift will give sub-sampling for
2x downsampling. Allow for a phase shift to get an averaging filter.
Usage is for source scaling in 1 pass SVC mode for 1:2 downscale.
Reduces aliasing in downsampled image.
Keep the phase to 0/off for now.
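The effect of the phase can be seen in a one-dimensional sketch. It assumes a 2-tap bilinear kernel with 16 sub-pixel positions, so that phase 8 is a half-sample shift; the real scaler may use longer filters, and `downscale_2to1` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* 2:1 downscale of one row. With phase 0 every output copies src[2 * i]
 * (pure sub-sampling); with phase 8 each output is the rounded average of
 * src[2 * i] and src[2 * i + 1]. Assumption: 2-tap bilinear, 16 phases. */
void downscale_2to1(const uint8_t *src, int src_len, uint8_t *dst,
                    int phase) {
  int i;
  assert(src_len >= 2 && phase >= 0 && phase < 16);
  for (i = 0; i < src_len / 2; ++i) {
    const int left = src[2 * i];
    const int right = src[2 * i + 1];
    dst[i] = (uint8_t)((left * (16 - phase) + right * phase + 8) >> 4);
  }
}
```

For the alternating input {0, 100, 0, 100}, phase 0 yields {0, 0} and loses the high values entirely, while phase 8 yields {50, 50}, which is the aliasing reduction the message refers to.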
Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
Disable the 1 pass CBR SVC tests with temporal_layers > 1.
Issue with the commit 863f860, which will cause encoder/decoder
mismatch due to skipping encoder loopfilter for non-reference frames.
Will re-enable the tests when fixed.
Change-Id: I74918a0045a17976b069c4be63fbeb921974df0d
This condition is not needed as key_frame should set the refresh
of the reference frames, but good to have for clarity in condition.
Change-Id: Icf9838e7e4f0ff5cf0a9562ae3b5d6c7e6f78702
Modify the frame flags to update the ARF on top layer,
for the tests:
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayersFrameDropping
This is needed to fix the encode/decoder mismatches caused by 863f860,
and removed in the revert e9b7f98.
Change-Id: I6b9fecfdd17315fc0179e29949338c77636026c0
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"
Also re-enable asserts on the C version.
BUG=webm:1390
Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
This reverts commit 863f860bfc.
This causes encoder / decoder mismatches in various
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests
BUG=webm:1408
Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14
For low resolutions (<= CIF): use quarter-pixel and simple_block_yrd.
~5% gain on RTC_derf.
~6-7% slowdown on ARM.
Change-Id: I4439ebd1116b9decac04786503f978840b68a60c
Fix to avoid getting stuck at very low Q even
though content is changing, which can happen for --min-q=0.
Fix is to more aggressively increase active_worst_quality
when detecting significant rate_deviation at very low Q.
Change will only affect 1 pass VBR for --min-q < 4, so no
change in ytlive metrics for --min-q >= 4.
Change-Id: I4dd77dd7c08a30a4390da0ff2c8bda6fccfa76d7
Useful for SVC, where the top layer enhancement frames may
not update any reference buffers, as is the case for the
patterns in the 1 pass CBR SVC when #temporal_layers > 1.
~3% encoder speedup for SVC patterns with temporal layers
in 1 pass CBR mode.
Updated the SVC datarate tests for the mismatch frames.
Set the frame-dropper off in some tests with #temporal_layers > 1
so we can correctly set #mismatch frames. Adjusted rate target
threshold for tests where frame-dropper was turned off.
Change-Id: Ia0c142f02100be0fed61cd2049691be9c59d6793
Provides over 15x speedup for width > 8.
Due to smaller loads and shifting for width == 8 it gets about 8x
speedup.
For width == 4 it's only about 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
The MV unit test revealed an integer overflow issue in vp9_mcomp.c.
This was caused if the MV was very large. In mv_err_cost(), when
mv->row = 8184, mv->col = 8184 and ref_mv is 0, mv_cost = 34363
and error_per_bit = 132412, causing the overflow.
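With the values quoted above, the product is 34363 * 132412 = 4550073556, which exceeds the range of a 32-bit int. A sketch of widening the multiplication is shown below; `scaled_mv_cost` is a hypothetical name and this is not claimed to be the exact fix applied in vp9_mcomp.c.

```c
#include <assert.h>
#include <stdint.h>

/* Widen one operand before multiplying so the product is computed in 64
 * bits. Multiplying two ints first would overflow for the values above. */
int64_t scaled_mv_cost(int mv_cost, int error_per_bit) {
  assert(mv_cost >= 0 && error_per_bit >= 0);
  return (int64_t)mv_cost * error_per_bit;
}
```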
BUG=webm:1406
Change-Id: I35f8299f22f9bee39cd9153d7b00d0993838845e
Set adaptive_rd_thresh to 2 when simple block yrd is not used.
Fix regression caused by computing y sad without
int_pro_motion_estimation on low res motion clips.
Overall 0.07% quality loss on rtc_derf.
Change only affects low res on speed 8.
Change-Id: Ic6a188a56529f1034d6431005fb4b0e24e8a7e27
For speed 5, 1 pass CBR: Don't use the nonrd_pick_partition
on the segment, rather use choose_partitioning followed by
nonrd_select_partition (as is done on base segment).
Little/no quality loss on RTC and RTC_derf (< 0.3%),
speedup of at least 5%.
Change-Id: I5273d5f950e60adf5e437b4ca8c4f63964641e83
If the noise estimation is avoided due to large motion,
the last_source for denoising should still be updated.
Change-Id: I67155ea7dbe9ac2785978e64a27bdafd7d57aac0
To reduce refresh on partial super-blocks on boundary,
for noisy input. Reduces some artifacts on noisy input.
Change-Id: I10b5808a296874e08c7f378b3df58466591d8dbe
Move the condition for effectively disabling the denoising
for speed 5 into the vp9_denoiser_denoise().
This is cleaner, and also moving the condition into vp9_denoiser_denoise
will keep the denoiser buffer updated with the current source.
This allows for more consistent behavior if speed is changed midstream.
Change-Id: Ia001f591c56e454bf724c3ae73c024badb183ef8
To prevent the motion vector out of range bug, added a motion vector unit
test in VP9. In the 4k video encoding, always forced to use extreme motion
vectors and also encouraged to use INTER modes. In the decoding, checked if
the motion vector was valid, and also checked the encoder/decoder mismatch.
The tests showed that this unit test could reveal the issue we saw before.
Change-Id: I0a880bd847dad8a13f7fd2012faf6868b02fa3b4
For 8-bit the subtrahend is small enough to fit into uint32_t.
This is the same that was done for:
c0241664a Resolve -Wshorten-64-to-32 in variance.
For 10/12-bit apply:
63a37d16f Prevent negative variance
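For the 8-bit case, the reasoning can be sketched as below: the squared sum needs 64 bits, but after dividing by the block area the subtrahend fits in uint32_t. The function name and the 64x64 size limit are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Variance of the difference between two 8-bit blocks. For blocks up to
 * 64x64, |sum| <= 255 * 4096, so sum * sum needs 64 bits, while
 * (sum * sum) / (w * h) <= 255 * 255 * 4096 and fits in uint32_t. */
uint32_t variance_8bit(const uint8_t *a, int a_stride, const uint8_t *b,
                       int b_stride, int w, int h, uint32_t *sse) {
  int64_t sum = 0;
  uint32_t sse_acc = 0;
  int r, c;
  assert(w > 0 && h > 0 && w * h <= 64 * 64);
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = a[r * a_stride + c] - b[r * b_stride + c];
      sum += diff;
      sse_acc += (uint32_t)(diff * diff);
    }
  }
  *sse = sse_acc;
  return sse_acc - (uint32_t)((sum * sum) / (w * h));
}
```

For 10/12-bit input the same subtraction can go negative through rounding, which is why a separate clamp is applied there.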
Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
Temporal denoiser runs in non-rd pickmode, so it is only used
for speed >= 5. Regression exists for speed 5, due to use of
reference_partition (which use non-rd pickmode for partitioning).
Avoid denoising for now at speed 5.
Change-Id: I74a74d2e1404d7cfd33dcf4ec06dd2e503256cf0
Base the low_content_frame metric on the motion vectors,
and adjust the logic for preventing golden update.
Small change in behavior: small positive gain (~0.2-1%) on clips
with high activity.
Change-Id: I0b861c8e9666cd82b45cde5ee57ee8a1e5ab453c
Code cleanup: merged two functions that were doing postencode
update for cyclic refresh, remove some unused code and fix comments.
No change in behavior.
Change-Id: I9be0d7e346d34dec29bf4e5bb380a7bf81c8480a
BUG=webm:1397
(yunqingwang)
To verify that this patch wouldn't cause much performance change,
the Borg tests were run. Here was the result:
         avg_psnr   overall_psnr   ssim
hdres:   -0.002     0.006          0.013
midres:  0          0              0
lowres:  0          0              0
Change-Id: Iae395ae7b741e0513cf5bab9dcace110b792a67d
The row mt sync read uses sync_range = 1, and wouldn't work if we want
to use a sync_range that is greater than 1. To make it work, this sync
read code is modified. Pass in col instead of col - 1 to make it
consistent with other row mt code in VP9, and then add 1 in "while"
condition.
Change-Id: I4a0e487190ac5d47b8216368da12d80fec779c1a
Issue/bug happens for denoising with spatial layers, where
the golden (spatial) reference is used in pickmode, but
denoising is only done with respect to last (temporal).
Fix is to make sure set_ref_ptrs is set before build predictors
in denoiser.
Change-Id: I793cf441341edf7c4a88b8ab1e1b22b3cb0eb508
Temporary override to condition for disallowing intra-search in SVC,
since golden (spatial) reference is currently suppressed due to
artifact issue.
Change-Id: I28ed7fdddc9fcdbcc0a4175a247a3ecc94c11767
For non-rd variance partition, avoid the chroma check
unless y_sad is below some threshold.
Small decrease in avgPSNR (~0.3) on RTC set.
Small/negligible decrease on RTC_derf.
Change-Id: I7af44235af514058ccf9a4f10bb737da9d720866
Refactor to split the 1 pass source sad computation into scene
detection (currently used for VBR and screen-content mode), and
superblock based source sad computation (used in non-rd CBR mode).
This allows the source sad computation for CBR mode to be
multi-threaded.
No change in compression.
Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388
d207/d63/d45/d117/d135/d153
~9-45% better depending on the predictor on 32-bit ARM, similar range on
x86-64
this matches the non-highbitdepth implementation
BUG=webm:1316
Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
Make the source_sad feature work properly for cases of VBR or
screen_content with SVC.
Added unittest for SVC with screen-content on.
Change-Id: Iba5254fd8833fb11da521e00cc1317ec81d3f89b
tolerate a NULL hist being passed as a result of invalid parameters
passed to init_rate_histogram(). this fixes a divide by zero in
init_rate_histogram() with an invalid fps.
BUG=webm:1383
Change-Id: Id203e0f3b18d67a4a09aaf206abcce4708f966ec
Since y_sad is not computed yet (on the early exit due to source_sad),
no need to check for setting color_sensitivity.
Only affects speed >=8. No change in behavior.
Change-Id: I3a6f2d20fed38d8b8ec51b75bcacf9a21f2db916
Allow for simple_block_rd for VGA resolution, and reduce
adaptive_rd_thresh to 1.
On average no loss on RTC set, ~4% speedup on mac.
Change-Id: Ib549c4061c853776062b5e34040f839d470fbebc
Change tests to reflect use. Input sizes will be 8 or 16 (but not
necessarily square).
filter_weight is capped at 2 and filter_strength at 6
Speed test, disabled by default.
Change-Id: Idfde9d6c4b7d93aaf0e641b0f4862c15e2a2af7a
Change it to row based array to avoid the slowdown caused by sync.
row-mt on, speed 8, 2 threads: ~4% speedup for VGA on ARM benefited
from adaptive_rd_threshold.
Change-Id: I887e65a53af20a6c4f48d293daaee09dab3512cf
- Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi
compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
operations.
Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
Add additional condition to split to 16x16, for resolutions <= 360p,
reduces dragging artifact near moving boundary.
Small/no change on RTC metrics.
Change-Id: I314694f2166435d918f74e7ab42f002b07f40dae
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turn off GF search in non-rd pickmode.
Only enabled for speed >= 8.
avgPSNR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.
Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
The sum of tx block eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.
After the sum of eobs is calculated correctly, re-enabled
ml_partition_search_early_termination speed feature.
Re-did the quality/speed test to check the impact of the fix.
1. Borg test BDRATE result:
4k set: PSNR: +0.183%; SSIM: +0.100%;
hdres set: PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;
2. Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.
The result is in line with the original result.
Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.
Measured a 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
Don't denoise spatial layer frames whose base layer is a key frame.
Disallow golden reference for SVC with denoising on frames
that will be denoised (highest layer), as this removes bad artifact.
Will re-enable when issue is resolved.
Change-Id: I87a6597812330500966458172acfce54af65f70f
Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.
Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2
Similar issue as Change bc1c18e.
The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in final stage of pass
2, when changing the test number from 1,000 to 1,000,000.
Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.
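Saturating arithmetic clamps a result to the 16-bit range instead of letting it wrap. The scalar sketch below shows the per-lane behaviour that NEON intrinsics such as vqaddq_s16 and vqsubq_s16 provide; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a saturating 16-bit add: compute in 32 bits, then
 * clamp to [INT16_MIN, INT16_MAX] instead of wrapping. */
int16_t sat_add_s16(int16_t a, int16_t b) {
  const int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}

/* Scalar model of a saturating 16-bit subtract. */
int16_t sat_sub_s16(int16_t a, int16_t b) {
  const int32_t diff = (int32_t)a - (int32_t)b;
  if (diff > INT16_MAX) return INT16_MAX;
  if (diff < INT16_MIN) return INT16_MIN;
  return (int16_t)diff;
}
```

A wrapping add of 30000 and 10000 would produce a large negative value, whereas the saturating form stays at 32767, which keeps an overflowing final stage from flipping sign.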
Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.
Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.
The patch was rebased to the top-of-tree.
Borg test BDRATE result:
4k set: PSNR: +0.144%; SSIM: +0.043%;
hdres set: PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;
Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.
Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
Fixes an issue when LAST and golden are not used as references,
in which case it's possible no encoding mode is set (since intra may be
skipped under certain conditions). Fix is to make sure intra is searched
if no inter mode is checked.
Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c
Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.
since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance
BUG=webm:1384
Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.
https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code
BUG=webm:1054
Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
Enable row-mt for SVC for real-time mode, speed >=5.
Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.
For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about ~5% speedup.
Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.
Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes
BUG=chromium:697956
Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
The 2 thresholds (i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch makes this
feature frame-size dependent consistently throughout the code.
Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
Most are cosmetic changes.
Speed is unchanged with clang 3.8, and about 5% faster with gcc 4.8.4.
Tried the strategy used in 8x8 and 16x16 (whose operation order is
similar to the C code); though speed gets better with gcc, it's worse
with clang.
Tried to remove store_in_output(), but speed gets worse.
Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
Add ppc, ppc64 and ppc64le on all_platforms and ARCH_LIST
Add VSX flags and check for -mvsx
Define empty setup_rtcd_internal
Add Altivec detection based on:
http://freevec.org/function/altivec_runtime_detection_linux
Detect VSX at runtime when enabled
Change-Id: I304f4d8c5fee0ff19b6483cd2e9cc50d6ddec472
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.
BUG=webm:1301
Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
Fix the condition for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.
No change in quality.
Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
This may fix the timeout failure of the nightly valgrind tests,
since more coverage was added for row-mt.
Change-Id: Id9414e66d1a266602c7495243d9f5cb69e17ccdc
clear the entire array on error. the size used previously was equal to
the number of elements.
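A minimal sketch of the bug class described above, assuming a plain array of
structs. The Entry type, NUM_ENTRIES and clear_entries() are hypothetical names
for illustration and are not taken from the patch:

```c
#include <string.h>

#define NUM_ENTRIES 8 /* hypothetical element count */

typedef struct {
  int id;
  void *data;
} Entry; /* hypothetical element type */

/* Clears every element. memset() takes a size in bytes, so the element
 * count has to be multiplied by the element size; passing only the count
 * would clear just the first 'count' bytes of the array. */
static void clear_entries(Entry *entries, size_t count) {
  memset(entries, 0, sizeof(*entries) * count);
}
```

Calling clear_entries(array, NUM_ENTRIES) zeroes the last element as well as
the first, which a byte count equal to the element count would not do.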
BUG=webm:1364
Change-Id: I2f2e16ed6e867f41d4774a5a8ac9cedaee11ce46
Reduce the level from 4 to 2.
This gives ~1-2% quality gain on RTC set, with small decrease in speed (~1-2% on mac).
Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.
Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
Moves the def from vpx_encoder.h -> vpx_codec.h. The defined value
is changed as part of this move.
Adds the value to decoder capabilities when CONFIG_VP9_HIGHBITDEPTH.
Change-Id: I7d61fc821cda29f1e32bb9b2b9ffd3d83966e419
This reverts commit d3db846cc5.
This change causes a large drop in psnr (4-5db) on low framerate
difficult content (tested at 360/480p)
BUG=b/35804225
Change-Id: I8e90012d3b9c8a0cddb062ba93b01b36c0e0c0a0
From commit:
https://chromium-review.googlesource.com/c/441393/
On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.
Small change in RTC metrics and speed.
Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level parallelism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.
Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
Re-organized the encoder threading tests and grouped tests into
4 parts. Added PSNR checking test to make sure the PSNR variation
is within a small range.
BUG=webm:1376
Change-Id: I09edb990236a87a4d2b2b0e1ceaf6c6435a35eff
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implementation was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.
In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.
Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.
BUG=webm:1371
Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.
Only affects 1 pass CBR.
Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664
For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.
Disabled for now.
Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a
The output needs to be aligned. Input is read with 'movq' not 'movdqa'
so it is not expected to be aligned.
Change-Id: Ibd48a84c1785917a6a97c3689a05322abba486b4
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.
Small change on metrics for rtc set: less than ~0.1 avgPSNR decrease
on RTC set, for both speed 7 and 8.
Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.
For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).
This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.
Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754
Modified the code to facilitate bit-match tests in first pass
Added unit-tests to test the row based multi-threading behavior for bit-exactness
Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df
This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here is that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.
The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)
Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c
vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.
scan is used for C code and iscan is used for SIMD implementations.
Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
* changes:
quantize_fp_32x32 highbd ssse3: enable existing function
quantize_fp highbd ssse3: use tran_low_t for coeff
quantize_fp highbd sse2: use tran_low_t for coeff
- Replace the corresponding assembly code.
- No user-level speed degradation.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
This was created as part of the quantize_fp_ssse3 change. Both
functions use the same source file with different macro parameters.
Change-Id: I267050a559426a85955d215aa0aaca270439c5ab
The previous implementation confused bit/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
The weight segment needs to only be computed once per frame,
so remove it from the function vp9_cyclic_refresh_rc_bits_per_mb(),
which is called within a loop inside vp9_rc_regulate_q.
Change-Id: Ia0e18b89abb97e42c466d4dbc47700d7f76555db
vp9_compute_qdelta_by_rate has almost 2% overhead in profiling on Nexus 6.
Reduce the calling of that function in speed 8 by estimating the delta-q.
Both rtc and rtc_derf show little/no change in avg psnr/ssim.
Encoding speed is 2~3% faster on Nexus 6.
Change-Id: If25933715783f31104a18a5092ea347b1221b5f5
This small change replaces the frame boost check in the arf group
length break out clause with a test against a prediction decay value.
The boost value is in fact partly dependent on the decay value but
this change means that the per frame boost calculation can be adjusted
without influencing the group length calculation.
The value chosen gives a close match on all the test sets with the previous
code (on average) but it was noted that a lower threshold was slightly better
for 1080P and up and a slightly higher value for small image sizes.
Change-Id: I4d5b9f67d5b17b0d99ea3f796d3d6202fd61ee0c
The function scale_sse_threshold() returns a threshold scaled
if necessary for use with 10 and 12 bit from an 8 bit baseline.
SSE error values would be expected to rise for the 10 and 12
bit cases where there are more bits of precision.
Hence the threshold used for the test should also be scaled up.
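A minimal sketch of the scaling idea, not the function from the patch. It
assumes the threshold grows with the square of the sample range (4x per extra
bit), since SSE is a sum of squared differences. The name
scale_sse_threshold_sketch() and its plain integer bit_depth parameter are
hypothetical:

```c
#include <assert.h>

/* Scales an 8-bit SSE threshold for 10-bit or 12-bit input. Each extra bit
 * doubles the sample range, and a squared error therefore grows by 4x per
 * bit: a shift of 4 for 10-bit and 8 for 12-bit (assumption of this
 * sketch). */
static int scale_sse_threshold_sketch(int bit_depth, int thresh) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  return thresh << (2 * (bit_depth - 8));
}
```

With an 8-bit baseline of 100, this gives 1600 for 10-bit and 25600 for 12-bit
input.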
Change-Id: I4009c98b6eecd1bf64c3c38aaa56598e0136b03d
When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.
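A rough sketch of how an eob-based dispatch for the 16x16 inverse transform
could look. Only the "eob <= 38" rule comes from this message; the enum, the
function name and the other two cut-offs (1 and 10, guessed from the suffixes
of existing idct16x16 function names) are assumptions:

```c
/* Hypothetical labels for the partial inverse transforms. */
typedef enum {
  IDCT16_DC_ONLY, /* only the DC coefficient is non-zero */
  IDCT16_10,      /* at most 10 coefficients */
  IDCT16_38,      /* at most 38 coefficients */
  IDCT16_FULL     /* all 256 coefficients */
} Idct16Variant;

/* Picks the cheapest variant that still covers every non-zero coefficient.
 * 'eob' is the end-of-block position in scan order. */
static Idct16Variant pick_idct16x16_variant(int eob) {
  if (eob == 1) return IDCT16_DC_ONLY;
  if (eob <= 10) return IDCT16_10;
  if (eob <= 38) return IDCT16_38;
  return IDCT16_FULL;
}
```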
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on the STDHD set (using 4 tiles) show that the
average speedup of the encoding pass (second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.
Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
Clears up static clang analysis warning regarding divide by zero.
Trying to explain to the compiler how it's impossible to avoid
incrementing num_blocks at least once is difficult.
Change-Id: Ibaae43be572e5cd7a689b440dcd341c17d33443b
Where clang static analysis or gcc -Wmaybe-uninitialized warns of
uninitialized values, assign 0 to ints, MB_MODE_COUNT to
MB_PREDICTION_MODE, and B_MODE_COUNT to B_PREDICTION_MODE.
Assert that the modes have been changed from the invalid value by
the end of the function.
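The pattern can be sketched as follows. The PredMode enum and
pick_cheapest_mode() are hypothetical stand-ins; only the idea of using the
*_MODE_COUNT value as an invalid initializer and asserting on it at the end
comes from this message:

```c
#include <assert.h>

/* Hypothetical mode enum; MODE_COUNT doubles as the "not chosen yet" value. */
typedef enum { DC_PRED, V_PRED, H_PRED, TM_PRED, MODE_COUNT } PredMode;

/* Returns the cheapest mode given one cost per mode in costs[MODE_COUNT]. */
static PredMode pick_cheapest_mode(const int *costs) {
  PredMode best = MODE_COUNT; /* invalid sentinel quiets the warning */
  int best_cost = 0;          /* ints are initialized to 0 */
  int m;
  for (m = 0; m < MODE_COUNT; ++m) {
    if (best == MODE_COUNT || costs[m] < best_cost) {
      best = (PredMode)m;
      best_cost = costs[m];
    }
  }
  assert(best != MODE_COUNT); /* the sentinel must have been replaced */
  return best;
}
```

The sentinel is never a legal mode, so the assert catches any path that leaves
the variable unset instead of silently using a default.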
Change-Id: Ib11e1ffb08f0a6fe4b6c6729dc93b83b1c4b6350
While the new-mt mode is enabled (namely, allowing row-based
multi-threading in the encoder), several speed features that adaptively
adjust encoding parameters during encoding would cause a mismatch
between the single-thread encoded bitstream and the multi-thread encoded
bitstream. This patch provides a set_control API to disable these
features, so that a bit-matching bitstream is obtained in the unit
test.
Change-Id: Ie9868bafdfe196296d1dd29e0dca517f6a9a4d60
broken since:
c3f095c8b Merge "Fix to avoid abrupt relaxation of max qindex in recode path"
5f21aba4b Fix to avoid abrupt relaxation of max qindex in recode path
the original change pre-dated the addition of .clang-format
Change-Id: If5e399d9a805bcad9147360b13b36fbc8c560a7c
VBR method that allows a wider Q range for the first normal frame
in each ARF group and then centers the min - max range for the rest of
the arf group on the chosen Q value for that first frame.
This allows for quite rapid adjustment of the active Q range even if the
initial estimate is poor.
In some cases where the ARF frames themselves are tending to
undershoot but the normal frames are overshooting this can still give
net undershoot. This can be corrected by allowing a larger Q delta for
arf frames but it is usually a sign that the allocation to the arfs was too
high.
Change-Id: Icec87758925d8f7aeb2dca29aac0ff9496237469
Temporary fix until optimization work for block_yrd is completed.
This essentially reverts back to the state before the change:
https://chromium-review.googlesource.com/c/433821/
Compression loss is about ~5-6% on RTC set.
Speed-up (from using this simple/model-based block_yrd) over the low
bitdepth builds (which uses more complex block_yrd) is ~5% on 720p.
Change-Id: Ie0af9eb0d111e5595f587870c44f08317403b8d8
this prevents a rollover when tv_sec is a long:
signed integer overflow: 2776 * 1000000 cannot be represented in type
'long'
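A minimal sketch of the kind of fix described, assuming the elapsed time is
computed from two (seconds, microseconds) pairs. elapsed_usec() is a
hypothetical helper, not the function touched by the patch:

```c
#include <stdint.h>

/* Elapsed microseconds between two timestamps. Widening to int64_t before
 * the multiply keeps the product in range even where 'long' is 32 bits, so
 * 2776 * 1000000 no longer overflows. */
static int64_t elapsed_usec(long begin_sec, long begin_usec, long end_sec,
                            long end_usec) {
  const int64_t sec = (int64_t)end_sec - (int64_t)begin_sec;
  const int64_t usec = (int64_t)end_usec - (int64_t)begin_usec;
  return sec * INT64_C(1000000) + usec;
}
```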
Change-Id: I03dc4476ee122b02e2856dad28358a20cf16a9f8
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
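For illustration, a scalar model of what a saturating 16-bit add/sub does per
lane. This is a sketch only: the helper names are hypothetical, and which NEON
intrinsics the patch uses is not stated here (vqaddq_s16 and vqsubq_s16 are
the usual candidates):

```c
#include <stdint.h>

/* Clamps a 32-bit intermediate to the int16_t range instead of wrapping. */
static int16_t clamp_s16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Saturating add: 30000 + 10000 yields 32767 rather than a wrapped value. */
static int16_t sat_add_s16(int16_t a, int16_t b) {
  return clamp_s16((int32_t)a + (int32_t)b);
}

/* Saturating subtract: -30000 - 10000 yields -32768. */
static int16_t sat_sub_s16(int16_t a, int16_t b) {
  return clamp_s16((int32_t)a - (int32_t)b);
}
```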
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
Add factor to increase variance partition and ac skip thresholds,
under certain conditions (noise level and sum_diff), to increase
denoiser speed.
Change-Id: I7671140ef3598bf5f114a72623d68792bcd7b77b
Threshold for partitioning only affects VGA and lower res.
0.07% quality regression is observed in borg tests on rtc_derf
and 0.2% regression on rtc.
5.6% speed up for low res and 6.8% for VGA on Nexus 6.
Change-Id: If85a2919b48c991de66059c90f32ed06980452be
Fixed the following issue.
..\test\vp9_ethread_test.cc(69): warning C4805: '|=' : unsafe mix of type 'bool' and type 'int' in operation [C:\src\buildbot\test-libvpx\tests\dveCPjwhBE\.build-x86_64-win64-vs10\test_libvpx.vcxproj]
..\test\vp9_ethread_test.cc(69): warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning) [C:\src\buildbot\test-libvpx\tests\dveCPjwhBE\.build-x86_64-win64-vs10\test_libvpx.vcxproj]
Change-Id: I37f897cf12a0b7500d2fcbac9e4615f08a83fdb4
Modified the encoding stage to have row level entry points with relevant
initializations and to access the token information at row level
Change-Id: Ife10e55a7c1a420ee906d711caf75002688d9e39
* changes:
hadamard highbd ssse3: use tran_low_t for coeff
hadamard highbd neon: use tran_low_t for coeff
hadamard highbd sse2: use tran_low_t for coeff
Clears up static clang analysis warning regarding a dead store. Only
declare 'c' when it will be used.
Change-Id: I1ac0fc7f94bc44da63938c63cd1efcd6b95e0eb3
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.
The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.
The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.
BUG=webm:1365
Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
Clears a clang static analyzer warning where 'cols' is assumed to be
less than 0, preventing the for loop from executing.
The assembly already requires that the size be 8 or 16 (U/V or Y plane)
and cols is a multiple of 8.
Change-Id: Ica4612690ead1638c94cfe56b306e87f8ce644f9
In non-rd pickmode: Allow speed 7 to also use larger block size in
model_rd. Small change in behavior for speed 7.
Change-Id: I8c5523e424308e8f0bc71b3f6324dec42a464cc8
In non-rd pickmode: small change in behavior for speed 6 and 7.
Remove condition on HIGHBITDEPTH flag.
Change-Id: I360a13fcc313d72612fe9b918162ef4bb278cdea
Add Buffer features for:
Setting the buffer to the output of an ACMRandom function.
Copying a buffer.
Comparing two buffers.
Printing two buffers.
Change-Id: Ib53fb602451a3abdcee279ea2b65b51fbc02d3df
Skip denoising for blocks < 16x16, and for block = 16x16
skip denoising for low noise levels and width > 480 for now.
Allow for some speed-up in denoiser.
Change-Id: Ib46cefe4741962d145fa08775defea3a9c928567
(yunqingwang)
1. Rebased the patch. Incorporated recent first pass changes.
2. Turned on the first pass unit test.
Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee
Increase the qp-delta, mainly for low resolutions,
excluding case of very low bitrates.
avgPSNR/SSIM gain of ~3-5% on rtc_derf set.
Small change on rtc set.
Change-Id: Ice03d04bd0340404d1957666ef154fd64fed0606
Affecting only speed 8.
Speed tests on Nexus 6 show 4% faster for QVGA and 2.4% faster for VGA.
Little/negligible quality regression observed on both rtc and rtc_derf sets.
Change-Id: I337f301a2db49a568d18ba7623160f7678399ae1
(Yunqing)
This patch added the missing initialization in temporal filter.
Borg test BDRate results:
PSNR: -0.019%(lowres); -0.013%(hdres);
SSIM: -0.001%(lowres); -0.010%(hdres).
Other q values gave comparable but no better results.
Change-Id: I7ad0c18b39e6f558342688e2fe1e12fdb133ce9b
This currently runs 1000 * 1000 = one *million* times which is quite
unnecessary. It's one of the slowest items in Jenkins and takes over an
hour for each of the larger transforms.
Change-Id: I01653b5e610683e1a2d778ec60cf5065562ab8db
Only for speed >= 7, and affects skipping of intra modes.
Threshold is set low for now, needs to be tuned.
Small/no difference in metrics on rtc clips.
Change-Id: If9bdbd43f08d1f80407cdd2e9e5e96780dcd2424
Added the multi-threaded first pass encoder unit test in VP9. The test is
to check if the new multi-threaded first pass encoder(namely, new-mt = 1)
still generates matching stats. In the unit test, the new-mt mode will be
turned on once the multi-threaded first pass implementation is checked in.
Change-Id: Ic21bb1a55c454f024cfd2b397a4c148cfe638218
For short_circuit set to level 1, skip newmv for 64x64 blocks if the
low temporal variance flag is set. Also modify threshold for 64x64 split
in variance partitioning.
Overall speed-up on noisy clips of 2-4%.
Only affect speed >= 7.
Change-Id: I384b3772007e84de6f8707e480d2ddf1fe1f907d
Avoid quality loss when copying partition of superblock with large motions.
Maximum consecutively copied frames can be set (currently 5).
Change-Id: I11c30575514f02194c0f001444cf4021609e5049
Also set the flag to 1 when exiting early by choosing a 64x64 block
such that skipping new mv for golden works in these scenarios.
Change the size of prev_segment_id to the number of superblocks
to save memory.
Borg test shows quality regression of 0.012% on average PSNR
and 0.035% on SSIM.
Change-Id: I5014224c8617d439d35c66ece3fed9ae30b31d23
Adds an optional output framestats.csv file that prints comparisons
per-frame instead of averaged over the entire clip. It prints
per-channel and combined metrics for SSIM and PSNR.
Change-Id: Id28dfade27bc5775b59a9d83cfe8b37d1d52b686
The fix relaxes the max qindex based on the data from previous loop of
coding if output frame size is greater than maximum frame size allowed
Change-Id: Iac1f63ec67559d68766e090a7cbb80b812b2560f
before calling vp9_apply_encoding_flags() which may crash if the
resolution was invalid. this is the same change as:
c0523090b vp8e_encode: check validate_config return
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1315288
Change-Id: Icd2aab322422e83d3a778fca6d7789e5000239d7
If enabled will compute source_sad for every superblock on every frame,
prior to encoding. Off by default, only on for speed=8 when
copy_partition is set.
Change-Id: Iab7903180a23dad369135e8234b7f896f20e1231
Used with --framestats=file.csv. Currently prints raw codec QP (not
internal 0-63 range) and bytes per frame.
Change-Id: Ifbb90129c218dda869eaf5b810bad12a32ebd82d
Avoid many visual artifacts. Compression quality is improved by more
than 1%. Encoding is about 4% faster for QVGA and 6% faster for VGA on
android.
Change-Id: I4dd0a81429ddf7efdef1e80a191da5fb8de8e8af
Renames SSIM to VpxSSIM as an upscaled weighted SSIM metric, then prints
Y, U and V channels unweighted as well as a weighted but not scaled SSIM
score that's 8/1/1 parts Y/U/V (same as VpxSSIM).
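As a sketch of the 8/1/1 weighting only (the upscaling applied for VpxSSIM is
not shown, and weighted_ssim() is a hypothetical name):

```c
/* Combines per-plane SSIM scores with 8/1/1 weights for Y/U/V. The weights
 * sum to 10, so the result stays in the same range as the inputs. */
static double weighted_ssim(double y, double u, double v) {
  return (8.0 * y + u + v) / 10.0;
}
```

Three identical plane scores give that same score back, and a change in the Y
plane moves the result eight times as much as the same change in U or V.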
Change-Id: Iff800cc8f145314eeb1a9b4af1e11a25bec095ca
For speed 8, it speeds up the encoding on android by 6% for QVGA and
7.4% for VGA with the new threshold. Overall PSNR is improved by 0.667
for rtc.
Change-Id: I4a644560b32c0b5b4e9f49ffb953d000413a3732
If enabled denoiser will only denoise the top spatial layer for now.
Added unittest for SVC with denoising.
Change-Id: Ifa373771c4ecfa208615eb163cc38f1c22c6664b
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.
Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
When aq=3 mode is on and the gf_cbr_boost is set: make sure golden frame
is always refreshed, and don't incorporate segment cost in qp setting
on the boosted golden frame.
Better performance on RTC set with gf_cbr_boost on,
for example with gf_cbr_boost=50, gains from ~0.5-3%.
Change-Id: Ie811f5e4d444ff3320bd6e2c1745b2c4c09a8460
Quality improved by 1.866 and 0.386 for two noisy clips (dark720p and
marcooffice720p), respectively.
Change-Id: Ib33a7672ae9ca53da156208f7cd13f07b5543e44
Avoid the qp-clamping on gf/alt frame if gf_cbr_boost_pct is set.
Change only affect CBR mode when gf_cbr_boost_pct is set.
Change-Id: I0655ed4f2b047c8ed1ed33a070c17960ad776704
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
Speed test shows 25% gain on vpx_idct16x16_256_add_neon(),
and vpx_idct16x16_10_add_neon() got tripled.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
Increase the boost threshold below which GOLDEN update will use same
rate correction factor as INTER_NORMAL.
Improves performance when gf_cbr_boost_pct is set (between 0 and 100)
in CBR mode.
Change-Id: I9f54cc18664786a100b13a416b7137ae03bd0cab
Set short_circuit_low_temp_var to 3 for speed 8 for all res.
No strong visual difference on all clips.
Change-Id: Ia6d9a314291ab1c14d5421bbdd769974083aeb2a
Constraints on encoder config:
-target_bandwidth is no larger than 80% of level bitrate limit
-target_bandwidth * (1 + max_over_shoot_pct) is no larger than
88% of level bitrate limit
-min_gf_interval is no smaller than level limit
-tile_columns is no larger than level limit
Constraints on rate control:
-current frame size plus previous three frames' size is no larger
than the CPB level limit
-current frame size is no larger than 50%/40%/20% of the CPB
level limit if it's a key/alt-ref/other frame.
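The encoder config constraints above can be sketched as a single check. The
struct layouts and field names are hypothetical, max_over_shoot_pct is assumed
to be a percentage, and the rate control (CPB) constraints are left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-level limits; not libvpx's actual tables. */
typedef struct {
  int64_t max_bitrate;  /* level bitrate limit */
  int min_gf_interval;  /* smallest allowed golden-frame interval */
  int max_tile_columns; /* largest allowed tile column count */
} LevelLimits;

/* Hypothetical subset of the encoder config. */
typedef struct {
  int64_t target_bandwidth; /* same unit as max_bitrate */
  int max_over_shoot_pct;   /* percent, e.g. 10 means +10% */
  int min_gf_interval;
  int tile_columns;
} EncoderConfig;

/* Returns true when the config satisfies all four listed constraints. */
static bool config_fits_level(const EncoderConfig *cfg,
                              const LevelLimits *lim) {
  /* target_bandwidth <= 80% of the level bitrate limit */
  if (cfg->target_bandwidth * 100 > lim->max_bitrate * 80) return false;
  /* target_bandwidth * (1 + overshoot) <= 88% of the limit */
  if (cfg->target_bandwidth * (100 + cfg->max_over_shoot_pct) >
      lim->max_bitrate * 88)
    return false;
  if (cfg->min_gf_interval < lim->min_gf_interval) return false;
  if (cfg->tile_columns > lim->max_tile_columns) return false;
  return true;
}
```

The comparisons are kept in integer arithmetic by scaling both sides by 100,
which avoids rounding at the exact 80% and 88% boundaries.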
Change-Id: I84d1a2d6d6e3c82bfd533b3309ce999cfaba2c8b
The source sad could be used to copy the partition without going into
choose_partitioning function to speed up vp9 encoding. Computing source
sad takes little time. Speed test on Android and Linux shows little
encoding time gain (less than 1.4%).
Turned off for now since partition copy is turned off.
Change-Id: I61c9d5b8f22329760cb29a4ee30a7f9c232ce8d3
vp9: Set short circuit to level 3 for VGA for speed 8. Also change the
threshold_32x32 to 5/8*thresholds[1] to reduce the quality regression
on VGA clips.
Change-Id: Ia1590e91e7cb22be78d5b85013387bb1be4272e3
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.
Also cast the update of bits_in_buffer_model_, as this can
go negative now due to the buffer underrun.
This fixes the issue in #1352.
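A minimal sketch of why the cast matters, assuming the buffer model is drained
by the size of each encoded frame. drain_buffer() and its parameters are
hypothetical and only illustrate the signed update:

```c
#include <stddef.h>
#include <stdint.h>

/* Drains the buffer model by one frame. The frame size is an unsigned
 * size_t, so it is converted to a signed 64-bit value first; the level can
 * then go negative on underrun instead of wrapping to a huge value. */
static int64_t drain_buffer(int64_t bits_in_buffer, size_t frame_bytes) {
  return bits_in_buffer - (int64_t)frame_bytes * 8;
}
```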
BUG=webm:1350
BUG=webm:1352
Change-Id: Ibd4ef23921daf09e5c15b000aca904aa4573599c
For out-of-range cases, returned UINT_MAX instead of INT_MAX in the
sub-pixel mv search to be consistent with the "uint32_t" return type.
Change-Id: I8e206d771228c13d89bafbbe9f14722c8ecc6a7a
The function 'vp9_find_best_sub_pixel_tree_pruned_more' is modified
to return INT_MAX for handling invalid MV cases from UINT32_MAX.
yunqingwang:
patch 3: rebased on top of the tree.
patch 4: The return type of vp9_find_best_sub_pixel_tree* was changed
to uint32_t to fix ubsan warnings. Changing UINT_MAX back to INT_MAX
was not quite right. Patch 4 modified vp9_temporal_filter.c to accept
uint32_t.
(Note: Inconsistency exists in vp9_find_best_sub_pixel_tree*, which
will be fixed in a separate CL.)
Change-Id: Ib1a79dc2aa41ea6335c21669c76883cdbb7e0535
This reverts commit f0b491a524.
This change results in unsigned integer overflows (as reported by
-fsanitize=integer) in datarate_test.cc,
for many of --gtest_filter=VP9/DatarateOnePassCbrSvc.OnePassCbrSvc*:
unsigned integer overflow: 167198 - 185560 cannot be represented in type
'unsigned long'
The encoder didn't change; only the input did, with the change to
(correctly) use Y4mVideoSource, so this revert merely masks the issue.
BUG=webm:1352
Change-Id: Iecd9a6c83b3fca67c566732a5c92d36193cc2060
Fix compile warnings about implicit type conversion for
target=armv7-android-gcc in vpxenc.c.
BUG=webm:1348
Change-Id: I9fbabd843512f2a1a09f4bb934cd091e834eed9c
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.
BUG=webm:1350
Change-Id: I73c88b800cdcc06bd2f900f7b7e2a5fd08248065
When source frame is altref, we only do zero-mv mode, so we can skip
the find_predictors(). No change in compression.
Small speed gain, ~1%.
Only affects 1 pass vbr with lookahead altref, for ytlive with
the macro flag USE_ALTREF_FOR_ONE_PASS on.
Change-Id: I9318c5da8521f017bf54919cd652438b3a6313d1
Remove superfluous test. Produces a small improvement in instruction scheduling.
Measured a 1% to 1.5% reduction in execution time for routine vp9_optimize_b
with different compilers.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I2bf248d4c25fc0256147d7a8766ff9108ae9cba3
Add feature to copy partition from the last frame.
The copy is only done under certain conditions that SAD is below threshold.
Feature is currently disabled, until threshold is tuned.
Feature will be initially used for Speed 8 (ARM).
Under extreme case of always copying partition for speed 8:
Encode time is reduced by 5.4% on rtc_derf and 7.8% on rtc.
Overall PSNR reduced by 2.1 on rtc_derf and 0.968 on rtc.
Change-Id: I1bcab515af3088e4d60675758f72613c2d3dc7a5
Simplify address arithmetic on token_costs to reduce the number of generated
instructions that are used for address arithmetic inside routine
vp9_optimize_b. It also helps improve instruction scheduling depending on
compiler and optimization level.
Measured a 9.3% reduction in retired instructions and 5.3% reduction in
execution time for this routine with GCC v4.8.4 and optimization flags -O3,
and a reduction of up to 11.6% in execution time with other compilers.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6098650fb5cd2aa04e014fe6e68ca20761f3a21f
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c enable vpx_idct32x32_1024_add_neon in hbd builds
+ give the variable a more descriptive name
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
Correctly set interp_filter to SWITCHABLE for INTRA mode.
Also reduce threshold on noise level for re-evaluating zeromv.
Change-Id: Id32c01e193209fb380aa07204f0be3babf29f70a
For when denoising enabled: change condition to enable
the recheck_zeromv_after_denoising for only very high noise level.
This is causing an issue, so enabling it for very high noise
to effectively shut it off.
Change-Id: Ic40d6025f3f398338cedd270d17c0ccd9a3daa84
The new test is causing valgrind failures:
[ RUN ] SSE2/VpxPostProcDownAndAcrossMbRowTest.CheckCvsAssembly/0
==28923== Invalid read of size 16
==28923== at 0x724016: ??? (deblock_sse2.asm:146)
Disable during investigation. The test is new but the code is not.
Change-Id: I5521e5fd48a595e3798b833bf7e3cc97b81c1975
To avoid decode performance hit of 2% when running on hyperthreaded
cores.
This patch only uses the mutexes when we are running tsan.
This is safe because 32 bit operations like read and store are atomic
on all the platforms we care about. Tsan warns about race situations,
but in either case (read occurs before write, or write before read)
the worst case is that we go around the loop one extra time, so the
ordering doesn't really matter.
That said a few other things have been tried :
for instance as per here:
webrtc/base/atomicops.h#52
In this patch they use:
__atomic_load_n(i, __ATOMIC_ACQUIRE);
__atomic_store_n(i, value, __ATOMIC_RELEASE);
This code works on gcc and clang (replacing protected write and read) and
avoids tsan errors, with no penalty in performance. In C11 it's
replaced by straight atomic operations.
However, there is no equivalent in the Visual Studio versions we support, as
int32 on all Windows platforms is already atomic. To avoid tsan-like
warnings on Windows we'd need to use interlocked exchange, and the
end result doesn't gain us anything.
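For reference, the gcc/clang alternative mentioned above could be wrapped
roughly like this. It is a sketch of the approach that was tried, not what
this patch ships, and the wrapper names are hypothetical:

```c
/* Requires gcc or clang: uses the __atomic builtins quoted above. */

/* Acquire load: later reads in this thread cannot be reordered before it. */
static int atomic_read_acquire(int *p) {
  return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

/* Release store: earlier writes in this thread become visible to a thread
 * that observes 'value' through an acquire load. */
static void atomic_write_release(int *p, int value) {
  __atomic_store_n(p, value, __ATOMIC_RELEASE);
}
```

A writer thread would call atomic_write_release() on the shared counter and a
reader would poll it with atomic_read_acquire(), with no mutex involved.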
Change-Id: I2066e3c7f42641ebb23d53feb1f16f23f85bcf59
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
The flag USE_ALTREF_FOR_ONE_PASS allows for alt-ref lookahead
in 1 pass vbr (from https://chromium-review.googlesource.com/#/c/365498).
This change is to make sure this macro flag only has effect if
the config flag cpi->oxcf.enable_auto_altef is also on.
No change in ytlive encoding, as USE_ALTREF_FOR_ONE_PASS is not
yet enabled.
Change-Id: I1a69681e4a15c5244581a3dab4587fca08f02e0f
Reapply this patch:
ff0107f Amend and improve VP8 multithreading implementation
Amended the patch to add a unit test, and fix an asan error.
BUG=webm:851
Change-Id: I6572c03256169c64e80248bf5a5e99f59a2fc93c
use_base_mv assumes 2x2 scaling, so the fix is to shut off
this feature unless spatial scale factors are 2.
Added svc unittest for 2 spatial layers with 5x5 scaling,
which generates the issue without this fix.
Also fix some settings in svc unittest:
let the speed setting vary (from 5 to 8), and enable static threshold.
BUG=webm:1344
Change-Id: Idfd0a6c633c21b49a0479601506302cfe974e30e
Set #threads to default 1 for all streams, change bit allocation
for 3 temporal layers, and enable denoiser on middle resolution layer.
Change-Id: I4a57adbfdb2c319002b8f3cf359613842dc00d75
after:
2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
One of the first pass stats "new_mv_count" is no longer used in VP9,
and is removed. This also makes it easy to implement a multi-threaded
first pass. This change doesn't affect the coding performance, which
has been verified by borg tests.
Change-Id: I4c7c7bf9465fda838eb230814ef0c631c068c903
1. Use correct projections when copying real dct/quant outputs.
2. Remove local random number generator and combine loops.
3. Quantization with minimum allowed step sizes instead of maximum.
This may generate larger inputs.
Change-Id: I154afc26230c894d564671cff4b8fd5485b69598
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
Use the segment weight factor based on the target (cr->percent_refresh)
if it is less than the current estimate (average of past usage and target).
Small improvement at low bitrates.
Change-Id: Iba8fd909e203f94458901366d3a991f7ea854d49
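The selection rule described above can be sketched as follows. This is only an illustration: the helper name and the use of fractions in [0, 1] (rather than the percent stored in cr->percent_refresh) are assumptions, not the encoder's actual code.

```python
def segment_weight(target_fraction, past_usage_fraction):
    """Hypothetical helper: pick the weight for the refreshed segment."""
    # Current estimate: average of past usage and the target.
    estimate = 0.5 * (past_usage_fraction + target_fraction)
    # Use the target when it is below the estimate, otherwise the estimate.
    return min(target_fraction, estimate)
```

With a target of 0.10 and past usage of 0.30 the target is used; with past usage of 0.02 the lower average is kept.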
the expansion of findstring and rtcd_dep_template_CONFIG_ASM_ABIS needs
to be deferred until the block is parsed as makefile syntax rather than
eval time where rtcd_dep_template_CONFIG_ASM_ABIS will be unset. this
ensures vpx_config.asm is properly created.
Change-Id: I7c38c6c082da78397936467482789dd468adc316
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
force this to be created before any other .S files. this change
additionally removes the file from the source list as it doesn't need to
be compiled on its own.
Change-Id: I6b4cd56ef6059d08f75f06fb749cddf76e0e165e
Increase the motion threshold and qp-delta for segment#2 boost.
This can increase the frame-drop at low bitrates, but generally
gives better spatial quality.
Only affects real-time mode with aq-mode=3, at very low bitrates.
Change-Id: I5ccb784667f70d0c27d369806b93b1f93d5605d1
For some filter levels, the C/MSA output doesn't match SSE2. Some of the unit
tests are disabled. They will be re-enabled when the C/MSA funcs are fixed.
BUG=webm:1321
Change-Id: Ib16b98b5eecb15d2252aa4ea267b782ee2b27533
replace downloads.webmproject.org with the canonical
storage.googleapis.com/... form. this appears less likely to fail when
dealing with multiple concurrent connections.
Change-Id: I0dcbd04df9e4057fa851f458b3ef7e3589f1f2f1
best_sub8x8[1] won't be used meaningfully when is_compound is false, but
may trigger an msan warning as the value is copied around and later
clamped.
BUG=667044
Change-Id: Icc24c3b72cdb550bebea44d4aaa4ff8bf3fbab56
Use the same feature as https://chromium-review.googlesource.com/#/c/411327/,
but allow it to be used for speed = 6 and 7, where
short_circuit_low_temp_var = 1.
Speed up of ~2-3% for speed 7, with little/no loss in compression.
Change-Id: I263a0f261ad9929034392d68f0153dc6376fdb5f
Remove unnecessary "virtual" before some functions. Change *_btm_* in
variable names to *_bottom_*.
Change-Id: Ifd4ce667537617f451cdfed47dd8c48817fd983b
VpxEncoderThreadTest was taking a very long time for some runs and
timing out a lot. This is an attempt to split the test into runs
that can be run nightly (speeds 2 through 9) and runs that can
be run weekly (speeds 0-1).
Change-Id: Iee6f61a561006d3a30381dd3b52b9a4dce07a70c
This is a boolean value that is written into the bitstream; any value other
than 0 or 1 could have led to unexpected behavior. This commit fixes the
issue by validating that the value is boolean.
BUG=webm:1339
Change-Id: I2d3e69e8dbefcab9a0db9cb39a91a40ce531c5a1
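A minimal sketch of this kind of validation, assuming a hypothetical helper and flag name (the commit message does not name the parameter):

```python
def validate_bool_flag(name, value):
    """Return value if it is exactly 0 or 1; raise ValueError otherwise."""
    if value not in (0, 1):
        raise ValueError(f"{name} expected to be 0 or 1, got {value!r}")
    return value
```

Rejecting the value up front keeps anything other than a single bit from reaching the bitstream writer.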
fixes reloc errors like:
R_X86_64_PC32
vpx_dsp/x86/deblock_sse2.o:
requires dynamic R_X86_64_PC32 reloc against 'vpx_rv' which may overflow
at runtime
Change-Id: I218fc0e7c8258197f890d395f335e5a4fe82dccb
tests with 'Large' in the name are reserved for slow running tests which
may not be run on all platforms
Change-Id: I2a7d6dd46b29b50469893e46433844132fb727c2
This runs multiple encodes and decodes of vp8 and vp9 in parallel,
with so many threads that problems with synchronization can show up.
Change-Id: I2b297e7f43d1e741323c7ad9f50a3931ae609f16
Add a new, more aggressive short circuit: short_circuit_low_temp_var = 3 to skip
golden of any mode when variance is lower than threshold for low res.
This change only affects speed = 8, low resolution.
Metrics for avgPSNR/SSIM on rtc_derf (low resolution) show loss of
0.27/0.31%.
On Nexus 6, the encoding time is reduced by ~2.3% on average across all
low-res clips.
Visually little change on rtc_derf clips.
Change-Id: Ia8f7366fc2d49181a96733a380b4dbd7390246ec
this removes the need for __STDC_LIMIT_MACROS which is defined in
vpx_integer.h, but may be preceded by earlier includes of stdint.h;
fixes build with the r13 ndk
Change-Id: I3950c8837cf90d5584a20ce370ae370581c2182c
avoids the definition of min/max macros in headers that may appear in
c++ unit tests. the codebase uses VPXMIN/MAX for this purpose in any
case
Change-Id: I2b679b045d64fb34fd8780f704e3caf10a758d82
use 'android/cpufeatures' rather than 'cpufeatures'; this matches the
documentation, fixes compilation with r12b/r13 and still works with
older ndks.
Change-Id: I2f34233c164e6d4d46428f8905d5502cea4288a2
Change only affects speed = 8 for low resolutions.
Metrics for avgPSNR/SSIM on rtc_derf (low resolutions) show loss of
0.5/0.6%.
On Nexus 6, the encoding time is reduced by ~5.9% on average across all
low-res clips.
Visually little/no change on rtc_derf clips.
Change-Id: I68dd50e558d72dcc1af8317d224bfae5e3bd872d
This commit enables asymptotic closed-loop encoding decision for
the key frame and alternate reference frame. It follows the regular
rate control scheme, but leaves out additional iteration on the
updated frame level probability model. It is enabled for speed 0.
The compression performance is improved:
lowres 0.2%
midres 0.35%
hdres 0.4%
Change-Id: I905ffa057c9a1ef2e90ef87c9723a6cf7dbe67cb
this was enabled in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
+ enable it for all NEON configs, both intrinsics and assembly versions
exist
BUG=webm:1294
Change-Id: I339088b2a398200f95658d040034fb9b2a7c8ce0
usage of the vp8 versions was removed in:
3f72509 vp8: remove VP8_SET_DBG* control support
vp9 had the usage stripped even earlier.
Change-Id: I978142eb6492552cd29c9c6feb1e89acfc5f7b84
this was enabled in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
+ enable it for all NEON configs, both intrinsics and assembly versions
exist
BUG=webm:1294
Change-Id: Iaade219e9d1de7b69423670d3ea6271b0965e068
idct4x4 and idct8x8 were universally enabled for high-bitdepth builds
in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
BUG=webm:1294
Change-Id: If142afb169c48728cc4b222e7c41aa4a63f95f0f
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
For noisy content, be more aggressive in skipping some blocks for
delta-qp to reduce noise pulsing artifact. Also treat frame boundary
case when dimension is not multiple of superblock size/64.
Only affects non-screen content case, and when source noise
is measured to be high (at least level kMedium).
Change-Id: Ib13a2a20ed1ce37ff3c44d95c3ef2635fd695222
This uses the same sdx4df pointers as vp8_diamond_search_sadx4 and
should therefore target the same optimizations.
See e4ddf9db6a
Change-Id: Ic298e9b25c34bbe6b7a0799509355b0addb56675
The matching on ads2gas_apple.pl is too liberal and catches
CONFIG_EXTERNAL_BUILD and CONFIG_INTERNAL_STATS because they have RN in
the names.
The RN renaming feature is not used in any existing assembly files. It
was used in some armv6 files but they were removed.
Change-Id: Ib65abf1947d3e89f0d1584e2a5de399d24008f95
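The over-matching can be illustrated with a small sketch. The two patterns below are hypothetical stand-ins, not the expressions used in ads2gas_apple.pl, and the third sample line is made up. The commit itself removes the unused feature rather than tightening the match.

```python
import re

names = ["CONFIG_EXTERNAL_BUILD", "CONFIG_INTERNAL_STATS", "coef RN r4"]

# Hypothetical over-liberal pattern: matches "RN" anywhere in the line.
loose = re.compile(r"RN")
# Hypothetical stricter pattern: matches "RN" only as a standalone token.
strict = re.compile(r"\bRN\b")

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]
```

The loose pattern hits both CONFIG_* names because EXTERNAL and INTERNAL contain the letters RN; the token-bounded pattern hits only the line with a standalone RN.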
Two functions do not pass this test:
vpx_idct8x8_64_add_ssse3
vpx_idct8x8_12_add_ssse3
The test has been modified to avoid triggering an issue with those
functions but they still must be investigated.
BUG=webm:1332
Change-Id: I52569a81e8e6e0b33c4a4d060d0b69c3fc4f578e
Add condition that usable_ref_frame > LAST.
This is to avoid potentially skipping all last-nonzero mv modes,
if golden is used as a reference but skipped completely for the
current block.
This has no effect currently, as we always consider testing golden
mode for each block.
Change-Id: I3182cf44664081935a90ed43aa7b32e710e60e22
Fixed formatting bug introduced by the fix to BUG=webm:1322
(Iedc4477aef1746aa0a4f84d88a1156296fd3ba87)
Change-Id: I715ee446c0e8584967ab87ba4e355759dd394187
vp9_init_macroblockd() resets the error_info to cm's global copy; this
needs to be set to the thread-level target to avoid jumping to the
incorrect stack, resulting in hang or crash.
broken since:
1f4a6c8 vp9/tile_worker_hook: add multiple tile decoding
includes v1.5.0, v1.6.0
BUG=629481
Change-Id: Icbf1696b25ba8c479e845fbf227b3c3ca73542f5
vpx_config.asm and idct_neon.asm.S are required since:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
Change-Id: If5959a25edb370dd7dcdca71c96e9a5aad0840ce
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.
BUG=webm:1294
Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
A past patch made it so that every frame that had a decode error
caused a corrupted frame to be counted. Unfortunately it was possible
to get both a decode error and a corrupt frame for the same frame
and thus double count an error. This code makes that impossible.
Change-Id: Iea973727422a3bf093ffda72fa358a285736048b
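A minimal sketch of the counting rule, assuming each frame is summarized by two hypothetical booleans (decode failed, frame corrupted); the real test harness tracks these differently:

```python
def count_frame_errors(frames):
    """frames: iterable of (decode_failed, corrupted) boolean pairs."""
    errors = 0
    for decode_failed, corrupted in frames:
        # A frame contributes at most one error, even when both flags are set.
        if decode_failed or corrupted:
            errors += 1
    return errors
```

A frame reporting both conditions is counted once rather than twice.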
Permits skipping 0, 1/2 or 3/4 of the frames, corresponding to
temporal layers 2, 1 and 0 of a 3-temporal-layer encoding. 1/2
corresponds to TL0 in a 2-layer encoding.
Change-Id: I7f6d131f63707e5262fc67d111bfb3a751ede90d
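The skip fractions follow from the layer pattern. The sketch below assumes a dyadic pattern in which TL0 appears once every 2**(num_layers - 1) frames; the function names are hypothetical.

```python
def temporal_layer(frame_index, num_layers):
    """Layer id of a frame in an assumed dyadic temporal pattern."""
    period = 1 << (num_layers - 1)
    pos = frame_index % period
    if pos == 0:
        return 0
    # Each trailing zero bit of pos moves the frame one layer down.
    layer = num_layers - 1
    while pos % 2 == 0:
        pos //= 2
        layer -= 1
    return layer


def skipped_fraction(num_layers, highest_layer_kept, num_frames=64):
    """Fraction of frames dropped when only layers <= highest_layer_kept are kept."""
    kept = sum(
        1
        for i in range(num_frames)
        if temporal_layer(i, num_layers) <= highest_layer_kept
    )
    return 1 - kept / num_frames
```

For three layers, keeping up to TL2, TL1 or TL0 skips 0, 1/2 or 3/4 of the frames; for two layers, keeping only TL0 skips 1/2.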
Allow for passing in the layer bitrates at command line.
Fix to allow passing in bitrate for each spatial-temporal layer.
Change to some default values for 1 pass cbr mode:
spatial scale and qp-max/min.
Small fixes to some build warnings.
Change-Id: I3f9a776262712480a6570bb863a835b2fc49935a
This change is a step in a larger change to the way boost and interval are
determined for ARF and Key frames.
This patch contains some plumbing for the general case but focuses on the
key frame boost calculation. This now relies more heavily on the rate at
which the error score increases between the primary and secondary reference
frame. This seems to be less fragile when dealing with different frame sizes.
For example larger image formats tend in the first pass to see a higher
% of intra coded blocks and the use of this number in calculating the frame
decay factor was leading to much lower boost numbers for 4K, for example,
than the same clip coded at 2K.
This change does give overall gains but they are MUCH larger for the 4K Netflix
set. For the 4K Netflix set the average gain is around 3% with some clips > 20%
whereas for the same set at 2K the average gain is 0.5-1%.
In general for small image formats the boost is most often reduced a little whereas
for 4K clips the boost is increased. There are some negative cases such as Akiyo at 352x288
where the reduced boost hurts the metrics, especially for SSIM, even while
the set as a whole improves. This is most notable at very low Q and may be the
subject of a future patch.
Some common code for KF and ARF was separated in this patch for the purposes of
tuning but may later be re-merged if appropriate.
Change-Id: Iaa15ac5a58d2be89181100d95cef6a8dc4b12d0d
Fixes a case where recode is not triggered based on the value
of maxq passed into the recode loop test function.
BUG=b/32375284
Change-Id: I15ad985d0525c68e0443cfaf842440d2754b2266
The result of the transform is added to the destination buffers. In the
existing tests the destination buffer is always empty so that portion of
the code was never exercised.
Change-Id: I1858c4fed2274f1b9faf834d2ba4186a4510492a
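Why a non-empty destination matters can be seen in a small sketch. It assumes 8-bit pixels clamped to [0, 255] and uses a hypothetical helper, not the library's reconstruction code.

```python
def add_residual(dst, residual):
    """Add a residual to an 8-bit destination block, clamping to [0, 255]."""
    return [max(0, min(255, d + r)) for d, r in zip(dst, residual)]
```

With an all-zero destination the sum and the upper clamp are barely exercised; a destination such as [250, 10, 128] reaches both clamps and the plain addition.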
Switch to using correctly sized inputs and outputs. This simplifies
adding tests with varying strides.
Change-Id: I716a0d8173dcf6a86d56656ac9d3101b7ec27642
Removed a couple of adjustments that no longer move the needle
much but complicate the process of tuning.
Change-Id: Ie320f5cf155e6aac14a4757ea9ada2cd59f27590
Modified the encoder multi-thread test so that it included cpu-used=0 and
frame-parallel=0.
frame_parallel_decoding_mode is 1 by default, which disables probability
updating and gives lower encoding quality. The current VP9 multi-threaded
encoder and decoder support probability updating. To test this part, we
should turn it on in the unit test, namely, by setting frame-parallel to 0.
Change-Id: Ia1f86e01f0de628f50d819ae31509de3e1b6c755
This patch modified the motion search counts used in:
https://chromium-review.googlesource.com/#/c/305640/
These 2 counts were originally added as thread data, and used to
make decisions in motion search. The tile encoding order can be
inconsistent when using a different number of threads, which can
cause bitstream mismatch. This patch moves them to tile data to solve
the issue.
BUG=webm:1322
Change-Id: Iedc4477aef1746aa0a4f84d88a1156296fd3ba87
dst += stride behaves better with gcc/clang.
Expanding the inline function dc_SIZExSIZE() saves instructions for
vpx_dc_predictor_SIZExSIZE_neon().
Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
A failure to decode is most likely equivalent to a corrupt
frame for the purpose of returning a failure.
Change-Id: Ie53db2b8130b40b725841f5f7a299d63aa56913d
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
BUG=webm:1309
Change-Id: I8a80da7c5089b837d0df79a5c49d5e3022dfc8ec
In variance partition, low resolutions may use variance based on
4x4 average for better partitioning.
Increase the threshold for doing this at speed = 8.
Improves speed by ~5%, with little loss, < 1%, on RTC_derf set.
Change-Id: Ib5ec420832ccff887a06cb5e1d2c73199b093941
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.
BUG=webm:1303
Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
Control already exists for vp9, adding it to vp8.
Usage is only when error_resilient is off.
Added a datarate unittest for non-zero boost.
Change-Id: I4296055ebe2f4f048e8210f344531f6486ac9e35
This reverts commit 9e8efa5b18.
this change causes ubsan warnings, failures in
vpxenc_vp9_webm_rt_multithread_tiled
BUG=webm:1309
Change-Id: I020c7be985c771bfff4b3de1afe51cc8edb980da
git log --no-merges 32d5ac4..9732ae9
9732ae9 EbmlElementSize: quiet uint64->int32 conv warning
da04eba SetProjectionPrivate: quiet uint64->size_t conv warning
6db32d5 mkvparser,Projection::Parse: fix int->bool conv
3bb0dfa cosmetics: fix a couple lint warnings
0e179d6 update .clang-format
fc5f88d Fix temp files being left on system.
c04a134 Add support for overriding PixelWidth and PixelHeight.
c0160e0 Add support to explicitly set segment duration.
02bc809 Add support to estimate file duration.
c97e3e7 Add support to output sub-sample encryption information.
26f4344 MakeUID: quiet unused param warning in Android builds
d6af52a Change check to fix compile error.
1720020 webm_parser: Add Mesh value for ProjectionType
78f2c5a webm_parser: Use ./ prefix for includes
da62f65 webm_parser: Remove webm/ prefix from public includes
e15e8f2 webm_parser: Update README build instructions
5023f2b mkvmuxer: Fix Colour::Valid()
cf16204 mkvmuxer_tests: Actually test cue points in the cue point test.
93e9fb3 Validate Colour element values.
8036925 mkvparser_tests: Add Projection element test.
f52d38c mkvparser_tests: Add Colour element test.
826436a mkvparser: minor SeekHead::Entry clean up.
24fb44a mkvmuxer_tests: Add Projection element test.
1e0a8ea mkvmuxer_tests: Add Colour element test.
0278616 mkvmuxer: Colour accessors/mutators.
2346f8f Add mkvparser wrapper functions.
54d6b6b webm_info: Add Projection element support.
65fee06 mkvmuxer_sample: Add support for Projection element.
9a3f2b5 mkvparser_sample: Add support for Projection element.
41e814a mkvparser: Add Projection element support.
483a0ff mkvmuxer: Add Projection element support.
676a713 Add support for the Projection element
725f362 mkvmuxer: Fix memory leak when Colour is set multiple times.
fa182de mkvparser_sample: Add output of audio track codec private size.
8f521f2 mkvparser_tests: Add invalid BlockGroup test.
39137d7 Remove docs saying binary elements default to 0
80685d3 Do not skip over unknown elements at the root level
c147504 Fix legacy Makefile.
58711e8 mkvparser_sample: Fix version info string.
837746f mkvparser_tests: Add invalid block test.
207cd80 Disambiguate sample sources and targets.
a112d71 mkvparser_tests: Refactor invalid file loading code.
5dea33e Disambiguate test source and target names.
125049e parser_tests: Add another truncated chapter string test.
1de8d4c parser_tests: Add truncated chapter string test.
ff8c2b6 parser_tests: Move cue validation to test_util.
4b0690f parser_tests: Add invalid lacing test.
9828e39 mkvmuxer: Set default doc type version to 4.
5495a59 webm_parser: Reference more files in CMakeLists.txt.
0c0ecd0 vpxpes_parser: Add start code emulation prevention support.
639a4bc webm2pes: Remove debug printfs().
9a51102 webm2pes: fflush() in the correct conversion function.
dc7f155 webm2pes: Track total bytes written.
d518128 webm_parser: Enable usage of werror.
e1fe762 webm2pes: Add test for mux/demux of large input.
1b24a79 vpxpes_parser: Read and store PTS when present.
6cf0a0f vpxpes_parser: Store frame payloads.
25d2602 webm_parser: Convert style to match the rest of libwebm
24be76d webm2pes: Replace VpxFrame with VideoFrame.
b451c3b Add a basic video frame storage class.
05c90eb libwebm_util: Clarify error text in superframe parser.
e6415af webm2pes: Make WritePesPacket() a public method.
8f840dd webm2pes: Move frame read out of PES packet write method.
448af97 webm2pes: Restore frame fragmentation support.
f8bb714 cmake: Integrate new parsing API and tests.
cb8ce0b Add a new incremental parsing API
900d322 vpxpes_parser/webm2pes: BCMV and PTS fixes.
4b73545 webm2pes: Add start code emulation prevention.
82903f3 Add column tiles and frame parallel to webm_info
5d91edf style_clean_up: Remove unnecessary parentheses
a95aa4b vp9_level_stats: correct total_uncompressed_bits_ calculation
f46566f mkvreader: Fix shorten-64-to-32 warning in 32 bit builds.
76630ca mkvwriter: Fix shorten-64-to-32 warning in 32 bit builds.
a8ffbd4 webm2pes: Fix format specifier warnings.
faf89d4 Add MaxLumaSampleRate grace percent to stats.
d31e6c9 Fix profile 2 in vp9_header_parser.
bd3ab3a Add flag to estimate last frame's duration to stats.
c182ed9 Fix lint issue in hdr_util.h
cc62ecd Add test for Cluster memory leak
196708a Change MaxLumaSampleRate to be based on frame resolution.
cbd676b mkvmuxer: Fix leak when a Cluster isn't finalized
9a235e0 mkvmuxer: Set doctype to matroska when muxing non-WebM codecs.
47f2843 Add parsing support for new features in CodecPrivate.
e3c9576 Add VP9 level output to webm_info.
5cf549f cmake: Log compiler flag at check time.
bbaaf2d Add class to gather VP9 level stats.
8bb68c2 Add file to parse data from VP9 frames.
296429a Add support to parse VP9 profile.
df3412f Add support for setting VP9 profile and level to sample_muxer.
87832d4 mkvmuxer: Fix Segment::Finalize in kLive mode
6df3e56 mkvmuxerutil.hpp: Add using directives for overloaded size utils.
ec47928 mkvmuxerutil: Revert to using mkvmuxertypes.
a1dc4f2 Fix parsing of VP9 level.
4e3d037 Add support to output Colour elements to webm_info.
d3656fd muxer_tests: ignore iwyu re gtest-message.h
e76dd5e Fix file name in mkvmuxertypes shim.
1be5889 Add temporary include shims at old file locations.
039df94 Add TEST_TMPDIR environment variable
Change-Id: I84bc1401b0aad71ad6727b687f1bede9953a7a08
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
Add stronger condition for splitting 64x64, for low noise content.
This reduces dragging artifact near moving head.
Little/no change in metrics on RTC set.
Change-Id: I39b38cfd20f2ece53ff49c2aaf76ba9f82761be1
fc5f88d Fix temp files being left on system.
c04a134 Add support for overriding PixelWidth and PixelHeight.
c0160e0 Add support to explicitly set segment duration.
02bc809 Add support to estimate file duration.
c97e3e7 Add support to output sub-sample encryption information.
26f4344 MakeUID: quiet unused param warning in Android builds
d6af52a Change check to fix compile error.
1720020 webm_parser: Add Mesh value for ProjectionType
78f2c5a webm_parser: Use ./ prefix for includes
da62f65 webm_parser: Remove webm/ prefix from public includes
e15e8f2 webm_parser: Update README build instructions
5023f2b mkvmuxer: Fix Colour::Valid()
cf16204 mkvmuxer_tests: Actually test cue points in the cue point test.
93e9fb3 Validate Colour element values.
8036925 mkvparser_tests: Add Projection element test.
f52d38c mkvparser_tests: Add Colour element test.
826436a mkvparser: minor SeekHead::Entry clean up.
24fb44a mkvmuxer_tests: Add Projection element test.
1e0a8ea mkvmuxer_tests: Add Colour element test.
0278616 mkvmuxer: Colour accessors/mutators.
2346f8f Add mkvparser wrapper functions.
54d6b6b webm_info: Add Projection element support.
65fee06 mkvmuxer_sample: Add support for Projection element.
9a3f2b5 mkvparser_sample: Add support for Projection element.
41e814a mkvparser: Add Projection element support.
483a0ff mkvmuxer: Add Projection element support.
676a713 Add support for the Projection element
725f362 mkvmuxer: Fix memory leak when Colour is set multiple times.
fa182de mkvparser_sample: Add output of audio track codec private size.
8f521f2 mkvparser_tests: Add invalid BlockGroup test.
39137d7 Remove docs saying binary elements default to 0
c147504 Fix legacy Makefile.
80685d3 Do not skip over unknown elements at the root level
58711e8 mkvparser_sample: Fix version info string.
837746f mkvparser_tests: Add invalid block test.
207cd80 Disambiguate sample sources and targets.
a112d71 mkvparser_tests: Refactor invalid file loading code.
5dea33e Disambiguate test source and target names.
125049e parser_tests: Add another truncated chapter string test.
1de8d4c parser_tests: Add truncated chapter string test.
ff8c2b6 parser_tests: Move cue validation to test_util.
4b0690f parser_tests: Add invalid lacing test.
9828e39 mkvmuxer: Set default doc type version to 4.
5495a59 webm_parser: Reference more files in CMakeLists.txt.
0c0ecd0 vpxpes_parser: Add start code emulation prevention support.
639a4bc webm2pes: Remove debug printfs().
9a51102 webm2pes: fflush() in the correct conversion function.
dc7f155 webm2pes: Track total bytes written.
d518128 webm_parser: Enable usage of werror.
e1fe762 webm2pes: Add test for mux/demux of large input.
1b24a79 vpxpes_parser: Read and store PTS when present.
6cf0a0f vpxpes_parser: Store frame payloads.
25d2602 webm_parser: Convert style to match the rest of libwebm
24be76d webm2pes: Replace VpxFrame with VideoFrame.
b451c3b Add a basic video frame storage class.
05c90eb libwebm_util: Clarify error text in superframe parser.
e6415af webm2pes: Make WritePesPacket() a public method.
8f840dd webm2pes: Move frame read out of PES packet write method.
448af97 webm2pes: Restore frame fragmentation support.
f8bb714 cmake: Integrate new parsing API and tests.
cb8ce0b Add a new incremental parsing API
900d322 vpxpes_parser/webm2pes: BCMV and PTS fixes.
4b73545 webm2pes: Add start code emulation prevention.
82903f3 Add column tiles and frame parallel to webm_info
5d91edf style_clean_up: Remove unnecessary parentheses
a95aa4b vp9_level_stats: correct total_uncompressed_bits_ calculation
f46566f mkvreader: Fix shorten-64-to-32 warning in 32 bit builds.
76630ca mkvwriter: Fix shorten-64-to-32 warning in 32 bit builds.
a8ffbd4 webm2pes: Fix format specifier warnings.
faf89d4 Add MaxLumaSampleRate grace percent to stats.
d31e6c9 Fix profile 2 in vp9_header_parser.
bd3ab3a Add flag to estimate last frame's duration to stats.
c182ed9 Fix lint issue in hdr_util.h
cc62ecd Add test for Cluster memory leak
196708a Change MaxLumaSampleRate to be based on frame resolution.
cbd676b mkvmuxer: Fix leak when a Cluster isn't finalized
47f2843 Add parsing support for new features in CodecPrivate.
9a235e0 mkvmuxer: Set doctype to matroska when muxing non-WebM codecs.
e3c9576 Add VP9 level output to webm_info.
bbaaf2d Add class to gather VP9 level stats.
5cf549f cmake: Log compiler flag at check time.
8bb68c2 Add file to parse data from VP9 frames.
df3412f Add support for setting VP9 profile and level to sample_muxer.
296429a Add support to parse VP9 profile.
87832d4 mkvmuxer: Fix Segment::Finalize in kLive mode
6df3e56 mkvmuxerutil.hpp: Add using directives for overloaded size utils.
ec47928 mkvmuxerutil: Revert to using mkvmuxertypes.
4e3d037 Add support to output Colour elements to webm_info.
a1dc4f2 Fix parsing of VP9 level.
039df94 Add TEST_TMPDIR environment variable
d3656fd muxer_tests: ignore iwyu re gtest-message.h
e76dd5e Fix file name in mkvmuxertypes shim.
1be5889 Add temporary include shims at old file locations.
Change-Id: I6a1026814560be80d604a5ecb9b66406a1186dd9
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
BUG=webm:1309
Change-Id: Ia2c982da56697756e12f02643f589189b3271d98
Fix unit_tests_ubsan failure for VP8/DatarateTestLarge.DenoiserOffOn.
Failure was triggered by commit: df66f8e8.
Change-Id: I7cc5bd309e85950cfc5755e01d0eb942d9ca6984
For 1 pass vbr real-time mode:
Allow for the usage of alt-ref frame when non-zero lag-in-frames is used.
Use non-filtered alt-ref, and select usage based on fast scene/content
analysis/detection within the lag of frames.
Positive gains on ytlive set: overall avgPSNR ~3-4%.
Several clips are up between 5-14%, a few clips are neutral/small change.
Current speed decrease is about ~5-10%.
Use the flag USE_ALTREF_FOR_ONE_PASS to enable this feature
(off by default for now).
Change-Id: I802d2bf3d44f9cf01f6d15c76be9c90192314769
Put limit on gf interval based on lag, and allow
for the adjustment on next gf group also on key frame.
Small/neutral change on ytlive metrics.
Change only affects 1 pass vbr real-time mode.
Change-Id: I339c8f4398848698b6e10fe9482c52ca661b94a5
Updated code to process in 8bit as saturation/clipping takes care of
overflow
Removed unused macro
Change-Id: I113df60286fb28b216df800d95b2d3695ef71440
In 1 pass CBR, with error_resilience off, allow for
special logic to change the default gf behaviour.
In this CL: boost is turned off and the gf period
is set to a multiple of cyclic refresh period.
Change only affects 1 pass CBR mode, i.e., when the flag
gf_update_onepass_cbr is set.
Including the previous change (3ec8e11: to allow cyclic refresh
for error_resilience off), comparing metrics on RTC set for
error_resilience off vs on: avgPSNR/SSIM up by ~6%.
Change-Id: Id5b3fb62a4f04de5a805bd1b418f2b349574e0bc
Due to change in command line to sample encoder from:
7eff8f3 Update to vpx_temporal_svc_encoder command line.
This caused the tests in vpx_temporal_svc_encoder.sh to fail.
Change-Id: Ic667da81955ad117d04610af21877fed1d4f188f
these are compatible as they only load one element of the input so the
larger size of tran_low_t makes no difference in little endian builds.
note the asm is incompatible with big-endian, but there are other points of
failure there so currently it's considered unsupported.
BUG=webm:1294
Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028
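The endianness point can be checked with a short sketch. It assumes tran_low_t is a 32-bit integer in high-bitdepth builds and models a one-element 16-bit load as reading the first two bytes; the helper name is hypothetical.

```python
import struct


def load_low_s16(value, byte_order):
    """Pack value as a 32-bit int, then read only its first two bytes as int16.

    byte_order is "<" for little endian or ">" for big endian.
    """
    raw = struct.pack(byte_order + "i", value)
    return struct.unpack(byte_order + "h", raw[:2])[0]
```

On little endian the first two bytes hold the low half, so values that fit in 16 bits load unchanged; on big endian the same read returns the high half instead.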
It only handles the realloc constraint (preserving low elements) by
serendipity, and we don't actually rely on that behavior anyway.
Meanwhile the calls may do extra copying that gets immediately clobbered
by the callers.
Change-Id: I8dfa89e4a81084b084889c27bd272fdf85184e8d
vp8_short_inv_walsh4x4_msa - Optimized to process in short vector type
Updated below functions to store exact number of bytes in output rather than complete vector
idct4x4_addblk_msa
idct4x4_addconst_msa
dequant_idct4x4_addblk_msa
dequant_idct4x4_addblk_2x_msa
dequant_idct_addconst_2x_msa
Change-Id: Ic1b3752e2421dc7d70a082dcdaab9d140d7e5d9c
cyclic_refresh was tied to error_resilience mode.
Allow it to be on also for 1 pass CBR mode even if
error_resilience is off.
Another option is to use a new control for this, but prefer to avoid
that for now.
Change-Id: I3625b292ee059a890e31338b514e211bf0ab5c3e
This change makes the highbd txfm input range check more comprehensive.
The 25-bit highbd input range is composed of
12 signal input bits + 7 bits for 2D forward transform amplification + 5 bits for
1D inverse transform amplification + 1 bit for contingency in rounding and quantizing
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=651625
Change-Id: I04c0796edd7653f8d463fba5dc418132986131e7
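The bit budget above adds up as follows. Treating the 25 bits as a bound on the magnitude of each input coefficient is an assumption of this sketch, and the constant and function names are hypothetical.

```python
SIGNAL_BITS = 12        # signal input bits
FORWARD_2D_BITS = 7     # 2D forward transform amplification
INVERSE_1D_BITS = 5     # 1D inverse transform amplification
CONTINGENCY_BITS = 1    # rounding and quantizing contingency

TOTAL_BITS = SIGNAL_BITS + FORWARD_2D_BITS + INVERSE_1D_BITS + CONTINGENCY_BITS


def has_invalid_highbd_input(coeffs, bits=TOTAL_BITS):
    """True if any coefficient magnitude reaches 2**bits (assumed check)."""
    limit = 1 << bits
    return any(abs(c) >= limit for c in coeffs)
```

A block is flagged as soon as one coefficient falls outside the budgeted range.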
These caused the following warning with GCC 5:
warning: logical not is only applied to the left hand side of
comparison [-Wlogical-not-parentheses]
assert(!is_compound == (cm->reference_mode == SINGLE_REFERENCE));
Change-Id: If296aabb2311ceb7d903b395c1549ef81c2cbf9b
(cherry picked from commit c6cf7a6111)
these functions are incompatible currently and unreferenced in rtcd,
exclude them from the build.
BUG=webm:1294
Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().
Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
This patch sets the 16x16 src_diff to zero and ensures correct calculation
of this_error for block sizes smaller than 16x16.
Change-Id: I7b7c02d267433c9f22c8ac9b8d5df2f499175172
change_config() may be called often in real-time application,
to update bitrate/framerate or qp-max/min.
No need to do update_frame_size() unless frame size has changed.
Change-Id: I23a51deade1e03adc91c468f9ffde3235298770c
For real-time mode: turn off MINIMAL_LF at speed 8,
for non-screen content mode.
Visually better, avgPSNR/SSIM on rtc set go up by ~4-5%.
Speed decrease of about ~3%.
Change-Id: I8eb69330f02e0ceece1507d43cfc8a049a1d8291
to get_binary_prob(). the only other caller mode_mv_merge_probs() does
its own test on 0.
BUG=chromium:639712
Change-Id: I1178688706baeca2883f7aadbc254abb219a44ce
+ inline the function directly as there was only one consumer
(get_prob())
this is an attempt to reduce the amount of branches to workaround an amd
bug. this change is mildly faster or neutral across x86-64, arm.
http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf
665 Integer Divide Instruction May Cause Unpredictable Behavior
BUG=chromium:639712
Suggested-by: Pascal Massimino <pascal.massimino@gmail.com>
Change-Id: Ia91823aded79aab469dd68095d44300e8df04ed2
Inline function called from test/dct16x16_test.cc wouldn't build due to:
invalid operands of types ‘__gnu_cxx::__enable_if<true, double>::__type
{aka double}’ and ‘int’ to binary ‘operator>>’
return (abs(ref->row) >> 3) < COMPANDED_MVREF_THRESH &&
this converts the test to abs() < COMPANDED_MVREF_THRESH << 3 which
hides the promotion issue.
Regression from commit de993a847f
BUG=webm:1291
Change-Id: I73b5943d07d5b61b709d299114216a2371a8fd62
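For non-negative integers the two forms of the comparison agree, which a short sketch can confirm. The function names are hypothetical and the threshold is passed in rather than taken from the library.

```python
def near_zero_shifted(component, thresh):
    # Original form: shift the magnitude down, then compare.
    return (abs(component) >> 3) < thresh


def near_zero_scaled(component, thresh):
    # Rewritten form: compare the magnitude against the pre-shifted threshold.
    return abs(component) < (thresh << 3)
```

Since floor(a / 8) < t holds exactly when a < 8 * t for integers, moving the shift to the constant side does not change the result.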
provides msvc-like warnings for implicit conversions from 64-bit to
32-bit types
--enable-vp9-highbitdepth still requires some work
this also skips CXXFLAGS for now as some work would be needed to cleanup
third_party/*.cc or split it from test/*.cc where it comes to flags.
Change-Id: Ic9a095b73286eba5ed39bfc27ff69593748cbbf4
The original commit never set any 'specialize' line:
61311e6103
It appears the sadx4 version of the function uses sdx4df calls to speed up
the search. There are no sse3 versions of the sdx4df functions, but
there are sse2 and msa versions.
There is a neon version of vpx_sad16x16x4d but not any of the smaller
versions. Perhaps if they existed this function could be expanded to use
them.
Change-Id: I936d7d6b1a3ff6dcd5a4d2322272708c47cdec13
* changes:
Expand -Wextra to more of the library
mips: clean up wextra warnings
Add compiler flag -Wsign-compare
Add compiler warning flag -Wextra and fix related warnings.
Remove unused zbin variable:
warning: unused parameter ‘zbin’
Use int for loop variables to avoid unsigned conversion:
warning: comparison between signed and unsigned integer expressions
Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad
Also, fix the warnings generated by this flag.
(cherry picked from commit ebeb1155d4fa6d28e2f40c92265245f8df097fcb)
From AOM. Don't actually add -Wsign-compare. It will be covered by
-Wextra.
Switch to vpx_integer.h from df9c9d6d4c43f02c58d4e776c53323788e013cbc
BUG=webm:1069
Change-Id: I1dc6e61caa5d56af4a55b6692ab620bb3144652a
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.
Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16
Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h
Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.
Does not enable -Wextra yet. There are more issues to fix.
BUG=webm:1069
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
Current version does not build with options:
--enable-vp9-highbitdepth --enable-coefficient-range-checking
Change-Id: Ic3285f1a3e0d6be88da7f2cd8fa5a631368dd03b
Reduce the filt_guess for 1 pass cbr on inter-frames.
This reduces visual artifact seen in rtc clip (jimred.vga),
and improves metrics on rtc set.
Metrics on rtc set for cbr mode overall positive, most clips are up:
Speed 7 rtc: avgPSNR/SSIM up by: ~2.6/3.9%
Speed 8 rtc: avgPSNR/SSIM up by: ~1.3/2.5%
Change-Id: Ia4eccea1c19d65b583516df28823cd756c49464f
Added a cap on the maximum boost for an arf based on interval length.
Fixed a bug whereby the image size was not accounted for in determining
two of the motion breakout thresholds.
Overall small gains of 0.2-0.4% psnr but on large image format clips with
slow zooms the gain may be as much as 20% or more (e.g. in_to_tree
at 1080P)
Change-Id: Id0a47391203026742daa9c97afac5705fd8c4dfb
The value 35468 changes sign when stored in int16_t:
implicit conversion from 'int' to 'int16_t' (aka 'short')
changes value from 35468 to -30068
This negation requires adding back the original value to compensate.
Shifting the value keeps the value positive and saves a post-vqdmulh
shift.
This technique is used in webp and idct_dequant_full_2x_neon
BUG=b/28027557
Change-Id: I0c5ce09bea170fe08061856c2af6f841a557e0c3
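A scalar sketch of the constant handling described above. qdmulh_s16()
models one lane of a saturating doubling multiply returning the high half;
the helper names are hypothetical and the real code uses NEON registers.
Negative right shifts are assumed to be arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane: saturate(2 * a * b) >> 16. */
int16_t qdmulh_s16(int16_t a, int16_t b) {
  int64_t doubled = 2 * (int64_t)a * (int64_t)b;
  if (doubled > INT32_MAX) doubled = INT32_MAX; /* a == b == -32768 only */
  return (int16_t)(doubled >> 16);
}

/* Old approach: 35468 wraps to -30068 in int16_t, so the product is off by
 * a multiple of the input. Shift right once, then add the input back. */
int16_t mul_35468_wrapped(int16_t a) {
  const int16_t k = (int16_t)(35468 - 65536); /* -30068 */
  return (int16_t)((qdmulh_s16(a, k) >> 1) + a);
}

/* New approach: halve the constant so it stays positive. The doubling in
 * the multiply restores the scale, so no post-shift or add is needed. */
int16_t mul_35468_halved(int16_t a) {
  const int16_t k = 35468 >> 1; /* 17734 */
  return qdmulh_s16(a, k);
}
```

Both forms compute floor(a * 35468 / 65536); the second does it in a single
multiply.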
This restores d9dce2f48e
Switched to using signed shift-and-narrow. Instead of saturating
negative results to 0, it was saturating them to 255.
BUG=webm:817
BUG=webm:1273
Change-Id: I571095336aa4182e3288b17924fcaaece42b0a49
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings
Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
Do nothing in vp9_highbd_iht#x#_##_add_c when input magnitude is beyond
20 bits. Note that the sign bit is not included here.
In the 20 bits, we use 12 bits for input signal, 7 bits for forward
transform amplification, and 1 bit for contingency in rounding and
quantizing
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286
Change-Id: I332c6f68df4614fc2e7d2dc4c5bb0d0cff8a245c
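The 20-bit budget above (12 + 7 + 1) can be sketched as a range check run
before the inverse transform. The function and macro names are hypothetical;
the commit only states that nothing is done when the magnitude is beyond 20
bits.

```c
#include <assert.h>
#include <stdint.h>

#define INPUT_SIGNAL_BITS 12   /* input signal */
#define FWD_TX_GAIN_BITS 7     /* forward transform amplification */
#define ROUNDING_MARGIN_BITS 1 /* contingency for rounding and quantizing */
#define MAX_MAGNITUDE_BITS \
  (INPUT_SIGNAL_BITS + FWD_TX_GAIN_BITS + ROUNDING_MARGIN_BITS)

/* Returns 1 when every coefficient magnitude fits in 20 bits (sign bit
 * excluded), 0 otherwise. A caller would skip the transform on 0. */
int coeffs_within_range(const int32_t *coeffs, int count) {
  const int64_t limit = (int64_t)1 << MAX_MAGNITUDE_BITS;
  assert(count >= 0);
  for (int i = 0; i < count; ++i) {
    const int64_t v = coeffs[i];
    const int64_t mag = v < 0 ? -v : v;
    if (mag >= limit) return 0;
  }
  return 1;
}
```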
When filtering it needs 6 pixels: 2 prior to the source, the source, and
3 after the source.
When filtering 16 wide, that means 21. To accomplish this the SSE2 reads
[-2] to [5], [6] to [13], and [14] to [21], a total of 24 bytes (reading
in groups of 8 is easy)
The filter then shifts this last set to the top half of the register and
uses 'or' to combine it with the previous set.
Valgrind detected an issue reading pixels [19], [20] and [21]:
Address 0x7f581c2 is 434 bytes inside a block of size 441 alloc'd
Note: we only need pixels [16], [17], and [18] as context for [15].
To fix this, it now reads 8 bytes starting at [11], which re-loads [11]
through [13], but stops at [18] and does not over-read any values.
This is shifted by 5 and 'or'd with xmm1. Although the lower bits are
not cleared, they overlap directly with [11] through [13], so 'or'
produces the correct results.
Change-Id: I0c89c03afa660fc9b0108ac055d7bd403e493320
On 32 bit machines 'new' does not always appear to allocate sufficiently
aligned buffers, causing intermittent test failures.
Change-Id: I0db4fc73782012e4eef71dc0fb540e74fdbfcebe
the --enable-postproc-visualizer configure option remains as a no-op as
do the control names and values for compatibility
+ remove the corresponding debug flags from vpxdec: --pp-*
Change-Id: I4a001cd9962b59560d7d6bda6272d4ff32b8d37c
similar to changes that were done in vp9 for encoded frame size
reporting. has the side-effect of quieting a -Wshorten-64-to-32 warning.
Change-Id: I89f74cb617fc29334ee351dc8dfaa3b8cfd4e5af
The referenced bug was fixed by saving neon registers. That this had any
effect was coincidental.
Both chromium and Android build with clang and neither uses this flag.
Change-Id: I470247d6fd9226fc207b42a187105581a94badc3
The vp9_mv_class0_tree is a balanced tree with two leafs and can
simply be coded as a boolean with probability class0[0].
Change-Id: If294dac825a5f945371092c74aa8e3f84cd962b6
(cherry picked from commit be8a8ab62ebdd111c6f2e9a33b15630570671eba)
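A sketch of why a two-leaf tree collapses to a single boolean. The tree
layout follows the usual convention where non-positive entries are leaves
holding the negated symbol; the bit array stands in for the arithmetic
decoder, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t tree_index;

/* Two leaves: symbol 0 on bit 0, symbol 1 on bit 1. */
const tree_index class0_tree_sketch[2] = { -0, -1 };

/* Generic tree walk: consume bits until a leaf (value <= 0) is reached. */
int read_tree_sketch(const tree_index *tree, const int *bits, int *pos) {
  tree_index i = 0;
  assert(tree != 0 && bits != 0 && pos != 0);
  while ((i = tree[i + bits[(*pos)++]]) > 0) {
  }
  return -i;
}

/* With two leaves the walk always consumes exactly one bit, and the
 * decoded symbol is that bit. */
int read_class0_as_bool(const int *bits, int *pos) { return bits[(*pos)++]; }
```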
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.
BUG=b/30970831
Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
The code only has issues when xoffset == 0 and yoffset == 0 which
represents a simple copy. Presumably this case does not need to be
handled because the issue has existed since 2010.
BUG=webm:1287
Change-Id: Ic47e2653f3b729e99b40e53d8d2d8d1501edaaa9
Build out the sixtap_predict test because the filters are
interchangeable. Add verbose failures and border checking.
Change-Id: I962f50041750dca6f8d0cd35a943424cf82ddcb1
This reverts commit d9dce2f48e.
Appears to be failing the SixtapPredict tests in some configurations and possibly test vectors as well.
Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.
It is still ~5x faster than C in the unaligned case and doing both
filters.
BUG=webm:892
BUG=webm:1273
Change-Id: Icf7167189391b46202f47233bb585c24c42bcc36
postproc.c is overloaded and used for both postproc and internal stats.
If only --enable-internal-stats is specified there are issues with
non-existent struct members and unused functions.
Change-Id: I82367f1ffce659c3918c9f964dbce94a716fbb89
All the other tests which do not use 'pass' (which appears to be almost
all of them) do this.
Cleans -Wextra/-Wunused-parameter:
unused parameter ‘pass’
Change-Id: I1ff3acf3f3d1e831f94dcb00ea36337afe0aefe0
Remove the experiment LIMIT_QP_ONEPASS_VBR_LAG, as it is
not currently used and there is no plan to use it in the near future.
Change-Id: Ib069f8d7225195be04b765d0ab477510dfba6a3b
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.
The store, when unaligned, has a version that is ~25% slower but safe
when xoffset = 0 (second pass filter only). When the first pass filter
(or both) are in play, the new version is almost identical in speed.
Worst case performance (both filters, unaligned stores) is roughly 3-4x
faster than C.
BUG=webm:817
BUG=webm:1273
Change-Id: I1e490e94453e0872151fe0dafb05557463f6247d
Use the canonical 'vpx_codec_dec_cfg_t()' as opposed to 'vp9_zero()'
which just hammered everything to 0.
Change-Id: Id820efef700ad92a625797f8fd58e465b15eeca4
For some reason allocated_decoding_thread_count is signed, but decoding_thread_count is not.
Cleans -Wextra/-Wsign-compare:
comparison between signed and unsigned integer expressions
Change-Id: Id0ada78100acff27c1c4ed7493c563d13c55cdcd
Use vp9_zero() to set every element.
Cleans -Wextra/-Wmissing-field-initializers:
missing initializer for member ‘vpx_codec_dec_cfg::w’
missing initializer for member ‘vpx_codec_dec_cfg::h’
Change-Id: I5b41ce7d55a912e29b1d4c3e840cea80e8510fbe
This change was reverted before due to a hangouts encode-time
regression investigation. But since then this change has been
cleared of causing any noticeable regression.
This mode reduces some false detection, and uses the
same model as in vp9.
Change-Id: I9c82a748c5f601d0aca9f61ee218abfbd58c62bd
* changes:
Enable -Wundef by default
Define VP8_TEMPORAL_ALT_REF to !CONFIG_REALTIME_ONLY
Remove CONFIG_DEBUG guards from assert()
Remove unused function vpx_de_mblock
Fix -Wundef warning for OUTPUT_FPF
Fix -Wundef warning for __SANITIZE_ADDRESS__
Added casts to remove warnings:
BUG=webm:1274
In regards to the safety of these casts they are of two types:-
- Normalized bits per (16x16) MB stored in a 32 bit int. This is safe as bits
per MB even with << 9 normalization can't overflow 32 bits. Even a raw 12
bit hdr source would only be 29 bits (4+4+12+9), and the encoder
imposes much stricter limits than this on max bit rate.
- Cast as part of variance calculations. There is an internal cast up to 64 bit
for the Sum X Sum calculation, but after normalization dividing by the number
of points the result will always be <= the SSE value.
Change-Id: I4e700236ed83d6b2b1955e92e84c3b1978b9eaa0
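The second cast case above can be sketched as follows: Sum x Sum is formed
in 64 bits, and after dividing by the number of points the result never
exceeds the SSE, so narrowing the difference is safe. The function name is
hypothetical and the block is assumed small enough (for example 64x64) for
the SSE to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Variance of src - ref over n points; also reports the SSE. */
uint32_t block_variance_sketch(const uint8_t *src, const uint8_t *ref, int n,
                               uint32_t *sse_out) {
  int64_t sum = 0;
  uint64_t sse = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    const int diff = src[i] - ref[i];
    sum += diff;
    sse += (uint64_t)(diff * diff);
  }
  *sse_out = (uint32_t)sse;
  /* sum * sum is computed in 64 bits; (sum * sum) / n <= sse, so the
   * subtraction cannot go negative and the cast down is safe. */
  return (uint32_t)(sse - (uint64_t)((sum * sum) / n));
}
```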
Previously VP8_TEMPORAL_ALT_REF was only defined for non-realtime-only
builds. However, its value was checked with #if, not #ifdef.
Fixes -Wundef warnings.
BUG=webm:1069
Change-Id: If78d8731298f3f0d3662ffa25f973e7adaf67152
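A small sketch of the pattern above: when a macro is tested with #if rather
than #ifdef, defining it to 0 or 1 in every configuration keeps -Wundef
quiet. The fallback definition of CONFIG_REALTIME_ONLY and the helper
functions are assumptions so that the fragment builds on its own.

```c
#include <assert.h>

#ifndef CONFIG_REALTIME_ONLY
#define CONFIG_REALTIME_ONLY 0 /* assumed build setting for this sketch */
#endif

/* Always defined, so `#if VP8_TEMPORAL_ALT_REF` never reads an undefined
 * macro regardless of the build configuration. */
#define VP8_TEMPORAL_ALT_REF (!CONFIG_REALTIME_ONLY)

int temporal_alt_ref_enabled(void) {
#if VP8_TEMPORAL_ALT_REF
  return 1;
#else
  return 0;
#endif
}

/* Self-check: the feature is on exactly when the build is not
 * realtime-only. */
void check_temporal_alt_ref_config(void) {
  assert(temporal_alt_ref_enabled() == !CONFIG_REALTIME_ONLY);
}
```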
Using a tighter resize constraint on undershoot seems to help
results (especially SSIM) as significant undershoot on a frame
seems to have more of a damaging impact than overshoot.
This patch has been tuned so that in local testing using the
derf set it is encode speed neutral for speed setting 2.
Average quality result for speed 2 (psnr,ssim) were as follows:-
lowres 0.039, 0.453
midres 0.249, 0.853
hdres 0.159, 0.659
NetFlix -0.241, 0.360
Change-Id: Ie8d3a0d7d6f7ea89d9965d1821be17f8bda85062
Fixes windows build issue:
==> tests::VS10_x64 is broken
LINK : warning C4742: 'kYvuI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : warning C4744: 'kYvuI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : warning C4742: 'kYuvI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : warning C4744: 'kYuvI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : warning C4742: 'kYvuI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxenc.vcxproj]
LINK : warning C4744: 'kYvuI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxenc.vcxproj]
LINK : warning C4742: 'kYuvI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxenc.vcxproj]
LINK : warning C4744: 'kYuvI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxenc.vcxproj]
LINK : error C2220: warning treated as error - no 'executable' file generated [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : error C2220: warning treated as error - no 'executable' file generated [.build-x86_64-win64-vs10\vpxenc.vcxproj]
Change-Id: Ic3c4fff9209f5a52ff8f8ff321548d49ba09ec06
removes the need for an intermediate cast to int, which was missing in
the call added in:
69c5ba1 vpx_mem: Refactor code
quiets a visual studio warning:
C4146: unary minus operator applied to unsigned type, result still
unsigned
Change-Id: I76c4003416759c6c76b78f74de7c0d2ba5071216
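C4146 fires when unary minus is applied to an unsigned operand, as in an
alignment mask written as -align. A sketch of one way to build the same mask
without negation or an intermediate cast to int follows; the function name
is hypothetical and the log does not show the actual macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds addr up to a multiple of align (a power of two). The mask is
 * ~(align - 1), which equals -align in two's complement but never negates
 * an unsigned value. */
uintptr_t align_up_sketch(uintptr_t addr, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}
```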
This patch is to address concerns that changes to allow
recodes on the first frame in each ARF group do not give a
good enough speed quality trade off for speed 2. Though the
average impact on encode speed is 1-2%, for some hard clips
it is > 5% rise. For speed 1 this is less an issue and for Speed 0
the previous patch actually improves speed.
Change-Id: Ie1bcefdbfdf846d3f4428590173f621465dffe3a
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.
The corresponding fix for vp10 is in
Ica3d625d6672b3c47e0e208b45eede29b9004030.
Change-Id: Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1
Current commit is just an API template for the rest of the code, and
I will add inner logic later.
Altref frames generate a lot of bitrate and at the same time
other frames refer to them a lot, so it makes sense to apply a
special compensation-based adaptive quantization scheme for altref
frames. E.g., for blocks that are good predictors for the future,
apply the rate-control chosen quantizer, while for bad predictors apply
a worse one.
Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
Previously Tx domain rd was used in all cases above speed 0.
Coefficient optimization was only enabled for best and speed 0.
This patch selectively sets these features at other speed settings
based on block complexity.
For the Netflix and HD sets in particular the quality gains are
large compared to the speed hit. At speed 1 the average psnr
gain in the NF set is > 2.5% with one clip coming in at 18%
and some points almost 30%. Average gains for the lower
resolution test sets are around 1%.
The gains are biggest at low Q so some further optimization
may be possible.
Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576
fixes SSE2/AddNoiseTest.CheckCvsAssembly/0 with -funsigned-char.
visibly broken since:
0dc69c7 postproc : fix function parameters for noise functions.
where the types diverged (char vs. int8)
but likely the return changed in:
2ca24b0 postproc - move filling of noise buffer to vpx_dsp.
when multiple implementations were merged.
Change-Id: I176ca1f170217f05ba7872b0c4de63e41949e999
set a max allocable size to prevent overflows in 32-bit and extremely
large allocation attempts in 64-bit. this could be amended to allow size
or num parameters to be 64-bits with the correct size being used at each
call site.
BUG=webm:819
Change-Id: Ia81004d6c4279680714c4488b4f6cf287ab396a5
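A sketch of the kind of guard described above: reject num * size requests
whose product would exceed a fixed cap, checking by division so the
multiplication itself cannot overflow. The cap value and all names are
hypothetical; the log does not state the limit that was chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap; small enough to fit in a 32-bit size_t. */
#define MAX_ALLOCABLE_BYTES_SKETCH ((uint64_t)1 << 30)

/* Returns 1 if nmemb * size stays within the cap, 0 otherwise. */
int alloc_size_ok(uint64_t nmemb, uint64_t size) {
  if (nmemb == 0 || size == 0) return 1;
  return size <= MAX_ALLOCABLE_BYTES_SKETCH / nmemb;
}

/* calloc wrapper that refuses oversized or overflowing requests. */
void *checked_calloc_sketch(size_t nmemb, size_t size) {
  if (!alloc_size_ok(nmemb, size)) return NULL;
  return calloc(nmemb, size);
}
```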
vpx_realloc was allocating 1 byte more than needed every time.
Fixed this, and took this opportunity to do a small refactoring.
Change-Id: I38fcb62b698894acbbab43466c1decd12f906789
(cherry picked from aom: 2a876b4 aom_realloc correction.)
In the future this option will activate adaptive quantization specifically
for altref frames. The encoder will create the adaptive quantization map
based on the similarity of the lookahead buffers, which is an estimate of
future motion compensation performance.
Change-Id: Ia0088b3babb0f9a4899c79d8d819947ba5a03df2
This function only exists as a shortcut to subpixel variance with
predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical
and both for "hv"
Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.
BUG=webm:1273
Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
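A sketch of why the half-pixel shortcut is redundant: with eighth-pel
offsets and a 2-tap bilinear filter whose taps sum to 128, offset 4 weights
both pixels by 64, which is a rounded average. The tap formula and function
names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS_SKETCH 7 /* taps sum to 1 << 7 = 128 */

/* 2-tap bilinear interpolation between a and b at offset/8. */
uint8_t bilinear_tap_sketch(uint8_t a, uint8_t b, int offset) {
  assert(offset >= 0 && offset < 8);
  const int f1 = 16 * offset;
  const int f0 = 128 - f1;
  return (uint8_t)((a * f0 + b * f1 + 64) >> FILTER_BITS_SKETCH);
}

/* Dedicated half-pixel form: rounded average of two pixels. */
uint8_t half_pixel_avg_sketch(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}
```

Since the two agree for every input pair, the general subpixel variance
path with offset 4 can stand in for the dedicated half-pixel functions.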
decoding the same invalid keyframe twice would result in a crash as the
second time through the decoder would be assumed to have been
initialized as there was no resolution change. in this case the
resolution was itself invalid (0x6), but vp8_peek_si() was only failing
in the case of 0x0.
invalid-vp80-00-comprehensive-018.ivf.2kf_0x6.ivf tests this case by
duplicating the first keyframe and additionally adds a valid one to
ensure decoding can resume without error.
BUG=b/30593765
Change-Id: If0859035908b7870d67a7f3f646b5a080252eb6d
Disabled the split mode while encoding 4k video to speed
up the encoder.
Borg test result on 4k set:
Overall PSNR: +0.029%; SSIM: +0.009%.
Average encoder speedup at speed 2 is 2.5%.
Change-Id: I1519c658f07c3ac838affbe5aff0ed9b94f3f8f4
Adjusted speed 2 features to speed up 4k video encoding.
BDBR results from borg test:
PSNR: +0.313%; SSIM: +0.268%.
Average speedup: 8.5%
Change-Id: I1e2695a01fb3f3817c1df4480e184c2aed8f2eba
this fixes a crash in vp9_dec_setup_mi() via
vp9_init_context_buffers() should decoding continue and the decoder
resyncs on a smaller frame
BUG=b/30593752
Change-Id: I9ce8d94abe89bcd058697e8bd8599690e61bd380
Bias towards base_mv and skip 1/4 pixel motion search when using base mv.
2~3% speed up for 2 spatial layers, 3~5% speed up for 3 spatial layers.
PSNR loss:
(2 layers) 0.07dB for gips_stationary, 0.04dB for gips_motion;
(3 layers) 0.07dB for gips_stationary, 0.06dB for gips_motion.
Change-Id: I773acbda080c301cabe8cd259f842bcc5b8bc999
Add option, for newmv-last, to limit the rd-threshold update for early exit,
under a source variance condition.
This can improve visual quality in low texture moving areas,
like forehead/faces.
Also add bias against golden to improve the speed/fps,
with little/negligible loss in quality.
Only affects CBR mode, non-svc, non-screen-content.
Change-Id: I3a5229eee860c71499a6fd464c450b167b07534d
The flag was added because Apple clang and Chromium clang disagreed
for certain versions of instructions.
qsubaddx, qaddsubx, ldrneb and ldrneh were used in armv6 assembly
which was removed in d55724fae9
vqshrun was used in some neon assembly but superseded by
dcbfacbb98
.include was used for obj_int_extract/asm_offsets and removed in
6eec73a747
Change-Id: I32f4c9b536d0318482101c0b8e91e42b8f545f18
Changes the default recode rule for Speed 0 and best quality
from ALLOW_RECODE to ALLOW_RECODE_KFARFGF.
Tested on the NF, hdres, midres and lowres test sets, this setting
when combined with patch I40cb559... now performs "as well" in
metrics terms (in fact it came out a tiny amount better overall)
but encode time is 9.6% faster (measured as the average
from 27 mid rate local encodes on clips in the derf/lowres set).
Change-Id: I8c781c0cdfa3a9929cd9406d15582fce47d6ae3b
Allow recodes for the first inter frame in each arf group
even when the recode rule is set to ALLOW_RECODE_KFARFGF.
Small gains of 0.05%.
Change-Id: I40cb559d36a2bf0ebf5cf758c3f92e452b480577
disable clang-format for bilinear_filters_avx2
restores the row layout prior to:
099bd7f vpx_dsp: apply clang-format
but keeps the justification used by clang-format
Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7
This patch fixed a motion vector out of range bug:
vpxenc: ../libvpx/vp9/encoder/vp9_mcomp.c:69:
mv_cost: Assertion `mv->col >= -((1 << (11 + 1 + 2)) - 1) &&
mv->col < ((1 << (11 + 1 + 2)) - 1)' failed.
For blocks that returned without having full-pixel search, the original
MV limits were not restored, which caused the failure. Moved the set
MV limit function down to fix the bug.
Change-Id: Id7d798fc7214e95c6e4846c588f0233fcf1a4223
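A sketch of the fix described above: the limits are narrowed only after the
early-return check, so a block that skips the full-pixel search never leaves
modified limits behind. The struct, the halving of the limits, and the
function names are hypothetical; only the asserted bound comes from the log.

```c
#include <assert.h>

/* Bound taken from the failing assertion: (1 << (11 + 1 + 2)) - 1. */
#define MV_BOUND_SKETCH ((1 << (11 + 1 + 2)) - 1)

typedef struct {
  int col_min, col_max, row_min, row_max;
} MvLimitsSketch;

int mv_col_in_range(int col) {
  return col >= -MV_BOUND_SKETCH && col < MV_BOUND_SKETCH;
}

/* Returns 1 if the full-pixel search ran, 0 if the block returned early. */
int search_block_sketch(MvLimitsSketch *limits, int skip_full_pixel_search) {
  assert(limits != 0);
  /* Early exit happens before the limits are touched. */
  if (skip_full_pixel_search) return 0;

  const MvLimitsSketch saved = *limits;
  /* Narrow the limits for this block (illustrative adjustment). */
  limits->col_min /= 2;
  limits->col_max /= 2;
  limits->row_min /= 2;
  limits->row_max /= 2;

  /* ... the full-pixel search would run here ... */

  *limits = saved; /* restore the original limits */
  return 1;
}
```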
_beginthreadex does not align the stack on 16-byte boundary as expected
by gcc.
On x86 targets, the force_align_arg_pointer attribute may be applied to
individual function definitions, generating an alternate prologue and
epilogue that realigns the run-time stack if necessary. This supports
mixing legacy codes that run with a 4-byte aligned stack with modern
codes that keep a 16-byte stack for SSE compatibility.
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
Change-Id: Ie4e4ab32948c238fa87054d5664189972ca6708e
Signed-off-by: Aleksey Vasenev <margtu-fivt@ya.ru>
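A sketch of applying the attribute described above to a thread entry point.
The macro and function names are hypothetical, and the attribute is guarded
so it is only used with GCC-compatible compilers targeting 32-bit x86.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && defined(__i386__)
/* Realign the stack on entry; callers may only guarantee 4 bytes. */
#define THREAD_ENTRY_ALIGN __attribute__((force_align_arg_pointer))
#else
#define THREAD_ENTRY_ALIGN
#endif

/* Thread entry sketch: writes 1 to *arg if a 16-byte aligned local really
 * is 16-byte aligned, as SSE code would require. */
THREAD_ENTRY_ALIGN unsigned int thread_entry_sketch(void *arg) {
  _Alignas(16) unsigned char scratch[16] = { 0 };
  unsigned int *aligned_out = (unsigned int *)arg;
  assert(aligned_out != 0);
  *aligned_out = (unsigned int)((((uintptr_t)scratch) & 15u) == 0);
  return 0;
}
```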
prevents use of an uninitialized value in the destructor should the
test fail before tmpfile_ is set.
Change-Id: I8b49fd05f0d05e055fdf653bd46983d30f466a68
applied against a x86_64 configure with and without
--enable-vp9-highbitdepth
clang-tidy-3.7.1 \
-checks='-*,google-readability-braces-around-statements' \
-header-filter='.*' -fix
+ clang-format afterward
Change-Id: Ia2993ec64cf1eb3505d3bfb39068d9e44cfbce8d
Extract the duplicated data generation code in OperationCheck() of
Loop8Test6Param and Loop8Test9Param, and put in function InitInput().
Change-Id: Ied39ba4ee86b50501cc5d10ebf54f5333c4708f0
This patch fixed a motion vector (MV) out of range bug, which was caused
by not restoring the original values of the MV min/max thresholds after
the sub8x8 full pixel motion search. It occurred rarely and was only seen
while encoding a 4k clip for 200 frames.
BUG=webm:1271
Change-Id: Ibc4e0de80846f297431923cef8a0c80fe8dcc6a5
* changes:
Use common transpose for vpx_idct32x32_1024_add_neon
Use common transpose for vpx_idct8x8_[12|64]_add_neon
Use common transpose for vp9_iht8x8_add_neon
Use common transpose for vpx_idct16x16_[10|256]_add_neon
The code was expanding to Q registers so that vqrshn could be used, for
vector quad round shift and narrow. If 4 values are added together,
there is a shift by 2. If 8 values, a shift by 3. Since this accounts
for any possibility of overflow, we can skip the narrowing shift.
This allows keeping the values in D registers and casting the 16 bit
value to 8 bits.
Change-Id: I8d9cfa07176271f492c116ffa6a7b351af0b8751
The neon intrinsics are not able to load just the 4 values that are
used. In vpx_dsp/arm/intrapred_neon.c:dc_4x4 it loads 8 values for both
the 'above' and 'left' computations, but only uses the sum of the first
4 values.
BUG=webm:1268
Change-Id: I937113d7e3a21e25bebde3593de0446bf6b0115a
Increase the minimum distance.
Reduces the overshoot somewhat on some clips,
small gain in avgPSNR (~0.1%) on ytlive set.
Change-Id: Id5ddde20c2907dbdb536e79542eff775019c142b
This error was introduced by the patch:
8ce67d7 vp9 svc: Enable different speed setting for each spatial layer.
To use svc, svc_param should be cleared to 0 at the beginning.
Change-Id: I222f03ddae8a50e84b4690b78263abb742fae91e
Move best index into the token state. Shrink it down to one byte. This
is more cache friendly (accesses are grouped together) and uses less total
memory.
Results in 4% fewer cycles in optimize_b().
Change-Id: I75db484fb3dc82f59928d54b659d79c80ee40452
everything outside of third_party should follow 'PointerAlignment:
right' i.e., associate the '*' with the variable
+ add a note about the clang-format that generated this file
Change-Id: I13e3f4f5fb6e22a8fa7fc3d06879c995b7c41a39
- make Check() void as the EXPECT's are sufficient to document failure
cumulatively this has the effect of avoiding reporting incorrect Check()
failures due to earlier test failures.
Change-Id: I2cf775449f18c90c1506b8eadd7067adbc3ea046
The LLVM trunk has reached 4.0 and now __clang_major__ is not enough
to distinguish between old XCode Clang and the new 'real' Clang.
Using __apple_build_version__ makes this distinction possible.
BUG=chromium:631144
Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9
Shutdown all threads before reclaiming any memory. The frame-level
parallel decoder may access data from another worker.
BUG=webm:1259
Change-Id: I26856ebd1f77cc4a4545331baa19bbf3e01c4ea4
1 - stops deallocating before threads are closed.
2 - limits threads to mb_rows when mb_rows < partitions
BUG=webm:851
Change-Id: I7ead53e80cc0f8c2e4c1c53506eff8431de2a37e
This commit changes the call in vp9 encoder from vp9_deblock() to
vp9_post_proc_frame() to ensure the data structures used in the call
are properly allocated. This fixes an encoder crash when configured
with --enable-internal-stats.
Change-Id: I2393b336c0f566665336df4f1ba91c405eb56764
bitstream.c: asserts are disabled when CONFIG_DEBUG is unset
vp8_dx_iface.c: split |s into 2 statements across #if bounds
Change-Id: I307d1e969134db5c9c0edd7690589b6b29116cbd
allows 'make test_libvpx', etc. some reworking of the makefiles would be
needed to avoid hard coding targets here.
Change-Id: I18982dbf691e7d36ab8bcf5934bab9340687b061
remove some (but not all yet!) tuple mis-use, and revamp the code a lot.
Factorize some common chores into MainTestClass.
Change-Id: Id37b7330eebe80d19b9d12a454f24ff9be6b1116
Allow usage of lookahead for VBR in real-time mode, for 1 pass vbr.
Current usage is for fast checking of future scene cuts/changes,
and adjusting rate control (gf interval and active_worst/target size).
Added unittests (datarate) for 1 pass vbr mode, with non-zero lag.
Added an experimental option to limit QP based on lookahead.
Overall positive gain in metrics on ytlive set:
avgPSNR/SSIM up on average ~1-3%; several clips up by 5, 7%.
Change-Id: I960d57dfc89de121c4824b9a9bf88d2814e74b56
Chromium changed the upstream default to --squash but this conflicts
with libvpx historical defaults.
Change-Id: I80f2f2b48e2ba08e02184b50e6d5f8f5e76fec24
remove some (but not all yet!) tuple mis-use, and revamp the code a lot.
Factorize some common chores into MainTestClass.
Change-Id: Ia14f3924140e8545e4f10d0504475681baae8336
* changes:
vpx_thread: use CreateThread for windows phone
vpx_thread: use WaitForSingleObjectEx if available
vpx_thread: use InitializeCriticalSectionEx if available
vpx_thread: use native windows cond var if available
vpx_thread.[hc]: update webp source reference
This change eliminates redundant computation in the two stage
downscaling, which saves ~1% encoding time in 3-layer svc encoding.
Change-Id: Ib4b218811b68499a740af1f9b7b5a5445e28d671
Modify the gfu_boost and af_ratio setting based on the
average frame motion level.
Change only affects 1 pass vbr.
Metrics overall positive on ytlive set.
On average up by ~1%, several clips up by 2-4%.
Change-Id: Ic18c49eb2df74cb4986b63cdb11be36d86ab5e8d
Use the precise context to estimate the zero token cost in trellis
optimization process. This improves the speed 0 coding performance
by 0.15% for lowres and 0.1% for midres. It improves the speed 1
coding performance by 0.2% for midres and hdres.
Change-Id: I59c7c08702fc79dc4f8534b64ca594da909e2c91
This commit allows the inter prediction residual to use uniform
quantization followed by trellis coefficient optimization in
speed 0. It improves the coding performance by
lowres 0.79%
midres 1.07%
hdres 1.44%
Change-Id: I46ef8cfe042a4ccc7a0055515012cd6cbf5c9619
Move the operations that update the context buffers outside this
function. The coeff_cost() takes all input as const value and returns
the coefficient cost.
This makes preparation for the next coefficient optimization CLs.
Change-Id: I850eec6e5470b91ea84646ff26b9231b09f70a0c
Replace the existing mv bias with a bias only for
NEWMV, and based on the motion vector difference of
its top/left neighbors.
For cbr non-screen-content mode.
Change-Id: I8a8cf56347cfa23e9ffd8ead69eec8746c8f9e09
Use a measure of noise energy to adjust Q estimate and
arf filter strength.
Gains 0.3-0.5% on Lowres and Netflix sets.
Hdres and Midres neutral.
Change-Id: Ic0de552e7b6763e70eeeaa3651619831b423e151
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.
Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
Safer to have the decoder operate normally and have
better-hw-compatibility only implement encoding changes.
Fixes some test failures.
Change-Id: I0dd70d002e4e893992f0cd59774b9363e6f7fe76
For real time CBR mode, use model_rd_for_sb_y for 32x32 if the sb is
a skin sb to avoid visual regression on the slowly moving face.
Refer to the cl: https://chromium-review.googlesource.com/#/c/356020/
Change-Id: I42c36666b2b474ce5ee274239d52ae8ab400fd46
The transform block row and column positions are always available
outside the callees. There is no need to re-compute these values
again. This approach has been used by the decoder. This commit
removes txfrm_block_to_raster_xy() function.
Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108
BUG=b/29583578
original webp change:
commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:41:26 2015 -0800
thread: use CreateThread for windows phone
_beginthreadex is unavailable for winrt/uwp
Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
Change-Id: Iade8fff6367b45534986c77ebe61abeb45bce0f8
BUG=b/29583578
original webp change:
commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:40:26 2015 -0800
thread: use WaitForSingleObjectEx if available
Windows XP and up
Change-Id: Ie1a46a82722b8624437c8aba0aa4566a4b0b3f57
100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
Change-Id: If165c38b378c6e0c55e17a1b071efd3ec3e7dcdd
BUG=b/29583578
original webp change:
commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:38:46 2015 -0800
thread: use InitializeCriticalSectionEx if available
Windows Vista / Server 2008 and up
Change-Id: I32c5b4e5384d614c5a821ef511293ff014c67966
100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
Change-Id: I9ce49b3a86857267e504cd8ceab503b7b441d614
BUG=b/29583578
original webp change:
commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 19:49:58 2015 -0800
thread: use native windows cond var if available
Vista / Server 2008 and up. no speed difference observed.
Change-Id: Ice19704777cb679b290dc107a751a0f36dd0c0a9
100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
Change-Id: Iede7ae8a7184e4b17a4050b33956918fc84e15b5
these are debug-only modules that can be added in manually when needed.
leave a reference in vp8_common.mk / vp9_common.mk for easy addition.
quiets -Wmissing-prototypes warning
BUG=b/29584271
Change-Id: Ifc8637d877edfbd562b34dc5c540428bba7951fc
Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
Change-Id: I4cf8f23b86d7ebd85ffd2630dcfbd799c0b88101
The scaling of the threshold for 10 and 12 bit here appears
to be in the wrong direction. For 10 and 12 bit we expect sse
values to be higher and hence the threshold used should be
scaled up not down.
Change-Id: I2678116652b539aef48100e0f22873edd4f5a786
This function seems to scale the threshold for testing an
SSE value in the wrong direction for 10 and 12 bit inputs.
Also for a true SSE the scalings should probably be << 4 and 8
Change-Id: Iba8047b3f70d04aa46d9688a824f3d49c1c58e90
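A worked sketch of the scaling direction discussed above. Pixel differences
grow by a factor of 2^(bit_depth - 8), so a squared error grows by
2^(2 * (bit_depth - 8)): << 4 for 10 bit and << 8 for 12 bit. The function
name is hypothetical, and the sketch assumes the threshold is compared
against a true SSE.

```c
#include <assert.h>
#include <stdint.h>

/* Scales an SSE threshold tuned for 8-bit input up to 10 or 12 bit. */
uint64_t scale_sse_threshold(uint64_t thresh_8bit, int bit_depth) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  return thresh_8bit << (2 * (bit_depth - 8));
}
```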
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.
+ factorized some common expressions
Change-Id: I171c3faaa5606d098e984baa9aa74bb36042f57f
Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.
The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).
BUG=b:29583530
Change-Id: Ib935e97b37ffb22d7af72ba0f04564ae6280f1fd
inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)
this prevents an empty CONFIG_VS_VERSION and avoids make failure
Change-Id: I529d52eca59329e2715309efd63d80f0e1fed462
For real time CBR mode, use model_rd_for_sb_y for 32x32 if the mode is
newmv last, which is less aggressive in skipping transform and
quantization, to avoid quality regression in some conditions.
Change-Id: Ifa30be587f2a8a4a7f182a172de6ce277c0f8556
Speed test shows the new vertical filters have degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
the vertical filters activated code. For now, simply activate the code
that shows no degradation on Celeron. Later there should be 2 sets of
vertical filter ssse3 functions, with a jump table choosing based on CPU type.
Change-Id: Iba2f1f2fe059a9d142c396d03a6b8d2d3b981e87
Due to rounding, hbd variance may become negative. This commit adds a
check and clamps negative values to 0.
Change-Id: I610d9c8aa2d4eebe7bc5f2c5624a9e3cadad4c94
For forced key frames in particular this helps to make them
blend better with the surrounding frames where noise tends
to be suppressed by a combination of quantization and alt
ref filtering.
Currently disabled by default under an IFDEF flag pending
wider testing.
Change-Id: I971b5cc2b2a4b9e1f11fe06c67ef073f01b25056
Bug fix: the crash is caused by not allocating a buffer for prev_mip in
postproc_state. prev_mip in postproc_state is only used for MFQE; other
postproc modules (deblocking, etc.) should not use it.
BUG=webm:1251
Change-Id: I3120d2f50603b4a2d400e92d583960a513953a28
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
For real-time mode, increase variance threshold for 32x32 blocks in
var-based partitioning for resolution >= 720p, so that it is more
likely to stay at 32x32 for high resolution which accelerates the
encoding speed with little/no PSNR drop.
PSNR effect on different speed settings:
speed 8 rtc: 0.02 overall PSNR drop, 0.285% SSIM drop
speed 7 rtc: 0.196% overall PSNR increase, 0.066% SSIM increase
speed 5 rtc_derf: no effect.
Speed up:
gips_motion_WHD, 1mbps: 2.5% faster on speed 7, 2.6% faster on speed8
gips_stat_WHD, 1mbps: 4.6% faster on speed 7, 5.6% faster on speed8
Change-Id: Ie7c33c4d2dd7d09294917e031357fc5476c3a4bb
Avoids a segfault in high-bitdepth builds.
This restores the condition to its state prior to:
7991241 vp9: Change the scheme for modeling rd for bsize 32x32.
BUG=webm:1250
Change-Id: I6183d5b34cb89dfbf27b7bb589812148a72cd7de
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some conditions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.
PSNR effect on different speed settings:
speed 8 rtc: 0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc: 0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop
Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed8
BUG=webm:1250
Change-Id: I818babce5b8549b4b1a7c3978df8591bffde7173
Use quotes whenever possible and {} always for variables.
Replace multiple set_all calls with *able_feature().
Change-Id: If579d3f718bd4133cf1592b4554a8ed00cf9f2d3
decoder_peek_si_internal could potentially read more bytes than
what actually exists in the input buffer. We check for the buffer
size to be at least 8, but we try to read up to 10 bytes in the
worst case. A well crafted file could thus cause a segfault.
Likely change that introduced this bug was:
https://chromium-review.googlesource.com/#/c/70439 (git hash:
7c43fb6)
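The class of over-read described above can be avoided by checking every
read against the remaining buffer size. A minimal sketch, assuming a
hypothetical read_checked helper (not the actual decoder_peek_si_internal
code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `count` bytes starting at `offset`, but only if
 * the whole range lies inside the buffer. Returns 1 on success, 0 otherwise. */
static int read_checked(const uint8_t *data, size_t data_sz, size_t offset,
                        size_t count, uint8_t *out) {
  assert(data != NULL && out != NULL);
  /* Written this way so that offset + count cannot wrap around. */
  if (offset > data_sz || count > data_sz - offset) return 0;
  memcpy(out, data + offset, count);
  return 1;
}
```

With an 8-byte input, a request for 10 bytes is rejected instead of
reading past the end.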
BUG=chromium:621095
Change-Id: Id74880cfdded44caaa45bbdbaac859c09d3db752
When building without multithreading and for a non-arm, non-x86 system,
ctx is unused.
Cleans up -Wextra warning:
unused parameter ‘ctx’ [-Werror=unused-parameter]
Change-Id: Ifddff89d2ebd45f7d71e3d415a8f2415dd818957
'duration' is not used in realtime-only mode:
Cleans up -Wextra warning:
unused parameter 'duration' [-Wunused-parameter]
Change-Id: I827dfe59ebcdc72c5a93fdf7e5aca063433914b1
In vp9_pick_inter_mode(), instead of using
vp9_get_pred_context_switchable_interp(xd) to assign filter_ref,
we use a less strict condition on assigning filter_ref.
This is to reduce the probability of entering the flow of not
assigning filter_ref and then skipping filter search.
Overall PSNR gain 0.074% for rtc dataset
Details:
Low Mid High
0.185% -0.008% -0.082%
Change-Id: Id5c5ab38d3766c213d5681e17b4d1afd1529e676
Allows building simple targets with sane default flags.
For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
--platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
BUG=webm:1143
Change-Id: I06f5a7564f5382cf1a4bad41aef4308566c53adf
Speed test shows the new vertical filters have degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
the vertical filters activated code. For now, simply activate the code
that shows no degradation on Celeron. Later there should be 2 sets of
vertical filter ssse3 functions, with a jump table choosing based on CPU type.
Change-Id: I37e3e9c5694737d9134a6bce6698d3e43f8fc962
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some conditions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.
PSNR effect on different speed setting:
speed 8 rtc: 0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc: 0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop
Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed8
Change-Id: I902f62def225ea01c145d7e5a93497398b8f5edf
Due to the rounding used in the computation, HBD variance may come out
slightly negative. This commit adds clamping to make sure the output
variance values for 10 and 12 bit are non-negative.
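A minimal sketch of the clamp, assuming sse and sum have already been
rounded down to 8-bit scale by the caller. The helper name
clamped_variance and its signature are hypothetical, not the actual
libvpx functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: variance = sse - sum^2 / N with N = 2^log2_count.
 * Because sse and sum were rounded independently, the difference can dip
 * just below zero, so clamp it to 0. */
static uint32_t clamped_variance(uint32_t sse, int sum, int log2_count) {
  assert(log2_count >= 0 && log2_count < 32);
  const int64_t mean_sq = ((int64_t)sum * sum) >> log2_count;
  const int64_t var = (int64_t)sse - mean_sq;
  return var >= 0 ? (uint32_t)var : 0;
}
```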
Change-Id: Id679aa55a4c201958c4c7d28cd8733b9246a71c8
This commit adds an encoder workaround to support better
compatibility with a non-compliant hardware vp9 profile 2 decoder.
The known issue with this decoder is:
The decoder assumes a wrong value, 127 instead of the correct
value of 511 and 2047, for any assumed top-left corner pixel in
UV planes for 10 and 12 bit, respectively. Such assumed
top-left corner pixel is used for INTRA prediction when a real
decoded/reconstructed pixel is not available, e.g. when it is
located inside the row above the top row or inside the column
left of the leftmost column of a video image.
Change-Id: Ic15a938a3107e1b85e96cb7903a5c4220986b99d
decoding is done if the decoder is available, with errors handled
accordingly. the encoded frame count should be sufficient for this test.
+ remove HandleDecodeResult() as it's redundant given the base
implementation
BUG=webm:1233
Change-Id: I513c1c3475c58a746f4df627491bdc392fe21416
development has moved to the nextgenv2 branch and a snapshot from here
was used to seed aomedia
BUG=b/29457125
Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It removes two memset operations on quantized
and dequantized coefficient sets. This improves the unit speed
by 10%.
Change-Id: I23f47c6e14582520a7f952f03ce8f72183e7f0e6
Each time a codec is enabled or disabled with the umbrella
--enable-vpN flag, set the encoder and decoder configurations as well.
This was done as a post-processing step but doing that lost the order of
the arguments.
BUG=webm:1205
Change-Id: Ic629bfdd06acc04bc5a7227309f36bba54dad8b1
Since combining VPX_DL_REALTIME with VPX_RC_FIRST_PASS is basically
nonsense, ignore the user's pass setting when this happens and
behave as if the requested encode is a single pass encode.
BUG=webm:1233
Change-Id: I5ee4c4e5838c4ca6d24988890aae490b10826db2
The logic can be incorporated into configure.sh
Removes a dependency on ios-version.sh which was not part of DIST-SRCS
and removes a warning from 'make dist' sub builds:
../src/build/make/configure.sh: line 787:
../src/build/make/ios-version.sh: No such file or directory
Change-Id: Ic38314708eb278dd9d2a9769a670da32f6126637
This value is signed in vp9/10
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (cpi->frames_since_golden == (cpi->current_gf_interval >> 1))
~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: Ie137724982f3a46c8c1820548c1960d62a4e96f2
left_above_mv and above_block_mv return as_int
as_int is defined as uint32_t in vp8/common/mv.h
Cleans up -Wextra warnings:
signed and unsigned type in conditional expression
this_mv->as_int = col ? d[-1].bmi.mv.as_int : left_block_mv(mic, i);
^
this_mv->as_int = row ? d[-4].bmi.mv.as_int : above_block_mv(mic, i, mis);
^
left_mv.as_int = col ? d[-1].bmi.mv.as_int :
^
Change-Id: Ia043764e4ce93d2152d2269b1c7b28b5d5f814cf
This commit changes to using int64_t to represent the sum of pixel
differences, which can be negative.
This fixes a number of ubsan warnings.
BUG=webm:1219
Change-Id: I885f245ae895ab92ca5f3b9848d37024b07aac98
Use ~15 instead of 0x..F0
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (((cm->Width + 15) & 0xfffffff0) !=
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
comparison of integers of different signs: 'unsigned int' and 'int'
((cm->Height + 15) & 0xfffffff0) !=
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
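For illustration: 0xfffffff0 does not fit in a 32-bit int, so the
constant is typed unsigned int and makes the whole masked expression
unsigned, while ~15 stays an int. A small sketch with a hypothetical
align16 helper:

```c
#include <assert.h>

/* Hypothetical helper: round a non-negative size up to a multiple of 16.
 * ~15 has type int, so the result stays signed like the operand. */
static int align16(int x) {
  assert(x >= 0);
  return (x + 15) & ~15;
}
```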
Change-Id: Iac25839cde3425b7b9db7f33740dc46a551b7546
For VBR: (1) allow newmv mode for golden ref to
select interpolation filter (as in last ref case), and
(2) don't use the more aggressive tx-skip testing logic for large blocks.
Only affects 1 pass real-time vbr mode (speed >= 5).
PSNR/SSIM metrics on ytlive set are all positive, ~0.5-2% gain.
Change-Id: I0ffbb0a9755563a5acd6230c58236e4f19a47266
This change is only for real-time mode if short_circuit_low_temp_var
is on. Add bias to last frame in choosing ref frame for partitioning,
when y_sad and y_sad_g are close. It speeds up real-time encoding by
0.5% on some clips with less than 0.1% overall PSNR drop on rtc test set.
Change-Id: I2a2110fe36455f3d8f0fc404aef2228f512e8df8
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
int n = (int)VPXMIN(sizeof(clear_buffer), data_end - data);
^ ~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~
Change-Id: I964355ceae6b39e22c0196294b25e28387f84945
Defined as unsigned in VP8_CONFIG
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (cpi->oxcf.number_of_layers != prev_number_of_layers)
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
Change-Id: I969e64cd2bfda6e61c564476dbd35b892b177646
The vpx_roi_map_t and vpx_active_map_t structures use unsigned rows
and cols but VP8_COMMON uses signed values for mb_rows and mb_cols.
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
~~~~ ^ ~~~~~~~~~~~~~~~~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
Change-Id: If1f118c20ffefd2530fbd371e6787cc8a6c31f0a
Mode is signed
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (ctx->oxcf.Mode != new_qc)
~~~~~~~~~~~~~~ ^ ~~~~~~
Change-Id: I5cf81c40b103e688a31e1339511f5c9eb27edd38
1. Skip golden non-zeromv and newmv-last for bsize >= 16x16 if the
temporal variance obtained from choose_partitioning is very low.
2. Skip horz and vert INTRA mode for speed 8.
This change works best on the clips with little noise and with some
motion (e.g. gips_motion which has > 5% speed up). PSNR drop is 1.78%
on rtc test set, no obvious visual quality regression found.
Change-Id: Ib43b5b20e67809d03c5a6890818ddff59e1fc94a
Move initialization of some new "twopass" values
to the function vp9_init_second_pass() and some other
small changes.
Remove #if GROUP_ADAPTIVE_MAXQ as this is always
enabled now.
Change-Id: I1dbec2fd7c419779848aa987c4cd7824d4df8456
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
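The pattern, sketched with a hypothetical sum_squared_error helper: if
the difference is held in an unsigned type, a difference of -1 wraps to
4294967295 and the square overflows. Keeping the difference signed and
only the accumulated error unsigned avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sum of squared differences over n 8-bit samples.
 * diff is signed (range -255..255); diff * diff fits in an int. */
static unsigned int sum_squared_error(const uint8_t *src, const uint8_t *dst,
                                      int n) {
  unsigned int error = 0;
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    const int diff = src[i] - dst[i];
    error += (unsigned int)(diff * diff);
  }
  return error;
}
```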
Change-Id: I580813093ee46284fde7954520dfcb1188f79268
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
Change-Id: I502fd707823c4faaa7f587c9cc0312f057e04904
On scene-cut detected frames (i.e., high_source_sad = 1), use
nonrd_pick_partition (over choose_part + select_part), as
the nonrd_pick partitioning is generally better.
Small positive increase in metrics on ytlive set (~0.5 - 1%).
Negligible overall speed decrease, as it's only used on scene-cut frames.
Only affects 1 pass vbr mode, speed = 5.
Change-Id: I07c89cbdc75f5bb16eb8e0e2773ead0980d2de5c
This reverts commit be12fefa4b
and commit 057c1c4034.
Also, the mismatch between the avx version and the
c version has been fixed.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
For a rt encode using 1080p@60fps material, up to 11% performance
improvement overall was seen.
Change-Id: Icd1f216209ebc6fc0b8da885f32f356fa4355ed0
The eob of a block is not properly set when skip_recode is true,
thus triggering assert(eob <= default_eob) to fail.
Change-Id: Ifecbe33dce2dc4903e0a80bd384dc09bf0dd8a44
Code cleanup, use existing rolling_actual/target metrics instead,
set threshold to get same/similar effect.
Little/no change in metrics on ytlive set.
Change-Id: I74f3c3d0a143a9cf20dc9c3dee54c0f7e6a97a51
Add a max condition and lower the min value.
No change in behavior (metrics for yt live set) for the
default min/max_gf_interval=4/16 settings.
Small positive change when min/max_gf_interval=7/16
(for 60fps clips on ytlive set).
Change-Id: I1c1d72425c86c69419ea43fb9730130e81062f91
add an upper bound to the framerate denominator above which 30fps will
be reported; fixes warning in corrupt / fuzzed files
Change-Id: I46a6a6f34ab756535cd009fe12273d83dcc1e9f1
Provides more comprehensive coverage for --enable-coefficient-checking.
The intent is to make the --enable-coefficient-checking option
consistent with the VP9 spec.
Change-Id: I12d0120756d17572ca2b2d7e6a2ab9d8071d8d58
Error messages:
..\vp9\common\vp9_loopfilter.c(1312): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1313): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1312): error C2220: warning treated as
error - no 'object' file generated
[.build-x86_64-win64-vs10\vpx.vcxproj]
Change-Id: Ia69260611997cd2ba41c7184a85ecead740a7c07
Increase in the damping used in adjusting the active Q range.
This does hurt rate accuracy a little in a few extreme cases
especially if the clip is very short*, but helps metrics.
* Note that the adjustment is applied at the GF/ARF group level based
on what happened in the last group. Hence for very short clips where
the length of a single group may be a significant % of the clip length
there is still scope for some drift that cannot be accommodated.
In practice most data points in our test sets are now much closer to target
than was previously the case with default settings and in some cases are
better even than they were when the command line undershoot and overshoot
parameters were set very low (e.g. 2%). For example in bridge_close at high rates
the old mechanism was unable to adapt enough to prevent extreme overshoot.
Change-Id: I634f8f0e015b5ee64a9f0ccaa2bcfdbc1d360489
Change to the calculation of the error divisor used in
get_twopass_worst_quality(). This follows on from other
changes to the rate control that impact the output of this
function.
Change-Id: I414fa9aa1e6a68a64dccea17c3712f44b8a0c10c
Changes to the function that redistributes bits from overshoot
or undershoot throughout the rest of the clip to respond more
quickly.
Change-Id: I90f10900cdd82cf2ce1d8da4b6f91eb5934310da
Added a factor based on the bit spend in the last arf group vs the
target to adjust the choice of the active worst quality in subsequent
groups.
Helps clips where previously there was a big overshoot or undershoot
to adapt and get closer to the target rate.
Change-Id: I67034b801679b99024409489a2273ea6fe23b8e6
The use of this value is preventing rate adjustment on clips
or sections that have very little motion but high noise and
this can give rise to some sections with massive overshoot.
Change-Id: I9a65c7c1148dc5d3a7d8b23e50fc1733f3661621
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).
Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd
Skip intra-mode and some inter-modes (newmv, nearmv, nearestmv) for
golden frame if the variance got from choose_partitioning is very low.
Only for 1 pass real-time CBR mode and bsize >= 32x32, it has ~2.5%
speed up with less than 0.1% PSNR drop for rtc test set. Don't see
visual regression.
Change-Id: I70efbc95a1007231ae36f02c5b2fbf6cd35077ad
Reduce operations and jumps. perf shows CPU time reduced from 1.9% to
1.6% when decoding fdJc1_IBKJA.248.webm on Xeon E5.
Will apply the changes to vp10 after code review.
Change-Id: I9351509922855d8896ddef1ed093b3ca12619a61
For non-rd pickmode:
best_pred_sad, computed for NEWMV-last, is only used for
skipping golden non-zero modes. Add condition to avoid this
computation if not used (i.e, if golden nonzero modes are not used).
And remove code for computing best_pred_sad for NEWMV-golden,
since that sad is not used.
No change in behavior; small speed gain (~1%) for svc encodes.
Change-Id: Ic2cbdef6c4e9a233a57c0db0eeac8ad5fcead366
convert the random value to int16 before subtracting 256 from it; quiets
a ubsan (sanitize=integer) warning
BUG=webm:1225
Change-Id: Ibc2c5a21f30e112bd6c180f7d6a033327c38d0df
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.
Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.
Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.
Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
C does not allow for shifting into the sign bit of a signed
integer, and the two instances here become signed ints via
promotion. Explicitly cast them to unsigned MEM_VALUE_T to
avoid the problem.
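A sketch of the same idea, using uint32_t as a stand-in for MEM_VALUE_T
and a hypothetical load_be32 helper. A uint8_t operand is promoted to
int, so p[0] << 24 shifts into the sign bit when p[0] >= 0x80; casting
first keeps the shift unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a big-endian 32-bit value from 4 bytes.
 * Each byte is cast to uint32_t before shifting to avoid signed shifts. */
static uint32_t load_be32(const uint8_t *p) {
  assert(p != NULL);
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```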
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=614648
Change-Id: I51165361a8c6cbb5c378cf7e4e0f4b80b3ad9a6e
Followed the code style of other lpf functions.
These 2 functions put 2 rows of data in a single xmm register,
so they have similar but not identical filter operations,
and cannot share the same macros.
Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
- Avoid excessive copying
- Don't bother searching if no update can possibly offer savings
- Simplify the interface
- Remove the confusing vp9_cost_upd256 macro
Change-Id: Id9d9676a361fd1203b27e930cd29c23b2813ce59
Apple's version format specification is strictly checked on app
store submission, even for embedded frameworks:
http://apple.co/1WgelY1
The build version number should be a string comprised of
three non-negative, period-separated integers with the
first integer being greater than zero. The string should
only contain numeric (0-9) and period (.) characters.
So that's room for "1.5.0" but not for "1.5.0-906-g656f9c4".
The full version returned from 'version.sh --bare' is now
embedded under a 'VPXFullVersion' custom key in the Info.plist,
so it can still be extracted from the resulting framework.
Change-Id: If34a58d02e407379d1f1859fda533ef7f983170b
vp9_diamond_search_sad_avx was disabled in:
057c1c4 disable vp9_diamond_search_sad_avx
this removes a missing prototype warning as the prototype is no longer
included in vp9_rtcd.h. the file can be restored if someone gets around
to fixing the issue.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
Change-Id: Ia9fda4b81c53dc5fba7c31d780d761f886940b52
Much of the code requires the -mstackrealign flag. Although -mstackrealign
has already been added to the CFLAGS of some modules, SIGSEGV still occurs
in other modules.
The best fix would be to find the causes and fix them. However, we
cannot know those causes until a SIGSEGV actually occurs. In addition, if
a SIGSEGV occurs in other programs, it will be fatal.
So adding -mstackrealign to CFLAGS unconditionally is
reasonable.
Change-Id: I999ef597a6afe97f5e7cc7bffaa866537c3eedd2
This reverts commit 2468163e07.
causes valgrind errors for overread of buffer in SubpelVarianceTest
Change-Id: I448e52c76f815ac199305b71f7d169f2bc167679
Move the logic for rechecking zeromv on denoised block out to simplify
the function. To simplify the param passing, add a new structure
VP9_PICKMODE_CTX_DEN which is only used when denoiser is enabled.
Change-Id: Iaa9b4396dfcb8147236c02d4a1868a09103a4476
This commit clarifies the integer value ranges for variables used in
several variance functions, and uses proper type conversions to
reflect the value ranges.
Change-Id: Ic3234b83a912ce1ad12d1b254f3378763e15cc5c
The inlining mirrors what was done with the low bit depth
inter_predictor. And the new highbd_inter_predictor name is more
consistent with other high bit depth functions.
Change-Id: I96437f745759aeec6260c6e39a974bf36f1c211c
Rename and change how it's updated.
Only affects 1 pass vbr.
Small change in metrics (< ~0.1%) on ytlive set.
Change-Id: Ibb1fe485699b6c4a8194951c8f229abe2f64b9a5
Also allows use of --enable-shared when configuring for Mac OS X,
producing a bare .dylib.
Enabling the shared framework bumps the iOS deployment target to 8.0,
the minimum required to support dynamic framework deployment in apps.
When not using --enable-shared, a static library for iOS 6.0+ will still
be built.
Minimum version settings have been moved into ios-version.sh so they
can be updated in a single place.
As with the static build, unless header search paths are manually
tweaked, users must add a VPX prefix on includes, such as:
#include <VPX/vpx/vpx_decoder.h>
A module map for headers is not yet included as inttypes.h is not
modular; this means that VPX cannot be used directly in Swift code,
but can still be pulled in through an Objective-C wrapper.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1092
Change-Id: I28fb06ce65e48ed167a88c14a7bfb2861989317e
In motion estimation stage for subpel motion, subpel variance is
computed using bilinear interpolation. The motion vector precision
used is at 1/8 pel and three bits are used to represent the x and y
subpel offsets. Based on this, the half pel check should be against
4, not 8.
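A worked sketch of the arithmetic: with 1/8 pel precision the subpel
offset is the low three bits of a motion vector component, so it ranges
over 0..7 and the half pel position is 4; the value 8 can never occur.
The helper names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: low 3 bits of a 1/8 pel motion vector component. */
static int subpel_offset(int mv_component) { return mv_component & 7; }

/* Hypothetical helper: an offset of 4/8 is the half pel position. */
static int is_half_pel(int offset) {
  assert(offset >= 0 && offset <= 7);
  return offset == 4;
}
```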
Change-Id: I1f56fa1fa3f2f5e19a20d27983efe628557f170e
there are sse2 equivalents which is a reasonable modern baseline
Removed mmx variance functions:
vpx_get_mb_ss_mmx()
vpx_get8x8var_mmx()
vpx_get4x4var_mmx()
vpx_variance4x4_mmx()
vpx_variance8x8_mmx()
vpx_mse16x16_mmx()
vpx_variance16x16_mmx()
vpx_variance16x8_mmx()
vpx_variance8x16_mmx()
Change-Id: Iffaf85344c6676a3dd337c0645a2dd5deb2f86a1
Added actual and absolute rate miss values to the opsnr.stt
stats output line.
Changes to the borg graphing may be needed before merge.
Change-Id: I1e9d548ce445d29002f0c59ebfd3957a6f15e702
Bug found by Yunqing relating to the correction for size at 8K and
above in get_twopass_worst_quality().
The basis for the correction was changed to the linear size relative to
1080P as a baseline and the adjustment has been clamped to prevent
problems at extreme image sizes.
For 1080P the results on our test sets were neutral but the low res and
mid res sets saw a small gain (0.1%-0.2% average).
I would also expect some gains on 4k and larger content where the
previous correction was overly aggressive.
Change-Id: I30b026b5f4535e9601e3178d738066459d19c8fb
Add control API VP9E_SET_TARGET_LEVEL that allows the encoder to
control the output bitstream level and/or keep level related
statistics.
Usage:
255 do not care about level (default)
0 keep level related stats only
10 target for level 1
11 target for level 1.1
.
.
.
62 target for level 6.2
Usage for vpxenc:
--target-level=0/255/10/11...
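Judging from the table above, the numeric values encode the level as
major * 10 + minor. A small sketch of that reading, with a hypothetical
split_level helper that is not part of the libvpx API; 0 and 255 are
special values and are excluded here:

```c
#include <assert.h>

/* Hypothetical helper: split a target-level value such as 11 or 62 into
 * its major and minor parts (11 -> level 1.1, 62 -> level 6.2). */
static void split_level(int target_level, int *major, int *minor) {
  assert(target_level >= 10 && target_level <= 62);
  assert(major != 0 && minor != 0);
  *major = target_level / 10;
  *minor = target_level % 10;
}
```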
Change-Id: I31d1aeca19358b893e7577b4e63748c8e614034a
For at least some of the implementations of sdx8f, such as
vpx_sad4x4x8_sse4_1, aligned moves are used to move the results into the
array.
Change-Id: I83df5a8e657b44e906d0d8b0bc154f1e5660f7f9
block_variance: This operates on 8x8s and would be safe with a int32 *
int32 to uint32 multiply, but this is potentially unsafe for 12-bit
input. Unfortunately the code already segfaults on 12-bit input:
https://bugs.chromium.org/p/webm/issues/detail?id=1223
calculate_variance: This operates on up to a 32x32 of 8x8s and can
overflow even with 8-bit input (log2((256*32*32)**2) == 36).
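The bound above works out as 256*32*32 = 2^18, whose square is 2^36 and
no longer fits in 32 bits. A sketch of widening before the multiply,
using a hypothetical square_sum helper:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: square a 32-bit sum in 64-bit arithmetic so that
 * the product cannot overflow. */
static uint64_t square_sum(int32_t sum) {
  const int64_t wide = sum;
  return (uint64_t)(wide * wide);
}
```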
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1220
Change-Id: I1ca4ff6092db9a7580da371ee9a21f403fdadc40
Reduce factor for setting base-qp for active_best_quality (for inter-frames).
Small increase in metrics on yt live set.
Change-Id: I9cf0ac797783aeddbfaf1ff510696c9035d7c5ee
This change makes the c match the assembly and removes the todo's
associated with getting this to work.
Change-Id: Ie32e9ebb584a9d60399662d8bcb71b74fbd19d1e
These implementations rely on casting the pointers to load the data.
Clang implemented optimizations which automatically add alignment hints
to such loads. The 4x4 filters do not guarantee the necessary alignment
so the resulting assembly is broken.
https://llvm.org/bugs/show_bug.cgi?id=24421
BUG=webm:817
BUG=webm:892
Change-Id: I608885299f1f86ff83653b65e0e40d0ae87fb3fe
* changes:
vp9_frame_scale_ssse3.c: make 2 functions static
vp9_pickmode.c: make function static
vp9_noise_estimate.c: make function static
vp9_aq_360.c: add missing include
vp9_idct_intrin_sse2: add missing vp9_rtcd.h include
vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include
Makes the delta-qp stop a little earlier on areas that have been refreshed enough.
This helps to reduce some pulsing artifact on noisy flat areas observed in some
noisy vc-clips.
Threshold changes only take effect for sources where noise level is estimated to
be >= medium level.
Only affects 1 pass CBR, non-screen content case.
Change-Id: Iacf557f6aa8abbcd6782c02ff2e6c14891960850
For 1 pass vbr mode:
Refactor to move the logic for gf setting based on up-coming
key frames to a separate function, so same logic can be used for
scene-cuts/changes.
Change-Id: Ic4ede308e08ba869bb62e4566e19ea31222c5229
Makes the noise estimation react a little faster.
Little/no change in metrics.
Change only affects 1 pass cbr.
Change-Id: I13f0daa90ecbf9d49eb1cf2e48febd9d92292940
When building a dynamic framework with Swift compatibility, can't
include any headers that aren't in another module or you get an
error like this from Xcode on the including project:
Include of non-modular header inside framework
For some reason the system inttypes.h is not in a module, unlike
other standard C library headers... but it doesn't seem to be
actually needed on Darwin, so removing it doesn't appear to
be a problem.
Change-Id: I11d264483c54feefd9d2edf573afaef34ddcd0f2
When using git submodules, .git may be a file instead of a directory.
The -d test was failing in that case; switched to -e.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1215
Change-Id: Iedf0e92bfeb003b28a415945dc729e6ce58c4fe4
"qc" in vp{9,10}_token_state is used to save quantized coefficients, this
commit changes the type from short to tran_low_t to properly reflect
the value range for highbitdepth build.
This fixes an out-of-range bug when optimize_b is used in highbitdepth
build.
Change-Id: Ibf330879e6ac6ae8f099e085caa9d3d9a889fde8
This is an actual overflow where the result of the calculation is
materially changed, not just a negative value that is stored in an
unsigned.
Caught with fsanitize=integer on the VP9/AqSegmentTest.TestNoMisMatchAQ2/1 test.
Change-Id: I514b0ef4ae7ad50e3e08c0079aa204d59fa679aa
In so doing this fixes a couple of bugs:
vpx_plane_add_noise.c needed to subtract a clamp instead of add.
And the assembly (mmx sse) had assumptions that parameters were
continuous in memory which was not true.
Change-Id: I76f2c43cf54bfc838eb2edf8a443eaaa7565d7b5
- iOS SDKs no longer ship with armv6 support.
- Our minimum iOS version means all target devices have neon.
- Remove armv6 darwin LD workaround.
- This removes a TODO.
Change-Id: I2fcb5b82c96213364275475be021c7dd8459d5c0
Move skin superblock force split out of this function as well
as some minor code refactors. Checked bitexact for different speed
settings and different resolutions.
Change-Id: I6078cbe88dd9ce6c0b69470a8a0a8f8d2274161b
this avoids the decoder test which was only correct for vp9, vp10 was
missed in the earlier change
Change-Id: Ib789c906d440c0e4169052cf64c74d5e4b196caa
First, we only set use_4x4_partition for key frames, where we don't
denoise; second, in case we do have small partitions, we should pass the
actual block size to the denoiser and terminate early if needed.
Change-Id: I331f42046d792b17360723d17ff817d601394658
Wrap around behavior is enforced manually and we use the values in
arithmetic involving negative integers.
Change-Id: I199706b6f3af91f4fb6fe2ef302fbbc6d0cf5785
The product always fits in uint32_t, but the operands don't.
An optimizing compiler should generate the wraparound code.
(Verified with clang).
Change-Id: I25eb64df99152992bc898b8ccbb01d55c8d16e3c
ADL will look this up from the callsite namespace iff it is declared
before the callsite or from the parent namespace of the class type (the
global namespace).
This patch has been tested on MSVS 2015 and clang-3.8.
Change-Id: I00ba74712c9b617b9d81761abed1e14d8f25d8e3
Block size passed into denoiser filter is always >= BLOCK_8X8 (in
vp9_pick_inter_mode), it is not necessary to check smaller block
size. Passed the bitexact test on clips with different resolutions and
noise levels.
Change-Id: I19fa3195d18c27d9e5de60dc11cff1522ef3714e
Fix will reset the consec_zero_mv map on non-skipped blocks with non-zero mv.
Adjust thresholds on consec_zero_mv in noise estimation and skin detection,
as more possible reset on map means lower thresholds should be used.
Change-Id: Ibe8520057472b3609585260b51b6f95a38fb777d
webm_read_frame is the only function now which requires
documentation for what the return value means (other two are quite
obvious - file_is_webm and webm_guess_framerate).
Change-Id: I7a4f7d8097b1d748812b2ee251ee718a0b5ce836
In VP9 internal denoiser, motion magnitude is computed from
best_sse_mv, which should be set to 0 at the beginning. This bug may
cause visual artifacts in the denoiser. Also, delete two improper comments.
Change-Id: I8710d2acba23320bc85cf72af17d65245c19438b
Need to check that sse for non-zero mv has been set for the current block
(i.e., check that nonzero-mv is tested as a mode, so newmv_sse != UINT_MAX)
before forcing to not use zero-mv for denoising.
Also increase some thresholds (sse and sse_diff) for high noise case,
and use shift operation instead of multiplication on a threshold computation.
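A minimal sketch of the guard described above. The function name, the
parameters other than newmv_sse, and the sse comparison are assumptions made
for illustration; only the newmv_sse != UINT_MAX check comes from this message.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if zero-mv may be ruled out for denoising.
 * newmv_sse stays at UINT_MAX when no nonzero-mv mode was tested. */
static int may_rule_out_zero_mv(unsigned int newmv_sse,
                                unsigned int zeromv_sse,
                                unsigned int sse_diff_thresh) {
  /* Nonzero mv was never tested for this block: nothing to compare with. */
  if (newmv_sse == UINT_MAX) return 0;
  /* Assumed condition: nonzero mv must beat zero mv by a clear margin. */
  return zeromv_sse > newmv_sse && zeromv_sse - newmv_sse > sse_diff_thresh;
}
```

Without the first check, an unset newmv_sse would be compared as if it were
a real measurement.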
Change-Id: Iae7339475d57240316b7fa8b887c4ee3c0d0dbec
Resolved two TODO items.
Force a minimum value of 1.0 for frame duration as per section duration.
Column inactive zone is currently set to 0 as most of the serious issues
relating to inactive regions relate to letter boxing.
Change-Id: Ifbab3acf2c089d7305620a7ff7ed7c3536cc9235
In Aq mode 1 the segment and AQ delta for each block is based
on spatial variance. There may be a net imbalance between blocks
that have lower Q than the baseline value and those that have higher Q.
This patch monitors that imbalance and extends the allowed baseline
Q range for the frame to accommodate adjustment of that baseline value
to compensate.
Change-Id: Iae8a48c7c01fe2af94a141e149d03acf467237ca
So it can be used even with aq-mode=3 not enabled.
Also cleans up some code in the places where it's used.
No change in behavior.
Change-Id: Ib6b265308dbd483f691200da9a0be4da4b380dbc
Removed this todo because of another todo which says none of this code
should exist. It should be integrated into the block by block encode
process as per the decoder.
Change-Id: I076bd15140a060e69c014dd7d7cd07fea260aba3
For 1 pass vbr mode.
Increase the gf interval for case where average Q is close to
max and high overshoot is detected.
Small increase in overall avg_psnr/ssim metrics (~0.2/0.1%) for ytlive,
but improves the low-end (low bitrate) for several clips (less overshoot).
Change-Id: Ifba40f25b4861b2e0d9832c82d5359a6a3dce9f2
More even spacing near key frame and avoid gf on scene cut
if it's close to key frame.
Small increase in metrics for ytlive set (which uses key-period=150).
(~0.2% gain)
Change only affects 1 pass vbr mode.
Change-Id: If1e5a59baf1e0befbaf998522fbc47d94ac5b5df
Change only affects 1 pass vbr.
Use a q value somewhat larger (~6%) than avg_frame_qindex[INTER]
as basis for active_best_quality for inter-frames.
And use the minimum of this (avg_frame_qindex) and the active_worst_quality.
This reduces some overshoot in ytlive clips.
Overall small but positive average increase in metrics (up on average ~0.2%).
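A rough sketch of the q basis selection described above. The helper name and
the 17/16 factor (about 6%) are assumptions for illustration, not necessarily
what the encoder uses.

```c
#include <assert.h>

/* Hypothetical helper: q index used as the basis for active_best_quality
 * on inter frames in 1 pass vbr. */
static int inter_q_basis(int avg_inter_qindex, int active_worst_quality) {
  assert(avg_inter_qindex >= 0 && active_worst_quality >= 0);
  /* 17/16 = 1.0625, i.e. roughly 6% above the running average. */
  const int raised = (avg_inter_qindex * 17) >> 4;
  /* Never start from a q above the current worst-quality limit. */
  return raised < active_worst_quality ? raised : active_worst_quality;
}
```

For an average qindex of 160 and a worst quality of 255 the basis is 170;
once the raised value passes active_worst_quality the limit takes over.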
Change-Id: Icdbaae7872d5675fd38a13c0ec6ce0e2e3b919ce
This was never hooked up for the 32x32_34 case as the neon_asm version
in 3f7c12da, when the intrinsics version was added.
Change-Id: Ic7db4ce5850c637315f9fe9e2de93a4f8cf9e320
Change recursive weight for average_source_sad and
put some constraint on spacing between detected scene-cuts.
Change only affects 1 pass real-time mode.
Change-Id: I1917e748d845e244812d11aec2a9d755372ec182
Correct the setting of Q basis of GF/ARF in 1 pass vbr.
Existing logic would switch to using avg_QP of key frame if
avg_QP of inter is less than active worst (even if key frame is
not last frame).
Instead fix the logic (as per the comment) to use the lower of
active_worst_quality and avg_Q for inter as basis for GF/ARF
active_best_quality (unless last frame was key frame).
Increase in metrics: AvgPSNR/SSIM up by ~0.7/0.3 on ytlive set.
Change-Id: I9a628378ec6684bfda9457ebfc2384ef6d8579f7
Adjustment to stop excessive prediction decay triggered by blocks
or frames with extremely low spatial complexity which rendered the
comparison of intra and inter coded errors meaningless.
This was causing much shorter than expected groups on some 4k
test content.
Change-Id: I3f2c64200ef6dcef4721fc9f2ec09e480056ffc2
Uses a metric on fraction of smooth blocks derived from first pass
stats in a frame to adjust down the cq_level modestly in the cq mode.
The current implementation does not add much complexity, and is
fairly light in the adaptation.
Change-Id: Ic484e810d5bd51b7bb6b8945f378c7c3d9d27053
Adjust the motion decay component to account for image size.
This has very little impact for smaller image sizes.
Average bdrate results for our HD test sets:-
Hdres set: opsnr +0.92%, Fast SSIM +1.6%
Netflix hd set: opsnr +1.5%, Fast SSIM +3.1%
There are a couple of notable -ve clips such as cyclist and sunflower
which seem to be better with a shorter interval but also a few very big
wins such as Jets >12% psnr 22% Fast SSIM and from the
Netflix set PierSeaside 9.7% psnr and 18.2% Fast SSIM.
Change-Id: Ie43aaedaa74331ed83d624a13548094ac64fed9e
Change only affects 1 pass vbr mode, speed >=5.
Increase min_thresh, decrease boost, and set a min/max
value for gf_interval.
Change-Id: I9c1e1a1ab0c5780064eb62714ee39a72ea4d2107
Trap the case where we end up with a very short arf group just before
a key frame. Such a group often has poor quality and may cause pulsing.
For example if the KF is 17 frames away we are better doing two mid-size
groups of 9 and 8 than a group of 15 followed by a group of 2.
This becomes more and more important when coding with a short forced
kf interval though it may not impact our standard tests much.
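The 17-frame example above can be sketched as follows. The helper and its
min_len parameter are hypothetical; the actual trap in the encoder may use
different conditions.

```c
#include <assert.h>

/* Hypothetical helper: length of the next group given the frames left
 * before the key frame and the nominal group length. */
static int next_group_length(int frames_to_key, int nominal_len, int min_len) {
  assert(frames_to_key > 0 && nominal_len > 0 && min_len > 0);
  /* The whole remainder fits in one group. */
  if (frames_to_key <= nominal_len) return frames_to_key;
  /* A very short trailing group would be left: split the remainder evenly. */
  if (frames_to_key - nominal_len < min_len) return (frames_to_key + 1) / 2;
  return nominal_len;
}
```

With 17 frames to the key frame, a nominal length of 15 and an assumed
minimum of 4, the first call returns 9 and the call for the remaining 8
frames returns 8, instead of 15 followed by 2.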
Change-Id: I29d83d6637b203eac69be320dd35a7401a4678c1
This reverts commit 74aaa2389e. Unstable
under valgrind because of uninitialized reads. Limiting the bad bisect range.
Change-Id: I45b32f0ee0ba45795e7efb9947fb805830c8ce0e
- Use bitwise AND (&) instead of logical AND (&&) to
generate correct testing input.
- Fix variance reference function to be consistent with
our codebase implementation.
- Refer to the following issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1166
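To illustrate the first item: masking a random value with & keeps the low
bits, while && collapses it to 0 or 1. The function names are made up for
this sketch and are not taken from the test code.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise AND keeps the masked bits, so the result spans the mask range. */
static uint32_t mask_bitwise(uint32_t raw, uint32_t mask) {
  return raw & mask;
}

/* Logical AND only reports whether both operands are nonzero (0 or 1),
 * so "random" test input built this way takes just two values. */
static uint32_t mask_logical(uint32_t raw, uint32_t mask) {
  return (uint32_t)(raw && mask);
}
```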
Change-Id: I8c1ebb03e22dc9e1dcd96bdf935fc126cee71307
Avoid copy-block when denoising is at LowLow level (i.e., no denoising is done).
Instead, don't enter denoiser at all, and when level goes back up over kLowLow
do a reset in denoiser.
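A small sketch of the level handling described above. Only kLowLow is named
in this message; the other enum values, the action enum, and the function
are assumptions for illustration.

```c
#include <assert.h>

/* Noise levels; everything except kLowLow is assumed here. */
typedef enum { kLowLow, kLow, kMedium, kHigh } NoiseLevel;

typedef enum { DENOISE_SKIP, DENOISE_RUN, DENOISE_RESET_AND_RUN } DenoiseAction;

/* Decide what to do for a frame given the previous and current level. */
static DenoiseAction denoise_action(NoiseLevel prev, NoiseLevel cur) {
  /* No denoising at LowLow: skip the denoiser instead of copying blocks. */
  if (cur == kLowLow) return DENOISE_SKIP;
  /* Coming back up from LowLow: denoiser buffers are stale, reset first. */
  if (prev == kLowLow) return DENOISE_RESET_AND_RUN;
  return DENOISE_RUN;
}
```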
Change-Id: I0544adf58f4dd51ecc4a4607fcb0353bfbbb7a59
only output[0] needs to be set, store_output is more involved than a
movdqa in the high bitdepth case
Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
Avoid doing the mcomp in denoiser if we don't denoise the
block (because of motion/SSE/skin threshold, etc).
This can reduce encoding time (with denoiser enabled) by ~1.5-2%.
Change-Id: Ia699b68dfd37b89cdf3a82b8aa40e8c8f98a3d4f
This makes it more likely clean/low-noise content will
be set as LowLow, and hence no denoising will be done.
Also set early exit on denoising for small blocks.
Change-Id: I4a72bba3e6c5e2d523d304c39deacc9c39bf216c
Some cleanup and bugfix: pass mi_row/mi_col (not mv_col/mv_row)
to build_inter_predictors. This only affects case where
the frame is resized, but since denoising is not done on resized
frames, the fix has no effect currently.
Change-Id: I36617a7f0b43b6f49976745f15d400977e6ffa46
Switch to use new skin model.
And fix condition for denoising skin block.
Previous condition did not denoise skin blocks if the selected
mode was non-zero motion in current frame. Modify condition to
also force no denoising if that mode was not selected as zero motion
now and for at least "x" past frames in a row (x = 2).
Change-Id: I00753e3fe45b9a308a7ef43c58f11868e3bfc6b0
not strictly necessary, but allows projects using '-Wconversion
-Wno-sign-conversion' to reuse these headers.
Change-Id: Id1398d726c90173ccba9aea66798fcef6f20fa23
Change only affects 1 pass, vbr, speed = 5 (real-time mode).
Some improvement for high motion content.
AvgPSNR/SSIM metrics for ytlive set all up, on average ~2%,
some clips (high motion ones) up 4/5%.
Encoder speed down: on mynintendo_x1.1280_720.y4m: 47fps -> 44fps.
Change-Id: I9e3eaa6392dcb6b5b44ee6f43004f97ba859bc11
the vpx_decoder layer guarantees that when called directly this won't
receive NULL data and the reuse via decode() is protected by a NULL data
check and 0 size check (NULL data and non-zero data size is protected by
the vpx_decoder layer).
Change-Id: I7437fb5ca4e4aa431963d55b909d4d920f339be3
The mv is clamped in dec_find_mv_refs() to a smaller region
than the clamp in dec_find_best_ref_mvs(). See clamp_mv_ref
and clamp_mv2.
Change-Id: I47dd5f7fa8b42f2cc593559b4d7c782fe7bcb1db
In multi-thread case, the encoder may crash if using encoder option
tile-rows > 0. To prevent that, force tile-rows=0 in this situation.
This is a workaround for WebM issue 1095:
https://bugs.chromium.org/p/webm/issues/detail?id=1095
The further fix can be done by adding synchronizations after a tile
row is encoded. But this will hurt multi-threaded encoder performance.
So, it is recommended to use tile-rows=0 while encoding with threads
> 1.
Change-Id: I656cbcc200f8d0410d09530e7981ad8f32fe7bc9
This patch was to fix a reported Hangouts deadlock/freezing issue
in VP8 encoder(issue 27232610). The original encoder loopfilter
synchronization happened in the following frame, which was prone
to causing problems in some complex use cases. This patch simplified
the synchronization logic.
More testing needs to be done.
Change-Id: I38fd3f35d11f98fae1e44546aa5e4c6d6e19c4be
Allow the encode loop to select from a wider range of Q values
when encoding normal (non arf or kf) frames.
This change is targeted at improving psycho-visual quality in some
easy sections that are currently not getting enough bits.
This is likely to be a little worse from a metrics perspective and may also
have a small impact on encode speed in cases where extra recode
iterations are triggered.
Change-Id: I667eebf33c753bcbcf8b93596467369e5708b889
Adds a second threshold for recodes even on frames where
recode is normally disabled if there is a big rate miss.
Change-Id: Ifd4a34707da55ec15eb7cfb87de4644b8d76deb2
Fix the threshold for forcing refresh of golden frame based
on high motion. The current comparison was incorrect and
prevented this (force update of gf on high motion) from being used.
For now keep this logic under a flag (and off for now) so as to
not change behavior, until further testing.
Change-Id: Ib5f0082159a428b0603b9534e4bcb6f83e4ccb25
+5.857% BD-RATE on SCREEN_CONTENT
Leaving this off for non-screen content because:
+25.300% on TWITCH120
+37.833% BD-RATE on RTC
Change-Id: Ie0a312182d6cc859fb04298e4cd81d02b39e23fe
For 1 pass vbr mode: Increase the period of gf update on scene
cut (keep it same as original/default setting for now).
Change-Id: I679c3bd21152f6c4e486c8098d931c00e1d26b5f
This is the identical change submitted for vp8 here:
https://chromium-review.googlesource.com/#/c/274107/
Tested this change on Mac OSX (10.10) and Linux
(Linux Mint 17 / Ubuntu 14.04) and in both cases:
- downloaded and compiled latest source for libvpx and ffmpeg
- confirmed ffmpeg would build sub-second frame rate webm files
via the previous patch
- confirmed ffmpeg would *not* build fps < 1 for vp9
- made this change, recompiled libvpx and ffmpeg
- confirmed ffmpeg would now create the same webm with
fps < 1
- confirmed the resulting file would play and was vp9 (e.g.
would not play in Firefox (Linux version complained it was
VP9 but mostly could play it) or older vlc, etc.), but does
play just fine in Google Chrome and a newer version of vlc.
Sorry I didn't catch this last time - but this seems a solid
change and it's handy to be able to create frame rates
less than one frame per second.
-jk
Change-Id: I38fa32148de8c4c359f228cf08b9a4b83b5a52fb
The change https://chromium-review.googlesource.com/#/c/329181/
also changed behavior for cbr mode, which causes some regression
in screenshare test in webrtc.
Resetting the specific change to leave the cbr behavior
unchanged for now.
Change-Id: I52df158806422f86398e1d2f522e92067d8325eb
Some adjustments to inter-mode selection for vbr mode.
Condition some of the bias to low/zero motion on cbr mode, and
don't use int_pro_motion_estimation for golden ref
(treat it same as last ref).
Change only affects 1 pass vbr mode, speed >=5 (non-rd pickmode).
Encoding time increase within ~5%.
Avg PSNR/SSIM on RTC set increase by ~2%, all clips up,
ranging from 0.5 to 4%.
Change-Id: I0048d0104a8816773d91a2b1484d601169d9bad7
Don't advance the svc frame counters on dropped frame,
since this can break the referencing scheme and lead
to a crash/assert.
Updated svc-datarate unittest to add a lower bitrate test.
Change only affects 1 pass cbr svc, with frame dropper enabled.
Change-Id: Ibb7530b7a587a9344d46898d9286fd9e2ef0779c
Use the superframe counter to set the key frame, and force
it to the key frame on base spatial layer only.
Also, update svc frame counters under frame dropping.
Update unittest: add specific tests with short key frame period.
https://bugs.chromium.org/p/webm/issues/detail?id=1150
Change-Id: I5b1c9a09253e6e5fbfce51b4cf603ae22d422b01
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of source.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
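The decision described above could look roughly like this. The helper is
hypothetical and only covers choosing how many optimized 1:2 passes apply;
it does not perform any scaling.

```c
#include <assert.h>

/* Hypothetical helper: number of optimized 1:2 passes that reach the layer
 * size from the source size, or 0 if the general scaler is needed. */
static int num_half_scale_passes(int src_w, int src_h, int dst_w, int dst_h) {
  assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
  /* 1/4 x 1/4 of source: two consecutive 1:2 stages. */
  if (dst_w * 4 == src_w && dst_h * 4 == src_h) return 2;
  /* 1/2 x 1/2 of source: a single 1:2 stage. */
  if (dst_w * 2 == src_w && dst_h * 2 == src_h) return 1;
  return 0;
}
```

For a 1280x720 source with three spatial layers, the 320x180 base layer
takes the two-stage path.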
Change-Id: I9d73f92a4a96927d0f1d6bf75315c1e60513226a
Use sharp filter to generate motion compensated reference for
temporal filtering. It improves the average coding performance of
VP9 speed 0:
derf 0.34%
hevcmr 0.38%
stdhd 0.58%
Change-Id: I1772a051be545de8c343055274e5ca0929d19cda
This commit back ports the fix from
https://chromium-review.googlesource.com/#/c/326940
It corrects the block partition context fetching in rate-distortion
optimization. It improves the average coding performance of speed 0:
derf 0.098%
hevcmr 0.102%
stdhd 0.282%
Change-Id: I8bcc6fe40ba5c6b50a6136daac116dcc738937ec
The double pointer in xd->mi handles this for us.
Cuts encode_superblock()'s self time in half at rt speed 8.
Change-Id: I820dae24efdbf9a140bbeae82e4e2a5850317766
* changes:
x86/convolve.h: remove redundant check in FUN_CONV_2D
x86/convolve.h: replace while w/if for w < 16
x86/convolve.h: change filter[] || chains to |
restore the value for VP9 to 9999 to satisfy the current test
expectations; without this
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/8 will overshoot.
Change-Id: I88dad574ae4ab10f923579824c7347ff468c7045
This reverts commit f51f0998e1.
This causes datarate tests to fail. Some are due to the new default
keyframe distance, another causes an assert even forcing 9999:
[ RUN ] VP9/DatarateOnePassCbrSvc.OnePassCbrSvc3SpatialLayers/0
test_libvpx:
vpx_dsp/x86/vpx_subpixel_8t_intrin_ssse3.c:853: scaledconvolve2d:
Assertion `y_step_q4 <= 32' failed.
Change-Id: I4ee4fea97f47e4f1a23b82a62e6afc6280961e38
Reset the scale factors before build_inter_predictors.
Add datarate tests for 3 spatial layers, which exposed this issue.
Change-Id: I7f81efbe44345ecea9fdd5f639a4cca76aed3874
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of source.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
Change-Id: Ifcf526ec2aaf3e5fa7924588d9dd8660bf02fb46
some configurations may fail if AltRefTest is undefined though
VP8_INSTANTIATE_TEST_CASE is defined away.
Change-Id: I7272775a506718336bd6cee2225cf83bd72fede5
the same as vp8, with the same reasoning from:
2a0d7b1 Reduce the default kf_max_dist to 128.
see also:
https://trac.ffmpeg.org/ticket/4904
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815673
+ restore vpxenc behavior of taking the library default rather than
forcing 5s
This change also exposes an issue with one-pass svc in cbr mode, keep
the old default in datarate_test.cc for now.
Change-Id: Id6d1244f42490b06fefc1a7b4e12a423a1f83e88
* changes:
x86inc.asm: only set visibility for chromium builds
Only use .text sections for aout
Use .text instead of .rodata on macho
Copy PIC handling code from x86_abi_support
Set 'private_extern' visibility for macho targets
Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
Use libvpx defines to set name mangling rules
Customize x86inc.asm for libvpx
Update x86inc.asm from x264
Use the existing scene/content change detection to better
update/adjust golden frame refresh.
Change only affects 1 pass real-time vbr mode, speed >=5.
Change-Id: I2963a5bb7ca4a19f8cf8511b0a925e502f60e014
this restores the previous version's behavior avoiding issues with
builds that may split sources on directory boundaries; protected
visibility may work in this case.
Change-Id: Ie759bd96c9ea5b45613f450dffa6e67eb45f5a8b
The read only sections are getting stripped on some OS X builds. As a
result, random data is used in place of the intended tables.
Change-Id: I4629c90d9e0ae4d4efc193a93be6fb93809ae895
Don't initialize first pass costs for a number of symbols where first
pass probabilities aren't initialized.
This brings a 1.22x first pass speedup.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter
Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter
Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
move to encoder_encode() as vp10_get_compressed_data() allocates data and
would require some modification to make its error return meaningful.
Change-Id: Ia5267c35d16ccd42b6da6d2136402b13e28f9159
move to encoder_encode() as vp9_get_compressed_data() allocates data and
would require some modification to make its error return meaningful.
Change-Id: I8ddc390a1441afd0ff937842fa4ad1053c956133
Add frame-level condition for reference masking: under external or
internal dynamic resize, allow for reference masking if none of
the references have been scaled.
Previously, reference masking was turned off for the stream if dynamic
resize feature was enabled or an external resize event occurred.
reference_masking gives speed up with little/no loss in compression.
For speed 7 on rtc set: encoding time decreases by about 5-7%,
avgPSNR/SSIM goes down ~0.2%.
Change-Id: Ie4444577451ef954414d8fb4b2c99d65cadf1746
This commit fixes issue 1141. The issue was triggered in multi-tile
encoding. The change properly saves and restores the block context
information in the real-time mode selection process. It removes
several redundant memcpy operations in sub8x8 intra block mode search.
Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010
External dynamic resize with swapping width and height was
not handled properly.
Fix is to re-init loop-filter under certain conditions.
Modify unittest to test this case.
Without this change test will fail.
Relates to: https://bugs.chromium.org/p/webm/issues/detail?id=1140
Change-Id: I7d81ca7fe0783b3bc103a52a7b7cf073a96be26e
allocations done within this function are protected with
vpx_internal_error; adding the setjmp fixes a crash in
vp10_lookahead_push() under low memory conditions.
Change-Id: I5515017cd71b218840c506791b3a517da7ffc93e
allocations done within this function are protected with
vpx_internal_error; adding the setjmp fixes a crash in
vp9_lookahead_push() under low memory conditions.
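A self-contained sketch of the setjmp pattern referred to above. All names
here (ErrorCtx, internal_error, checked_alloc, push_frame) are hypothetical
stand-ins and not the libvpx API; the allocation failure is simulated.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical error context, loosely modeled on the pattern described. */
typedef struct {
  int error_code;
  int setjmp_armed; /* nonzero while jmp holds a valid jump target */
  jmp_buf jmp;
} ErrorCtx;

/* Static storage: safe to modify between setjmp() and longjmp(). */
static ErrorCtx g_error;

static void internal_error(ErrorCtx *ctx, int code) {
  ctx->error_code = code;
  if (ctx->setjmp_armed) longjmp(ctx->jmp, 1);
  /* Not armed: control returns to the caller, which may then use NULL. */
}

static void *checked_alloc(ErrorCtx *ctx, size_t size, int simulate_failure) {
  void *p = simulate_failure ? NULL : malloc(size);
  if (p == NULL) internal_error(ctx, 2);
  return p;
}

/* Returns 0 on success or the recorded error code on failure. */
static int push_frame(int simulate_failure) {
  unsigned char *buf;
  g_error.error_code = 0;
  if (setjmp(g_error.jmp)) {
    /* Arrived here via longjmp() from internal_error(). */
    g_error.setjmp_armed = 0;
    return g_error.error_code;
  }
  g_error.setjmp_armed = 1;
  buf = checked_alloc(&g_error, 64, simulate_failure);
  assert(buf != NULL); /* reached only when the allocation succeeded */
  buf[0] = 0;
  free(buf);
  g_error.setjmp_armed = 0;
  return 0;
}
```

If the jump target is never armed, a failed allocation falls through and the
caller dereferences NULL, which is the kind of crash this change addresses.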
Change-Id: I4b79dca37cc7fadc4b7633f0db44c0e406799bc6
An issue exists with reference_masking in non-rd pickmode for spatial
scaling. It was kept off for internal dynamic resizing and svc, this
change is to keep it off also for external dynamic resizing.
Update to external resize test, and update TODO to re-enable this
at frame level when references have same scale as source.
Change-Id: If880a643572127def703ee5b2d16fd41bdbf256c
For dynamic resizing (whether the new codec size is determined internally
or externally set by user), we should for now keep rc.resize_allowed enabled.
This prevents the use of reference_masking for real-time mode
(in set_rt_speed_feature()).
Change-Id: Ibb7c3ff35be88afdf1a3c6db6693521766f177a3
to vp9_setup_pre_planes(), preventing the function
unscaled_value() from being called. unscaled_value()
returns the same value that was passed in. See
scaled_buffer_offset() in vp9_reconinter.h.
Change-Id: I2a6fbaf07972c2f212834929d29a2cbe72e399c3
The bit to error transformation got doubled as a result of going from
8-bit to 9-bit costs (change d13385c).
Use defines to derive the scale numbers and comment some of the fields.
derf: -0.023 BDRATE
hevcmr: +0.067 BDRATE
stdhd: +0.098 BDRATE
(These are substantially smaller than the original gains from 8 to
9 bit costing.)
Change-Id: I6a2b3b029b2f1415e4f90a05709b2333ec0eea9b
When the codec frame size is the same as the reference frame size,
release the scaled reference before assigning it a new buf_idx.
Only affects 1 pass non-svc mode, where the scaled references are
released only under certain conditions (to prevent un-needed scaling
of the references every frame).
Modified a unittest that can trigger this bug without this change.
https://code.google.com/p/chromium/issues/detail?id=582598
Change-Id: I9a884e36ebd7608b1641ec2a469e20a4f829cf43
If the application changes frame size (external size changes),
and aq-mode=3 is on, reset the cyclic refresh.
Modify the TestExternalResize unittest (longer run with more resize
actions). Without this change an assert would be triggered on this
longer test.
Change-Id: I0eefd2cd7ffa0c557cca96ae30d607034a2599ce
Fixes an issue where the tx_type was not set correctly for
sub8x8 inter and intra blocks. In the current syntax, for
sub8x8 blocks, there is still a single tx_type that is
transmitted. Ideally, this should be searched for the best
rd performance, albeit at the expense of encode speed.
For now, we just set it to DCT_DCT. Previously it was left
incorrectly as what was used for the previous non sub8x8
block.
derflr: BDRATE -0.277%
Change-Id: If76ba903bfbfd4d374cf1ac7d1daee50e92f0edd
Make this consistent with regular block size rate-distortion
optimization. It improves the compression performance:
derf 0.055%
hevcmr 0.129%
Change-Id: I112fe734f592c21bc7aa6efb7e3f269c4214ee7b
For 1 pass real-time mode. No change in behavior as only last
and golden are used as references in 1 pass real-time mode.
Change-Id: Ie4655014eee1a8b271542f29d74b2c6f7fed54c9
the results along the top and left border are then stored with a moving
window into the vector.
~40-67% faster on ARM, ~40-77+% on x86 depending on the block size.
Change-Id: Iab369aa2946a3ae4eb7290d512868fe5db92dbc8
delete apply_cyclic_refresh_bitrate(). unused since:
3472cbb vp9 aq-mode=3: Keep it on even at low bitrates.
Change-Id: I0fac9a31b59504e31000ac3a8f0b68e8d4320113
The definition is for the number of frames to check to determine the
recent decay rate, further to determine the next key frame in the
first pass of the encoder.
Change-Id: Ic696d6eb518a86fa296842273cf8767ef0b0e27a
when INLINE is defined and mips is not being targeted. otherwise keep
the old --enable-extra-warnings behavior
Change-Id: Iba576edbe5fca03efa56ce99eee11f9cafc573ad
-use larger threshold on y (as in vp8).
-add distance threshold for each cluster
-use larger skin distance threshold for first cluster
-add some early exit checks.
Keep default setting to model=0.
Change-Id: I1044b99ade4bb1f215a860a019a4d84cee2f7715
It improves the compression performance of VP9 by 0.1% across all
test sets. No speed change is observed.
Change-Id: I59338c5c9e67bae22188f35fc3afbfe2a6bba6b0
The postproc vp9_denoise() is a spatial denoise/blur function.
It was not intended to be used if temporal denoising is enabled.
Change-Id: I97d2dcb941e7cc49bbafce99d9286beb2693249d
Put check to avoid possible out of bounds when looping
over the blocks to estimate noise level.
No change in behavior.
Change-Id: I4b7b19b7edee0ae1c35b9dc0700b1bf9b304d7f5
* changes:
configure: extend armv7 hf target autodetect
configure: remove default CROSS for arm targets
configure: avoid default when CROSS is set to null
This commit changes SSSE3 optimized idct8x8 functions to work with
highbitdepth build.
With this commit and the previous one that enabled SSSE3 idct32x32
functions, tests showed virtually no difference on decoding speed for
file fdJc1_IBKJA.248.webm for the build with --enable-vp9-highbitdepth
option and the build without the option.
Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
This commit changes the SSSE3 assembly functions for idct32x32 to
support highbitdepth build.
On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference
between hbd and lbd build from between 3-4% to 1-2%.
Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
the lookahead buffer allocation is deferred to receipt of the first
frame to allow profile changes. if the encoder was flushed before
supplying any frames the encoder would crash trying to dereference the
NULL buffer. vp8 is unaffected.
fixes mozilla bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1237848
Change-Id: Icee4b64de760476eee0d33b568f0a1010335ff13
Use multiple clusters instead of one and decrease
the distance thresholds.
Add a define to switch between models.
Default is set to existing (1 cluster) model.
Change-Id: I802cd9bb565437ae8983ef39453939f5d5073bb1
If a superblock contains a lot of "skin" then force split
of 64x64 partition, and make some adjustments in mode selection.
This helps to reduce artifacts on moving face/skin areas at low bitrates.
Little/no change in metrics: avgPSNR/SSIM down by ~0.12%.
Small encoding time increase < 1%.
Change-Id: Ic57f52148c3716f391419fab0530d916e4c1d186
For aqmode=3, golden period update is set based on period of cyclic refresh.
Put a limit on max golden period update, for now set to 40.
And fix comment.
Change-Id: Icb61dd87c796cce2a5f5f7331c6a129540994696
Limit oscillation detection in the case where overshoot is very very
large.
This keeps the 9-bit cost patch from breaking the DownUp resize test.
The patch pushed us to an 11% undershoot right before a scene cut
causing a 1200% overshoot. (Whereas before we were undershooting by
only 6% before overshooting by 1200%).
Change-Id: Id90ccfab8aba872ccadc45b73b3bb097b895677f
In inter mode search skip all modes except NEARESTMV and DC_PRED.
10% less encode latency for large frames using the chromium remoting_perftests.
+0.313% BDRATE on the screencast set at speed -6.
Change-Id: Ib97a39dd8bcdeab545509e0e02d78ce7033f8c63
Remove comment(s) and enable frame-dropper for tests.
Frame dropper for 1 pass svc was fixed a while ago:
https://chromium-review.googlesource.com/#/c/309230/
Change-Id: I5fd3192825b22e562db9210d3dc7b246a1799d8d
Make it consistent with the comment/intended behavior,
that is, only denoise if current block is zero_mv.
Change-Id: I3909761e802e80089752a493ab3646dc32698ded
Changes to mode selection for 1 pass SVC mode:
use base layer motion vector, changes to intra-prediction.
Change-Id: I3e883aa04db521cfa026a0b12c9478ea35a344c9
This patch fixes a bug that causes the loop filter search to reset to
a low value or zero after each arf overlay frame. We expect the overlay
frames to need little or no loop filtering but this should not propagate.
Change-Id: I895b28474cf200f20d82793f3de40b60b19579fd
This is a pure-refactor in preparation to potentially raise the bit-cost
resolution.
Verified at good speed 0 and rt speed -6.
Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
More aggressive on avoiding denoising on skin.
May supplement this later by adding condition on consec_zeromv.
Change-Id: Ied92b332f9b24e821d2009f81d1565758588d9a5
Different quality levels are used for different regions in
the frame depending on how far they are vertically from the
center. Specifically, three segments are used based on the
mi_row index with respect to the number of mi_rows in
the frame.
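A sketch of a three-segment mapping by vertical position. The band
boundaries (outer eighth and next eighth of the rows on each side) are
assumptions chosen for illustration; the message does not state them.

```c
#include <assert.h>

/* Hypothetical mapping from mi_row to one of three segments:
 * 0 = band around the vertical center, 1 = intermediate, 2 = near edges. */
static int segment_for_row(int mi_row, int mi_rows) {
  assert(mi_rows > 0 && mi_row >= 0 && mi_row < mi_rows);
  /* Distance to the nearest horizontal frame edge, in mi units. */
  const int from_top = mi_row;
  const int from_bottom = mi_rows - 1 - mi_row;
  const int edge_dist = from_top < from_bottom ? from_top : from_bottom;
  if (edge_dist < mi_rows / 8) return 2; /* farthest from the center */
  if (edge_dist < mi_rows / 4) return 1;
  return 0;
}
```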
Change-Id: Ifc8b777bc58ea8521dffc4640360c67d99f8d381
This reverts commit ea48370a50, reversing
changes made to 15939cb2d7.
The commit was insufficiently tested and causes failures.
Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
This commit adds the logic for segmentation map initialization and
disables temporal update of the segmentation map when error-resilient
mode is on. It fixes the enc/dec mismatches (release build) and
assertions (debug) when both aq-mode and error-resilient are on.
Change-Id: Id2155e8b28962cf1f64494f4df0c8d79499b6890
Prior to this patch, read_inter_block_mode_info() would
find the nearmv and nearestmv for all modes. Now it does not
search for ZEROMV modes and breaks out early for NEARMV and
NEWMV modes.
Change-Id: Ifa7b1eaf58bb03b9c7792ea5012fef477527d0fd
There are flaws in current implementation of VP8 multithreading encoder
and decoder as reported in the following issue:
https://code.google.com/p/chromium/issues/detail?id=158922
Although the data race warnings are harmless, and wouldn't cause real
problems while encoding and decoding videos, it is better to fix the
warnings so that VP8 code could pass the TSan test.
To synchronize the thread-shared data access and maintain the speed
(i.e. decoding speed), use multiple mutexes based on mb_rows to reduce
the number of synchronizations needed, make the reads and writes of
the shared data protected, and reduce the number of mb_col writes by
nsync times.
The decoder speed tests showed < 3% speed loss while using 2 ~ 4
threads.
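The effect of nsync on shared progress writes can be sketched without any
threading. The helper below only counts how often a row would publish its
column position; the rule used (every nsync-th column plus the last one) is
an assumption about the scheme, and each such write would be done under the
row's mutex.

```c
#include <assert.h>

/* Hypothetical count of shared mb_col progress writes for one mb row. */
static int count_progress_writes(int mb_cols, int nsync) {
  int writes = 0;
  int mb_col;
  assert(mb_cols > 0 && nsync > 0);
  for (mb_col = 0; mb_col < mb_cols; ++mb_col) {
    /* Publish only every nsync-th column, and always the final column so
     * the thread working on the next row is never left waiting. */
    if (mb_col % nsync == 0 || mb_col == mb_cols - 1) ++writes;
  }
  return writes;
}
```

For an 80-column row, nsync = 1 gives 80 writes and nsync = 4 gives 21.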
Change-Id: Ie296defffcd86a693188b668270d811964227882
The nominal tx_type for a given mode is used as a context
to encode the actual tx_type for intra.
Results:
derflr: -0.241% BDRATE
hevcmr: -0.366% BDRATE
Change-Id: Icfe7b0a58d79bc6497a06e3441779afec6e01e21
This commit enables encoder to avoid 8x4 and 4x8 partitions for
scaled reference frames when libvpx is configured and built with
--enable-better-hw-compatibility
Change-Id: I02ad65c386f5855f4325d72570c49164ed52f413
Move the logic for forcing zero_mode after the
(ref_frame & flag_list) check.
This was causing a memory leak under msan:
https://bugs.chromium.org/p/webrtc/issues/detail?id=5402
Change-Id: Ie9d243369f8ed7c332f46178275945331da4fd85
Under --enable-better-hw-compatibility, this commit adds the asserts
that no mv clamping is applied for scaled references, so when built
with this configure option, the decoder will assert if an input bitstream
triggers mv clamping for scaled reference frames.
Change-Id: I786e86a2bbbfb5bc2d2b706a31b0ffa8fe2eb0cb
This commit adds a new configure option:
--enable-better-hw-compatibility
The purpose of the configure option is to provide information on known
hardware decoder implementation bugs, so encoder implementers may
choose to implement their encoders in a way to avoid triggering these
decoder bugs.
The WebM team was made aware that a number of hardware decoders
have trouble handling the combination of scaled reference
frames and 8x4 or 4x8 partitions. This commit adds asserts to the vp9
decoder, so when built with the above configure option, the decoder
asserts if an input bitstream triggers such a decoder bug.
Change-Id: I386204cfa80ed16b50ebde57f886121ed76200bf
Add function to compute skin map for a given block, as it's
used in several places (cyclic refresh, noise estimation, and denoising).
Change-Id: Ied622908df43b6927f7fafc6c019d1867f2a24eb
Set initial values for these parameters in the vp9_init_layer_context().
This also fixes an issue in the svc-bypass mode when frame flags are
passed via the vpx_codec_encode().
Change-Id: I0968f04672f8d3d2fe2cea6b8a23f79f80d7a8b1
Otherwise, per-segment lossless might mean that some segments are not
lossless and they could still want to use another mode. The per-block
tx points remain uncoded on blocks where (per the segment id) the Q
value implies lossless.
Change-Id: If210206ab1fe3dd11976797370c77f961f13dfa0
For coding block sizes <=16X16, if the block is determined to be skin,
then always allow for that block to be candidate for refresh. So if that
block happens to be on the boost segment(s), segment won't get reset to 0
and delta-q will be applied.
PSNR/SSIM metrics neutral (little/no change) on RTC clips.
Speed increase small/negligible (< 1%).
Some visual improvement on faces in a few RTC clips.
Change-Id: I6bf0fce6f39d820b491ce05d7c017ad168fce7d6
arm-none-linux-gnueabi- is an anachronism and makes building on native
arm platforms more difficult. further, many distros include alternative
cross compilers, e.g., arm-linux-gnueabihf-, so the choice is best left
up to the user.
Change-Id: Id8aaf820ed112b85db2b8518d0e9d8abee1ad85c
avoids picking up defaults if CROSS is forcibly set empty as in:
$ CROSS= ./configure ...
BUG=1121
Change-Id: I6af91959288dede01efe3e5945698ab249eb6ec3
reduce the register count by 1 to avoid xmm6 and unnecessarily
penalizing the other users of the base macro
Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
In 32-bit build with --enable-shared, there is a lot of
register pressure and register src_strideq is reused.
The code needs to use the stack based version of src_stride,
but this doesn't compile when used in an lea instruction.
This patch also fixes a related segmentation fault caused by the
implementation using src_strideq even though it has been
reused.
This patch also fixes the HBD subpel variance tests that fail
when compiled without disable-optimizations.
These failures were caused by local variables in the assembler
routines colliding with the caller's stack frame.
Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
H/V intra mode was only enabled for bsize < 16x16,
enable it also for bsize=16x16.
Metrics are neutral with this change:
Overall very small gain (0.1%), small visual gain on some RTC clips.
Change-Id: Ib2d7a44382433bfc11cf324aa3cc5c382ea9e088
For testing implemented a fixed pattern and delta, 1 pass,
fixed Q, low delay mode.
This has not in any way been tuned or optimized.
Change-Id: Icf9b57c3bb16cc5c0726d5229009212af36eb6d9
(copied from VP9)
The one pass VBR mode selects a Q range based on a
moving average of recent Q values. This calculation
should have been excluding arf overlay frames as these
are usually coded at the highest allowed value. Their
inclusion skews the average and can cause it to drift
upwards even when the clip as a whole is undershooting.
As such it can undermine correct adaptation of the allowed
Q range especially for easy content.
Change-Id: I9e12da84e12917e836b6e53ca4dfe4f150b9efb1
For testing implemented a fixed pattern and delta, 1 pass,
fixed Q, low delay mode.
This has not in any way been tuned or optimized.
Change-Id: Idf5ee179b277fa15d07a97f14f2ce5bbaae80a04
The one pass VBR mode selects a Q range based on a
moving average of recent Q values. This calculation
should have been excluding arf overlay frames as these
are usually coded at the highest allowed value. Their
inclusion skews the average and can cause it to drift
upwards even when the clip as a whole is undershooting.
As such it can undermine correct adaptation of the allowed
Q range especially for easy content.
Change-Id: I7d10fe4227262376aa2dc2a7aec0f1fd82bf11f9
The culprit is on the decode side: the xd->lossless[i] setup was in the
wrong location, where segment features are not yet decoded.
Also on the encoder side, transform mode was not set consistently
between when tx_mode is selected and how tx_mode is enforced in
tx size selection.
Change-Id: I4c4c32188fda7530cadab9b46d4201f33f7ceca3
Keep track of frame indexes for the references, and
constrain inter mode search for reference with same
temporal alignment.
Improves speed by about ~15%, no noticeable loss in
compression performance.
Change-Id: I5c407a8acca921234060c4fcef4afd7d734201c8
Lower the threshold for splitting 32x32->16x16 based on average variance,
and add lower bound condition for this split to occur. This prevents
unnecessary splitting for areas with very low variance.
Change-Id: Ibeb33b3d993632c2019f296eb87ef3b7e3568189
For non-rd variance partition, speed >= 5:
Adjustments to reduce dragging artifact of background area near
slow moving boundary.
-Decrease base threshold under low source noise conditions.
-Add condition to split 64x64/32x32 based on average variances
of lower level blocks.
PSNR/SSIM metrics go down ~0.7/0.9% on average on RTC set.
Visually helps to reduce dragging artifact on some rtc clips.
Change-Id: If1f0a1aef1ddacd67464520ca070e167abf82fac
Reallocate the xmm register usage so that no ARCH_X86_64 required.
Reduce memory access to the left neighbor by half.
Speed up by single digit on big core machine.
Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
This commit makes the sub8x8 block rate-distortion optimization
scheme use precise motion compensated prediction to compute the rd
cost. It fixes a potential buffer overflow issue related to sub8x8
motion search on scaled reference frame.
Change-Id: I4274992ef4f54eaacfde60db045e269c13aaa2de
GET_GOT modifies the stack pointer so the offset for left's address will
be wrong if loaded afterward.
Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by single digit in ./test_intra_pred_speed on big core
machines.
Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
Xcode 7 refuses to link to x86 and x86_64 code that's built for
iphone sim, so add an extra command line flag that forces iosbuild
to use darwin15 targets.
Change-Id: I2228d458f5cccf4d26866040380a974f88d9d360
This commit enables the new temporal filter system for VP9. For
speed 1, it improves the compression performance:
derf 0.54%
stdhd 1.62%
Change-Id: I041760044def943e464345223790d4efad70b91e
This change has been imported from VP9 and
alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most natural video clips, however, where the step search
is performing well, the quality gain and speed impact are small.
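A minimal sketch of the staged mesh search described above, assuming a
caller-supplied distortion callback. The names Mv, MeshStage, CostFn,
mesh_stage, mesh_search and demo_cost are hypothetical and are not taken
from the encoder source; demo_cost is a toy distortion (L1 distance to a
target vector) included only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int row, col; } Mv;               /* hypothetical MV type */
typedef struct { int range, interval; } MeshStage; /* one stage of the mesh */
typedef unsigned int (*CostFn)(Mv mv, const void *ctx);

/* Toy distortion: L1 distance to a target vector passed through ctx. */
static unsigned int demo_cost(Mv mv, const void *ctx) {
  const Mv *target = (const Mv *)ctx;
  return (unsigned int)(abs(mv.row - target->row) +
                        abs(mv.col - target->col));
}

/* Scan a square grid of +/- range around center using the stage interval.
 * best_cost holds the cost of center on entry and of the result on exit. */
static Mv mesh_stage(Mv center, MeshStage s, CostFn cost, const void *ctx,
                     unsigned int *best_cost) {
  Mv best = center;
  assert(s.interval > 0 && s.range >= 0);
  for (int r = -s.range; r <= s.range; r += s.interval) {
    for (int c = -s.range; c <= s.range; c += s.interval) {
      const Mv cand = { center.row + r, center.col + c };
      const unsigned int d = cost(cand, ctx);
      if (d < *best_cost) {
        *best_cost = d;
        best = cand;
      }
    }
  }
  return best;
}

/* Run the mesh stages only when the step search result is above the
 * threshold; each stage is centered on the best position so far. */
static Mv mesh_search(Mv step_best, unsigned int step_cost,
                      unsigned int threshold, const MeshStage *stages,
                      int num_stages, CostFn cost, const void *ctx) {
  Mv best = step_best;
  unsigned int best_cost = step_cost;
  if (step_cost <= threshold) return best; /* step search was good enough */
  for (int i = 0; i < num_stages; ++i)
    best = mesh_stage(best, stages[i], cost, ctx, &best_cost);
  return best;
}
```

With the three example stages, the first stage evaluates 33x33 positions
and the later ones 33x33 and 31x31, compared with 129x129 positions for a
single +/- 64 search at interval 1.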
Change-Id: Iac24152ae239f42a246f39ee5f00fe62d193cb98
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.
Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
Fix copied over from VP9 master to VP10 master.
Do not reset the alt ref active flag when overlaying the middle
arf(s) of a multi arf group.
Change-Id: I1b7392107e7c675640d5ee1624012f39cc374c58
use CONFIG_VP[89] to protect white-box tests and drop redundant
uses of CONFIG_VP9 in variable assignments within that block
Change-Id: Id3c6cf5c7822aa161b19768b295f58829a1c6447
For non-rd variance partition: Adjust variance threshold based
on noise level estimate. This change allows the adjustment to be
updated more frequently.
Change-Id: Ie2abf63bf3f1ee54d0bc4ff497298801fdb92b0d
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
The range_check is not used because the bit range
in fdct# is not correct. Since we are going to merge in a new version
of fdct# from nextgenv2, we won't fix the incorrect bit range now.
Change-Id: I54f27a6507f27bf475af302b4dbedc71c5385118
For low resolutions, when 4x4downsample is used for variance,
use the same force split (that is used for 8x8downsample) for 16x16 blocks.
No change in metrics. Small improvement visually.
Change-Id: I915b9895902d0b9a41e75d37fee1bf3714d2366d
the loop filter level is transmitted as 6-bits + sign so needs to be clamped in
the delta + absolute case.
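A minimal sketch of the clamping described above, assuming a per-segment
value that either replaces the frame-level value (absolute) or is added
to it (delta). The function and parameter names are hypothetical; the
upper limit would be 63 for a 6-bit loop filter level.

```c
#include <assert.h>

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* seg_data is signed (magnitude bits + sign), so both the absolute and
 * the delta result can fall outside [0, max_level] and must be clamped. */
static int segment_level(int frame_level, int seg_data, int is_absolute,
                         int max_level) {
  assert(max_level >= 0);
  const int level = is_absolute ? seg_data : frame_level + seg_data;
  return clamp_int(level, 0, max_level);
}
```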
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1224363
Change-Id: Icbdca4fdbf043466429bd5c9d59dbe913bf153bc
the quantizer is transmitted as 7-bits + sign so needs to be clamped in
the delta + absolute case.
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1224361
Change-Id: I9115f5d1d5cf7e0a1d149d79486d9d17de9b9639
This is so we may update level at any time (e.g., to be used
for setting thresholds in variance-based partition).
Change-Id: I32caad2271b8e03017a531f9ea456a6dbb9d49c7
Under certain denoising conditions, check for re-evaluation of
zero_last mode if best mode was golden reference.
Change-Id: Ic6cdfd175eef2f7d68606300c7173ab6654b3f6e
Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed
with the same instruction size.
Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
For non-rd variance partition: only allow minmax computation
(which currently has no arm-neon optimization) for speeds < 8.
Performance loss is small: On RTC set with speed 8, few clips lose ~2/3%,
average loss is < 1%.
Change-Id: Ia9414f4d0b77dc83c3e73ca8de5d903f64b425ce
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
Change initial state of noise level, and only update
denoiser with noise level when estimate is done.
Change-Id: If44090d29949d3e4927e855d88241634cdb395dc
For denoising, and for noise level above threshold, re-evaluate
ZEROMV for mode selection after denoising.
Current change only does this check if selected best mode (before denoising)
was intra.
Change-Id: I4b1435b68d26c78f7597b995ee7bff0ddd5f9511
Always round sum error and sum square error toward zero in variance
calculations. This prevents variance from becoming negative.
Avoiding rounding variance at all might be better but would be far
more invasive.
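A small worked sketch of why the rounding direction matters. It assumes
the sum is scaled down by 2 bits and the sum of squares by 4 bits before
the variance is formed; those shift amounts and the function names are
illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest right shift; this sketch only uses it for v >= 0. */
static int64_t scale_nearest(int64_t v, int bits) {
  assert(v >= 0 && bits > 0);
  return (v + ((int64_t)1 << (bits - 1))) >> bits;
}

/* C integer division truncates, i.e. rounds toward zero for either sign. */
static int64_t scale_toward_zero(int64_t v, int bits) {
  return v / ((int64_t)1 << bits);
}

/* variance = sse - sum^2 / n, computed on the scaled-down statistics. */
static int64_t scaled_variance(int64_t sum, int64_t sse, int n,
                               int toward_zero) {
  const int64_t s =
      toward_zero ? scale_toward_zero(sum, 2) : scale_nearest(sum, 2);
  const int64_t q =
      toward_zero ? scale_toward_zero(sse, 4) : scale_nearest(sse, 4);
  return q - (s * s) / n;
}
```

For a 4x4 block with fourteen differences of 7 and two of 8, sum = 114
and sse = 814. Rounding to nearest gives 51 - 841/16 = -1, while rounding
toward zero gives 50 - 784/16 = 1.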
Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
This change makes sure last reference with zero mv
is always checked for mode selection.
No change in metrics.
Change-Id: Iaf01877bf34272b966c78bfe18daad882a0a419e
the final sum may use up to 26 bits
+ add a unit test
+ disable the sse2 as the result will rollover; this will be fixed in a
future commit
Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
Change only affects 1 pass CBR.
On key frame, temporal layer_id is reset to 0 for 1 pass CBR,
but since "layer" is reset, the svc.layer_context[layer].is_key_frame
was not correspondingly set properly.
Change-Id: I08f6da0a55ac7429ccfbaddfb7be14479e43543b
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
--disable-XXX has the effect of disabling all extensions above it, e.g.,
--disable-ssse3 disables ssse3-avx2.
Change-Id: If02b44ca71ee12e4acb12010db8593a7989f2a9d
Small changes to the best quality default speed trade off.
Some speedup settings are worthwhile even for best quality as they
have only a very small impact on quality but a significant impact on
encode time.
These changes give as much as a further 50-60% increase in encode
speed for my test animation clip with minimal impact on quality.
For this sequence these changes improve the best quality encode speed
to about the same level as good quality speed 0 in Q3 2015 whilst
retaining the large quality gain of over 1 db
For many natural videos though the quality difference from good 0
to best is much smaller.
Change-Id: I28b3840009d77e129817a78a7c41e29cb03e1132
This is simpler than the previous scheme, which tried to allocate
the CRITICAL_SECTION struct in a thread-safe manner before it
could use it to run the wrapped function in a thread-safe manner.
Change-Id: I172e5544e5f16403a3a0e5e2b9104b1292a0d786
This change alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.
Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
This reverts commit 380a5519cc.
This causes an assertion failure in debug_check_frame_counts() which
probably isn't valid with this change; leaving the investigation for
later now.
Change-Id: Ieda5ca811ed2fa50a0cc6935919a8d10dca996e0
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
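The three table properties listed above could be checked with something
like the following sketch. The function name and the table layout (each
component table passed as a pointer to its centre element, so indices
-max_mv..max_mv are valid) are assumptions made for illustration.

```c
#include <assert.h>

/* Returns 1 when all three properties hold, 0 otherwise. */
static int sad_cost_tables_ok(const int joint[4], const int *comp0,
                              const int *comp1, int max_mv) {
  assert(max_mv >= 0);
  /* Joint cost: entries 1, 2 and 3 must be equal. */
  if (joint[1] != joint[2] || joint[2] != joint[3]) return 0;
  for (int i = -max_mv; i <= max_mv; ++i) {
    if (comp0[i] != comp1[i]) return 0;  /* equal per-component cost */
    if (comp0[i] != comp0[-i]) return 0; /* cost function is even */
  }
  return 1;
}
```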
Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
the return value of enabled, which may be empty, is handled by the for
loop. this avoids making an unnecessarily long command line which may
fail in certain cases.
Change-Id: Ib88ecbbe2c0f6d7debb600b4caed4884497263b1
Change is only for real-time mode, speed >= 5, and non-screen content mode.
Add bias to zero/low motion for big blocks, if noise estimation
is enabled and noise level is above threshold.
Change-Id: I3a0a4608ede6aa535bda6eca528d20f8aba738e7
For 1 pass CBR mode: increase waiting time after key frame
before we start sampling rate control behavior for determining
resize. This change needs to disable one internal resize (DownUp)
temporarily since it requires a longer clip to do so.
Change-Id: If21beda1be23f169ee541ab4dd642f718347887a
Use same setting for speed 5 (as it is for speed > 5).
Change is only for real-time (non-rd) mode.
Change-Id: I830250eac654328373cb318baa89d4f0e63942e1
Reduces Linux perf estimated cycle count for pack_mb_tokens on a
lossless encode on my desktop from 61858501855 to 48154040219 or from
26% of the overall profile to 21%.
Change-Id: I9ca3426d7e3272bc7f7030abda4f0d0cec87fb4a
This reverts commit f1342a7b07.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
Add threshold/condition on spatial_variance and brightness level.
Modification to normalization of block variance.
Change resolution limit below which we disable noise estimation.
Change-Id: If5be08a26ceda351242d8a58d2f0bc88c0a918f0
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
Change is only for real-time mode, speed > 5, and non-screen content mode.
Bias is based on block size and motion vector level (motion above some threshold).
Helps to improve stability in background from lighting changes.
PSNR/SSIM metrics on RTC set almost no change/neutral (within +/- 0.1).
Change-Id: I7eac13c1ae10be4ab1f40acc7f9f1df5653ece9d
Only use non-zero threshold(s) for breakout if
the motion level of the current tested mode is low.
Change-Id: I22aae961cc42371b49d3f648560181cc54708502
Source noise level estimate is also useful for
setting variance encoder parameters (variance thresholds,
qp-delta, mode selection, etc), so allow it to be used also
if denoising is not on.
Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
(cherry picked from commit ca163b85bb)
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
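The parsing described above can be reproduced in isolation. This sketch
assumes p is an 8-bit unsigned value, as the "truncates to one byte"
remark implies; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: parsed as p = (0 ? 0 : p - 1). The condition is the constant
 * 0, so the result is always p - 1, truncated when stored back into p. */
static uint8_t old_workaround(uint8_t p) {
  p = 0 ? 0 : p - 1;
  return p;
}

/* The same behaviour written out explicitly. */
static uint8_t explicit_form(uint8_t p) {
  return (uint8_t)((p - 1) & 0xFF);
}

/* Returns 1 if both forms agree for every possible byte value. */
static int forms_agree(void) {
  for (int v = 0; v < 256; ++v) {
    if (old_workaround((uint8_t)v) != explicit_form((uint8_t)v)) return 0;
  }
  return 1;
}
```

In particular, an input of 0 yields 255 rather than 0, which is why the
old expression reads as misleading.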
Change-Id: I587c44dd61c1f3767543c0126376f881889935af
Width and height of downscaling resolution should not be lower
than min_width and min_height which can be set as needed, both
are 180 for now.
Change-Id: I34d06704ea51affbdd814246e22ee8d41d991f00
This reverts commit 7f56cb2978.
It causes uninitialized reads in the first pass setting up later cost tables.
Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9
Remove delta index 254 from probability remapping and subexp coding.
Saves 1-bit when the delta index is 129.
Change-Id: I88aba565fc766b1769165be458d2efd3ce45817e
Adjust variance threshold, delta-qp, and intra penalty cost,
based on estimated noise level in source.
Replace denoising_on with a level value=L/M/H.
Change-Id: I0c017dae75a5d897367d2c42dec26f2f37e447c1
The option exists specifically to allow for configurations
where the build environment is different from the configure
environment.
Change-Id: I95196fa3c49700251d10ff5d256dc7380e39d0c4
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://code.google.com/p/webm/issues/detail?id=1089
Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572
The custom LCG is based on the POSIX recommended constants for a 16-bit
rand(). This implementation uses less computation than typical standard
library procedures which have been extended for 32-bit support, is
guaranteed to be reentrant, and identical everywhere.
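A minimal sketch of such a generator, using the constants from the
example rand() in the POSIX specification. The type and function names
are hypothetical; keeping the state in a caller-owned struct is what
makes this sketch reentrant.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t state; } Lcg16; /* hypothetical name */

static void lcg16_seed(Lcg16 *g, uint32_t seed) {
  assert(g != 0);
  g->state = seed;
}

/* Returns a value in [0, 32767]. The unsigned multiply wraps modulo 2^32,
 * so the sequence is the same on every platform. */
static uint32_t lcg16_next(Lcg16 *g) {
  g->state = g->state * 1103515245u + 12345u;
  return (g->state / 65536u) % 32768u;
}
```

Seeded with 1, the first two outputs are 16838 and 5758.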
Change-Id: I3140bbd566f44ab820d131c584a5d4ec6134c5a0
Ref: http://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html
Bug relating to issue:- http://b/25090786
base_frame_target is supposed to track the idealized bit
allocation based on error score and not the actual bits
allocated to each frame.
The clamping of this value based on the VBR min and max pct values
was causing a bug where in some cases the loop that adjusts the
active max quantizer for each GF group was running out of bits at
the end of a KF group. This caused a spike in Q and some ugly artifacts.
A second change makes sure that the calculation of the active
Q range for a group DOES, however, take account of clamping.
Change-Id: I31035e97d18853530b0874b433c1da7703f607d1
Periodically estimate noise level in source, and only denoise
if estimated noise level is above threshold.
Change-Id: I54f967b3003b0c14d0b1d3dc83cb82ce8cc2d381
Add the row and column index to the argument list of unit functions
called by foreach_transformed_block wrapper. This avoids the
repeated internal parsing according to the block index.
Change-Id: Ie7508acdac0b498487564639bc5cc6378a8a0df7
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.
Further 2 optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
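The second optimization can be illustrated with a scalar sketch: sum the
squared differences in 32 bits, then redo the work in 64 bits only when
a conservative bound says the 32-bit sum might have wrapped. This is a
stand-in under stated assumptions, not the AVX routine; the function
names and the bound-based detection are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference path: full 64-bit accumulation. */
static uint64_t block_error_64(const int16_t *a, const int16_t *b,
                               size_t n) {
  uint64_t acc = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t d = (int64_t)a[i] - b[i];
    acc += (uint64_t)(d * d);
  }
  return acc;
}

/* Speculative path: 32-bit accumulation with a 64-bit fallback. */
static uint64_t block_error_speculative(const int16_t *a, const int16_t *b,
                                        size_t n, int *used_fallback) {
  uint32_t acc = 0; /* unsigned, so wraparound is well defined */
  uint32_t max_abs = 0;
  assert(used_fallback != 0);
  for (size_t i = 0; i < n; ++i) {
    const int32_t d = (int32_t)a[i] - b[i];
    const uint32_t ad = (uint32_t)(d < 0 ? -d : d); /* at most 65535 */
    if (ad > max_abs) max_abs = ad;
    acc += ad * ad; /* a single square always fits in 32 bits */
  }
  /* If n * max_abs^2 fits in 32 bits, the sum cannot have wrapped. */
  const uint64_t bound = (uint64_t)max_abs * max_abs * (uint64_t)n;
  *used_fallback = bound > UINT32_MAX;
  return *used_fallback ? block_error_64(a, b, n) : (uint64_t)acc;
}
```

The bound is deliberately pessimistic: it can trigger the fallback when
no overflow occurred, but it never accepts a wrapped 32-bit result.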
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
This causes the output of find_ref_mvs() to always be unique or zero.
A nice side-effect of this is that it also causes the output of
find_ref_mvs_sub8x8() to be unique-or-zero, and it will not ignore
available candidate MVs under certain conditions.
See issue 1012.
Change-Id: If4792789cb7885dbc9db420001d95f9b91b63bfa
Added optimization of the 8 bit assembly quantizer routines. This makes
these functions up to 100% faster, depending on encoding parameters.
This patch makes the encoder faster in both the high bitdepth and 8bit
configurations. In the high bitdepth configuration, it affects profile 0
only.
Based on my profiling using 1080p input the net gain is between 1-3% for
the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
depending on target bitrate. The difference between the 8 bit and high
bitdepth configurations for the same encoder run is reduced by 1% in all
cases I have profiled.
Change-Id: I86714a6b7364da20cd468cd784247009663a5140
VP8E_UPD_ENTROPY, VP8E_UPD_REFERENCE and VP8E_USE_REFERENCE have been
deprecated since the initial public release
Change-Id: Ied16b441eec13434d85f1ab115d49ccaf5f2f7b0
Some more testing of this patch would probably be useful, but I
think the basics of it should work fine now.
See issue 1035.
Change-Id: I4a36d58f671c5391cb09d564581784a00ed26245
This experiment allows using full above/right edges for all transform
sizes whenever available (for d45/d63), and adds bottom/left edges for
d207.
See issue 1043.
Change-Id: I5cf7f345e783e8539bb6b6d2c9972fb1d6d0a78b
In VP9, the ref MV had to point to a block that itself fully resided
within the visible image, i.e. all borders of the image had to be
within the visible borders of the coded frame. This is somewhat
illogical, and had obscure side effects, e.g. fairly reasonable
motion vectors such as 0,0 were clipped to negative values
if the block was overhanging on frame edges (such as the last rows
on 1080p content), which makes no sense whatsoever.
Instead, relax clamping constraints such that the ref MVs are allowed
to point to blocks exactly outside the visible edges in both Y as well
as UV planes, including the 8tap filter edges (that's why the offset is
8 pixels + block size).
See issue 1037.
Change-Id: I2683eb2a18b24955e4dcce36c2940aa2ba3a1061
This has various benefits:
- simplify implementations because we don't have to switch between
multiple probability tables depending on frametype
- allows fw subexp and bw adaptivity for partitions/uvmode in keyframes
See issue 1040 point 5.
Change-Id: Ia566aa2863252d130cee9deedcf123bb2a0d3765
Locate them (code-wise) in frame_context, and have them be updated
as any other probability using the subexp forward and adaptive bw
updates.
See issue 1040 point 1.
TODOs:
- real-world default probabilities
- why is counts sometimes NULL in the decoder? Does that mean bw
adaptivity updates only work on some frames? (I haven't looked
very closely yet, maybe this is a red herring.)
Change-Id: I23b57b4e5e7574b75f16eb64823b29c22fbab42e
Account for rounding in distortion calculation in k-means;
carry out rounding before duplicates removal of base colors;
replace numbers with macros;
use prefix increment.
Slight coding gain (<0.1%) on screen_content testset.
Change-Id: Ie8bd241266da6b82c7b2874befc3a0c72b4fcd8c
Adjust the qp threshold and consec_zeromv threshold for
limiting cyclic refresh. Also increase the refresh period
when the limit amount is significant, and some code-cleanup.
Small gain in PSNR/SSIM metrics: ~0.25/0.3 gain on RTC set, speed 7.
Change only affects non-screen content.
Change-Id: I1ced87a89a132684c071e722616e445b2d18236a
Adjust the qp threshold based on the denoising setting; do not allow
scaling directly from original resolution to one half and vice versa.
Change-Id: I032a9b22f8e1c88de6bb81cf8351367223a3e40d
For the re-encoding (at max-qp) on the detected high-content change:
update rate correction factor, reset rate over/under-shoot flags,
and update/reset the rate control for layered coding.
Change-Id: I5dc72bb235427344dc87b5235f2b0f31704a034a
Changes to the breakout behavior for partition selection.
The biggest impact is on speed 0 where encode speed in
some cases more than doubles with typically less than 1%
impact on quality.
Speed 0 encode speed impact examples
Animation test clip: +128%
Park Joy: +59%
Old town Cross: + 109%
Change-Id: I222720657e56cede1b2a5539096f788ffb2df3a1
This change (in a new config experiment: universal_hp) removes the
bitstream parsing dependency of the HP MV bit on the ref MV to be
coded. It also cleans up clearing of the HP bit in near/nearestMV,
since HP is always on if it's set in the frame header.
This admittedly doesn't clean up the crap that could be cleaned up,
but that's mostly because I think this needs some careful review;
not so much for coding style, but more from hardware people and from
the codec team on what we/you want. It would also be nice to get some
actual numbers on the real quality impact of this change. If, for
example, hardware people come up and tell us they don't actually care
anymore, we should probably just leave this code as-is and do nothing (i.e.
discard this patch).
See issue 1036.
Change-Id: Ic9b106f34422aa0f79de0c28125b72d566bd511a
This actually has no effect whatsoever, since the input MVs themselves
are clamped by clamp_mv_ref() already, which is significantly more
restrictive in its bounds.
Change-Id: I4a3a7b2b121ee422c56428c2a12d930c3813c06e
We only write EOSB tokens if we write tokens (i.e. not for skip blocks),
and we write EOSB tokens per-plane instead of per block.
Change-Id: I8d7ee99f8ec50eb7ae809f9f9282c1c91dbf6537
Add palette mode for keyframe luma channel. Palette mode is enabled
when using "--tune-content=screen" in encoding config parameters.
on screen_content testset: +6.89%
on derlr : +0.00%
Design doc (WIP):
https://goo.gl/lD4yJw
Change-Id: Ib368b216bfd3ea21c6c27436934ad87afdaa6f88
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
The artifact occurs periodically when the VP9 denoiser is on and
refresh_golden_frame happens. When refresh_golden_frame happens,
we should copy the frame buffer instead of swapping the pointers.
Change-Id: Ib3204c4b04db28ecf439c6d9e61f3d146f04196d
this reduces the number of synchronizations in decode_tiles_mt() and
improves overall performance when the number of threads is less than the
number of tiles
Change-Id: Iaee6082673dc187ffe0e3d91a701d1e470c62924
Small code cleanup. consec_zeromv refresh threshold
does not need to be computed for every super-block.
No change in behavior.
Change-Id: I8c4b1b28072f42b01d917fff6d1f62722f1e1554
The serial decode check is too strict for tile-threaded decoding as
there is no guarantee on the decode order nor which specific error
will take precedence. Currently a tile-level error is not forwarded so
the frame will simply be marked corrupt.
Change-Id: I51cf1e39e44bedeac93746154b36a4ccb2f059b1
Use the existing VP9_SET_SVC control to set the
first spatial layer to encode.
Since we loop over all spatial layers inside the encoder, the
setting of spatial_layer_id via VP9_SET_SVC has no relevance.
Use it instead to set the first_spatial_layer_to_encode,
which allows an application to skip encoding lower layer(s).
Change only affects the 1 pass CBR SVC.
Change-Id: I5d63ab713c3e250fdf42c637f38d5ec8f60cd1fb
When configured with high bit depth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSSE3 optimisations.
Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5
We have historically added new bits to cat6 whenever we added a new
transform size (or bitdepth, for that matter). However, we have
always coded these new bits regardless of the actual transform size,
which means that for smaller transforms, we code bits that cannot
possibly be set. The coding (quality) impact of this is negligible,
but the bigger issue is that this allows creating bitstreams with
coefficient values that are nonsensible and can cause int overflows,
which then de facto become part of the bitstream spec. By not coding
these bits, we remove this possibility.
See issue 1065.
Change-Id: Ib3186eca2df6a7a15ddc60c8b55af182aadd964d
This is identical to what the tile size does for the last tile. See
issue 1042 (which covers generalizing the superframe/tile concepts).
Change-Id: I1f187d2e3b984e424e3b6d79201b8723069e1a50
See issue 1051. 6 bits is fairly arbitrary but at least allows writing
delta Q values that are fairly normal in other codecs. I can extend to
8 if people want full range, although I personally don't have any need
for that.
Change-Id: I0a5a7c3d9b8eb3de4418430ab0e925d4a08cd7a0
The resolution check fixes the issue which resets resize_pending
unnecessarily and causes non-bitexact results versus the previous one-step version.
Change-Id: I4e7660b3c8f34f59781e2e61ca30d61080c322de
When configured with high bit depth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSE2 optimisation
(but not the SSSE3 optimisation).
Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9
Temporary fix to denoiser when dynamic resizing is on.
-Reallocate denoiser buffers on resized frame.
-Force golden update on resized frame.
-Don't denoise resized frame, and copy source into denoised buffers.
Change-Id: Ife7638173b76a1c49eac7da4f2a30c9c1f4e2000
For screen-content mode, with frame dropper off, put a limit
on how low encoder buffer can go.
Under hard slide changes, the buffer level can go too low and then
take a long time to come back up (in particular when frame-dropping
is not used), which will affect the active_worst and target frame size.
Change-Id: Ie9fca097e05cd71141f978ec687f852daf9de332
Dynamic resizing now supports two-step scaling: first go down to
3/4 and then 1/2. This feature is under a flag which controls the
switch between two-step scaling and one-step scaling (1/2 only).
Change-Id: I3a6c1d3d5668cf8e016a0a02aeca737565604a0f
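The two-step behaviour described in the entry above can be sketched as follows. The enum, the function names, and the two_step_enabled parameter are hypothetical stand-ins for the flag the commit mentions; this is not the encoder's actual resize state machine.

```c
#include <assert.h>

typedef enum {
  SCALE_NONE = 0,
  SCALE_THREE_QUARTER = 1,
  SCALE_ONE_HALF = 2
} ResizeStep;

/* Scaled dimension for a given step (integer arithmetic). */
static int scaled_dim(int dim, ResizeStep step) {
  assert(dim > 0);
  switch (step) {
    case SCALE_THREE_QUARTER: return dim * 3 / 4;
    case SCALE_ONE_HALF: return dim / 2;
    default: return dim;
  }
}

/* Next step when the encoder decides to scale down.
 * two_step_enabled != 0: none -> 3/4 -> 1/2.
 * two_step_enabled == 0: none -> 1/2 directly (one-step scaling). */
static ResizeStep next_step_down(ResizeStep cur, int two_step_enabled) {
  if (cur == SCALE_ONE_HALF) return cur;
  if (!two_step_enabled) return SCALE_ONE_HALF;
  return cur == SCALE_NONE ? SCALE_THREE_QUARTER : SCALE_ONE_HALF;
}
```

For a 1280x720 source this gives 960x540 at the first step and 640x360 at the second.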
This is more a proof of concept than anything else. The problem here
isn't so much how to code it, but rather where to place the resulting
code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10
specific intra pred functions to live there, or in vp10/?
See issue 1015.
Change-Id: I675f7badcc8e18fd99a9553910ecf3ddf81f0a05
The x86 simd expects this. Identical alignment can be found in vp9
and vp10 also. Fixes crashes on 32bit x86 systems.
Change-Id: I229c88d8f696acbef5337c8fa9503528df4e1c40
I've added a few new functions (d45e, d63e, he, ve) to cover the
filtered h/v 4x4 predictors that are vp8-specific, the "correct"
d45 with the correctly filtered bottom-right pixel (as opposed to
the unfiltered version in vp9), and the "broken" d63 with weirdly
filtered bottom-right pixels (which is correctly filtered in vp9).
There may be a minor performance impact on all systems because we
have to do an extra copy of the Above pixel array to incorporate
the topleft pixel in the same array (thus fitting the vpx_dsp API).
In addition, armv6 will have a more serious performance impact b/c
I removed the armv6/vp8-specific assembly. I'm not sure anyone
cares...
Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
define NOMINMAX to allow the std:: versions to be used; min/max will be
defined transitively via windows.h otherwise
Change-Id: I692b03fa3e70b7a53962d3fd209498f70f712fed
vp9_filter_block_plane_ss11() and vp9_filter_block_plane_non420()
are only called for the uv planes.
Change-Id: Iacd3b3242c8ce581edd37c8f06d95efc8a0f88a3
The loopfilter masks are now built in the decode loop.
This is done so we can eventually reduce the number of
MODE_INFO structs required by the decoder.
The encoder builds the masks for the entire frame prior
to calling the loopfilter.
Change-Id: Ia2146b07e0acb8c50203e586dfae0c4c5b316f11
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.
Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
Update rate correction factor when we drop the frame due to overshoot.
Only affects when the drop_overshoot feature is on: screen_content_mode = 2.
Change-Id: I67e24de979b4c74744151d2ceb3cd75fec2a1e7a
In the decoder, map this to the output variable vpx_image_t.r_w/h.
This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
which doesn't work with parallel frame decoding. In the encoder,
map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
a w/h pair argument in a int[2] (identical to VP9D_GET_DISPLAY_SIZE).
Also add render_size to the encoder_param_get_to_decoder unit test.
See issue 1030.
Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8
The name "display_*" (or "d_*") is used for non-compatible information
(that is, the cropped frame dimensions in pixels, as opposed to the
intended screen rendering surface size). Therefore, continuing to use
display_* would be confusing to end users. Instead, rename the field
to render_*, so that struct vpx_image can include it.
Change-Id: Iab8d2eae96492b71c4ea60c4bce8121cb2a1fe2d
In practice, this fixes the issue that if you have an odd number of
mi_cols, on the full right of the image, the UV int4x4 loopfilter
will be skipped over odd cols as well as odd rows (because it holds a
single variable for both edges).
See issue 1016.
Change-Id: Id53b501cbff9323a8239ed4775ae01fe91874b7e
Use the existing QP condition on limiting cyclic refresh, and add
additional condition that block has been encoded with zero/small motion
x frames in a row (where x is at least several times the refresh period).
The additional condition only affects non-screen content mode.
This helps to improve visual stability for noisy input, where on steady
background areas the application of delta_qp may lead to encoding the noise.
Also added a change to use the true skip (after encoding) to update the
last QP.
Change-Id: I234a1128d017d284cf767fdb58ef6c59d809f679
When the iOS SDK major version is 9 or higher:
- Pass -fembed-bitcode to compiler, assembler, and linker.
- Add a warning for simulator targets since yasm doesn't know
what -fembed-bitcode means, and exits with an error.
BUG=https://code.google.com/p/webm/issues/detail?id=1075
Change-Id: I38c997a0225e53c5dd1b4ddf7935d21362953f76
Always add IOS_VERSION_MIN to darwin arm cflags. The warning occurred
because the default (9.0) does not match the value set by configure
(6.0).
BUG=https://code.google.com/p/webm/issues/detail?id=1075
Change-Id: Ia9085ceeca10e057f9eb781c14f07581bb6280a5
- Use the iphoneos SDK path (instead of macosx).
- Detect iOS SDK major version and disable media (armv6) when using
iOS SDK version 9 or higher.
BUG=https://code.google.com/p/webm/issues/detail?id=1075
Change-Id: I12f77dbeee4c0084e8322f6841813da8b5e91c16
these have been supported in tile-threaded decoding since:
b3b7645 vp9_dthread: remove frame_parallel_decoding_mode requirement
Change-Id: Ia5a752db9be937153cf4830d9258752136356d1b
Limit transform size for intra to 16x16, for non-screen content mode.
Little/no change in speed or metrics.
32x32 intra block is rarely selected in RTC (non-screen content) case,
but some visual improvement can be seen in some example,
e.g., captured_video_dark_whd.yuv.
Change-Id: I68e2db87875343b3fb9bb407a7709f0088f84072
remove static from fdct4/8/16/32 in vp10/encoder/dct.c
add prefix vp10_ to fdct4/8/16/32
add vp10/encoder/dct.h
Change-Id: I644827a191c1a7761850ec0b1da705638b618c66
Reallocation of the mi buffer fails if the size changes on the first frame and
the config changes in subsequent frames. Add a resolution check
to avoid the assertion failure.
BUG=1074
Change-Id: Ie26ed816a57fa871ba27a72db9805baaaeaba9f3
Reference frame masking logic may skip checking zeromv-last mode.
Fix to avoid this and make sure zero-last is always checked.
No noticeable change in speed, and PSNR/SSIM metrics on RTC set overall
neutral (very small gain ~0.02).
Small visual improvement on few RTC clips.
Change-Id: I26eacdc449126424001a4a64e5ac31949f064417
the range check in dct.c (abs(input[i]) < (1 << bit)) will fail in many
cases. this was broken at the time this check was added
BUG=1076
Change-Id: I3df8c7a555e95567d73ac16acda997096ab8d6e2
the range check in dct.c (abs(input[i]) < (1 << bit)) will fail in the
25-29 range. this was broken at the time this check was added
Change-Id: I8ca9607f6cbdc8be7f47696ffeabbab3ac5727e2
Shortcut arg for --extra-configure-args --enable-examples. Enables
the examples, and thus ensures that all versions of libvpx that
iosbuild.sh produces can actually be linked.
Change-Id: I2ddda094361bf0ac77f8d2ae542e4dc7b2cab158
fixes build on windows x64; previously 'heightq' i.e., the 64-bit register
was accessed when only the 32-bit value was needed. given this is from a
stack variable the upper bits were undefined.
+ bump register/xmm counts; users of SETUP_LOCAL_VARS touch xmm13 in
64-bit builds and filter_block1d16_v* uses one extra temp variable
Change-Id: I9c768c0b2047481d1d3b11c2e16b2f8de6eb0d80
This commit removes mbmi_ext_base pointer from MACROBLOCK struct.
Its use case can be fully covered by cpi->mbmi_ext_base pointer.
Change-Id: I155351609336cf5b6145ed13c21b105052727f30
Add SVC codec control to set the frame flags and buffer indices
for each spatial layer of the current (super)frame to be encoded.
This allows the application to set (and change on the fly) the
reference frame configuration for spatial layers.
Added an example layer pattern (spatial and temporal layers)
in vp9_spatial_svc_encoder for the bypass_mode using new control.
Change-Id: I05f941897cae13fb9275b939d11f93941cb73bee
This means that we don't reconstruct in 4x4 dimensions, but in
blocksize dimensions, e.g. 4x8 or 8x4. This may in some cases lead
to performance improvements. Also, if we decide to re-introduce
scalable coding support, this would fix the fact that you need to
re-scale the MV halfway the block in sub8x8 non-4x4 blocks.
See issue 1013.
Change-Id: If39c890cad20dff96635720d8c75b910cafac495
In vp9, the bottom MV would be the average of the topright and
bottomleft luma MV (instead of the bottomleft/bottomright luma MV).
See issue 993.
Change-Id: Ic91c0b195950e7b32fc26c84c04788a09321e391
This has virtually no effect on coding efficiency, but it is more
logical from a theoretical perspective (since it makes no sense to
me that you would exclude a MV from a list just because its
sign-inverted value is identical to a value already in a list), and it
also makes the code simpler (it removes a duplicate value check in
cases where signbias is equal between the two MVs being compared).
See issue 662.
Change-Id: I23e607c6de150b9f11d1372fb2868b813c322d37
For reading, this makes the operation branchless, although it still
requires two shifts. For writing, this makes the operation as fast
as writing an unsigned value, branchlessly. This is also how other
codecs typically code signed, non-arithcoded bitstream elements.
See issue 1039.
Change-Id: I6a8182cc88a16842fb431688c38f6b52d7f24ead
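A minimal sketch of the signed coding described in the entry above, assuming the field is stored as plain two's complement in a fixed number of bits. The helper names are hypothetical and are not libvpx functions.

```c
#include <assert.h>
#include <stdint.h>

/* Writer side: masking is all that is needed, so the cost matches
 * writing an unsigned field of the same width. */
static uint32_t pack_signed(int32_t value, int bits) {
  assert(bits >= 1 && bits <= 31);
  return (uint32_t)value & ((UINT32_C(1) << bits) - 1);
}

/* Reader side: sign-extend the raw field with two shifts, no branch.
 * Assumes >> on a negative int32_t is an arithmetic shift, which holds
 * on mainstream compilers but is implementation-defined in ISO C. */
static int32_t unpack_signed(uint32_t raw, int bits) {
  assert(bits >= 1 && bits <= 31);
  const int shift = 32 - bits;
  return (int32_t)(raw << shift) >> shift;
}
```

A 6-bit field then covers -32 through 31, and unpack_signed(pack_signed(v, 6), 6) returns v for every value in that range.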
The implicitly changed value would be used for contextualizing future
skip flags of neighbour blocks (bottom/right), which is certainly not
what was intended. The original code stems from vp8, and was useful
in cases where coding of the skip flag was disabled. In vp9, the skip
flag is always coded. The result of this change is that for bitstream
parsing purposes, decoding of the skip flag becomes independent of
decoding of block coefficients.
See issue 1014.
Change-Id: I8629e6abe76f7c1d649f28cd6fe22a675ce4a15d
In decoder, export (eventually) into vpx_image_t.range field. In
encoder, use oxcf->color_range to set it (same way as for
color_space).
See issue 1059.
Change-Id: Ieabbb2a785fa58cc4044bd54eee66f328f3906ce
Verify the dynamic resizer behavior for real time, 1 pass CBR mode.
Start at low target bitrate, raise the bitrate in the middle of the
clip, verify that scaling-up does occur after bitrate changed.
Change-Id: I7ad8c9a4c8288387d897dd6bdda592f142d8870c
Verify the dynamic resizer behavior for real time, 1 pass CBR mode.
Run at low bitrate, with resize_allowed = 1, and verify that we get
one resize down event.
Change-Id: Ic347be60972fa87f7d68310da2a055679788929d
For 1 pass CBR spatial-SVC:
Add cyclic refresh parameters to the svc-layer context.
This allows cyclic refresh (aq-mode=3) to be applied to
the whole super-frame (all spatial layers).
This gives a performance improvement for spatial layer encoding.
Add the aq_mode on/off setting as a command line option.
Change-Id: Ib9c3b5ba3cb7851bfb8c37d4f911664bef38e165
Fixes temporal scalability. Updates were inadvertently turned
off for two pass svc causing crashes due to gf_group.index
growing unchecked.
Change-Id: Iff759946bf61bbde70630347cc8fa4d51a8c2d2f
The counts didn't take usehp into account, which means that if the
scope of the refmv is too large for the hp bit to be coded, the value
(always 1) is still included in the stats. Therefore, the final
counts will not reflect the entropy of the coded bits, but rather the
entropy of the combination of coded bits and the implied value (which
is always 1). Fix that by only including counts if the hp bit is
actually coded.
See issue 1060.
Change-Id: I19a3adda4a8662a05f08a9e58d7e56ff979be11e
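The counting fix in the entry above amounts to guarding the increment, roughly as below. The struct and function are hypothetical; only the usehp condition comes from the commit message.

```c
#include <assert.h>

/* Hypothetical counter for the high-precision (hp) MV bit. */
typedef struct {
  unsigned hp[2];
} MvHpCounts;

/* Only count the hp bit when it is actually coded. When usehp is 0 the
 * bit is implied (always 1) and contributes nothing to the coded stats. */
static void count_hp_bit(MvHpCounts *counts, int hp_bit, int usehp) {
  assert(hp_bit == 0 || hp_bit == 1);
  if (usehp) ++counts->hp[hp_bit];
}
```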
The normative (convolve8) filter is optimized/faster than
the nonnormative one. Pass usage of scaler (normative/nonnormative)
to vp9_scale_if_required(), and always use normative one for 1 pass.
Change-Id: I2b71d9ff18b3c7499b058d1325a9554de993dd52
Unify the style of fdct4() fdct8() fdct16()
Add fdct32()
Add range_check() at each stage
Add unit test at ../../test/vp10_dct_test.cc
Change-Id: I13f76d9046c3ea473c82024b09a5bc8662e2c28e
Upstream hash: 476366249e1fda7710a389cd41c57db42305e0d4
Changes from upstream since last update:
4763662 mkvparser: fix type warnings
267f71c mkvparser: SafeArrayAlloc fix type warning
f1a99d5 mkvparser: s/LONG_LONG_MAX/LLONG_MAX/ for compatibility
bff1aa5 mkvparser: add msvc compatibility for isnan/isinf
Change-Id: Ie0375e564fc74b3b296744d0039830d2f77b83b6
See issue 1030. The value of frame_parallel_decoding_mode was ignored
in vp9 if refresh_frame_context was 0, so instead make it a 3-member
enum where the dependency is obviously stated.
Change-Id: I37f0177e5759f54e2e6cc6217023d5681de92438
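One way to picture the 3-member enum from the entry above is as a mapping from the two vp9 flags. The enum member names and the helper are assumptions for illustration; the commit message only states that frame_parallel_decoding_mode was ignored when refresh_frame_context was 0.

```c
#include <assert.h>

/* Illustrative names; the real identifiers may differ. */
typedef enum {
  REFRESH_CONTEXT_OFF,      /* no context refresh at all */
  REFRESH_CONTEXT_FORWARD,  /* refresh, frame-parallel mode on */
  REFRESH_CONTEXT_BACKWARD  /* refresh, frame-parallel mode off */
} RefreshContextMode;

/* Collapse the two vp9 flags into one value. The second flag is only
 * consulted when the first is set, which makes the dependency explicit. */
static RefreshContextMode from_vp9_flags(int refresh_frame_context,
                                         int frame_parallel_decoding_mode) {
  assert(frame_parallel_decoding_mode == 0 ||
         frame_parallel_decoding_mode == 1);
  if (!refresh_frame_context) return REFRESH_CONTEXT_OFF;
  return frame_parallel_decoding_mode ? REFRESH_CONTEXT_FORWARD
                                      : REFRESH_CONTEXT_BACKWARD;
}
```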
In vp9, [0] and [1] had identical meaning, so merge them into a
single value. Make it impossible to code RESET_FRAME_CONTEXT_NONE
for intra_only frames, since that is a non-sensical combination.
See issue 1030.
Change-Id: If450c74162d35ca63a9d279beaa53ff9cdd6612b
1) copy following files from vpx_dsp/ to vp10/common/
vp10_inv_txfm.c
vp10_inv_txfm.h
vp10_inv_txfm_sse2.c
vp10_inv_txfm_sse2.h
2) change the function prefix "vpx_" to "vp10_" in above files
3) add unit test at vp10_inv_txfm_test.cc
Change-Id: I206f10f60c8b27d872c84b7482c3bb1d1cb4b913
This condition is not effectively in use. The actual reference
frame masking is done via another route.
Change-Id: Ia59c843bcac7243dada92f0f67658d7ce43df5e8
remove 'u' and specify all objects to allow objects with the same
basename to be added and an incremental rebuild to succeed
fixes issue #1067
Change-Id: Id0ebc89be826a026f1bbf21b4e32a2b1af45154d
Take out speed features that affect the compression performance
to simplify the coding route. This commit removes the motion field
mode search used in speed 3.
Change-Id: Ifdf6862cb1ece8261125a56d9d89bcef60758c00
WebM files could have CodecId missing in the track headers. Treat those files as
unknown input file type in vpxdec.
Fixes issue #1064.
Change-Id: I6c3bb7b4bd3a4f5c244312482a5996f8b68db3f3
avoid duplicating internal structures and include vp9_dx_iface.c
directly. these had fallen out of sync after the frame-parallel branch
merge.
Change-Id: I604cfbffa95abe2a1c8e906a696f32436b1422ed
this file needs to be reworked to remove the duplication of codec
internals + allow for divergence of vp9 and vp10
Change-Id: I6266b94ccfbc24dae30148f134804b52aa411b88
prevents an int -> vpx_img_fmt_t conversion warning with high-bitdepth
as it modifies the image format
Change-Id: Ie3135d031565312613a036a1e6937abb59760a7e
Access scaled reference frame in the sub8x8 rate-distortion
optimization loop only when the current test mode is an inter mode.
This prevents an ioc warning triggered by sending intra_frame index
to fetch scaled reference frame.
Change-Id: I6177ecc946651dd86c7ce362e3f65c4074444604
For 3 temporal layers, reduce somewhat the
cyclic_refresh_mode_max_mbs_perframe parameter, from 20% to ~14%.
Small increase in PSNR/SSIM metrics.
Change-Id: Ia216fa5474048f1ef7fe3db88cd60dfef2a1bf8a
Change settings for 1 pass CBR.
And only use SET_SVC control(s) if there is at least 1 layer (spatial or temporal).
This allows sample encoder to also work for 1 layer case.
Change-Id: I5b0a33c25afb2f24a3a8aa4ec8ade9afc87cd702
This commit allows the encoder to include sub8x8 inter mode with
scaled reference frame in the rate-distortion optimization scheme.
Change-Id: Ibbe9678801592826ef22566566dcdeeb008350d5
Sync the encoder's buffer offset calculation for sub8x8 block motion
compensated prediction with scaled reference frame to match the
decoder's behavior. This resolves an enc/dec mismatch issue when
sub8x8 inter mode with scaled is turned on.
Change-Id: I4bab3672b007a5ae0c992f8a701341892d2458b0
The fields are always coded in the frame itself, so there is never any
dependency on past frames. In practice, this fixes sign_bias being
ignored when error_resilient_mode=1.
See issue 1011.
Change-Id: I9d134ef6b445ced4d100fa735ce579855a0fa5af
the check performed within the while was redundant; simply place the
accumulation after all tiles are decoded.
Change-Id: I6a74e87257c775fd8bfc8ac4511e4a6ad8f18346
This is based on the original patch optimized for 32bit
platforms by Tamar/Ilya and now uses the x86inc style asm.
The assembly was also modified to support 64bit platforms.
Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2
These frame types cannot make bitstream parsing depend on previous
frames, so the hypothetical combinations of e.g. keyframe=1 and
update_map=0 or keyframe=1 and temporal_update=1 are non-sensical.
Therefore, make it impossible to code such combinations in the vp10
bitstream header.
See issue 1044.
Change-Id: I3f0a83d5c7e3989541a469a909471424a285239d
If the encoder dynamic resize is triggered and change config()
is then called, it will reset the current (resized) codec width/height
back to the config (unresized) width/height (which will then
prevent the resizing action from occurring in encoder_loop).
Avoid this by checking for a change in the config width/height
before resetting the cm->width/height.
Change-Id: Id9d50c0ee8a943abe4b6c72bbaa02d9696f93177
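The guard described in the entry above might look like the sketch below. The Dims struct, the last_cfg bookkeeping, and apply_config are hypothetical; the real code operates on the encoder's own config and common structs.

```c
#include <assert.h>

typedef struct {
  int width;
  int height;
} Dims;

/* cm:       current (possibly dynamically resized) coded size
 * last_cfg: size from the previous config call
 * new_cfg:  size from the config being applied now
 * Only overwrite the coded size when the configured size itself changed,
 * so a pending dynamic resize survives an unrelated config update. */
static void apply_config(Dims *cm, Dims *last_cfg, const Dims *new_cfg) {
  assert(cm && last_cfg && new_cfg);
  if (new_cfg->width != last_cfg->width ||
      new_cfg->height != last_cfg->height) {
    *cm = *new_cfg;
  }
  *last_cfg = *new_cfg;
}
```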
In VP9, the order for frame header was: [0] smooth, [1] regular, [2]
sharp, [3] bilinear. Per-block, the order was [0] regular, [1] smooth
and [2] sharp. For VP10, swap smooth/regular in the frame header so
that the block ordering and frame header ordering are interchangeable.
See issue #1046.
Change-Id: Ic9ec5964874375e40cd59bef50b489a76cbe4365
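The ordering difference in the entry above can be shown with a small lookup table. The enum names are illustrative, and placing bilinear at per-block index 3 is an assumption, since the commit message lists only three per-block values.

```c
#include <assert.h>

/* Per-block order from the commit message: regular, smooth, sharp. */
typedef enum {
  FILTER_REGULAR = 0,
  FILTER_SMOOTH = 1,
  FILTER_SHARP = 2,
  FILTER_BILINEAR = 3 /* assumed position */
} BlockFilter;

/* VP9 frame-header order: smooth, regular, sharp, bilinear. */
static const BlockFilter vp9_header_to_filter[4] = {
  FILTER_SMOOTH, FILTER_REGULAR, FILTER_SHARP, FILTER_BILINEAR
};

/* swapped_order != 0 models the new header layout, where the header
 * literal and the per-block value are interchangeable. */
static BlockFilter header_to_filter(int literal, int swapped_order) {
  assert(literal >= 0 && literal < 4);
  return swapped_order ? (BlockFilter)literal : vp9_header_to_filter[literal];
}
```

With the swap in place the table is no longer needed: the header literal is the block filter value.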
Unify the style of fdct4() fdct8() fdct16()
Add fdct32()
Add range_check() at each stage
Add unit test at ../../test/vp10_dct_test.cc
Change-Id: I9e912b2c5683862e65c5a21abc3e1c260cca4576
* changes:
test: limit the valid image size on OS/2
configure: add -Zhigh-mem to LDFLAGS on OS/2
configure: disable PIC on OS/2
Makefile: add $(STACKREALIGN) to CFLAGS for vp9_reconintra.c
x86inc.asm: fix NASM compilation
* changes:
Only build multithreaded functions on mt builds.
Don't build calc_psnr for high bit depth.
Enable missing dual lpf test
Remove unused VP10 functions.
Mark VP10 functions as 'INLINE'
Remove unused functions from test files
Only build append_negative_gtest_filter when it is used.
Add INLINE decoration to static test functions
* changes:
vp9_mcomp: make search functions private
vp9_mbgraph: use vp9_full_pixel_search(HEX)
vp9_temporal_filter: use vp9_full_pixel_search(HEX)
vp9_firstpass: make vp9_init_subsampling private
vp9_encoder: make vp9_alloc_compressor_data private
If either the encoder or the decoder is enabled, CONFIG_VP<N> will be
set. This simplifies the conditional and passes the chromium update
script when CONFIG_ values are passed in with 'yes' and 'no' values.
This was failing because it was checking against empty strings but
they are set to 'no'
Change-Id: I02ecd557210088ba1458cd0e89eead5666f6597a
Spatial/temporal svc code was removed. Verified using Borg test,
and the results before and after the change are matching.
Change-Id: I4c2ee5cd560428e3e50be02e57e5871ef4246390
For one pass CBR: only check for updating refresh_golden
if ext_refresh_frame_flags_pending is not set (i.e., == 0).
And move the resetting of ext_refresh_frame_flags_pending = 0
down to after the encode_loop (and account for dropped frames).
This is to prevent changing refresh_golden flag when the user
supplies the reference/update flags.
Change-Id: I4d87b3e705ba43f243667e367503b585c61e2a54
* changes:
Only build ssse3 filter functions on 64 bit
Clean up unused function warnings in vp8 encoder
Clean up unused function warnings in vp8 onyx_if.c
These were lost in the great sub pixel variance move of
6a82f0d7fb
Not having these functions caused a ~10% performance regression in
some realtime vp8 encodes.
Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
previously any flags added while setting up the toolchain would
override the user selections; environment variables could be treated
similarly
Change-Id: Ibfcc644137d8e579af554d19a38d4020019a7a34
Mark functions in findnearmv.h, invtrans.h and setupintrarecon.h
with INLINE.
Hide function in postproc.h behind the same #if as it's callers.
Change-Id: Ic1e014a943d2aca280f137019218b9d4f1443d61
This is a leftover of the XMA code which was removed a long time ago.
Found while looking for unused functions.
Change-Id: I07a3d542ae55440af59380dcdcf9a6c11cdfcb75
This commit adds clamp of new vectors similar to the logic in RD loop.
Such clamp is not necessary from the perspective of VP8 bitstream, but
is added to improve ChromeCast mirroring's robustness.
Change-Id: I42f6adbc60ffce283b994869364230858632d6fa
For each block in pickinter: use average of four middle
pixels (instead of single pixel) to set skin map.
This can help a little in reducing false skin detection in
some cases.
Change-Id: Ic247af75e9c2948b08ab977a39e061adacd8ec97
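A small sketch of the four-pixel average mentioned in the entry above, assuming a single 8-bit plane and a block with even dimensions. The function name and the rounding are illustrative choices, not taken from the encoder.

```c
#include <assert.h>
#include <stdint.h>

/* Average of the four pixels around the centre of a bw x bh block.
 * block points at the top-left pixel; stride is the row pitch.
 * The +2 rounds to nearest before dividing by 4. */
static int center_average(const uint8_t *block, int stride, int bw, int bh) {
  assert(bw >= 2 && bh >= 2);
  const uint8_t *c = block + (bh / 2 - 1) * stride + (bw / 2 - 1);
  return (c[0] + c[1] + c[stride] + c[stride + 1] + 2) >> 2;
}
```

The same call would be made once per plane (Y, U, V) before feeding the result to the skin test, in place of a single centre sample.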
In high bitdepth setting, the rate multiplier may be set as 0. In
lossless mode, the RD cost would always be 0, resulting in bad
partition and prediction mode choices.
Change-Id: I297014dd8bfa8a07ff0ab480119f75678300ff68
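To see why a zero multiplier is a problem, consider a simplified rate-distortion cost. The formula shape, the constants, and the clamp below are assumptions for illustration; the commit message does not say how the fix was implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified RD cost: scaled rate plus scaled distortion. */
static int64_t rd_cost(int rdmult, int rate, int64_t dist) {
  assert(rate >= 0 && dist >= 0);
  return (((int64_t)rate * rdmult + 128) >> 8) + dist * 128;
}

/* One possible guard: never let the multiplier reach zero. */
static int clamp_rdmult(int rdmult) { return rdmult < 1 ? 1 : rdmult; }
```

In lossless mode the distortion term is 0, so with rdmult == 0 every candidate costs 0 and the search cannot tell a cheap mode from an expensive one. With the clamp, candidates with different rates get different costs again.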
This patch just fixes the test for the time being, but does not
actually solve the underlying issue, which still needs investigation.
Change-Id: I54a35de839723f5b499b57e38dd2bdd400adc427
Switch to use the normative (convolve8) filter for source scaling,
only for 1/2x1/2 scaling for now. This is faster and has better
quality than either the vpx_scale_frame or the nonnormative scaler.
Remove the vp9_scale_if_required_fast, which is now not used.
Change-Id: I2f7d73950589d19baafb1fa650eac987d531bcc8
Define it as a function of reference frame types to provide
scalability for multiple reference frames.
Change-Id: I77b856c96916f352bc31004b9266b3f24e19bd0f
this restores the previous version's behavior avoiding issues with
builds that may split sources on directory boundaries; protected
visibility may work in this case.
Change-Id: If37c70d9bd81de85a8e112457b9819a5cac6129d
For 1 pass CBR mode under screen content mode:
if pre-analysis (source temporal-sad) indicates significant
change in content, then check the projected frame size after
encode_frame(), and if size is above threshold, force re-encode
of that frame at max QP.
Change-Id: I91e66d9f3167aff2ffcc6f16f47f19f1c21dc688
Only test for using golden as reference for variance partition
selection if it is used as a reference for that frame.
For temporal layers, golden may not be a reference on a given frame,
even though it was for some previous frame. If it is not a reference
for current frame, don't check/use it for partition selection.
Change-Id: I6b0f2bd36aebbb5903077c9a0a66d80f1de9a7b1
CONFIG_VP9_HIGHBITDEPTH is currently used by both vp9 and vp10, but in
many places outside vp9/vp10, the macro was used in conjunction with
CONFIG_VP9. This created a dependency on vp9 for vp10 to build. This
commit removes the dependency by using CONFIG_VP9_HIGHBITDEPTH only in
these places.
Change-Id: I8cc007fc9cf132394c6498ce6759e606b64a6ad0
This commit renames the vp10 encoder, decoder, and common interface
file names from vp9_ prefix to vp10_ prefix.
Change-Id: Iafb5d786e4b428d2b9bf097123bd86c4fa9ded24
For speed 7, real-time mode: Base layer frames are further apart
(for #temporal layers = 3, this is every 4 frames) so worth keeping
same motion search parameters (as in speed 6) on the base layer frames.
Change-Id: Idebf49dda6ef4f3d9a55aee55129a68253f692fb
gcc-based builds will allow a 0-element array, but visual studio builds
will not; this change hides the encoder and decoder specific symbols as
modules using them are selected based on the configuration.
Change-Id: Ic16ba9d12241070ec689dc5880164c14a4f7ca44
* changes:
Only use .text sections for aout
Use newer x86inc.asm
Use .text instead of .rodata on macho
Copy PIC handling code from x86_abi_support
Set 'private_extern' visibility for macho targets
Avoid 'amdnop' when building with nasm
Catch all elf formats
Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
Use libvpx defines to set name mangling rules
Customize x86inc.asm for libvpx
Rename updated version of x86inc.asm
Use "private_prefix" instead of "program_name" and make vpx the default
prefix.
Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458
The read only sections are getting stripped on some OS X builds. As a
result, random data is used in place of the intended tables.
Change-Id: I58c18a53e503f093ee268451698c5761e6c32540
Other implementations of x86inc.asm have more comprehensive nasm
workarounds. This is the only thing that was changed for the previous
import to libvpx. See if we can still get away with it.
Change-Id: I3ef6fe9a4816461c89431a82b7e4a08b4b948d39
Make sure all variants get correct visibility and SECTION notes.
libvpx only pass elf32 and elf64 to the assembler, never just elf.
Change-Id: I7c36c115bf52436c9afe61985c859a2081948271
Revision a95584945dd9ce3acc66c6cd8f6796bc4404d40d
from git://git.videolan.org/x264.git
Temporarily name file x86inc.asm until all necessary local patches are
applied.
Change-Id: I9c7d0ed4d3ed900ae2d5db0abbcc048a2892c9b8
Use the correct period (in terms of cr->percent_refresh) for the condition
of larger delta-qp following key frame.
And account for larger interval for temporal layers.
Change-Id: Ibb43f5200f9b1eeb8bbb8211327b08ecda3c3b8a
Re-investigated the second-level sub-pixel motion search. Improved the
way of choosing search points. Rewrote the second-level search code.
At speed 0, the borg tests showed:
1. for stdhd set, Avg PSNR gain: 0.216%; Overall PSNR gain: 0.196%;
SSIM gain: 0.206%. Only 1 out of 15 clips showed PSNR loss.
2. for derf set, Avg PSNR gain: 0.171%; Overall PSNR gain: 0.192%;
SSIM gain: 0.207%. Only 3 out of 30 clips showed PSNR losses.
Added the condition for third-point checking, namely, fewer points
were checked. Speed tests showed no speed loss(Avg 0.3% speedup at
speed 0).
Change-Id: I6284ebb3fa7ba63be8528184c49e06757211a7f1
when configuring with mips32-android-gcc HAVE_MIPS32 would be set, but the
ndk does not set -mips32r2 for APP_ABI=mips which results in BSwap32 failing
to build; refine the check in endian_inl.h
Change-Id: I22893fe61f29111eb902d961b500b2174596268d
The test file compiler fails if one uses --disable-vp8-decoder
--enable-vp9-decoder. It effectively turns on CONFIG_VP8 and
CONFIG_DECODERS, but turns off CONFIG_VP8_DECODER, which causes
compiler error at test_vector_test.cc.
This commit fixes this issue by adding vp8/9 decoder flags to
the decoder behavior test, respectively.
Change-Id: I097ff8fd5e12715a94a565a82e54503885eb7187
-For ambient qp in active_worst setting: increase the initial
averaging time (from very first frame) to account for avg_qp of key_frame.
-In postencode on key frame: update the last_q/avg_q[key_frame] for
all temporal layers.
Change-Id: I5313153d350b1045b4835ce948dfffb7d2039b52
Condition usage of rc.frames_since_golden to non-svc mode.
rc.frames_since_golden, which is used in non-svc mode to add second reference,
was causing, under certain conditions, the turning off of golden reference
for svc case.
Change-Id: Icec644d235d0471e56d8ff73d6c37278bd6ecd3b
and FUN_CONV_2D macros. The predict lut now handles
this case. The encoder now calls vpx_scaled_2d() instead
of vpx_convolve8() for scaling.
Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
This reverts commit a5e97d874b.
Additionally:
Revert "vpx_convolve_copy_sse2: fix win64"
This reverts commit 22a8474fe7.
This change performs poorly on various x86_64 devices affecting
performance by 1-3% at 1080P. Performance on chromebook like devices was
mixed neutral to slightly negative, so there should be minimal change
there.
Change-Id: I95831233b4b84ee96369baa192a2d4cc7639658c
This commit clears all the vp9_ prefix use case in vpx_dsp. It gets
the vp9 folder ready to branch out vp10.
Change-Id: I2906eec179ee792b4af8c9b4161313653050e931
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.
Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
Choose a different diagonal point to check when the two costs are
the same, making it consistent with the way we choose the best mv.
This slightly changes the encoding result, and the derflr set borg
test at speed 0 shows 0.027% Overall PSNR gain, 0.024% Avg PSNR
gain, and 0.043% SSIM gain.
Change-Id: Ic8ee3a6767394866d159e4f9e1c777604dd73c17
If the current best mv(namely, the search center) is still the best mv
after the first level search, the second level check is skipped. This
patch doesn't change the bitstream. At speed 0, it speeds up the encoder
by 1% - 2%.
Change-Id: I054c91b884d3f7aef157436c061744562bd6506d
Add a guard to exclude dspr2 inverse transform files from vpx_dsp
make file, when high bit-depth is turned on. This fixes the jenkins
nightly build.
Change-Id: Ibacd86563af1ec4810c550905b3fa0397baeeafc
Changes:
b6de61a Adds support for simple tags
75a6d2d sample_muxer: Don't write huge files.
cec1f85 mkvmuxer: remove unused timecode_scale variable
8a61b40 Merge "mkvparser: Tiny whitespace fix."
7affc5c clang-format re-run
d6d04ac mkvmuxer: use generic Cluster::AddFrame
4928b0b Merge "mkvmuxer: Write Block key frames correctly."
c2e4a46 Merge "sample_muxer: Use AddGenericFrame to add frames."
e97f296 mkvparser: Tiny whitespace fix.
d66ba44 Merge "Add support to parse DisplayUnit."
deb41c2 Add support to parse DisplayUnit.
42e5660 Fix issues on EBML lacing block parsing
fe1e9bb Fix block parsing to not allow frame_size = 0
2cb6a28 Change assertions to checks when parsing TrackPositions
d04580f Fixes issues on Block Group parsing
c3550fd mkvmuxer: Write Block key frames correctly.
5dd0e40 Merge "mkvmuxer: Set is_key to true for metadata blocks."
8e96863 mkvmuxer: Set is_key to true for metadata blocks.
a9e4819 sample_muxer: Use AddGenericFrame to add frames.
5a3be73 Change assertions to checks when load CuePoints
f99f3b2 mkvmuxerutil::EbmlDateElementSize: remove value param
ff572b5 Frame::IsValid: fix track_number check
b6311dc mkvmuxer: Refactor to remove a lot of duplicate code
256cd02 Merge "mkvmuxer: DiscardPadding should be signed integer."
16c8e78 mkvmuxer: s/frame/data in all AddFrame* functions.
c5e511c mkvmuxer: DiscardPadding should be signed integer.
4baaa2c Add framework build script: iosbuild.sh
3d06eb1 PATENTS: fix a typo: constitutes -> constitute
d3849c2 mkvparser: Dead code removal.
f439e52 Change assertions to checks when preloading Cues
d3a44cd Fix track transversal when listing Cues on sample
c6255af Tweak .gitignore so git status is clean after checkout and
build: - added missing underscore to sample_muxer - added cmake and make
related files
b5229c7 Makefile.unix: s/samplemuxer/sample_muxer/
e3616a6 Add support to parse stereo mode, display width and display
height in mkvparser
a4b68f8 parser: Fix bug in Chapters::Atom::Parse()
bab0a00 cmake: Set library and project name the proper way on Windows.
feeb9b1 Set library name to match Windows expectations.
b9a549b Fix CMakefile to generate libwebm.a
b386aa5 Add CMakeLists.txt and msvc_runtime.cmake.
b0f8a81 parser: Fix memory leak in Chapter parsing
f06e152 mkvmuxer: Fix MoveCuesBeforeClustersHelper recursive call.
27bb747 allow subtitle tracks with ContentEncodings
623d182 DoLoadCluster: tolerate empty clusters
1156da8 Update PATENTS to reflect s/VP8/WebM/g
0d4cb40 mkvmuxerutil: Use rand() in MSVC builds.
e12fff0 mkvmuxer: Overload WriteEbmlHeader for backward compatibility
a321704 mkvmuxer: write correct DocTypeVersion
574045e mkvmuxer: fix DiscardPadding
8be6397 Include crop elements when calculating size of Video element
8f2d1b3 mkvparser: fix DiscardPadding extraction
1c36c24 mkvmuxer: fix style guide violations
568504e Merge "UUIDs can have their high bit set"
acf788b Add support for CropLeft, CropRight, CropTop and CropBottom
elements.
418188b Merge "muxer: codec_id is a mandatory element"
07688c9 mkvmuxer: Reject frames if invalid track number is passed.
2a63e47 muxer: codec_id is a mandatory element
d13c017 UUIDs can have their high bit set
Change-Id: Iba28acb1ff774349d03e565f2641ddea132cf1e7
fixes link under vs9; this is the same change as:
dbf6e3f gen_msvs_vcxproj.sh: Avoid object name collisions.
Change-Id: I2a188c9024d0605e60e5e03ddcef1a25e7e53585
Chromium puts all the yasm output in the same directory. Looking at ways
to improve this but in the meantime get rid of collisions.
Change-Id: I923c5231d14e895ab96521eb89807ede868a0753
Ssim_vars is used to accumulate stats based 4x4 pixel blocks, this
commit changes the allocations size to be based on mi_rows and mi_cols
to avoid out-of-bound memory access for larger size videos. The hard
coded 720x480 can only work for image size up to 2880x1920.
Change-Id: Id9d07f3f777385b448ac88a6034b7472e4cf3c79
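The 2880x1920 limit in the message above follows from simple arithmetic: one stats entry per 4x4 block means a fixed 720x480 array of entries covers 2880x1920 pixels. A minimal sketch of sizing the allocation from the frame dimensions instead, assuming one mode-info (mi) unit covers 8x8 pixels; the helper names are illustrative and not the encoder's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one mode-info (mi) unit covers 8x8 pixels, so it holds
 * four 4x4 blocks. All names here are illustrative. */
enum { MI_SIZE_PIXELS = 8 };

/* Number of mi units needed to cover `pixels`, rounding up. */
static int mi_units(int pixels) {
  assert(pixels > 0);
  return (pixels + MI_SIZE_PIXELS - 1) / MI_SIZE_PIXELS;
}

/* Number of 4x4-block entries needed for a width x height frame. */
static size_t ssim_entries_needed(int width, int height) {
  const size_t mi_cols = (size_t)mi_units(width);
  const size_t mi_rows = (size_t)mi_units(height);
  return 4 * mi_rows * mi_cols;
}

/* Number of entries provided by the old fixed-size allocation. */
static size_t ssim_entries_fixed(void) { return (size_t)720 * 480; }
```

Under these assumptions a 2880x1920 frame needs exactly as many entries as the fixed allocation provided, and anything larger (for example 3840x2160) needs more.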
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.
Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
This got erroneously changed during the refactor. This fixes
SvcTest.TwoPassEncode2TemporalLayersWithMultipleFrameContextsAndTiles.
Change-Id: Ifa5ab0e098396c5e2d10478db87df256eadfa4c7
This function suffers from a couple of problems on small cores (tablets):
- The load of the next iteration is blocked by the store of the previous iteration
- 4k aliasing (between future stores and older loads)
- Current small-core machines are in-order, so the store will spin the rehabQ until the load is finished
Fixed by:
- prefetching 2 lines ahead
- unrolling the copy of 2 rows of the block
- pre-loading all xmm registers before the loop, with the final stores after the loop
The function is optimized by:
copy_convolve_sse2 64x64 - 16%
copy_convolve_sse2 32x32 - 52%
copy_convolve_sse2 16x16 - 6%
copy_convolve_sse2 8x8 - 2.5%
copy_convolve_sse2 4x4 - 2.7%
credit goes to Tom Craver(tom.r.craver@intel.com) and Ilya Albrekht(ilya.albrekht@intel.com)
Change-Id: I63d3428799c50b2bf7b5677c8268bacb9fc29671
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/dspr2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
Don't run rate_block (cost_coeffs) if distortion alone is enough to
surpass best_rd.
This decreases 2nd pass runtime on HD at speed 2 by about 2%. There is
zero effect on output if tx_cache is removed.
Change-Id: Ia3b1cc77bfbe6ee988c395fde06c0eb92940b784
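The early exit described above can be sketched as follows. This is only an illustration under stated assumptions: `rd_cost`, `estimate_rate` and `block_rd` are hypothetical stand-ins, and the cost formula is a simplified placeholder rather than the encoder's real RD computation. The point is that rate is never negative, so once distortion alone exceeds `best_rd` the expensive rate estimate cannot change the outcome.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RD cost: rate scaled by a multiplier, plus distortion.
 * The real encoder has its own cost formula; this is a stand-in. */
static int64_t rd_cost(int64_t rate_mult, int64_t rate, int64_t dist) {
  return rate_mult * rate + dist;
}

/* Counts how often the (expensive) rate estimation is invoked. */
static int rate_calls = 0;

/* Stand-in for the coefficient cost computation. */
static int64_t estimate_rate(int64_t fake_rate) {
  ++rate_calls;
  return fake_rate;
}

/* Returns the RD cost, or INT64_MAX if the block cannot beat best_rd.
 * If distortion alone already exceeds best_rd, rate estimation is
 * skipped because a non-negative rate can only raise the cost. */
static int64_t block_rd(int64_t dist, int64_t fake_rate, int64_t rate_mult,
                        int64_t best_rd) {
  int64_t rd;
  if (rd_cost(rate_mult, 0, dist) > best_rd) return INT64_MAX;
  rd = rd_cost(rate_mult, estimate_rate(fake_rate), dist);
  return rd < best_rd ? rd : INT64_MAX;
}
```

With `best_rd = 900`, a block whose distortion is 1000 is rejected without ever calling `estimate_rate`, while a block with distortion 100 goes through the full computation.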
1. The RD scores obtained during the tx size selection were stored in the
tx cache, and used to help make the tx decision for the following frames.
This wasn't used anymore in VP9 encoder. Recovered the related decision
making code from 1.5+ years ago, and borg tests didn't show any quality
gain. This patch removed it to lower the complexity.
2. An optimization was done after the above refactoring. If the tx_mode
is not TX_MODE_SELECT, we only need to test the chosen tx size instead
of all possible tx sizes. This gave a 1.5% average speed gain at speed 2,
and a 1% average speed gain at speed 3.
Change-Id: Id8cd650e066a8cef33829d8c15388a8138adc78c
The forward 32x32 2D-DCT functions are aligned in vpx_dsp folder.
The vp9_dct.h file is not effectively used now.
Change-Id: Ie7946b6fdd784b8e91496242337bc9002c75c281
Replace the duplicate coefficient definition in neon implementations
of inverse transform with those from vpx_dsp/txfm_common.h
Change-Id: I4cd9bd9569ab1793dfdbb6f16d80bcb581599f0d
This commit replaces vp9_idct.h with txfm_common.h in many SIMD
implementation files for precise file dependency.
Change-Id: If73dd726bb16537e7494f28538b0a169810f9756
Separate the common coefficient constant into vpx_dsp/txfm_common.h.
Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
This clears the use case of vp9_idct.h in vpx_dsp folder.
Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
requires r10e or newer:
Android NDK, Revision 10e (May 2015)
...
Other bug fixes:
...
- Fixed .asm support for ABI x86_64.
Change-Id: I51ec9a5f77c982b7412d922e896348a83ae2d7d6
Avoid scaling the references if they have already been scaled.
Change only affects 1 pass non-svc mode for now.
Change-Id: I204f4079c026cba7adce7a7f855d072f6139ccec
The RD and load save/code grabs it as groups of four. In practice there
is no change to physical allocations because this is backed by a 16-byte
memalign.
Change-Id: I01e89769872300e23227e03dd24a6e229f482025
Add vpx_dsp_rtcd.h to the header file list. The od_bin_fdct8x8()
here depends on forward 8x8 2D-DCT.
Change-Id: I1d71edc71f07069808823d2445c1cafd285e1b94
This commit factors out common macro definitions from the forward
and inverse transform implementations into vpx_dsp. It removes
the duplicate macro definitions from encoder and decoder folders.
Change-Id: I92301acbd3317075e9c5f03328a25abb123bca78
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.
Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
vp9_itrans*_dspr2.c aren't necessary for high bitdepth builds and
notably vp9_itrans8_dspr2.c fails in various configurations using a
codesourcery toolchain:
vp9_itrans8_dspr2.c:31:5: can't find a register in class 'GR_REGS' while reloading 'asm'
Change-Id: I2ac76203e65cc643cb835ab50e95701896d92a1a
This test places 128 in positions that would not be found
in the VP9 filter tables. The ssse3 code packs this table
into chars and uses the pmaddubsw instruction, which treats
the value as signed. The ssse3 code checks for 128 in
position 3, skipping the ssse3 code if found, and calls
vp9_convolve8_c(). vp9_convolve8_c() is also used for scaling.
ChangeFilterWorks breaks the ssse3 scaling code found in other
commits.
Change-Id: I1f5a76834bc35180b9094c48f9421bdb19d3d1cb
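The reason a tap of 128 is a problem for the packed path: pmaddubsw multiplies unsigned bytes from one operand by signed bytes from the other, and a signed byte tops out at 127. A minimal sketch of a range check, with a hypothetical helper name; note that it checks every tap, whereas the message above says the ssse3 code only checks position 3:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if all 8 taps fit in a signed 8-bit lane, else 0.
 * Illustrative helper, not the check used by the ssse3 code. */
static int taps_fit_int8(const int16_t taps[8]) {
  int i;
  for (i = 0; i < 8; ++i) {
    if (taps[i] < INT8_MIN || taps[i] > INT8_MAX) return 0;
  }
  return 1;
}
```

A kernel that fails this check would have to fall back to a path that keeps the taps in 16 bits, such as the C implementation.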
Eliminates the byte by byte read from bool decoder, by reading
in a size_t and then shifting it into place.
Change-Id: I0ed8c7b6f942847e79cc90105dc1d2b5b3deb0d6
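A rough sketch of the idea in the message above, not the decoder's actual code: instead of pulling one byte at a time from the stream, copy sizeof(size_t) bytes in one go, convert to big-endian order, and shift the result below the bits already held. All function names are hypothetical, and the sketch assumes at least sizeof(size_t) bytes remain in the buffer and that the unused low bits of `value` are zero.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts a word loaded from memory into big-endian value order. */
static size_t to_big_endian(size_t v) {
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  if (first == 1) { /* little-endian host: reverse the bytes */
    size_t r = 0;
    size_t i;
    for (i = 0; i < sizeof(size_t); ++i) {
      r = (r << 8) | (v & 0xff);
      v >>= 8;
    }
    return r;
  }
  return v;
}

/* Wide read: one memcpy of a whole word. Needs sizeof(size_t) bytes. */
static size_t fill_wide(const uint8_t *buf) {
  size_t word;
  memcpy(&word, buf, sizeof(word));
  return to_big_endian(word);
}

/* Reference: the byte-by-byte read the wide version replaces. */
static size_t fill_bytewise(const uint8_t *buf) {
  size_t word = 0;
  size_t i;
  for (i = 0; i < sizeof(size_t); ++i) word = (word << 8) | buf[i];
  return word;
}

/* Shifts the fresh word into place below the bits_held bits that are
 * already stored at the top of value. */
static size_t refill(size_t value, int bits_held, const uint8_t *buf) {
  assert(bits_held >= 0 && bits_held < (int)(sizeof(size_t) * CHAR_BIT));
  return value | (fill_wide(buf) >> bits_held);
}
```

Both fill functions produce the same word; the wide one simply gets there with a single load plus a byte swap on little-endian hosts.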
This commit limits the scope of 1-D DCT and ADST functions within
vp9_dct.c and makes them static. This largely clears out the cross
referencing issue between vp9_dct.c and the SIMD optimizations.
Change-Id: If7cac478b11bb32328ccf70a9f60b709dad43d7f
The SSE2 version high bit-depth forward hybrid transforms are
essentially using the C functions via cross referencing to 1-D
functions in vp9_dct.c. This commit unifies the two versions and
removes the unnecessary dependency.
Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
Separate the hybrid transform case from 2D-DCT case. This will
allow us to clear up cross dependency between c and SIMD
implementations later.
Change-Id: Iaa499e8b096850a1c5a0c50a3b6e63e15d0184bf
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32
vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32
The purpose of doing that was to allow these functions to be shared
by multiple codecs.
Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
This commit moves the loop filter dspr2 implementation from vp9 to
vpx_dsp directory. It also fixes header file format issues.
Change-Id: I09203ed4bd267d7fd76bb79a6ee84a37646206b2
Remove the use of drop_frames_water_mark, as this is used for
frame dropping control. Use fixed threshold for now on buffer underflow.
Change-Id: If0ddda9f7f6fa96067cdcb0eccb42e17bda37c32
For vp9 decoder build without profile 2 and profile 3 support, this
commit changes to report error "Unsupported bitstream profile" for
input streams in profile 2 or 3, rather than other misleading error
information.
In addition, one of the invalid files in unit tests is actually coded
profile 2, this commit makes it tested only when the decoder is built
with vp9-highbitdepth.
This fixes issue #1028.
Change-Id: I8b6c1210787c8f89c703a546687dcf973ac20fc0
The various tap loop filter operations are common functions across
codec. This commit moves them along with SIMD optimizations to
vpx_dsp folder.
Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
In aq-mode=3 under a resizing action (i.e., resize_pending != 0),
force an update of the golden reference frame.
Change-Id: I14806f6db71b5f8c827678cc5e1fc913c138a9a4
Fix bug in setting this flag for animated content.
The bug did cause quality to increase because far
more frames are not boosted than boosted.
However, the speed trade off to gain is a lot less
favorable and the behavior was not as intended.
Change-Id: I89fb70419c88b26f40b3534de0481730a1b3fcfa
Move the clamp functions to vpx_dsp_common.h file. Clear out the
dependency of vp9_loopfilter_filters.c on vp9_common.h file.
Change-Id: I9c4b928bcd7f597106b5aa96354356d3775a3431
Use drop_frames_water_mark for threshold on buffer underflow,
and change threshold for resize down.
Change-Id: I2de19adce50abe9bcdc0b107528cec8cc1857fcc
The fast scaling for 1 pass mode was being used only on the
first frame after resizing event (because resize_scale_num/den
is set to 1 and only changed for first frame following resize event).
Change-Id: I723b63e21823eb858f25f5662d2bbe4f1842e61f
Proper use/update of resize_state and resize_pending to constrain
the total amount of downsizing to be at most one scale down, for now.
Change-Id: Id18fc32499f2fbdbec16728dcdc9e4eac09098f0
From Change Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
There is still an issue relating to one animated test clip with repeat
patterns where this change effectively increases the default maximum
arf interval by +1. This can be examined separately.
Change-Id: Idd01d5480fc45202d8a059a0c3afc0997cc5bdd1
Flatten the intra block decoding process. It removes the legacy
foreach_transformed_block use in the decoder. This saves cycles
spent on retrieving the transform block position.
Change-Id: I21969afa50bb0a8ca292ef72f3569f33f663ef00
This commit simplifies the intra block boundary condition logic.
It removes the block index from the argument set.
Change-Id: If00142512eb88992613d6609356dfd73ba390138
Eliminates the byte by byte read from bool decoder, by reading
in a size_t and then shifting it into place.
Change-Id: Id89241977103fc3b973e4ed172a5cbf246998e5d
The clamp calls with INT32_MIN and INT32_MAX have no effect at all on
int values passed in, therefore this commit removes those effectless
clamps and also adds more const intermediate results to make the code
more readable.
Change-Id: I66d8811f58bb74ec31cbec9a6c441983a662352e
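Why those clamps were no-ops can be shown in a few lines. This is a sketch assuming `int` is 32 bits wide; `clamp_int` is a generic stand-in written for this example, not the library's function:

```c
#include <assert.h>
#include <stdint.h>

/* Generic clamp, written out here for illustration. */
static int clamp_int(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Every int already lies within [INT32_MIN, INT32_MAX] when int is
 * 32 bits, so clamping to that range returns the input unchanged. */
static int full_range_clamp_is_identity(int value) {
  return clamp_int(value, INT32_MIN, INT32_MAX) == value;
}
```

Since neither branch of the clamp can ever be taken, removing the call does not change any result.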
Rework the inter mode transform block decoding loop. Replace the
block index with the row and col index as the input argument. It
saves function call to compute the row and col index according to
the block index and overall block size, and many if statements
associated with the transform block position relative to the coding
block. For the test bit-stream pedestrian_area 1080p at 5 Mbps,
the decoding speed goes up from 81.13 fps to 81.92 fps.
Note that the intra coded block decoding needs more refactoring
work than the inter ones. So keep it using foreach_transformed_block
for now.
Change-Id: I5622bdae7be28ed5af96693274057f55ba9b4fb4
The encoder gets its dqcoeff from the context tree. In the decoder move
it to directly after MACROBLOCKD.
Change-Id: I46c9b76f26956a360d17de0b26ecb994dae34ecb
Even if the recode loop is not enabled for the current frame type
trap the case where the projected size of a frame is above the
maximum allowed in recode_loop_test()
Change-Id: I453004694b8f8699e3c2a83252e9f83adccdda4e
Changes to allow more use of rectangular partitions at
speeds 1 and 2 for content classed by the first pass as
animation and for blocks near the active image edge.
This has quite a big impact in quality for the animated
test sequence but also hurts encode speed for speed 2.
For other content types the impact on both speed and
quality is small.
Added some plumbing for detection of internal vertical
image edges.
Change-Id: I3fc48de2349f8cb87946caaf0b06dbb0ea261a9a
Change speed features / behavior for split mode when there
is an internal active edge (e.g. formatting bars).
Remove some threshold constraints in rd code near the active
edge of the image.
Add some plumbing for left and right active edge detection.
Patch set 5. Limit rd pass through for sub 8x8 to internal active edges.
This takes away any speed penalty for most clips but keeps the enhanced
edge coding for the more critical case of internal image edges
Change-Id: If644e4762874de4fe9cbb0a66211953fa74c13a5
Replace block index with transform type in the argument list. This
saves an extra fetch of the prediction mode. For pedestrian
area 1080p coded at 5 Mbps with single tile, the average decoding
speed goes up from 80.55 fps (before the refactoring series) to
81.13 fps.
Change-Id: Icbebf84ce63c19c0c92f3690ed201f6c3eab7881
If the pre-selected partition size (from variance partition) is
32x32, also apply nonrd partition search for 32x32 and 16x16 size.
Overall small positive gain in metrics, average ~1%.
Some visual improvement, for lower resolutions.
Change-Id: I69cb425bda94f7d13d34c451ab30e9276335a30e
The decoding process handles detokenization and reconstruction per
transform block sequentially. There is no need to offset the dqcoeff
buffer according to the transform block index. This reduces
memory spills and improves cache performance.
Change-Id: Ibb8bfe532a7a08fcabaf6d42cbec1e986901d32d
This commit replaces the vp8_ prefixed subtract function with the
common vpx_subtract_block function. It removes redundant SIMD
optimization codes and unit tests.
Change-Id: I42e086c32c93c6125e452dcaa6ed04337fe028d9
inline the code directly in read_mv_component(), the only place where it
was being used; this removes a function call in a hot function
Change-Id: I66f99c0c9ce3bc310101dbca4a470f023cc6fb55
To aid version management for integration with ffmpeg by use
of:
#ifdef VPX_CTRL_<CTRL_ID>
...
#endif
Change-Id: If550e06de4d3aa3685881f312ce6a86fa9de083b
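A self-contained sketch of how an application might use such a guard. The control ids `MYCODEC_SET_FOO` and `MYCODEC_SET_BAR` are hypothetical and defined here only so the example compiles on its own; in real use the id and its `VPX_CTRL_<CTRL_ID>` macro would come from the library headers.

```c
#include <assert.h>

/* Stand-in for a codec header: a control id paired with its
 * VPX_CTRL_<CTRL_ID> macro. MYCODEC_SET_FOO is a hypothetical id;
 * MYCODEC_SET_BAR is deliberately absent to model an older header. */
enum { MYCODEC_SET_FOO = 1 };
#define VPX_CTRL_MYCODEC_SET_FOO

/* Returns the control id to use, or -1 if this header lacks it. */
static int foo_control_id(void) {
#ifdef VPX_CTRL_MYCODEC_SET_FOO
  return MYCODEC_SET_FOO; /* a real caller would pass this id on */
#else
  return -1;
#endif
}

/* Same pattern for a control this (pretend) header does not provide. */
static int bar_control_id(void) {
#ifdef VPX_CTRL_MYCODEC_SET_BAR
  return MYCODEC_SET_BAR;
#else
  return -1; /* compiled out: the identifier is never referenced */
#endif
}
```

Because the unavailable branch is removed by the preprocessor, the same application source builds against both older and newer headers.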
Adds two new vp9 parameters --min-gf-interval and --max-gf-interval
to enable testing based on frequency of alt-ref frames.
Also adds a unit-test to test enforcement of min-gf-interval.
For both these parameters the default value is 0, which indicates
they are picked by the encoder, based on resolution and framerate
considerations. If they are greater than zero, the specified
parameter is honored.
(Additional note by paulwilkins)
Note that there is a slight oddity in that key frames are also GFs and
considered part of GF only group. However they are treated as not
being part of an arf group because for arf groups the previous GF is
assumed to be the terminal or overlay frame for the previous group.
(end note)
Change-Id: Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
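The "0 means the encoder decides" convention described above amounts to a small selection rule. A sketch with a hypothetical helper; how the encoder derives its own value from resolution and framerate is not shown here and is simply passed in:

```c
#include <assert.h>

/* requested: value from --min-gf-interval or --max-gf-interval.
 * encoder_choice: whatever the encoder would pick on its own
 * (illustrative input; the real derivation is not reproduced). */
static int resolve_gf_interval(int requested, int encoder_choice) {
  assert(requested >= 0);
  return requested > 0 ? requested : encoder_choice;
}
```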
This reverts commit a42df86c03.
this change causes MSA/VP9SubpelVarianceTest.Ref and
MSA/VP9SubpelVarianceTest.ExtremeRef failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
This reverts commit 61774ad1c4.
this change causes MSA/VP9SubpelAvgVarianceTest.Ref failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
Change-Id: I7fb520c12b2a3b212d5e84b7619a380a48e49bb0
The vp9_lpf_vertical_16_dual function is optimized for the x86 32-bit
target. The hot code in that function was caused by the call to
transpose8x16. The gcc-generated assembly created unneeded fills and
spills to the stack. By interleaving 2 loads and unpack instructions, and
by hoisting the consumer instructions closer to the producer
instructions, we eliminated most of the fills and spills and improved
function-level performance by 17%.
credit for writing the function as well as finding the root cause goes to Erik Niemeyer (erik.a.niemeyer@intel.com)
Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
Added code to reduce the minimum partition size searched
for super blocks at or straddling the edge of the image.
If the first pass has detected formatting bars the "active" edge
may not be the real edge.
Change-Id: I9c4bdd1477e60f162a75fac95ba6be7c3521e05c
Correct the ARF boost calculations to partly discount
inactive or very low energy regions of the image.
Examples (formatting bars and 0 energy areas of animated clips).
Change-Id: I241af058d10aba8c67a4deca36deb913047d4561
This commit moves the primitive multi-threading files from vp9
folder to vpx_thread, which will be accessible by all vpx codec.
Change-Id: Ib51e66e9c69801c10631fab56d35a0c0aaed5883
the max value of the lookup in expanded form is:
(((1 << 7) - 1) << 1) - 65 + 1 + 64 = 254
remove the clamp [0, 253] and add one table entry
Change-Id: I0b5d0c66702fdb0b8f1cc9ab9b0dac66326e85a6
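The expanded expression above can be checked mechanically. A small sketch (function names are illustrative) that evaluates it and derives the table size needed once the clamp is gone:

```c
#include <assert.h>

/* Evaluates the expanded expression from the message above. */
static int max_lookup_index(void) {
  const int max_7bit = (1 << 7) - 1; /* 127 */
  return (max_7bit << 1) - 65 + 1 + 64;
}

/* A table indexed by 0..max needs max + 1 entries. */
static int table_entries_needed(void) { return max_lookup_index() + 1; }
```

The maximum index is 254, one past the old clamp limit of 253, which is why a single extra table entry makes the clamp unnecessary.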
to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
Only do the check for resizing if the feature is selected
(i.e., resize_mode = RESIZE_DYNAMIC).
And modify condition for checking to be resize_count >= window,
(since framerate can change).
Change-Id: Idceb4e50956bb965a1492b4993b0dcb393c9be4d
only uint8 is required; each use only loads one value as a uint8
quiets a few type conversion warnings
Change-Id: I03dc0dc0eb01ac23a6e8673daa2b77c6c57bf1b0
- Change default real time speed to -6.
- Add vpxenc_vp9_webm_rt_multithread, which encodes
niklas_1280_720_30.y4m with 2 to 4 threads using 2 to 4
tile columns.
Change-Id: I4d86c3360aec67ae5d1ba82eb6e0f0be8068b5af
Reduce boost for segment#2 for low bitrates and low-res.
This change is to reduce the rate overshoot at low bitrates.
No change in behavior, except at the very low bitrates.
Change-Id: I0dbd9d3b6356da5804de94adf10fca6a7a8f8948
Keep the same transform cutoff and partition selection
for speed 5 as in speeds >=6 (non-rd speed settings).
Existing setting for key frame at speed 5 allowed transform size
up to 32x32 on key frames, and did not allow for 4x4 block partition size.
This created more visual artifacts on first few frames.
avgPSNR/overallPSNR/SSIM gains of 0.2/0.7/0.8 for rtc_derf(low-res) set,
and 0/0.7/1.1 gains for rtc set.
Change-Id: I8c139ec6c9bb74e14b4ffbad5f12e94f18a59c0b
configure.sh was setting some Mac OS X options for iOS targets, which
confuses the iOS 9 beta SDK in Xcode 7 when linking libraries.
Additionally, old armv6 media extensions were being enabled on iOS
when they're not needed (we always have Neon since iOS 6). These
broke on iOS 9 SDK which no longer assembles those instructions.
Change-Id: I4e4d2722392ead3382ce96289c03ef1e489799d6
skips testdata verification; useful with slow media or if the data was
retrieved via a separate call to testdata
Change-Id: Ifd97892cee6c04b0111874cc8071675e90ec852b
For speed 5 real-time mode, the selection of the partition size for
superblocks on the segment (aq-mode=3) uses the non-rd recursive
pick partition search, and can sometimes select 64x64.
For low resolutions, visually better to limit this to 32x32.
Change-Id: I69657a7ed8899f8b3cf8c9c318a2509c5c72c565
For screen content don't refresh a block at a quantizer higher than
it was last coded at. Previously at realtime speeds the encoder had a
tendency to recode a block from GOLDEN with a higher Q than it was last
coded at.
Change-Id: Iacd561806c769dcce1a81b9827ffc70090f5ba18
Decision to scale down/up is based on buffer state and average QP
over previous time window. Limit the total amount of down-scaling
to be at most one scale down for now.
Reset certain quantities after resize (buffer level, cyclic refresh,
rate correction factor).
Feature is enabled via the setting rc_resize_allowed = 1.
Change-Id: I9b1a53024e1e1e953fb8a1e1f75d21d160280dc7
There is a naming conflict in the chromium build system.
The rest of the variance functions will move to vpx_dsp soon.
Change-Id: Iff78da2aafb0d7380eda73e38d7dac72110a1e47
The internal behavior of block_yrd differs in high bit depth
settings from the 8-bit one. This makes the assertion condition
false for high bit depth.
Change-Id: I15dc02e7162d27cabe78c451941d769d488b1174
The overflow issue affects a variable that is only used in inter
mode. This commit fixes the ioc warning triggered in the intra
mode. It does not affect the compression performance.
Change-Id: I593d1b5650599de07f3e68176dd1442c6cb7bdbc
set_frame_size() is being called twice, once before entering
encode_frame_to_data_rate(), and once again in that function.
No need to call it twice for one-pass mode.
Change-Id: I5fabaf0a90482d4f42cd89ef7ae1402c31aec600
This patch modified the thread creating code. When use_svc is true,
the number of threads created is decided by the highest resolution.
This resolved WebM issue 1018.
Change-Id: I367227b14d1f8b08bbdad3635b232a3a37bbba26
* changes:
vp9_decodeframe.h: remove unused prototype
vp9_decodeframe: move public funcs to end of file
vp9_decodeframe: reorder some functions
vp9_decodeframe: hide vp9_dec_build_inter_predictors_sb
This commit fixes a potential integer overflow issue in function
hadamard_16x16. It adds corresponding dynamic range comment.
Change-Id: Iec22f3be345fb920ec79178e016378e2f65b20be
the declaration containing the alignment in vp9_filter.h was removed in:
eb88b17 Make vp9 subpixel match vp8
fixes a crash in 32-bit builds
Change-Id: I9a97e6b4e8e94698e43ff79d0d8bb85043b73c61
If the frame size increases, the tile data buffer needs to be
re-allocated according to the number of tiles existing in current
frame. This patch makes the multi-tile encoding work in spatial
SVC usage case, and partially solved WebM issue 1018.
Change-Id: I1ad6f33058cf5ce6f60ed5024455a709ca80c5ad
vp9_init_dequantizer() was deleted in:
bdd249b Optimize the dequantization process on decoder side.
Change-Id: Iedb5b6a3a03964dd6901c1e3b2325194d94bc708
add a dependency on *_rtcd.h to ensure they're generated before
attempting to build the test files
Change-Id: Ibbbd1f6ea77912bfd297129e7c83b9a80923ea12
Reduce motion threshold and boost factor for second segment,
for low bitrates, at low resolutions for now.
This is to reduce the rate fluctuation/frame dropping that occurs
at these low bitrates.
Change-Id: Ia66c3be41831882fca8c1e4fe104f5ea8fbe7142
Some initial experiments into discounting dead zone
formatting bars and intra skip blocks (common in some
types of animation and graphics) in the calculation of
the active max Q for each ARF/GF group.
TODO: check for vertical formatting bars and validate the
horizontal bar at the bottom edge of the image.
As expected, this change as it stands, does not make much
difference for the natural videos in the std-hd and derf sets.
However, for the yt and yt hd set there is a significant rise
in the average PSNR with overall PSNR and SSIM remaining
neutral.
The mean rise for the YT-HD test set was > 6%. This is mainly
because the change allows Q to drop further on titles and
other graphics sections where spending a small number of
extra bits gives a sharp rise in PSNR.
Change-Id: I3f878ae91fc1854312d7ecf9fa792c17bc1aa6b7
For content that is identified as likely to contain some
animation or graphics content, increase the availability
of split modes for good quality speeds 1-3.
On a problem test animation clip this improves metrics
results by about 0.25 db and makes a noticeable difference
visually. It also causes a small drop in file size (~0.5%) but
a rise in encode time of about 5-6% at speed 2.
For more normal content it should have no effect.
Change-Id: Ic4cd9a8de065af9f9402f4477a17442aebf0e439
Added check to see if last frame was all intra. This will
eliminate two checks in find_mv_refs_idx(). Also, do not
update the frame mvs if the current frame is all intra.
This improved performance on material with frequent
intra-only frames.
Change-Id: I44a4042c3670ab0d38439d565062a0e2a1ba9d1e
fails unit tests:
[ FAILED ] NEON/VP8SubpelVarianceTest.ExtremeRef/0, where GetParam() = (3, 3, 0x14e36d, 0)
[ FAILED ] NEON/VP8SubpelVarianceTest.Ref/0, where GetParam() = (3, 3, 0x14e36d, 0)
the tests were recently enabled in:
eb88b17 Make vp9 subpixel match vp8
the functions likely haven't changed since being converted from assembly
Change-Id: I6141717b111b8f735f436c160d74270af53ef722
move them under their respective config check to avoid some unused
variable warnings when disabled
Change-Id: Ic5e5280cf1bc1f56e8349676f0bedae4acef34ea
this quiets warnings from armv6 code [1].
from msdn [2]:
-oldit
Generate ARMv7-style IT blocks. By default, ARMv8-compatible IT blocks
are generated.
a new configuration would be needed for armv8 in any case as the neon
assembly is being built, so removing this should be harmless
[1] A4509: This form of conditional instruction is deprecated
[2] https://msdn.microsoft.com/en-us/library/hh873189.aspx
Change-Id: I4c3b838b52a87401c6daecd83d22ab148ed7c5d9
This control allows the application to skip the loop filter in the
decoder. This is an advanced control that should only be used in
extreme circumstances as it may introduce and accumulate decode
artifacts.
Change-Id: I278c65c60826f84c9141ebe06c6eeed3c2335fa8
WebM files will adjust the display width and height according to the
input pixel aspect ratio. The default pixel aspect ratio is 1:1.
BUG=https://code.google.com/p/webm/issues/detail?id=1005
Change-Id: I23e0a601b7259fa9513cb86110c41b8437769808
calculate the averages needed for even and odd rows once; this removes a
conditional from the inner loop
the final average calculated currently relies on above[] being extended,
it could be reduced to use
above[block_size - 2] + 3 * above[block_size - 1]
Change-Id: I70f5eac8d8a2a959c7114844a95826f445c3dd4d
The only difference between the two was that the vp9 function allowed
for every step in the bilinear filter (16 steps) while vp8 only allowed
for half of those. Since all the call sites in vp9 (<< 1) the input, it
only ever used the same steps as vp8.
This will allow moving the subpel variance to vpx_dsp with the rest of
the variance functions.
Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75
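The equivalence claimed above can be illustrated with a toy model. The sketch assumes each bilinear entry is a pair of taps summing to 128 that steps linearly with the index; the generator below is illustrative and is not copied from either codec's table.

```c
#include <assert.h>

/* Assumed tap layout: two taps summing to 128, stepping linearly.
 * steps = 16 models the finer table, steps = 8 the coarser one. */
static void bilinear_taps(int steps, int index, int taps[2]) {
  assert(steps > 0 && index >= 0 && index < steps);
  taps[1] = 128 * index / steps;
  taps[0] = 128 - taps[1];
}

/* Returns 1 if entry (x << 1) of the 16-step table matches entry x of
 * the 8-step table for every x in 0..7. */
static int doubled_index_matches(void) {
  int x;
  for (x = 0; x < 8; ++x) {
    int fine[2], coarse[2];
    bilinear_taps(16, x << 1, fine);
    bilinear_taps(8, x, coarse);
    if (fine[0] != coarse[0] || fine[1] != coarse[1]) return 0;
  }
  return 1;
}
```

If every caller doubles its offset before indexing the 16-step table, the odd entries are never reached, so the 8-step table gives identical results.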
Adds code to detect dead zone bars at the top and bottom
of reformatted letterbox video (note that the code only
looks at the top of the image and assumes any dead zone
is symmetrical). Use of this to adapt rate control etc.
will follow in a subsequent patch.
Also counts other blocks (excluding the dead zone) that
have no intra signal. The presence of a significant
number of such blocks can be used to identify that the frame
may be artificial (e.g. animation, screen capture, graphics).
This patch contains plumbing only and does not use
the signal.
Change-Id: I59bc93529cd4065416cef773e405fda3ae006a20
Some places are using the unoptimized variance function. This was never
intended and does not fit into the optimization framework.
Change-Id: Id96238407aad03b0ffd4a46cd183555a026daedc
Updated sources according to improved version of common MSA macros.
Enabled respective convolve MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.
Change-Id: If5ad6ef8ea7ca47feed6d2fc9f34f0f0e8b6694d
Clang adds alignment hints when casting up the loads/stores. Although
this should be safe for most paths, it's causing some crashes. Either
the source of the misalignment needs to be determined and adjusted or
the intrinsics need to be rewritten to avoid using the cast to load the
data.
BUG=817,892
Change-Id: Ia3aa824d6a4cd97e14325ff49dc730b6f85ec7e8
The larger internal variables are required for the intermediates
but RoundHighBitDepth brings them down to uint32_t/unsigned int.
Fixes type warnings in visual studio.
Change-Id: I48d35284d6cbde330ccdc1f46b6215a645d5eb00
Numerator was being range checked against the
denominator - preventing any frame rate slower
than 1 fps.
I've tested this on a Mac using ffmpeg and
results are comparable to mp4 and ogg files generated
at the same time.
Not yet tested on Windows.
Johnny Klonaris
google@jawknee.com
Change-Id: Idb358dbc2e7dc000037880ede4a1b0df248a42c8
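One way to read the fix above, assuming the numerator and denominator in question are those of a timebase (seconds per tick), so that the frame rate is den / num: requiring num <= den forces at least one frame per second. The sketch below contrasts that check with a relaxed one; the type and function names are hypothetical and the actual bounds used by the library are not reproduced.

```c
#include <assert.h>

/* Hypothetical timebase: num / den seconds per frame. */
typedef struct {
  int num;
  int den;
} timebase_t;

/* Old-style check: num may not exceed den, so fps = den / num >= 1. */
static int timebase_ok_strict(timebase_t tb) {
  return tb.den >= 1 && tb.num >= 1 && tb.num <= tb.den;
}

/* Relaxed check: only require both parts to be positive. */
static int timebase_ok_relaxed(timebase_t tb) {
  return tb.den >= 1 && tb.num >= 1;
}
```

A timebase of 2/1 (one frame every two seconds) fails the strict check but passes the relaxed one.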
Updated sources according to improved version of common MSA macros.
Enabled idct MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.
Change-Id: I1f488ab2c741f6c622b7a855388a202168082209
only the immediate above right pixel is needed; this removes a
conditional from the inner loop
the final average calculated currently relies on above[] being extended,
it could be reduced to use above[block_size - 2] + 3 * above_right
Change-Id: Ica4f2b8d25eec3ca1d6fa52ef0d4adc228eeea3f
Made minor restructuring/styling changes to the sources, such as generic
macro definitions and their use to reduce code lines, better code
alignment, etc.
Disabled all MSA hooks and tests
Change-Id: Ic6f2dce0b501f46b80c06c46c0fe2043d557b190
Keep the logic, transform size based on cyclic refresh and bsize,
(that was conditioned on VAR_PARTITION conditions) the same
for all speeds in non-rd mode (speeds >= 5).
No change to speeds >=6.
Small improvement for speed 5, ~0.5/1.5% gain for avg psnr/ssim.
Change-Id: If9c5657f3d30efd3c7f147166bba7cb69ea55114
In VS 2015 and higher snprintf is supplied and therefore vsnprintf
doesn't need to be defined. This also avoids problems caused by
_snprintf being different from snprintf.
This fixes a build break with VS 2015 and improves security.
Originally submitted via chromium by brucedawson@chromium.org:
https://codereview.chromium.org/1055603003
Additionally break this MSVC-specific tweak to a new file, which will
become the home of all such MSVC-specific things.
This requires adding a dependency on msvc.h to every example which uses
args.c and tools_common.h
Change-Id: I35b5f8e7ea00f6627403aabc9ea79b0412557a99
ROUND_POWER_OF_TWO has some poor side effects when used
with [u]int64_t such as doing the shifting in 32bits.
Change-Id: Ic85a19765cd316fb43657cb21c86f35ceb772773
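The 32-bit shifting problem comes from the rounding term. Assuming the macro has the usual form shown below, the constant `1` is a plain int, so `1 << (n - 1)` is computed in int width no matter how wide `value` is. A sketch of a 64-bit-safe variant; the second macro and the wrapper are written for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed form of the original macro: the rounding term is built from
 * an int constant, so it is only valid for small n. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Sketch of a 64-bit-safe variant: the rounding term is 64 bits wide. */
#define ROUND64_POWER_OF_TWO(value, n) \
  (((value) + (1ULL << ((n) - 1))) >> (n))

/* Wrapper that pins down the operand types for the 64-bit case. */
static uint64_t round64(uint64_t value, int n) {
  assert(n >= 1 && n <= 63);
  return ROUND64_POWER_OF_TWO(value, n);
}
```

For small shifts the two macros agree; for shifts of 32 or more only the 64-bit variant has a well-defined rounding term.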
Increase the 32x32 split threshold, to allow for more 32x32
at expense of 16x16. Visually looks somewhat better.
Change-Id: Ia1439c3a0dc2d7933468b88bd59266fcd9f03505
Break out the setting of the block variance split thresholds,
since they are locally modified, e.g., based on local/segment qp.
No change in performance.
Change-Id: I0a3238e6dab05140657539fc4bd27ac5ff7a554e
+ synchronize filter function signatures
this makes any intrinsics filters available for inlining and has the
side-effect of making those filters static, quieting missing-prototype
warnings.
Change-Id: I1908875caffa585bd4fc65aaf10d17a5e20cfb46
+ synchronize filter function signatures
this makes any intrinsics filters available for inlining and has the
side-effect of making those filters static, quieting missing-prototype
warnings.
Change-Id: I1cd55c9d52547793ad65aa90c7620f0e426edaa2
collect the vp9_convolve function definition macros there; this will
allow some relocation of functions from vp9_asm_stubs.c
Change-Id: Idadd117fa256dd48748379856973fd985b8204e8
This commit fixes the integral projection motion search crash when
frame resize is used. It fixes issue 994.
Change-Id: Ieeb52619121d7444f7d6b3d0cf09415f990d1506
reorder includes to avoid:
warning C4985: 'ceil': attributes not present on previous declaration.
this is the same workaround used in vp9/common/vp9_systemdependent.h
Change-Id: Ia10dd63de24f96fa1507a6179220e9d6ec774db6
Various header/test files had to be re-worked in order to
build "Remove cm parameter from vp9_decode_block_tokens()".
This patch reverts the "Remove cm" part and only contains
the re-worked header files.
Change-Id: I520958a88d1991fee988a3c784d0eac40e117a32
1. Check existing buffer sizes when re-allocate context buffers.
2. Don't need to set mi buffers to 0 during setup_mi.
Change-Id: I6b48b0e077a4d804312b605ad0dc34aec5795a6d
This patch provides a partial rapid feedback of bits
resulting from extreme undershoot.
Some improvement on some problem animated material
but in its current form only a small impact on the metrics results
of our standard test sets.
Change-Id: Ie03036ea8123bc2553437cb8c8c9e7a9fc5dac5d
This patch addresses two issues that can occur when the
encoder chooses to use a mixture of ARF and GF groups.
The first issue relates to a failure to reset the "ARF active" flag
correctly when transitioning from coding ARF groups to coding
GF groups. This caused some golden frames to be encoded
with an incorrect bit rate target as if they were ARF overlay frames.
The second issue relates to the encoding of a single short GF group
just before a key frame. Where the last group before a key frame
is an ARF group we expect the final frame before the key frame to
be a low data rate overlay frame. However, when the last group
is a GF group, the final frame before the key frame should be a normal
frame with a normal bit allocation. This issue had the potential to cause
a single poorly coded frame just before a key frame. If that key frame
were a forced key frame rather than a real scene cut, this might cause
pulsing.
Change-Id: Idf1eb5eaf63a231495a74de7899236e1ead9fb00
This allows rate control to react to content of current frame being encoded.
Enabling this feature via the setting: screen_content_mode = 2.
Change-Id: Ib2c6670551d96f4907495d5b7b76bb8c49e673db
this allows test_libvpx's simd caps check to be used; it also fixes a
link error on OS X with -fcommon.
Change-Id: I1a62a3e74ba06b8f3b37a22fcfdebf90c04ab289
in addition to <arch>/*. this will pick up tests defined with TEST()
instead of INSTANTIATE_TEST_CASE_P()
Change-Id: I0917741baac89d9ce857f4d4aa53790e8a0c6c12
split call of extend_and_predict() and return, fixes visual studio build
warning since:
0a80164 Move mc_buf to cut down size of MACROBLOCKD.
Change-Id: I7cdf712941ef773a07f038539cb8080dc27861cd
this file shouldn't be built directly, it is included in vp9_dct_sse2.c
to create a non-high-bitdepth and a high-bitdepth version
silences missing prototype warnings for the unused FDCT* functions
Change-Id: Ide6ff8c24ab31bdb0f833260505ae33660a1ad5b
this file shouldn't be built directly, it is included in vp9_dct_sse2.c
to create a non-high-bitdepth and a high-bitdepth version
silences missing prototype warnings for the unused FDCT32x32* functions
Change-Id: I0e38f16dae5ea1728de184ee2c89287d48675c51
this file shouldn't be built directly, it is included in vp9_dct_avx2.c
to create a non-high-bitdepth and a high-bitdepth version
silences missing prototype warnings for the unused FDCT32x32* functions
Change-Id: I4c19935c0e035b393be513bde735e9a78064a494
* changes:
vp9_subexp.h: add a missing prototype
vp9: add some missing includes
vp9 intrinsics: add vp9_rtcd include
vp9: correct some function signatures
vp9_variance_sse2: sync function signatures
vp9/encoder: make some functions static
vp9_dct_sse2: make some functions static
vp9_decodeframe.c: make a function static
Use the same setting as in speed >=6.
This will use the same logic for tx_size selection as in speed >=6,
which limits the transform size and reduces ringing artifact.
Also metrics go up on average with this change: ~2% for PSNR, ~10% for SSIM.
Change-Id: Ia2d50db236ae1cc72f742bfa6c9ec5ea50ff0e0a
* changes:
vp8/rdopt.h+onyx_int.h: add some missing prototypes
vp8: add some missing includes
vp8: make some functions static
vp8/common/variance*: add vp8_rtcd include
vp8_copy32xn: sync function signature
useful for speed testing / verifying individual function optimizations;
currently tests non-high-bitdepth VP9 intra predictors
Change-Id: Ibd247765e43a31894697d43f1d39d312e0ba2090
Add a check to make sure we have a decoded frame available
before copying its 'corrupt' flag.
(Originally submitted to the old repository by Alexander Voronov
as: https://gerrit.chromium.org/gerrit/#/c/74305/).
Change-Id: Iceb4686c785afb437b668015bf8818b18d60e0ce
Testing on another rate control patch reveals that in some
situations, where the encoder is flipping in and out of arf
mode, we get an encoder decoder mismatch.
Whilst it is still not clear why, skipping the last buffer
update seems to trigger the problem. Until I can establish
why, or if there is another underlying cause, I am reverting
this change.
This reverts commit e5112b3ae3.
Change-Id: I315c5200414de89458015823344b7367e9dd75ba
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.
Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
Basically just a warning, but disconcerting nonetheless. Removes this
output from the build:
Makefile:59: -x86_64-darwin13-gcc.mk: No such file or directory
Change-Id: Ibb379506352b2f613ef4a7b1ac47e9c95d0d1580
These targets no longer build (PPC support was removed from
libvpx). Remove the dead code and misleading help output.
BUG=https://code.google.com/p/webm/issues/detail?id=997
Change-Id: Ib35614806adeae970f3821da0d8dbcc54ab8d868
The computation of new metrics is not supported yet in highbitdepth
mode. This commit adds protection to make sure the computation is
done only when highbitdepth is not on. This protection shall be
revised when support of highbitdepth computation is added.
This resolves the encoder crash when configured with both
--enable-internal-stats
--enable-vp9-highbitdepth
Change-Id: Id9f4bcc4fa26d9ca0e9eabade83f3f88a5b212e6
This patch fixed the following warning:
src\third_party\libvpx\source\libvpx\vp9\encoder\vp9_pickmode.c(1607) :
warning C6246: Local declaration of 'this_mode' hides declaration of the
same name in outer scope.
Change-Id: I1d93c4a47a13cb13089fec5bd61e8b58e6cd8d58
* changes:
vpxenc: make some functions static
vpxdec: make some functions static
tools_common.h: fix get_vpx_decoder_count() proto
tools_common.h: fix get_vpx_encoder_count() proto
tools_common.h: fix usage_exit() prototype
rename LIBVPX_TEST_BINS to LIBVPX_TEST_BIN and remove foreach usage.
this was a leftover from having multiple targets with their own (single)
object list; the use of LIBVPX_TEST_OBJS so widely makes extending these
loops difficult.
Change-Id: I61bda1b91acb43145609f04b8fe6e45ec4483e22
When aq-mode=3 is enabled, only for base layer frames should the
qp of the frame incorporate the segment delta-qp.
This was causing more rate mismatch for the enhancement layer frames
when running temporal layers with aq-mode=3 on.
Change-Id: I1c5e69d1ef8a51188af8696753c17fd8f67699b3
currently this needs to be 2x (NEED_ABOVERIGHT) the size of the largest
block (32) + 1 (for above_left). reduce the buffer size from 128 + 16
(alignment) to 64 + 16.
Change-Id: Idaca1806c7e1214e9437de24e15edc2ebf18f95d
The warning only happens in the VP9 encoder's first pass because src_mi
is not set up yet. But it will not fail the encoder as left_mi and
above_mi are not used in the first_pass and they will be set up again
in the second pass.
Change-Id: I0713b4660d71e229e196654cb0970ba6b1574f28
Where a frame appears to be a repeat of an earlier
frame or frame buffer, but the first pass code
does not anticipate this (usually because it is matching
the GF or ARF buffer not the last frame buffer), do not
update the last frame buffer.
This helps ensure that the content of the last frame buffer
is kept "different" where possible, and not updated to
match the GF or ARF. This is particularly helpful in some
animated sequences where there are groups of repeating
frames. Here it has quite a big impact. However, in most
of our standard test clips it has little or no impact.
Change-Id: I77332ee1a69f9ffc0c6080bfeb811c43fd8828e6
this macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
is used with calls to assembly, while generic c-code doesn't rely on it,
so in a c-only build without an alignment attribute the code will
function as expected.
Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
Using EIGHTTAP and EIGHTTAP_SMOOTH seem sufficient.
Hard to see any visual gain from allowing EIGHTTAP_SHARP, and it is
rarely selected.
PSNR/SSIM metrics go up by ~0.18/0.14%.
Change-Id: I96fa0d98f9321b913e3ebcd464d4ff3c63018791
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.
This reduces the size of vpxenc/dec by 36k on x86_64 builds.
Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
Added the intra mode early termination in order to
speed up the mode search in non-rd case since we
started to include more intra modes in the search
list. Borg tests(rtc set) showed a 0.048% PSNR gain
and 0.061 SSIM gain. No speed change.
Change-Id: I6f255fe534dc50b736e6a66a726ad458eb9b4443
widen the loads and stores to 128-bit.
this was added, but not enabled in:
493a857 Add some sse2 code for intra prediction.
Change-Id: I277d7db608a7db7d75cc0bde86f48fa66ad487e4
For non-rd mode (speed >=5): use mask based on prediction block size, and
(for non-screen content mode) allow for checking horiz and vert intra modes
for blocks sizes < 16x16.
Avg psnr/ssim metrics go up by about ~0.2%.
Only allowing H/V intra on block sizes below 16x16 for now, to keep
encoding time increase very small, and also when allowing H/V on 16x16 blocks,
metrics went down on a few clips which need to be further examined.
Change-Id: I8ae0bc8cb2a964f9709612c76c5661acaab1381e
Impose a limit on the rd auto partition search based on
the image format. Smaller formats require that the search
includes a smaller minimum block size.
This change is intended to mitigate the visual impact of
ringing in some problem clips, for smaller image formats.
Change-Id: Ie039e5f599ee079bbef5d272f3e40e2e27d8f97b
Remove one of the auto partition size cases.
This case can behave badly in some types of animated content
and was only used for the rd encode path. A subsequent patch
will add additional checks to help further improve visual quality.
Change-Id: I0ebd8da3d45ab8501afa45d7959ced8c2d60ee4e
Previously the limit on the max interval was set to 0.5 seconds.
Though this helped some low frame rate material it
appears to be a bit too aggressive for some 24 and 25 fps
content. This patch relaxes the limit to 0.75 seconds.
The patch also adds a new minimum interval variable
to replace the current hard wired value. This allows us
to impose a limit on the maximum number of primary
arfs per second for high frame rate (e.g. 50 & 60fps)
content. This is to address concerns regarding playback
performance on some platforms if there is a high base
frame rate and very frequent arfs.
Change-Id: I373e8b6b2a8ef522eced6c6d2cceb234ff763fcf
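As a rough illustration of how a limit expressed in seconds maps to a frame count at different frame rates. This is a sketch under assumptions: the helper name and the truncating conversion are hypothetical and are not taken from the patch, which may round or clamp differently.

```c
#include <assert.h>

/* Hypothetical helper: converts an interval limit given in seconds into a
 * whole number of frames at the given frame rate (fraction truncated). */
static int interval_limit_frames(double framerate, double seconds) {
  assert(framerate > 0.0 && seconds > 0.0);
  return (int)(framerate * seconds);
}
```

Under these assumptions, 24 fps content goes from a 12 frame limit at 0.5 seconds to an 18 frame limit at 0.75 seconds.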
offsetting by a variable stride prevents instruction reordering,
resulting in poor assembly.
additionally reroll 16x16/32x32 loops to reduce register spill with this
new format
Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11
This patch reduced the BLOCK_8X8's intra_cost_penalty, which
allows 8x8 blocks to conduct intra mode search. Borg test
result(rtc set): 0.077% PSNR gain, 0.228% SSIM gain. No speed
changes.
Change-Id: Icfe90c4f6969de24bda8ecacbd3da50330bf22b2
The rotation computation using 2X of cos(pi/16) has the potential to
overflow 32 bits; this commit disables the function to allow further
investigation and optimization.
Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
Calculated cpi->vbp_threshold_sad from this frame's dequant value.
The encoding quality and speed didn't change much. Borg test
result: PSNR: -0.002%, SSIM: -0.003%.
Change-Id: I97c9826986f39582f29910d637d08a69c90afdee
The version currently produces different results from the C version
for some inputs. Disable its use for now to allow time to
investigate the source of the mismatch.
Change-Id: Id039455494ee531db4886a9f1fa4761174ef6df3
The default golden frame interval was doubled. After encoding a
frame, the background motion was measured. If the motion was high,
the current frame was set as the golden frame. Currently, the
changes were applied only while aq-mode 3 was on.
Borg tests(rtc set) showed a 0.226% PSNR gain and 0.312% SSIM gain.
No speed changes.
Change-Id: Id1e2793cc5be37e8a9bacec1380af6f36182f9b1
structured extended feature flags require eax = 7; avoids incorrectly
detecting avx2 on some older processors that support avx.
from [1]:
INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor
Information and the Vendor Identification String
[1] http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html
Change-Id: I6b4735b5f7b7729a815e428fca767d1e5a10bcab
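The check described above can be sketched as pure logic, leaving out the CPUID instruction itself so the example stays portable. The function name and parameters are hypothetical; the only hardware facts relied on are that CPUID with EAX = 0 returns the highest basic leaf in EAX, and that AVX2 is reported in bit 5 of EBX for leaf 7, sub-leaf 0.

```c
#include <stdint.h>

#define LEAF7_EBX_AVX2 (1u << 5) /* CPUID.(EAX=7,ECX=0):EBX bit 5 = AVX2 */

/* max_basic_leaf: EAX returned by CPUID with EAX = 0.
 * leaf7_ebx:      EBX returned by CPUID with EAX = 7, ECX = 0. It is only
 *                 meaningful when max_basic_leaf is at least 7. */
static int has_avx2_flag(uint32_t max_basic_leaf, uint32_t leaf7_ebx) {
  if (max_basic_leaf < 7) return 0; /* leaf 7 is not implemented */
  return (leaf7_ebx & LEAF7_EBX_AVX2) != 0;
}
```

A complete detection routine would also confirm operating system support for the AVX register state, which is omitted here.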
For color sampling formats other than 420, a valid partition size in Y may
not work for the UV planes. This commit adds validation of the UV partition
size before selecting the partition choice.
This fixes a crash for real time encoding of 422 input.
Change-Id: I1fe3282accfd58625e8b5e6a4c8d2c84199751b6
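A simplified sketch of why a luma partition can be invalid for chroma. It assumes that valid block shapes are square or 2:1 rectangles, as with the VP9 block sizes, and the function name is hypothetical. It is not the validation code added by the commit.

```c
#include <assert.h>

/* Returns 1 when a luma block of width w and height h (in pixels) maps to a
 * chroma block that is square or a 2:1 rectangle, and 0 otherwise.
 * ss_x and ss_y are the horizontal and vertical chroma subsampling shifts. */
static int uv_block_shape_is_valid(int w, int h, int ss_x, int ss_y) {
  assert(w > 0 && h > 0 && ss_x >= 0 && ss_y >= 0);
  const int uv_w = w >> ss_x;
  const int uv_h = h >> ss_y;
  if (uv_w == 0 || uv_h == 0) return 0;
  return uv_w <= 2 * uv_h && uv_h <= 2 * uv_w;
}
```

For 4:2:2 input (ss_x = 1, ss_y = 0), an 8x16 luma block maps to a 4x16 chroma block, which has no matching block size, while the same luma block is fine for 4:2:0.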
(see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6)
For the test clip used, the decoder performance improved by ~2%.
This is also an intermediate step towards adding back the
mode_info streams.
Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d
The existing test was triggering a lot of false positives on some types
of animated material with very plain backgrounds. These were triggering
code designed to catch key frames in letter box format clips.
This patch tightens up the criteria and imposes a minimum requirement
on the % blocks coded intra in the first pass and the ratio between the
% coded intra and the modified inter % after discounting neutral (flat)
blocks that are coded equally well either way.
On a particular problem animation clip this change eliminated a large
number of false positives including some cases where the old code
selected kf several times in a row. Marginal false negatives are less
damaging typically to compression and in the problem clip there are now
a couple of cases where "visual" scene cuts are ignored because of well
correlated content across the scene cut.
Replaced some magic numbers related to this with #defines and added
explanatory comments.
Change-Id: Ia3d304ac60eb7e4323e3817eaf83b4752cd63ecf
vestigial. the code is stale and couldn't be configured directly; there
are better ways to achieve this now
Change-Id: I5a9c62e099215588cd0d7e5ae002dfc77c21a895
PSNR HVS is a human visual system weighted version of SNR that's
gained some popularity from academia and apparently better matches
MOS testing.
This code is borrowed from the Daala Project but uses our FDCT code.
Change-Id: Idd10fbc93129f7f4734946f6009f87d0f44cd2d7
When the tokenization is not taking effect, the tokenization
pointer remains unchanged. No need to re-assign the backup pointer
value.
Change-Id: I58fe1f6285aa3b4a88ceb864c11d5de8ac6235dd
This partially reverts commit 14ef4aeafb
Including the rtcd headers to get the function definitions causes
problems on VS9.
Change-Id: I780874d9e03af2d3124192ab0e3907301f22674c
This patch limits the maximum arf interval length to
approximately half a second. In some low fps animations in
particular the existing code was selecting an overly long interval
which was hurting visual quality. For a sample problem test clip
(360P animation , 15fps, ~200Kbit/s) this change also improved
metrics by >0.5 db.
There may be some clips where this hurts metrics a little, but the
worst case impact visually is likely to be less than having an
interval that is much too long. On more normal material at 24
fps or higher, the impact is likely to be nil/minimal.
Change-Id: Id8b57413931a670c861213ea91d7cc596375a297
nothing is using android/log.h currently; also quiets a warning when
building a static lib:
Android NDK: WARNING:libvpx/build/make/Android.mk:vpx: LOCAL_LDLIBS is
always ignored for static libraries
Change-Id: I1469a5d6fca4f7338a4a82e26a03e60fc91d92ca
for big endian disable msa
removed -flax-vector-conversion flag
disable runtime_cpu_detect feature if enabled
Change-Id: Icd5130b733f2ddcdb94ffb6e4d170e6ca0f9832b
Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks.
Also increase variance threshold for 32x32, and add exit condition in choose_partition
(with very safe threshold) based on sad used to select reference frame.
Some visual improvement near moving boundaries.
Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%.
Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip.
Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577
Refactor the loops in dec_build_inter_predictors to try and decrease
the number of instructions. Limited testing saw about 1% perf
increase on x86 and about 0.67 % perf increase on Arm.
Change-Id: I69cfe6335bb562fbaaebf43fb3f5c5a2a28882a2
add a check for the status line to awk and better report failure given
the program output will be lost in this case
Change-Id: I1348a80108c81099d609f2e2227dd2c31bd8cd54
Modifies a special handling that improves rate control accuracy in
the constrained quality mode, when the undershoot and overshoot
limits are set tighter.
Change-Id: If62103f0ef3ed1cac92807400678c93da50cf046
the sse4 code expects 16-byte aligned arrays; vp8 already had a similar
change applied:
b2aa401 Align SAD output array to be 16-byte aligned
Change-Id: I5e902035e5a87e23309e151113f3c0d4a8372226
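A small sketch of declaring a 16-byte aligned output array and checking its address. The macro `SKETCH_ALIGNED` is a hypothetical stand-in modelled on compiler alignment attributes; it is not the project's own macro, and the helper names are illustrative.

```c
#include <stdint.h>

/* Hypothetical alignment macro using compiler-specific attributes. */
#if defined(__GNUC__)
#define SKETCH_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define SKETCH_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#else
#define SKETCH_ALIGNED(n, typ, val) typ val /* no alignment guarantee */
#endif

/* Returns 1 when p is a multiple of alignment (a power of two). */
static int is_aligned(const void *p, uintptr_t alignment) {
  return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Declares a SAD output array the way aligned SIMD stores expect it and
 * reports whether the address is 16-byte aligned. */
static int sad_array_is_16_byte_aligned(void) {
  SKETCH_ALIGNED(16, unsigned int, sad_array[4]) = { 0, 0, 0, 0 };
  return is_aligned(sad_array, 16);
}
```

Without the attribute, an `unsigned int` array is only guaranteed 4-byte alignment on typical targets, which is not enough for aligned 128-bit stores.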
Skip redundant prediction filter type cost in filter search loop,
if the rate value will be reset in Hadamard transform based rate
distortion estimate.
Change-Id: Ie5221f4bc8da9461c449df367251aeeac52c6e5d
Reset the reached_eos flag in webm_guess_framerate in case it ends
up consuming the entire file. Also adding a vpxdec shell test to
verify this behavior.
Change-Id: I371eebd2105231dc0f60e65da1f71b233ad14be5
* changes:
vp8_regular_quantize_b_sse2: remove dead init
vp8cx_pick_filter_level*: remove dead inits
vp8_decode_frame: remove dead increment
rdopt: remove dead stores
find_next_key_frame: remove dead init & store
multiframe_quality_enhance_block: remove dead stores
vp8_print_modes_and_motion_vectors: remove dead stores
This commit turns on the Hadamard transform based rate distortion
estimate for all block sizes in RTC coding mode. It conditionally
skips the rate distortion estimation if all zero block flag is set
on. No significant encoding speed change is observed. The
compression performance of speed -6 is improved by 1.7% over using
it only for block sizes of 32x32 and below.
Change-Id: I768145e6f05c737b05b5b5f1ee674e929532cafb
The threshold scaling factor was calculated wrong using partition
size "bsize". Thanks to Yaowu for pointing it out. It was fixed and no
speed change was seen.
Change-Id: If7a5564456f0f68d6957df3bd2d1876bbb8dfd27
The following functions use the count parameter to either loop or select
dedicated paths:
vp9_lpf_horizontal_16_c
vp9_lpf_horizontal_16_sse2
vp9_lpf_horizontal_16_avx2
vp9_lpf_horizontal_16_neon
vp9_highbd_lpf_horizontal_16_c
vp9_highbd_lpf_horizontal_16_sse2
Change-Id: I7abfd2cb30baa292b4ebe11c847968481103c037
This commit accounts for the transform block end of coefficient flag
cost in the RTC mode decision process. This allows a more precise
rate estimate. It also turns on the model to block sizes up to 32x32.
The test sequences show about 3% - 5% speed penalty for speed -6.
The average compression performance improvement for speed -6 is
1.58% in PSNR. The compression gains for hard clips like jimredvga,
mmmoving, and tacomascmv at low bit-rate range are 1.8%, 2.1%, and
3.2%, respectively.
Change-Id: Ic2ae211888e25a93979eac56b274c6e5ebcc21fb
The vbp thresholds are set separately for boosted/non-boosted
superblocks according to their segment_id. This way we don't
have to force the boosted blocks to split to 32x32.
Speed 6 RTC set borg test result showed some quality gains.
Overall PSNR: +0.199%; Avg PSNR: +0.245%; SSIM: +0.802%.
No speed change was observed.
Change-Id: I37c6643a3e2da59c4b7dc10ebe05abc8abf4026a
remove incorrect specializations in rtcd and update a configuration
check in partial_idct_test.cc
(cherry picked from commit 8845334097)
Change-Id: I20f551f38ce502092b476fb16d3ca0969dba56f0
To enable us to use the scale-invariant motion estimation
code during mode selection, each of the reference
buffers is scaled to match the size of the frame
being encoded.
This fix ensures that a unit scaling factor is used in
this case rather than the one calculated assuming that
the reference frame is not scaled.
(cherry picked from commit 8d8d7bfde5)
Change-Id: Id9a5c85dad402f3a7cc7ea9f30f204edad080ebf
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.
Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
This commit allows the quantizer to compare the AC coefficients to
the quantization step size to determine if further multiplication
operations are needed. It makes the quantization process 20% faster
without coding statistics change.
Change-Id: I735aaf6a9c0874c82175bb565b20e131464db64a
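The idea of comparing a coefficient with the step size before doing further arithmetic can be sketched with a plain round-to-nearest quantizer. This is an assumption-laden illustration: the function names are hypothetical, division stands in for the multiplications used by a real quantizer, and the threshold shown is the one that is exact for this simple formula, not necessarily the one used in the commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Reference quantizer: sign(coeff) * ((|coeff| + step / 2) / step). */
static int quantize_reference(int coeff, int step) {
  assert(step > 0);
  const int abs_coeff = abs(coeff);
  const int q = (abs_coeff + step / 2) / step;
  return coeff < 0 ? -q : q;
}

/* Same result, but coefficients that must quantize to zero return early. */
static int quantize_with_early_out(int coeff, int step) {
  assert(step > 0);
  const int abs_coeff = abs(coeff);
  /* (abs_coeff + step / 2) / step is 0 exactly when
   * abs_coeff < step - step / 2, so the test changes no output. */
  if (abs_coeff < step - step / 2) return 0;
  const int q = (abs_coeff + step / 2) / step;
  return coeff < 0 ? -q : q;
}
```

Because the early exit only fires when the full computation would give zero, the output is unchanged, which is consistent with the commit's note that coding statistics do not change.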
For large partition blocks(block_size > 32x32), the variance
calculation is modified so that every 8x8 block's variance
is stored during the calculation, which is used in the
following transform skipping test. Also, the variance for
every tx block is calculated. The skipping test checks all tx
blocks in the partition, and sets the skip flag only if all tx
blocks are skippable. If the skip flag of Y plane is 1, a
quick evaluation is done on UV planes. If the current partition
block is skippable in YUV planes, the mode search checks fewer
inter modes and doesn't check intra modes.
The rtc set borg test(at speed 6) showed that:
Overall psnr: -0.527%; Avg psnr: -0.510%; ssim: -0.573%.
Average single-thread speedup on rtc set was 3.5%.
For 720p clips, more speedups were seen.
gipsrecmotion: 13%
gipsrestat: 12%
vidyo: 5 - 9%
dark: 15%
niklas: 6%
Change-Id: I8d8ebec0cb305f1de016516400bf007c3042666e
sse4 isn't set by configure or used in rtcd, correct the sad entries to
use sse4_1 without changing the signatures for now.
this was done in vp8 post-vp9 branch.
Change-Id: Ia9f1fff9f2476fdfa53ed022778dd2f708caa271
exclude files that only contain functions for non-high-bitdepth builds.
this removes some warnings related to missing prototypes
Change-Id: Ic6642998c46a7b808c6c53b2f9c34bcd4d037abe
This commit allows the encoder to check the eob per transform
block to decide how to compute the SATD rate cost. If the entire
block is quantized to zero, there is no need to add anything; if
only the DC coefficient is non-zero, add its absolute value;
otherwise, sum over the block. This reduces the CPU cycles spent
on vp9_satd_sse2 to one third.
Change-Id: I0d56044b793b286efc0875fafc0b8bf2d2047e32
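The three cases described above can be sketched as follows. The function name and signature are hypothetical, and the plain loop stands in for the optimized SATD routine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Rate proxy for one transform block, chosen by its end-of-block position.
 * coeff:  quantized coefficients of the block
 * length: number of coefficients in the block
 * eob:    index one past the last non-zero coefficient */
static int satd_cost_from_eob(const int16_t *coeff, int length, int eob) {
  assert(coeff != NULL && length > 0 && eob >= 0 && eob <= length);
  if (eob == 0) return 0;              /* whole block quantized to zero */
  if (eob == 1) return abs(coeff[0]);  /* only the DC coefficient is set */
  int sum = 0;                         /* general case: sum over the block */
  for (int i = 0; i < length; ++i) sum += abs(coeff[i]);
  return sum;
}
```

The first two branches avoid the full summation entirely, which is where the cycle savings would come from.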
When the estimated rate-distortion cost of skip coding mode is
lower than that of sending quantized coefficients, allow the
encoder to drop these coefficients. This improves the compression
performance of speed -6 by 0.268% and makes the encoding speed
slightly faster.
Change-Id: Idff2d7ba59f27ead33dd5a0e9f68746ed3c2ab68
This commit fixes the SSE2 version 8x8 Hadamard transform
alignment and makes it consistent with the C version.
Change-Id: I1304e5f97e0e5ef2d798fe38081609c39f5bfe74
This commit replaces the 16x16 2D-DCT transform with Hadamard
transform for RTC coding mode. It reduces the CPU cycles cost
on 16x16 transform by 5X. Overall it makes the speed -6 encoding
speed 1.5% faster without compromise on compression performance.
Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b
This commit allows the encoder to select between SATD and variance as the
metric for mode decision. It also allows chroma component costs to be
accounted for in the mode decision. The overall encoding
time increase as compared to variance based mode selection is about
15% for speed -6. The compression performance is on average 2.2%
better than variance based approach, with about 5% compression
performance gains for hard clips (e.g., jimredvga, nikas720p, and
mmmoving) at lower bit-rate range.
Change-Id: I4d04a31d36f4fcb3f5f491dacd6e7fe44cb9d815
This commit uses Hadamard transform based rate-distortion cost
estimate for rtc coding mode decision. It improves the compression
performance of speed -6 for many hard clips at lower bit-rates.
For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for
niklas720p. This will introduce extra encoding cycle costs at
this point.
Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375
webm_read_frame assumes that it won't be called once end of file
is reached. But for frame parallel mode that turns out to be not
true. this patch fixes that behavior by checking for EOS and
returning the appropriate value for subsequent calls.
Change-Id: Ie2fddbe00493a0f96c4172c67be1eb719f0fe8ed
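A minimal sketch of the end-of-stream guard described above. The struct, its fields, and the return convention (0 for a frame, 1 for end of stream) are assumptions made for illustration and do not reproduce the real reader.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int frames_left; /* stand-in for the data remaining in the container */
  int reached_eos; /* set once the end of the stream has been seen */
} SketchReader;

/* Returns 0 when a frame was read and 1 at end of stream. Calls made after
 * end of stream keep returning 1 instead of touching the input again. */
static int sketch_read_frame(SketchReader *reader) {
  assert(reader != NULL);
  if (reader->reached_eos) return 1;
  if (reader->frames_left == 0) {
    reader->reached_eos = 1;
    return 1;
  }
  --reader->frames_left;
  return 0;
}
```

The flag makes repeated calls after the last frame harmless, which matters when a caller such as a frame parallel decoder keeps polling for input.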
Metrics on RTC set go down by ~1.5% on average.
Key frame encoding time goes down by factor of ~5.
Change-Id: Ia83acc55848613870e5ac6efe7f3d904d877febb
To enable us to use the scale-invariant motion estimation
code during mode selection, each of the reference
buffers is scaled to match the size of the frame
being encoded.
This fix ensures that a unit scaling factor is used in
this case rather than the one calculated assuming that
the reference frame is not scaled.
Change-Id: Id9a5c85dad402f3a7cc7ea9f30f204edad080ebf
this matches the other includes and simplifies include paths in builds
from source
(cherry picked from commit 7999c07697)
Change-Id: I344902c84f688ef93c9f3a53e7c06c30db49d8d3
Set the GF group adaptive max Q compile flag to 1 by default.
This change has a quite big visual impact in some clips and also
contributes to tighter rate control.
For short test clips that have consistent content the impact is
quite small on metrics but for more varied long form clips there is
a drop in overall psnr but a sharp rise in average psnr caused by
greater expenditure on some easier sections and tighter rate clipping
in hard sections.
In chunked encodes some of the effect will already be present due
to the independent rate control in each chunk but this change takes
the control down to a smaller scale.
yt hd +10.67%, - 3.77%, -1.56%
yt +9.654%, - 3.6%, - 1.82%
std hd +0.25%, -0.85%, -0.42%
derf +0.25%, - 1.1%, - 0.87%
Change-Id: Ibbc39b800d99d053939f4c6712d715124082843e
1. skip near if it is same as nearest
2. correct rounding for converting mv to fullpel position
3. update pred_mv_sad after new mv search.
Overall .1%~.25% compression gains on rtc set for speed 5, 6, 7, 8.
Change-Id: Ic300ca53f7da18073771f1bb993c58cde9deee89
Revised adjustment for rd based on source complexity.
Two cases:
1) Bias against low variance intra predictors
when the actual source variance is higher.
2) When the source variance is very low to give a slight
bias against predictors that might introduce false texture
or features.
The impact on metrics of this change across the test sets is
small and mixed.
derf -0.073%, -0.049%, -0.291%
std hd -0.093%, -0.1%, -0.557%
yt +0.186%, +0.04%, - 0.074%
ythd +0.625%, + 0.563%, +0.584%
Medium to strong psycho-visual improvements in some
problem clips.
This feature and intra weight on GF group length now
turned on by default.
Change-Id: Idefc8b633a7b7bc56c42dbe19f6b2f872d73851e
This experiment biases the rd decision based on the impact
a mode decision has on the relative spatial complexity of the
reconstruction vs the source.
The aim is to better retain a semblance of texture even if it
is slightly misaligned / wrong, rather than use a simple rd
measure that tends to favor use of a flat predictor if a perfect
match can't be found.
This improves the appearance of texture and visual quality
on specific test clips but is hidden under a flag and currently
off by default pending visual quality testing on a wider Yt set.
Change-Id: Idf6e754a8949bf39ed9d314c6f2daaa20c888aad
The joint_motion_search function alternates prediction
between two reference frames. In order to reuse existing
code, a pointer to the appropriate reference frame is
written into xd->plane[0].pre[0], that the motion
estimation code assumes points to the reference frame.
If this first reference frame was scaled then the
pointer was incorrectly being reset to point to the
unscaled reference frame rather than the scaled
version.
Change-Id: I76f73a8d8f4f15c1f3a5e7e08a35140cdb7886ab
While CONFIG_INTERNAL_STATS=1, PSNR is calculated while encoding.
The aligned width/height were used mistakenly in the calculation.
This patch fixed it, and used the original image width/height.
Change-Id: Iad5334f8693af761b71ebb78f2587db8a3404ecf
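To illustrate why the dimensions matter, here is a sketch of a sum of squared errors taken over the visible picture area only, with the stride covering any padding. The function name and parameters are hypothetical and the code is not the project's PSNR routine.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences over width x height samples. The strides may
 * be larger than width, so padding beyond the visible area is skipped. */
static uint64_t sse_visible_area(const uint8_t *a, int a_stride,
                                 const uint8_t *b, int b_stride,
                                 int width, int height) {
  assert(width > 0 && height > 0);
  assert(a_stride >= width && b_stride >= width);
  uint64_t sse = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const int diff = a[y * a_stride + x] - b[y * b_stride + x];
      sse += (uint64_t)(diff * diff);
    }
  }
  return sse;
}
```

Passing the aligned width instead of the original width would pull padding samples into the error sum and the sample count, which skews the resulting PSNR.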
the TODO around CONFIG_SPATIAL_SVC has been resolved by changing the
CONFIG_* checks to use an ABI based check
Change-Id: If2638baf361b863186177a453beec9af9231e69e
this removes the CONFIG_* checks from public headers, but means
'--enable-experimental --enable-spatial-svc' builds will fail without a
local change to the ABI in vpx_encoder.h. this should be all right for
testing this experiment.
Change-Id: Ief55e7b9d1e8332cfce990275e04c29b30af0c4a
add explicit returns in cases where ASSERT_* can't be used due to the
function returning a value; retain the EXPECT_* for reporting purposes.
Change-Id: I1f514728537fee42a99277d3aba538e832d3b65b
Factor in segment#2 and skip blocks into the postencode estimated bits,
and increase somewhat the aggressiveness of the refresh.
PSNR/SSIM Metrics on RTC set go up by ~0.8/0.5%.
Change-Id: I5d4e7cb00a3aefb25d18c88b6b24118b72dc5d51
This commit makes the encoder explicitly calculate the SAD
associated with the LAST_FRAME motion vector and compare it to
that of the GOLDEN_FRAME given by integral projection motion
estimation. It skips the expensive sub-pixel motion search over
GOLDEN_FRAME when the LAST_FRAME can provide fairly good motion
compensated prediction quality.
For dark720p speed -6 single thread goes from
33304 b/f, 40.070 dB, 18156 ms ->
33319 b/f, 40.061 dB, 17611 ms
Change-Id: I01bc94b9b598075567a392111046b97a9bc30efe
Because the call to vpx_codec_control at line 928 is now guarded by
!frame_parallel, 'corrupted' may not be set.
Change-Id: Id166bd8a8cdb5e5120fca1640011a3545f6e178a
Use force_split to constrain the partition selection.
This is used because in the top-down approach to variance partition,
a block size may be selected even though one of its subblocks may have
high variance.
In this patch the selection of the 64x64 block size will only
be allowed if the variances of all the 32x32 subblocks are also below the threshold.
Still testing, but some visual improvement for areas near slow moving boundary
can be seen. Metrics for RTC set increase by about ~0.5%.
Change-Id: Iab3e7b19bf70f534236f7a43fd873895a2bb261d
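The constraint described above can be sketched as a simple predicate. The function name, the use of separate thresholds per block size, and the strict comparison are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the 64x64 block size may be selected: the variance of the
 * whole block and the variances of all four 32x32 subblocks must be below
 * their thresholds. Otherwise returns 0, which forces a split. */
static int allow_64x64(unsigned int var64, unsigned int threshold64,
                       const unsigned int var32[4],
                       unsigned int threshold32) {
  assert(var32 != NULL);
  if (var64 >= threshold64) return 0;
  for (int i = 0; i < 4; ++i) {
    if (var32[i] >= threshold32) return 0; /* one busy subblock is enough */
  }
  return 1;
}
```

A purely top-down test would stop at the first comparison, so a superblock with one high-variance quadrant could still be coded as a single 64x64 block.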
Shut off all the metric checks for golden reference frame, if we
decide that it is unlikely to be selected for reference.
Change-Id: Ie457cc1fd43935584403b4982659aed80fb9909c
Move the scaling factor outside column projection. This avoids
repeated calculation of the same scaling factor. Profiling shows
that the percentage of vp9_int_pro_col_sse2 of overall cycles
goes from 2.29% down to 1.88%.
Change-Id: I5ac4e324ab2d7f33ba2de66dd2a12e04e04dfd66
1. remove duplicate initialization to mbmi->interp_filter.
2. move mv clamping into ref_frame loop instead of mode checking loop.
3. move the check if last frame is same as golden frame earlier to
avoid initialization of Golden reference related variables.
Change-Id: Idf2d05e19e94a24f69cc289687869fc71d2ff289
use \li to separate the list items contained in conditionals. this
avoids the encode page becoming a sub-item of decode; likely a problem
in <1.8.3.
+ fix encoder conditional, spelling error
+ correct encode page name to match decode 'Encoding'
Change-Id: I67890f52bed8e708bad63fb8819a074e0beff2ca
use \li to denote list items with \if.
fixes the following likely visible in <1.8.3:
usage.dox: warning: Invalid list item found
usage.dox: warning: End of list marker found without any preceding list items
Change-Id: I33c72799edf9f8866596ac8f79247050b8c75681
The compression performance of speed -5 is on average 12.6% better
than speed -6. At lower bit-rates, the gains are typically 20% or
more. For 2-thread encoding, the speed -5 takes about 1.6x time of
speed -6.
Change-Id: If7a73464a24d33e8f49b9533b51ec51c8da7fc80
The commit updates the comments in vp8cx.h to make it clear which
codecs support each of the codec control functions.
Change-Id: Ibf876e289d4325bbb61ce19311da60d384624c2f
Crash occurred on the very first key frame, because the denoiser
temporal function was being entered.
Updated denoiser unittest to set cpu_used from the first frame,
and verified that the fix resolves the crash.
Change-Id: I3be1124b52846fbbe7248d2c3d6136e086c80bc1
Comments are updated to reflect that these controls apply to VP9 only,
so that the documentation produced by doxygen reflects the
same fact.
Change-Id: Ic54c88ec066aa0ec4552d43dd4a7016e1f810f42
This commit uses a 6-point 1-step refine motion search in the
integral projection based full pixel motion estimation, to replace
the current 9-point search.
It reduces runtime cost of speed -6 on some noisy clips, e.g.,
dark720p single thread
33314 b/f, 40.076 dB, 18231 ms ->
33307 b/f, 40.067 dB, 17768 ms
The compression performance for rtc set remains unchanged.
Change-Id: I194ea5a9ce52e5a10baeee36338633adc22f764c
This commit changes to use a single loop to evaluate all inter modes.
There is no impact on compression quality and speed, but it allows future
experiments with the order of modes evaluated.
Change-Id: I71696ce1014cbe127e25e98710d835987f5ecc09
Added a skip_dc check. If skip_dc = 1, we could eliminate calling
of vp9_model_rd_from_var_lapndz(). This gave slight PSNR & SSIM
gain(<0.1%), and no speed change.
Change-Id: If5ca733366148c86b98e196a00cc890f50e9a3e5
Re-arrange the multiplication and right shift operations to avoid
integer overflow in choose_partitioning.
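A minimal sketch of the reordering, assuming 32-bit operands. The function
name scale_shift_first is hypothetical and the actual expression in
choose_partitioning is not shown in this message; the sketch only illustrates
why shifting before multiplying keeps the intermediate value in range.

```c
#include <assert.h>
#include <stdint.h>

/* Approximates (value * scale) >> shift without forming the full product.
 * Shifting first drops the low `shift` bits of value, so the result may be
 * slightly smaller than the exact one, but the intermediate stays in range
 * whenever (value >> shift) * scale fits in 32 bits.
 * Illustrative only: not the encoder's code. */
static int32_t scale_shift_first(int32_t value, int32_t scale, int shift) {
  assert(value >= 0 && scale >= 0 && shift >= 0 && shift < 31);
  return (value >> shift) * scale;
}
```

For example, with value = 1 << 28, scale = 20 and shift = 4 the direct
product does not fit in a signed 32-bit integer, while the reordered form
does.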
Change-Id: Ib4005cafb410a67c1960486471d75b6ebe38c4e0
This commit removes the pred_mv_sad comparison from rtc motion
search, given that a stronger comparison has been done at the
mode search level to eliminate unlikely selected reference frames.
Change-Id: I49b8d24b2174303066fd8eff2102c0648f2869df
This commit enables the rtc coding mode to run integral projection
based motion search for golden reference frame. It improves the
speed -6 compression performance by 1.1% on average, 3.46% for
jimred_vga, 6.46% for tacomascmvvga, and 0.5% for vidyo clips. The
speed -6 is about 6% slower.
Change-Id: I0fe402ad2edf0149d0349ad304ab9b2abdf0c804
Make the vp9_int_pro_motion_estimation() function return zero
motion vector if high bit depth is turned on, instead of removing
it from compiled codes.
Change-Id: Ia48f010eb590b2d517d5678c394110b326a1a95e
This patch accounts in the first pass stats for blocks that
while not coded as intra, are complex and have an intra error /
best error ratio below a threshold.
The modification shortens the GF arf interval for a particular
class of content that contains a lot of blocks matching the
above criteria. (In one short problem test sequence the average
interval dropped from about 14-15 to 10-11)
The change results in small net gains in metrics results for the
Yt(~0.2%) and yt-hd (~0.5%) sets and is approximately neutral
for the other test sets.
The change is currently shielded by a flag and off by default
pending verification that it does not cause other regressions
in tests on a wider YT test set.
Change-Id: I6b803daa6a4ac09a6f428fb3a18be1ecedd974b7
Only update the rd_thresh factors for modes sharing same reference
frame. This helps overall compression of 6 and 7 by .13% and .19%
respectively without any noticeable speed difference.
Change-Id: Idb3a3879512c5d7d0880034516079949290690c5
For non-SVC 1 pass CBR: make the GF update interval a multiple of the
cyclic refresh period, and use encoding stats to prevent GF update at certain times.
Change-Id: I4c44cacc2f70f1d27391a47644837e1eaa065017
Tx_totals counters weren't handled correctly in multi-thread
case, which caused the mismatch while encoding using threads > 1.
This patch fixed that.
Change-Id: Ice9b0386f57175fb92a0bdcd5042686a3106246a
The return value from vp9_compute_qdelta_by_rate, which is
a delta value for the quantizer, could never be 0 if
(qindex == rc->worst_quality).
This occurs because target_index was setup unconditionally
in the loop and yet the loop counter stopped at
(rc->worst_quality - 1).
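A sketch of the loop shape after such a fix, under stated assumptions. The
table bits_at_q and the function names are hypothetical stand-ins for the
rate model; the point is that target_index defaults to worst_quality and is
only overwritten when a matching index is found.

```c
#include <assert.h>
#include <stddef.h>

/* Finds the first q index whose modelled bits are at or below the target.
 * bits_at_q is a hypothetical table that decreases as q increases. */
static int find_target_index(const int *bits_at_q, int best_quality,
                             int worst_quality, int target_bits) {
  int i;
  int target_index = worst_quality; /* default when nothing matches */
  assert(bits_at_q != NULL && best_quality <= worst_quality);
  for (i = best_quality; i < worst_quality; ++i) {
    if (bits_at_q[i] <= target_bits) {
      target_index = i; /* set only on a match */
      break;
    }
  }
  return target_index;
}

/* Delta relative to the current qindex; can now be 0 at worst_quality. */
static int qdelta_sketch(const int *bits_at_q, int best_quality,
                         int worst_quality, int qindex, int target_bits) {
  return find_target_index(bits_at_q, best_quality, worst_quality,
                           target_bits) - qindex;
}
```

If target_index were instead assigned on every iteration, a search with no
match would end at worst_quality - 1 and the delta at worst_quality would
be -1 rather than 0.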
Change-Id: I6b59cd9b5811ff33357e71cd7d814c5e53d291f2
Choose_partition uses only the last frame as reference frame in making
the partition decision. This commit adds a check on how well the Golden
frame with (0,0) predicts the current block, and uses GF(0,0) as the
basis for the partition decision if it produces a better prediction.
The commit improves rtc speed 6 and 7 encoding by 0.14% and 0.19%
respectively.
Change-Id: I156acf925bd6e0b586d48155d1940d27270a3915
When golden reference frame is refreshed, the next frame has both
its last and golden reference frames point to the same reference
frame in real-time coding mode. Experiments suggest that using
two separate reference frames for frames right after golden refresh
frame does not provide further compression performance advantage.
This commit hence retains the current encoder implementation and
shuts off the mode search over golden reference frame in this case.
It makes the encoder run slightly faster at no coding performance
change.
Change-Id: I1561f7799253a10e675d05c63c1749fe9e85b472
Force 64x64 partitioning when a whole superblock is SEGMENT_LVL_SKIP. This
drops encode times of screens mostly at rest by 20%.
Change-Id: Ieba554b0b8a0c1679aae784a8bd11f038ab942c3
Adjustment previously only enabled in VBR mode.
This patch allows adjustment of min and max q for CBR
and adjustment of max q only for CQ mode.
Change-Id: Id5e583f3d50453cd544fc57249acacd946457482
While turning on "--aq_mode=3", the quantizers are updated by each
thread. Fixed the me consts initialization function to make sure
that the correct thread data are updated.
Change-Id: Ied27bb7bae76fc3fa2cda4f8c35ac0b46271bef4
While searching for the best mode in non-rd case, SSE of
a partition block is calculated and the transform size is set.
This patch rewrites the skip checking conditions based on
transform size instead of partition size to be more precise.
Small gains were seen in rtc set borg test (speed 6).
AVG PSNR: 0.087%, overall PSNR: 0.073%, SSIM: 0.146%.
No noticeable speed change.
Change-Id: I5603ca5339c784dfa02263f4005988ccd8c32f6e
It was tiny when it was originally marked INLINE. Forcing this function
to be inlined prevents the compiler from inlining its much smaller
callers.
No measurable speed impact, 28320 byte smaller libvpx.a
Change-Id: I6bf4c917157d15cbadb3cd3e20a9e82d35dc7d6f
Visual Studio is exceptionally picky about this:
vp9_reconintra.c(900): warning C4113: 'void (__cdecl *)()' differs in
parameter lists from 'void (__cdecl *)(void)'
[.build-x86_64-win64-vs10\vpx.vcxproj]
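A small illustration of the distinction behind C4113. In C, an empty
parameter list `()` leaves the parameters unspecified, while `(void)`
declares that the function takes none. The names init_fn, init_tables and
run_init are hypothetical and not taken from vp9_reconintra.c.

```c
#include <assert.h>
#include <stddef.h>

/* Spelling out (void) makes the pointer type match functions that are
 * declared with (void), which avoids the mismatch warning. */
typedef void (*init_fn)(void);

static int init_calls = 0;

static void init_tables(void) { ++init_calls; }

/* Calls fn once and returns the running call count. */
static int run_init(init_fn fn) {
  assert(fn != NULL);
  fn();
  return init_calls;
}
```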
Change-Id: I564c7415f4608fd962be8c699d6133a996b545f7
This saves an extra 64x64 variance calculation and replaces two
32x32 variance functions with sad functions. The compression
performance change is unnoticeable.
Change-Id: I6d33868695664ec73b56c42945162ae61c484856
Frame buffers are now allocated dynamically on-demand.
Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".
All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.
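A minimal sketch of the initial state described above. INVALID_IDX and
ref_frame_map come from this message; the struct layout, the array sizes and
the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_IDX (-1)
#define NUM_REF_SLOTS 8   /* illustrative size */
#define NUM_FRAME_BUFS 12 /* illustrative size */

typedef struct {
  int ref_count;
} FrameBufSketch;

typedef struct {
  int ref_frame_map[NUM_REF_SLOTS];
  FrameBufSketch frame_bufs[NUM_FRAME_BUFS];
} BufferPoolSketch;

/* Marks every slot empty and every buffer unreferenced. */
static void init_buffers(BufferPoolSketch *p) {
  int i;
  assert(p != NULL);
  for (i = 0; i < NUM_REF_SLOTS; ++i) p->ref_frame_map[i] = INVALID_IDX;
  for (i = 0; i < NUM_FRAME_BUFS; ++i) p->frame_bufs[i].ref_count = 0;
}

/* Callers must check a slot before using it as a buffer index. */
static int slot_has_buffer(const BufferPoolSketch *p, int slot) {
  assert(p != NULL && slot >= 0 && slot < NUM_REF_SLOTS);
  return p->ref_frame_map[slot] != INVALID_IDX;
}
```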
Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
Use rectangular block size for integral projection motion estimation
if the 64x64 block has over half of the block outside the frame. This
avoids the issue that the motion information of these blocks is
dominated by the extended pixels, instead of the pixels of interest.
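A rough sketch of the block size choice, under stated assumptions. The struct,
the function name and the exact comparison are hypothetical; the message only
says that a rectangular size is used when over half of the 64x64 block lies
outside the frame.

```c
#include <assert.h>

typedef struct {
  int width;
  int height;
} BlockDimSketch;

/* Picks the block dimensions used for integral projection at pixel
 * position (x, y). A dimension is halved when less than half of the
 * 64 pixels in that direction are inside the frame. */
static BlockDimSketch projection_block_dims(int frame_w, int frame_h, int x,
                                            int y) {
  BlockDimSketch d = { 64, 64 };
  assert(x >= 0 && y >= 0 && x < frame_w && y < frame_h);
  if (2 * (frame_w - x) < 64) d.width = 32;
  if (2 * (frame_h - y) < 64) d.height = 32;
  return d;
}
```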
Change-Id: I22f4d2bb7f6a20db9b3f5e2e5463a7f4b9d1b737
The rounding factor needs to be scaled down by a factor of 2.
Also, the quantized and dequantized coefficients are memset to 0
when dc quantizer is used.
Change-Id: Ifa68bab02addbf1b83d249c5b4cbd5cda796b1cf
A frame may be waiting for an out of border pixel from another
frame. A frame's row progress variable is set to -1 when it starts being decoded
and another frame may be waiting for row -2 pixels from this frame.
In this case, vp9_frameworker_wait will return directly and skip the waiting,
which leads to a tsan error between threads.
Change-Id: Id16604915fb598b823e34393f696e3aa46fb6422
Most of the current decoders use tile-based multithreading. Also
most of the current decoders need frame_parallel_decoding_mode
turned on to enable multithreaded decoding. tile-columns is
limited by resolution, so setting to max (6) is fine.
BUG=https://code.google.com/p/webm/issues/detail?id=963
Change-Id: I6e7ac3485d96bf0c69e06706cbb326dd38be0020
Instead of using only a fixed threshold, this commit adapts the threshold
for color sensitivity decision to luma signal energy: chroma channel's
sse is at least 1/6 of that in luma for color sensitivity flag to be
set to active.
This recoups a large portion of the speed loss due to accounting for
chroma component costs in RTC mode decision.
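A sketch of the adaptive test, under stated assumptions. The 1/6 ratio comes
from this message; the function name, the way the ratio is combined with the
fixed threshold, and the per-plane flag array are assumptions for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sets flags[i] to 1 when chroma plane i (U, V) is considered color
 * sensitive: its sse must exceed the fixed threshold and be at least
 * 1/6 of the luma sse. Illustrative only. */
static void set_color_sensitivity(uint32_t y_sse, const uint32_t uv_sse[2],
                                  uint32_t fixed_thresh, int flags[2]) {
  int i;
  assert(uv_sse != NULL && flags != NULL);
  for (i = 0; i < 2; ++i) {
    /* Multiply in 64 bits so that 6 * uv_sse cannot wrap. */
    const uint64_t scaled = (uint64_t)uv_sse[i] * 6;
    flags[i] = uv_sse[i] > fixed_thresh && scaled >= y_sse;
  }
}
```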
Change-Id: Ie01f747f6037dba6a1d1ed3e10b71a0ef1abc42c
This patch fixed webm issue 962.
(https://code.google.com/p/webm/issues/detail?id=962)
The data races occurred when an encoder and a decoder were created
at the same time, and the function pointers were initialized twice.
Change-Id: I8851b753c4b4ad4767d6eea781b61f0ac9abb44b
This commit replaces the SAD with variance as metric for the
integral projection vector match. It improves the search accuracy
in the presence of slight light change. The average speed -6
compression performance for rtc set is improved by 1.7%. No speed
changes are observed for the test clips.
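A sketch of why variance tolerates a light change better than SAD. The
function names and signatures are hypothetical, and the values are assumed
small enough that the sums fit in an int. A uniform brightness offset between
the two vectors adds to the SAD but cancels in the variance of the
difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two projection vectors. */
static int vector_sad(const int16_t *ref, const int16_t *src, int n) {
  int i, sad = 0;
  assert(ref != NULL && src != NULL && n > 0);
  for (i = 0; i < n; ++i) sad += abs(ref[i] - src[i]);
  return sad;
}

/* n times the variance of the difference: sse - sum^2 / n.
 * A constant offset between ref and src contributes nothing. */
static int vector_var(const int16_t *ref, const int16_t *src, int n) {
  int i, sum = 0, sse = 0;
  assert(ref != NULL && src != NULL && n > 0);
  for (i = 0; i < n; ++i) {
    const int diff = ref[i] - src[i];
    sum += diff;
    sse += diff * diff;
  }
  return sse - (sum * sum) / n;
}
```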
Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d
This commit fixes an issue in source frame border extension. It
causes certain frame resolution such as 640x480 to have a portion
of the right/bottom extension filled by zeros, which misleads
motion search and degrades transform coding performance when large
block size is used.
This fix improves the speed 2 compression performance of a few
yt sequences, typically by 1% - 2%, and by up to 5% at median
to low bit-rate.
Change-Id: Id6b09a5695d9e7651c6dfbc2c6a72288b08af7fb
both the encode and decode perf tests require niklas_1280_720_30.yuv
broken since:
28eebf3 Merge "tests: add a shorter 720p test clip"
7839d03 tests: add a shorter 720p test clip
Change-Id: I51ebbf7261832e25d8f2c1da5c7df5c2e47f748e
+ add a helper function to reduce the duplication
this is a bit clearer when the environment variable is set, but the
directory is missing
Change-Id: I08f9b56122b5741bb40a5f795f7f82f5b49f1047
niklas_1280_720_30.y4m 60 frames @ 30fps
only a small number of frames are being used; this reduces the test data
download size in non-perf-test cases by >500M.
retain niklas_1280_720_30.yuv for encode+decode perf tests
Change-Id: I56b3433104acd462f952a9554280de5a3ec0b6d2
This commit applies one-step refinement search to the resulting
motion vector of the integral projection based motion estimation,
per 64x64 block. It improves the coding performance of speed -6.
pedestrian 1080p 500 kbps
51735 b/f, 36.794 dB, 16044 ms ->
51382 b/f, 36.793 dB, 16282 ms
cloud 1080p 500 kbps
24081 b/f, 37.988 dB, 14016 ms ->
23597 b/f, 38.076 dB, 12774 ms
vidyo1 720p 1000 kbps
16552 b/f, 40.514 dB, 8279 ms ->
16553 b/f, 40.543 dB, 8510 ms
The rtc set compression performance is improved by 0.5%.
Change-Id: I3d09bea2caf58b2a4f3b38aa26fffafcbe9a2c17
The intrinsic statement _mm_subs_epi16() should take immediate.
Feeding a variable as its input argument will cause a compile failure
in older versions of gcc.
Change-Id: I6a71efcc8d3b16b84715e0a9bcfa818494eea3f4
This commit modifies the hierarchical vector match pattern. It
avoids repeated SAD computation at same points. The function
vp9_vector_sad_sse2 is called 12 times per 64x64 block, instead
of 15 times as before. The effective coverage remains the same.
Change-Id: I91ad9d27d40db8963c907d02af84e10702136994
In ssse3 functions, DEFINE_ARGS macro hard codes qcoeff and dqcoeff
to r3 and r4. If skip is 1, qcoeff and dqcoeff need to be loaded
from the stack, which doesn't work because of the above definitions.
Currently, skip=1 case is not used in the encoder. This patch fixed
the issue, so it can be turned on later.
Change-Id: I998d696b1a7a85dca2b3bcee790b21c21e039147
When GF group adaptive maxQ is enabled this patch accounts
somewhat for accumulated error in the rate control.
This improves accuracy quite a bit on many clips especially
when there is overshoot.
Examples when the overshoot and undershoot command line
parameters are set to 100:
Hall @ 1200 overshoot is reduced from 67% to 24%.
Akiyo @ 400 undershoot is reduced from 28% to 15%.
Setting a lower value for undershoot or overshoot still
reduces the error further.
Impact on metrics is mixed with some gains in average psnr
but generally a little lower (e.g. 0.5%) on overall and ssim.
The GF group adaptation is still off by default in this patch.
Compared with the head, enabling this mode now gives
big average psnr gains on the YT sets (e.g. YT_HD >11.2%),
a drop in overall PSNR (YT-HD 3.9%) and a smaller drop or
neutral for SSIM.
Change-Id: If4b32cd0740d3fb941317b374f9c2951954eee90
Target higher delta-qp for big blocks with zero motion,
and for segment#1: avoid 64x64 partition size and force 8x8 tx size.
Metrics on RTC set mostly positive: SSIM up by ~4%, PSNR by ~1.5%.
Doesn't seem to be any change in speed.
Change-Id: I1f68fa3c4f62dab3b90cc58041f05ebb048ae5ac
Modified the thresholds of deciding whether or not to skip
the transforms in model_rd_for_sb_y(). Used zbin[] instead
of dequant[] to be more precise. Also, modified the checking
conditions.
Rtc set borg test results (at speed 6) showed:
average PSNR gain: 0.138%, overall PSNR gain: 0.158%,
and SSIM gain: 0.177%.
The data rate test was modified slightly as suggested by
Marco.
Change-Id: Ieaf633ab77f4838cb3c45cf69065b29d55f8ae6c
This commit introduces a new block match motion estimation
using integral projection measurement. The 2-D block and the nearby
region is projected onto the horizontal and vertical 1-D vectors,
respectively. It then runs vector match, instead of block match,
over the two separate 1-D vectors to locate the motion compensated
reference block.
This process is run per 64x64 block to align the reference before
choosing partitioning in speed 6. The overall CPU cycle cost due
to this additional 64x64 block match (SSE2 version) takes around 2%
at low bit-rate rtc speed 6. When strong motion activities exist in
the video sequence, it substantially improves the partition
selection accuracy, thereby achieving better compression performance
and lower CPU cycles.
The experiments were tested in RTC speed -6 setting:
cloud 1080p 500 kbps
17006 b/f, 37.086 dB, 5386 ms ->
16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster)
pedestrian_area 1080p 500 kbps
53537 b/f, 36.771 dB, 18706 ms ->
51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)
blue_sky 1080p 500 kbps
70214 b/f, 33.600 dB, 13979 ms ->
53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster)
jimred 400 kbps
13380 b/f, 36.014 dB, 5723 ms ->
13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower)
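The projection step described above can be sketched as follows. This is an
illustration only: the function name integral_projection and its signature
are hypothetical and are not the encoder's SSE2 routines. It shows how a 2-D
block collapses into a horizontal vector (column sums) and a vertical vector
(row sums).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Projects a width x height block onto two 1-D vectors:
 * hproj[c] is the sum of column c, vproj[r] is the sum of row r. */
static void integral_projection(const uint8_t *src, int stride, int width,
                                int height, int *hproj, int *vproj) {
  int r, c;
  assert(src != NULL && hproj != NULL && vproj != NULL);
  assert(width > 0 && height > 0 && stride >= width);
  for (c = 0; c < width; ++c) hproj[c] = 0;
  for (r = 0; r < height; ++r) {
    int row_sum = 0;
    for (c = 0; c < width; ++c) {
      const int v = src[r * stride + c];
      hproj[c] += v;
      row_sum += v;
    }
    vproj[r] = row_sum;
  }
}
```

Vector match then compares these 1-D vectors between source and reference at
candidate offsets, which costs far fewer operations than comparing full 2-D
blocks at each offset.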
Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c
There is a corner case that when a frame is corrupted, the following
inter frame decode worker will miss the previous failure. To solve
this problem, a need_resync flag needs to be added to master thread
to keep control of that.
Change-Id: Iea9309b2562e7b59a83dd6b720607410286c90a6
using this to control reallocation would miss a change if the function
were not called for every frame.
fixes potential memory corruption by the subsequent memset
Change-Id: I4c6bb6ab68803104fc824c7e27cc2f9b2cf53e33
use VP[89]_INSTANTIATE_TEST_CASE case when possible to disable the tests if
the codec is unavailable.
broken since:
be6aead Try again to merge branch 'frame-parallel' into master branch.
Change-Id: I8d81c5ba3b951f82be94bfaed6be194e4289baec
This commit prevents the encoder from updating last_frame_type when a frame
is dropped in the encoder. Prior to this fix, if there was a dropped
frame immediately after a key frame, the decoder would have the value of
last_frame_type as key frame, different from the encoder, as the dropped
frame in the encoder would have updated the value to an inter frame. This
leads to different probability updates in encoder and decoder, and thereby
to an encoder/decoder mismatch.
This fixes issue #941
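A minimal sketch of the rule, with hypothetical type and function names. A
dropped frame never reaches the decoder, so the encoder state that the
decoder mirrors must not change for it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { KEY_FRAME_SKETCH = 0, INTER_FRAME_SKETCH = 1 } FrameTypeSketch;

typedef struct {
  FrameTypeSketch last_frame_type;
} EncStateSketch;

/* Records the type of the frame just processed, unless it was dropped. */
static void finish_frame(EncStateSketch *s, FrameTypeSketch type,
                         int dropped) {
  assert(s != NULL);
  if (dropped) return; /* the decoder never sees this frame */
  s->last_frame_type = type;
}
```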
Change-Id: I27115224b138bec43ae3916c016574f5740822b0
Replaced a divide by 9 with 8, so some very small difference,
but otherwise no change in behavior.
Change-Id: I1079ae3c41e0789ff0bc6fa9940a238b6bca0f5b
Simple skin detection, from vp8; works reasonably well on most of the
RTC clips, but could miss sometimes.
Added debug flag to write out skin map over source input.
Change-Id: I2caea7592f1c459047aac46627eeb24a94946464
This commit allows the encoder to properly account for the mode
cost in sub8x8 non-RD mode decision.
Change-Id: I2951960d20e37ed08e372ee0c7044935b2b9b899
Add the rate cost on inter prediction filter type to the overall
rate-distortion cost in vp9_pick_mode_inter.
Change-Id: I72c34017adf5220cadb3962694ee5404469fc673
This commit adds a heuristic rate cost of reference frame to the
non-RD mode decision. It improves the compression performance of
speed -6 by 0.31% and speed -5 by 0.69%.
Change-Id: If7f3b45519d49b2cb640bcb7316a254efc8be446
This enables the encoder to set color space information for the input
video, so it is then coded in the output bitstream.
Change-Id: Ife03deab3c762425ccd27c4c190902c4d94a76f4
MODE_INFO struct was modified, and vp9_print_modes_and_motion_vectors()
didn't work anymore. This patch modified vp9_debugmodes.c so that
this function works again for debug usage.
Change-Id: I293fae0295235deb2529a460a274caf7c045ac1a
This is to avoid redoing the same calculation repeatedly, and also to allow
easier adjustments for further experiments.
This commit shall have no effect on quality/compression.
Change-Id: I4460acf5c808ff5518da18d21e002c5da58af857
Note: This feature is still in development.
Add an option for the encoder to decide the resolution
at which to encode each frame.
Each KF/GF/ARF group is tested to see if it would be
better encoded at a lower resolution. At present, each
KF/GF/ARF is coded first at full-size and if the coded
size exceeds a threshold (twice target data rate) at
the maximum active Q then the entire group is encoded
at lower resolution.
This feature is enabled in vpxenc by setting:
--resize-allowed=1
In addition, if the vpxenc command line also specifies
valid frame dimensions using:
--resize-width=XXXX & --resize_height=YYYY
then *all* frames will be encoded at this resolution.
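A sketch of the decision rule stated above, under stated assumptions. The
function name and parameters are hypothetical; only the "twice target data
rate at the maximum active Q" condition is taken from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the group should be re-encoded at a lower resolution:
 * the full-size attempt was already at the maximum active Q and still
 * used more than twice the target number of bits. */
static int encode_group_at_lower_res(int64_t coded_bits, int64_t target_bits,
                                     int q, int max_active_q) {
  assert(target_bits > 0 && q <= max_active_q);
  return q == max_active_q && coded_bits > 2 * target_bits;
}
```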
Change-Id: I13f341e0a82512f9e84e144e0f3b5aed8a65402b
This commit fixes the sub block partition size used in
fill_mode_info_sb. Previous implementation effectively disabled
the rectangular block sizes. This commit resolved this issue.
Change-Id: Ic1c383ab0a9a2e7d59e85b388093f1f1f94d1e7f
This will fix the frame parallel decode hang on windows
due to not enough semaphores.
This will also make the frame parallel decode safer as
the number of frame buffers could only support maximum
8 threads.
Change-Id: Id9ef50692819dcbebbd74a0aabffbfb3f39a4309
This commit changes the value of highbitdepth flag to avoid conflict
with vp8 refresh_last_frame flag.
Change-Id: Idcff2cf44f0a200bd935b326f785c0cf32d7228a
The calculation of required extension used in HBD case was wrong due
to rounding for UV when y dimension is odd. This commit replaces the
computation with the correct version.
This fixes a crash caused by writing beyond the buffer boundary.
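An illustration of the kind of rounding involved, as an assumption: the
message does not show the corrected computation. With 4:2:0 subsampling an
odd luma dimension must round up for chroma, otherwise the chroma plane is
treated as one row or column smaller than it really is.

```c
#include <assert.h>

/* Chroma dimension for a given luma dimension and subsampling shift
 * (0 = no subsampling, 1 = halved). Rounds up so that an odd luma
 * dimension still covers its last row or column. Illustrative only. */
static int chroma_dim(int luma_dim, int subsampling) {
  assert(luma_dim >= 0 && (subsampling == 0 || subsampling == 1));
  return (luma_dim + subsampling) >> subsampling;
}
```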
Change-Id: Ic7c9afeb7388cd1341ec4974a611dacfb74ac6b6
This commit makes the ZEROMV mode first in the search order to
ensure that the zero mv is always checked in the RTC coding mode.
It improves the average speed -6 compression performance by 0.3%
in both PSNR and SSIM at no visible speed change.
Change-Id: I465a7e59f4e20cd84fee3f02ced6f98036945949
This reverts commit a6715a7558.
Removes a duplicate entry; this was previously added by:
14e37cf Add help info for --enable-vp9-highbitdepth
Change-Id: I61408e782232821ef6ed84775b5c79d172ba7f2d
cm->frame_bufs[].idx values were made consistent in:
61c5e94 Use -1 consistently as invalid buffer idx
update the initialization in swap_frame_buffers() to match.
additionally:
- remove some shadowed variables in the former and marked them volatile
Change-Id: Ie3f9636c405bd822112bb56bd22d28024ae98909
The high bit depth build failed while building for 32bit target.
The bugs were in vp9_highbd_subpel_variance.asm and
vp9_highbd_sad4d_sse2.asm functions. This patch fixed the bugs,
and made 32bit build work.
Change-Id: Idc8e5e1b7965bb70d4afba140c6583c5d9666b75
there are no known issues since:
10d5e09 Fix issues in 32bit PIC enabled build
related issues: #808, #924
Change-Id: I80454f95fe6b4ce630fdd434d740ce8b0d42951b
The current file's directory, ".", is treated much more literally
when building libvpx examples with Xcode than it is with make, and
clang cannot find common include files included via "./" when those
files actually reside one directory up in the tree.
Change-Id: I5f66a026282e35d80248ca4052ebb882b859172e
On rtc set:
speed 7 quality improves about 0.5%
speed 8 quality improves about 1.0%
Encoding time for speed 7 changes from 67804ms to 65889ms
Encoding time for speed 8 changes from 58659ms to 56808ms
Change-Id: Iabcfb53012fc1b9f3326cdbc167e5758b8c7ad30
After syncing the frame worker thread, the available thread count should
increase by 1 even if the worker thread does not have a displayable frame
to output.
Change-Id: I9eeb87720fed82dfe38555286833ff88e8a8e746
Reuse the yv12_mb array to fetch the buffer pointers/strides
corresponding to the current reference frame.
Change-Id: I5276b7494158b2cccef15213be2dc189e9036851
This commit allows the encoder to account for additional chroma
plane costs in the mode decision process, if the current block
potentially contains significant color change. It improves the
visual quality at very low bit-rates.
The compression performance of dark720p is improved by 12.39% in
speed 6. For jimred at 150 kbps, the PSNR of V component (red)
increased by 0.2 dB, at the expense of about 5% increase in
encoding time. Note that for sequences where the chroma components
are fairly consistent, the encoding time increase is negligible.
On average the rtc set compression performance is improved by
1.172% in PSNR and 1.920% in SSIM.
Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c
This patch continues the work to remove frame_parallel_decoding_mode
requirement in VP9 multi-threaded tile decoder. In order to do that,
the frame counts associated to each thread need to be accumulated
together after the frame is decoded.
Change-Id: Idba1a756cedfed3c154aef52ed82c8da3bbf9e0c
The original implementation had the following comment:
// Ignore mv costing if mvsadcost is NULL
However the current implementation does not allow for this.
If x exists then nmvsadcost must not be null.
This removes the only warning from -Wpointer-bool-conversion
https://code.google.com/p/webm/issues/detail?id=894
Change-Id: I1a2cee340d7972d41e1bbbe1ec8dfbe917667085
The current multi-threaded tile decoder requires that the videos
are encoded with frame_parallel_decoding_mode = 1. This requirement
is unnecessary and is better removed. This patch includes
the first part of the work.
Change-Id: Ic7695fb3cfe13f9022582c9f0edd2aa6e2e36d28
In vp8_sub_pixel_variance8x8_neon the temp2 buffer is only initialized
to kHeight8 * kWidth8. However, in the case that xoffset != 0 and
yoffset == 0, var_filter_block2d_bil_w8 is called with output_width
kHeight8PlusOne.
Thanks to cmugurel for diagnosing and yulius for the patch.
Change-Id: Ib71ffd96ffad963c92b8b7ca23f303942785b8e0
https://code.google.com/p/webrtc/issues/detail?id=4190
Apple ships version 0.98 of nasm through at least XCode 6. It is
incompatible with the assembly in libvpx.
https://code.google.com/p/webm/issues/detail?id=772
Change-Id: I33245a76f50a8224fe6fafa3cce9991f953fdcc8
1. Adjusted the threshold for coef update computation based on counts
of tx used, avoid coef update computation when count is low (<20)
2. Move sf->lpf_pick = LPF_PICK_MINIMAL_LPF to speed 8.
Change-Id: I02b44309e40fcdbf135c7934ae067a3f42502d30
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compared with the current serial decode mode, in which
the libvpx decoder is idle between decode calls, the frame parallel decoder is
busy between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decoding frame parallel videos with segmentation enabled is sometimes incorrect.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
+ nearest for consistency
near is a reserved word in windows builds so using it as a parameter
name may cause build failures with some configurations
Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403
1. move the check of search method of USE_TX_8X8 up one level to
avoid operations of build_tree_distributions()
2. count tx used and avoid computation for coef update when one size
is not used at all.
Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c
The previous patch "Fix issues in 32bit PIC enabled build" fixed
the x86inc.asm for macho32. Now we can enable use_x86inc while
building libvpx for 32bit pic enabled Darwin target, which makes
the encoder a lot faster(>2X) in this case by turning on the
existing optimizations.
Change-Id: I5f5c7add428d73f50c935c48d0a70aed2b1eb7af
gives a better summary of what is enabled / disabled outside of the
automatic toolchain options.
fixes issue #936
Change-Id: I1bf27593a5512713aab1473cb606c58cf3084d62
1. reduce the size of temporary arrays on stack
2. avoid build_tree_distribution for tx size that is not used at all.
Change-Id: I0f8d7124e16a3789d3c15ad24cf02c1c12789e2c
This patch was to fix issue 924:
https://code.google.com/p/webm/issues/detail?id=924
The SECTION_RODATA macro was modified to support macho32 format.
The sub-pixel functions were modified to pass in 2 more parameters
to handle the global offsets for PIC build.
Change-Id: I3bfcd336bcae945edf300bca4ab40376a2628cd4
On Nexus 7 speed -6 saw ~18% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c
On Nexus 7 speed -6 saw ~15% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9
On Nexus 7 speed -6 saw ~30% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: Id12af7d1883243c23e6692e898aea82299633d58
The current method doesn't work with Xcode 4 and up, since they no
longer have a $DEVELOPER_DIR/SDKs directory. Using xcrun and xcodebuild
works all the way back to Xcode 3 on OS X 10.6 Snow Leopard, if not
earlier.
Change-Id: I7126f2fb4a8f1d6e46f921e70bbd090f00ce3d36
Floating point is used in vp9_convert_qindex_to_q(), so the unit
test ActiveMapTest would sometimes cause a run time error without a proper call
to clear_system_state to reset register status.
Change-Id: I181e9395148c44a6ca8b97d6e109bd4a152143c6
Add distortion threshold condition to refresh state of a coding block,
and allow for qp adjustment also for some intra modes and non-zero motion modes.
Also some code cleanup (remove unused variables/code).
Change-Id: I735fa2b28bc64f60e0323976b82510577b074203
Currently disabled by default: enabled using
#define GROUP_ADAPTIVE_MAXQ
In this patch the active max Q is adjusted for each GF
group based on the vbr bit allocation and raw first pass
group error.
This will tend to give a lower q for easy sections
and a higher value for very hard sections. As such it is
expected to improve quality in some of the easier
sections where quality issues have been reported.
This change tends to hurt overall psnr but help
average psnr. SSIM also shows a small gain.
Average results for derf, yt, std-hd and yt-hd test sets were
as follows (%change for average psnr, overall psnr and ssim):-
derf +0.291, - 0.252, -0.021
yt +6.466, -1.436, +0.552
std-hd +0.490, +0.014, +0.380
yt-hd +5.565, - 1.573, +0.099
Change-Id: Icc015499cebbf2a45054a05e8e31f3dfb43f944a
On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30%
increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51
the result should have both bits set; previously this was converted from
webp incorrectly and resulted in a boolean check...
Change-Id: I2a7c7f2b491945f3a536ab4fca02247eccc892b8
This commit replaces an integer divide with a table-lookup. It is
to improve decoding speed, and at the same time, to reduce possible
complications with a bug in AMD Family 12h processors:
"665 Integer Divide Instruction May Cause Unpredictable Behavior"
Change-Id: I678b707a538798a923850bac467e66e847e6def7
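For the table-lookup change above, a generic sketch of the technique. The
divisor 3, the input range and the names are hypothetical; the message does
not say which divide was replaced. When the dividend has a small known range,
the quotients can be precomputed so that no divide instruction is issued at
run time.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed x / 3 for x in [0, 15]. Illustrative only. */
static const uint8_t div3_lut[16] = { 0, 0, 0, 1, 1, 1, 2, 2,
                                      2, 3, 3, 3, 4, 4, 4, 5 };

/* Returns x / 3 by lookup; x must be inside the table's range. */
static int div3(int x) {
  assert(x >= 0 && x < 16);
  return div3_lut[x];
}
```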
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compared with the current serial decode mode, in which
the libvpx decoder is idle between decode calls, the frame parallel decoder is
busy between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading, which will make devices play 1080P
VP9 videos more easily.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
For low spatial resolutions: bias partition selection to smaller block sizes,
and base the variance computation on 4x4 down-sampling.
Also move the threshold computations into the choose_partitioning,
so they are computed once for each sb block.
On low-res clips (RTC_derf) PSNR/SSIM metrics increase by about 4-5%.
No change for resolutions above CIF.
Change-Id: I93f8ff742c8044786977bb6e31dcf8efda6dd1b0
Just before a forced key frame we often get a foreshortened
arf/gf group. In such a case, we do not want to update
rc->last_boosted_qindex, which is used to define the Q range
for the forced key frame itself.
This gives a small average metrics gain for the YT and YT-HD sets
(eg. YT SSIM +0.141%).
Change-Id: Ie06698bc4f249e87183b8f8fb27ff8f3fde216d9
The comparison of addresses in the condition is not necessary, since
they will always be non-null.
Change-Id: Id0b0075283f5af65215d5761a8160a4cb2a15c9b
The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8
side. In our testing, we achieve 2X speed by adopting this change.
Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder
to common;
3. Merged multi-threaded loopfilter code, and made encoder/
decoder call same function to reduce code duplication.
Encoder tests showed that 1% - 2% speedup was seen for good-quality
2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6%
speedup using 4 threads were seen for real-time mode(at speed 7).
Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
This commit fixes a bug in denoiser reference frame buffer swap,
which disables frame buffer update.
Change-Id: I39a9427180fd18f9692602064ad821f7af4714c0
On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5%
increase in perf for 720p.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee
On some platforms, such as 32bit Windows and 32bit Mac, the allocated
memory isn't aligned automatically. The thread data is aligned to
ensure the correct access in SIMD code.
Change-Id: I1108c145fe982ddbd3d9324952758297120e4806
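The alignment requirement described above can be met portably by over-allocating and rounding the pointer up. The following is a minimal sketch of that general technique, not the code from the commit; the function names are hypothetical, and `align` is assumed to be a power of two no smaller than `sizeof(void *)`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns a block of `size` bytes whose address is a multiple of
 * `align`. The raw malloc() pointer is stored just below the returned
 * address so it can be recovered when freeing. */
static void *aligned_malloc_sketch(size_t align, size_t size) {
  void *raw;
  uintptr_t addr;
  /* Room for worst-case padding plus one slot for the raw pointer. */
  raw = malloc(size + align - 1 + sizeof(void *));
  if (raw == NULL) return NULL;
  addr = ((uintptr_t)raw + sizeof(void *) + align - 1) &
         ~(uintptr_t)(align - 1);
  ((void **)addr)[-1] = raw;
  return (void *)addr;
}

/* Frees a block obtained from aligned_malloc_sketch(). */
static void aligned_free_sketch(void *ptr) {
  if (ptr != NULL) free(((void **)ptr)[-1]);
}
```

SIMD loads and stores that require aligned operands can then use the returned pointer regardless of what alignment the platform's malloc() provides.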
This commit added comments for the following encoder controls:
VP9E_SET_LOSSLESS
VP9E_SET_TILE_COLUMNS
VP9E_SET_TILE_ROWS
VP9E_SET_FRAME_PARALLEL_DECODING
VP9E_SET_AQ_MODE
Change-Id: I2f75afd9cce01394f202b8e25f36bf763be0ddeb
This commit adds encoder side control for vp9 to set color space info
in the output compressed bitstream.
It also amends the "vp9_encoder_params_get_to_decoder" test to verify
the correct color space information is passed from the encoder end to
decoder end.
Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650
On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10%
increase in perf for 720p.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334
Don't put a small empty frame in front of a key frame. We put the
key frame flag in the webm container if there's a visible key
frame, but seeking to that point causes a decoding error if the
small empty frame, which is an inter frame, is placed
in front of it.
Change-Id: Id50c2c1fd31da0405ff6faa7375cc2f49c55402d
This commit added a field to vpx_image_t for indicating color space,
the field is also added to YUV_BUFFER_CONFIG. This allows the color
space information to pass through the decoder from the input stream to
the output buffer.
The commit also updated the compare_img() function with added
verification of matching color space, to ensure the color space
information is correctly passed from encoder to decoder in compressed
vp9 streams.
Change-Id: I412776ec83defd8a09d76759aeb057b8fa690371
Add optimized Neon functions of:
vp9_variance32x64
vp9_variance64x32
vp9_variance64x64
On Nexus 7 speed -5 and -6 saw about a 4% increase in perf.
Speeds -7 and -8 saw about a 6% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa
Replaced "color space" with "color format" in comments where color
sampling format is concerned, so to differentiate from the concept
defined in COLOR_SPACE.
Change-Id: I8c935034c166b24307a99352dab1686531276bb8
This commit refactors the motion compensated reference block fetch
process in denoiser. It skips the stage that generates motion
compensated reference block if denoiser decides to use copy block
mode. For high motion clips, this could speed up the denoising
process by about 10%.
Change-Id: I8ef4fa5fe766a8c4529119b9ec01faefb3d4ef53
Use frame buffer pointer swap instead of memcpy when possible.
Together, these two CLs make the denoiser over 10% faster when
running on vidyo1 720p at speed -6.
Change-Id: I64fe8a2422cafca6787a50c7f4dfb961191c0a9d
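The pointer-swap idea in the message above can be sketched as follows. The struct is a hypothetical stand-in for a frame buffer descriptor, and the swap is only a valid substitute for a copy when both buffers have the same layout and the old destination contents are no longer needed.

```c
#include <stdint.h>

/* Hypothetical, simplified frame buffer descriptor. */
typedef struct {
  uint8_t *data;
  int size;
} FrameBufSketch;

/* Exchanges the two descriptors in constant time instead of copying
 * `size` bytes of pixel data from one buffer into the other. */
static void swap_frame_buffers(FrameBufSketch *a, FrameBufSketch *b) {
  const FrameBufSketch tmp = *a;
  *a = *b;
  *b = tmp;
}
```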
These two parameters are used to control the denoiser cut-off
thresholds. They should be properly initialized when starting
mode search of a given block.
Change-Id: Iba8a25487026a0dbe0d350c347d7e4e4e237b637
When qdiff is larger, the sad/variance threshold should also be
higher which indicates a more aggressive action on MFQE.
Change-Id: I44c5c93572805458d4f87fdc7619cc9d8a522185
The vp9_denoiser_free() function will internally check if the
buffer pointers are NULL. This commit makes the encoder always
call vp9_denoiser_free() after finishing encoding. This covers the
case where noise_sensitivity_level is changed during the encoding
process and happens to be turned off towards the end of the sequence,
which could result in memory allocated to the denoiser not being
released.
Change-Id: Ie20dc2f2e6e5fb6333fbab3356bc153978a6a0f8
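A free routine that checks its pointers internally is safe to call unconditionally, which is the property the message above relies on. A minimal sketch, using a hypothetical two-buffer struct rather than the real denoiser type:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the denoiser state. */
typedef struct {
  unsigned char *running_avg;
  unsigned char *mc_running_avg;
} DenoiserSketch;

/* Safe to call whether or not the buffers were ever allocated, and safe
 * to call more than once: free(NULL) is a no-op and the pointers are
 * cleared after release. */
static void denoiser_free_sketch(DenoiserSketch *denoiser) {
  if (denoiser == NULL) return;
  free(denoiser->running_avg);
  denoiser->running_avg = NULL;
  free(denoiser->mc_running_avg);
  denoiser->mc_running_avg = NULL;
}
```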
Use the correct frame size and stride value for chroma components
when setting the initial values. These control parameters are
assigned when the denoiser buffer was allocated and initialized.
Change-Id: Ia6318194c7738aff540bcbd34f77f0dac46221a1
Allocate the frame buffer for the denoiser once during encoder
initialization. This avoids allocating the frame buffer
multiple times and overwriting the buffer pointer without properly
releasing it.
Change-Id: I9b3baa6283449d86fd164534d344c036bb035700
When testing frame sse to choose a loop filter value and
when checking ambient error in kf Q selection, use 64 bit
values for accumulating the sse, to avoid risk of overflow
for large image formats.
Change-Id: I03765d16c843d0ade61a45b0cd46312472697e57
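To see why a 64-bit accumulator matters for the sse change above: with 8-bit samples each squared difference can reach 255 * 255 = 65025, so a signed 32-bit sum can overflow after roughly 33,000 worst-case pixels, far fewer than a large frame contains. A minimal sketch with hypothetical names, assuming 8-bit planes:

```c
#include <stdint.h>

/* Sum of squared differences between two 8-bit planes, accumulated in
 * 64 bits so that large frames cannot overflow the total. */
static int64_t frame_sse64(const uint8_t *a, int a_stride,
                           const uint8_t *b, int b_stride,
                           int width, int height) {
  int64_t sse = 0;
  int r, c;
  for (r = 0; r < height; ++r) {
    for (c = 0; c < width; ++c) {
      const int diff = a[c] - b[c];
      sse += (int64_t)(diff * diff); /* per-pixel term fits in an int */
    }
    a += a_stride;
    b += b_stride;
  }
  return sse;
}
```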
Separate functions and rename files. This will make it easier to disable
some functions later to help work around a compiler issue in chromium.
Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5
This makes the inter_mode counts update consistent with other symbols.
Also, forward updates should work correctly now.
Change-Id: Id98be26fd08875162e644bb8f1de6f0918f85396
The denoiser sensitivity level should be set to 1 starting from
key frame. The internal function of denoiser should make the
temporal denoising operations cut off in key frame coding.
Change-Id: Id3e704a73e98e4ea801284a2cbbab2ea9c371d23
With "show_existing_frame" frames:
Minimum data size for profile 0 and 1 is 1 byte (8bits)
Minimum data size for profile 2 and 3 is 2 bytes (9bits)
Otherwise:
Minimum data size is 8 bytes.
This resolves the VP9 failure in fuzzing test build #56.
Change-Id: I146d9d37688f535dd68d24aacc76d464ccffdf04
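The minimum-size rules listed in the message above can be written as a small check. This is only a sketch of the rules as stated; the function names are hypothetical and do not come from the decoder source.

```c
#include <stddef.h>

/* Minimum number of bytes a frame must contain, following the rules in
 * the commit message. */
static size_t min_frame_data_size(int show_existing_frame, int profile) {
  if (show_existing_frame) {
    return (profile >= 2) ? 2 : 1; /* profiles 2 and 3 need a 9th bit */
  }
  return 8;
}

/* Returns 1 when data_sz bytes are enough to start parsing the frame. */
static int frame_data_is_large_enough(size_t data_sz,
                                      int show_existing_frame,
                                      int profile) {
  return data_sz >= min_frame_data_size(show_existing_frame, profile);
}
```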
By using weighted averaging in the calculation of the frames to be
displayed, we get an average gain of more than 1 db for key frames
whose base qp are 20 higher than non-key frames.
Change-Id: I7bcb2e7b9c6420ea3f73f33204d18b072dffd17c
This commit fixes the buffer alignment control in denoised video
output function. The encoder is now able to properly store the
denoised input video into provided file when enabled.
Change-Id: I258e272c8d4a9b52592e16d6d09976c6f5c21728
Use mbmi->segment_id directly in vp9_pick_inter_mode. The value is
set outside this function, hence no need to assign it again.
Change-Id: I3d63cdd2e4fadf62ccdefada638b00d979eb3741
Check if block size is below 8x8 for rectangular block coding. It
is added to support 4x8 and 8x4 block coding for RTC mode.
Change-Id: I760b328f45b98ae48adc45ed5a39fb643cd8aebd
This commit simplifies the reference motion vector part for sub8x8
block coding in RTC mode and reduces the required local variables.
Change-Id: I470d1482092563b68af22404dc1f497e7457b0a8
VP9FrameSizeTestsLarge.OneByOneVideo has been causing a failure in
jenkins libvpx__unit_tests-valgrind_long for "using of uninitialized
memory". The root cause was that the input image for this test was
not initialized with the proper size, so planes U and V were not
initialized at all.
This commit fixes the size initialization, and resolves the issue.
Change-Id: Ic4dd1542b7bb0cb260a1e0aeeb505db21ae5edc8
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.
Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
Properly set the corresponding scaling factor of the reference
frame in the non-RD mode decision process. This allows the mode
search process to account for the scaled reference frame when
selecting coding mode.
Change-Id: I9d41bff6931c98e5a82b413e37ac5e6e14b93b23
Local variables used at the setjmp() site need to be marked volatile.
Relevant excerpt from the 'man longjmp':
===============
The values of automatic variables are unspecified after a call to
longjmp() if they meet all the following criteria:
· they are local to the function that made the corresponding setjmp(3) call;
· their values are changed between the calls to setjmp(3) and longjmp(); and
· they are not declared as volatile.
===============
Change-Id: I093e6eeeedbf5f781d202248ca701ba2c29d3064
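The setjmp()/longjmp() rule quoted above can be demonstrated with a small, self-contained sketch. The function names are hypothetical; the point is only that a local modified after setjmp() and read after longjmp() is declared volatile.

```c
#include <setjmp.h>

static jmp_buf recovery_point;

/* Simulates an error path that unwinds back to the setjmp() site. */
static void fail_hard(void) {
  longjmp(recovery_point, 1);
}

/* Returns the number of steps completed before the failure. */
static int count_steps_before_failure(void) {
  /* volatile: changed between setjmp() and longjmp(), then read after
   * the jump. Without it the value would be unspecified. */
  volatile int steps = 0;
  if (setjmp(recovery_point) != 0) {
    return steps; /* reached via longjmp() */
  }
  steps = 1;
  steps = 2;
  fail_hard();
  return -1; /* not reached */
}
```

Without the volatile qualifier a compiler may keep `steps` in a register that longjmp() restores to its setjmp()-time contents, so the function could report 0 instead of 2.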
This allows us to track decode speed for new encodes so that we catch
problems like an encode change that makes decode really slow.
Change-Id: I7210196415c4e53d455e9c81246d9fb324913a06
Encode the files with 1, 2, and 4 threads.
Explicitly turn on error resilient and frame parallel
decoding and turn off altref frames.
Change-Id: I02b66f72b7d35c666c3ba685b33015508e440209
The unit tests for VP9 multi-threaded encoder are added, which
carry out tests for all three modes(i.e. kTwoPassGood, kOnePassGood
and kRealTime), and speeds ranging from 1 to 8. A 1280x720 test
clip is used, which is encoded into multiple tiles. The number of
threads is num_of_tiles.
Change-Id: I04419eeca145ad841c9c527603668239a82e7fbd
This commit adds a guard condition to the intra mode test skip
control in RTC coding mode. If all inter modes are skipped, force
the encoder to check intra mode. It avoids situations where the
encoder processes without properly assigning required mode
information.
Change-Id: Ibb349fee997d6584ce901d08b06e8df3ca9c01b1
Initial patch to remove get_zbin_mode_boost() and
cpi->zbin_mode_boost.
For now sets a dummy value of 0 for zbin extra pending
a further clean up patch.
Change-Id: I64a1e1eca2d39baa8ffb0871b515a0be05c9a6af
this function may return an error if no frame is available; --keep-going
is meant to test decoder resilience, so simply warn in this case.
Change-Id: I6e6aed3e78eca21cca80d7d8a06a1a244685ba29
The alternate reference frame is disabled in non-RD mode. No need
to keep the related entries in the THR_MODES array.
Change-Id: I53386f4bb1c6284f582801f27246c5edf55bc24b
In RTC coding mode, the alternate reference frame modes and compound
inter prediction modes are disabled. This commit reworks the
related mode search threshold update process to skip interacting
with these coding modes. It provides about 1.5% speed-up for speed
-6 on average.
vidyo1
16551 b/f, 40.451 dB, 6261 ms -> 16550 b/f, 40.459 dB, 6190 ms
nik720p
33316 b/f, 38.795 dB, 6335 ms -> 33310 b/f, 38.798 dB, 6237 ms
mmmoving
33265 b/f, 41.055 dB, 7176 ms -> 33267 b/f, 41.064 dB, 7084 ms
dark720
33329 b/f, 39.729 dB, 11235 ms -> 33331 b/f, 39.733 dB, 10731 ms
Change-Id: If2a4090a371cd28f579be219c013b972d7d9b97f
This commit removes undefined value options of cpu-used for VP9 and
changed vpxenc prompt to reflect the usable range of [-8,8]
Change-Id: Ib80fef3dbb6ec9aabac45ed13e8ab6fbaf94f55e
Use a temporary variable to store the transform size associated
with the best intra mode and restore the mode_info if the overall
best mode is intra mode.
Change-Id: I2606e0061ad32f91b095462902b1eb734b128eea
The encoder initialization is called in EncodeFrame(). Therefore,
in the unit tests, the set control is done when video->frame() is 1.
This didn't cause a problem since the current tests mainly cover the
lag_frame > 0 case, or cases where no encoding option that needs to
allocate memory before the 1st frame is used. With lag_frame = 0 and
multiple tiles, the unit tests crash. The issue is fixed by doing the
initialization before encoding frames.
Change-Id: I43102048f88448bcf27e9c60e0ec06c176b02e5c
Only for the rectangle blocks larger than 16X16, SAD and Variance are
still based on the internal square blocks.
Change-Id: I3754da1b0254147313f86a0140dbf4f980f06a5a
The mode_info array was unnecessarily reset to zero every frame
when error resilient mode turned on, given that the mode info
values per block will be assigned during mode search stage.
This commit removes this reset operation. It reduces the runtime
cost on memset operation to 1/3. The overall speed -6 runtime is
reduced by 2%.
Change-Id: I32ecb73338d8995cc0c5147de09357364f13d45b
This commit explicitly sets the second reference frame type to be
NONE in key frame coding mode. This fixes a subtle dependency of
reference motion vector used by next inter frame on mode_info
reset before key frame coding.
Change-Id: I5ff0359753fdc9992b0bfe889490f7a32d7d5f6a
These were established for compatibility. Make sure to use them.
Most frequently they manifest as issues on Visual Studio builds.
Change-Id: I39d764d2eb341b999d7a6132cb44b2acfc511160
Export vpx_codec_enc_init_multi_ver so the vp8 multi res encoder example
can see it when building shared.
Change-Id: Ic5222b1b6d949f39c7e50c3bc58fb76bece2a3f1
Delete vp9_dc_only_idct_add_neon.c
The function was merged with vp9_short_idct4x4_1_add (later
vp9_idct4x4_1_add) in d2de1ca and should have been deleted then.
Change-Id: Ie58ba3dd9dc7330a8f1238dd7dd71c9ed4639b94
Signed-off-by: James Yu <james.yu@linaro.org>
Where there is very subtle motion, especially when combined
with low spatial complexity, the codec sometimes fails to quickly
pick up the ambient motion field.
Once it has been established though the field propagates well using
Nearest and Near MV.
This patch looks specifically at the case where the Nearest and Near
have not been established as non zero vectors and in this case
discounts the cost of searching for a new vector in the rd code.
This will almost certainly have some implications in terms of encode
speed but it should be possible to mitigate the impact in a subsequent
patch using first pass stats and the local spatial complexity.
Average results for test sets approximately neutral.
Change-Id: I44a29e20f11f7ab10f8c93ffbdc50183d9801524
Change 72141 introduced a new use of vp9_avg_4x4.
This call needs to switch to using vp9_highbd_avg_4x4
when performing high bitdepth encodes.
Change-Id: I6a8ba4b62f8a75d0a917b365a55245e2f0438ea1
When multiple intra modes are tested, the previous mode info
update process may overwrite the selected best intra mode and make
the final selection use an inter mode. This commit fixes this
issue by moving the mode_info reset outside the intra mode search
loop.
Change-Id: I15ed4288a6b3cb0832104a5e6d5d9a25cd1a5b2b
If vp9_pick_inter_mode works properly, it should at least check
one coding mode and hence get best_tx_size assigned a valid value.
There is no need to initialize best_tx_size with a legitimate
value before starting the mode search.
Change-Id: Ic0496cd89672ea9c2c512a9bd1da952190af9cba
Make the variable reduction_fac log2 based and explicitly use
right shift when computing intra_cost_penalty.
Change-Id: I208f1fb879a02debb3b3fc64f9fd06260dcf1c86
Add vp9_iht8x8_add_neon.c
- vp9_iht8x8_64_add_neon
The assembly did not previously implement tx_type 0
BUG=716
Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934
Signed-off-by: James Yu <james.yu@linaro.org>
Add vp9_iht4x4_add_neon.c
- vp9_iht4x4_16_add_neon
The assembly did not previously implement tx_type 0
BUG=715
Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6
Signed-off-by: James Yu <james.yu@linaro.org>
Fails to compile. Bad calls to vp9_alloc_frame_buffer
and vp9_realloc_frame_buffer in postproc.c
This reverts commit 399823b6f5.
Change-Id: I29f0e173f8e185d3a303cfdb17813e1eccb51e3a
Allows override of default target list. Also added missing usage info
for --extra-configure-args, and removed last vestiges of armv6 support.
Change-Id: Ic0f14fffa0cbaea1bed371d38ff65e035bbe3273
Add support for setting byte alignment on the Y, U, and V plane of the
reference buffers. The byte alignment must be a power of 2, from 32 to
1024. A value of 0 sets legacy alignment.
Change-Id: I7c1399622f7aa68e123646369216b32047dda73d
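The constraint on the byte alignment value stated above (0 for legacy alignment, otherwise a power of 2 from 32 to 1024) can be checked as in this sketch. The function name is hypothetical.

```c
/* Returns 1 when byte_alignment is 0 (legacy alignment) or a power of
 * two in the range [32, 1024]; returns 0 otherwise. */
static int is_valid_byte_alignment(int byte_alignment) {
  if (byte_alignment == 0) return 1;
  if (byte_alignment < 32 || byte_alignment > 1024) return 0;
  /* A power of two has exactly one bit set. */
  return (byte_alignment & (byte_alignment - 1)) == 0;
}
```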
INLINE is used quite widely in vp9, this change improves performance
1-2% on most modern platforms.
Change-Id: I8a9974aab89fa588ea4923cc7eaf6199e344a528
the entire module is wrapped in CONFIG_VP9_POSTPROC which is forcibly
enabled with CONFIG_INTERNAL_STATS
+ a similar change in vp9_alloccommon.c
Change-Id: I374993297a9fba5bef2f0b71f984eba42f0995a3
set LIBVPX_RAND with --enable-vp9-postproc, previously only the vp8
config was checked. this fixes the build with --disable-postproc.
Change-Id: Ia61baded6aa0e44d6443ae4a3c85915f1054f053
Assembly tests should clear system state, as we have no
expectation of proper system state in between test runs.
Change-Id: I0f591996c1f17ef2a5a8572a6b445f757223a144
This commit fixes a bug in the PICK_MODE_CONTEXT index for
horizontal partition case. The compression performance change
is less than 0.01% level, since most blocks are selected to
use square block size in RTC coding mode.
Change-Id: I67effc18ae8795fccdd82a55f4efc609fa5cb3e1
For key frame under variance source partition: 4x4 prediction blocks
may be selected when variance of 8x8 block is very high (threshold is set fairly high for now).
Testing on some RTC clips shows this helps to reduce some ringing artifacts on key frame.
Encoded key frame size increases about ~10%. Key frame PSNR increases about ~0.1-0.2dB.
Change-Id: I56e203fac32ea6ef69897fb3ea269c59cb50d174
This commit explicitly uses the bit shift operation instead of
division for computing block variance.
Change-Id: Id19c0ff27dd1d1ae4aceee6657e1aad0d406bd74
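When the pixel count of a block is a power of two, the division in the variance formula can be written as a right shift, as the message above describes. A minimal sketch with a hypothetical function name, computing sse - sum^2 / N with N = 2^log2_count:

```c
#include <stdint.h>

/* Block variance scaled by the pixel count: sse - (sum * sum) / N,
 * where N = 1 << log2_count. The square is non-negative, so the right
 * shift gives the same result as the integer division. */
static uint32_t block_variance_shift(uint32_t sse, int sum,
                                     int log2_count) {
  const int64_t sum_sq = (int64_t)sum * sum;
  return sse - (uint32_t)(sum_sq >> log2_count);
}
```

For an 8x8 block log2_count is 6, for 16x16 it is 8, and so on.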
If decoding starts with intra-only frame, there is a possibility
of using uninitialized entropy context, which leads to undefined
behavior.
Change-Id: Icbb64b5b1bd1e5de2a4bfa2884e56bc0a20840af
The 8x8 DCT uses a fast version whenever possible.
There was a mistake in the checking code which
meant sometimes the fast version was used when it
was not safe to do so.
Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7
(cherry picked from commit fd05fb0c21)
This commit refactors the choose_partitioning function. It removes
redundant memset calls and makes the encoder calculate
variance value per block only when it is needed. It reduces the
average runtime cost of choose_partitioning by 60%. Overall it
reduces speed -6 runtime by 2-5%.
Change-Id: I951922c50d901d0fff77a3bafc45992179bacef9
It is the first version of MFQE in VP9. There are a few TODOs included
in this version.
Usage: Add flag --enable-vp9-postproc to config the project.
In decoder, use flag --mfqe in the command line to enable
MFQE in postproc.
Note: Need to have key frame with low quality to see the effect of this
new patch. In my experiment, I fixed the qindex to 200 in key frame.
Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396
Replace error_resilient flag with use_prev_frame_mvs in
vp9_pick_inter_mode reference motion vector search selection.
This effectively turns off the simplified ref mv search in the
settings of frame resizing, even if error-resilient mode is off.
Change-Id: I7fed814ee7bc0cb419a03b846e0fc2de46ba7686
Update the frame motion vector only if previous frame motion vector
is needed for next frame reference motion vector.
Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f
The warning only happens in the VP9 encoder's first pass because src_mi
is not set up yet. It does not cause the encoder to fail, as left_mi and
above_mi are not used in the first pass and are set up again
in the second pass.
Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08
This allows us to track decode speed for new encodes so that we catch
problems like an encode change that makes decode really slow.
Change-Id: I92251a8b1f710b241f66e1042413df1b71b76038
This commit enables the use of sub8x8 blocks in RTC key frame
encoding. It requires the block size to be preset and will decide
the coding mode and encode the bit-stream.
Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b
Rename set_modeinfo_offsets as set_mode_info_offsets, to be more
consistent with naming convention.
Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d
The later encoding process will take the top-left block's
mode_info for pre-determined block size.
Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5
The restructure moves the decision into the rd pick
modes loop and makes a decision based at the 16x16
block level instead of only the 64x64 level.
This gives finer granularity and better visual results
on the clips I have tested. Metrics results are worse
than the old AQ2 especially for PSNR and this mode
now falls between AQ0 and AQ1 in terms of visual
impact and metrics results.
Further tuning of this to follow.
It should be noted that if there are multiple iterations
of the recode loop the segment for a MB could change
in each loop if the previous loop causes a change in the
complexity / variance bin of the block. Also where a block
gets a delta Q this will alter the rd multiplier for this block
in subsequent recode iterations and frames where the
segmentation is applied.
Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7
The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors.
By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR) performance can be improved.
In the original code, the PSHUFB uses every byte and is consecutively copied.
This is done more efficiently by PUNPCKLBW and PUNPCKHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result.
For example:
filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7
REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15
PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors.
There was no observed performance impact on Core processors (expected).
Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0
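The byte rearrangement in the worked example above can be checked with a scalar model of the instructions. This sketch does not use SIMD intrinsics; it only models the byte movement on 16-byte arrays, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Model of PUNPCKLBW reg, reg: each of the low 8 bytes is doubled. */
static void punpcklbw_self(const uint8_t reg[16], uint8_t out[16]) {
  int i;
  for (i = 0; i < 8; ++i) out[2 * i] = out[2 * i + 1] = reg[i];
}

/* Model of PUNPCKHBW reg, reg: each of the high 8 bytes is doubled. */
static void punpckhbw_self(const uint8_t reg[16], uint8_t out[16]) {
  int i;
  for (i = 0; i < 8; ++i) out[2 * i] = out[2 * i + 1] = reg[8 + i];
}

/* Model of PALIGNR: concatenate hi:lo, shift right by `shift` bytes
 * (0..16), keep the low 16 bytes. */
static void palignr_model(const uint8_t hi[16], const uint8_t lo[16],
                          int shift, uint8_t out[16]) {
  uint8_t cat[32];
  memcpy(cat, lo, 16);
  memcpy(cat + 16, hi, 16);
  memcpy(out, cat + shift, 16);
}

/* Model of PSHUFB for masks whose bytes are all in 0..15. */
static void pshufb_model(const uint8_t reg[16], const uint8_t mask[16],
                         uint8_t out[16]) {
  int i;
  for (i = 0; i < 16; ++i) out[i] = reg[mask[i] & 15];
}

/* Returns 1 when the unpack + align route reproduces the shuffle. */
static int unpack_route_matches_shuffle(const uint8_t reg[16]) {
  static const uint8_t mask[16] = { 0, 1, 1, 2, 2, 3, 3, 4,
                                    4, 5, 5, 6, 6, 7, 7, 8 };
  uint8_t lo[16], hi[16], via_unpack[16], via_shuffle[16];
  punpcklbw_self(reg, lo);
  punpckhbw_self(reg, hi);
  palignr_model(hi, lo, 1, via_unpack);
  pshufb_model(reg, mask, via_shuffle);
  return memcmp(via_unpack, via_shuffle, 16) == 0;
}
```

The model says nothing about latency; it only confirms that both instruction sequences produce the same 16 bytes for the filter pattern in the example.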
Adds decoder config as a changeable parameter to unit tests, and
changes end to end test to use commonly used parameters to enable
base test of tiles encoding and frame parallel decoding.
Change-Id: I5d23a6857303b4d68b92b15c3f2f04a1bcb4c2bb
the flag in the header wasn't being set based on the encoder
configuration in non-intra only mode
broken since:
fbc2fbf Adding oxcf temp variable.
Change-Id: Ib4cff9901889824bc4e68d7f0f6deb1e41df2f53
The initial reset of this_rdc in vp9_pick_inter_mode is not needed,
since it will be re-assigned when used.
Change-Id: Ic0e12d741cbab292fc214c1eabb48b129af7839b
Compare the current best mode rate-distortion cost with the skip
threshold to decide whether to perform motion search.
Change-Id: Ia071824f8dd3b7db485f424692a485a2da6a1a9f
These speed-up features for key frame coding are only turned on
in the settings of hybrid non-RD and RD mode decision. It provides
about 20% speed-up to the hybrid key frame coding at the expense
of certain compression performance loss. For vidyo1, the key frame
coding statistics are changed
9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us
Overall rtc set compression performance is down by -0.257%.
Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f
When block size is below 16x16, the encoder swaps from non-RD to
RD mode for key frame coding. This largely brought back the key
frame compression performance. For vidyo1 at 1000 kbps, the key
frame coding statistics are changed
9978F, 34.183 dB, 36807 us -> 9838F, 35.020 dB, 61677 us
As compared to the full RD case
7187F, 34.930 dB, 214470 us
The overall rtc set coding performance (single key frame setting)
is improved by 1.5%.
Change-Id: I78a4ecf025d7b24ec911e85be94e01da05e77878
Change 72193 made the encoder behave differently
when configured with and without high bitdepth.
This change means the same algorithm is used for both.
Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
No more checking of corrupted reference frame as we skip
decoding any non-intra frame in case of frame corrupted.
Change-Id: I77d41bbb02fc5f61972740e2d411441eb6a17073
Currently, VP9 supports column-tile encoding, which allows a frame
to be encoded in multiple column tiles independently. The number of
column tiles is set by the encoder option "--tile-columns". This
provides a way to encode a frame in parallel.
Based on previous set of patches, this patch implemented the tile-
based multi-threaded encoder. Each thread processes one or more
tiles.
Usage:
For HD clips:
--tile-columns=2 --threads=1/2/3/4
While using 4 threads, tests showed that the encoder achieved
2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at
realtime speed 5.
Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4
Change 71789 renamed CONFIG_VP9_HIGH to CONFIG_VP9_HIGHBITDEPTH.
However, one use of CONFIG_VP9_HIGH was missed.
Change-Id: I0ebb9c71380c6d810a25708d15471abf9533e695
the gtest implementation used only returns values between 0 and 2^31-1
+ temporarily disable some tests in fdct8x8_test which misbehave with the
new range
Change-Id: I45381076f0bea3317cc6728305890e4fd2f2facd
Currently, the configure script checks for x32 by testing just the
__ILP32__ define. However, on "plain" i386, __ILP32__ can also be
defined, for example by clang 3.5.0 and higher. (That gcc does not
define it there, is another issue, but not for this tracker.)
Therefore, extend the check by also checking for __x86_64__, which will
also be defined for x32.
BUG=887
Change-Id: I90ac1d6843caff0416e1dd360c0be3dbaa85c2ae
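The extended x32 test described above relies on two predefined compiler macros. The configure script performs the check in shell; the following C sketch only shows the equivalent preprocessor condition, and the function name is hypothetical.

```c
/* Returns 1 only for the x32 ABI: 32-bit pointers (__ILP32__) on the
 * x86-64 instruction set (__x86_64__). Plain i386 may define __ILP32__
 * but never __x86_64__, so it is correctly rejected. */
static int is_x32_target(void) {
#if defined(__ILP32__) && defined(__x86_64__)
  return 1;
#else
  return 0;
#endif
}
```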
the gtest implementation used only returns values between 0 and 2^31-1
+ temporarily disable some tests in lpf_8_test which misbehave with the
new range
Change-Id: I8a026680c4b8c12dc14d4f24c33edb2315963114
For key frame at speed 6: enable the non-rd mode selection in speed setting
and use the (non-rd) variance_based partition.
Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames),
mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16.
Loss in key frame quality (~0.6-0.7dB) compared to rd coding,
but speeds up key frame encoding by at least 6x.
Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6.
Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
This commit reworks the ONE_LOOP_REDUCED coefficient probability
model update process. It allows model update for every coefficient
across the spectrum at a coarser resolution, instead of performing
precise update only for certain subset of probability models.
The overall runtime remains nearly same (<1% change) for speed -6.
The compression performance is improved by 7.5% in PSNR for speed
-5 and 4.57% for speed -6, respectively.
Change-Id: Ifb17136382ee7e39a9f34ff4a4f09a753125c8d1
Synchronize all threads immediately as a subsequent decode call may
cause a resize invalidating some allocations.
fixes one aspect of crbug.com/437655
Change-Id: Ie993b62c2756478543206ddbe43ec6268d90a470
Change 72056 unfolded some macro definitions,
but lost some alternative behaviour required for
high bitdepth encodes.
This causes the encoder to crash, see issue 884.
Change-Id: I8ce4d73c9fe0a3c10ccb86fba210fabc8b2f0ccc
Also removes some spurious changes in common/vp9_blockd.h which
was introduced by a rebase issue between nextgen and master branches.
Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
(cherry picked from commit 005d80cd05)
(cherry picked from commit 08d2f54800)
(cherry picked from commit 4230c2306c)
This commit makes the codec automatically turn on error resilient
mode when using real-time mode for temporal scalable coding. It
fixes an enc/dec mismatch issue and re-enables the corresponding
unit test.
Change-Id: Ie1f7134e9a78ddd43e9b1555b3ee991c8a3afd0d
AQ2 modified to use mb_av_energy in defining variance
thresholds used alongside complexity when defining the
segment to be used for an SB64.
Slight improvements in metrics (ssim and PSNR).
Change-Id: Idb9cb73f7d9c4f7118cd7e84ac77b0f25cacbf81
Incorporate segment delta-q into estimated bits.
This generally improves the rate control under cyclic refresh (aq=3) mode.
Change-Id: I1dc60fb230e7d08357fae18909d8ed27bf58e037
A hidden enc/dec mismatch bug was accidentally triggered by
https://gerrit.chromium.org/gerrit/#/c/72247/
Adaptively adjust mode test kick-off thresholds in RTC coding
This commit temporarily turns off the broken unit tests to avoid
blocking other CLs while fixing.
Change-Id: I0a0f195030321190ce10879cd833187680576367
Probably not even the dominant platform the library is being built for.
Add --cpu= option description to help. The option already exists.
Don't allow passing just --cpu as a no-op.
BUG=826
Change-Id: Iaa3f4f693ec78b18927b159b480daafeba0549c0
This patch greatly increases the strength of AQ1.
Visual tests show strong gains on many clips but there is a big
hit on psnr.
SSIM is more mixed with some winners and losers.
Change-Id: Idaa5d3b41d8576096bfa000b62bc531c3d8bf6a1
Each tile's tok starting address is calculated before the encoding
process. These addresses are stored so that the same calculation
won't be done again in packing bit stream.
Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50
When the golden frame is boosted, the rate correction factor is not
correlated well with other inter frames even in CBR mode. This commit
changes to use GF specific rate_correction_factor when gf_cbr_boost
is greater than 20%.
Change-Id: I6312c1564387bcacc11f4c5e8a9cfdc781b5c3ab
This commit allows the encoder to increase the mode test kick-off
thresholds if the previous best mode renders all zero quantized
coefficients, thereby saving motion search runs when possible.
The compression performance of speed -5 and -6 is down by -0.446%
and 0.591%, respectively. The runtime of speed -6 is improved by
10% for many test clips.
vidyo1, 1000 kbps
16578 b/f, 40.316 dB, 7873 ms -> 16575 b/f, 40.262 dB, 7126 ms
nik720p, 1000 kbps
33311 b/f, 38.651 dB, 7263 ms -> 33304 b/f, 38.629 dB, 6865 ms
dark720p, 1000 kbps
33331 b/f, 39.718 dB, 13596 ms -> 33324 b/f, 39.651 dB, 12000 ms
mmoving, 1000 kbps
33263 b/f, 40.983 dB, 7566 ms -> 33259 b/f, 40.978 dB, 7531 ms
Change-Id: I7591617ff113e91125ec32c9b853e257fbc41d90
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. All threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.
In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.
Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
The intra mode penalty is covered by intra_cost_penalty. This
commit removes the other intra cost threshold, provided that the
constant 50 is negligible in normal rate-distortion cost.
Change-Id: I9b8b7483c43b9a41741622e7057def1f7d51bb72
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.
The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected. For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.
Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
(cherry picked from commit 454342d4e7)
This commit makes a non-RD coding mode decision process for key
frame coding. It can be optionally turned on in speed -6 and above.
Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5
previously 'bit_depth_', which is later used to calculate 'mask_', would
be left uninitialized in non-high-bitdepth builds
Change-Id: Ia72035f4645baf3bb0f191504f491b934cdf1e0e
This commit allows more aggressive decision to skip forward
transform and quantization for luma component in RTC coding mode.
The chroma components still go through the normal coding
routine, since they are not included in the non-RD mode search
process.
It reduces the runtime cost by 2% - 10%. In speed -6,
vidyo1 1000 kbps
16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms
nik720p 1000 kbps
33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms
dark720p 1000 kbps
33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms
The compression performance of speed -6 is improved by 0.44% in
PSNR and 1.31% in SSIM.
Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7
In AQ1 a rate adjustment was applied for blocks coded with a
deltaq. This tends to skew the partition selection and cause
rate overshoot.
For example, consider a 64x64 super block where some but not all
sub blocks are in a low q segment and some are in a high q segment.
The choice of Q when considering large partition and transform sizes
is defined by the lowest sub block segment id (currently this implies the
lowest Q). If some parts of the larger partition are very hard this will
cause a high rate component.
The correct behavior here is for the rd code to discard the large partition
choice and break down to sub blocks where some have low and some
have high Q. However the rate correction factor above masks the high
cost of coding at a larger partition size.
Change-Id: Ie077edd0b1b43c094898f481df772ea280b35960
Make the midpoint variance used in AQ mode 1 segmentation
depend on the overall complexity of the frame in two pass.
Change-Id: I452814ec57f7a32352e41bb250e78066abe952dd
By using 0xff for a short it was not setting the high bits. When
comparing the output with vtst to find non-zero elements it was skipping
values which had no low bits set such as -512 / 0xFE00.
Using -8191 as the first element of coeff will generate this condition.
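The masking problem can be reproduced outside NEON. The sketch below is a
plain-Python model of one 16-bit vtst lane (the result is all ones when
a & b is non-zero). The helper names are illustrative and are not part of
libvpx.

```python
def to_u16(value):
    """Reinterpret a signed 16-bit value as its unsigned bit pattern."""
    return value & 0xFFFF


def vtst_u16(a, b):
    """Model of one 16-bit vtst lane: all ones if (a & b) != 0, else 0."""
    return 0xFFFF if (a & b) & 0xFFFF else 0


coeff = to_u16(-512)                    # bit pattern 0xFE00
narrow_hit = vtst_u16(coeff, 0x00FF)    # 0: a value with only high bits is missed
full_hit = vtst_u16(coeff, 0xFFFF)      # 0xFFFF: the coefficient is detected
```

With the 0x00FF mask the lane reports "zero" for -512, which is the skipped
case described above. A full-width mask tests every bit.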
BUG=883
Change-Id: Ia1e10fb809d1e7866f28c56769fe703e6231a657
All the assembly code has been removed, the tests no longer check for
the target, and android and chrome do not use the targets.
Change-Id: I193993f7b2b0bd6478453402f573ce3606e04e8d
Add an additional restriction to bit/complexity based
segmentation based on spatial variance.
Only lower Q when both the number of bits spent
in the initial encoding pass and the spatial complexity are
below a threshold. This will prevent the low Q segments
being used just because there is a surfeit of bits.
Small metrics gains especially opsnr.
derf ~0.2% std-hd ~0.3%
Change-Id: I6a8496d466d673f9b0e2b2ca6304ea7b6d8e1cce
This is the first of a series of patches to restructure and
improve AQ mode 1 (variance based AQ).
Change-Id: Idcf693131a3ea2459dcfd957a54a65b971fa4a2a
this was incorrectly set in test.mk by
93ffd37 Enable and fix resize_test for VP9
the test is now available when using --disable-vp9
Change-Id: I6acf44b0de647b34812ef5e18fd96447cdf9b25d
worker hooks return false on error, fix the assignment in Execute() used
in the TestSerialInterface test
Change-Id: I93c2e45f270330ae6d35a3a303411c4ee0f31337
Similar to mask_filter, the filter_cache in RD_OPT struct can be
moved out, and declared as a local variable since it is only
used in pick_inter_mode functions.
Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17
Correct calculation of number of mbs in two pass code when
frame resizing is enabled. Always use initial number of mbs if
scaling is enabled, as this is what was used in the first pass.
Change-Id: I49a4280ab5a8b1000efcc157a449a081cbb6d410
The mask_filter in RD_OPT struct is used to record rd result in
filter decision. It is only used in pick_inter_mode functions,
and is removed from the struct and declared as a local variable.
Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b
The min_partition_size and max_partition_size are set at the
beginning while setting speed features, and then adjusted at
SB level. Moving them to mb struct ensures there is a local
copy for each thread.
Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/[34] fails post the
merge of commit#ffa06b37. This commit adds reset of rc tracking info
when frame is dropped, and fixes the causes of the bad interaction
between the tests and the previous commit.
Change-Id: I848acfd9fcb336359662274325190f94aac76eae
This commit reworks the forward transform and quantization process
for 8x8 block coding. It combines the two operations in a single
function to save a store/load stage of the original transform
coefficients. Overall the speed -6 is slightly faster (around 1%
range). The compression performance of speed -6 is improved by
3.4%.
Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
In rare cases, the interaction between rate correction factor and Q
choices may cause severe oscillating frame sizes that are way off
target bandwidth. This commit adds tracking of rate control results
for the last two frames, and uses the information to prevent oscillating
Q choices.
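A minimal sketch of the idea, assuming the encoder records for each of the
last two frames whether it overshot (+1) or undershot (-1) the target and
which Q it used. The field and function names are illustrative, and the
exact rule used by the encoder may differ.

```python
from dataclasses import dataclass


@dataclass
class RateHistory:
    rc_1_frame: int = 0   # previous frame: +1 overshoot, -1 undershoot, 0 on target
    rc_2_frame: int = 0   # same, for the frame before that
    q_1_frame: int = 0    # Q used for the previous frame
    q_2_frame: int = 0    # Q used for the frame before that


def damp_oscillation(q, hist):
    """Keep q between the last two Q values when their results had opposite signs."""
    if hist.rc_1_frame * hist.rc_2_frame == -1 and hist.q_1_frame != hist.q_2_frame:
        low = min(hist.q_1_frame, hist.q_2_frame)
        high = max(hist.q_1_frame, hist.q_2_frame)
        return max(low, min(q, high))
    return q


def record_frame(hist, q, result):
    """Shift the two-frame history after a frame is coded with quantizer q."""
    hist.rc_2_frame, hist.q_2_frame = hist.rc_1_frame, hist.q_1_frame
    hist.rc_1_frame, hist.q_1_frame = result, q
```

One overshoot followed by one undershoot (or the reverse) suggests the right
Q lies between the two values tried, so the next choice is clamped to that
interval instead of swinging past it again.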
Change-Id: I9a6d125a15652b9bcac0e1fec6d7a1aedc4ed97e
vp9_quantize_fp is the quantization process used by rtc coding
mode. This commit adds a sse2 implementation of it. The
implementation is modified based on vp9_quantize_b_sse2. No speed
difference from ssse3 version.
Change-Id: I24949c5b27df160b4f35117d28858d269454e64a
Current setting had active_worst_quality set too high (close to worst_quality)
for first frame(s) following first key frame. This changes that to be somewhat
more aggressive in allowing active_worst_quality to be lower following key frame.
Also remove the 4/5 reduction in active_worst for key frame as
this should be set by the user qp_max setting.
Change-Id: I0530b3ddcc85c00e3eb7568de1b14a31206c4a4c
The function pointer in compressor instance does not change, so this
commit changes to call the function directly.
Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5
This commit adds a check condition to the prediction buffering
operation used in the rtc coding mode. This resolves a unit test
warning in example/vpx_tsvc_encoder_vp9_mode_7.
Change-Id: I9fd50d5956948b73b53bd8fc5a16ee66aff61995
These 2 members in RD_OPT were moved to TileDataEnc struct
already, and therefore were removed here.
Change-Id: I22fee3b67f96e473a58e194a7edc76dbd48bfa04
Several frame counters in encoder are updated at SB level. Combine
those counters and put them in a separate struct, which allows us
to allocate one copy for each thread.
Change-Id: I00366296a13c0ada4d8fa12f5e07728388b6cab7
Modified VP9_COMP struct to include MACROBLOCK *mb. This change
makes it feasible in multi-thread case to allocate a mb for each
thread.
Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e
A flush bug was discovered while integrating the frame parallel decoder
into Android. This test will expose that bug.
Change-Id: Ia047f27972f4da0471649f79f1f91e7695297473
Two members in struct CYCLIC_REFRESH
int64_t projected_rate_sb;
int64_t projected_dist_sb;
are updated at the superblock level, which makes them shared data
in the multi-thread situation, and requires extra work to handle
them. However, those values are updated and used immediately, and
therefore can be removed. This patch cleaned up the code and
removed the two members.
Change-Id: I2c6ee4552bf49fb63ce590cdb47f9723974fffb1
Prepare for the introduction of frame-size change
logic into the recode loop.
Separated the speed dependent features into
separate static and dynamic parts, the latter being
those features that are dependent on the frame size.
Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
Add extra vp9_clear_system_state() calls to fix
double / mmx issue introduced into first pass
code for 32 bit builds.
Change-Id: I84cd2986b80d83650a091ab25c43755efeb82e03
This reverts commit 7d07f512cd.
this breaks visual studio builds:
'#' : invalid character : possibly the result of a macro expansion
Change-Id: I77170d549afb71e75a878fa0f6acd204fe8d9e67
Rate correction factor is used to correct the estimated rate for any
given quantizer, and feeds into rate control for quantizer selection.
We make use of the actual bits used to calculate this rate correction
factor with an adjustment limit to prevent over-adjustment.
This commit adapts the adjustment limit to the difference between the
estimated bits and the actual bits, allows the adjustment limit to vary
between 0.125 (when estimate is close to actual) and 0.625 (when there
is >10X factor off between estimated and actual bits). By doing this,
the commit appears to have largely corrected two observed issues:
1. Adjustment is too slow when the actual bits used is way off from
estimate due to the small adjustment limit.
2. Extreme oscillating quantizer choices due to the feedback loop.
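A sketch of an adjustment limit with the stated endpoints. The message gives
only the range (0.125 when the estimate is close, 0.625 at a >10X miss), so
the log10 interpolation between them and the damped update below are
assumptions made for illustration. Both bit counts must be positive.

```python
import math


def adjustment_limit(estimated_bits, actual_bits):
    """Return a limit in [0.125, 0.625] that grows with the estimation error."""
    ratio = actual_bits / estimated_bits
    # |log10(ratio)| is 0 for a perfect estimate and reaches 1 at a 10x miss.
    return 0.125 + 0.5 * min(1.0, abs(math.log10(ratio)))


def update_correction_factor(factor, estimated_bits, actual_bits):
    """Move the factor toward actual/estimated, damped by the adaptive limit."""
    ratio = actual_bits / estimated_bits
    limit = adjustment_limit(estimated_bits, actual_bits)
    return factor * (1.0 + (ratio - 1.0) * limit)
```

A small limit near a good estimate keeps the feedback loop from overreacting,
while a larger limit lets the factor catch up quickly after a large miss.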
Change-Id: I4ee148d2c9d26d173b6c48011313ddb07ce2d7d6
This commit makes the speed -6 and above use the reconstructed
boundary pixels for precise intra prediction. This allows more
intra prediction modes to be tested in the non-RD coding process.
Enabling horizontal and vertical intra prediction modes can
improve the speed -6 compression performance for rtc set
by 0.331%.
Change-Id: I3a99f9d12c6af54de2bdbf28c76eab8e0905f744
I0c5f010 changed to allow update golden reference buffer in CBR mode,
this commit changes the use of rate_correction_factor for those frames
to be aligned with the new usage. This commit attempts to solve two
issues:
a. Initialization of rate correction factor for Golden Frame
Prior to this patch, even though regular inter frames had been updating
the rate correction factor based on content and encoding results,
the first golden frame would still use the initialized value,
which can be way off.
b. Allowing rate correction factor update to be slightly faster
Prior to this patch, when the rate correction factor is off, the
update to the factor is too slow, the factor could not get close
to a semi-correct value even after many frames.
The commit helps all clips in psnr/ssim metrics, but especially
a few clips in the RTC set whose rate correction was way off. For example
thaloundeskmtgvga gained about .5dB for both overall/average psnr.
Change-Id: I0be5c41691be57891d824505348b64be87fa3545
Adds support for one-pass rc-enabled SVC encoder with callbacks for
getting per-layer packets.
- the callback function registration is implemented as an encoder
control function.
- if the callback function is not registered, the old way of
aggregating packets with superframe will take effect.
- one more control function “VP9E_GET_SVC_LAYER_ID” has been
implemented to get the temporal/spatial id from the encoder
within the callback. This can be used to get the ids to put on RTP
packet.
Change-Id: I1a90e00135dde65da128b758e6c00b57299a111a
This commit renames a reserved color space entry to BT_2020. It is intended
to provide support for VP9 bitstreams to pass along the color space
type defined in BT.2020 (Rec.2020).
Please note this entry does not have any effect on encoding/decoding
behavior, but allows applications to pass the information along
from the encoding end to the decoding end.
Change-Id: I4678520e89141ea5e8900f7bd1c0e95b710b7091
This commit integrates the non-RD mode decision process and the
encoding process into a single recursion scheme.
Change-Id: I6a7e72a0b84d567554801ebbe01ec75d54c1f77d
The obj_int_extract code is no longer worth maintaining. It creates
significant issues when adapting for different build systems and no
longer offers as significant of a performance benefit due to
improvements in intrinsics.
Source files will remain until the various third-party builds are updated.
The neon fast quantizer has been moved to intrinsics. The armv6 version
has been removed because so few remaining targets require it.
Compilers and processors have improved significantly since the
pack_tokens code was written. The assembly is no longer faster than the
C code.
pack_tokens were the only optimizations for the armv5te targets so the targets
will be removed after the test infrastructure has been updated.
BUG=710
Change-Id: Ic785b167cd9f95eeff31c7c76b7b736c07fb30eb
This patch was to fix the vpxdec fuzzing3 test failure. When an
error occurs, setjmp() is invoked, which calls the decoder
removing routine. In multiple thread situation, other threads
could try to access the frame context memory that is already
deallocated, thus causing a segfault.
An invalid unit test was added for this issue.
Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952
The aim of this patch is to apply a positive weighting to
frames that have a significant number of blocks that are
of low spatial complexity and are dark. The rationale behind
this is that artifacts tend to be more visible in such frames.
In this patch the weight is only applied in regard to the distribution
of bits between frames. Hence if all the frames share similar
characteristics (as is the case for most of our short test clips) there
will be little or no net effect.
However, the effect can be seen on some longer form test content.
For example Tears of steel baseline test:
2323.09 Kbit/s opsnr 39.915 ssim 74.729
With this patch:-
2213.34 Kbit/s opsnr 39.963 ssim 74.808
(Slightly better metrics and about 5% smaller)
The weighting may well need some further tuning along side changes
to the aq modes.
Change-Id: Ieced379bca03938166ab87b2b97f55d94948904c
This commit removes the cyclic aq mode dependency on
in_static_area and reworks the corresponding cut-off thresholds.
It improves the compression performance of speed -5 by 1.47% in
PSNR and 2.07% in SSIM, and the compression performance of speed
-6 by 3.10% in PSNR and 5.25% in SSIM. Speed wise, about 1% faster
in both settings at high bit-rates.
Change-Id: I1ffc775afdc047964448d9dff5751491ba4ff4a9
the offending assembly code was deleted in:
08e38f0 VP8 for ARMv8 by using NEON intrinsics 14
the intrinsics currently pass.
fixes issue #725
Change-Id: I43e4263bef21f9d9008c51ffdfa39fcf10b8e776
This will save the memory and improve the decode speed due to
removing unnecessary memset of big prev_mi array for
all the key frames.
Decoding an all-key-frame 1080p video shows a speed improvement of around 2%.
Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10
The test filter is not a prefix matcher. It requires test type to
contain no more than the optimization type. In this example, SSSE3_64
fails to match and the test is not skipped even when SSSE3 is not
available.
Change-Id: Ia74229a167c88da4e6da169012a7a77d438c3f75
Check that the numerator is not zero. If it is, guess 30fps.
Fixes a clang IOC error in the quantize test. It's very unlikely for
this to occur in the wild because the setup in the quantize test is very
nonstandard.
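The guard amounts to the following sketch. The function name and the
fallback parameter are illustrative and are not the encoder's API.

```python
def framerate_from_timebase(num, den, fallback_fps=30.0):
    """Frames per second for a timebase of num/den seconds per tick."""
    if num == 0:
        # A zero numerator would divide by zero, so fall back to a guess.
        return fallback_fps
    return den / num
```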
Change-Id: Icdab7b81d4e168d3423e14db20787f960052e0c3
This commit makes the RTC coding mode conditionally skip the
reference frame mode search when the predicted motion vector of
the current reference frame gives more than twice the sum of
absolute differences of the other reference frames.
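The skip rule can be sketched as below, assuming one SAD value per reference
frame measured at its predicted motion vector. The function name and the
dictionary layout are illustrative.

```python
def references_to_search(pred_mv_sad):
    """Return the reference frames whose predicted-MV SAD is within 2x of the best.

    pred_mv_sad maps a reference frame name to the SAD obtained with that
    frame's predicted motion vector.
    """
    best = min(pred_mv_sad.values())
    return [ref for ref, sad in pred_mv_sad.items() if sad <= 2 * best]
```

For example, with SADs of 1000 for LAST and 2500 for GOLDEN, only LAST is
searched. The frame with the lowest SAD is never skipped.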
It reduces the runtime by 1% - 4% for speed -5 and -6. The average
compression performance is improved by about 0.1% in both settings.
It is of particular benefit to light change scenarios. The
compression performance of test clip mmmovingvga.y4m is improved by
6.39% and 15.69% at high bit rates for speed -5 and -6, respectively.
Speed -5
vidyo1 16555 b/f, 40.818 dB, 12422 ms ->
16552 b/f, 40.804 dB, 12100 ms
nik 33211 b/f, 39.138 dB, 11341 ms ->
33228 b/f, 39.139 dB, 11023 ms
mmmoving 33263 b/f, 40.935 dB, 13508 ms ->
33256 b/f, 41.068 dB, 12861 ms
Speed -6
vidyo1 16541 b/f, 40.227 dB, 8437 ms ->
16540 b/f, 40.220 dB, 8216 ms
nik 33272 b/f, 38.399 dB, 7610 ms ->
33267 b/f, 38.414 dB, 7490 ms
mmmoving 33255 b/f, 40.555 dB, 7523 ms ->
33257 b/f, 40.975 dB, 7493 ms
Change-Id: Id2aef76ef74a3cba5e9a82a83b792144948c6a91
This commit unfolds the legacy macro definitions used in the
sub-pixel motion search and refactors the operational flow for
later optimizations.
Change-Id: I3e3f770cad961d03d1a6eb0b2a0186cc77eaf2b8
The current logic was allowing for disabling golden refresh only
for two pass svc encoding. This change disables it as long as
more than 1 layer encoding is used (for example temporal layers under 1pass CBR).
Change-Id: I4dc5204a7ad365c821ec7963e93b59da82e1826b
In the function mb_lpf_horizontal_edge_w_avx2_16 the usage of the intrinsic
_mm256_cvtepu8_epi16 triggers a compiler bug in gcc 4.9.1.
Until it is fixed, I created a workaround that performs the up-conversion
using broadcast128+shuffle.
The bug was reported here:
https://code.google.com/p/webm/issues/detail?id=867
Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351
A recent change has introduced big quality drops for speed 7 and 12
for --rt mode. This change reverts the big drop and improves quality
by 9.5% for speed 7 and 13.4% for speed 12.
Change-Id: I07b82e3bb6002a73af486a083458c88877bdad01
This will save a lot of memory in the decoder by removing prev_mi,
but prev_mi is still needed in the encoder, so this will slightly increase
encoder memory use.
Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
Use intrinsics for neon quantization. Slight loss (<5%) of performance
compared to the assembly. Roughly 10x faster on arm64 because that was
running C code before.
Change-Id: I7cf5242d8f29b7eab5bca6a1c20c89c9fc9ca66d
This commit makes the inter prediction buffer system support
hybrid partition search. It reduces the runtime of speed -5 by
about 3%. No compression performance change.
vidyo1 720p 1000 kbps
11831 ms -> 11497 ms
nik 720p 1000 kbps
10919 ms -> 10645 ms
Change-Id: I5b2da747c6395c253cd074d3907f5402e1840c36
Combined vp9_denoiser_8xM_sse2 and vp9_denoiser_4xM_sse2 into one
function vp9_denoiser_NxM_sse2_small and passed the bitexact testing.
Changed the name of the function vp9_denoiser_64_32_16xM_sse2 to
vp9_denoiser_NxM_sse2_big.
Change-Id: Ib22478df585994dd347ebae04202c0b701e7f451
This commit changes to allow the usage of golden reference frame in
VP9 CBR mode to improve quality. VP9 supports potentially up to 8
reference buffers, it has reference buffers available for this
purpose. This was not possible in VP8 as golden and alt-ref buffers
were used for temporal scalability purpose in CBR mode in WebRTC.
For frames that update golden frame, there can be a quality boost.
The amount of allowed bitrate boost can be controlled via parameter
rc_max_inter_bitrate_pct. The initial value of the boost ratio is
currently based on over_shoot_pct. Further experiments will work
out the adaption of this boost value.
Change-Id: I0c5f010c8fd8b7b598f69779c1b30e5b2ac30a4d
Added code to relax the active maximum Q in response
to extreme local overshoot to reduce bandwidth peaks.
The impact is small in metrics terms, but this helps reduce
bandwidth spikes and overall overshoot in a number of
clips in our test sets (especially the YT test set).
In particular this should help prevent very big spikes where a clip
is mainly easy but has a short hard section. In such a case a choice
of maximum Q for the clip as a whole may allow us to hit the overall
target rate but give some extreme spikes. The chunked encoding in YT
mitigates this problem but it can show up where a longer clip is
coded as a single chunk.
Change-Id: I213d09950ccb8489d10adf00fda1e53235b39203
The zero motion vector was effectively used in the subsampled pixel
based variance calculation. This commit makes it directly use zero
mv to generate prediction.
Change-Id: Ica83dc843e9f8da2f89c3ef451e50f16214c0def
0 means that golden boost is off, and uses average frame target rate,
a non-zero number means the percentage of boost over average frame
bitrate is given initially to golden frames in CBR mode.
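A sketch of how such a percentage could translate into a golden frame
target. It only illustrates the meaning of the parameter: it does not model
how the other frames give bits back to keep the average on target, and the
function name is illustrative.

```python
def golden_frame_target(avg_frame_bits, gf_cbr_boost_pct):
    """Initial target size for a golden frame in CBR mode.

    gf_cbr_boost_pct == 0 turns the boost off; otherwise the golden frame
    gets that percentage on top of the average frame target.
    """
    if gf_cbr_boost_pct == 0:
        return avg_frame_bits
    return avg_frame_bits * (100 + gf_cbr_boost_pct) // 100
```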
Change-Id: If4334fe2cc424b65ae0cce27f71b5561bf1e577d
-Use full bandwidth (when temporal layers is on) for checking switching.
-Normalize metric wrt num_blocks.
-Rounding fix to update of average noise level metric.
-Make default internal denoiser mode == kDenoiserOnYUV (in denoiser set_parameters()).
-Adjust some thresholds.
Change-Id: Ib827512b25a7bf1f66c76d3045f3a68ce56b1cd2
The point at which frames are scaled to their
coded dimensions is moved into the re-code loop.
This is in preparation for a further patch that
will add logic into the re-code loop to reduce
the coded frame size if the encoder is struggling
to hit the target data rate at the native frame
size.
Change-Id: Ie4131f5ec6fb93148879f6ce96123296442bf2d1
Add second level arf Q adjustment when using dual arfs
in constant Q mode.
Previously in constant Q mode enabling dual arf hurt by ~5%
but with this change the average benefit is ~1-1.5% with some
mid range data points up ~10%.
Note however that it still hurts on some clips including
some very low motion show content.
Change-Id: I5b7789a2f42a6127d9e801cc010c20a7113bdd9b
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.
Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
The initialization of this_mode_pred does not work when the ref_frame
loop ever goes beyond LAST_FRAME. This commit fixes the subtle issue
and allows potentially expanding the loop to test GOLDEN_FRAME.
Change-Id: Ibbd427a22160d1d9eacb8ed0c87f88d6cef9c0f3
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed also
scales with the thread count, so decode could be even faster with more threads.
Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
This commit refactors the rate distortion structure used in the
non-RD coding mode and saves a few RDCOST calculations.
Change-Id: I62c3416c300d2c5372f21b96d93a6b633a34ab3a
The existing speed features produce horrible encoding results, almost
30% worse than cpu-used=4. This commit adjusts the speed features to
produce relatively reasonable results, within 3%-5% of cpu-used=4.
Change-Id: I0ca6ebafb33024d4a0cbcf04c78a4a00b8dd1ecf
Its functionality has been replaced with choose_partitioning and
threshold based control on split mode check.
Change-Id: Ic9bb321df06b524f5c38ea5874dc6f6a8f93c5e3
This speed feature has been deprecated in both yt and rtc coding
modes. This commit removes the related operations.
Change-Id: I079c79c6adafe45581af2ebf8b98faebcface1ce
This commit re-designs the recursive partition search scheme in
rtc speed -5. It first checks if the current block is under cyclic
refresh mode. If so, apply recursive partition search. Otherwise,
perform sub-sampled pixel based partition selection. When the
pre-selection finds the partition size should be 32x32 or above,
use the partition size directly. Otherwise, apply partition search
at nearby levels around the preset partition size.
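The decision flow above can be sketched as follows. The strategy labels, the
list of square block sizes, and the reading of "nearby levels" as one level
either side of the preset size are all assumptions made for illustration.

```python
BLOCK_SIZES = [8, 16, 32, 64]  # square block widths, smallest to largest


def plan_partition_search(in_cyclic_refresh, preselected_size):
    """Return (strategy, candidate sizes) for one 64x64 block."""
    if in_cyclic_refresh:
        # Blocks under cyclic refresh get the full recursive search.
        return "recursive_search", list(BLOCK_SIZES)
    if preselected_size >= 32:
        # Large preselected partitions are used directly.
        return "use_preselected", [preselected_size]
    # Otherwise search one level either side of the preselected size.
    idx = BLOCK_SIZES.index(preselected_size)
    lo, hi = max(idx - 1, 0), min(idx + 1, len(BLOCK_SIZES) - 1)
    return "search_nearby", BLOCK_SIZES[lo:hi + 1]
```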
It is enabled in speed -5. The compression performance of rtc
speed -5 is improved by 9.4%. Speed wise, the run-time is 1% to 10%
slower.
nik_720p, 1000 kbps
33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms
vidyo1_720p, 1000 kbps
16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms
Change-Id: I65adba352e3adc03bae50854ddaea1b421653c6c
Extend the --auto-alt-ref parameter so we can use it to
turn multi-arf on and off from the command line.
For now the range is 0-off, 1-on, 2-multi-arf on.
Rename play_alternate to enable_auto_arf
Change-Id: Id7b64407cfbe76ba0090a83b588a03e22a240386
All SAD functions that process 32 or more consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as hotspots are vp9_sad32x32 and vp9_sad64x64.
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%;
together they gave an overall ~2.3% user level gain.
Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
Currently, the tokens for a tile are stored immediately after its
preceding tile, which causes a dependency. This is unnecessary
since we always allocate enough memory for tokens. Removing
the dependency allows token writing done in parallel. This patch
doesn't change encoding result.
Change-Id: I7365a6e5e2c2833eb14377c37e1503c9d0f26543
This should be set right after the decoder actually starts to decode a frame,
instead of being set at the end.
Even if the decoder does not have a displayable frame to show and returns NULL
to the application, this should be set too.
Change-Id: If0313a834bc64e3b0f05a84f4459d444d9eab0d8
When early termination is triggered, properly reset the rate cost
to invalid value to avoid potential ioc issue.
Change-Id: I3444390be2e49a34bb02cf8a74c33d5dbd96d88d
Delete gfboost_qadjust() and move Q based adjustment
into calc_frame_boost(). Also remove clamping. Making
the adjustment here means that it influences not just the
boost level but also the selection of the GF/ARF interval.
This change gives a small average gain in PSNR but
larger gains in SSIM, especially for harder std-hd set (1.5%)
Change-Id: I3aa81b8feccaeff93d915e19fb9cf5cd64c86327
Covers all profiles and input formats. The tests check if the
encode succeeds and if the psnr is sane.
Change-Id: I195a5330debf92562846121819b6eaf961e27c01
This commit fixes an ioc issue that will happen when the cumulative
variables are not in effective use. The fix discards these
redundant additions.
Change-Id: Idbac5bfb989c0cedc5f8a323effce938519b2457
this removes an assumption that worker->data1 would be pointing to a
TileWorkerData allocation.
additionally, within the multi-threaded loopfilter pass VP9LfSync as a
parameter to the worker hook, removing the need for a shadow pointer in
LFWorkerData.
Change-Id: Ic7b2faa34e3eb59dbcb8a7c67f333448fa047c88
move them from VP9Worker::data[12] to allow the structure to be reused a
bit more naturally by the multi-threaded loopfilter.
Change-Id: I31b49c9e93ca744fd7f6d6ed8696671188fb2c1d
This removes an unnecessary restriction that causes
a problem (noticed by AWG) when the forced key frame
interval is set to a very small value, such as 10. In this case
we were being forced to code minimal length GF groups.
Change-Id: I76ef5861a09638ff51f61fea02359554184ada53
We encode an empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
Change remerged.
Change-Id: I9efab38bba7da86e056fbe8f663e711c5df38449
This reverts commit 452dc21500.
This change has introduced a significant quality regression on content
with forced key frames. (e.g. the YT and yt-hd set). It is most
noticeable in static content where the kf bits dominate. Here, despite
key frames being apparently coded at the same Q, there is a drop in all
metrics of ~20% (e.g clXR and BFa0).
Change-Id: Iba14cc61778c0846fa0a59c33c55a9fc49512cb4
Compare the estimated rate and distortion to the thresholds scaled
according to the operating block size and determine if further
split partition search will be run. The compression performance of
speed -5 is changed by -0.074%. The encoding speed is 10% - 15%
faster.
vidyo1 720p
16545 b/f, 40.492 dB, 11475 ms -> 16535 b/f, 40.486 dB, 10100 ms
nik720p
16624 b/f, 36.310 dB, 10071 ms -> 16617 b/f, 36.313 dB, 8346 ms
Change-Id: Ic9197ab5761279ae55d2fb7813b2af0e0db497b8
Reduce the intra_cost_penalty for non-rd mode,
and some updates to VAR_BASED_PARTITION.
Visual tests show some improvement at Speed 6, for RTC clips.
Change-Id: If9090daf7aed14906a32d931a538ab544bbca606
This commit replaces the use of copy_partitioning with
choose_partitioning based on the sse of subsampled pixels, which
provides significantly better coding performance and runs at
similar speed, as compared to copy_partitioning. It improves rtc
speed 5 coding performance by 3%.
Change-Id: I52d3682a12dce0147f5e52383a594fc242ca3228
this change checks that CONFIG_SPATIAL_SVC is defined and adds a TODO to
ensure this is changed in the future as the release headers can't
depend on vpx_config.h.
vpx/vpx_encoder.h:164:5: warning: "CONFIG_SPATIAL_SVC" is not defined
[-Wundef]
Change-Id: I797a0150e5f56caf048e7ee00b282fbc9c5ede19
We encode an empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
Change-Id: I60b680314e33a60b4093cafc296465ee18169c19
Move the point at which input frames are scaled
into the recode loop. This will allow us to change
the coded frame size dynamically in response
to previous attempts to encode the frame at a
higher resolution.
A following patch will implement a scheme for
resizing the frame in the recode loop.
Change-Id: I6a59c02d6ac1626512edad6de8b60063b79433e6
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.
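A sketch of what such a bundle might look like. The field names, the helper
functions, and the particular cost formula (a rate term scaled by a
multiplier plus a shifted distortion term) are assumptions for illustration,
not the encoder's definitions.

```python
from dataclasses import dataclass


@dataclass
class RDCost:
    rate: int     # estimated bits
    dist: int     # distortion
    rdcost: int   # combined rate-distortion cost, computed once and stored


def make_rd_cost(rate, dist, rdmult, rddiv):
    """Bundle rate and distortion together with their combined cost."""
    cost = ((128 + rate * rdmult) >> 8) + (dist << rddiv)
    return RDCost(rate, dist, cost)


def better(a, b):
    """Pick the cheaper candidate without recomputing either cost."""
    return a if a.rdcost <= b.rdcost else b
```

Carrying the cost alongside rate and distortion is what saves the repeated
cost calculations: comparisons read the stored field instead of recombining
the two values each time.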
Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
Add back clamp which ensures that the Q adaptation
is turned off when the over_shoot_pct and under_shoot_pct
parameters are set to 100.
Change-Id: Id0161b114d39a3029cd3eb28020caab0c3914922
There are two CreateDecoder functions and decode_test_driver is not
calling the right function now. This bug was discovered while actually
enabling the frame parallel flag inside libvpx. It does not affect
any existing unit test though.
Change-Id: Icd9633c4b66d50e422a09c4310ff791082878936
Make sure VP9 frame-parallel decode passes all the standard
test vectors. For now, only runs with 2, 3, and 4 threads are tested.
Also refactor the video decode test driver to support passing
in decode flags, which are used to enable frame-parallel decode.
Change-Id: I6a712464232c2e13681634951c7e176312522e1e
The original implementation only allocates one segmentation map, which
works fine for serial decode. But for frame parallel decode, each thread
needs to have its own segmentation map, and the last frame's segmentation
map should be provided by the thread that decoded the last frame.
After finishing decoding a frame, a thread needs to serve the old
segmentation map that is associated with the previously decoded frame. The
thread also needs to use another segmentation map for decoding the current
frame.
Change-Id: I442ddff36b5de9cb8a7eb59e225744c78f4492d8
pthread.h is not supported on Windows. vp9_thread.h includes
the emulation layer for pthread on Windows.
Change-Id: I2b1c8ec299928472faca7ebeea998170c9f4d744
To prepare for frame parallel decoding, the reference count buffers
need to be protected by a mutex. Move vp9_thread.* to the common
folder so that those buffers can use the cross-platform mutex
from vp9_thread.*.
(cherry picked from commit 337e8015c9)
Change-Id: I0587a08447925f4554d7788686a31483c2ae3f37
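Mutex-protected reference counts of the kind mentioned above can be sketched as follows. This example uses POSIX pthread.h directly to stay self-contained, whereas the commit relies on the cross-platform wrappers in vp9_thread.*; RefPool, POOL_SIZE and the ref_pool_* functions are hypothetical names, not libvpx API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4 /* hypothetical number of frame buffers */

/* Reference counts shared between decoder threads, guarded by one lock. */
typedef struct {
  pthread_mutex_t lock;
  int ref_count[POOL_SIZE];
} RefPool;

void ref_pool_init(RefPool *pool) {
  int i;
  pthread_mutex_init(&pool->lock, NULL);
  for (i = 0; i < POOL_SIZE; ++i) pool->ref_count[i] = 0;
}

void ref_pool_destroy(RefPool *pool) { pthread_mutex_destroy(&pool->lock); }

/* Take a reference on buffer idx; returns the new count. */
int ref_pool_acquire(RefPool *pool, int idx) {
  int count;
  assert(idx >= 0 && idx < POOL_SIZE);
  pthread_mutex_lock(&pool->lock);
  count = ++pool->ref_count[idx];
  pthread_mutex_unlock(&pool->lock);
  return count;
}

/* Drop a reference on buffer idx; returns the new count. */
int ref_pool_release(RefPool *pool, int idx) {
  int count;
  assert(idx >= 0 && idx < POOL_SIZE);
  pthread_mutex_lock(&pool->lock);
  assert(pool->ref_count[idx] > 0);
  count = --pool->ref_count[idx];
  pthread_mutex_unlock(&pool->lock);
  return count;
}
```

Holding the lock around both the increment and the decrement keeps two threads from updating the same count at the same time.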
The relationship of the user private data at runtime
is not preserved from decode() to this call, which may
occur at an unknown point in the future.
Change-Id: Ia7eb25365c805147614574c3af87aedbe0305fc6
To prepare for frame parallel decoding, the frame buffers must be
separated from the encoder and decoder structures; the encoder
and decoder will instead hold a pointer to the BufferPool.
Change-Id: I172c78f876e41fb5aea11be5f632adadf2a6f466