libvpx only emits:
VPX_IMG_FMT_{I420,I422,I440,I444,I42016,I42216,I44016,I44416}
and additionally supports YV12 as input.
interleaved yuv, rgb and alpha formats are unused.
Change-Id: Ie2ab1099e950c6e696f475d46882f5c47a174042
Switch the order of constrained and layer drop mode,
and keep constrained_layer_drop as the default.
Update the svc datarate tests.
Change-Id: I764270f7b4964b87b0cd3da6c2f96a628f212a30
The control is set by log2 of number of threads (such that the number of
tiles is the same of number of threads).
Thus it should be log2(num_threads) instead of (num_threads >> 1).
Change-Id: I2ccec5557e660048dad3e561534e1c74fc8eec1f
For spatial layers whose base is a key frame, i.e., when
svc.layer_context[cpi->svc.temporal_layer_id].is_key_frame = 1,
allow for hybrid search, similar to what we do on key frames.
For small blocks (<= 8x8) rd-based intra search will be used,
otherwise non-rd pick mode is used.
Feature is controlled by nonrd_keyframe, which is set to 1
for now on non-base spatial layers, so this change has
currently no effect.
Small change only when inter-layer prediction is off, as we now
call vp9_pick_intra_mode instead of vp9_pick_inter_mode on key frame.
But this change is very small/insignificant.
Change-Id: I5372470f720812926ebbe6c4ce68c04336ce0bdd
This reverts commit 5cc8df5bcfe49fbfca21ee57401c7807c048751b.
Reason for revert: <INSERT REASONING HERE>
We need to do this on all key frames in the stream (not just the first one). Will make another cleaner change for this.
Original change's description:
> vp9-svc: Fix to first superframe when inter_layer is off.
>
> When the application selects the setting INTER_LAYER_PRED_OFF
> each spatial stream should be decodeable separately.
> For this we need to force key frames on all spatial layers
> on the first superframe.
>
> In order to maintain the quality at the beginning of the stream
> the active_worst for spatial layer of the second superframe is set
> to the last_QP of the correspondng spatial layer of the first superframe.
> Also make sure nonrd_keyframe is set for non-base spatial layers.
>
> Change only affects SVC mode wit number_spatial_layers > 1 and
> svc->disable_inter_layer_pred == INTER_LAYER_PRED_OFF.
> And only affects first and second frame of sequence.
>
> Change-Id: I8ee9a0873ab1d3a02515774571f719617771ad41
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: If73d9f3932224fc6751e773763adf7e8ee67d17f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
When the application selects the setting INTER_LAYER_PRED_OFF
each spatial stream should be decodeable separately.
For this we need to force key frames on all spatial layers
on the first superframe.
In order to maintain the quality at the beginning of the stream
the active_worst for spatial layer of the second superframe is set
to the last_QP of the correspondng spatial layer of the first superframe.
Also make sure nonrd_keyframe is set for non-base spatial layers.
Change only affects SVC mode wit number_spatial_layers > 1 and
svc->disable_inter_layer_pred == INTER_LAYER_PRED_OFF.
And only affects first and second frame of sequence.
Change-Id: I8ee9a0873ab1d3a02515774571f719617771ad41
Cyclic refresh is disabled on key frames, but we did not
disable it for for spatial layers whose base is a key frame
(i.e., on a key-superframe).
This fix means generally somewhat lower frame-level QP will be
used for those spatial layers whose base is a key frame,
which will generally mean little better quality for the
key-superframes.
Change-Id: Idf090651aa2f5856fb6696c89198a9f6d5d50280
Generating file lists on a non-mac with:
--target=x86-iphonsimulator-gcc --enable-external-build
the lack of xcrun would cause a warning to print:
libvpx/build/make/configure.sh: line 1397: [: : integer expression expected
Change-Id: I4623b6c5b65296bc71986cd042823f4be9427b42
In the SVC encoder LAST ref frame should be the last temporal
reference at the same resolution. This is the case for the default/fixed
patterns, but may not be the case for arbitrary pattern in flexible mode.
Add check that the LAST reference frame has same resolution as the current frame.
If the reference scale for LAST is different from current treat the current
frame as key frame just for the purpose of superblock partitioning.
This avoids potential segfault in vp9_int_pro_motion_estimation() for different
scaled reference.
Change-Id: I4276ff616de46cd4e12c73316f85ae313f170242
Previously we attempted to convert 411 input. Remove support
because malformed 411 input can cause the conversion to crash.
BUG=webm:1386
Change-Id: I3d41465a94867ee7f8eaa43fb76beb41f8fa644b
When writing out stream for spatial layer N,
make sure to include all spatial layers up to N.
Fixes an issue with the streams when frame dropping occurs.
Change-Id: I1e20b7dac6b94dcda751043541dd8a12f7df6d8c
in BasicRateTargetingVBRLagZero and
BasicRateTargetingVBRLagNonZeroFrameParDecOff after:
e0b28ad69 Add extra case to wq_err_divisor()
BUG=webm:1512
Change-Id: Id181613cc191ff2a2281deffe141efb982501edf
As we add more tests to datarate_test.cc, it's growing bigger and hard
to find specific test.
Split it to vp8, vp9 and svc ones.
Change-Id: Ie8c302010cf304a95554bee19d87ddc90498d0fb
source tools/set_analyzer_env.sh <sanitizer>
will set the compiler, flag, and sanitizer variables necessary to build
and run a variety of sanitizers.
Change-Id: I5dd2ae947cb337d5ccf2a11e9fe87991bc8ba0c8
Add extra case for 360P and smaller.
This hurts a little in psnr for the derf cif set but helps a little
in terms of average rate accuracy. Most clips come in a little
smaller with this patch.
No impact on larger formats.
Change-Id: I5056246cb53b90f961ff9ea5813937f33778aa4c
For the fixed/default SVC patterns, GOLDEN is the
spatial reference, except on key frames, where LAST
is labeled as the spatial reference.
The current code was assuming GOLDEN is always the
spatial reference for the purpose of selecting the
subpel motion (due to the downsampling filter).
Fix is make sure flag_svc_subpel is set and used
with spatial_ref, which is labeled as the proper
spatial reference before entering mode check.
Some quality improvement on key frames.
Change-Id: Id236bcd47055b035731cc910ed84449d7e29f50c
In the constrained framedrop mode for svc: modify the buffer check
condition relative to (non-zero) dropmark to include uppper spatial layers,
in addition to the current spatial layer.
But keep the single layer check if the buffer goes below zero, since
in this case (buffer underflow) we should force drop of that layer
regardless of upper layers.
Change-Id: Id277f0b4a3ae6275effdd5f5f0c80e3229c17424
To reduce the memcpy() cycles in vp9_rd_pick_inter_mode_sb().
The maximum value of mode_map is (MAX_MODES - 1) = 29.
Change-Id: I5704bd66838ea0b075f0afb001f5cbebfd3f1602
googletest imports tuple into testing to allow for compatibility across
c++ versions where tuple may be in std::tr1 or std. fixes deprecation
warnings under visual studio 2017
Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
Add verfication for constrained svc framedrop mode: check that
if a given spatial is dropped, all uppper layers must be dropped.
Change-Id: I9b4821b23c95d1d9d0c031a41af19984647ec5dc
Add the logic for the constrained framdrop mode for SVC.
Add test case in datarate unittests.
Also lower target bitrates in the tests to better test
frame dropper.
Change-Id: I8ee1b8cb56d835c233ad1fbe0fc1456cb2e7291f
Add encoder control to set the frame drop thresholds per
spatial layer, and add a frame drop mode: 0 = per-layer drop,
and 1 = constrained drop mode (a drop on a given layer forces
drops to all upper layers).
Default is mode 0 (per-layer dropping).
Implementation for mode 1 will come in subsequent change.
If the control is not used, then the spatial layer frame
drop thresholds (water mark) are all equal and set to the value
given by the encoder config (oxcf->drop_frames_water_mark).
Bump up the ABI version.
Change-Id: Id038d4181b86fa98b3d44d026f96d5f344d81629
Clears a warning when generating VS project files with older versions of
bash:
declare: -n: invalid option
Change-Id: Id0c0bc17dc5a1599f7d2d73e3cc9259a45540f3f
Even on x86_64, emms has to be called if the x87 state has
been clobbered - the calling code (either within libvpx or
in a caller outside of libvpx) may be using the x87 instructions,
even though use of them isn't all that common on x86_64.
This fixes builds with clang for mingw/x86_64.
Change-Id: I1f6072835590b862bad156f17331ba65c813ddd9
* changes:
configure: Add an arm64-win64-gcc target
test: Check for ARCH_X86_64 in addition to _WIN64
configure: Add an armv7-win32-gcc target
ads2gas: Add a -noelf option
This reverts commit 60a3cb9ad840377d46286bfd703c30a4a4ee56e2.
Reason for revert: x87 instruction usage might not be as
clear cut as I would like. At the very least, llvm mingw
builds appear to having issues with emms.
Original change's description:
> remove fldcw/fstcw from Win64 builds
>
> _MCW_PC (Precision control) is not supported on x64:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2
>
> The x87 FPU is not used on Win64 or ARM so setting the x87 control word
> is not necessary. The SSE/SSE2 and ARM FPUs don't have a precision
> control - the precision is embedded in each instruction - so the need to
> set the control word is also gone.
BUG=webm:1500
Change-Id: I25bcfa96bc9c860f6c7e03315d75fa6fd1d88ec5
_MCW_PC (Precision control) is not supported on x64:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2
The x87 FPU is not used on Win64 or ARM so setting the x87 control word
is not necessary. The SSE/SSE2 and ARM FPUs don't have a precision
control - the precision is embedded in each instruction - so the need to
set the control word is also gone.
BUG=webm:1500
Change-Id: I014513282a7dc320d1cdeaec48249d98a66bf09f
This configuration doesn't require any extra custom settings, since
it only uses neon intrinsics that are handled automatically by the
compiler (no external assembly).
Change-Id: I35415c68f483a430c0672e060a7bbd09a3469512
This builds for windows on arm, with llvm-mingw. The target triplet
is named -gcc since that's how similar existing targets are named,
even though it technically runs clang (via frontends named
"$CROSS-gcc").
Assemble using $CC -c since there's no standalone assembler
available (except perhaps llvm-mc).
Change-Id: I2c9a319730afef73f811bad79f488dcdc244ab0d
adaptive_rd_threshold_mt is set to 1 when speed >= 7 for SVC.
QVGA in SVC uses speed 5 which set adaptive_rd_threshold_mt to 0.
If VGA or HD is dropped for the last super frame, the flag is still 0
when the encoder is destroyed. Thus memory won't be released.
Change the bitrate threshold in datarate test.
Change-Id: I55352cc0b030568d38eb735d99c2fa29058d3690
Compiler -- gcc (Debian 7.3.0-5) 7.3.0
./libvpx/vp9/encoder/vp9_denoiser.c:374:9: assuming signed overflow
does not occur when assuming that (X + c) < X is always false
[-Wstrict-overflow]
for (j = 0; j < xmis; j++) {
Change-Id: Ib7397e718ff717bdabc088fc4c6e1771381fb522
Add VP9E_SET_SVC_INTER_LAYER_PRED to disable inter layer (spatial)
prediction.
0: prediction on
1: prediction off for all frames
2: prediction off for non key frames
Bump up ABI version.
Change-Id: I5ab2a96b47e6bef202290fe726bed5f99bd4951f
SVC frame dropper: modify the logic to allow for individual
spatial layers to drop. This removes the constraint that all
upper spatial layers must drop when a given spatial layer drops.
Add a flag to the pkt to indicate whether a spatial layer is
encoded or dropped. This is needed for applications that enable
this feature (frame dropping for SVC).
For a current spatial layer, if its previous spatial layer is
dropped, then disable certain features for that layer:
inter-layer prediction, base_mv, partition_reuse, copy partition.
Also add the constraint to never drop a spatial layer if its
base layer is a key frame.
Updates to sample encoder (vp9_spatial_svc_encoder) and the
SVC datarate unittests to properly handle frame dropping.
Bump up ABI version.
Change-Id: I7d14ccf67b8d014a7abfce5ba3989fc623e94067
Only target 32bit builds. Visual Studio does not define _mm_empty for
64bit configurations.
Rename emms.asm and remove from 32 bit builds to avoid empty file
warnings.
Don't check register state on 64bit builds.
BUG=webm:1500
This reverts commit 60beb781c140b61c1957abd2a6717d2e9a831933.
Change-Id: I5ac4cf6c67249ff24f7da19792144de20527bfce
avoids potential OOM when allocating 3 buffers for 16383x16383; 3840 is
used as a replacement
this test was missed in:
215bddf32 vpx_scale_test: reduce max size for 32-bit targets
Change-Id: I515adf5999c6ef1724394ccd62d677134bd35e6d
Adjustment to initial active based on image size.
Add extra breakout case for kf boost loop.
Small adjustment to q delta calculation for key frames.
Net % improvements for all standard tests sets (-ve values) measured
using c-bvr mode.
(Overall PSNR, SSIM, PSNR-HVS)
Low Res: -0.223 -0.229 -0.107
Mid Res: -0.175 0.008 -0.180
High Res: -0.131 0.106 -0.206
NFlix 2K: -0.390 -0.271 -0.489
NFlix 4K: -1.370 -0.825 -1.590
Change-Id: I06a39de43594e1a99bb0cb281af15cdb8058a8ed
If a given spatial layer decides to drop, due to the
buffer/overshoot conditions for that layer, then drop
that current spatial layer and all spatial layers above.
In the current implementation the svc frame counter
(and hence the pattern for the non-flexible SVC case)
are updated on frame drops.
Also add last spatial layer encoded to the pkt.
This is useful for RTC applications that enable
frame dropping for SVC.
Update to the SVC datarate tests:
enabled frame dropper on all SVC datarate tests, and
made a fix to properly set the temporal_layer_id, which
works now even on frame drops.
Change-Id: If828c193f3cb6b1839803fd52fe9fbbda5b5a039
This reverts commit 13d0955b250bcb7eac99034e7b1677d3d026b569.
Reason for revert:
this should be investigated further to ensure the memset is really
necessary outside of the static analysis pass.
Original change's description:
> vp9_loopfilter.c: zero lfl_uv
>
> The initialization depends on cm and mi_row which static
> analysis does not approve of.
>
> Clears a static analysis warning:
> warning: The right operand of '+' is a garbage value
> const loop_filter_thresh *lfi = lfthr + *lfl;
>
> Change-Id: I8c863ced2b1e9a7e10103b7281098f20941a6ca2
TBR=johannkoenig@google.com,marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: Icadb6438fbcddba747622f06f2eadebdb333edf6
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Fix a bug when middle and top spatial layer are skip encoded
(disabled) and then re-enabled again, during the sequence.
Issue is that pending_frame_count in the packing may
be incremented on middle layer, even though that layer is skipped
(not encoded and hence zero size). Fix is to add size check.
Modified existing unitest to reproduce the issue.
Change-Id: I86d806a112d468e06b04fbf7c46ae07db9e0ad93
The initialization depends on cm and mi_row which static
analysis does not approve of.
Clears a static analysis warning:
warning: The right operand of '+' is a garbage value
const loop_filter_thresh *lfi = lfthr + *lfl;
Change-Id: I8c863ced2b1e9a7e10103b7281098f20941a6ca2
These values are not consistently set before calling update_best_mode.
In vp9_rdopt.c they are individual values instead of a struct and are
zero'd at declaration.
Clears a static analysis warning:
warning: The right operand of '-' is a garbage value
RDCOST(x->rdmult, x->rddiv, (rd->rate2 - rd->rate_uv - other_cost),
warning: The right operand of '-' is a garbage value
(rd->distortion2 - rd->distortion_uv));
Change-Id: I19895d062e7c0ac67937126ebc5dcb0afd3a2931
The loop appears to set map[i] with the intention of running
the 'j' loop up to that point. However, without zero'ing map[]
first the behavior is unpredictable.
Fixes a static analysis warning:
warning: Branch condition evaluates to a garbage value
for (j = 0; j < 4 && map[j]; ++j) {
Change-Id: Ifa39353d8aa5cc47b467a7d3d8cdd3b5319fd997
These values are set in main() from user input. Ensure
they are cleared out first.
Clears a static analysis warning:
warning: The right operand of '*' is a garbage value
1000.0 * rc->layer_target_bitrate[0] / rc->layer_framerate[0];
Change-Id: I09bd209be5aff31b87597a24d37a9673fa99381b
This reverts commit 118a57045bf5b49ab7c2f7f930543b9217fd422e.
Reason for revert: Fails on Visual Studio builds:
vpxmdd.lib(vpx_ports_emms_mmx.obj) : error LNK2019: unresolved
external symbol _m_empty referenced in function
vpx_clear_system_state
Original change's description:
> use intrinsics for 'emms'
>
> BUG=webm:1500
>
> Change-Id: I3235d8c2abc01dd3a35e14c5cbcfe20283ff8fb2
Change-Id: Ia9c40bc103c57cced83353249c55218eaf2f0b0c
Static analysis does not recognize that output_rc_stat guards
the usage of window_size. Clears this warning:
The right operand of '>' is a garbage value
if (frame_cnt > (unsigned int)rc.window_size) {
set_rate_control_stats sets window_size to 15. Zeroing it
just introduces another static analysis warning.
Change-Id: Ieee7b81a385f986e42189101cfa39279e519b368
This should be taken care of by parse_superframe_index but
the static analysis is not recognizing it because it depends
on 'marker' which is read from the bitstream.
Clears a static analysis warning:
The right operand of '*' is a garbage value
rc.layer_encoding_bitrate[layer] += 8.0 * sizes[sl];
Change-Id: I8ee48a98f907bc7b46869fd27a351f33e2e7de71
Print error messages as they are encountered. This was the default
behavior.
Removes a static analysis warning regarding the use of strncat:
Null pointer argument in call to string length function
As this is the only use of strncat in the library, remove it and the
associated public function.
Change-Id: Id55305c5a4d65f11da88c3a2203ff824200f526f
sl was passed to set_frame_flags_bypass_mode, triggering
an uninitialized variable warning. Inside the function it
is only used as a local variable.
Change-Id: If743626e9e10fd41d135e3b4ad6196dc4dc90172
When an enhancement spatial layer is skipped, we should check
for updating the layer frame counters.
Change-Id: Ib79d0955c62fb465f59ef2f9ac45240ae2614d7b
This causes assert to trigger in choose_partitioning().
This can happen in some cases when enhancement layers
are enabled midway during the stream.
Change-Id: I69c3c8b4b1e3f1c7d8d7294d633ca5ddca148e8b
The largest frame is currently in choose_partitioning:
warning: stack frame size of 44156 bytes in function 'choose_partitioning'
but adding HBD amplifies other things:
warning: stack frame size of 51480 bytes in function 'dec_build_inter_predictors'
Add some padding for sanitizer and variances between compilers.
BUG=webm:1498
Change-Id: I0d94d4f94d25dafafca9d7484881c2ce5f8de371
The file contains sse2 implementations related to various block error
functions. Update the .mk file to include it only when sse2 is
requested.
BUG=webm:1500
Change-Id: I67b766faed425fd7a96db8541b13c69670b65fec
For SVC, if any of the layer scale ratios are not
2x2, then disable the partiton_reuse, which assumes
2x2 scaling between layers.
Change-Id: I8b3163de0826052bbb1bfe03554a074c89510558
Set phase_shift = 0 if the scale factors are
above 3/4. Removes artifact for scale factors
close to 1.
phase_shift = 8 is to get an averaging filter
(decimated pixel aligns to 8/16, midway between source pixels),
and only makes sense for scale factors multiples of
2 (1/2, 1/4,...).
Removes artifact for high scaling ratios.
Change-Id: Id0a85869d6c6156dda0032c697ded2de78fad6bd
This change is targeted mainly at higher resolutions where typically
the average error per MB is much smaller. hence this patch replaces
a fixed error per MB factor with a tiered value.
It also adds in a fixed offset value that acts as a minimum return score.
Note also minor fix to debug stats output.
The results are overall beneficial (-ve) on our test sets, most notably for
higher definition formats (see below - overall psnr, ssim, psnr hvs)
low res: 0.184 -0.262 -0.166
mid res: 0.094 0.075 0.049
hd res: -0.752 -0.300 -0.800
NF 2K: -0.353 1.095 -0.302
NF 4K: -1.245 -0.578 -1.205
The most notable negative case is pierseaside 2K which appears to be worse by
8-10% (which has a big impact on the overall gain for the NF 2K set). Closer
inspection reveals that the drop does not relate to the key frame boost
per se as in both cases the key frame substantially undershoots its target. Rather
this is a side effect relating to the initial Q range allowed for the key frame and
a poor initial complexity estimate. This will hopefully be improved in a later
patch.
Change-Id: I4773ebe554782f4024c047c3c392c763a3fe843b
Changes in the function size in bytes (in lieu of performance metrics)
Before After Diff
vpx_fdct32x32_avx2 29564 -> 28334 -1230
vpx_fdct32x32_sse2 38053 -> 36309 -1744
Change-Id: Ie0b3e6ed7c3f2e9ea45f9d6a1ce1e27d068cee6b
Extended ROI struct suitable for VP9.
ROI input from user is passed into internal struct and applied on every frame
(except key frame).
Enabled usage of all 4 VP9 segment features (delta_qp, delta_lf, skip,
ref_frame) via the ROI map input.
Made changes to nonrd_pickmode for the ref_frame feature.
Only works for realtime speed >= 5.
AQ_MODE needs to be turned off for ROI to take effect.
Change example in the sample encoder: vpx_temporal_svc_encoder.c to be suitable
for VP9.
Add datarate test.
Bump up ABI version.
BUG=webm:1470
Change-Id: I663b8c89862328646f4cc6119752b66efc5dc9ac
This reverts commit 4e5b4b58483e1f38e37acd49b809d725b4f66c26.
Reason for revert: Commit message inaccurate.
Original change's description:
> Add ROI support for VP9.
>
> Extended ROI struct suitable for VP9.
> ROI input from user is passed into internal struct and applied on every frame
> (except key frame).
>
> Enabled usage of all 4 VP9 segment features (delta_qp, delta_lf, skip,
> ref_frame) via the ROI map input.
> Made changes to nonrd_pickmode for the ref_frame feature.
>
> Only works for realtime speed >= 5.
> AQ_MODE needs to be turned off for ROI to take effect.
>
> Change example in the sample encoder: vpx_temporal_svc_encoder.c to be suitable
> for VP9.
> Add datarate test.
>
> Bump up ABI version.
>
> BUG=webm:1470
>
> Change-Id: I7e0cf6890649adb98a5fda2efb6ae1fa511c7fc9
TBR=yaowu@google.com,jzern@google.com,marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: I000dbd81e0c67cb8a0dcde4013ee9bf7afb038f0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: webm:1470
This patch adds in detection of slide show key frame groups.
The detection assumes extremely low or 0 motion for all frames
in the key frame group.
If this case is detected the boost level is set to a very high value
and the min Q to a lower value for the key frame itself.
Alt refs and golden frames are disabled to save bits (up to a limiting
maximum interval currently set to 240 frames).
In test samples that I created, this patch gave rise to a substantial
improvement in overall psnr and a drop in data rate. In some cases the
average psnr fell, however, with the boost and minQ values set as they are.
This is to be expected because previously a relatively poor key frame
could be followed by progressively better alt refs. For example a key
frame at q7.5 but subsequent alt refs improving it to lossless. Given that
average psnr tends to be dominated by the best frames, a ramp like this
from q7.5 to lossless may give a better average psnr than, for example,
coding the entire sequence at q2.5. Overall psnr, however, will be much
better in the latter case. The option exists to boost the key frame further
which would insure much better results for all metrics, but at the expense
of smaller bitrate savings. Given that these samples tend to have very
good quality anyway this seems like a bad trade off.
For slides displayed for several seconds, bitrate savings of >= 20% are likely
and much larger gains are possible in some cases.
Change-Id: Ib4b61e153c55d3f2f561153da13fdb56f397a52b
Extended ROI struct suitable for VP9.
ROI input from user is passed into internal struct and applied on every frame
(except key frame).
Enabled usage of all 4 VP9 segment features (delta_qp, delta_lf, skip,
ref_frame) via the ROI map input.
Made changes to nonrd_pickmode for the ref_frame feature.
Only works for realtime speed >= 5.
AQ_MODE needs to be turned off for ROI to take effect.
Change example in the sample encoder: vpx_temporal_svc_encoder.c to be suitable
for VP9.
Add datarate test.
Bump up ABI version.
BUG=webm:1470
Change-Id: I7e0cf6890649adb98a5fda2efb6ae1fa511c7fc9
This value was originally set in response to requests from the hardware
team before levels were properly defined for VP9.
Even if a level is not specified for an encode, it imposes a maximum
frame size for videos of dimensions <= 1080P. For larger formats the
limit was set at 250 bits per MB.
This patch modifies the limit to be more in line with the requirements
specified for level 4 (max rate for a 4 frame group of 16 Mbits). If a lower
level is specified at encode time and this mandates a smaller maximum frame
size then the level requirement will still take precedence.
Increasing this value allows for some slide shows or very low motion clips
to code a better quality key frame.
Change-Id: Ic08e0e09c8a918077152190c59732b9a1c049787
The stats input pointer, when passed in, already points to the
frame after the golden frame so should not be advanced here.
This fix has a small mostly positive effect on results in our test sets
(tested using corpus vbr settings) and gives a gain of almost 0.5%
in overall psnr (plus slightly smaller gains on other metrics) for the
4K set.
The bug also caused a crash in calculate_group_score() in another
patch which allows coding of slides in a slide show as a single
long KF group without ARFs or GFs.
Change-Id: I57a3a24baf442ce55dbc91fba05e056697c63a6f
For encoding with --enable-multi-res-encoding, with 1 layer, when the
target bitrate is set 0, under these conditions null pointer
will be de-referenced. Fix is to check
cpi->oxcf.mr_total_resolutions > 1. Also added NULL pointer check.
This issue causes crash for asan build in chromium clusterfuzz.
BUG=805863
Change-Id: I9cd25af631395bc9fede3a12fb68af4021eb15f8
this allows the test to be sharded more efficiently and speeds up the
run when built with slower configs, e.g., asan.
Change-Id: If6d863b76871e3934704a1079bbf17f4886932c7
scaled_temp frame is used as an intermediate buffer for
2 stage down-sampling: two stages of 1/2 down-sampling
for a target of 1/4x1/4. This is used in 3 layer SVC
to avoid duplicate frame downsampling (on middle layer).
As this allocation is only needed/used when the
number_spatial_layers > 2, add this condition to avoid
unneeded allocation for 1 and 2 spatial SVC.
Change-Id: If342466644f685c1ea3ca5344b581793e5136c09
For 3 spatial layers with 1/2 downsampling, the
downsampling filter for the middle layer was not
set for the very first frame, so it was defaulting
to the subsample filter (no averaging/phase = 0).
Its not set due to the two stage scaling that is
done for 1/4 on base layer, during which the intermediate
1/2 result is saved for the middle layer.
Fix for now is to set the default downsampling filter
to Bilinear (averaging/non-zero phase) for all layers on
init (vp9_init_layer_context):.
Change-Id: Ic7407810b34c621e7e7420682508d45478bdffcf
Eliminate false positives in previous patch.
The previous patch did a good job of detecting slide transitions but
in discussions a number of situations were identified that might trigger
harmful false positives. This risk seems to be born out by some testing
on a wider YT set done by yclin@.
This patch adds an additional clause that requires that the best case
inter and intra error for the frame are very similar,meaning it is almost
as easy to code a key frame as an inter frame. This will certainly prevent
the false positive conditions that Jim and I discussed and even if one
does occur it should not be very damaging.
The down side is that this clause may mean that we still miss some
real slide transitions, especially if the images are small and similar. If this
proves to be the case then some further adjustment of the threshold may be
required. However, in the specific problem sample provided we do trap every
transition correctly.
Change-Id: I7e5e79e52dc09bc47917565bf00cc44e5cddd44c
Only affects 2 temporal layer case.
Modified the flags for 2 temporal layers to make
top layer (top spatial, top temporal) a non-reference
frame, conistent with the 3 TL case.
Add mismatch check to the datarate test of changing
svc pattern on the fly, which is test for 2 temporal
layers.
This re-applies the change: 254e2f5501d,
that was reverted in: 658eb1d675.
Change-Id: Ib5fd4a7a0312c0c05329ae75baac480af34b4694
This reverts commit 254e2f5501d000ca66bc65c5f44bb6a882d4167c.
Reason for revert: <INSERT REASONING HERE>
Original change's description:
> vp9 svc: fix to make top layer frame non-ref
>
> Add mismatch check to the datarate test of changing svc pattern on the
> fly.
>
> Change-Id: I6a878736de44e6a40c077ed6430aabd7fadabdd9
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: Ibcb600438098f8dc380fe7e1de90cb81fc367468
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
In the process of fixing a ubsan warning:
commit 738b829b8cdf079a5fa48c74a28a177c9567d212
Fix incorrect size reading
the inferred check of start < end was removed. This causes fuzzed files
to get a little further and segfault in vp8dx_start_decode.
Change-Id: I316e23058753ba42dbcc46d27eb575f51c8a9e9a
When compiling an app using libvpx in Xcode 9.2, a warning is
thrown in vpx_frame_buffer.h:
"Parameter 'new_size' not found in the function declaration"
Switching it to 'min_size' to match the comment text and the
callback type definition prototype resolves it.
Change-Id: I7a3e4a857c2007c2d0d390e22054d7bc85068aa1
The following changes were not carried back from the release branch:
commit f87a4594fbd0b19071a2befebf52d3f5fabd1a9e
Revert "Add frame width & height to frame pkt. Add test."
commit c5dc3373dbd442ea299bacf276c4258fa7ce0559
work around pic issue with gcc 6
BUG=webm:1490
Change-Id: Id3e15983d5565680c05a0c454544003a615a4d7f
Cherry pick from vp9:
commit 85770264ac891505730dcd5092d1993a62c74060
Guard against incorrect size values moving *data past data_end.
Check read length against the difference of the buffers.
Change-Id: I5e8679ddd447c4d73deb80be5ec94841a92c5fcd
For SVC, on spatial enhancement layer, intra
search was disabled unless best reference frame
is golden (i.e., spatial/inter-layer prediction),
except for some other conditions (lower layer is key
or golden is not an allowed reference).
Fix is to add the base temporal layer condition,
so intra search will not be force-disabled for base
temporal layer frames.
This improves metrics (-1-2%) for SVC 3 and 2 layer config.
Some small encode time is expected, but since condition
only affect base temporal layers (i.e., every 4 frames
for 3 layers), increase is small.
Change-Id: I10b824faef99560dfdeeb02ba8bf8e3e1eea6255
In nonrd-pickmode: the golden/spatial reference for inter-layer
prediction may be skipped in the mode testing. Add QP dependency
to reduce the threshold for skipping (i.e., check it more often)
at high QP, if the lower layer was encoded at lower QP relative
to the current layer.
At high QP, a better quality lower resolution is more likely to
provide good spatial (inter-layer) prediction.
avgPSNR/SSIM metrics up by ~1% (all clips positive gain or neutral).
Some decrease in encode time (~1-2%) expected at lower bitrates,
for 3 layer SVC.
Change-Id: I9ee0f62d4b10d4ebd30165d378ecfa4399ae5ef1
warning: Tag `XML_SCHEMA' at line 941 of file `doxyfile' has become obsolete.
warning: Tag `XML_DTD' at line 947 of file `doxyfile' has become obsolete.
Change-Id: I85e39c4fb154569b8d7f68bdf362408983e9bd4f
Remove an adjustment to two cyclic refresh (aq-mode= 3)
parameters for SVC. The adjustment was to reduce the
delta-qp on second segment, and reduce the motion threshold.
This was done early on in the SVC encoder development,
in the latest codebase removing this adjustment yields
some improvements in metrics.
The avgPSNR/SSIM metrics increase on average by ~1%
(most clip positive gain), for 3 and 2 layer SVC.
Change-Id: I7a4d5114f16b2a1df383dbe6b3fe02940e29e6cc
vp9 does not support multi-res encoding, the request should not crash.
+ encode_api_test: unconditionally expose multi-res test
vpx_codec_enc_init_multi should fail independent of
CONFIG_MULTI_RES_ENCODING if not for the same reason.
Change-Id: I44fc58ef70ee4e0e482cb6a5736885f4cb2a8517
(cherry picked from commit 004fb91416e355986dc098883791becf39ffc1f7)
Fix is from the patch in the issue.
Release memories allocated before early exit.
BUG=webm:1482
Change-Id: I64952af99c58241496e03fa55da09fd129a07c77
(cherry picked from commit 5b6ae020b6a972b67b59b3dbf7cf9cbd3140a80d)
vp9 does not support multi-res encoding, the request should not crash.
+ encode_api_test: unconditionally expose multi-res test
vpx_codec_enc_init_multi should fail independent of
CONFIG_MULTI_RES_ENCODING if not for the same reason.
Change-Id: I44fc58ef70ee4e0e482cb6a5736885f4cb2a8517
In commit 577d4fa79, int8_t was used to replace char. This will result in a
compilation error, for int8_t was typedefined to signed char, but not char.
Change-Id: I5c9837e01b0b58688a7741f5c9a99a76ca887e4a
Remove trailing commas to keep multiple elements on one line.
Add blank lines to prevent comments from being treated as blocks.
clang-format guards for struct with a comment in the middle.
Change-Id: I3bcb8313ae8aaf69179249a13b4087b1272cdbc0
These don't appear to make any sense given their context. The
commit log also does not reveal anything.
Discovered due to spurious clang-format indenting:
https://bugs.llvm.org/show_bug.cgi?id=35930
Change-Id: I732a66056ba4c05e3e132a2f236fe10f7a282900
Allow*OnASingleLine appears to no longer apply to
typedef structs.
Adjust closing parenthesis/opening brace on functions.
Remove trailing commas to keep multiple elements on one line.
Change-Id: I6e535a8ddb15c9b3de8216ce8ddb2a18241af46c
Remove comments above #define statements because they get
indented unnecessarily.
https://bugs.llvm.org/show_bug.cgi?id=35930
Add blank lines to prevent comments from being treated as
blocks.
Change-Id: I04dce21b2a10e13b8dc07411a0019c098f6dd705
For the vp8 simulcast/multi-res-encoder:
Add flags to keep track of the disabling/skipping of
streams for the multi-res-encoder. And if the lower spatial
stream is skipped for a given stream, disable the motion
vector reuse for that stream.
Also remove the condition of forcing same frame type
across all streams.
This fix allows for the skipping/disabling of the base
or middle layer streams.
Change-Id: Idfa94b32b6d2256932f6602cde19579b8e50a8bd
This reverts commit bd1d995cd38b4e31a01356079f1d94067273eb28.
Remove the feature from the release as it requires additional work.
BUG=webm:1485
Change-Id: I1a01ac2525703af97a456a3eed85718306c0f734
Without this applications cannot use the vpx_codec_control macro
for VP9_SET_SKIP_LOOP_FILTER. The tests only cover the underscored
version vpx_codec_control_().
Change-Id: I3e6c1888307b76636fdc1a8deae70b5c14238163
(cherry picked from commit 373e08f921e5bfd5a96963fabbbbe16ec793d44e)
Without this applications cannot use the vpx_codec_control macro
for VP9_SET_SKIP_LOOP_FILTER. The tests only cover the underscored
version vpx_codec_control_().
Change-Id: I3e6c1888307b76636fdc1a8deae70b5c14238163
The in/out (or zoom metrics) in accumulate_frame_motion_stats()
are in effect a % of the blocks that have a motion vector pointing
either towards or away from the center. As such they are already
normalized in terms of image size and the thresholds against which
these are tested should be image size independent.
In practice a zoom either in or out is an indicator for a shorter group
length so the abs value is more important as a breakout clause.
This patch fixes the threshold test. Clips without noticeable zoom show
no effect but some with strong zooms such as "station" show a big
gain (5-10%). Average psnr-hvs gain on hdres set was 0.292%
Change-Id: I4f97a72b0e273e4e844ade15285749c32cd81c1c
(cherry picked from commit 0226ce79e9389ccf7d10ed7acacba6840ad911c9)
Remove trailing commas to keep multiple elements on one line.
Remove trailing empty lines to keep comments from being indented.
https://bugs.llvm.org/show_bug.cgi?id=35930
Change-Id: I0a66dde95f2a304f13cb85a2e9197afca20051e8
For SVC: if an enhancement layer (spatial_layer > 0)
has 0 bandwidth, skip/drop the encoding of the layer.
This allows the application to dynamically disable
higher layers for SVC.
Add flag to signal the skip encoding, this is needed
to modify the packing of the superframe when the top
layer is skipped/dropped.
Also moved some updates (current_video_frame counter and
the last_avg_frame_bandwidth) to the postencode_update_drop_frame().
Added datarate unittest for dynamically going from 3 to 2
and then back to 3 spatial layers.
Change-Id: Idaccdb4aca25ba1d822ed1b4219f94e2e8640d43
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates some function pointer casts that I missed in my last
CL https://crrev.com/c/780144.
BUG=chromium:776905
[1] https://www.chromium.org/developers/testing/control-flow-integrity
Change-Id: I1c7adbdfffa4fe0b62e993bfb31d06e64b022d66
Adds a breakout threshold to key frame boost loop.
This reduces the boost somewhat in cases where there is a
significant zoom component. In tests most clips no effect
but a sizable gain for some clips like station.
Change-Id: I8b7a4d57f7ce5f4e3faab3f5688f7e4d61679b9a
This fix improves detection of key frames in slide shows.
In particular it helps if the slides are pictures of varying formats
as in a sample provided by yclin@.
This change does not impact any of the clips in our standard tests
but for the example slide show test clip helped global psnr by
several db and resolved a serious visual quality issue.
Change-Id: Iaeeeed55dc0bb50aeacd4996ed660ced06374603
Switch from bilinear to eighttap_smooth for frame-level
downsampling at low resolutions (<= 320x240).
avgPSNR/SSIM metrics increase from ~0.5-2% (all clips positive gain),
for 2 and 3 spatial layer SVC, with 3 temporal layers.
Small/negligible increase in encoding time (< 1%).
Change-Id: I758472fc4fddd51d87f13c9d1a1cd4986ef5d41f
The in/out (or zoom metrics) in accumulate_frame_motion_stats()
are in effect a % of the blocks that have a motion vector pointing
either towards or away from the center. As such they are already
normalized in terms of image size and the thresholds against which
these are tested should be image size independent.
In practice a zoom either in or out is an indicator for a shorter group
length so the abs value is more important as a breakout clause.
This patch fixes the threshold test. Clips without noticeable zoom show
no effect but some with strong zooms such as "station" show a big
gain (5-10%). Average psnr-hvs gain on hdres set was 0.292%
Change-Id: I4f97a72b0e273e4e844ade15285749c32cd81c1c
Increase the recursive average factor from 15/16 to 3/4
to make the noise estimation respond faster.
Small/neglible change on low noise content, but better
denoising for noisy content.
Also encoder speedup of ~2-3% observed on some noisy clips.
Change-Id: I9dd02fe961ca24b411fe4c2732f814bf1e9a7f9f
This c version uses the shortcuts found in the x86
vp9_quantize_fp functions.
The test was updated to use the correct quant/round range.
Change-Id: Ie5871f710d9eb39047d8d9f48b907c0633e1f830
INLINE is defined as __forceinline for vs* configs, but is the
normal, compiler-discretion inline for gcc/clang configs. This
makes many functions very large when building for windows targets,
much larger than they are elsewhere.
Use '__inline' as a consistent definition to get consistent function
sizes. Although Visual Studio documentation says that 'inline' is
only available in C+ code. This is probably incorrect, since Visual
Studio 2017 accepts C99 'inline' even when passed /TC. Nevertheless,
this commit uses the recommended '__inline' for consistency.
Thanks to David Major for the diagnosis.
Change-Id: Ib0b31a3afcea77822c84fe3c6cd452add66d825a
eob is a pointer to a uint16_t. previously the code would store 64-bits
causing a crash or test failure with the right stack layout.
Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
This reverts commit 86842855d30d6ca6befdcf5108003e027d90daa9.
SSSE3/VP9QuantizeTest.EOBCheck/1 fails on Mac and the build breaks under
visual studio due to a #if within another macro.
Change-Id: I475095a04aafcc714fade2b24e4df7b682be2cd1
Modify and update the SVC datarate unittests to verify the
rate targeting for each spatial-temporal layer.
The current tests were only verifying the rate targeting
of the full SVC stream, not individual layers.
Also re-enabled a test that was disabled.
This is a stronger verification of the layered rate control
for SVC for 1 pass CBR encoding.
Added PostEncodeFrameHook, needed to get the layer_id and
update the layer buffer level.
Change-Id: I9fd54ad474686b20a6de3250d587e2cec194a56f
This c version uses the shortcuts found in the x86
vp9_quantize_fp functions.
The test was updated to use the correct quant/round range.
Change-Id: I5d19f8af2fddda8e50910249eafb740acb29415b
For a large change in the target avg_frame_bandwidth,
via the update in change_config()), reset the buffer_level
to optimal_level.
This fix prevents possible frame drops, where for example,
encoder suddenly goes from lower to higher bitrate.
Change-Id: I2f844c41d04c01240e85f574e59d2b9075c7eb6d
the random number generator creates values from [0, range) add 1 to all
and make hev more realistic by mirroring its calculation of level >> 4,
i.e., [0, 3]
Change-Id: Ic19be5d7ba668deb17c96f143b739116a4b5d21c
Optimize function vp8_mbloop_filter_vertical_edge_mmi and
function vp8_mbloop_filter_horizontal_edge_mmi.
Make full use of memory loading delay slot and reduce unnecessary
instructions.
Change-Id: I61da2c3a44c06044225461f46bf487d83cba6c16
all_builds.py has been more or less replaced by Jenkins.
author_first_release.sh is unused.
ftfy.sh has been obviated by having the whole tree clang-format clean.
Change-Id: I741315ad9042e6e901f07410e93f28371db703b2
1. Delete unnecessary zero setting process.
2. Optimize the method of calculating SSE in vpx_varianceWxH.
Change-Id: I8bab801416e7f4958c28c6d080e3cf785a50f82b
With recent fixes to rate control for SVC the
buffer underrun in the tests does not happen,
so comment and TODO can be removed.
Also, in some of these SVC tests, replace the HD clip
with the corresponding VGA clip, which has > 400 frames.
For the (niklas) HD clip: it has only 60 frames but the
test was running up to 300 frames. Fixed it to 60 frames.
Keep some tests with the HD clip, needed for the 4 thread
and 5 level scaling test.
Change-Id: I0a2356a908e8b2271c7a422eb8b15c0d56eec968
For large dynamic changes in target avg_frame_bandwidth, or
a change in resolution, via the update in change_config()),
reset the under/overshoot flags (rc_1_frame, rc_2_frame)
to prevent constraining the QP for the first few frames
following the change.
For SVC use the spatial stream avg_frame_bandwidth in
reset condition.
For the avg_frame_bandwidth condition, use fairly large
threshold (~50%) for now in reset.
This allows for better/faster QP response if, for example,
application dynamically changes bitrate by large amount.
Change-Id: Ib6e3761732d956949d79c9247e50dba744a535c0
Denoise 2 spatial layes at most.
Add noise sensitivity level 2 for vp9 such that applications can control
whether to denoise the second highest spatial layer.
Add tests to cover this case.
Change-Id: Ic327d14b29adeba3f0dae547629f43b98d22997f
Immediately following a key frame the trailing second reference
error in the first pass stats will be based on a reference frame from
the prior key frame group and will thus usually be much larger.
This fix eliminates that effect (which typically triggers a short arf
group immediately after a key frame). It also changes the accounting
for the first frame in each new arf group.
This change gives large gains on a couple of clips that contain mid
sequence key frames (e.g. 6% on 1080P tennis). Overall there was
a net gain in PSNR and PSNR-HVS ~(0.05- 0.4%) and mixed results for
SSIM (+/- 0.2%).
Change-Id: I8e00538ac2c0b5c2e7e637903cac329ce5c2a375
Downsampling filter for SVC was set to subsample (phase 0)
for HD -> VGA, and bilinear averaging (phase 8) for VGA -> QVGA.
This change makes it bilinear averaging for HD -> VGA.
Given the recent commit 9f9d4f8, quality is improved with
this change: avgPSNR/SSIM up ~1-3% on HD clips in RTC set.
Speed decrease of ~1% for 3 layer SVC.
Change-Id: If834a320e372b8b922a6bf7cab4227703b1beae6
Move the early exit checks on usable_ref_frame and
skip_ref_find_pref up before the check on flag_svc_subpel.
The code under flag_svc_subpel requires frame_mv to be set
for the golden/spatial reference, which is only set if the
both those exits don't pass.
No change in behavior.
Change-Id: Id304276c745eeb389ff85fa2dcf510d5976bc413
For nonrd pickmode on a given spatial layer, the spatial
(golden) reference was always only using zeromv for prediction.
In this patch if the downsampling filter used for generating
the lower spatial layer is an averaging filter (nonzero phase),
we allow for subpel motion on the spatial (golden) reference to
compensate for the shift. This is done by forcing the testing of
nonzero motion mode to compensate for spatial downsampling shift.
Improvement for cases where the downsampling is averaging filter.
In the current code this is only done for generating
resolutions <= QVGA.
Improvement for avgPSNR/SSIM on RTC set for speed 7: ~1.2%.
Gain is larger (~2-3%) for VGA clips with 2 spatial layers.
~1% speed slowdown for 3 layer SVC on mac.
Change-Id: I9ec4fa20a38947934fc650594596c25280c3b289
Don't add include files to the archive. Avoids build failures for
Windows such as:
the input file 'libvpx_g.a(x86_abi_support.asm.o)' has no sections
Change-Id: If9c8e70c0ec913b7ad7dd6a08d4fa19011114ad2
No need to specify default behaviour. The original change introducing nasm:
7be093ea4d
mentions requiring 2.0.9, which was the first release to default to this behaviour:
http://www.nasm.us/doc/nasmdoc2.html
"The -Ox mode is recommended for most uses, and is the default since NASM 2.09."
Change-Id: Ia914c4deede5aa447277b5189bb4fcf7e54c338d
nasm does not accept x64
yasm has accepted (and appears to prefer) win64 at least as far back as
1.0.0:
http://yasm.tortall.net/releases/Release1.0.0.html
Change-Id: Ied881b1df0570da256b1bd7e131e7817e47f768f
Set num_inter_modes based on ref_mode_set_svc, which is
smaller set than ref_mode_set (which may use alt-ref).
No change in behavior.
Change-Id: I31169bb09028db230552c6fca0a86959d1ade692
1. Delete unnecessary zero setting process.
2. Optimize the method of calculating SSE in vpx_varianceWxH.
Change-Id: I58890c6a2ed1543379acb48e03e620c144f6515f
Avoids duplicate computation of UV predictor.
Bit-exact when static_threshold is zero.
Small/neutral difference on RTC set with nonzero static_threshold
(since UV predictor won't be skipped with this change).
Small speed gain, ~1-2%, at speed 8.
Change-Id: Iba8d22a307768b391e29d63c9826aac5a4d9c285
this is only meant for testing. along with --enable-experimental
--enable-spatial-svc require VPX_TEST_SPATIAL_SVC to be defined rather
than bumping the encoder ABI.
Change-Id: I7f34d9f60300fa31ccf22e1a4aa619392c391b2e
For 1 pass cbr SVC: GOLDEN is the spatial reference,
better not to check for encoder_breakout on this reference.
Small positive ~0.075% (mostly neutral) gain in avgPSNR/SSIM metrics.
No observed change in encoder speed.
Change-Id: Ib337f16d6771105bf06384c6a23ad047fc690418
For the case when the number of temporal layers > 1,
the buffer levels (starting/optimal_buffer_level,
and maximum_buffer_size) were not scaled properly.
In vp9_update_layer_context_change_config():
when setting the layer-buffer levels, fix is to scale
the layer-target_bandwidth by the target_bandwidth
(which is the full stream bandwidth) instead of the
spatial_layer_target.
This is needed because prior to the call
vp9_update_layer_context_change_config(), set_rc_buffer_sizes()
is called which sets the buffer levels based on target bandwidth
(which is the full bandwidth for the SVC stream).
This fix properly sets the layer-buffer levels based on the
layer-bandwidth, and leads to better rate targeting.
Small/neutral change in avgPSNR/SSIM metrics on RTC set.
Change-Id: Ic0f4f7f3487c37b9a9adb4781ae5edfed7140a57
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates function pointer casts to make libvpx CFI-safe.
[1] https://www.chromium.org/developers/testing/control-flow-integrity
Change-Id: I7e08522d195a43c88cda06fa20414426c8c4372c
For reference frames: enable scale partition for
superblocks with low source sad or if bsize on lower-resoln
is at least 32x32.
Keep feature disabled for base temporal layer.
Small regression in avgPNSR/SSIM metrics, ~0.5-1%.
Speedup ~2-3% on mac for SVC (3 spatial/3 temporal layers) at speed 7.
Change-Id: I5987eb7763845b680059128b538bb5188be0cca5
When allow_partition_search_skip is set the two pass code
can optionally skip the partition search in the rd loop if the image
appears static (based on selection of 0,0 motion).
Unfortunately 0,0 motion does not necessarily mean that there are
no meaningful changes or that motion or intra modes will not be selected
in the second pass.
Disabling "allow_partition_search_skip" may hurt the encode speed a little
for a small number of clips but can have a big impact on compression.
The most notable example of this in our test sets is "bridge_close_cif"
where this change gives a gains of 18%, 12% and 16% in opsnr, ssim and
psnr-hvs.
Change-Id: I765e288b5c0cd82bce00a148e7653a21e9203024
Enable partition copy on boundary and scale blocks along the boundary.
Rename copy_partition_svc to scale_partition_svc.
Do not copy if the block crosses the boundary.
Change-Id: I37a04d48f11b15c4ea67facd7631193ec2f62150
Fixes a build issue when relocation is not allowed:
relocation R_X86_64_32 against '.rodata' can not be used when making a shared object
Change-Id: Ica3e90c926847bc384e818d7854f0030f4d69aa0
Removal of parameters to and code in calc_frame_boost() that is no
longer required.
No change to results from previous patch.
Change-Id: Ic92da35613fdc247d22fddf24d09679fc5329017
The decay accumulator clause covers similar ground to the
new clause that tests the accumulated second reference error
so it has been removed to reduce complexity.
Change-Id: I4ec1cce32d72bd4ee463ad7def2831a68447d525
Add a clause to the breakout test for alt ref groups that
examines the size of the accumulated second reference
frame error compared to the cost of intra coding.
This clause causes a reduction in the average group length for many
clips. Alongside the change to the group length the minimum
boost is increased.
On balance the results are positive for psnr and psnr-hvs
but is negative for ssim/fast ssim for the smaller image formats.
Strong gains on some harder clips (eg ducks take off (midres) ~20%,
husky (lowres) 6-17%. Most of the negative cases are lower motion
clips. Subsequent patch hopefully will help with those.
Change-Id: Ic1f5dbb9153d5089e58b1540470e799f91a65dc4
Fix/cleaup the conditioning for usage of the reuse-lowres
partition feature.
Replace the non-reference condition with the top temporal
layer, and put this condition in the speed feature.
This prevents doing update_partition_svc() on every
VGA frame, instead it will now only do update for VGA in
the top temporal layer frames.
Also this makes it easier to test/enable this feature
for lower layer temporal frames.
Change-Id: Ia897afbc6fe5c84c5693e310bcaa6a87ce017be5
For new VP9 only content type adjust the rate distortion and ARF
filter based on the relative spatial variance of the source and
reconstruction.
In regards to the RD loop the method favors modes where the
reconstruction variance is similar to the source variance. However it
is currently only applied to regions where the source variance is quite
low.
For very low variance blocks it applies a further bias against intra
coding and large prediction block sizes (the later in particular limit
the usefulness of the loop filter).
The final part of this change is to lower the strength of the ARF
filter for blocks where the source has very low spatial variance, to
encourage some low amplitude texture or noise to pass through
the filter.
This change improves the retention of film grain and fine noise /
texture in spatially flat regions, but as expected causes a significant
drop in PSNR on many clips. This is to be expected because similar
but misaligned noise or texture will give a lower PSNR than a flat
noise free reconstruction. However, it is worth noting that most clips
show a strong gain in FAST SSIM.
The features are enabled on the vpxenc command line by setting
--tune-content=film.
VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.
Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a
Removed three parameters that are no longer needed in calls
to calc_arf_boost() and associated minor changes.
No impact on encode results.
Change-Id: Ieaf31d0d2e1990b99cf69647170145a1bbfbb9fb
For choose_partitioning (speed >= 6): avoid computation
of minmax variance for non-reference frames in SVC.
Existing condition only avoided this for speed >= 8.
Combine that existing logic with non-reference condition.
Small speedup (~0.5-1%) for 3 layer SVC,
neutral change on avgPSNR/SSIM metrics.
Change-Id: I3e9f3a1af0647b15e475cf170d9402908d672ee5
Release frame buffers for non-ref when the decoder is destroyed.
Enable the non ref test.
BUG=b/68819248
Change-Id: Id87ef3b0a62318f9812e927cd957c05c859047fa
For SVC with 3 spatial layers:
Add feature to copy/upscale partition from middle spatial layer
to the upper/highest resolution, when superblock sad is not high.
Enabled for speed >= 7 and only for non-reference frames.
Speedup ~3-4%, small loss in avgPNSR/SSIM of ~1%.
Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d
For a chosen interval "i" the existing arf boost calculation examined frames
+/- (i-1) frames from the current location in the second pass.
This change checks to make sure that the forward search does not extend
beyond the next key frame in the event that the distance to the next key
frame is < (i - 1).
Small metrics gains on all our test sets but these are localized to a few clips
(e.g. midres set psnr-hvs sintel -2.59% but overall average was only -0.185%)
Change-Id: I26fc9ce582b6d58fa1113a238395e12ad3123cf6
The new test will run a SVC bitstream which has non ref frames.
It checks the number of buffer acquired and released to make sure all
external frame buffers are released.
Add a new test bitstream:
vp90-2-22-svc_1280x720_1.webm
which has 400 frames in total, and 1 spatial layer and 2 temporal layers.
There is one non ref frame every other frame.
Disabled for now. Will be enabled with the fix.
BUG=b/68819248
Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5
Added command line control of Corpus VBR.
The new corpus vbr mode is a variant of standard
VBR (end-usage=0) where the complexity distribution
mid point is passed in rather than calculated for a specific
clip or chunk.
The new variant is enabled by setting a new command line
parameter --corpus-complexity to a zero value. Omitting
this parameter or setting it to 0 will cause the codec to use
standard vbr mode.
The correct value for a given corpus needs to be derived
experimentally using a training set such that the average
rate for the corpus is close to the target value.
For example our using our low res test set with upper and lower
vbr limits of 50%-150% and a corpus complexity value of 650
gives a similar average data rate across the set to using standard
vbr. However, with the corpus mode easier clips will be allocated
fewer bits and harder clips more bits rather than having the same
rate target for all.
Change-Id: I03f0fc8c6fb0ee32dc03720fea6a3f1949118589
For nonrd_pickmode: if early_term is set there should be
no need to include UV in rdcost (when color_sensitivity is set).
Neutral change on RTC and RTC_derf metrics, for speed >= 5.
No change for ytlive metrics.
Very small speed gain (~0.5%) on some clips with strong color content.
Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07
Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
For real-time encoding, 1 pass with non-zero lag-in-frames.
Added speed feature to control the feature.
Enabled for speed >=6 for now, under VBR mode.
avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
some clips up by ~3-5%, some clips neutral gain, average gain
across clips is ~1%.
Small/negligible decrease in speed.
Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d
Even though frame_size is calculated in uint64_t, it winds up in an int
size value.
This was exposed with the msan test because the memset is called with
(int)frame_size, leading to a segfault.
Change-Id: I7fd930360dca274adb8f3e43e5e6785204808861
Change type of sum_square_error from int64_t to uint32_t.
Change type of sum_error from int64_t to int32_t.
This reduces the stack usage from ~131K to ~87K.
BUG=b/68362457
Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
swap '{' and c-style comments removing a few redundant ones along the
way; covers most leftovers from the clang-tidy run against an
x86_64-linux config.
Change-Id: I67a45596f80a12389faca49c5be440875092a7df
Changed the intrinsics to perform summation similiar to the way the assembly does.
The new code diverges from the assembly by preferring unsaturated additions.
Results for haswell
SSSE3
Horiz/Vert Size Speedup
Horiz x4 ~32%
Horiz x8 ~6%
Vert x8 ~4%
AVX2
Horiz/Vert Size Speedup
Horiz x16 ~16%
Vert x16 ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
Set adaptive_row_thresh_mt = 1 at speed >= 7,
for svc when multi-threading is used with row-mt.
This allow the adaptive_rd_thresh feature to be used
in the nonrd-pickmode.
~1-2% speedup for SVC encoding with small quality
loss (< 0.6%) on RTC set.
Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc
Disable cyclic refresh if ROI is used and add flag to properly handle
the static_thresh deltas.
Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI
is used).
Add an example in vpx_temporal_svc_encoder.c. Turned off by default.
BUG=webm:1470
Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda
Note this change will trigger the different C version on SSSE3 and
generate different scaled output.
Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().
Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
Small inncrease the sad_thresh1, avoids some false
detection of possible scene changes within lag.
Small improvement in few clips on ytlive, otherwise neutral change.
Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b
Expose the threshold for setting key frame on cut,
and increase it for speed 5.
Also small adjustment to min_thresh.
No change in overall metrics or fps.
Small quality improvement and lower encode time on scene cuts.
Change-Id: I36e06ff3b26b6c29aede39c23fce454525fc9026
Small increase in threshold for the 1 pass VBR datarate tests.
Needed due to commit:
<017257a Adjustment to scene detection and key frame>
Change-Id: I28b3bd7db2192a8cc2bccc3cb0e3b8dbb910ca16
The initial allocation of bits in the two pass code to each frame
should be within the min max limits on the command line. However,
when forming an ARF group the cost of the ARF is shared by frames
in that group such that the residual bits for a frame could drop below
the min value. This change prevents the minimum being re-applied
after the cost of the ARF has been deducted as this may otherwise
cause low rate sections to overshoot their target.
Test runs comparing to a baseline run with min and max section pct
0-2000% vs one closer to the YT use case (50-150%) suggest that
this fix not only results in better rate control but also gives a better
rd outcome.
For example the HD set vs 0-2000% baseline (opsnr, ssim).
Old code (50-150): +0.751, +1.099
New code(50-150): +0.241, -0.009
Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.
Change-Id: I1042564d1d390589ca203070fe332c6da3315d75
For 1 pass vbr: use higher threshold on avg_sad
and force key frame under scene cut detection if
above the threshold. Allow it for speed >= 6 for now,
since it does not use the full nonrd_pickmode partition
(as in speed 5).
Improves quality somewhat on scene cut frames.
Neutral on overall metrics and fps for speed 6 on
ytlive set.
Change-Id: I12626f7627419ca14f9d0d249df86c7104438162
Change to the bit allocation within a GF/ARF group.
Normal VBR and CQ mode allocate bits to a GF/ARF group based of the mean
complexity score of the frames in that group but then share bits evenly between
the "normal" frames in that group regardless of the individual frame complexity
scores (with the exception of the middle and last frames).
This patch alters the behavior for the experimental "Corpus VBR" mode such that
the allocation is always based on the individual complexity scores.
Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
This patch makes further changes to support an experimental
corpus wide VBR mode that uses a corpus complexity
number as the midpoint of the distribution used to allocate bits
within a clip, rather than some average error score derived from the
clip itself.
At the moment the midpoint number is hard wired for testing and
the mode is enabled or disabled through a #ifdef. Ultimately this
would need to be controlled by command line parameters.
Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873
vpx_convolve8_avg works by first running a normal horizontal filter then a
vertical filter averages at the end.
The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.
vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.
Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
This reverts commit 9311ef18b4b4eff0da3adf9d702a34f489a270ff.
Reason for revert:
Notice small regression in some clips.
Will revisit in another change.
Original change's description:
> Speed >=5 real-time: add TM intra mode for high_source_sad.
>
> Small/neutral change in metrics or speed for ytlive.
> Some improvement in quality on frames with big content change.
>
> Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
For 1 pass vbr: increase min_thresh slightly, and also add
condition on golden/arf update for using full nonrd_pick_partition.
Reduces possible false detection for scene cut detection.
Neutral/small change in metrics or speed for speed 5.
Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76
Small/neutral change in metrics or speed for ytlive.
Some improvement in quality on frames with big content change.
Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
Lower SAD threshold to select non_rd pickmode partition
at superblock level more often.
Small gain in metrics, small/negligible decrease in speed.
Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc
For 1 pass vbr speed >= 6: when REFERENCE_PARTITION is selected,
avoid doing the full nonrd_pickmode based partition.
No change in overall metrics or speed.
Reduces encode times on scene cuts by 10-20%.
Change-Id: I0310b1610cc1c83793a509e0a9059840e8f18308
For 1 pass vbr mode:
On no-show_frame/ARF: instead of skipping alt_ref_frame
completely in mode testing, allow for checking (0, 0) on alt_ref.
Small gain in metrics, ~0.18%, no change in speed.
Change-Id: I32a3c24faca64ab70dd5091071a0dc301db7dd1e
For 1 pass vbr: when significant content/scene change is detected
(high_source_sad = 1) reduce/turnoff the additional qdelta on the
active_worst_quality. This helps somewhat to reduce the occurrence
of large frame sizes and large encode times.
Allow it only when use_altef_onepass is enabled.
Neutral/no change on metrics.
Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea
For speed 6 real-time mode: use adapt_partition
on ARF frame instead of REFERENCE_PARTITION (which is slower).
This requires enabling compute_source_sad_onepass for no-show_frames.
Speedup of ~3-5% on some clips that heavily use ARF,
small loss (~0.2%) in quality on ytlive set.
Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536
Speed comparing with the one calling vpx_scaled_2d_neon()
~1.7 x in general
~2.8x for BILINEAR filter
BUG=webm:1419
Change-Id: I8f0a54c2013e61ea086033010f97c19ecf47c7c6
Scale 3x3 block instead of 16x16 block in each loop. Disabled by
default.
Benefits:
1. Reduced number of different phase_scaler from 16 to 3.
Optimization code will be smaller and faster.
2. Maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)
BUG=webm:1419
Change-Id: I59a1f7496d89a1b090498c935d30cfcf1d0c282b
For real-time mode. Move the switch to fixed partition
for is_src_frame_alt_ref so all speeds may use it
if use_altref_onepass is set.
Improves metrics by ~2% for ytlive set at speed 4
(where use_altref_onepass is currently used).
Change-Id: I033240386598c9dbd0364da89ccbcca64bc663ee
Only has effect when sf->use_altref_onepass is enabled,
as in that case scene detection is skipped for non-show frame
and so high_source_sad does not get reset to 0.
No change in metrics or speed.
Change-Id: I421f066d239341449c18826089e1810b9fc5967f
Add stats for past ARF usage, and use it to disable
ARF usage based on some conditions.
Overall improvement on ytlive set, reduces the regression
on the problem clips for this feature.
Only affects when sf->use_altref_onepass is enabled
(currently off by default).
Change-Id: I66267f227ea132dc86acb730e9882f85bead2cdb
This reverts commit 535b7b915ae5574db2f95632243cc5bee865f02e.
This is actually used in CBR to reset the rate control if high source sad is detected.
Original change's description:
> Remove the speed condition on scene detection in 1 pass code.
>
> Scene detection is used for VBR mode and for screen_content mode.
>
> It was also enabled for CBR mode via the speed condition,
> but currently the analysis in the scene detection is not used
> in CRB mode (similar computations are done locally at superblock level
> when the source_sad feature is enabled).
>
> For 1 pass code.
> No change in behavior. Small speed gain, ~0.5%.
>
> Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: Ib4e6b02047f75632503e7b0fc870af97fa9291c3
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Scene detection is used for VBR mode and for screen_content mode.
It was also enabled for CBR mode via the speed condition,
but currently the analysis in the scene detection is not used
in CRB mode (similar computations are done locally at superblock level
when the source_sad feature is enabled).
For 1 pass code.
No change in behavior. Small speed gain, ~0.5%.
Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
'iter' parameter is being checked for NULL in every call to
decoder_get_frame which is quite pointless because it is always
going to be NULL unless the application changed it. The code works
as described only because vp9_get_raw_frame returns -1 on all
subsequent calls after the first.
Change-Id: Ic736b9e8fe36fc1430fc11d6a9b292be02497248
* changes:
Remove the unnecessary cast of (int16_t)cospi_{1...31}_64
Remove the unnecessary upcasts of (int)cospi_{1...31}_64
Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
Add the condition frames_since_golden > 0 to the
early exit check for ARF usage in nonrd_pickmode.
This improves quality of first frame following ARF, where
frame_since_golden = 0.
Small/neutral gain in metrics for speed 6, neutral change in speed.
Only affects when USE_ALTREF_FOR_ONE_PASS is enabled.
Change-Id: I82e73e6ff6fc849e5ca5448563cb8a0515fe0cdc
A new bug was introduced in a80bdfd "Change sinpi_{1,2,3,4}_9 from
tran_high_t to int16_t". Reverted the change in this file.
BUG=webm:1450
Failed test C/TransHT.AccuracyCheck/26.
Change-Id: Id001f57aad811803ef7d367d2b2bc008d8499991
Modify simple_block_yrd condition in nonrd_pickmode for SVC:
allow it to be used also on base temporal_layer, only when
spatial_layer > 1 and block size < 32x32.
Speed up of about ~2% for 3 layer SVC, with little/negligible
loss in quality.
Change-Id: I7734bdae51cf51f22b96f6b2b27da20ea1d84344
Fix the setting to frames_till_gf_update_due, and
adjust the limit value.
Only affects when USE_ALTREF_FOR_ONE_PASS is enabled.
Neutral change to metrics and speed for ytlive.
Change-Id: I266d9a00b36221bc8602fa2746d4e8a8f7d4dfae
Only when USE_ALT_REF_ONE_PASS is enabled (off by default).
Force fixed partition to 64x64 when is_src_alt_ref_frame is true,
and don't force early exit for some modes in nonrd_pickmode
for ARF noshow frames.
Small gain ~0.2% on ytlive metrics for speed 6.
Neutral speed difference.
Change-Id: I27eb6622d0453c09a06ccdc3b16368762474d11d
Add datarate test, for both VBR and CBR mode, with the
frame_parallel_decoding mode disabled (and error_resilience off).
Change-Id: I54feec3248a68ecff4bef8d9a31bb1616fab77df
This reverts commit afee58f2c4159172f5340f2c7d3e8041cfa0eb91.
This causes ~8x slowdown in 4:3 in the C-code
Change-Id: I60a7ead12dc4ec1548b1b12cfe4b0be42ef04e0e
In the new AUTO mode, restrict the minimum alt-ref interval and max column
tiles adaptively based on picture size, while not applying any rate control
constraints.
This mode aims to produce encodings that fit into levels corresponding to
the source picture size, with minimum compression quality lost. However, the
bitstream is not guaranteed to be level compatible, e.g., the average bitrate
may exceed level limit.
BUG=b/64451920
Change-Id: I02080b169cbbef4ab2e08c0df4697ce894aad83c
Removed inline for GP load-store in case of (__mips_isa_rev >= 6)
Created one define LD_V for vector load and ST_V for vector store
Change-Id: Ifec3570fa18346e39791b0dd622892e5c18bd448
Also add column headings so that the output can still be parsed if the
set of headers changes later.
Change-Id: I4beaf266521e093db4acf5f715b18fdfb7e3d1cd
This reverts commit 8c42237bb200253931c49e2c530838f3a877dd65.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
Scale 3x3 block instead of 16x16 block in each loop.
Benefits:
1. Reduced number of different phase_scaler from 16 to 3. Optimization code
will be smaller and faster.
2. The maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)
BUG=webm:1419
Change-Id: Ibb9242a629ddb03e1ff93b859bece738255e698c
The intra mode rd penalty was implemented as a rate penalty.
Code was added to scale the penalty according to block size but
this was not done correctly for the SB level or sub 8x8.
The code did a weird double scaling in regard to bit depth that
has been removed. Given that it is a rate penalty the bit depth
should not matter.
This bug fix improves average metrics on our standard test
sets by about 0.1%
Change-Id: I7cf81b66aad0cda389fe234f47beba01c7493b1e
Move class VpxScaleBase to new file test/vpx_scale_test.h.
Add new file test/vp9_scale_test.cc with ScaleFrameTest.
BUG=webm:1419
Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1
This header doesn't build on g++ v6 as it's a C and not C++ header
(_Atomic is not a keyword in C++11). Since the C and C++ invocations
cannot be guaranteed to point to the same underlying atomic_int
implementation, remove support for them and use compiler intrinsics
instead.
BUG=webm:1461
Change-Id: Ie1cd6759c258042efc87f51f036b9aa53e4ea9d5
Makes main thread wait for the filter level to be picked to avoid a race
between the LPF thread and update_reference_frames(). This also
re-enables the failing tests under thread_sanitizer where this data race
was detected.
BUG=webm:1460
Change-Id: I7f5797142ea0200394309842ce3e91a480be4fbc
Fixes issue on iPad Pro 10.5 (and probably other places) where threads
are not properly synchronized. On x86 this data race was benign as load
and store instructions are atomic, they were being atomic in practice as
the program hasn't been observed to be miscompiled.
Such guarantees are not made outside x86, and real problems manifested
where libvpx reliably reproduced a broken bitstream for even just the
initial keyframe. This was detected in WebRTC where this device started
using multithreading (as its CPU count is higher than earlier devices,
where the problem did not manifest as single-threading was used in
practice).
This issue was not detected under thread-sanitizer bots as mutexes were
conditionally used under this platform to simulate the protected read
and write semantics that were in practice provided on x86 platforms.
This change also removes several mutexes, so encoder/decoder state is
lighter-weight after this change and we do not need to initialize so
many mutexes (this was done even on non-thread-sanitizer platforms where
they were unused).
Change-Id: If41fcb0d99944f7bbc8ec40877cdc34d672ae72a
Neutral on rtc set for speed 8. Neutral on ytlive for speed 5.
Saves some computation cycles but no speed gain observed on Pixel.
Change-Id: I34c4642cd543aa89c5b9c4bff6b7113577c64c91
This reverts commit df9ce12259a4e866feeb580d2e0cf9648f60d3b5.
Reason for revert:
Re-enabled tests still fail tsan in high bitdepth.
Original change's description:
> Re-enable disabled tests under TSan.
>
> These tests point to an already-fixed bug, this should no longer have a
> data race.
>
> BUG=webm:1049
>
> Change-Id: Iaedc5db8df99362bdc501b70ff7fdebf8756fdb8
TBR=jzern@google.com,pbos@chromium.org,builds@webmproject.org
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: webm:1049
Change-Id: I232f1f7726bf795b301abfb2e07cad6756642e53
Rev d147771 fixed the test failure. So remove the resolution condition
for using source_sad in speed 6.
BUG=webm:1452
Change-Id: I1efba97e1ef5bd4de5f886299f6fcb907187abcd
Enable adapt_partition for vbr mode for speed 6.
This allows the usage of the pickmode-based partition
(used in speed 5), but only selectively for superblocks
with high source sad, otherwise the faster variance based
partition scheme is used.
For speed 6 on ytlive set: avgPSNR/SSIM metrics up by ~0.6%,
several clips up by ~1.5%. Small/negligible decrease in speed.
Change-Id: I12f3efef6b3e059391de330fdbe5a44c2587f1f8
For SVC at speed >= 7: only use the improved mv search
on base spatial layer, if top layer resolution is above 640x360.
~2.3% speedup
Small/negligible loss in avgPSNR metrics on rtc set.
Change-Id: Iaef75a57ebf1c248931bc1aa28d20b7fecac1851
This reverts commit f60d1dcd3de46f72bafc5eeef481bd1a4e203301.
Reason for revert: <INSERT REASONING HERE>
Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org
Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
For speeds < 7, increase threshold that controls the split
of 16x16->8x8 blocks, for resolutions 720p and higher.
Minor change for speed 5 (since it uses reference partition scheme
which only uses variance partition as first step).
For speed 6: ~0.5% increase in avgPSNR/SSIM metrics on ytlvie set.
No change in speed.
Change-Id: I5126580973201538d8ca26a9256b93c4d11d685b
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.
Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
Add 1 if negative to get dqcoeff to round towards zero.
10-15% faster than converting to positive before shifting.
Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.
Both numbers would improve if the division for dqcoeff could be
simplified.
BUG=webm:1426
Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
This feature is used for the CBR RTC encoding mode
at speed >= 6. This change will exclude it for VBR mode.
For speed 6 live encoding (VBR):
avgPSNR/SSIM metrics on ytlive set up by ~1% (few clips up by 2/3%).
No change in speed.
Change-Id: I1a0dd94c334f7df309ab5a48d477d7e25355b798
* changes:
quantize: ignore skip_block in arm
quantize: ignore skip_block in x86
quantize fp: ignore skip_block in arm
quantize fp: ignore skip_block in x86
This should probably be handled before vp9_regular_quantize_b_4x4 even
gets called.
Fixes an assert resulting from removing skip_block from the quantize
functions.
BUG=webm:1459
Change-Id: I7f52b53f959b4654b3d4517ebda31a678f4d0fde
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.
Add assert() and comments regarding the usage of skip_block.
Removing the parameter is a fairly involved process so leave it be for
the moment.
Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
Despite abs_coeff being a positive value, all the other implementations
treat it as signed which simplifies restoring the sign.
HBD builds cast qcoeff to avoid a visual studio warning. Match
vp9_quantize.c style of casting the entire expression.
Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5
Having a very low "lag_in_frames" value could cause the encoder to create
incorrect / corrupt ARF groups including displayed frames that update the
ARF buffer and false overlay frames that are coded at low rate but are not
actually overlays of a real ARF frame.
This is linked to a reported unit test "slow down" where the chosen parameters
(lag of 3 frames) gave rise to such "broken" ARF group(s).
See also BUG=webm:1454
Change-Id: If52d0236243ed5552537d1ea9ed3fed8c867232c
Having a very small value for "lag_in_frames" can result in
corrupt arf groups including displayed frames that update
the arf buffer and fake overlay frames that are not in fact
overlays of real arfs but are nevertheless starved of bits.
Leaving lag_in_frames at the default of 25 for these 5 frame two
pass VBR tests should now give rise to a valid ARF coding pattern
as follows:- K(ey), A(rf), N(ormal), N, N, O(verlay).
This change is part of a response to BUG=webm:1454 where broken
arf groups interacted badly with a change that corrects for large rate
misses. However, it may still in some cases increase encode time by
virtue of the fact that the unit test now codes a correct coding pattern
with "hidden" ARF frames.
Change-Id: Ifd0246a4c1d0be247247c754024d7a4ed5f66a6b
Some clips in nightly unit test exhibiting significant encoder slowdown which
appears to bisect to Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a.
The above change allowed for emergency iterations of the recode loop and
adjustment of the Q range if there is a large rate miss.
This patch disables the above adaptation for cases of cpu_speed >= 3 or more
specifically where cpi->sf.recode_loop >= ALLOW_RECODE_KFARFGF.
For speeds >= 3 the code does not currently run a dummy bit pack operation
inside the recode loop. Without this dummy pack operation there is no up to
date estimate of the current frame's size to use as a basis for assessing the
requirement for a recode. In practice it was using the previous frames size (or 0
for the first frame) which could cause odd behavior.
If we require the emergency rate correction added in Change-Id: I6923.. for
the higher speed settings it will be necessary to enable the dummy pack
which will in turn hurt encode speed.
BUG=webm:1454
Change-Id: I4fb3c6062ca9508325a6f31582f8e80f1a9b126f
Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files.
Delete some flags that can be enabled during build.
To enable writing denoised YUV, use the following command line:
CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure
--enable-vp9-temporal-denoising
For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'
Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528
The result of the xor operation is unsigned. If coeff was negative,
this results in an unsigned value - INT_MIN.
Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0
Actual frame size and bitrate is all 0 when using SVC sample encoder
with sl = 1 because the stats are set in parse_superframe_index which
will not caculate properly when sl = 1 since there is no superframe.
Use pkt->data.frame.sz instead when sl = 1.
Change-Id: I93f5e98a4c779e32b007e1564ba5396af9e34ad6
Use input with a narrow range because the filter only applies when the
frames are similar.
Run CompareReferenceRandom more times. Especially before narrowing the
input range, the filter frequently did not apply.
Change-Id: Ie249bedf6d0d33dfa5884611cb1835788e418b38
this test fails with the configuration similar to the assembly prior to:
d52cb5972 quantize: copy ssse3 optimizations to intrinsics
BUG=webm:1458
Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf
Change the denoiser frame buffer management for SVC to more generally
handle the layer patterns in SVC (where last is not always refreshed).
This change is only for SVC with denoising and is bitexact.
Change-Id: Ic2b146a924cdf6e7114609158afa3d4880fe3fae
Testing of 4k videos encoded with a fixed arbitrary chunking interval
uncovered a bug where by if a chunk ends 1 frame before a real scene cut,
the next chunk may be encoded with two consecutive key frames at the start
with the first being assigned 0 bits.
This fix insures that where there is a key frame group of length 1 it is
at least assigned 1 frames worth of bits not 0.
See also patch Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a
which by virtue of allowing fast adaptation of Q made this bug more visible.
BUG=webm:1456
Change-Id: Ic9e016cb66d489b829412052273238975dc6f6ab
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()
BUG=webm:1412
Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
With skip block the neon is about twice as fast as C.
The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.
BUG=webm:1426
Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
When adapt_partition_source_sad is enabled (currently only at
speed 6 for resoln <= 360p): use lower subsize (8x8 instead of 16x16)
for nonrd_select_partition on 32X32 blocks.
And force avoiding rectangular partition checks in
nonrd_pick_partition for speed >= 6.
Small increase ~0.5 in metrics for speed 6 on rtc_derf,
no change in speed.
Change-Id: Id751bc8f7573634571b2d6f5e29627cd5cebccae
Prepare for high bitdepth 16x16 idct sse4.1 code.
Just functions moving and renaming.
BUG=webm:1412
Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2
see:
17fae3a Change to use correct check for halfpel
Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
Enable fast adaptation of Q when there is a large overshoot
for the #ifdef AGGRESSIVE_VBR test case.
AGGRESSIVE_VBR is not currently enabled by default.
Change-Id: I7240bb6589795964b6b0b66df4468e4f21504e0f
Originally, for the purpose of keeping a fast first pass, the first-pass
stats between row_mt_mode = 0 and row_mt_mode = 1 are not bit exact, but
that difference is very small that doesn't cause a mismatch between the
final bitstreams. However, if the encoder changes, this minor difference
may cause a mismatch. Thus, this patch always forces the first pass to
be bit exact.
BUG=webm:1453
Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8
For 1 pass CBR mode:
Apply the logic for dropping (and re-adjusting rate control)
due to large overshoot to the case of non-screen content when
drop_frames_allowed is enabled.
For the non-screen content case: add additional condition that
rate correction factor is close to minimum state, and flag to
constrain the frequency of the dropping.
Also handle the case of temporal layers and multi-res encoding.
Add some flags/counters to the layer context for temporal layers.
For multi-res: drop due to overshoot is checked on lowest stream,
and if overshoot is detected we force drops on all upper streams
for that frame.
This feature is to avoid large frame sizes on big content
changes following low content period.
No change in behavior for screen_content_mode = 2.
Change-Id: I797ab236cbbf3b15cad439e9a227fbebced632e6
This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.
The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().
Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
With skip block or coeff < zbin it is about twice as fast as C.
If most coeff values are > zbin it is about 10-15x as fast as C.
BUG=webm:1426
Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. Current condition was based on
the speed level. This change is to make the condition at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).
Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04
This reverts commit c9266b85476aadf078238b7bde3c36bf7953e11c.
Disable source_sad when resolution > 1080P. The test should
pass now.
BUG=webm:1452
Change-Id: I72dde88e66590ff9e41da5e5dd83f5550a83f082
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
This reverts commit aa1c4cd140007ea5b4be99732fbb23d1fd8cf2b5.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
previously the optimized path was skipped in this range
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
Move the source_sad feature to speed 6 (from speed 7), and
add speed feature to switch from the variance-based partition
to reference_partition (which uses nonrd-pickmode for bsize selection)
if source_sad is high.
Currently used only for speed 6 for resoln <= 360p.
About 4-5% improvement on 360p in RTC set.
Some speed slowdown, but still ~30% faster than speed 5.
Change-Id: Ib0330ee5fe9fdd2608aed91359a2a339d967491c
This reverts commit 03f5e300d69d368290305e19cc66bac8b0ea1ff8.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
For 8-bit the subtrahend is small enough to fit into uint32_t.
For 10/12-bit apply:
63a37d16f Prevent negative variance
previously:
47b9a0912 Resolve -Wshorten-64-to-32 in highbd variance.
c0241664a Resolve -Wshorten-64-to-32 in variance.
Change-Id: I181c85f0b9a03da37c2e8b89482d48aa3dbc0aee
Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'
Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12
Backend specific optimization for PPC VSX reads 16 bytes, whereas arm neon /
sse2 only reads <= 8 bytes. Although the extra bytes read are actually never
used, this is not a warrant for groping around. Fixed by allocating more when
building for VSX. This is reported by asan.
Also note - PPC does have assembly that loads 64-bit content from memory - lxsdx
loads one 64-bit doubleword (whereas lxvd2x loads two 64-bit doubleword) from
memory. However, we only have "vec_vsx_ld" builtins that mapped to lxvd2x, no
builtins to lxsdx. The only way to access lxsdx is through inline assembly,
which does not fit well in the origin paradigm.
Refer:
vsx:
vpx_tm_predictor_4x4_vsx @ third_party/libvpx/git_root/vpx_dsp/ppc/intrapred_vsx.c
neon:
vpx_tm_predictor_4x4_neon @ third_party/libvpx/git_root/vpx_dsp/arm/intrapred_neon_asm.asm
sse2:
tm_predictor_4x4 @ third_party/libvpx/git_root/vpx_dsp/x86/intrapred_sse2.asm
BUG=b/63112600
Tested:
asan tests passed.
Change-Id: I5f74b56e35c05b67851de8b5530aece213f2ce9d
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.
Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
0.007% regression on rtc and 0.004% gain on rtc_derf.
1 thread on QVGA,VGA and HD has ~0.2% speed regression while 2 threads has
~0.2% speed gain on Google Pixel.
Change-Id: Ia4a6ec904df670d7001e35e070b01e34149d23dc
Officially the quant structures are 8 elements, with one dc element and
7 repeated ac elements. The low bit depth optimizations take advantage
of this to fill the xmm registers. The high bit depth version manually
duplicates the values.
If all the optimizations were unified, the structure sizes could be
greatly reduced.
Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae
Use only valid values for quantize inputs. These were determined by
looping over vp9_init_quantizer and looking for max and min values.
This allows extending the test to the low bit depth functions which were
not designed to handle all possible inputs but only valid inputs.
Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e
To fix valgrind issueis with SVC tests.
SVC encoding uses prune_evenmore which is causing uinit value.
Will re-enable later when issue is resolved.
Change-Id: I257ff878cf78197ddd813db056582a4d5fe94f44
When content_state_sb is set to LowVarHighSumdiff, don't reset
it to VeryHighSad. Visually better on clips with strong lighting changes.
Small/negligible change in RTC metrics and speed.
Change-Id: I20c383e3c4cf8d1149de5f9260449c0b7cf7c6aa
When int_pro_motion_estimation is done for superblock in
choose_partitioning, use it to avoid the full_pixel_search
for NEWMV mode, if bsize is >= 32X32.
For speed > 7.
Small/neutral change on RTC metrics.
~1-2% speedup on arm on high motion clip.
Change-Id: I3cfe6833ff4bf75d4afa83eaf058ad45729de85b
Although the low bitdepth functions are identical (excepting the need
for larger intermediate values) they do not pass these tests. This
improves the error output to aid debugging.
Simplify buffer usage with Buffer and removing unnecessarily aligned
variables.
eob is a single element and never written using aligned instructions.
BUG=webm:1426
Change-Id: Ic95789a135cf1e8a3846d85270f2b818f6ec7e35
Reduces memory usage, and speeds up encoding for some difficult clips.
No impact on output or metrics.
Ported from aomedia patch:
https://aomedia-review.googlesource.com/c/14501
Change-Id: I26ec69af8336f9e80da486a1cfbfc89a3596954d
This reintroduces the fix:
https://chromium-review.googlesource.com/c/422807/
and later reverted here:
https://chromium-review.googlesource.com/c/447843/
BUG=webm:1355
This time behind a compile time flag :
configure --disable-always_adjust_bpm
configure --enable-always_adjust_bpm
This should make side by side testing easier and let users of the
lib pick which way they want to go.
Change-Id: I7d7b37b83015dc001810af84c132cbc1e71ba8d6
For fixed pattern SVC: keep track of denoised last_frame buffer
for base temporal layer, and if alt_ref is updated on middle/upper
temporal layers, force an update to denoised last_frame buffer.
This allows for improved denoising on top temporal layers.
Change-Id: Icbd08566027d4d2eabc024d3b7a0d959d2f8c18b
This code is unused in vp9. Only vp8 still contains references to
vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.
Remove the remaining sizes and all the highbitdepth versions.
BUG=webm:1425
Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
Denoiser is used in real-time mode which does not use alt-ref.
Reduce memory usage when denoiser is enabled.
Change-Id: I54ba3bcaeeb1818bbdf718ef90e97d4897ff793d
* changes:
sad neon: avg for 64x[32,64]
sad neon: macroize 64xN definitions
sad neon: avg for 32x[16,32,64]
sad neon: macroize 32xN definitions
sad neon: avg for 16x[8,16,32]
sad neon: macroize 16xN definitions
this has been set to max since:
f5c36a5ce VP9: turn on tile-columns and frame-parallel-mode by default
~v1.4.0
Change-Id: Ic796fc05abe73a58700ec50e3f8e72d3462898ec
In the content_state for a superblock is set to HighSad,
use that to bias some decisions in variance partition and
nonrd pickmde: use int_pro_motion for sad computation in
choose_partitioning, and set large_block in pickmode based
on the content_state_sb.
Only affects speed >= 7.
Immprovement for high motion content.
Small gain (~1%) in RTC metrics.
Speedup of ~5 for high motion clip on android (speed 8, 1 thread).
Change-Id: I5774c4854f012b89c8e969f6129b60988c2ce11c
this has been on by default since:
f5c36a5ce VP9: turn on tile-columns and frame-parallel-mode by default
~v1.4.0
Change-Id: I52017ab0157feaf429dce3d9e1af8a53bb5c1b65
the file was empty after the struct removal. the only remaining use was
within vp9_dx_iface, but the wrapper became unnecessary after the
removal of frame_parallel_decode.
BUG=webm:1395
Change-Id: I515ab585d701e77d388d12b2802d844c424f9bcd
This patch attempts to address a bug reported for 4K video.
https://b.corp.google.com/issues/62215394
In this instance a perfect storm of a moderate complexity section
followed by a much easier section where a CGI overlay helped to
suppress film grain noise, followed by a much harder and very grainy
section at the end, cause a massive local rate spike that pushed a chunk
over the upper allowed rate limit.
This patch detects cases where the rate for a frame is much higher than
expected and allows, in this special case, for rapid adjustment of the active
Q range.
For the example chunk in the bug report the target rate was 18Mb/s and the
observed rate was over 37 Mb/s with a surge for the last few frames to over
100Mb/s. This patch brings the overall chunk rate right back down to ~18.2 Mbit/s
and almost completely eliminates the rate spike at the end. (See graphs appended
to bug report)
Also see I108da7ca42f3bc95c5825dd33c9d84583227dac1 which fixes a bug
unearthed during testing of this patch and also has a bearing on high rate
encodes such as 4K.
This patch does have a negative impact on some metrics. Most notably there are
clips in our standard test set where it hurts global psnr (though in many cases it
conversely helps SSIM, FAST SSIM and PSNR-HVS). It is also worth noting that
the clips (and data rates) where there is a big metric impact, are almost all cases
where there is currently a significant overshoot vs the target rate and overall rate
accuracy is greatly improved.
Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a
Local application of:
https://github.com/google/googletest/pull/1066
Suppress unsigned overflow instrumentation in the LCG
The rest of the (covered) codebase is already integer overflow clean.
TESTED=gtest_shuffle_test goes from fail to pass with -fsanitize=integer
Change-Id: I8a6db02a7c274160adb08b7dfd528b87b5b53050
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
Finish the calulations in neon registers. This avoids a potentially
expensive move from neon to gp and allows at least clang to store
directly to memory.
BUG=webm:1424
Change-Id: Idef25eec95f7610947167818e9194bde8b00d282
this makes the function compatible with high-bitdepth and fixes test
failures since:
5ac88162b partial fdct test
Change-Id: Ib630694608237f0c515948942e05dbea259ba338
testing::Range does not include the end parameter in the set of values.
also adjust the start to 2 as the single threaded case is already
covered in another instantiation
Change-Id: Iae3bf3ed4363dd434eccfa5ad4e3c5e553fbee60
For nonrd_pickmode: add condition for checking
intra mode if the sb content state is VeryHighSad.
Reduces artifacts when sudden change in content.
Metrics on RTC/RTC_derf neutral (small gain).
No speed loss observed.
Change-Id: I07006d28fd2dc06c1d06b07630102b0fece50c40
the last frame_worker_owner, row and col references were removed in:
131bd06e6 remove vp9_dthread.c
BUG=webm:1395
Change-Id: Ia7fb2e8782b12a58d2a2263849d20a8abf06aef6
and the related prototypes in vp9_dthread.h. the last references were
removed in:
09dabc58d VP9_COMMON: rm frame_parallel_decode
vp9_dx_iface.c still uses FrameWorkerData
BUG=webm:1395
Change-Id: Ica8e98ae776fc0105f1fbbed9e0a729808980810
creating a thread associated with the sole worker isn't necessary when
only execute() is being used after the removal of frame_parallel_decode.
BUG=webm:1395
Change-Id: I2255ce72607321e5708bc82a632dc6825d4eff5c
Add a method to acm_random.h to generate ranges of values
Add a way to call that method to buffer.h
Adjust dct_[partial_]test.cc to use it.
Change-Id: I8c23ae9d27612c28f050b0e44c41cb4ad2494086
this field has been 0 since:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
BUG=webm:1395
Change-Id: I15448e9401e15329b54c6878dda033b17be5ec6b
VPX_CODEC_USE_FRAME_THREADING was made a no-op in:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
and the tests in this file have been disabled since:
6ab0870d4 disable VP9MultiThreadedFrameParallel tests
BUG=webm:1395
Change-Id: I2c7a250acb65cf9522cf8a7bb724bb92070e41c6
this was made a no-op in:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
and the test hitting this branch has been disabled since:
6ab0870d4 disable VP9MultiThreadedFrameParallel tests
rename the test to VP9MultiThreaded to exercise the tile-based threading
BUG=webm:1395
Change-Id: I35564a75eb5a7d7f7ccb923133b1b07295201f4c
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.
The values could potentially be summed and shifted in place.
BUG=webm:1424
Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
For the 8x8_1, the highbd output fit nicely in the existing function. 12
bit input will overflow this implementation of 16x16_1.
BUG=webm:1424
Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
The function was originally written with HBD in mind. Enable it and
configure the tests.
BUG=webm:1424
Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
* changes:
sad neon: rewrite 64x64 and add 64x32
sad neon: rewrite 32x32, add 32x16 and 32x64
sad neon: rewrite 16x8, 16x16, add 16x32
sad neon: rewrite 8x8 and 8x16
sad neon: rewrite 4x4 and add 4x8
Test the _1 variant of the fdct, which simply sums the block and applies
a modifying shift based on the block size.
BUG=webm:1424
Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468
Existing logic was only affecting resolutions above 720p.
Needs more testing for reducing subpel for speed >= 8.
No change on RTC metrics.
Change-Id: I2f4bf9f25891614aafa9a86aa5a5063a3ccfce4d
This could save some cycles since skin detection is used in multiple
places in vp9.
1~2% speed up on ARM.
Change-Id: I86b731945f85215bbb0976021cd0f2040ff2687c
Split to load_input_data4() and load_input_data8().
Use pack with signed saturation instruction for high bitdepth.
Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
this normalizes these tests with the regular variance ones both in
implementation and test list output
Change-Id: I387aea81456f94b8223b8fb2a28cab94bc1aa9d5
Use the scene detection for CBR mode, and use it to reset the
rate control if large source sad is detected and rate
correctioni fact/QP is at minimum state.
Avoids large frame sizes after big content change following
low content period.
Only affects CBR mode for 1 pass at speeds 5, 6, 7.
Change-Id: I56dd853478cd5849b32db776e9221e258998d874
Fix misplaced cast that caused an overflow and incorrect rate adaptation
behavior for high data rates. This in particular will have affected 4k encodes
but could also have come into play for some higher rate 1080p cases.
In our standard test sets the quality impact is small though several high rate
clips show improved rate accuracy. This can also impact the number of recode
loop hits and on one problem 4k clip the encode time for speeds 0 and 1 was
reduced by >25%
Change-Id: I108da7ca42f3bc95c5825dd33c9d84583227dac1
Use it to limit NEWMV early exit in nonrd pickmode
Small change in RTC metrics, has some improvement
for high motion clips.
Change-Id: I1d89fd955e1b3486d5fb07f4472eeeecd553f67f
this is consistent with other threaded tests and ensures gtest_filters
meant to operate on these pick them up
Change-Id: I99ce53720553a22c4b9905a2882273c2be2c031b
and vp8_fast_quantize_b_impl_mmx; this was never enabled in rtcd
an sse2 version exists so there isn't much reason to keep a mmx
implementation around.
Change-Id: I8b3ee7f46ba194ffa0d0a6225a0f299f2a4dea90
use an int to quiet an unsigned rollover warning similar to:
25110f283 Fix an ubsan warning: vp9_quantizer.c
Change-Id: Iedecb79a17249bc18f10c0920f88cf704920f12b
Adjust the threshold for turning off cyclic refresh for high motion,
and avoid testing golden in nonrd pickmode for speed >= 8 if
golden refresh was long ago.
No change/neutral on RTC metrics.
Change-Id: I40959b8d9637f3553e7458bbabd8c6024c2c09c0
vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are
code relocations, no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
'in' is used for the reference fdct. 'coeff' is input to the idct being
tested and 'dst[16]' is output
Fixes a segfault on unaligned memory access on x86.
Change-Id: I3691b1380ed49986897dd89a63ce63a80a0e0962
this was deleted in:
98967645a Remove vpx_idct8x8_64_add_ssse3()
but this was merged in:
9e03eedf6 Merge changes Ib26dd515,Ie60dabc3
after:
a92991133 Merge "dct tests: run all possible sizes in one test"
which added a new reference
Change-Id: I8da4a6c80d27b237a378ff15eead1daab89e7e25
Don't overide max_gf_interval if it's not specified. It will
be assigned with a default value in vp9_rc_set_gf_interval_range().
BUG=b/62803416
Change-Id: Ide46ce00279ed076865fc54ce98c55a994f0c798
Sample encoder change: reduce max-intra-rate to 1000 and
buf-initial to 600. Paramaters affect target size of key frame.
Change-Id: I2be6bc2927f5fa74e19e1efa3fb574d23a503300
Sample encoder change: reduce max-intra-rate to 1500 and
buf-initial to 700. Paramaters affect target size of key frame.
Change-Id: I01e238378b63eeef28dfc2178baadffcd3cc7561
Adjust some parameters in sample encoder: vpx_temporal_svc_encoder.
Parameters adjusted to set lower QP for initial key frame,
and allow for larger target size on subsequent key frames.
Change-Id: I092ad968e5b51b9f495dadb6ee96e810663c910e
Modify fdct4x4_test.cc to support all size combinations. This does not
add any new tests and in fact fails a few. There were minimal changes
made to the tests so it's not entirely surprising that some of the
larger 12 bit transforms are failing since it was initially only used
for 4x4.
In follow up patches the tests in fdct8x8_test.cc, dct16x16_test.cc and
dct32x32_test.cc will be evaluated and moved to dct_test.cc.
BUG=webm:1424
Change-Id: I72a23430f457d7fae8c91e706adc0e77c25abc8f
Set the base_mv_aggressive for temporal enhancement layers (TL > 0).
Under the aggressive mode, skip the NEWMV depending on the
SSE of the base_mv. Also reduce the subpel motion to 1/2 under
aggressive mode if base_mv is good.
Speedup ~3% with small/negligible loss in quality on RTC.
Affects speed >= 6.
Change-Id: I89341b279cad6da2a04b76d5e726016191dacdb8
It's almost identical with vpx_idct8x8_64_add_sse2(), except little
difference in instructions order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
This was ported from the greedy version in AV1, written by Dake He
(dkhe@google.com).
See:
https://aomedia.googlesource.com/aom/+/master/av1/encoder/encodemb.c#137
Greedy version is disabled by default, but can be picked by setting
USE_GREEDY_OPTIMIZE_B to 1.
To be enabled by default later.
This is both faster and better in terms of compression.
Compression Improvement:
------------------------
lowres: -0.119
midres: -0.064
hdres: -0.405
Speed Improvement:
------------------
(Based on encode time of 3 videos of different difficulties at
3 different target bitrates)
With --cpu-used=0: 0.38% to 5.55% faster
With --cpu-used=1: 0.24% to 2.79% faster
With --cpu-used=2: 0.29% to 1.46% faster
Change-Id: Ia7a23b3b244ad8eb253ac9e43cd03c5e021d2635
* changes:
Update high bitdepth load_input_data() in x86
Clean array_transpose_{4X8,16x16,16x16_2) in x86
Remove array_transpose_8x8() in x86
Convert 8x8 idct x86 macros to inline functions
some build systems have trouble with duplicate basenames.
vpx_dsp/skin_detection.[hc] were added in:
658e85425 Merge skin detection code in vp8/9.
BUG=webm:1438
Change-Id: Ieaa70b40bda409ec23e6d179b47a930ac6243b05
Set subpel prune_evenmore only for non_reference frames,
instead of all TL > 0 frames. Gain some quality back at
cost of small speed loss (~1-2%).
Change only effects SVC encoding at speed >= 7.
Change-Id: I5b9f51e51dccfd7050521a66996176b0415ca3f9
the check for error correction being disabled was overriding the data
length checks. this avoids returning incorrect information (width /
height) for the decoded frame which could result in inconsistent sizes
returned in to an application causing it to read beyond the bounds of
the frame allocation.
BUG=webm:1443
BUG=b/62458770
Change-Id: I063459674e01b57c0990cb29372e0eb9a1fbf342
min_gf_interval should be no less than min_altref_distance + 1,
as the encoder may produce bitstream with alt-ref distance being
min_gf_interval - 1.
BUG=b/38450599
Change-Id: Ifb733daa643ebc668d1b23e1ce92db94b66dabe8
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.
BUG=webm:1424
Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
Keep the 1/4subpel for all frames, use SUBPEL_TREE_PRUNED_EVENMORE
for all temporal enhancement layer frames.
Change-Id: Ibc681acbb6fc75b7b3c57fc483fcb11d591dfc9a
It is initialized to be { INT_MAX, 0, ... } in ffe0f9b.
No effect on encoders.
Make it consistent with other initializations.
BUG=webm:1440
Change-Id: Ie2a180d93626b55914c8c4255e466a1986d2b922
visual studio will warn if a 32-bit shift is implicitly converted to 64.
in this case integer storage is enough for the result.
since:
f3a9ae5ba Fix ubsan failure in vp9_mcomp.c.
Change-Id: I7e0e199ef8d3c64e07b780c8905da8c53c1d09fc
For SVC 1 pass non-rd mode:
Force subpel seach off for SVC for non-reference frames
under motion threshold.
Add flag to svc context to indicate if the frame is not used
as a reference.
Little/no quaity loss, ~2% speedup.
Change-Id: Ic433c44b514d19d08b28f80ff05231dc943b28e9
Speed >=8: for resolutions above CIF, and for low motion content,
set subpel_search_method to SUBPEL_TREE_PRUNED_EVENMORE.
Small speed gain (~2%) on vga clips,
RTC metrics up by ~2-3% on average.
Change-Id: Ie26ba0264589652f92dfe74308740debf94cf0cc
x86 requires 16 byte alignment for some vector loads/stores.
arm does not have the same requirement.
The asserts are still in avg_pred_sse2.c. This just removes them from
the common code.
Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6
Split vp8/vp9 implementations on yv12_copy_frame_c.
Remove high-bitdepth codes from vp8_yv12_extend_frame_borders_c.
Clean up vp8 codes usage in vp9.
BUG=webm:1435
Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d
Fix the condition on usage of source_sad for temporal layers.
FIx allows it to be used for the case of 1 temporal layer.
Change-Id: I02b1b0ade67a7889d1b93cee66d27c0951131fc3
Adjust the max_copied_frame setting for temporal layers.
Keep the same setting for non-SVC at speed 8.
This change also enables copy_partiton for non-SVC at speed 7,
but with smaller value of max_copied_frame (=2).
~2% speedup for SVC speed 7, 3 layers, with little/no quality loss.
Change-Id: Ic65ac9aad764ec65a35770d263424b2393ec6780
Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values to the first pass, it uses vld1_u32()
which typically does impose a 4 byte alignment. However, as the first
pass operates on user-supplied values we must prepare for unaligned
values anyway (and have, see mem_neon.h).
But for the local temporary values there is no stride and the load will
use vld1_u8 which does not require 4 byte alignment.
There are 3 temporary structures. In the C, one is uint16_t. The arm
saturates between passes but still passes tests. If this becomes an
issue new functions will be needed.
Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
The sub pixel variance uses a temp buffer which guarantees width ==
stride. Take advantage of this with the 4x and avoid the very costly
lane loads.
Change-Id: Ia0c97eb8c29dc8dfa6e51a29dff9b75b3c6726f1
For aq-mode=3: refactor the condition for turning off
the refresh. Add some adjustments for high motion content.
No/little change in RTC metrics, only affects high motion case.
Change-Id: I7da8eabfb0e61db014be4562806f72ee5ef4a43b
When temporal layers are used, only allow for copy partition
on the top temporal enhancement layer frames.
Change-Id: I5472abdc0f9f6c8dafa75a7a84c615e08ae22af8
Only affects speed 8.
Make changes to copy partition to fix a bug in setting microblock
offset. Avg PSNR shows 0.02% gain on rtc_derf and 0.08% loss on rtc.
Change-Id: I61c3e5914dde645331344388e7437e5638acd4f3
The modified error was a derivative of the "coded_error"
that was used to allocate bits between different frames on the
assumption that the allocation should be linear in terms of this
modified error. I.e. a frame with double the modified error score
should all things being equal get double the number of bits. The
code also included upper and lower caps derived from input
VBR parameters.
This patch improves the initial calculation of the clip mean error
(now called "mean_mod_score" as it is no longer a prediction error)
used as the midpoint for the rate distribution function and normalizes
the output "modified scores" scores such that 1.0 indicates a frame
in the middle of the distribution. The VBR upper and lower caps are
then applied directly to a frame's normalized score.
This refactoring is intended to make it easier to drop in alternative
distribution functions or to base the rate allocation on a corpus wide
midpoint (rather than the clip mean).
Change-Id: I4fb09de637e93566bfc4e022b2e7d04660817195
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.
BUG=webm:1422
Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
Advise the compiler that the store is eventually going to a uint8_t
buffer. This helps avoid getting alignment hints which would cause the
memory access to fail.
Originally added as a workaround for clang:
https://bugs.llvm.org//show_bug.cgi?id=24421
Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e
Add PartialIDctTest::PrintDiff() to help debugging.
In RunQuantCheck, try all combinations of +/-mask_ input for 4x4 idct.
Update PartialIDctTest::InitInput().
Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4
Most existing first pass stats are stored in a form normalized to a
macro-block scale. However the error scores for intra / inter etc were
stored as frame level values but mainly used as MB level values.
This change fixes that. Normalized per MB values make comparisons
between different formats easier and in any case this is usually what is
wanted.
An change in results should be limited to slight differences in rounding.
*** Change after patch 8 +2 requiring new approval.
Final pre-submit testing showed one 4K clip with above expected change.
Investigation showed this was due to a value used to test for ultra low intra
complexity in key frame detection. This was a per frame not per MB value but
also did not scale with frame size. Replacement with a small per MB value
(based on original per frame value and cif frame size) resolved the KF detection
problem.
Also converted kf_group_error_left to a double in line with other error values
to reduce rounding problems in KF group bit allocation
All clips and sets now show nominal (or 0) change as expected.
Change-Id: Ic2d57980398c99ade2b7380e3e6ca6b32186901f
This reverts commit 0d88e15454b632d92404dd6a7181c58d9985e2a2.
Reason for revert: chromium builds are failing to locate vpx_rv during dlopen()
dlopen failed: cannot locate symbol "vpx_rv" referenced by "libstandalonelibwebviewchromium.so"
Original change's description:
> Add visibility="protected" attribute for global variables referenced in asm files.
>
> During aosp builds with binutils-2.27, we're seeing linker error
> messages of this form:
> libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
> symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
> object
>
> subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
> Other messages refer to symbol references from deblock_sse2.o and
> subpixel_sse2.o, also assembled from asm files.
>
> This change marks such symbols as having "protected" visibility. This
> satisfies the linker as the symbols are not preemptible from outside
> the shared library now, which I think is the original intent anyway.
>
> Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
>
TBR=jzern@google.com,johannkoenig@google.com,rahulchaudhry@chromium.org,builds@webmproject.org
Change-Id: I0c2ea375aa7ef5fda15b9d9e23e654bb315c941b
This reverts commit 3704807805895850671261fa4470f1ce41dd2ac9.
Reason for revert: <INSERT REASONING HERE>
Does not look to be the cause of the test failures.
Original change's description:
> Revert "vp8: Real-time mode: reduce mode_check_freq thresh for speed 10."
>
> This reverts commit 4a7424adba7a65766a92635dc67b6e7d94646693.
>
> Reason for revert: <INSERT REASONING HERE>
> Possibly causing test failures in roll into chromium.
>
> Original change's description:
> > vp8: Real-time mode: reduce mode_check_freq thresh for speed 10.
> >
> > Reduces quality regression at speed 10 for real-time mode.
> >
> > Change-Id: I9f624bea9ca262dab32ce9de7d6d91175d6becc8
> >
>
> TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
> # Not skipping CQ checks because original CL landed > 1 day ago.
>
> Change-Id: I1defcb74e78a5a3bd29b7d1b21a96a79fa26a457
>
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: I13d86a2a68b8aa8c0c7465e6e58cff0e00bc7862
This reverts commit 4a7424adba7a65766a92635dc67b6e7d94646693.
Reason for revert: <INSERT REASONING HERE>
Possibly causing test failures in roll into chromium.
Original change's description:
> vp8: Real-time mode: reduce mode_check_freq thresh for speed 10.
>
> Reduces quality regression at speed 10 for real-time mode.
>
> Change-Id: I9f624bea9ca262dab32ce9de7d6d91175d6becc8
>
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Change-Id: I1defcb74e78a5a3bd29b7d1b21a96a79fa26a457
Move the tran_low_t helper functions to a new file. Additional
load/store functions will be added here.
Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3
During aosp builds with binutils-2.27, we're seeing linker error
messages of this form:
libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
object
subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
Other messages refer to symbol references from deblock_sse2.o and
subpixel_sse2.o, also assembled from asm files.
This change marks such symbols as having "protected" visibility. This
satisfies the linker as the symbols are not preemptible from outside
the shared library now, which I think is the original intent anyway.
Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
Increase the partition and acskip thresholds for temporal
enhancement layers.
~1-2% speedup, with negligible loss in quality.
Change-Id: Id527398a05855298ad9ddac10ada972482415627
For SVC 1 pass non-rd pickmode, the interpolation filter for the
upsampling of the golden (spatial) reference was not being explicitly
set and instead was takin gwhatever value was set in the previous
mode/block (which would be either EIGHTTAP or EIGHTAP_SMOOTH).
Fix it to the default EIGHTTAP for now, to be updated/selected
adaptively in a later change.
Minor adjustmemt to rate targeting thresholds in datarate unittests.
Change-Id: I52085048674072c6cfb7163e11e9a2658d773826
A more detailed explanation of the experimentation
leading to this change can be found in:-
https://docs.google.com/a/google.com/document/d/13lsYhxgPyxUHvEess6wg9nikaonIZKY9Ak_Lpafv5Mo/edit?usp=sharing
This change gives gains across all our standard test sets for
overall psnr, ssim, fast ssim and psnr-HVS.
Values expressed as % reduction in bitrate.
Low res set -0.257, -0.192, -0.173, -0.101
Mid res set -0.233, -0.336, -0.367, -0.139
High res set -0.999, -1.039, -1.111, -0.567
NetFlix 2K set -0.734, -0.174, -0.389, -0.820
Netflix 4K set -0.814, -0.485, -0.796, -0.839
Change-Id: Ie981fb3c895c9dfcfc8682640d201a86375db5c8
Speed up for speed 0.
Reduce 10+% of encoding time for hdres in speed 0,
with less than 0.1% PSNR loss.
Compute total difference of previous and current frame context probability
model. If the diff is less than the threshold, skip recoding the frame.
Borg test (positive number means performance loss):
lowres midres hdres
PSNR: 0.030 0.032 0.065
Local speed test: bitrate set at 1200
blue_sky pedestrian rush_hour
Encoding time: -10.0% -16.5% -16.5%
Change-Id: I4e2d200ea3115d48b2c3e890143596b31b8ef9e9
Introduced append situation in Commit 0178d97 which could be
confusing. Clean a little bit and add some comments.
Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2
When the noise estimate is forced off due to large motion,
reset the counter and set smaller window for next estimate.
Change-Id: Ifa4ec95396134173a00d48353ad52f1b6a40c217
Add option in SVC to set the filter type and phase for
the frame level downsampling filters.
For 3 spatial layers: set downsampling filter type to bilinear
and set phase to 8, for lowest spatial layer.
Change-Id: Id81f4b1ba93db19c1cd37b6a46d1281a2c61bc43
Makes more sense to call the corresponding partial idct C function
instead of the full idct C function as the reference.
Change-Id: Ibb7681dd063edd6307ba582c10c26c4c6a4b78c6
Base the condition on the resolution of the spatial layer.
And remove restriction on scaling factor.
Change-Id: Iad00177ce364279d85661654bff00ce7f48a672e
Read in a Q register. Works on blocks of 16 and larger.
Improvement of about 20% for 64x64. The smaller blocks are faster, but
don't have quite the same level of improvement. 16x32 is only about 5%
BUG=webm:1422
Change-Id: Ie11a877c7b839e66690a48117a46657b2ac82d4b
For lowest spatial layer, in 3 layer SVC, set the
downsampling filtertype to get averaging filter.
Needed for reducing aliasing on low-res layer,
small increase in overall encoder time.
Change-Id: Ia31460123bd91b72eca49b46dd924b9f226d4563
An intended behavior change disabling exhaustive searches in speed
feature causes VP9/DatarateTestVP9LargeDenoiser.4threads test failure.
Change the threshold to make it pass.
BUG=webm:1429
Change-Id: Ibcbe2314c6b2525799894f5d7204fc8eb4ec2a1e
Adjust thresholds for noise estimation, for resolutions above VGA.
Tends to push cleaner/low noise clips to LowLow state.
No change in RTC metrics.
Change-Id: I739ca6b797d0a60ccd1c6c6a2775269b1f007e5e
Set noise level to kLowLow for high motion low res clips.
Change the normalization in noise metric for low res.
Reduce the initial time-window for all resolutions.
Change-Id: Iaed39dbb50b205cd9c735dc5b84822304fb01987
Add support for everything except block sizes of 4.
Performance is better but numbers will improve again when the variance
optimizations land.
BUG=webm:1423
Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
When a neon version is available it will be called. This allows
decoupling the variance implementations and has no real downside. For
most configurations, the call will be #define'd to the neon
implementation.
Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712
When the width is equal to 8, process two rows at a time. This doubles
the speed of 8x4 and improves 8x8 by about 20%.
8x16 was using this technique already, but still improved a little bit
with the rewrite.
Also use this for vpx_get8x8var_neon
BUG=webm:1422
Change-Id: Id602909afcec683665536d11298b7387ac0a1207
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.
When comparing the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. The 8x16 is very
close, but the existing version appears to be faster.
BUG=webm:1422
Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
Add 31bit pairs before unpacking in x86 block error code
AVX2 code provides a very minor performance improvement.
BUG=webm:1210
Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
For SVC 1 pass real-time: add condition to skip the
golden (spatial) reference mode in non-rd pickmode.
Condition is to skip golden if the sse of zeromv-last mode
is below threshold. And change order in ref_mode_set_svc
to make sure golden zeromv is tested after last-nearest.
Speedup ~3-4% with little/negligible quality loss.
Change-Id: I6cbe314a93210454ba2997945f714015f1b2fca3
Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by 30 times.
Fix the call sites to use the RTCD function.
Delete sse2 and mips implementation. They were based on a previous
implementation of the filter. It was changed in Dec 2015:
ece4fd5d2247c9512b31a93dd593de567beaf928
BUG=webm:1378
Change-Id: I0818e767a802966520b5c6e7999584ad13159276
Don't force disabling of adaptive_rd_thresh for realtime when
row_mt_bit_exact is set.
Row based adaptive rd is made usable in CL
454882(https://chromium-review.googlesource.com/c/454882) for REALTIME.
Change-Id: Ief023414f0fd6eb86f299dd46ae58f4436875af5
Add tentative max cpb size values for levels 5.2 and up. Otherwise
encoding will fail when targeting for these levels.
Change-Id: Ib7e0ba4b9836ea1ac900b6822543812843d48463
107de19698 changes the encoder alt-ref selection behavior. Assuming
min_gf_interval = max_gf_interval = 4, the frame order would be
frm_1 arf_1 frm_2 frm_3 frm_4 frm_5 arf_2 before 107de19698;
frm_1 arf_1 frm_2 frm_3 frm_4 arf_2 frm_5 after 107de19698.
This patch reverts such alt-ref placement change.
Change-Id: I93a4a65036575151286f004d455d4fcea88a1550
Make some speed setting changes for temporal enhancement layers,
and remove the switch in subpel_force_stop for the aggressive_base_mv
in non-rd pickmode.
Gain some 2-3% speed with little/negligible quality loss.
Change-Id: I3e2a7f80ff45f38c0a6ceb01b34dbca2f53edbf0
For speed >= 8 and color_sensitivity not set, skip the transform
skipping test in UV planes.
Add a new condition to check noise level to skip chroma check
for speed >= 8 if y_sad is high.
1~2% speedup on ARM for speed 8.
Borg tests show neutral results in both rtc and rtc_derf.
Change-Id: Idecd3ff6e28c97757a43bb6f3a7082c85f72109c
Add a low-variance high-sumdiff to the superblock content state
and use it to limit the mv and bias some decisions in non-rd pickmode.
Only affects speed >= 6.
Reduces artifact for lighting changes.
Small/no difference in metrics on RTC set.
Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b
This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain(at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses(7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
A previous patch turned on allow_exhaustive_searches feature only for
FC_GRAPHICS_ANIMATION content. This patch further modified the feature
by removing the exhaustive search limit, and made it no longer adaptive.
As a result, the 2 counts that recorded the number of motion searches
were removed, which helped achieve the determinism in the row based
multi-threading encoding. Tests showed that this patch didn't cause
the encoder much slower.
Used exhaustive_searches_thresh for this speed feature, and removed
allow_exhaustive_searches. Also, refactored the speed feature code
to follow the general speed feature setting style.
Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa
The more aggressive settings should only be used when denoise_svc
condition is satisfied (which means top spatial layer).
Change-Id: Ia8e3515b27f31bf21b1976ca80a2fa826daece3a
In non-rd pickmode (speed >= 5), avoid duplication of computations in
model_rd_for_sb_y when the speed feature use_simple_block_yrd is
enabled (or for high bitdepth build under certain conditions).
QVGA, VGA and HD have 1.23%, 2.68% and 1.7% speedup on ARM for speed 8,
respectively.
Encoding results are bitexact for speed >= 5.
Change-Id: I3f9130810c21439f5ad7e159e21cb2243dcd05f1
Re-enable the SVC tests, wrap the non-zero expectation
in GetMismatchFrames around #if CONFIG_VP9_DECODER.
Change-Id: I0e8a2d78b868c32f18fe597540f397d3a1b303b5
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.
Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
The allow_exhaustive_searches feature improves the encoding quality
of FC_GRAPHICS_ANIMATION content a lot. For non-FC_GRAPHICS_ANIMATION
content, the quality test result is almost neutral. This patch makes
this feature to be used only for FC_GRAPHICS_ANIMATION content.
The motivation of doing that is to make this feature no longer adaptive,
which will be implemented in the following patch.
Change-Id: Ic911df6dd757402b6480789cc247801e99840369
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is casted to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
The scaling filter with zero shift will give sub-sampling for
2x downsampling. Allow for a phase shift to get an averaging filter.
Usage is for source scaling in 1 pass SVC mode for 1:2 downscale.
Reduces aliasing in downsampled image.
Keep the phase to 0/off for now.
Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
Disable the 1 pass CBR SVC tests with temporal_layers > 1.
Issue with the commit 863f860, which will cause encoder/decoder
mismatch due to skipping encoder loopfilter for non-reference frames.
Will re-enable the tests when fixed.
Change-Id: I74918a0045a17976b069c4be63fbeb921974df0d
This condiiton is not needed as key_frame should set the refresh
of the reference frames, but good to have for clarity in condition.
Change-Id: Icf9838e7e4f0ff5cf0a9562ae3b5d6c7e6f78702
Modify the frame flags to update the ARF on top layer,
for the tests:
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayersFrameDropping
This is needed to fix the encode/decoder mismatches caused by 863f860,
and removed in the revert e9b7f98.
Change-Id: I6b9fecfdd17315fc0179e29949338c77636026c0
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"
Also re-enable asserts on the C version.
BUG=webm:1390
Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
This reverts commit 863f860bfcf3bdc26eeecb299aa481d0f63d11ac.
This causes encoder / decoder mismatches in various
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests
BUG=webm:1408
Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14
For low resolutions (<= CIF): use quarter-pixel and simple_block_yrd.
~5% gain on RTC_derf.
~6-7% slowdown on ARM.
Change-Id: I4439ebd1116b9decac04786503f978840b68a60c
Fix to avoid getting stuck at very low Q even
though content is changing, which can happen for --min-q=0.
Fix is to more aggressively increase active_worst_quality
when detecting significant rate_deviation at very low Q.
Change will only affect 1 pass VBR for --min-q < 4, so no
change in ytlive metrics for --min-q >= 4.
Change-Id: I4dd77dd7c08a30a4390da0ff2c8bda6fccfa76d7
Useful for SVC, where the top layer enhancement frames may
not update any reference buffers, as is the case for the
patterns in the 1 pass CBR SVC when #temporal_layers > 1.
~3% encoder speedup for SVC patterns with temporal layers
in 1 pass CBR mode.
Updated the SVC datarate tests for the mismatch frames.
Set the frame-dropper off in some tests with #temporal_layers > 1
so we can correctly set #mismatch frames. Adjusted rate target
threshold for tests where frame-dropper was turned off.
Change-Id: Ia0c142f02100be0fed61cd2049691be9c59d6793
Provides over 15x speedup for width > 8.
Due to smaller loads and shifting for width == 8 it gets about 8x
speedup.
For width == 4 it's only about 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
The MV unit test revealed an integer overflow issue in vp9_mcomp.c.
This was caused if the MV was very large. In mv_err_cost(), when
mv->row = 8184, mv->col = 8184 and ref_mv is 0, mv_cost = 34363
and error_per_bit = 132412, causing the overflow.
BUG=webm:1406
Change-Id: I35f8299f22f9bee39cd9153d7b00d0993838845e
Set adaptive_rd_thresh to 2 when simple block yrd is not used.
Fix regression caused by computing y sad without
int_pro_motion_estimation on low res motion clips.
Overall 0.07% quality loss on rtc_derf.
Change only affects low res on speed 8.
Change-Id: Ic6a188a56529f1034d6431005fb4b0e24e8a7e27
For speed 5, 1 pass CBR: Don't use the nonrd_pick_partition
on the segment, rather use choose_partitioning followed by
nonrd_select_partition (as is done on base segment).
Little/no quality loss on RTC and RTC_derf (< 0.3%),
speedup of at least 5%.
Change-Id: I5273d5f950e60adf5e437b4ca8c4f63964641e83
If the noise estimation is avoided due to large motion,
the last_source for denoising should still be updated.
Change-Id: I67155ea7dbe9ac2785978e64a27bdafd7d57aac0
To reduce refresh on partial super-blocks on boundary,
for noisy input. Reduces some artifacts on noisy input.
Change-Id: I10b5808a296874e08c7f378b3df58466591d8dbe
Edit
Move the condition for effectively disabling the denoising
for speed 5 into the vp9_denoiser_denoise().
This is cleaner, and also moving the condition into vp9_denoiser_denoise
will keep the denoiser buffer updated with the current source.
This allows for more consistent behavior if speed is changed midstream.
Change-Id: Ia001f591c56e454bf724c3ae73c024badb183ef8
To prevent the motion vector out of range bug, added a motion vector unit
test in VP9. In the 4k video encoding, always forced to use extreme motion
vectors and also encouraged to use INTER modes. In the decoding, checked if
the motion vector was valid, and also checked the encoder/decoder mismatch.
The tests showed that this unit test could reveal the issue we saw before.
Change-Id: I0a880bd847dad8a13f7fd2012faf6868b02fa3b4
For 8-bit the subtrahend is small enough to fit into uint32_t.
This is the same that was done for:
c0241664a Resolve -Wshorten-64-to-32 in variance.
For 10/12-bit apply:
63a37d16f Prevent negative variance
Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
Temporal denoiser runs in non-rd pickmode, so it is only used
for speed >= 5. Regression exists for speed 5, due to use of
reference_partition (which use non-rd pickmode for partitioning).
Avoid denoising for now at speed 5.
Change-Id: I74a74d2e1404d7cfd33dcf4ec06dd2e503256cf0
Base the low_content_frame metric on the motion vectors,
and adjust the logic for preventing golden update.
Small change in behavior: small positive gain (~0.2-1%) on clips
with high activity.
Change-Id: I0b861c8e9666cd82b45cde5ee57ee8a1e5ab453c
Code cleanup: merged two functions that were doing postencode
update for cylic refresh, remove some unused code and fix comments.
No change in behavior.
Change-Id: I9be0d7e346d34dec29bf4e5bb380a7bf81c8480a
BUG=webm:1397
(yunqingwang)
To verify that this patch wouldn't cause much performance change,
the Borg tests were run. Here was the result:
avg_psnr overall_psnr ssim
hdres: -0.002 0.006 0.013
midres: 0 0 0
lowres: 0 0 0
Change-Id: Iae395ae7b741e0513cf5bab9dcace110b792a67d
The row mt sync read uses sync_range = 1, and wouldn't work if we want
to use a sync_range that is greater than 1. To make it work, this sync
read code is modified. Pass in col instead of col - 1 to make it
consistent with other row mt code in VP9, and then add 1 in "while"
codition.
Change-Id: I4a0e487190ac5d47b8216368da12d80fec779c1a
Issue/bug happens for denoising with spatial layers, where
the golden (spatial) reference is used in pickmode, but
denoising is only done wrt to last (temporal).
Fix is to make sure set_ref_ptrs is set before build predictors
in denoiser.
Change-Id: I793cf441341edf7c4a88b8ab1e1b22b3cb0eb508
Temporary override to condition for disallowing intra-search in SVC,
since golden (spatial) reference is currently suppressed due to
artifact issue.
Change-Id: I28ed7fdddc9fcdbcc0a4175a247a3ecc94c11767
For non-rd variance partition, avoid the chrome check
unless y_sad is below some threshold.
Small decrease in avgPSNR (~0.3) on RTC set.
Small/negligible decrease on RTC_derf.
Change-Id: I7af44235af514058ccf9a4f10bb737da9d720866
Refactor to split the 1 passs source sad computation into scene
detection (currently used for VBR and screen-content mode), and
superblock based source sad computation (used in non-rd CBR mode).
This allows the source sad computation for CBR mode to be
multi-threaded.
No change in compression.
Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388
d207/d63/d45/d117/d135/d153
~9-45% better depending on the predictor on 32-bit ARM, similar range on
x86-64
this matches the non-highbitdepth implementation
BUG=webm:1316
Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
Make the source_sad feature work properly for cases of VBR or
screen_content with SVC.
Added unittest for SVC with screen-content on.
Change-Id: Iba5254fd8833fb11da521e00cc1317ec81d3f89b
tolerate a NULL hist being passed as a result of invalid parameters
passed to init_rate_histogram(). this fixes a divide by zero in
init_rate_histogram() with an invalid fps.
BUG=webm:1383
Change-Id: Id203e0f3b18d67a4a09aaf206abcce4708f966ec
Since y_sad is not computed yet (on the early exit due to source_sad),
no need to check for setting color_sensitiviy.
Only affects speed >=8. No change in behavior.
Change-Id: I3a6f2d20fed38d8b8ec51b75bcacf9a21f2db916
Allow for simple_block_rd for VGA resoln, and reduce
adaptive_rd_thresh to 1.
On average no loss on RTC set, ~4% speedup on mac.
Change-Id: Ib549c4061c853776062b5e34040f839d470fbebc
Change tests to reflect use. Input sizes will be 8 or 16 (but not
necessarily square).
filter_weight is capped at 2 and filter_strength at 6
Speed test, disabled by default.
Change-Id: Idfde9d6c4b7d93aaf0e641b0f4862c15e2a2af7a
Change it to row based array to avoid the slow down cause by sync.
row-mt on, speed 8, 2 threads: ~4% speedup for VGA on ARM benefited
from adaptive_rd_threshold.
Change-Id: I887e65a53af20a6c4f48d293daaee09dab3512cf
- Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi
compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
operations.
Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
Add additional condition to split to 16x16, for resolutions <= 360p,
reduces dragging artifact near moving boundary.
Small/no change on RTC metrics.
Change-Id: I314694f2166435d918f74e7ab42f002b07f40dae
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turnoff GF search in non-rd pickmode.
Only enabled for speed >= 8.
avgPNSR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.
Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
The sum of tx bloxk eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.
After the sum of eobs are calculated correctly, re-enabled
ml_partition_search_early_termination speed feature.
Re-did the quality/speed test to check the impact of the fix.
1. Borg test BDRATE result:
4k set: PSNR: +0.183%; SSIM: +0.100%;
hdres set: PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;
2.Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.
The result is in line with the original result.
Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.
Measured a 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
Don't denoise spatial layer frames whose base layer is a key frame.
Disallow golden reference for SVC with denoising on frames
that will be denoised (highest layer), as this removes bad artifact.
Will re-enable when issue is resolved.
Change-Id: I87a6597812330500966458172acfce54af65f70f
Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.
Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2
Similar issue as Change bc1c18e.
The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in final stage of pass
2, when changing the test number from 1,000 to 1,000,000.
Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.
Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.
Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.
The patch was rebased to the top-of-tree.
Borg test BDRATE result:
4k set: PSNR: +0.144%; SSIM: +0.043%;
hdres set: PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;
Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.
Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
Fixes an issue when the LAST and golden is not used as a reference,
in which case its possible no encoding mode is set (since intra may be
skipped under certain codtions). Fix is to make sure intra is searched
if no inter mode is checked.
Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c
Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.
since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance
BUG=webm:1384
Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.
https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code
BUG=webm:1054
Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
Enable row-mt for SVC for real-time mode, speed >=5.
Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.
For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about ~5% speedup.
Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.
Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes
BUG=chromium:697956
Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
The 2 thresholds(i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch made this
feature to be frame size dependent consistently throughout the code.
Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
Most are cosmetics changes.
Speed has no change with clang 3.8, and about 5% faster with gcc 4.8.4
Tried the strategy used in 8x8 and 16x16 (which operations' orders are
similar to the C code), though speed gets better with gcc, it's worse
with clang.
Tried to remove store_in_output(), but speed gets worse.
Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
Add ppc, ppc64 and ppc64le on all_platforms and ARCH_LIST
Add VSX flags and check for -mvsx
Define empty setup_rtcd_internal
Add Altivec detection based on:
http://freevec.org/function/altivec_runtime_detection_linux
Detect VSX at runtime when enabled
Change-Id: I304f4d8c5fee0ff19b6483cd2e9cc50d6ddec472
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.
BUG=webm:1301
Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
Fix the conditon for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.
No change in quality.
Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
This may fix the time out failure of valgrind tests in nightly
since more coverages were added on row-mt.
Change-Id: Id9414e66d1a266602c7495243d9f5cb69e17ccdc
clear the entire array on error. the size used previously was equal to
the number of elements.
BUG=webm:1364
Change-Id: I2f2e16ed6e867f41d4774a5a8ac9cedaee11ce46
Reduce the level from 4 to 2.
This gives ~1-2% quality gain on RTC set, with small decreaee in speed (~1-2% on mac).
Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.
Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
Moves the def from vpx_encoder.h -> vpx_codec.h. The defined value
is changed as part of this move.
Adds the value to decoder capabilities when CONFIG_VP9_HIGHBITDEPTH.
Change-Id: I7d61fc821cda29f1e32bb9b2b9ffd3d83966e419
This reverts commit d3db846cc50b1b0a9f6efcbe2b36c9c1943bc528.
This change causes a large drop in psnr (4-5db) on low framerate
difficult content (tested at 360/480p)
BUG=b/35804225
Change-Id: I8e90012d3b9c8a0cddb062ba93b01b36c0e0c0a0
From commit:
https://chromium-review.googlesource.com/c/441393/
On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.
Small change in RTC metrics and speed.
Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level paralellism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.
Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
Re-organized the encoder threading tests and grouped tests into
4 parts. Added PSNR checking test to make sure the PSNR variation
is within a small range.
BUG=webm:1376
Change-Id: I09edb990236a87a4d2b2b0e1ceaf6c6435a35eff
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implemention was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.
In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.
Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.
BUG=webm:1371
Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.
Only affects 1 pass CBR.
Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664
For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.
Disabled for now.
Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a
The output needs to be aligned. Input is read with 'movq' not 'movqda'
so it is not expected to be aligned.
Change-Id: Ibd48a84c1785917a6a97c3689a05322abba486b4
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.
Small change on metrics for rtc set: less than ~0.1 avgPNSR decrease
on RTC set, for both speed 7 and 8.
Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.
For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).
This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.
Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754
Modified the code to facilitate bit-match tests in first pass
Added unit-tests to test the row based multi-threading behavior for bit-exactness
Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df
This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here his that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.
The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)
Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c
vp9[_highbd]_quantize]_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.
scan is used for C code and iscan is used for SIMD implementations.
Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
* changes:
quantize_fp_32x32 highbd ssse3: enable existing function
quantize_fp highbd ssse3: use tran_low_t for coeff
quantize_fp highbd sse2: use tran_low_t for coeff
- Replace the corresponding assembly code.
- No user level speed performance degrade.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
This was created as part of the quantize_fp_ssse3 change. Both
functions use the same source file with different macro parameters.
Change-Id: I267050a559426a85955d215aa0aaca270439c5ab
The previous implementation confused bit/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
The weight segment needs to only be computed once per frame,
so remove it from the funciton vp9_cyclic_refresh_rc_bits_per_mb(),
which is called within a loop inside vp9_rc_regulate_q.
Change-Id: Ia0e18b89abb97e42c466d4dbc47700d7f76555db
vp9_compute_qdelta_by_rate has almost 2% overhead in profiling on Nexus 6.
Reduce the calling of that function in speed 8 by estimating the delta-q.
Both rtc and rtc_derf show little/no change in avg psnr/ssim.
Encoding speed is 2~3% faster on Nexus 6.
Change-Id: If25933715783f31104a18a5092ea347b1221b5f5
This small change replaces the frame boost check in the arf group
length break out clause with a test against a prediction decay value.
The boost value is in fact partly dependent on the decay value but
this change means that the per frame boost calculation can be adjusted
without influencing the group length calculation.
The value chosen gives a close match on all the test sets with the previous
code (on average) but it was noted that a lower threshold was slightly better
for 1080P and up and a slightly higher value for small image sizes.
Change-Id: I4d5b9f67d5b17b0d99ea3f796d3d6202fd61ee0c
The function scale_sse_threshold() returns a threshold scaled
if necessary for use with 10 and 12 bit from an 8 bit baseline.
SSE error values would be expected to rise for the 10 and 12
bit cases where there are more bits of precision.
Hence the threshold used for the test should also be scaled up.
Change-Id: I4009c98b6eecd1bf64c3c38aaa56598e0136b03d
When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.
Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
Clears up static clang analysis warning regarding divide by zero.
Trying to explain to the compiler how it's impossible to avoid
incrementing num_blocks at least once is difficult.
Change-Id: Ibaae43be572e5cd7a689b440dcd341c17d33443b
Where clang static analysis or gcc -Wmaybe-uninitialized warns of
uninitialized values, assign 0 to ints, MB_MODE_COUNT to
MB_PREDICTION_MODE, and B_MODE_COUNT to B_PREDICTION_MODE.
Assert that the modes have been changed from the invalid value by
the end of the function.
Change-Id: Ib11e1ffb08f0a6fe4b6c6729dc93b83b1c4b6350
While the new-mt mode is enabled(namely, allowing to use row-based
multi-threading in encoder), several speed features that adaptively
adjust encoding parameters during encoding would cause mismatch
between single-thread encoded bitstream and multi-thread encoded
bitstream. This patch provides a set_control API to disable these
features, so that the bit match bitstream is obtained in the unit
test.
Change-Id: Ie9868bafdfe196296d1dd29e0dca517f6a9a4d60
broken since:
c3f095c8b Merge "Fix to avoid abrupt relaxation of max qindex in recode path"
5f21aba4b Fix to avoid abrupt relaxation of max qindex in recode path
the original change pre-dated the addition of .clang-format
Change-Id: If5e399d9a805bcad9147360b13b36fbc8c560a7c
VBR method that allows a wider Q range for the first normal frame
in each ARF group and then centers the min - max range for the rest of
the arf group on the chosen Q value for that first frame.
This allows for quite rapid adjustment of the active Q range even if the
initial estimate is poor.
In some cases where the ARF frames themselves are tending to
undershoot but the normal frames are overshooting this can still give
net undershoot. This can be corrected by allowing a larger Q delta for
arf frames but is usually is a sign that the allocation to the arfs was to
high.
Change-Id: Icec87758925d8f7aeb2dca29aac0ff9496237469
Temporary fix until optimization work for block_yrd is completed.
This essentially reverts back to the state before the change:
https://chromium-review.googlesource.com/c/433821/
Compression loss is about ~5-6% on RTC set.
Speed-up (from using this simple/model-based block_yrd) over the low
bitdepth builds (which uses more complex block_yrd) is ~5% on 720p.
Change-Id: Ie0af9eb0d111e5595f587870c44f08317403b8d8
this prevents a rollover when tv_sec is a long:
signed integer overflow: 2776 * 1000000 cannot be represented in type
'long'
Change-Id: I03dc4476ee122b02e2856dad28358a20cf16a9f8
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
Add factor to increase varianace partition and ac skip thresholds,
under certain conditions (noise level and sum_diff), to increase
denoiser speed.
Change-Id: I7671140ef3598bf5f114a72623d68792bcd7b77b
Threshold for partitioning only affects VGA and lower res.
0.07% quality regression is observed in borg tests on rtc_derf
and 0.2% regression on rtc.
5.6% speed up for low res and 6.8% for VGA on Nexus 6.
Change-Id: If85a2919b48c991de66059c90f32ed06980452be
Fixed the following issue.
..\test\vp9_ethread_test.cc(69): warning C4805: '|=' : unsafe mix of type 'bool' and type 'int' in operation [C:\src\buildbot\test-libvpx\tests\dveCPjwhBE\.build-x86_64-win64-vs10\test_libvpx.vcxproj]
..\test\vp9_ethread_test.cc(69): warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning) [C:\src\buildbot\test-libvpx\tests\dveCPjwhBE\.build-x86_64-win64-vs10\test_libvpx.vcxproj]
Change-Id: I37f897cf12a0b7500d2fcbac9e4615f08a83fdb4
Modified the encoding stage to have row level entry points with relevant
initializations and to access the token information at row level
Change-Id: Ife10e55a7c1a420ee906d711caf75002688d9e39
* changes:
hadamard highbd ssse3: use tran_low_t for coeff
hadamard highbd neon: use tran_low_t for coeff
hadamard highbd sse2: use tran_low_t for coeff
Clears up static clang analysis warning regarding a dead store. Only
declare 'c' when it will be used.
Change-Id: I1ac0fc7f94bc44da63938c63cd1efcd6b95e0eb3
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.
The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.
The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.
BUG=webm:1365
Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
Clears a clang static analyzer warning where 'cols' is assumed to be
less than 0, preventing the for loop from executing.
The assembly already requires that the size be 8 or 16 (U/V or Y plane)
and cols is a multiple of 8.
Change-Id: Ica4612690ead1638c94cfe56b306e87f8ce644f9
In non-rd pickmode: Allow speed 7 to also use larger block size in
model_rd. Small change in behavior for speed 7.
Change-Id: I8c5523e424308e8f0bc71b3f6324dec42a464cc8
In non-rd pickmode: small change in behavior for speed 6 and 7.
Remove condition on HIGHBITDEPTH flag.
Change-Id: I360a13fcc313d72612fe9b918162ef4bb278cdea
Add Buffer features for:
Setting the buffer to the output of an ACMRandom function.
Copying a buffer.
Comparing two buffers.
Printing two buffers.
Change-Id: Ib53fb602451a3abdcee279ea2b65b51fbc02d3df
Skip denoising for blocks < 16x16, and for block = 16x16
skip denoising for low noise levels and width > 480 for now.
Allow for some speed-up in denoiser.
Change-Id: Ib46cefe4741962d145fa08775defea3a9c928567
(yunqingwang)
1. Rebased the patch. Incorporated recent first pass changes.
2. Turned on the first pass unit test.
Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee
Increase the qp-delta, mainly for low resolutions,
excluding case of very low bitrates.
avgPSNR/SSSIM gain of ~3-5% on rtc_derf set.
Small change on rtc set.
Change-Id: Ice03d04bd0340404d1957666ef154fd64fed0606
Affecting only speed 8.
Speed tests on Nexus 6 show 4% faster for QVGA and 2.4% faster for VGA.
Little/negligible quality regression observed on both rtc and rtc_derf sets.
Change-Id: I337f301a2db49a568d18ba7623160f7678399ae1
(Yunqing)
This patch added the missing initialization in temporal filter.
Borg test BDRate results:
PSNR: -0.019%(lowres); -0.013%(hdres);
SSIM: -0.001%(lowres); -0.010%(hdres).
Other q values gave comparable but no better results.
Change-Id: I7ad0c18b39e6f558342688e2fe1e12fdb133ce9b
This currently runs 1000 * 1000 = one *million* times which is quite
unnecessary. It's one of the slowest items in Jenkins and takes over an
hour for each of the larger transforms.
Change-Id: I01653b5e610683e1a2d778ec60cf5065562ab8db
Only for speed >= 7, and affects skipping of intra modes.
Threshold is set low for now, needs to be tuned.
Small/no difference in metrics on rtc clips.
Change-Id: If9bdbd43f08d1f80407cdd2e9e5e96780dcd2424
Added the multi-threaded first pass encoder unit test in VP9. The test is
to check if the new multi-threaded first pass encoder(namely, new-mt = 1)
still generates matching stats. In the unit test, the new-mt mode will be
turned on once the multi-threaded first pass implementation is checked in.
Change-Id: Ic21bb1a55c454f024cfd2b397a4c148cfe638218
For short_circuit set to level 1, skip newmv for 64x64 blocks if the
low temporal variance flag is set. Also modify threshold for 64x64 split
in variance partitioning.
Overall speed-up on noisy clips of 2-4%.
Only affect speed >= 7.
Change-Id: I384b3772007e84de6f8707e480d2ddf1fe1f907d
Avoid quality loss when copying partition of superblock with large motions.
Maximum consecutively copied frames can be set (currently 5).
Change-Id: I11c30575514f02194c0f001444cf4021609e5049
Also set the flag to 1 when exit early choosing 64x64 block
such that skipping new mv for golden works in these scenerios.
Change the size of prev_segment_id to the number of superblocks
to save memory.
Borg test shows quality regression of 0.012% on average PSNR
and 0.035% on SSIM.
Change-Id: I5014224c8617d439d35c66ece3fed9ae30b31d23
Adds an optional output framestats.csv file that prints comparions
per-frame instead of averaged over the entire clip. It prints
per-channel and combined metrics for SSIM and PSNR.
Change-Id: Id28dfade27bc5775b59a9d83cfe8b37d1d52b686
The fix relaxes the max qindex based on the data from previous loop of
coding if output frame size is greater than maximum frame size allowed
Change-Id: Iac1f63ec67559d68766e090a7cbb80b812b2560f
before calling vp9_apply_encoding_flags() which may crash if the
resolution was invalid. this is the same change as:
c0523090b vp8e_encode: check validate_config return
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1315288
Change-Id: Icd2aab322422e83d3a778fca6d7789e5000239d7
If enabled will compute source_sad for every superblock on every frame,
prior to encoding. Off by default, only on for speed=8 when
copy_partition is set.
Change-Id: Iab7903180a23dad369135e8234b7f896f20e1231
Used with --framestats=file.csv. Currently prints raw codec QP (not
internal 0-63 range) and bytes per frame.
Change-Id: Ifbb90129c218dda869eaf5b810bad12a32ebd82d
Avoid many visual artifacts. Compression quality is improved by more
than 1%. Encode speed is about 4% for QVGA and 6% for VGA faster on
android.
Change-Id: I4dd0a81429ddf7efdef1e80a191da5fb8de8e8af
Renames SSIM to VpxSSIM as an upscaled weighted SSIM metric, then prints
Y, U and V channels unweighted as well as a weighted but not scaled SSIM
score that's 8/1/1 parts Y/U/V (same as VpxSSIM).
Change-Id: Iff800cc8f145314eeb1a9b4af1e11a25bec095ca
For speed 8, it speeds up the encoding on android by 6% for QVGA and
7.4% for VGA with the new threshold. Overall PSNR is improved by 0.667
for rtc.
Change-Id: I4a644560b32c0b5b4e9f49ffb953d000413a3732
If enabled denoiser will only denoise the top spatial layer for now.
Added unittest for SVC with denoising.
Change-Id: Ifa373771c4ecfa208615eb163cc38f1c22c6664b
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.
Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
When aq=3 mode is on and the gf_cbr_boost is set: make sure golden frame
is always refreshed, and don't incorporate segement cost in qp setting
on the boosted golden frame.
Better performance on RTC set with gf_cbr_boost on,
for example with gf_cbr_boost=50, gains from ~0.5-3%.
Change-Id: Ie811f5e4d444ff3320bd6e2c1745b2c4c09a8460
Quality improved by 1.866 and 0.386 for two noisy clips (dark720p and
marcooffice720p), respectively.
Change-Id: Ib33a7672ae9ca53da156208f7cd13f07b5543e44
Avoid the qp-clamping on gf/alt frame if gf_cbr_boost_pct is set.
Change only affect CBR mode when gf_cbr_boost_pct is set.
Change-Id: I0655ed4f2b047c8ed1ed33a070c17960ad776704
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
Speed test shows 25% gain on vpx_idct16x16_256_add_neon(),
and vpx_idct16x16_10_add_neon() got trippled.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
Increase the boost threshold below which GOLDEN update will use same
rate correction factor as INTER_NORMAL.
Improves performance when gf_cbr_boost_pct is set (between 0 and 100)
in CBR mode.
Change-Id: I9f54cc18664786a100b13a416b7137ae03bd0cab
Set short_circuit_low_temp_var to 3 for speed 8 for all res.
No strong visual difference on all clips.
Change-Id: Ia6d9a314291ab1c14d5421bbdd769974083aeb2a
Constraints on encoder config:
-target_bandwidth is no larger than 80% of level bitrate limit
-target_bandwidth * (1 + max_over_shoot_pct) is no larger than
88% of level bitrate limit
-min_gf_interval is no smaller than level limit
-tile_columns is no larger than level limit
Constraints on rate control:
-current frame size plus previous three frames' size is no larger
than the CPB level limit
-current frame size is no larger than 50%/40%/20% of the CPB
level limit if it's a key/alt-ref/other frame.
Change-Id: I84d1a2d6d6e3c82bfd533b3309ce999cfaba2c8b
The source sad could be used to copy the partition without going into
choose_partitioning function to speed up vp9 encoding. Computing source
sad takes little time. Speed test on Android and Linux shows little
encoding time gain (less than 1.4%).
Turned off for now since partition copy is turned off.
Change-Id: I61c9d5b8f22329760cb29a4ee30a7f9c232ce8d3
vp9: Set short circuit to level 3 for VGA for speed 8. Also change the
threshold_32x32 to 5/8*thresholds[1] to improve quality regression
caused to VGA clips.
Change-Id: Ia1590e91e7cb22be78d5b85013387bb1be4272e3
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.
Also cast the update of bits_in_buffer_model_, as this can
go negative now due to the buffer underrun.
This fixes the issue in #1352.
BUG=webm:1350
BUG=webm:1352
Change-Id: Ibd4ef23921daf09e5c15b000aca904aa4573599c
For out-of-range cases, returned UINT_MAX instead of INT_MAX in the
sub-pixel mv search to be consistent with the "uint32_t" return type.
Change-Id: I8e206d771228c13d89bafbbe9f14722c8ecc6a7a
The function 'vp9_find_best_sub_pixel_tree_pruned_more' is modified
to return INT_MAX for handling invalid MV cases from UINT32_MAX.
yunqingwang:
patch 3: rebased on top of the tree.
patch 4: The return type of vp9_find_best_sub_pixel_tree* was changed
to uint32_t to fix ubsan warnings. Changing UINT_MAX back to INT_MAX
was not quite right. Patch 4 modified vp9_temporal_filter.c to accept
uint32_t.
(Note: Inconsistency exists in vp9_find_best_sub_pixel_tree*, which
will be fixed in a separate CL.)
Change-Id: Ib1a79dc2aa41ea6335c21669c76883cdbb7e0535
This reverts commit f0b491a52405abb1b3dbb6b2c74dd6a4c7a7ddb1.
This change results in unsigned integer overflows (as reported by
-fsanitize=integer) in datarate_test.cc,
for many of --gtest_filter=VP9/DatarateOnePassCbrSvc.OnePassCbrSvc*:
unsigned integer overflow: 167198 - 185560 cannot be represented in type
'unsigned long'
As the encoder didn't change, but the input with the change to
(correctly) use Y4mVideoSource, this revert is merely masking the issue.
BUG=webm:1352
Change-Id: Iecd9a6c83b3fca67c566732a5c92d36193cc2060
Fix compile warnings about implicit type conversion for
target=armv7-android-gcc in vpxenc.c.
BUG=webm:1348
Change-Id: I9fbabd843512f2a1a09f4bb934cd091e834eed9c
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.
BUG=webm:1350
Change-Id: I73c88b800cdcc06bd2f900f7b7e2a5fd08248065
When source frame is altref, we only do zero-mv mode, so we can skip
the find_predictors(). No change in compression.
Small speed gain, ~1%.
Only affects 1 pass vbr with lookhead altref, for ytlive with
the macro flag USE_ALTREF_FOR_ONE_PASS on.
Change-Id: I9318c5da8521f017bf54919cd652438b3a6313d1
Remove superfluous test. Produces a small improvement in instruction scheduling.
Measured a 1% to 1.5% reduction in execution time for routine vp9_optimize_b
with different compilers.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I2bf248d4c25fc0256147d7a8766ff9108ae9cba3
Add feature to copy partition from the last frame.
The copy is only done under certain conditions that SAD is below threshold.
Feature is currently disabled, until threshold is tuned.
Feature will be initially used for Speed 8 (ARM).
Under extreme case of always copying partition for speed 8:
Encode time is reduced by 5.4% on rtc_derf and 7.8% on rtc.
Overall PSNR reduced by 2.1 on rtc_derf and 0.968 on rtc.
Change-Id: I1bcab515af3088e4d60675758f72613c2d3dc7a5
Simplify address arithmetic on token_costs to reduce the number of generated
instructions that are used for address arithmetic inside routine
vp9_optimize_b. It also helps improve instruction scheduling depending on
compiler and optimization level.
Measured a 9.3% reduction in retired instructions and 5.3% reduction in
execution time for this routine with GCC v4.8.4 and optimization flags -O3,
and a reduction of up to 11.6% in execution time with other compilers.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6098650fb5cd2aa04e014fe6e68ca20761f3a21f
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c enable vpx_idct32x32_1024_add_neon in hbd builds
+ give the variable a more descriptive name
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
Correctly set interp_filter to SWITCHABLE for INTRA mode.
Also reduce threshold on noise level for re-evaluating zeromv.
Change-Id: Id32c01e193209fb380aa07204f0be3babf29f70a
For when denoising enabled: change condition to enable
the recheck_zeromv_after_denoising for only very high noise level.
This is causing an issue, so enabling it for very high noise
to effectively shut it off.
Change-Id: Ic40d6025f3f398338cedd270d17c0ccd9a3daa84
The new test is causing valgrind failures:
[ RUN ] SSE2/VpxPostProcDownAndAcrossMbRowTest.CheckCvsAssembly/0
==28923== Invalid read of size 16
28923== at 0x724016: ??? (deblock_sse2.asm:146)
Disable during investigation. The test is new but the code is not.
Change-Id: I5521e5fd48a595e3798b833bf7e3cc97b81c1975
To avoid decode performance hit of 2% when running on hyperthreaded
cores.
This patch only uses the mutex's when we are running tsan.
This is safe because 32 bit operations like read and store are atomic
on all the platforms we care about. Tsan warns about race situations,
but in this case either situation ( read occurs before write or write
before read) the worst case is that we go around one extra time in the
loop. So the ordering doesn't really matter.
That said a few other things have been tried :
for instance as per here:
webrtc/base/atomicops.h#52
In this patch they use:
__atomic_load_n(i, __ATOMIC_ACQUIRE);
__atomic_store_n(i, value, __ATOMIC_RELEASE);
This code works on gcc, clang ( replacing protected write and read), and
avoids tsan errors. Incurring no penalty in performance. In C11 its
replaced by straight atomic operands.
However there is no equivalent in the visual studio's we support as
int32 on all windows platforms is already atomic. To avoid tsan like
warnings on windows we'd need to use interlocked exchange and the
end result doesn't gain us any thing.
Change-Id: I2066e3c7f42641ebb23d53feb1f16f23f85bcf59
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
The flag USE_ALTREF_FOR_ONE_PASS allows for alt-ref lookahead
in 1 pass vbr (from https://chromium-review.googlesource.com/#/c/365498).
This change is to make sure this macro flag only has effect if
the config flag cpi->oxcf.enable_auto_altef is also on.
No change in ytlive encoding, as USE_ALTREF_FOR_ONE_PASS is not
yet enabled.
Change-Id: I1a69681e4a15c5244581a3dab4587fca08f02e0f
Reapply this patch:
ff0107f Amend and improve VP8 multithreading implementation
Amended the patch to add a unit test, and fix an asan error.
BUG=webm:851
Change-Id: I6572c03256169c64e80248bf5a5e99f59a2fc93c
use_base_mv assumes 2x2 scaling, so fix is to shutoff
this feature unless spatial scale factors are 2.
Added svc unittest for 2 spatial layers with 5x5 scaling,
which generates the issue without this fix.
Also fix some settings in svc unittest:
let the speed setting vary (from 5 to 8), and enable static threshold.
BUG=webm:1344
Change-Id: Idfd0a6c633c21b49a0479601506302cfe974e30e
Set #threads to default 1 for all streams, change bit allocaton
for 3 temporal layers, and enable denoiser on middle resolution layer.
Change-Id: I4a57adbfdb2c319002b8f3cf359613842dc00d75
after:
2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
One of the first pass stats "new_mv_count" is no longer used in VP9,
and is removed. This also makes it easy to implement a multi-threaded
first pass. This change doesn't affect the coding performance, which
has been verified by borg tests.
Change-Id: I4c7c7bf9465fda838eb230814ef0c631c068c903
1. Use correct projections when copying real dct/quant outputs.
2. Remove local random number generator and combine loops.
3. Quantization with minimum allowed step sizes instead of maximum.
This may generate larger inputs.
Change-Id: I154afc26230c894d564671cff4b8fd5485b69598
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
Use the segment weight factor based on the target (cr->percent_refresh)
if it less than the current estimate (avergae of past usage and target).
Small improvement at low bitrates.
Change-Id: Iba8fd909e203f94458901366d3a991f7ea854d49
the expansion of findstring and rtcd_dep_template_CONFIG_ASM_ABIS needs
to be deferred until the block is parsed as makefile syntax rather than
eval time where rtcd_dep_template_CONFIG_ASM_ABIS will be unset. this
ensures vpx_config.asm is properly created.
Change-Id: I7c38c6c082da78397936467482789dd468adc316
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
force this to be created before any other .S files. this change
additionally removes the file from the source list as it doesn't need to
be compiled on its own.
Change-Id: I6b4cd56ef6059d08f75f06fb749cddf76e0e165e
Increase the motion threshold and qp-delta for segment#2 boost.
This can increase the frame-drop at low bitrates, but generally
better spatial quality.
Only affects real-time mode with aq-mode=3, at very low bitrates.
Change-Id: I5ccb784667f70d0c27d369806b93b1f93d5605d1
For some filter level, the C/MSA doesn't match SSE2. Part of unit tests
are disabled. They will be re-enabled when C/MSA funcs are fixed.
BUG=webm:1321
Change-Id: Ib16b98b5eecb15d2252aa4ea267b782ee2b27533
replace downloads.webmproject.org with the canonical
storage.googleapis.com/... form. this appears less likely to fail when
dealing with multiple concurrent connections.
Change-Id: I0dcbd04df9e4057fa851f458b3ef7e3589f1f2f1
best_sub8x8[1] won't be used meaningfully when is_compound is false, but
may trigger an msan warning as the value is copied around and later
clamped.
BUG=667044
Change-Id: Icc24c3b72cdb550bebea44d4aaa4ff8bf3fbab56
Use the same feature as https://chromium-review.googlesource.com/#/c/411327/,
but allow it to be used for speed = 6 and 7, where
short_circuit_low_temp_var = 1.
Speed up of ~2-3% for speed 7, with little/no loss in compression.
Change-Id: I263a0f261ad9929034392d68f0153dc6376fdb5f
Remove unnecessary "virtual" before some functions. Change *_btm_* in
variable names to *_bottom_*.
Change-Id: Ifd4ce667537617f451cdfed47dd8c48817fd983b
VpxEncoderThreadTest was taking a very long time for some runs and
timing out a lot. This is an attempt to split the test into runs
that can be run nightly ( speeds 2 through 9) and runs that can
be run weekly ( speeds 0-1 ).
Change-Id: Iee6f61a561006d3a30381dd3b52b9a4dce07a70c
This is a boolean value that is written into bitstream, any value other
than 0 or 1 could have led to unexpected behavior. This commit fix the
issue by adding validation of the value to make sure it is boolean.
BUG=webm:1339
Change-Id: I2d3e69e8dbefcab9a0db9cb39a91a40ce531c5a1
fixes reloc errors like:
R_X86_64_PC32
vpx_dsp/x86/deblock_sse2.o:
requires dynamic R_X86_64_PC32 reloc against 'vpx_rv' which may overflow
at runtime
Change-Id: I218fc0e7c8258197f890d395f335e5a4fe82dccb
tests with 'Large' in the name are reserved for slow running tests which
may not be run on all platforms
Change-Id: I2a7d6dd46b29b50469893e46433844132fb727c2
This runs multiple encodes and decodes of vp8 and vp9 in parallel,
with so many threads that problems with synchronization can show up.
Change-Id: I2b297e7f43d1e741323c7ad9f50a3931ae609f16
Add a new, more aggresive short circuit: short_circuit_low_temp_var = 3 to skip
golden of any mode when variance is lower than threshold for low res.
This change only affects speed = 8, low resolution.
Metrics for avgPSNR/SSIM on rtc_derf (low resolution) show loss of
0.27/0.31%.
On Nexus 6, the encoding time is reduced by ~2.3% on average across all
low-res clips.
Visually little change on rtc_derf clips.
Change-Id: Ia8f7366fc2d49181a96733a380b4dbd7390246ec
this removes the need for __STDC_LIMIT_MACROS which is defined in
vpx_integer.h, but may be preceded by earlier includes of stdint.h;
fixes build with the r13 ndk
Change-Id: I3950c8837cf90d5584a20ce370ae370581c2182c
avoids the definition of min/max macros in headers that may appear in
c++ unit tests. the codebase uses VPXMIN/MAX for this purpose in any
case
Change-Id: I2b679b045d64fb34fd8780f704e3caf10a758d82
use 'android/cpufeatures' rather than 'cpufeatures'; this matches the
documentation, fixes compilation with r12b/r13 and still works with
older ndks.
Change-Id: I2f34233c164e6d4d46428f8905d5502cea4288a2
Changes only affects speed = 8 for low resolutions.
Metrics for avgPSNR/SSIM on rtc_derf (low resolutions) show loss of
0.5/0.6%.
On Nexus 6, the encoding time is reduced by ~5.9% on average across all
low-res clips.
Visually little/no change on rtc_derf clips.
Change-Id: I68dd50e558d72dcc1af8317d224bfae5e3bd872d
This commit enables asymptotic closed-loop encoding decision for
the key frame and alternate reference frame. It follows the regular
rate control scheme, but leaves out additional iteration on the
updated frame level probability model. It is enabled for speed 0.
The compression performance is improved:
lowres 0.2%
midres 0.35%
hdres 0.4%
Change-Id: I905ffa057c9a1ef2e90ef87c9723a6cf7dbe67cb
this was enabled in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
+ enable it for all NEON configs, both intrisincs and assembly versions
exist
BUG=webm:1294
Change-Id: I339088b2a398200f95658d040034fb9b2a7c8ce0
usage of the vp8 versions was removed in:
3f72509 vp8: remove VP8_SET_DBG* control support
vp9 had the usage stripped even earlier.
Change-Id: I978142eb6492552cd29c9c6feb1e89acfc5f7b84
this was enabled in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
+ enable it for all NEON configs, both intrisincs and assembly versions
exist
BUG=webm:1294
Change-Id: Iaade219e9d1de7b69423670d3ea6271b0965e068
idct4x4 and idct8x8 were universally enabled for high-bitdepth builds
in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
BUG=webm:1294
Change-Id: If142afb169c48728cc4b222e7c41aa4a63f95f0f
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
For noisy content, be more aggressive in skippping some blocks for
delta-qp to reduce noise pulsing artifact. Also treat frame boundary
case when dimension is not multiple of superblock size/64.
Only affects non-screen content case, and when source noise
is measured to be high (at least level kMedium).
Change-Id: Ib13a2a20ed1ce37ff3c44d95c3ef2635fd695222
This uses the same sdx4df pointers as vp8_diamond_search_sadx4 and
should therefore target the same optimizations.
See e4ddf9db6a37eee59c079f5ae427643ae3424fcf
Change-Id: Ic298e9b25c34bbe6b7a0799509355b0addb56675
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.