Avoids a segfault in high-bitdepth builds.
This restores the condition to its state prior to:
7991241 vp9: Change the scheme for modeling rd for bsize 32x32.
BUG=webm:1250
Change-Id: I6183d5b34cb89dfbf27b7bb589812148a72cd7de
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some conditions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.
PSNR effect on different speed settings:
speed 8 rtc: 0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc: 0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop
Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed 8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed 8
BUG=webm:1250
Change-Id: I818babce5b8549b4b1a7c3978df8591bffde7173
Use quotes whenever possible and {} always for variables.
Replace multiple set_all calls with *able_feature().
Change-Id: If579d3f718bd4133cf1592b4554a8ed00cf9f2d3
decoder_peek_si_internal could potentially read more bytes than
what actually exists in the input buffer. We check for the buffer
size to be at least 8, but we try to read up to 10 bytes in the
worst case. A well crafted file could thus cause a segfault.
Likely change that introduced this bug was:
https://chromium-review.googlesource.com/#/c/70439 (git hash:
7c43fb6)
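A minimal sketch of the kind of guard this fix calls for, not the actual
libvpx change: every read is checked against the bytes that remain, rather
than against a single up-front minimum size. The names byte_reader and
read_bytes are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over an input buffer; not a libvpx type. */
typedef struct {
  const uint8_t *data;
  size_t size;
  size_t pos;
} byte_reader;

/* Copies n bytes into out and advances the cursor.
 * Returns 0 on success, -1 if fewer than n bytes remain. */
static int read_bytes(byte_reader *r, uint8_t *out, size_t n) {
  assert(r != NULL && out != NULL);
  if (r->pos > r->size || n > r->size - r->pos) return -1;
  memcpy(out, r->data + r->pos, n);
  r->pos += n;
  return 0;
}
```

With an 8-byte buffer, a request for 10 bytes fails cleanly instead of
reading past the end.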
BUG=chromium:621095
Change-Id: Id74880cfdded44caaa45bbdbaac859c09d3db752
When building without multithreading and for a non-arm, non-x86 system,
ctx is unused.
Cleans up -Wextra warning:
unused parameter ‘ctx’ [-Werror=unused-parameter]
Change-Id: Ifddff89d2ebd45f7d71e3d415a8f2415dd818957
'duration' is not used in realtime-only mode:
Cleans up -Wextra warning:
unused parameter 'duration' [-Wunused-parameter]
Change-Id: I827dfe59ebcdc72c5a93fdf7e5aca063433914b1
In vp9_pick_inter_mode(), instead of using
vp9_get_pred_context_switchable_interp(xd) to assign filter_ref,
we use a less strict condition on assigning filter_ref.
This is to reduce the probability of entering the flow of not
assigning filter_ref and then skipping filter search.
Overall PSNR gain 0.074% for rtc dataset
Details:
Low Mid High
0.185% -0.008% -0.082%
Change-Id: Id5c5ab38d3766c213d5681e17b4d1afd1529e676
Allows building simple targets with sane default flags.
For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
--platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
BUG=webm:1143
Change-Id: I06f5a7564f5382cf1a4bad41aef4308566c53adf
Speed tests show that the new vertical filters are slower on Celeron
Chromebooks. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical filter code is activated. For now, simply activate the code
that shows no degradation on Celeron. Later there should be two sets of
ssse3 vertical filter functions, with the jump table choosing between them
based on CPU type.
Change-Id: I37e3e9c5694737d9134a6bce6698d3e43f8fc962
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some conditions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.
PSNR effect on different speed setting:
speed 8 rtc: 0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc: 0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop
Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed 8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed 8
Change-Id: I902f62def225ea01c145d7e5a93497398b8f5edf
Due to the rounding used in the computation, the HBD variance computation
may produce slightly negative values. This commit adds clamping to make
sure the output variance values for 10 and 12 bit are non-negative.
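As a rough illustration, assuming the variance is formed as the sum of
squared errors minus the squared sum divided by the pixel count, the clamp
could look like the sketch below. The helper name clamped_variance and its
parameters are hypothetical, not the libvpx function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: variance from an already-rounded sum of squared
 * errors (sse) and sum of differences (sum) over `count` pixels.
 * Independent rounding of sse and sum can push the difference slightly
 * below zero, so the result is clamped at zero. */
static uint32_t clamped_variance(uint32_t sse, int sum, int count) {
  assert(count > 0);
  const int64_t var = (int64_t)sse - ((int64_t)sum * sum) / count;
  return var >= 0 ? (uint32_t)var : 0;
}
```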
Change-Id: Id679aa55a4c201958c4c7d28cd8733b9246a71c8
This commit adds an encoder workaround to support better
compatibility with a non-compliant hardware vp9 profile 2 decoder.
The known issue with this decoder is:
The decoder assumes a wrong value, 127 instead of the correct
value of 511 and 2047, for any assumed top-left corner pixel in
UV planes for 10 and 12 bit, respectively. Such assumed
top-left corner pixel is used for INTRA prediction when a real
decoded/reconstructed pixel is not available, e.g. when it is
located inside the row above the top row or inside the column
to the left of the leftmost column of a video image.
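The quoted values follow a simple pattern: one less than the mid-grey
level for the bit depth. The sketch below is inferred from the numbers in
this message (127, 511, 2047) rather than copied from the encoder, and the
function name default_top_left is hypothetical.

```c
#include <assert.h>

/* Value assumed for a missing top-left neighbour at the given bit depth:
 * (128 << (bit_depth - 8)) - 1, i.e. 127, 511 and 2047 for 8, 10 and
 * 12 bit. The non-compliant decoder described above uses 127 for all. */
static int default_top_left(int bit_depth) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  return (128 << (bit_depth - 8)) - 1;
}
```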
Change-Id: Ic15a938a3107e1b85e96cb7903a5c4220986b99d
decoding is done if the decoder is available, with errors handled
accordingly. the encoded frame count should be sufficient for this test.
+ remove HandleDecodeResult() as it's redundant given the base
implementation
BUG=webm:1233
Change-Id: I513c1c3475c58a746f4df627491bdc392fe21416
development has moved to the nextgenv2 branch and a snapshot from here
was used to seed aomedia
BUG=b/29457125
Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It removes two memset operations on quantized
and dequantized coefficient sets. This improves the unit speed
by 10%.
Change-Id: I23f47c6e14582520a7f952f03ce8f72183e7f0e6
Each time a codec is enabled or disabled with the umbrella
--enable-vpN flag, set the encoder and decoder configurations as well.
This was done as a post-processing step but doing that lost the order of
the arguments.
BUG=webm:1205
Change-Id: Ic629bfdd06acc04bc5a7227309f36bba54dad8b1
Since combining VPX_DL_REALTIME with VPX_RC_FIRST_PASS is basically
nonsense, ignore the user's pass setting when this happens and
behave as if the requested encode is a single pass encode.
BUG=webm:1233
Change-Id: I5ee4c4e5838c4ca6d24988890aae490b10826db2
The logic can be incorporated into configure.sh
Removes a dependency on ios-version.sh which was not part of DIST-SRCS
and removes a warning from 'make dist' sub builds:
../src/build/make/configure.sh: line 787:
../src/build/make/ios-version.sh: No such file or directory
Change-Id: Ic38314708eb278dd9d2a9769a670da32f6126637
This value is signed in vp9/10
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (cpi->frames_since_golden == (cpi->current_gf_interval >> 1))
~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: Ie137724982f3a46c8c1820548c1960d62a4e96f2
left_above_mv and above_block_mv return as_int
as_int is defined as uint32_t in vp8/common/mv.h
Cleans up -Wextra warnings:
signed and unsigned type in conditional expression
this_mv->as_int = col ? d[-1].bmi.mv.as_int : left_block_mv(mic, i);
^
this_mv->as_int = row ? d[-4].bmi.mv.as_int : above_block_mv(mic, i, mis);
^
left_mv.as_int = col ? d[-1].bmi.mv.as_int :
^
Change-Id: Ia043764e4ce93d2152d2269b1c7b28b5d5f814cf
This commit changes the code to use int64_t to represent the sum of pixel
differences, which can be negative.
This fixes a number of ubsan warnings.
BUG=webm:1219
Change-Id: I885f245ae895ab92ca5f3b9848d37024b07aac98
Use ~15 instead of 0x..F0
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (((cm->Width + 15) & 0xfffffff0) !=
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
comparison of integers of different signs: 'unsigned int' and 'int'
((cm->Height + 15) & 0xfffffff0) !=
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
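A small sketch of why ~15 avoids the warning: the literal 0xfffffff0 has
type unsigned int, so the masked expression becomes unsigned, while ~15 has
type int and keeps the whole expression signed. The helper name align_to_16
is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Rounds a non-negative int up to the next multiple of 16.
 * The mask ~15 is an int, so the result can be compared with other
 * signed values without a sign-compare warning. */
static int align_to_16(int value) {
  assert(value >= 0 && value <= INT_MAX - 15);
  return (value + 15) & ~15;
}
```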
Change-Id: Iac25839cde3425b7b9db7f33740dc46a551b7546
For VBR: (1) allow newmv mode for golden ref to
select interpolation filter (as in last ref case), and
(2) don't use the more aggressive tx-skip testing logic for large blocks.
Only affects 1 pass real-time vbr mode (speed >= 5).
PSNR/SSIM metrics on ytlive set are all positive, ~0.5-2% gain.
Change-Id: I0ffbb0a9755563a5acd6230c58236e4f19a47266
This change is only for real-time mode if short_circuit_low_temp_var
is on. Add bias to last frame in choosing ref frame for partitioning,
when y_sad and y_sad_g are close. It speeds up real-time encoding by
0.5% on some clips with less than 0.1% overall PSNR drop on rtc test set.
Change-Id: I2a2110fe36455f3d8f0fc404aef2228f512e8df8
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
int n = (int)VPXMIN(sizeof(clear_buffer), data_end - data);
^ ~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~
Change-Id: I964355ceae6b39e22c0196294b25e28387f84945
Defined as unsigned in VP8_CONFIG
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (cpi->oxcf.number_of_layers != prev_number_of_layers)
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
Change-Id: I969e64cd2bfda6e61c564476dbd35b892b177646
The vpx_roi_map_t and vpx_active_map_t structures use unsigned rows
and cols but VP8_COMMON uses signed values for mb_rows and mb_cols.
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
~~~~ ^ ~~~~~~~~~~~~~~~~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
Change-Id: If1f118c20ffefd2530fbd371e6787cc8a6c31f0a
Mode is signed
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (ctx->oxcf.Mode != new_qc)
~~~~~~~~~~~~~~ ^ ~~~~~~
Change-Id: I5cf81c40b103e688a31e1339511f5c9eb27edd38
1. Skip golden non-zeromv and newmv-last for bsize >= 16x16 if the
temporal variance obtained from choose_partitioning is very low.
2. Skip horz and vert INTRA mode for speed 8.
This change works best on the clips with little noise and with some
motion (e.g. gips_motion which has > 5% speed up). PSNR drop is 1.78%
on rtc test set, no obvious visual quality regression found.
Change-Id: Ib43b5b20e67809d03c5a6890818ddff59e1fc94a
Move initialization of some new "twopass" values
to the function vp9_init_second_pass() and some other
small changes.
Remove #if GROUP_ADAPTIVE_MAXQ as this is always
enabled now.
Change-Id: I1dbec2fd7c419779848aa987c4cd7824d4df8456
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
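The reported product, 4294967295 * 4294967295, is what a difference of -1
looks like when it is stored as unsigned and then squared. A minimal
sketch of the signed-difference, unsigned-error pattern, with a
hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of squared differences over n 8-bit pixels. The difference is kept
 * in a signed int; only the non-negative square is accumulated into the
 * unsigned total. */
static uint32_t sum_squared_error(const uint8_t *src, const uint8_t *dst,
                                  size_t n) {
  assert(src != NULL && dst != NULL);
  uint32_t sse = 0;
  for (size_t i = 0; i < n; ++i) {
    const int diff = src[i] - dst[i]; /* may be negative */
    sse += (uint32_t)(diff * diff);   /* at most 255 * 255 per pixel */
  }
  return sse;
}
```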
Change-Id: I580813093ee46284fde7954520dfcb1188f79268
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
Change-Id: I502fd707823c4faaa7f587c9cc0312f057e04904
On scene-cut detected frames (i.e., high_source_sad = 1), use
nonrd_pick_partition (over choose_part + select_part), as
the nonrd_pick partitioning is generally better.
Small positive increase in metrics on ytlive set (~0.5 - 1%).
Negligible overall speed decrease, as it's only used on scene-cut frames.
Only affects 1 pass vbr mode, speed = 5.
Change-Id: I07c89cbdc75f5bb16eb8e0e2773ead0980d2de5c
This reverts commit be12fefa4b
and commit 057c1c4034.
Also, the mismatch between the avx version and the
c version has been fixed.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
For a rt encode using 1080p@60fps material, up to 11% performance
improvement overall was seen.
Change-Id: Icd1f216209ebc6fc0b8da885f32f356fa4355ed0
The eob of a block is not properly set when skip_recode is true,
thus triggering assert(eob <= default_eob) to fail.
Change-Id: Ifecbe33dce2dc4903e0a80bd384dc09bf0dd8a44
Code cleanup, use existing rolling_actual/target metrics instead,
set threshold to get same/similar effect.
Little/no change in metrics on ytlive set.
Change-Id: I74f3c3d0a143a9cf20dc9c3dee54c0f7e6a97a51
Add a max condition and lower the min value.
No change in behavior (metrics for yt live set) for the
default min/max_gf_interval=4/16 settings.
Small positive change when min/max_gf_interval=7/16
(for 60fps clips on ytlive set).
Change-Id: I1c1d72425c86c69419ea43fb9730130e81062f91
add an upper bound to the framerate denominator above which 30fps will
be reported; fixes warning in corrupt / fuzzed files
Change-Id: I46a6a6f34ab756535cd009fe12273d83dcc1e9f1
Provides more comprehensive coverage for --enable-coefficient-checking.
The intent is to make the --enable-coefficient-checking option
consistent with the VP9 spec.
Change-Id: I12d0120756d17572ca2b2d7e6a2ab9d8071d8d58
Error messages:
..\vp9\common\vp9_loopfilter.c(1312): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1313): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1312): error C2220: warning treated as
error - no 'object' file generated
[.build-x86_64-win64-vs10\vpx.vcxproj]
Change-Id: Ia69260611997cd2ba41c7184a85ecead740a7c07
Increase in the damping used in adjusting the active Q range.
This does hurt rate accuracy a little in a few extreme cases
especially if the clip is very short*, but helps metrics.
* Note that the adjustment is applied at the GF/ARF group level based
on what happened in the last group. Hence for very short clips where
the length of a single group may be a significant % of the clip length
there is still scope for some drift that cannot be accommodated.
In practice most data points in our test sets are now much closer to target
than was previously the case with default settings and in some cases are
better even than they were when the command line undershoot and overshoot
parameters were set very low (e.g. 2%). For example in bridge_close at high rates
the old mechanism was unable to adapt enough to prevent extreme overshoot.
Change-Id: I634f8f0e015b5ee64a9f0ccaa2bcfdbc1d360489
Change to the calculation of the error divisor used in
get_twopass_worst_quality(). This follows on from other
changes to the rate control that impact the output of this
function.
Change-Id: I414fa9aa1e6a68a64dccea17c3712f44b8a0c10c
Changes to the function that redistributes bits from overshoot
or undershoot throughout the rest of the clip to respond more
quickly.
Change-Id: I90f10900cdd82cf2ce1d8da4b6f91eb5934310da
Added a factor based on the bit spend in the last arf group vs the
target to adjust the choice of the active worst quality in subsequent
groups.
Helps clips where previously there was a big overshoot or undershoot
to adapt and get closer to the target rate.
Change-Id: I67034b801679b99024409489a2273ea6fe23b8e6
The use of this value is preventing rate adjustment on clips
or sections that have very little motion but high noise and
this can give rise to some sections with massive overshoot.
Change-Id: I9a65c7c1148dc5d3a7d8b23e50fc1733f3661621
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).
Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd
Skip intra-mode and some inter-modes (newmv, nearmv, nearestmv) for
golden frame if the variance obtained from choose_partitioning is very low.
Only for 1 pass real-time CBR mode and bsize >= 32x32, it has ~2.5%
speed up with less than 0.1% PSNR drop for rtc test set. Don't see
visual regression.
Change-Id: I70efbc95a1007231ae36f02c5b2fbf6cd35077ad
Reduce operations and jumps. perf shows CPU time reduced from 1.9% to
1.6% when decoding fdJc1_IBKJA.248.webm on Xeon E5.
Will apply the changes to vp10 after code review.
Change-Id: I9351509922855d8896ddef1ed093b3ca12619a61
For non-rd pickmode:
best_pred_sad, computed for NEWMV-last, is only used for
skipping golden non-zero modes. Add condition to avoid this
computation if not used (i.e, if golden nonzero modes are not used).
And remove code for computing best_pred_sad for NEWMV-golden,
since that sad is not used.
No change in behavior; small speed gain (~1%) for svc encodes.
Change-Id: Ic2cbdef6c4e9a233a57c0db0eeac8ad5fcead366
convert the random value to int16 before subtracting 256 from it; quiets
a ubsan (sanitize=integer) warning
BUG=webm:1225
Change-Id: Ibc2c5a21f30e112bd6c180f7d6a033327c38d0df
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.
Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.
Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.
Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
C does not allow for shifting into the sign bit of a signed
integer, and the two instances here become signed ints via
promotion. Explicitly cast them to unsigned MEM_VALUE_T to
avoid the problem.
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=614648
Change-Id: I51165361a8c6cbb5c378cf7e4e0f4b80b3ad9a6e
Followed the code style of other lpf functions.
These 2 functions put 2 rows of data in a single xmm register,
so they have similar but not identical filter operations,
and cannot share the same macros.
Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
- Avoid excessive copying
- Don't bother searching if no update can possibly offer savings
- Simplify the interface
- Remove the confusing vp9_cost_upd256 macro
Change-Id: Id9d9676a361fd1203b27e930cd29c23b2813ce59
Apple's version format specification is strictly checked on app
store submission, even for embedded frameworks:
http://apple.co/1WgelY1
The build version number should be a string comprised of
three non-negative, period-separated integers with the
first integer being greater than zero. The string should
only contain numeric (0-9) and period (.) characters.
So that's room for "1.5.0" but not for "1.5.0-906-g656f9c4".
The full version returned from 'version.sh --bare' is now
embedded under a 'VPXFullVersion' custom key in the Info.plist,
so it can still be extracted from the resulting framework.
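For illustration only, a checker for the format quoted above might look
like the sketch below. It is not part of the build scripts, the function
name is hypothetical, and it does not reject leading zeros.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s consists of exactly three period-separated decimal
 * integers with the first greater than zero, otherwise 0. */
static int is_valid_build_version(const char *s) {
  assert(s != NULL);
  int fields = 0;
  for (;;) {
    int digits = 0;
    int nonzero = 0;
    while (isdigit((unsigned char)*s)) {
      if (*s != '0') nonzero = 1;
      ++digits;
      ++s;
    }
    if (digits == 0) return 0;             /* empty field or bad character */
    if (fields == 0 && !nonzero) return 0; /* first integer must be > 0 */
    ++fields;
    if (*s == '\0') return fields == 3;
    if (*s != '.' || fields == 3) return 0;
    ++s; /* skip the period */
  }
}
```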
Change-Id: If34a58d02e407379d1f1859fda533ef7f983170b
vp9_diamond_search_sad_avx was disabled in:
057c1c4 disable vp9_diamond_search_sad_avx
this removes a missing prototype warning as the prototype is no longer
included in vp9_rtcd.h. the file can be restored if someone gets around
to fixing the issue.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
Change-Id: Ia9fda4b81c53dc5fba7c31d780d761f886940b52
Much of the code requires the -mstackrealign flag. Although
-mstackrealign has already been added to the CFLAGS of some modules,
SIGSEGV still occurs in other modules.
The best approach would be to find the causes and fix them. However, we
cannot know those causes until a SIGSEGV actually occurs. In addition, if
a SIGSEGV occurs in other programs, it will be fatal.
So adding the -mstackrealign flag to CFLAGS unconditionally is
reasonable.
Change-Id: I999ef597a6afe97f5e7cc7bffaa866537c3eedd2
This reverts commit 2468163e07.
causes valgrind errors for overread of buffer in SubpelVarianceTest
Change-Id: I448e52c76f815ac199305b71f7d169f2bc167679
Move the logic for rechecking zeromv on denoised block out to simplify
the function. To simplify the param passing, add a new structure
VP9_PICKMODE_CTX_DEN which is only used when denoiser is enabled.
Change-Id: Iaa9b4396dfcb8147236c02d4a1868a09103a4476
This commit clarifies the integer value ranges of variables used in
several variance functions, and changes to proper type conversions
that reflect those value ranges.
Change-Id: Ic3234b83a912ce1ad12d1b254f3378763e15cc5c
The inlining mirrors what was done with the low bit depth
inter_predictor. And the new highbd_inter_predictor name is more
consistent with other high bit depth functions.
Change-Id: I96437f745759aeec6260c6e39a974bf36f1c211c
Rename and change how it's updated.
Only affects 1 pass vbr.
Small change in metrics (< ~0.1%) on ytlive set.
Change-Id: Ibb1fe485699b6c4a8194951c8f229abe2f64b9a5
Also allows use of --enable-shared when configuring for Mac OS X,
producing a bare .dylib.
Enabling the shared framework bumps the iOS deployment target to 8.0,
the minimum required to support dynamic framework deployment in apps.
When not using --enable-shared, a static library for iOS 6.0+ will still
be built.
Minimum version settings have been moved into ios-version.sh so they
can be updated in a single place.
As with the static build, unless header search paths are manually
tweaked, users must add a VPX prefix on includes, such as:
#include <VPX/vpx/vpx_decoder.h>
A module map for headers is not yet included as inttypes.h is not
modular; this means that VPX cannot be used directly in Swift code,
but can still be pulled in through an Objective-C wrapper.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1092
Change-Id: I28fb06ce65e48ed167a88c14a7bfb2861989317e
In motion estimation stage for subpel motion, subpel variance is
computed using bilinear interpolation. The motion vector precision
used is at 1/8 pel and three bits are used to represent the x and y
subpel offsets. Based on this, the half pel check should be against
4, not 8.
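A minimal sketch of the reasoning, with hypothetical helper names: a
three-bit offset lies in 0..7, so a comparison against 8 can never be
true, and the half-pel position is 4/8.

```c
#include <assert.h>

/* Subpel offset of a motion vector component stored in 1/8-pel units:
 * the low three bits, always in the range 0..7. */
static int subpel_offset(int mv_component) {
  const int offset = mv_component & 7;
  assert(offset >= 0 && offset <= 7);
  return offset;
}

/* Half-pel is 4/8 of a pixel. */
static int is_half_pel(int mv_component) {
  return subpel_offset(mv_component) == 4;
}
```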
Change-Id: I1f56fa1fa3f2f5e19a20d27983efe628557f170e
there are sse2 equivalents, and sse2 is a reasonable modern baseline
Removed mmx variance functions:
vpx_get_mb_ss_mmx()
vpx_get8x8var_mmx()
vpx_get4x4var_mmx()
vpx_variance4x4_mmx()
vpx_variance8x8_mmx()
vpx_mse16x16_mmx()
vpx_variance16x16_mmx()
vpx_variance16x8_mmx()
vpx_variance8x16_mmx()
Change-Id: Iffaf85344c6676a3dd337c0645a2dd5deb2f86a1
Added actual and absolute rate miss values to the opsnr.stt
stats output line.
Changes to the borg graphing may be needed before merge.
Change-Id: I1e9d548ce445d29002f0c59ebfd3957a6f15e702
Bug found by Yunqing relating to the correction for size at 8K and
above in get_twopass_worst_quality().
The basis for the correction was changed to the linear size relative to
1080P as a baseline and the adjustment has been clamped to prevent
problems at extreme image sizes.
For 1080P the results on our test sets were neutral but the low res and
mid res sets saw a small gain (0.1%-0.2% average).
I would also expect some gains on 4k and larger content where the
previous correction was overly aggressive.
Change-Id: I30b026b5f4535e9601e3178d738066459d19c8fb
Add control API VP9E_SET_TARGET_LEVEL that allows the encoder to
control the output bitstream level and/or keep level related
statistics.
Usage:
255 do not care about level (default)
0 keep level related stats only
10 target for level 1
11 target for level 1.1
.
.
.
62 target for level 6.2
Usage for vpxenc:
--target-level=0/255/10/11...
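The value encoding above can be read as major * 10 + minor. The sketch
below decodes it under that assumption; the enum and function names are
hypothetical and not part of the libvpx API, and the sketch does not check
that a decoded level is one the VP9 specification actually defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a target-level value. */
typedef enum {
  LEVEL_DONT_CARE,  /* 255: do not care about level (default) */
  LEVEL_STATS_ONLY, /* 0: keep level related stats only */
  LEVEL_TARGET,     /* 10..62: target level major.minor */
  LEVEL_INVALID
} level_request;

/* Decodes value into a request type and, for LEVEL_TARGET, the major and
 * minor parts of the level (e.g. 62 -> 6 and 2). */
static level_request parse_target_level(int value, int *major, int *minor) {
  assert(major != NULL && minor != NULL);
  *major = 0;
  *minor = 0;
  if (value == 255) return LEVEL_DONT_CARE;
  if (value == 0) return LEVEL_STATS_ONLY;
  if (value >= 10 && value <= 62) {
    *major = value / 10;
    *minor = value % 10;
    return LEVEL_TARGET;
  }
  return LEVEL_INVALID;
}
```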
Change-Id: I31d1aeca19358b893e7577b4e63748c8e614034a
For at least some of the implementations of sdx8f, such as
vpx_sad4x4x8_sse4_1, aligned moves are used to move the results into the
array.
Change-Id: I83df5a8e657b44e906d0d8b0bc154f1e5660f7f9
block_variance: This operates on 8x8s and would be safe with a int32 *
int32 to uint32 multiply, but this is potentially unsafe for 12-bit
input. Unfortunately the code already segfaults on 12-bit input:
https://bugs.chromium.org/p/webm/issues/detail?id=1223
calculate_variance: This operates on up to a 32x32 of 8x8s and can
overflow even with 8-bit input (log2((256*32*32)**2) == 36).
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1220
Change-Id: I1ca4ff6092db9a7580da371ee9a21f403fdadc40
Reduce factor for setting base-qp for active_best_quality (for inter-frames).
Small increase in metrics on yt live set.
Change-Id: I9cf0ac797783aeddbfaf1ff510696c9035d7c5ee
This change makes the c match the assembly and removes the todo's
associated with getting this to work.
Change-Id: Ie32e9ebb584a9d60399662d8bcb71b74fbd19d1e
These implementations rely on casting the pointers to load the data.
Clang implemented optimizations which automatically add alignment hints
to such loads. The 4x4 filters do not guarantee the necessary alignment
so the resulting assembly is broken.
https://llvm.org/bugs/show_bug.cgi?id=24421
BUG=webm:817
BUG=webm:892
Change-Id: I608885299f1f86ff83653b65e0e40d0ae87fb3fe
* changes:
vp9_frame_scale_ssse3.c: make 2 functions static
vp9_pickmode.c: make function static
vp9_noise_estimate.c: make function static
vp9_aq_360.c: add missing include
vp9_idct_intrin_sse2: add missing vp9_rtcd.h include
vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include
Makes the delta-qp stop a little earlier on areas that have been refreshed enough.
This helps to reduce some pulsing artifact on noisy flat areas observed in some
noisy vc-clips.
Threshold changes only take effect for sources where noise level is estimated to
be >= medium level.
Only affects 1 pass CBR, non-screen content case.
Change-Id: Iacf557f6aa8abbcd6782c02ff2e6c14891960850
For 1 pass vbr mode:
Refactor to move the logic for gf setting based on up-coming
key frames to a separate function, so same logic can be used for
scene-cuts/changes.
Change-Id: Ic4ede308e08ba869bb62e4566e19ea31222c5229
Makes the noise estimation react a little faster.
Little/no change in metrics.
Change only affects 1 pass cbr.
Change-Id: I13f0daa90ecbf9d49eb1cf2e48febd9d92292940
When building a dynamic framework with Swift compatibility, can't
include any headers that aren't in another module or you get an
error like this from Xcode on the including project:
Include of non-modular header inside framework
For some reason the system inttypes.h is not in a module, unlike
other standard C library headers... but it doesn't seem to be
actually needed on Darwin, so removing it doesn't appear to
be a problem.
Change-Id: I11d264483c54feefd9d2edf573afaef34ddcd0f2
When using git submodules, .git may be a file instead of a directory.
The -d test was failing in that case; switched to -e.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1215
Change-Id: Iedf0e92bfeb003b28a415945dc729e6ce58c4fe4
"qc" in vp{9,10}_token_state is used to save quantized coefficients, this
commit changes the type from short to tran_low_t to properly reflect
the value range for highbitdepth build.
This fixes an out-of-range bug when optimize_b is used in highbitdepth
build.
Change-Id: Ibf330879e6ac6ae8f099e085caa9d3d9a889fde8
This is an actual overflow where the result of the calculation is
materially changed, not just a negative value that is stored in an
unsigned.
Caught with fsanitize=integer on the VP9/AqSegmentTest.TestNoMisMatchAQ2/1 test.
Change-Id: I514b0ef4ae7ad50e3e08c0079aa204d59fa679aa
In so doing this fixes a couple of bugs:
vpx_plane_add_noise.c needed to subtract a clamp instead of add.
And the assembly (mmx sse) assumed that parameters were
contiguous in memory, which was not true.
Change-Id: I76f2c43cf54bfc838eb2edf8a443eaaa7565d7b5
- iOS SDKs no longer ship with armv6 support.
- Our minimum iOS version means all target devices have neon.
- Remove armv6 darwin LD workaround.
- This removes a TODO.
Change-Id: I2fcb5b82c96213364275475be021c7dd8459d5c0
Move skin superblock force split out of this function as well
as some minor code refactors. Checked bitexact for different speed
settings and different resolutions.
Change-Id: I6078cbe88dd9ce6c0b69470a8a0a8f8d2274161b
this avoids the decoder test which was only correct for vp9, vp10 was
missed in the earlier change
Change-Id: Ib789c906d440c0e4169052cf64c74d5e4b196caa
First, we only set use_4x4_partition for key frame where we don't
denoise; second, in case we have small partitions, we should pass the
actual block size to denoiser and make an early termination if needed.
Change-Id: I331f42046d792b17360723d17ff817d601394658
Wrap around behavior is enforced manually and we use the values in
arithmetic involving negative integers.
Change-Id: I199706b6f3af91f4fb6fe2ef302fbbc6d0cf5785
The product always fits in uint32_t, but the operands don't.
An optimizing compiler should generate the wraparound code.
(Verified with clang).
Change-Id: I25eb64df99152992bc898b8ccbb01d55c8d16e3c
ADL will look this up from the callsite namespace iff it is declared
before the callsite or from the parent namespace of the class type (the
global namespace).
This patch has been tested on MSVS 2015 and clang-3.8.
Change-Id: I00ba74712c9b617b9d81761abed1e14d8f25d8e3
Block size passed into denoiser filter is always >= BLOCK_8X8 (in
vp9_pick_inter_mode), it is not necessary to check smaller block
size. Passed the bitexact test on clips with different resolutions and
noise levels.
Change-Id: I19fa3195d18c27d9e5de60dc11cff1522ef3714e
Fix will reset the consec_zero_mv map on non-skipped blocks with non-zero mv.
Adjust thresholds on consec_zero_mv in noise estimation and skin detection,
as more possible reset on map means lower thresholds should be used.
Change-Id: Ibe8520057472b3609585260b51b6f95a38fb777d
webm_read_frame is the only function now which requires
documentation for what the return value means (other two are quite
obvious - file_is_webm and webm_guess_framerate).
Change-Id: I7a4f7d8097b1d748812b2ee251ee718a0b5ce836
In VP9 internal denoiser, motion magnitude is computed from
best_sse_mv, which should be set to 0 at the beginning. This bug may
cause visual artifacts in the denoiser. Also, delete two improper comments.
Change-Id: I8710d2acba23320bc85cf72af17d65245c19438b
Need to check that sse for non-zero mv has been set for the current block
(i.e., check that nonzero-mv is tested as a mode, so newmv_sse != UINT_MAX)
before forcing to not use zero-mv for denoising.
Also increase some thresholds (sse and sse_diff) for high noise case,
and use a shift operation instead of multiplication on a threshold computation.
Change-Id: Iae7339475d57240316b7fa8b887c4ee3c0d0dbec
Resolved two TODO items.
Force a minimum value of 1.0 for frame duration as per section duration.
Column inactive zone is currently set to 0 as most of the serious issues
relating to inactive regions relate to letter boxing.
Change-Id: Ifbab3acf2c089d7305620a7ff7ed7c3536cc9235
In Aq mode 1 the segment and AQ delta for each block is based
on spatial variance. There may be a net imbalance between blocks
that have lower Q than the baseline value and those that have higher Q.
This patch monitors that imbalance and extends the allowed baseline
Q range for the frame to accommodate adjustment of that baseline value
to compensate.
Change-Id: Iae8a48c7c01fe2af94a141e149d03acf467237ca
So it can be used even with aq-mode=3 not enabled.
Also cleans up some code in the places where it's used.
No change in behavior.
Change-Id: Ib6b265308dbd483f691200da9a0be4da4b380dbc
Removed this todo because of another todo which says none of this code
should exist. It should be integrated into the block by block encode
process as per the decoder.
Change-Id: I076bd15140a060e69c014dd7d7cd07fea260aba3
For 1 pass vbr mode.
Increase the gf interval for case where average Q is close to
max and high overshoot is detected.
Small increase in overall avg_psnr/ssim metrics (~0.2/0.1%) for ytlive,
but improves the low-end (low bitrate) for several clips (less overshoot).
Change-Id: Ifba40f25b4861b2e0d9832c82d5359a6a3dce9f2
More even spacing near key frame and avoid gf on scene cut
if it's close to key frame.
Small increase in metrics for ytlive set (which uses key-period=150).
(~0.2% gain)
Change only affects 1 pass vbr mode.
Change-Id: If1e5a59baf1e0befbaf998522fbc47d94ac5b5df
Change only affects 1 pass vbr.
Use a q value somewhat larger (~6%) than avg_frame_qindex[INTER]
as basis for active_best_quality for inter-frames.
And use the minimum of this (avg_frame_qindex) and the active_worst_quality.
This reduces some overshoot in ytlive clips.
Overall small but positive average increase in metrics (up on average ~0.2%).
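
A minimal sketch of the selection described above, for illustration only.
The function name and the 17/16 scaling (about 6%) are assumptions, not the
encoder's actual code:

```c
/* Hypothetical helper, not the libvpx implementation. */
static int inter_best_q_basis(int avg_inter_qindex, int active_worst_quality) {
  /* 17/16 = 1.0625, roughly 6% above the running average (assumed factor). */
  const int boosted = (avg_inter_qindex * 17) >> 4;
  /* Never exceed the worst quality currently allowed for the frame. */
  return boosted < active_worst_quality ? boosted : active_worst_quality;
}
```

With an average of 160 and an active worst of 255 the basis becomes 170; if
the active worst were 165, the result would be capped at 165.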
Change-Id: Icdbaae7872d5675fd38a13c0ec6ce0e2e3b919ce
This was never hooked up for the 32x32_34 case as the neon_asm version
in 3f7c12da, when the intrinsics version was added.
Change-Id: Ic7db4ce5850c637315f9fe9e2de93a4f8cf9e320
Change recursive weight for average_source_sad and
put some constraint on spacing between detected scene-cuts.
Change only affects 1 pass real-time mode.
Change-Id: I1917e748d845e244812d11aec2a9d755372ec182
Correct the setting of Q basis of GF/ARF in 1 pass vbr.
Existing logic would switch to using avg_QP of key frame if
avg_QP of inter is less than active worst (even if key frame is
not last frame).
Instead fix the logic (as per the comment) to use the lower of
active_worst_quality and avg_Q for inter as basis for GF/ARF
active_best_quality (unless last frame was key frame).
Increase in metrics: AvgPSNR/SSIM up by ~0.7/0.3 on ytlive set.
Change-Id: I9a628378ec6684bfda9457ebfc2384ef6d8579f7
Adjustment to stop excessive prediction decay triggered by blocks
or frames with extremely low spatial complexity which rendered the
comparison of intra and inter coded errors meaningless.
This was causing much shorter than expected groups on some 4k
test content.
Change-Id: I3f2c64200ef6dcef4721fc9f2ec09e480056ffc2
Uses a metric on fraction of smooth blocks derived from first pass
stats in a frame to adjust down the cq_level modestly in the cq mode.
The current implementation does not add much complexity, and is
fairly light in the adaptation.
Change-Id: Ic484e810d5bd51b7bb6b8945f378c7c3d9d27053
Adjust the motion decay component to account for image size.
This has very little impact for smaller image sizes.
Average bdrate results for our HD test sets:-
Hdres set: opsnr +0.92%, Fast SSIM +1.6%
Netflix hd set: opsnr +1.5%, Fast SSIM +3.1%
There are a couple of notable -ve clips such as cyclist and sunflower
which seem to be better with a shorter interval but also a few very big
wins such as Jets >12% psnr 22% Fast SSIM and from the
Netflix set PierSeaside 9.7% psnr and 18.2% Fast SSIM.
Change-Id: Ie43aaedaa74331ed83d624a13548094ac64fed9e
Change only affects 1 pass vbr mode, speed >=5.
Increase min_thresh, decrease boost, and set a min/max
value for gf_interval.
Change-Id: I9c1e1a1ab0c5780064eb62714ee39a72ea4d2107
Trap the case where we end up with a very short arf group just before
a key frame. Such a group often has poor quality and may cause pulsing.
For example if the KF is 17 frames away we are better doing two mid-size
groups of 9 and 8 than a group of 15 followed by a group of 2.
This becomes more and more important when coding with a short forced
kf interval though it may not impact our standard tests much.
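
The 17-frame example can be sketched as follows. The helper name and the
min/max interval parameters are hypothetical and only illustrate the idea of
balancing the last two groups before a key frame:

```c
/* Hypothetical helper: pick the length of the next arf group given the
 * number of frames left before the key frame. Not the libvpx code. */
static int next_group_length(int frames_to_key, int max_interval,
                             int min_interval) {
  /* Everything that is left fits in one group. */
  if (frames_to_key <= max_interval) return frames_to_key;
  /* A full-length group would leave a very short tail: split evenly. */
  if (frames_to_key - max_interval < min_interval)
    return (frames_to_key + 1) / 2;
  return max_interval;
}
```

With max_interval = 15 and min_interval = 4, 17 frames before the key frame
yield a group of 9, and the remaining 8 frames form the final group.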
Change-Id: I29d83d6637b203eac69be320dd35a7401a4678c1
This reverts commit 74aaa2389e. Unstable
under valgrind because of uninitialized reads. Limiting the bad bisect range.
Change-Id: I45b32f0ee0ba45795e7efb9947fb805830c8ce0e
- Use bitwise AND (&) instead of logical AND (&&) to
generate correct testing input.
- Fix variance reference function to be consistent with
our codebase implementation.
- Refer to the following issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1166
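
To illustrate the first item: masking a random value with bitwise AND keeps
its low bits, while logical AND collapses it to 0 or 1. The function names
are made up for this sketch and do not come from the test sources:

```c
/* Bitwise AND keeps the low 8 bits, giving inputs in the range 0..255. */
static unsigned mask_bitwise(unsigned r) { return r & 0xffu; }

/* Logical AND only reports whether both operands are non-zero (0 or 1),
 * so nearly every generated "pixel" would be 1. */
static unsigned mask_logical(unsigned r) { return r && 0xffu; }
```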
Change-Id: I8c1ebb03e22dc9e1dcd96bdf935fc126cee71307
Avoid copy-block when denoising is at LowLow level (i.e., no denoising is done).
Instead, don't enter denoiser at all, and when level goes back up over kLowLow
do a reset in denoiser.
Change-Id: I0544adf58f4dd51ecc4a4607fcb0353bfbbb7a59
only output[0] needs to be set, store_output is more involved than a
movdqa in the high bitdepth case
Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
Avoid doing the mcomp in denoiser if we don't denoise the
block (because of motion/SSE/skin threshold, etc).
This can reduce encoding time (with denoiser enabled) by ~1.5-2%.
Change-Id: Ia699b68dfd37b89cdf3a82b8aa40e8c8f98a3d4f
This makes it more likely clean/low-noise content will
be set as LowLow, and hence no denoising will be done.
Also set early exit on denoising for small blocks.
Change-Id: I4a72bba3e6c5e2d523d304c39deacc9c39bf216c
Some cleanup and bugfix: pass mi_row/mi_col (not mv_col/mv_row)
to build_inter_predictors. This only affects case where
the frame is resized, but since denoising is not done on resized
frames, the fix has no effect currently.
Change-Id: I36617a7f0b43b6f49976745f15d400977e6ffa46
Switch to use new skin model.
And fix condition for denoising skin block.
Previous condition did not denoise skin blocks if the selected
mode was non-zero motion in current frame. Modify condition to
also force no denoising if that mode was not selected as zero motion
now and for at least "x" past frames in a row (x = 2).
Change-Id: I00753e3fe45b9a308a7ef43c58f11868e3bfc6b0
not strictly necessary, but allows projects using '-Wconversion
-Wno-sign-conversion' to reuse these headers.
Change-Id: Id1398d726c90173ccba9aea66798fcef6f20fa23
Change only affects 1 pass, vbr, speed = 5 (real-time mode).
Some improvement for high motion content.
AvgPSNR/SSIM metrics for ytlive set all up, on average ~2%,
some clips (high motion ones) up 4/5%.
Encoder speed down: on mynintendo_x1.1280_720.y4m: 47fps -> 44fps.
Change-Id: I9e3eaa6392dcb6b5b44ee6f43004f97ba859bc11
the vpx_decoder layer guarantees that when called directly this won't
receive NULL data and the reuse via decode() is protected by a NULL data
check and 0 size check (NULL data and non-zero data size is protected by
the vpx_decoder layer).
Change-Id: I7437fb5ca4e4aa431963d55b909d4d920f339be3
The mv is clamped in dec_find_mv_refs() to a smaller region
than the clamp in dec_find_best_ref_mvs(). See clamp_mv_ref
and clamp_mv2.
Change-Id: I47dd5f7fa8b42f2cc593559b4d7c782fe7bcb1db
In multi-thread case, the encoder may crash if using encoder option
tile-rows > 0. To prevent that, force tile-rows=0 in this situation.
This is a workaround for WebM issue 1095:
https://bugs.chromium.org/p/webm/issues/detail?id=1095
The further fix can be done by adding synchronizations after a tile
row is encoded. But this will hurt multi-threaded encoder performance.
So, it is recommended to use tile-rows=0 while encoding with threads
> 1.
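
A minimal sketch of the workaround, assuming a hypothetical helper that
sanitizes the requested setting (the real change lives in the encoder's
configuration handling, which is not shown here):

```c
/* Hypothetical helper: tile rows are only honored for single-threaded
 * encodes; with more than one thread they are forced to 0. */
static int effective_tile_rows(int threads, int requested_tile_rows) {
  return threads > 1 ? 0 : requested_tile_rows;
}
```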
Change-Id: I656cbcc200f8d0410d09530e7981ad8f32fe7bc9
This patch was to fix a reported Hangouts deadlock/freezing issue
in VP8 encoder (issue 27232610). The original encoder loopfilter
synchronization happened in the following frame, which was prone
to causing problems in some complex use cases. This patch simplified
the synchronization logic.
More testing needs to be done.
Change-Id: I38fd3f35d11f98fae1e44546aa5e4c6d6e19c4be
Allow the encode loop to select from a wider range of Q values
when encoding normal (non arf or kf) frames.
This change is targeted at improving psycho-visual quality in some
easy sections that are currently not getting enough bits.
This is likely to be a little worse from a metrics perspective and may also
have a small impact on encode speed in cases where extra recode
iterations are triggered.
Change-Id: I667eebf33c753bcbcf8b93596467369e5708b889
Adds a second threshold for recodes even on frames where
recode is normally disabled if there is a big rate miss.
Change-Id: Ifd4a34707da55ec15eb7cfb87de4644b8d76deb2
Fix the threshold for forcing refresh of golden frame based
on high motion. The current comparison was incorrect and
prevented this (force update of gf on high motion) from being used.
For now keep this logic under a flag (and off for now) so as to
not change behavior, until further testing.
Change-Id: Ib5f0082159a428b0603b9534e4bcb6f83e4ccb25
+5.857% BD-RATE on SCREEN_CONTENT
Leaving this off for non-screen content because:
+25.300% on TWITCH120
+37.833% BD-RATE on RTC
Change-Id: Ie0a312182d6cc859fb04298e4cd81d02b39e23fe
For 1 pass vbr mode: Increase the period of gf update on scene
cut (keep it same as original/default setting for now).
Change-Id: I679c3bd21152f6c4e486c8098d931c00e1d26b5f
This is the identical change submitted for vp8 here:
https://chromium-review.googlesource.com/#/c/274107/
Tested this change on Mac OSX (10.10) and Linux
(Linux Mint 17 / Ubuntu 14.04) and in both cases:
- downloaded and compiled latest source for libvpx and ffmpeg
- confirmed ffmpeg would build sub-second frame rate webm files
via the previous patch
- confirmed ffmpeg would *not* build fps < 1 for vp9
- made this change, recompiled libvpx and ffmpeg
- confirmed ffmpeg would now create the same webm with
fps < 1
- confirmed the resulting file would play and was vp9 (e.g.
would not play in Firefox (Linux version complained it was
VP9 but mostly could play it) or older vlc, etc., but does
play just fine in Google Chrome and a newer version of vlc).
Sorry I didn't catch this last time - but this seems a solid
change and it's handy to be able to create frame rates
less than one second.
-jk
Change-Id: I38fa32148de8c4c359f228cf08b9a4b83b5a52fb
The change https://chromium-review.googlesource.com/#/c/329181/
also changed behavior for cbr mode, which causes some regression
in screenshare test in webrtc.
Resetting the specific change to leave the cbr behavior
unchanged for now.
Change-Id: I52df158806422f86398e1d2f522e92067d8325eb
Some adjustments to inter-mode selection for vbr mode.
Condition some of the bias to low/zero motion on cbr mode, and
don't use int_pro_motion_estimation for golden ref
(treat it same as last ref).
Change only affects 1 pass vbr mode, speed >=5 (non-rd pickmode).
Encoding time increase within ~5%.
Avg PSNR/SSIM on RTC set increase by ~2%, all clips up,
ranging from 0.5 to 4%.
Change-Id: I0048d0104a8816773d91a2b1484d601169d9bad7
Don't advance the svc frame counters on dropped frame,
since this can break the referencing scheme and lead
to a crash/assert.
Updated svc-datarate unittest to add a lower bitrate test.
Change only affects 1 pass cbr svc, with frame dropper enabled.
Change-Id: Ibb7530b7a587a9344d46898d9286fd9e2ef0779c
Use the superframe counter to set the key frame, and force
it to the key frame on base spatial layer only.
Also, update svc frame counters under frame dropping.
Update unittest: add specific tests with short key frame period.
https://bugs.chromium.org/p/webm/issues/detail?id=1150
Change-Id: I5b1c9a09253e6e5fbfce51b4cf603ae22d422b01
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of source.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
Change-Id: I9d73f92a4a96927d0f1d6bf75315c1e60513226a
Use sharp filter to generate motion compensated reference for
temporal filtering. It improves the average coding performance of
VP9 speed 0:
derf 0.34%
hevcmr 0.38%
stdhd 0.58%
Change-Id: I1772a051be545de8c343055274e5ca0929d19cda
This commit back ports the fix from
https://chromium-review.googlesource.com/#/c/326940
It corrects the block partition context fetching in rate-distortion
optimization. It improves the average coding performance of speed 0:
derf 0.098%
hevcmr 0.102%
stdhd 0.282%
Change-Id: I8bcc6fe40ba5c6b50a6136daac116dcc738937ec
The double pointer in xd->mi handles this for us.
Cuts encode_superblock()'s self time in half at rt speed 8.
Change-Id: I820dae24efdbf9a140bbeae82e4e2a5850317766
* changes:
x86/convolve.h: remove redundant check in FUN_CONV_2D
x86/convolve.h: replace while w/if for w < 16
x86/convolve.h: change filter[] || chains to |
restore the value for VP9 to 9999 to satisfy the current test
expectations; without this
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/8 will overshoot.
Change-Id: I88dad574ae4ab10f923579824c7347ff468c7045
This reverts commit f51f0998e1.
This causes datarate tests to fail. Some are due to the new default
keyframe distance, another causes an assert even forcing 9999:
[ RUN ] VP9/DatarateOnePassCbrSvc.OnePassCbrSvc3SpatialLayers/0
test_libvpx:
vpx_dsp/x86/vpx_subpixel_8t_intrin_ssse3.c:853: scaledconvolve2d:
Assertion `y_step_q4 <= 32' failed.
Change-Id: I4ee4fea97f47e4f1a23b82a62e6afc6280961e38
Reset the scale factors before build_inter_predictors.
Add datarate tests for 3 spatial layers, which exposed this issue.
Change-Id: I7f81efbe44345ecea9fdd5f639a4cca76aed3874
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of source.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
Change-Id: Ifcf526ec2aaf3e5fa7924588d9dd8660bf02fb46
some configurations may fail if AltRefTest is undefined though
VP8_INSTANTIATE_TEST_CASE is defined away.
Change-Id: I7272775a506718336bd6cee2225cf83bd72fede5
the same as vp8, with the same reasoning from:
2a0d7b1 Reduce the default kf_max_dist to 128.
see also:
https://trac.ffmpeg.org/ticket/4904
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815673
+ restore vpxenc behavior of taking the library default rather than
forcing 5s
This change also exposes an issue with one-pass svc in cbr mode, keep
the old default in datarate_test.cc for now.
Change-Id: Id6d1244f42490b06fefc1a7b4e12a423a1f83e88
* changes:
x86inc.asm: only set visibility for chromium builds
Only use .text sections for aout
Use .text instead of .rodata on macho
Copy PIC handling code from x86_abi_support
Set 'private_extern' visibility for macho targets
Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
Use libvpx defines to set name mangling rules
Customize x86inc.asm for libvpx
Update x86inc.asm from x264
Use the existing scene/content change detection to better
update/adjust golden frame refresh.
Change only affects 1 pass real-time vbr mode, speed >=5.
Change-Id: I2963a5bb7ca4a19f8cf8511b0a925e502f60e014
this restores the previous version's behavior avoiding issues with
builds that may split sources on directory boundaries; protected
visibility may work in this case.
Change-Id: Ie759bd96c9ea5b45613f450dffa6e67eb45f5a8b
The read only sections are getting stripped on some OS X builds. As a
result, random data is used in place of the intended tables.
Change-Id: I4629c90d9e0ae4d4efc193a93be6fb93809ae895
Don't initialize first pass costs for a number of symbols where first
pass probabilities aren't initialized.
This brings a 1.22x first pass speedup.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter
Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter
Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
move to encoder_encode() as vp10_get_compressed_data() allocates data and
would require some modification to make its error return meaningful.
Change-Id: Ia5267c35d16ccd42b6da6d2136402b13e28f9159
move to encoder_encode() as vp9_get_compressed_data() allocates data and
would require some modification to make its error return meaningful.
Change-Id: I8ddc390a1441afd0ff937842fa4ad1053c956133
Add frame-level condition for reference masking: under external or
internal dynamic resize, allow for reference masking if none of
the references have been scaled.
Previously, reference masking was turned off for the stream if dynamic
resize feature was enabled or an external resize event occurred.
reference_masking gives speed up with little/no loss in compression.
For speed 7 on rtc set: encoding time decreases by about 5-7%,
avgPSNR/SSIM goes down ~0.2%.
Change-Id: Ie4444577451ef954414d8fb4b2c99d65cadf1746
This commit fixes issue 1141. The issue was triggered in multi-tile
encoding. The change properly saves and restores the block context
information in the real-time mode selection process. It removes
several redundant memcpy operations in sub8x8 intra block mode search.
Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010
External dynamic resize with swapping width and height was
not handled properly.
Fix is to re-init loop-filter under certain conditions.
Modify unittest to test this case.
Without this change test will fail.
Relates to: https://bugs.chromium.org/p/webm/issues/detail?id=1140
Change-Id: I7d81ca7fe0783b3bc103a52a7b7cf073a96be26e
allocations done within this function are protected with
vpx_internal_error; adding the setjmp fixes a crash in
vp10_lookahead_push() under low memory conditions.
Change-Id: I5515017cd71b218840c506791b3a517da7ffc93e
allocations done within this function are protected with
vpx_internal_error; adding the setjmp fixes a crash in
vp9_lookahead_push() under low memory conditions.
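
The general pattern can be sketched as below: an allocation failure calls an
error routine that longjmps back to a setjmp point, which must have been
armed by the caller. All names here are hypothetical stand-ins, not the
libvpx functions, and the allocation failure is simulated by a flag:

```c
#include <setjmp.h>
#include <stdlib.h>

typedef struct {
  int error_code;
  int setjmp_active; /* non-zero once a valid jump target exists */
  jmp_buf jmp;
} error_info;

/* Record the error and unwind to the armed setjmp point, if any. */
static void internal_error(error_info *info, int code) {
  info->error_code = code;
  if (info->setjmp_active) longjmp(info->jmp, 1);
}

/* Allocation wrapper that reports failure through internal_error(). */
static void *checked_malloc(error_info *info, size_t size,
                            int simulate_failure) {
  void *ptr = simulate_failure ? NULL : malloc(size);
  if (ptr == NULL) internal_error(info, 2);
  return ptr;
}

/* Entry point that arms setjmp before any protected allocation runs. */
static int guarded_push(error_info *info, int simulate_failure) {
  void *frame;
  info->error_code = 0;
  info->setjmp_active = 0;
  if (setjmp(info->jmp)) {
    /* Reached via longjmp: report the error instead of crashing. */
    info->setjmp_active = 0;
    return info->error_code;
  }
  info->setjmp_active = 1;
  frame = checked_malloc(info, 64, simulate_failure);
  free(frame);
  info->setjmp_active = 0;
  return 0;
}
```

Without the setjmp in the entry point, the error routine would return to
code that then uses the failed allocation.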
Change-Id: I4b79dca37cc7fadc4b7633f0db44c0e406799bc6
An issue exists with reference_masking in non-rd pickmode for spatial
scaling. It was kept off for internal dynamic resizing and svc, this
change is to keep it off also for external dynamic resizing.
Update to external resize test, and update TODO to re-enable this
at frame level when references have same scale as source.
Change-Id: If880a643572127def703ee5b2d16fd41bdbf256c
For dynamic resizing (whether the new codec size is determined internally
or externally set by user), we should for now keep rc.resize_allowed enabled.
This prevents the use of reference_masking for real-time mode
(in set_rt_speed_feature()).
Change-Id: Ibb7c3ff35be88afdf1a3c6db6693521766f177a3
to vp9_setup_pre_planes(), preventing the function
unscaled_value() from being called. unscaled_value()
returns the same value that was passed in. See
scaled_buffer_offset() in vp9_reconinter.h.
Change-Id: I2a6fbaf07972c2f212834929d29a2cbe72e399c3
The bit to error transformation got doubled as a result of going from
8-bit to 9-bit costs (change d13385c).
Use defines to derive the scale numbers and comment some of the fields.
derf: -0.023 BDRATE
hevcmr: +0.067 BDRATE
stdhd: +0.098 BDRATE
(These are substantially smaller than the original gains from 8 to
9 bit costing.)
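
A sketch of why a hard-coded shift doubles the scaled rate when costs gain
one bit of precision. The macro and function names are assumptions made for
this illustration:

```c
#include <stdint.h>

/* Assumed: costs are stored in 1/512-bit units (9 fractional bits). */
#define PROB_COST_SHIFT 9

/* Hypothetical helper: scale a rate expressed in 1/2^shift-bit units by a
 * multiplier, rounding to nearest. */
static int64_t scaled_rate(int64_t rate, int64_t rd_mult, int shift) {
  return (rate * rd_mult + ((int64_t)1 << (shift - 1))) >> shift;
}
```

One bit costs 512 in 9-bit units. Shifting by PROB_COST_SHIFT with a
multiplier of 100 gives 100, whereas a stale shift of 8 gives 200, which is
the doubling described above. Deriving the shift from a single define keeps
the two in step.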
Change-Id: I6a2b3b029b2f1415e4f90a05709b2333ec0eea9b
When the codec frame size is the same as the reference frame size,
release the scaled reference before assigning it a new buf_idx.
Only affects 1 pass non-svc mode, where the scaled references are
released only under certain conditions (to prevent un-needed scaling
of the references every frame).
Modified a unittest that can trigger this bug without this change.
https://code.google.com/p/chromium/issues/detail?id=582598
Change-Id: I9a884e36ebd7608b1641ec2a469e20a4f829cf43
If the application changes frame size (external size changes),
and aq-mode=3 is on, reset the cyclic refresh.
Modify the TestExternalResize unittest (longer run with more resize
actions). Without this change an assert would be triggered on this
longer test.
Change-Id: I0eefd2cd7ffa0c557cca96ae30d607034a2599ce
Fixes an issue where the tx_type was not set correctly for
sub8x8 inter and intra blocks. In the current syntax, for
sub8x8 blocks, there is still a single tx_type that is
transmitted. Ideally, this should be searched for the best
rd performance, albeit at the expense of encode speed.
For now, we just set it to DCT_DCT. Previously it was left
incorrectly as what was used for the previous non sub8x8
block.
derflr: BDRATE -0.277%
Change-Id: If76ba903bfbfd4d374cf1ac7d1daee50e92f0edd
Make this consistent with regular block size rate-distortion
optimization. It improves the compression performance:
derf 0.055%
hevcmr 0.129%
Change-Id: I112fe734f592c21bc7aa6efb7e3f269c4214ee7b
For 1 pass real-time mode. No change in behavior as only last
and golden are used as references in 1 pass real-time mode.
Change-Id: Ie4655014eee1a8b271542f29d74b2c6f7fed54c9
the results along the top and left border are then stored with a moving
window into the vector.
~40-67% faster on ARM, ~40-77+% on x86 depending on the block size.
Change-Id: Iab369aa2946a3ae4eb7290d512868fe5db92dbc8
delete apply_cyclic_refresh_bitrate(). unused since:
3472cbb vp9 aq-mode=3: Keep it on even at low bitrates.
Change-Id: I0fac9a31b59504e31000ac3a8f0b68e8d4320113
The definition is for the number of frames to check to determine the
recent decay rate, further to determine the next key frame in the
first pass of the encoder.
Change-Id: Ic696d6eb518a86fa296842273cf8767ef0b0e27a
when INLINE is defined and mips is not being targeted. otherwise keep
the old --enable-extra-warnings behavior
Change-Id: Iba576edbe5fca03efa56ce99eee11f9cafc573ad
-use larger threshold on y (as in vp8).
-add distance threshold for each cluster
-use larger skin distance threshold for first cluster
-add some early exit checks.
Keep default setting to model=0.
Change-Id: I1044b99ade4bb1f215a860a019a4d84cee2f7715
It improves the compression performance of VP9 by 0.1% across all
test sets. No speed change is observed.
Change-Id: I59338c5c9e67bae22188f35fc3afbfe2a6bba6b0
The postproc vp9_denoise() is a spatial denoise/blur function.
It was not intended to be used if temporal denoising is enabled.
Change-Id: I97d2dcb941e7cc49bbafce99d9286beb2693249d
Put check to avoid possible out of bounds when looping
over the blocks to estimate noise level.
No change in behavior.
Change-Id: I4b7b19b7edee0ae1c35b9dc0700b1bf9b304d7f5
* changes:
configure: extend armv7 hf target autodetect
configure: remove default CROSS for arm targets
configure: avoid default when CROSS is set to null
This commit changes SSSE3 optimized idct8x8 functions to work with
highbitdepth build.
With this commit and the previous one that enabled SSSE3 idct32x32
functions, tests showed virtually no difference on decoding speed for
file fdJc1_IBKJA.248.webm for the build with --enable-vp9-highbitdepth
option and the build without the option.
Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
This commit changes the SSSE3 assembly functions for idct32x32 to
support highbitdepth build.
On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference
between hbd and lbd build from between 3-4% to 1-2%.
Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
the lookahead buffer allocation is deferred to receipt of the first
frame to allow profile changes. if the encoder was flushed before
supplying any frames the encoder would crash trying to dereference the
NULL buffer. vp8 is unaffected.
fixes mozilla bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1237848
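
The guard amounts to treating a missing lookahead buffer as empty. The types
and function below are hypothetical stand-ins used only to show the shape of
the check:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the encoder structures. */
typedef struct { int depth; } lookahead_ctx;
typedef struct { lookahead_ctx *lookahead; } encoder;

/* Number of frames still queued; 0 if no frame was ever supplied and the
 * lookahead buffer was therefore never allocated. */
static int frames_pending(const encoder *enc) {
  if (enc->lookahead == NULL) return 0;
  return enc->lookahead->depth;
}
```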
Change-Id: Icee4b64de760476eee0d33b568f0a1010335ff13
Use multiple clusters instead of one and decrease
the distance thresholds.
Add a define to switch between models.
Default is set to existing (1 cluster) model.
Change-Id: I802cd9bb565437ae8983ef39453939f5d5073bb1
If a superblock contains a lot of "skin" then force split
of 64x64 partition, and make some adjustments in mode selection.
This helps to reduce artifacts on moving face/skin areas at low bitrates.
Little/no change in metrics: avgPSNR/SSIM down by ~0.12%.
Small encoding time increase < 1%.
Change-Id: Ic57f52148c3716f391419fab0530d916e4c1d186
For aqmode=3, golden period update is set based on period of cyclic refresh.
Put a limit on max golden period update, for now set to 40.
And fix comment.
Change-Id: Icb61dd87c796cce2a5f5f7331c6a129540994696
Limit oscillation detection in the case where overshoot is very very
large.
This keeps the 9-bit cost patch from breaking the DownUp resize test.
The patch pushed us to an 11% undershoot right before a scene cut
causing a 1200% overshoot. (Whereas before we were undershooting by
only 6% before overshooting by 1200%).
Change-Id: Id90ccfab8aba872ccadc45b73b3bb097b895677f
In inter mode search skip all modes except NEARESTMV and DC_PRED.
10% less encode latency for large frames using the chromium remoting_perftests.
+0.313% BDRATE on the screencast set at speed -6.
Change-Id: Ib97a39dd8bcdeab545509e0e02d78ce7033f8c63
Remove comment(s) and enable frame-dropper for tests.
Frame dropper for 1 pass svc was fixed a while ago:
https://chromium-review.googlesource.com/#/c/309230/
Change-Id: I5fd3192825b22e562db9210d3dc7b246a1799d8d
Make it consistent with the comment/intended behavior,
that is, only denoise if current block is zero_mv.
Change-Id: I3909761e802e80089752a493ab3646dc32698ded
Changes to mode selection for 1 pass SVC mode:
use base layer motion vector, changes to intra-prediction.
Change-Id: I3e883aa04db521cfa026a0b12c9478ea35a344c9
This patch fixes a bug that causes the loop filter search to reset to
a low value or zero after each arf overlay frame. We expect the overlay
frames to need little or no loop filtering but this should not propagate.
Change-Id: I895b28474cf200f20d82793f3de40b60b19579fd
This is a pure-refactor in preparation to potentially raise the bit-cost
resolution.
Verified at good speed 0 and rt speed -6.
Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
More aggressive on avoiding denoising on skin.
May supplement this later by adding condition on consec_zeromv.
Change-Id: Ied92b332f9b24e821d2009f81d1565758588d9a5
Different quality levels are used for different regions in
the frame depending on how far they are vertically from the
center. Specifically, three segments are used based on the
mi_row index with respect to the number of mi_rows in
the frame.
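
A minimal sketch of such a row-based mapping. The function name and the
1/8 and 1/4 boundaries are assumptions chosen for illustration; the commit
message does not state the actual thresholds:

```c
/* Hypothetical mapping from mi_row to one of three segments according to
 * the vertical distance from the frame center. */
static int row_segment(int mi_row, int mi_rows) {
  const int center = mi_rows / 2;
  const int dist = mi_row > center ? mi_row - center : center - mi_row;
  if (dist * 8 < mi_rows) return 0; /* within mi_rows/8 of the center */
  if (dist * 4 < mi_rows) return 1; /* within mi_rows/4 of the center */
  return 2;                         /* near the top or bottom edge */
}
```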
Change-Id: Ifc8b777bc58ea8521dffc4640360c67d99f8d381
This reverts commit ea48370a50, reversing
changes made to 15939cb2d7.
The commit was insufficiently tested and causes failures.
Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
This commit adds the logic for segmentation map initialization and
disables temporal update of the segmentation map when error-resilient
mode is on. It fixes the enc/dec mismatches (release build) and
assertions (debug) when both aq-mode and error-resilient are on.
Change-Id: Id2155e8b28962cf1f64494f4df0c8d79499b6890
Prior to this patch, read_inter_block_mode_info() would
find the nearmv and nearestmv for all modes. Now it does not
search for ZEROMV modes and breaks out early for NEARMV and
NEWMV modes.
Change-Id: Ifa7b1eaf58bb03b9c7792ea5012fef477527d0fd
There are flaws in current implementation of VP8 multithreading encoder
and decoder as reported in the following issue:
https://code.google.com/p/chromium/issues/detail?id=158922
Although the data race warnings are harmless, and wouldn't cause real
problems while encoding and decoding videos, it is better to fix the
warnings so that VP8 code could pass the TSan test.
To synchronize the thread-shared data access and maintain the speed
(i.e. decoding speed), use multiple mutexes based on mb_rows to reduce
the number of synchronizations needed, make the reads and writes of
the shared data protected, and reduce the number of mb_col writes by
nsync times.
The decoder speed tests showed < 3% speed loss while using 2 ~ 4
threads.
Change-Id: Ie296defffcd86a693188b668270d811964227882
The nominal tx_type for a given mode is used as a context
to encode the actual tx_type for intra.
Results:
derflr: -0.241% BDRATE
hevcmr: -0.366% BDRATE
Change-Id: Icfe7b0a58d79bc6497a06e3441779afec6e01e21
This commit enables encoder to avoid 8x4 and 4x8 partitions for
scaled reference frames when libvpx is configured and built with
--enable-better-hw-compatibility
Change-Id: I02ad65c386f5855f4325d72570c49164ed52f413
Move the logic for forcing zero_mode after the
(ref_frame & flag_list) check.
This was causing a memory leak under msan:
https://bugs.chromium.org/p/webrtc/issues/detail?id=5402
Change-Id: Ie9d243369f8ed7c332f46178275945331da4fd85
Under --enable-better-hw-compatibility, this commit adds the asserts
that no mv clamping is applied for scaled references, so when built
with this configure option, decoder will assert if an input bitstream
triggers mv clamping for scaled reference frames.
Change-Id: I786e86a2bbbfb5bc2d2b706a31b0ffa8fe2eb0cb
This commit adds a new configure option:
--enable-better-hw-compatibility
The purpose of the configure option is to provide information on known
hardware decoder implementation bugs, so encoder implementers may
choose to implement their encoders in a way to avoid triggering these
decoder bugs.
The WebM team was made aware that a number of hardware decoders
have trouble handling the combination of scaled reference
frames and 8x4 or 4x8 partitions. This commit adds asserts to the vp9
decoder, so when built with the above configure option, the decoder can
assert if an input bitstream triggers such a decoder bug.
Change-Id: I386204cfa80ed16b50ebde57f886121ed76200bf
Add function to compute skin map for a given block, as it's
used in several places (cyclic refresh, noise estimation, and denoising).
Change-Id: Ied622908df43b6927f7fafc6c019d1867f2a24eb
Set initial values for these parameters in the vp9_init_layer_context().
This also fixes an issue in the svc-bypass mode when frame flags are
passed via the vpx_codec_encode().
Change-Id: I0968f04672f8d3d2fe2cea6b8a23f79f80d7a8b1
Otherwise, per-segment lossless might mean that some segments are not
lossless and they could still want to use another mode. The per-block
tx points remain uncoded on blocks where (per the segment id) the Q
value implies lossless.
Change-Id: If210206ab1fe3dd11976797370c77f961f13dfa0
For coding block sizes <=16X16, if the block is determined to be skin,
then always allow for that block to be candidate for refresh. So if that
block happens to be on the boost segment(s), segment won't get reset to 0
and delta-q will be applied.
PSNR/SSIM metrics neutral (little/no change) on RTC clips.
Speed increase small/negligible (< 1%).
Some visual improvement on faces in a few RTC clips.
Change-Id: I6bf0fce6f39d820b491ce05d7c017ad168fce7d6
arm-none-linux-gnueabi- is an anachronism and makes building on native
arm platforms more difficult. further, many distros include alternative
cross compilers, e.g., arm-linux-gnueabihf-, so the choice is best left
up to the user.
Change-Id: Id8aaf820ed112b85db2b8518d0e9d8abee1ad85c
avoids picking up defaults if CROSS is forcibly set empty as in:
$ CROSS= ./configure ...
BUG=1121
Change-Id: I6af91959288dede01efe3e5945698ab249eb6ec3
reduce the register count by 1 to avoid xmm6 and unnecessarily
penalizing the other users of the base macro
Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
In 32-bit build with --enable-shared, there is a lot of
register pressure and register src_strideq is reused.
The code needs to use the stack based version of src_stride,
but this doesn't compile when used in an lea instruction.
This patch also fixes a related segmentation fault caused by the
implementation using src_strideq even though it has been
reused.
This patch also fixes the HBD subpel variance tests that fail
when compiled without disable-optimizations.
These failures were caused by local variables in the assembler
routines colliding with the caller's stack frame.
Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
H/V intra mode was only enabled for bsize < 16x16,
enable it also for bsize=16x16.
Metrics are neutral with this change:
Overall very small gain (0.1%), small visual gain on some RTC clips.
Change-Id: Ib2d7a44382433bfc11cf324aa3cc5c382ea9e088
For testing implemented a fixed pattern and delta, 1 pass,
fixed Q, low delay mode.
This has not in any way been tuned or optimized.
Change-Id: Icf9b57c3bb16cc5c0726d5229009212af36eb6d9
(copied from VP9)
The one pass VBR mode selects a Q range based on a
moving average of recent Q values. This calculation
should have been excluding arf overlay frames as these
are usually coded at the highest allowed value. Their
inclusion skews the average and can cause it to drift
upwards even when the clip as a whole is undershooting.
As such it can undermine correct adaptation of the allowed
Q range especially for easy content.
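
A minimal sketch of the corrected update, assuming a 3:1 weighted moving
average with rounding. The helper name and the weighting are assumptions for
illustration, not the encoder's code:

```c
/* Hypothetical helper: update the running average of inter-frame Q,
 * skipping arf overlay frames so they do not skew the average. */
static int update_avg_q(int avg_q, int frame_q, int is_arf_overlay) {
  if (is_arf_overlay) return avg_q;      /* leave the average untouched */
  return (3 * avg_q + frame_q + 2) >> 2; /* weighted average, rounded */
}
```

An overlay frame coded at Q 255 leaves an average of 100 unchanged, while a
normal frame at Q 120 moves it to 105.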
Change-Id: I9e12da84e12917e836b6e53ca4dfe4f150b9efb1
For testing implemented a fixed pattern and delta, 1 pass,
fixed Q, low delay mode.
This has not in any way been tuned or optimized.
Change-Id: Idf5ee179b277fa15d07a97f14f2ce5bbaae80a04
The one pass VBR mode selects a Q range based on a
moving average of recent Q values. This calculation
should have been excluding arf overlay frames as these
are usually coded at the highest allowed value. Their
inclusion skews the average and can cause it to drift
upwards even when the clip as a whole is undershooting.
As such it can undermine correct adaptation of the allowed
Q range especially for easy content.
Change-Id: I7d10fe4227262376aa2dc2a7aec0f1fd82bf11f9
The culprit is on the decode side: the xd->lossless[i] setup was in the
wrong location, where segment features are not yet decoded.
Also on the encoder side, transform mode was not set consistently
between when tx_mode is selected and how tx_mode is enforced in
tx size selection.
Change-Id: I4c4c32188fda7530cadab9b46d4201f33f7ceca3
Keep track of frame indexes for the references, and
constrain inter mode search for reference with same
temporal alignment.
Improves speed by about 15%, no noticeable loss in
compression performance.
Change-Id: I5c407a8acca921234060c4fcef4afd7d734201c8
Lower the threshold for splitting 32x32->16x16 based on average variance,
and add lower bound condition for this split to occur. This prevents
unnecessary splitting for areas with very low variance.
Change-Id: Ibeb33b3d993632c2019f296eb87ef3b7e3568189
For non-rd variance partition, speed >= 5:
Adjustments to reduce dragging artifact of background area near
slow moving boundary.
-Decrease base threshold under low source noise conditions.
-Add condition to split 64x64/32x32 based on average variances
of lower level blocks.
PSNR/SSIM metrics go down ~0.7/0.9% on average on RTC set.
Visually helps to reduce dragging artifact on some rtc clips.
Change-Id: If1f0a1aef1ddacd67464520ca070e167abf82fac
Reallocate the xmm register usage so that no ARCH_X86_64 required.
Reduce memory access to the left neighbor by half.
Speed up by single digit on big core machine.
Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
This commit makes the sub8x8 block rate-distortion optimization
scheme use precise motion compensated prediction to compute the rd
cost. It fixes a potential buffer overflow issue related to sub8x8
motion search on scaled reference frame.
Change-Id: I4274992ef4f54eaacfde60db045e269c13aaa2de
GET_GOT modifies the stack pointer so the offset for left's address will
be wrong if loaded afterward.
Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by single digit in ./test_intra_pred_speed on big core
machines.
Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
Xcode 7 refuses to link to x86 and x86_64 code that's built for
iphone sim, so add an extra command line flag that forces iosbuild
to use darwin15 targets.
Change-Id: I2228d458f5cccf4d26866040380a974f88d9d360
This commit enables the new temporal filter system for VP9. For
speed 1, it improves the compression performance:
derf 0.54%
stdhd 1.62%
Change-Id: I041760044def943e464345223790d4efad70b91e
This change has been imported from VP9 and
alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most natural video clips, however, where the step search
is performing well, the quality gain and speed impact are small.
Change-Id: Iac24152ae239f42a246f39ee5f00fe62d193cb98
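The staged search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the encoder's implementation: the Mv, MeshStage and CostFn types, the mesh_search name and the sq_dist_cost toy metric are all hypothetical. Only the structure (threshold gate, then stages with shrinking range and interval, each centered on the previous best) follows the commit text.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int row, col; } Mv;               /* hypothetical MV type */
typedef struct { int range, interval; } MeshStage; /* one stage of the mesh */
typedef unsigned int (*CostFn)(Mv mv, const void *ctx);

/* Toy distortion metric: squared distance to a target MV passed via ctx.
 * A stand-in for a real block-matching cost. */
static unsigned int sq_dist_cost(Mv mv, const void *ctx) {
  const Mv *t = (const Mv *)ctx;
  const int dr = mv.row - t->row, dc = mv.col - t->col;
  return (unsigned int)(dr * dr + dc * dc);
}

/* 'center' is the result of the preceding step search. The mesh search only
 * runs if the cost at 'center' is above 'threshold'. */
static Mv mesh_search(Mv center, const MeshStage *stages, int num_stages,
                      unsigned int threshold, CostFn cost, const void *ctx) {
  Mv best = center;
  unsigned int best_cost;
  int s;
  assert(stages != NULL && cost != NULL);
  best_cost = cost(center, ctx);
  if (best_cost <= threshold) return center; /* step search was good enough */
  for (s = 0; s < num_stages; ++s) {
    const Mv origin = best; /* each stage is centered on the previous best */
    const int range = stages[s].range, step = stages[s].interval;
    int r, c;
    assert(step > 0);
    for (r = -range; r <= range; r += step) {
      for (c = -range; c <= range; c += step) {
        Mv cand;
        unsigned int d;
        cand.row = origin.row + r;
        cand.col = origin.col + c;
        d = cost(cand, ctx);
        if (d < best_cost) { best_cost = d; best = cand; }
      }
    }
  }
  return best;
}
```

With the three example stages (64/4, 32/2, 15/1) this evaluates 33*33 + 33*33 + 31*31 = 3139 positions, against 129*129 = 16641 for a full +/- 64 search at step 1.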
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.
Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
Fix copied over from VP9 master to VP10 master.
Do not reset the alt ref active flag when overlaying the middle
arf(s) of a multi arf group.
Change-Id: I1b7392107e7c675640d5ee1624012f39cc374c58
use CONFIG_VP[89] to protect white-box tests and drop redundant
uses of CONFIG_VP9 in variable assignments within that block
Change-Id: Id3c6cf5c7822aa161b19768b295f58829a1c6447
For non-rd variance partition: Adjust variance threshold based
on noise level estimate. This change allows the adjustment to be
updated more frequently.
Change-Id: Ie2abf63bf3f1ee54d0bc4ff497298801fdb92b0d
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
The range_check is not used because the bit range
in fdct# is not correct. Since we are going to merge in a new version
of fdct# from nextgenv2, we won't fix the incorrect bit range now.
Change-Id: I54f27a6507f27bf475af302b4dbedc71c5385118
For low resolutions, when 4x4downsample is used for variance,
use the same force split (that is used for 8x8downsample) for 16x16 blocks.
No change in metrics. Small improvement visually.
Change-Id: I915b9895902d0b9a41e75d37fee1bf3714d2366d
the loop filter level is transmitted as 6-bits + sign so needs to be clamped in
the delta + absolute case.
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1224363
Change-Id: Icbdca4fdbf043466429bd5c9d59dbe913bf153bc
the quantizer is transmitted as 7-bits + sign so needs to be clamped in
the delta + absolute case.
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1224361
Change-Id: I9115f5d1d5cf7e0a1d149d79486d9d17de9b9639
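One reading of the two clamping fixes above, as a sketch: the function names and the is_absolute flag are hypothetical, and the ranges 0..63 and 0..127 are inferred from the 6-bit and 7-bit magnitudes mentioned in the messages. The point is that the clamp is applied after the per-segment value has been resolved, so it covers both the delta and the absolute case.

```c
#include <assert.h>

static int clamp_int(int value, int low, int high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}

/* Loop filter level: 6 bits of magnitude, so the valid range is 0..63.
 * seg_value is signed in the bitstream, hence the clamp in both modes. */
static int segment_lf_level(int base_level, int seg_value, int is_absolute) {
  const int level = is_absolute ? seg_value : base_level + seg_value;
  return clamp_int(level, 0, 63);
}

/* Quantizer index: 7 bits of magnitude, so the valid range is 0..127. */
static int segment_q_index(int base_q, int seg_value, int is_absolute) {
  const int q = is_absolute ? seg_value : base_q + seg_value;
  return clamp_int(q, 0, 127);
}
```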
This is so we may update level at any time (e.g., to be used
for setting thresholds in variance-based partition).
Change-Id: I32caad2271b8e03017a531f9ea456a6dbb9d49c7
Under certain denoising conditions, check for re-evaluation of
zero_last mode if best mode was golden reference.
Change-Id: Ic6cdfd175eef2f7d68606300c7173ab6654b3f6e
Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed
with the same instruction size.
Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
For non-rd variance partition: only allow minmax computation
(which currently has no arm-neon optimization) for speeds < 8.
Performance loss is small: On RTC set with speed 8, few clips lose ~2/3%,
average loss is < 1%.
Change-Id: Ia9414f4d0b77dc83c3e73ca8de5d903f64b425ce
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
Change initial state of noise level, and only update
denoiser with noise level when estimate is done.
Change-Id: If44090d29949d3e4927e855d88241634cdb395dc
For denoising, and for noise level above threshold, re-evaluate
ZEROMV for mode selection after denoising.
Current change only does this check if selected best mode (before denoising)
was intra.
Change-Id: I4b1435b68d26c78f7597b995ee7bff0ddd5f9511
Always round sum error and sum square error toward zero in variance
calculations. This prevents variance from becoming negative.
Avoiding rounding variance at all might be better but would be far
more invasive.
Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
This change makes sure last reference with zero mv
is always checked for mode selection.
No change in metrics.
Change-Id: Iaf01877bf34272b966c78bfe18daad882a0a419e
the final sum may use up to 26 bits
+ add a unit test
+ disable the sse2 as the result will rollover; this will be fixed in a
future commit
Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
Change only affects 1 pass CBR.
On key frame, temporal layer_id is reset to 0 for 1 pass CBR,
but since "layer" is reset, the svc.layer_context[layer].is_key_frame
was not correspondingly set properly.
Change-Id: I08f6da0a55ac7429ccfbaddfb7be14479e43543b
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
--disable-XXX has the effect of disabling all extensions above it, e.g.,
--disable-ssse3 disables ssse3-avx2.
Change-Id: If02b44ca71ee12e4acb12010db8593a7989f2a9d
Small changes to the best quality default speed trade off.
Some speedup settings are worthwhile even for best quality as they
have only a very small impact on quality but a significant impact on
encode time.
These changes give as much as a further 50-60% increase in encode
speed for my test animations clip with minimal impact on quality.
For this sequence these changes improve the best quality encode speed
to about the same level as good quality speed 0 in Q3 2015 whilst
retaining the large quality gain of over 1 db.
For many natural videos though the quality difference from good 0
to best is much smaller.
Change-Id: I28b3840009d77e129817a78a7c41e29cb03e1132
This is simpler than the previous scheme, which tried to allocate
the CRITICAL_SECTION struct in a thread-safe manner before it
could use it to run the wrapped function in a thread-safe manner.
Change-Id: I172e5544e5f16403a3a0e5e2b9104b1292a0d786
This change alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.
Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
This reverts commit 380a5519cc.
This causes an assertion failure in debug_check_frame_counts() which
probably isn't valid with this change; leaving the investigation for
later now.
Change-Id: Ieda5ca811ed2fa50a0cc6935919a8d10dca996e0
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
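The three table properties listed above could be verified with a check along these lines. This is a hypothetical helper, not part of the commit: the function name and the max_mv parameter are assumptions, and comp0/comp1 are assumed to point at the zero entry of tables that are valid for indices -max_mv..max_mv, so negative indexing is legal.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the cost tables satisfy the three properties the fast path
 * relies on, 0 otherwise.
 * joint: 4 entries. comp0, comp1: pointers to the center (index 0) of the
 * per-component tables, valid for indices in [-max_mv, max_mv]. */
static int sad_cost_tables_allow_fast_path(const int *joint, const int *comp0,
                                           const int *comp1, int max_mv) {
  int i;
  assert(joint != NULL && comp0 != NULL && comp1 != NULL && max_mv >= 0);
  /* Joint cost: entries 1, 2 and 3 must be equal. */
  if (joint[1] != joint[2] || joint[2] != joint[3]) return 0;
  for (i = -max_mv; i <= max_mv; ++i) {
    if (comp0[i] != comp1[i]) return 0;  /* equal per-component cost */
    if (comp0[i] != comp0[-i]) return 0; /* cost function is even */
  }
  return 1;
}
```

A caller would fall back to the C version whenever this returns 0.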
the return value of enabled, which may be empty, is handled by the for
loop. this avoids making an unnecessarily long command line which may
fail in certain cases.
Change-Id: Ib88ecbbe2c0f6d7debb600b4caed4884497263b1
Change is only for real-time mode, speed >= 5, and non-screen content mode.
Add bias to zero/low motion for big blocks, if noise estimation
is enabled and noise level is above threshold.
Change-Id: I3a0a4608ede6aa535bda6eca528d20f8aba738e7
For 1 pass CBR mode: increase waiting time after key frame
before we start sampling rate control behavior for determining
resize. This change needs to disable one internal resize (DownUp)
temporarily since it requires a longer clip to do so.
Change-Id: If21beda1be23f169ee541ab4dd642f718347887a
Use same setting for speed 5 (as it is for speed > 5).
Change is only for real-time (non-rd) mode.
Change-Id: I830250eac654328373cb318baa89d4f0e63942e1
Reduces Linux perf estimated cycle count for pack_mb_tokens on a
lossless encode on my desktop from 61858501855 to 48154040219 or from
26% of the overall profile to 21%.
Change-Id: I9ca3426d7e3272bc7f7030abda4f0d0cec87fb4a
This reverts commit f1342a7b07.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
Add threshold/condition on spatial_variance and brightness level.
Modification to normalization of block variance.
Change resolution limit below which we disable noise estimation.
Change-Id: If5be08a26ceda351242d8a58d2f0bc88c0a918f0
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
Change is only for real-time mode, speed > 5, and non-screen content mode.
Bias is based on block size and motion vector level (motion above some threshold).
Helps to improve stability in background from lighting changes.
PSNR/SSIM metrics on RTC set almost no change/neutral (within +/- 0.1).
Change-Id: I7eac13c1ae10be4ab1f40acc7f9f1df5653ece9d
Only use non-zero threshold(s) for breakout if
the motion level of the current tested mode is low.
Change-Id: I22aae961cc42371b49d3f648560181cc54708502
Source noise level estimate is also useful for
setting variance encoder parameters (variance thresholds,
qp-delta, mode selection, etc), so allow it to be used also
if denoising is not on.
Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
Change-Id: I587c44dd61c1f3767543c0126376f881889935af
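The precedence argument above can be checked with a small standalone example. The function names are made up for illustration, and uint8_t stands in for the one-byte type of p mentioned in the message.

```c
#include <assert.h>
#include <stdint.h>

/* The old expression parses as p = (0 ? 0 : p - 1): the condition is the
 * constant 0, so the result is always p - 1, truncated to one byte on
 * assignment. */
static uint8_t old_workaround(uint8_t p) {
  p = 0 ? 0 : p - 1;
  return p;
}

/* The same behavior written out explicitly. */
static uint8_t explicit_equivalent(uint8_t p) {
  return (uint8_t)((p - 1) & 0xFF);
}

/* Confirms the two forms agree for every possible byte value. */
static void check_all_bytes(void) {
  int v;
  for (v = 0; v < 256; ++v) {
    assert(old_workaround((uint8_t)v) == explicit_equivalent((uint8_t)v));
  }
}
```

In particular p == 0 wraps to 255 rather than staying at 0, which is what makes the original spelling misleading.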
Width and height of downscaling resolution should not be lower
than min_width and min_height which can be set as needed, both
are 180 for now.
Change-Id: I34d06704ea51affbdd814246e22ee8d41d991f00
This reverts commit 7f56cb2978.
It causes uninitialized reads in the first pass setting up later cost tables.
Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9
Remove delta index 254 from probability remapping and subexp coding.
Saves 1-bit when the delta index is 129.
Change-Id: I88aba565fc766b1769165be458d2efd3ce45817e
Adjust variance threshold, delta-qp, and intra penalty cost,
based on estimated noise level in source.
Replace denoising_on with a level value=L/M/H.
Change-Id: I0c017dae75a5d897367d2c42dec26f2f37e447c1
The option exists specifically to allow for configurations
where the build environment is different from the configure
environment.
Change-Id: I95196fa3c49700251d10ff5d256dc7380e39d0c4
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://code.google.com/p/webm/issues/detail?id=1089
Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572
2015-10-26 11:29:46 -07:00
328 changed files with 19170 additions and 14698 deletions
@@ -13,12 +13,8 @@ Prefix functions with vpx by default.
Manage name mangling (prefixing with '_') manually because 'PREFIX' does not
exist in libvpx.
Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
Catch all elf formats for 'hidden' status and SECTION notes.
Avoid 'amdnop' when building with nasm.
Set 'private_extern' visibility for macho targets.
Copy PIC 'GLOBAL' macros from x86_abi_support.asm
Use .text instead of .rodata on macho to avoid broken tables in PIC mode.
Use .text with no alignment for aout
Only use 'hidden' visibility with Chromium
Move '%use smartalign' for nasm out of 'INIT_CPUFLAGS' and before
'ALIGNMODE'.