The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%
BUG=webm:1320
Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
Increase the boost threshold below which GOLDEN update will use same
rate correction factor as INTER_NORMAL.
Improves performance when gf_cbr_boost_pct is set (between 0 and 100)
in CBR mode.
Change-Id: I9f54cc18664786a100b13a416b7137ae03bd0cab
Set short_circuit_low_temp_var to 3 for speed 8 for all res.
No strong visual difference on all clips.
Change-Id: Ia6d9a314291ab1c14d5421bbdd769974083aeb2a
Constraints on encoder config:
-target_bandwidth is no larger than 80% of level bitrate limit
-target_bandwidth * (1 + max_over_shoot_pct) is no larger than
88% of level bitrate limit
-min_gf_interval is no smaller than level limit
-tile_columns is no larger than level limit
Constraints on rate control:
-current frame size plus previous three frames' size is no larger
than the CPB level limit
-current frame size is no larger than 50%/40%/20% of the CPB
level limit if it's a key/alt-ref/other frame.
Change-Id: I84d1a2d6d6e3c82bfd533b3309ce999cfaba2c8b
The source sad could be used to copy the partition without going into
choose_partitioning function to speed up vp9 encoding. Computing source
sad takes little time. Speed test on Android and Linux shows little
encoding time gain (less than 1.4%).
Turned off for now since partition copy is turned off.
Change-Id: I61c9d5b8f22329760cb29a4ee30a7f9c232ce8d3
vp9: Set short circuit to level 3 for VGA for speed 8. Also change the
threshold_32x32 to 5/8*thresholds[1] to improve quality regression
caused to VGA clips.
Change-Id: Ia1590e91e7cb22be78d5b85013387bb1be4272e3
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.
Also cast the update of bits_in_buffer_model_, as this can
go negative now due to the buffer underrun.
This fixes the issue in #1352.
BUG=webm:1350
BUG=webm:1352
Change-Id: Ibd4ef23921daf09e5c15b000aca904aa4573599c
For out-of-range cases, returned UINT_MAX instead of INT_MAX in the
sub-pixel mv search to be consistent with the "uint32_t" return type.
Change-Id: I8e206d771228c13d89bafbbe9f14722c8ecc6a7a
The function 'vp9_find_best_sub_pixel_tree_pruned_more' is modified
to return INT_MAX for handling invalid MV cases from UINT32_MAX.
yunqingwang:
patch 3: rebased on top of the tree.
patch 4: The return type of vp9_find_best_sub_pixel_tree* was changed
to uint32_t to fix ubsan warnings. Changing UINT_MAX back to INT_MAX
was not quite right. Patch 4 modified vp9_temporal_filter.c to accept
uint32_t.
(Note: Inconsistency exists in vp9_find_best_sub_pixel_tree*, which
will be fixed in a separate CL.)
Change-Id: Ib1a79dc2aa41ea6335c21669c76883cdbb7e0535
This reverts commit f0b491a524.
This change results in unsigned integer overflows (as reported by
-fsanitize=integer) in datarate_test.cc,
for many of --gtest_filter=VP9/DatarateOnePassCbrSvc.OnePassCbrSvc*:
unsigned integer overflow: 167198 - 185560 cannot be represented in type
'unsigned long'
As the encoder didn't change, but the input with the change to
(correctly) use Y4mVideoSource, this revert is merely masking the issue.
BUG=webm:1352
Change-Id: Iecd9a6c83b3fca67c566732a5c92d36193cc2060
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.
BUG=webm:1350
Change-Id: I73c88b800cdcc06bd2f900f7b7e2a5fd08248065
When source frame is altref, we only do zero-mv mode, so we can skip
the find_predictors(). No change in compression.
Small speed gain, ~1%.
Only affects 1 pass vbr with lookhead altref, for ytlive with
the macro flag USE_ALTREF_FOR_ONE_PASS on.
Change-Id: I9318c5da8521f017bf54919cd652438b3a6313d1
Remove superfluous test. Produces a small improvement in instruction scheduling.
Measured a 1% to 1.5% reduction in execution time for routine vp9_optimize_b
with different compilers.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I2bf248d4c25fc0256147d7a8766ff9108ae9cba3
Add feature to copy partition from the last frame.
The copy is only done under certain conditions that SAD is below threshold.
Feature is currently disabled, until threshold is tuned.
Feature will be initially used for Speed 8 (ARM).
Under extreme case of always copying partition for speed 8:
Encode time is reduced by 5.4% on rtc_derf and 7.8% on rtc.
Overall PSNR reduced by 2.1 on rtc_derf and 0.968 on rtc.
Change-Id: I1bcab515af3088e4d60675758f72613c2d3dc7a5
Simplify address arithmetic on token_costs to reduce the number of generated
instructions that are used for address arithmetic inside routine
vp9_optimize_b. It also helps improve instruction scheduling depending on
compiler and optimization level.
Measured a 9.3% reduction in retired instructions and 5.3% reduction in
execution time for this routine with GCC v4.8.4 and optimization flags -O3,
and a reduction of up to 11.6% in execution time with other compilers.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6098650fb5cd2aa04e014fe6e68ca20761f3a21f
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c enable vpx_idct32x32_1024_add_neon in hbd builds
+ give the variable a more descriptive name
BUG=webm:1294
Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
Correctly set interp_filter to SWITCHABLE for INTRA mode.
Also reduce threshold on noise level for re-evaluating zeromv.
Change-Id: Id32c01e193209fb380aa07204f0be3babf29f70a
For when denoising enabled: change condition to enable
the recheck_zeromv_after_denoising for only very high noise level.
This is causing an issue, so enabling it for very high noise
to effectively shut it off.
Change-Id: Ic40d6025f3f398338cedd270d17c0ccd9a3daa84
The new test is causing valgrind failures:
[ RUN ] SSE2/VpxPostProcDownAndAcrossMbRowTest.CheckCvsAssembly/0
==28923== Invalid read of size 16
28923== at 0x724016: ??? (deblock_sse2.asm:146)
Disable during investigation. The test is new but the code is not.
Change-Id: I5521e5fd48a595e3798b833bf7e3cc97b81c1975
To avoid decode performance hit of 2% when running on hyperthreaded
cores.
This patch only uses the mutex's when we are running tsan.
This is safe because 32 bit operations like read and store are atomic
on all the platforms we care about. Tsan warns about race situations,
but in this case either situation ( read occurs before write or write
before read) the worst case is that we go around one extra time in the
loop. So the ordering doesn't really matter.
That said a few other things have been tried :
for instance as per here:
webrtc/base/atomicops.h#52
In this patch they use:
__atomic_load_n(i, __ATOMIC_ACQUIRE);
__atomic_store_n(i, value, __ATOMIC_RELEASE);
This code works on gcc, clang ( replacing protected write and read), and
avoids tsan errors. Incurring no penalty in performance. In C11 its
replaced by straight atomic operands.
However there is no equivalent in the visual studio's we support as
int32 on all windows platforms is already atomic. To avoid tsan like
warnings on windows we'd need to use interlocked exchange and the
end result doesn't gain us any thing.
Change-Id: I2066e3c7f42641ebb23d53feb1f16f23f85bcf59
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.
BUG=webm:1320
Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
The flag USE_ALTREF_FOR_ONE_PASS allows for alt-ref lookahead
in 1 pass vbr (from https://chromium-review.googlesource.com/#/c/365498).
This change is to make sure this macro flag only has effect if
the config flag cpi->oxcf.enable_auto_altef is also on.
No change in ytlive encoding, as USE_ALTREF_FOR_ONE_PASS is not
yet enabled.
Change-Id: I1a69681e4a15c5244581a3dab4587fca08f02e0f
Reapply this patch:
ff0107f Amend and improve VP8 multithreading implementation
Amended the patch to add a unit test, and fix an asan error.
BUG=webm:851
Change-Id: I6572c03256169c64e80248bf5a5e99f59a2fc93c
use_base_mv assumes 2x2 scaling, so fix is to shutoff
this feature unless spatial scale factors are 2.
Added svc unittest for 2 spatial layers with 5x5 scaling,
which generates the issue without this fix.
Also fix some settings in svc unittest:
let the speed setting vary (from 5 to 8), and enable static threshold.
BUG=webm:1344
Change-Id: Idfd0a6c633c21b49a0479601506302cfe974e30e
Set #threads to default 1 for all streams, change bit allocaton
for 3 temporal layers, and enable denoiser on middle resolution layer.
Change-Id: I4a57adbfdb2c319002b8f3cf359613842dc00d75
after:
2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
One of the first pass stats "new_mv_count" is no longer used in VP9,
and is removed. This also makes it easy to implement a multi-threaded
first pass. This change doesn't affect the coding performance, which
has been verified by borg tests.
Change-Id: I4c7c7bf9465fda838eb230814ef0c631c068c903
1. Use correct projections when copying real dct/quant outputs.
2. Remove local random number generator and combine loops.
3. Quantization with minimum allowed step sizes instead of maximum.
This may generate larger inputs.
Change-Id: I154afc26230c894d564671cff4b8fd5485b69598
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.
Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
Use the segment weight factor based on the target (cr->percent_refresh)
if it less than the current estimate (avergae of past usage and target).
Small improvement at low bitrates.
Change-Id: Iba8fd909e203f94458901366d3a991f7ea854d49
the expansion of findstring and rtcd_dep_template_CONFIG_ASM_ABIS needs
to be deferred until the block is parsed as makefile syntax rather than
eval time where rtcd_dep_template_CONFIG_ASM_ABIS will be unset. this
ensures vpx_config.asm is properly created.
Change-Id: I7c38c6c082da78397936467482789dd468adc316
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
force this to be created before any other .S files. this change
additionally removes the file from the source list as it doesn't need to
be compiled on its own.
Change-Id: I6b4cd56ef6059d08f75f06fb749cddf76e0e165e
Increase the motion threshold and qp-delta for segment#2 boost.
This can increase the frame-drop at low bitrates, but generally
better spatial quality.
Only affects real-time mode with aq-mode=3, at very low bitrates.
Change-Id: I5ccb784667f70d0c27d369806b93b1f93d5605d1
For some filter level, the C/MSA doesn't match SSE2. Part of unit tests
are disabled. They will be re-enabled when C/MSA funcs are fixed.
BUG=webm:1321
Change-Id: Ib16b98b5eecb15d2252aa4ea267b782ee2b27533
replace downloads.webmproject.org with the canonical
storage.googleapis.com/... form. this appears less likely to fail when
dealing with multiple concurrent connections.
Change-Id: I0dcbd04df9e4057fa851f458b3ef7e3589f1f2f1
best_sub8x8[1] won't be used meaningfully when is_compound is false, but
may trigger an msan warning as the value is copied around and later
clamped.
BUG=667044
Change-Id: Icc24c3b72cdb550bebea44d4aaa4ff8bf3fbab56
Use the same feature as https://chromium-review.googlesource.com/#/c/411327/,
but allow it to be used for speed = 6 and 7, where
short_circuit_low_temp_var = 1.
Speed up of ~2-3% for speed 7, with little/no loss in compression.
Change-Id: I263a0f261ad9929034392d68f0153dc6376fdb5f
Remove unnecessary "virtual" before some functions. Change *_btm_* in
variable names to *_bottom_*.
Change-Id: Ifd4ce667537617f451cdfed47dd8c48817fd983b
VpxEncoderThreadTest was taking a very long time for some runs and
timing out a lot. This is an attempt to split the test into runs
that can be run nightly ( speeds 2 through 9) and runs that can
be run weekly ( speeds 0-1 ).
Change-Id: Iee6f61a561006d3a30381dd3b52b9a4dce07a70c
This is a boolean value that is written into bitstream, any value other
than 0 or 1 could have led to unexpected behavior. This commit fix the
issue by adding validation of the value to make sure it is boolean.
BUG=webm:1339
Change-Id: I2d3e69e8dbefcab9a0db9cb39a91a40ce531c5a1
fixes reloc errors like:
R_X86_64_PC32
vpx_dsp/x86/deblock_sse2.o:
requires dynamic R_X86_64_PC32 reloc against 'vpx_rv' which may overflow
at runtime
Change-Id: I218fc0e7c8258197f890d395f335e5a4fe82dccb
tests with 'Large' in the name are reserved for slow running tests which
may not be run on all platforms
Change-Id: I2a7d6dd46b29b50469893e46433844132fb727c2
This runs multiple encodes and decodes of vp8 and vp9 in parallel,
with so many threads that problems with synchronization can show up.
Change-Id: I2b297e7f43d1e741323c7ad9f50a3931ae609f16
Add a new, more aggresive short circuit: short_circuit_low_temp_var = 3 to skip
golden of any mode when variance is lower than threshold for low res.
This change only affects speed = 8, low resolution.
Metrics for avgPSNR/SSIM on rtc_derf (low resolution) show loss of
0.27/0.31%.
On Nexus 6, the encoding time is reduced by ~2.3% on average across all
low-res clips.
Visually little change on rtc_derf clips.
Change-Id: Ia8f7366fc2d49181a96733a380b4dbd7390246ec
this removes the need for __STDC_LIMIT_MACROS which is defined in
vpx_integer.h, but may be preceded by earlier includes of stdint.h;
fixes build with the r13 ndk
Change-Id: I3950c8837cf90d5584a20ce370ae370581c2182c
avoids the definition of min/max macros in headers that may appear in
c++ unit tests. the codebase uses VPXMIN/MAX for this purpose in any
case
Change-Id: I2b679b045d64fb34fd8780f704e3caf10a758d82
use 'android/cpufeatures' rather than 'cpufeatures'; this matches the
documentation, fixes compilation with r12b/r13 and still works with
older ndks.
Change-Id: I2f34233c164e6d4d46428f8905d5502cea4288a2
Changes only affects speed = 8 for low resolutions.
Metrics for avgPSNR/SSIM on rtc_derf (low resolutions) show loss of
0.5/0.6%.
On Nexus 6, the encoding time is reduced by ~5.9% on average across all
low-res clips.
Visually little/no change on rtc_derf clips.
Change-Id: I68dd50e558d72dcc1af8317d224bfae5e3bd872d
This commit enables asymptotic closed-loop encoding decision for
the key frame and alternate reference frame. It follows the regular
rate control scheme, but leaves out additional iteration on the
updated frame level probability model. It is enabled for speed 0.
The compression performance is improved:
lowres 0.2%
midres 0.35%
hdres 0.4%
Change-Id: I905ffa057c9a1ef2e90ef87c9723a6cf7dbe67cb
this was enabled in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
+ enable it for all NEON configs, both intrisincs and assembly versions
exist
BUG=webm:1294
Change-Id: I339088b2a398200f95658d040034fb9b2a7c8ce0
usage of the vp8 versions was removed in:
3f72509 vp8: remove VP8_SET_DBG* control support
vp9 had the usage stripped even earlier.
Change-Id: I978142eb6492552cd29c9c6feb1e89acfc5f7b84
this was enabled in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
+ enable it for all NEON configs, both intrisincs and assembly versions
exist
BUG=webm:1294
Change-Id: Iaade219e9d1de7b69423670d3ea6271b0965e068
idct4x4 and idct8x8 were universally enabled for high-bitdepth builds
in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
BUG=webm:1294
Change-Id: If142afb169c48728cc4b222e7c41aa4a63f95f0f
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.
BUG=webm:1294
Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
For noisy content, be more aggressive in skippping some blocks for
delta-qp to reduce noise pulsing artifact. Also treat frame boundary
case when dimension is not multiple of superblock size/64.
Only affects non-screen content case, and when source noise
is measured to be high (at least level kMedium).
Change-Id: Ib13a2a20ed1ce37ff3c44d95c3ef2635fd695222
This uses the same sdx4df pointers as vp8_diamond_search_sadx4 and
should therefore target the same optimizations.
See e4ddf9db6a
Change-Id: Ic298e9b25c34bbe6b7a0799509355b0addb56675
The matching on ads2gas_apple.pl is too liberal and catches
CONFIG_EXTERNAL_BUILD and CONFIG_INTERNAL_STATS because they have RN in
the names.
The RN renaming feature is not used in any existing assembly files. It
was used in some armv6 files but they were removed.
Change-Id: Ib65abf1947d3e89f0d1584e2a5de399d24008f95
Two functions do not pass this test:
vpx_idct8x8_64_add_ssse3
vpx_idct8x8_12_add_ssse3
The test has been modified to avoid triggering an issue with those
functions but they still must be investigated.
BUG=webm:1332
Change-Id: I52569a81e8e6e0b33c4a4d060d0b69c3fc4f578e
Add condition that usable_ref_frame > LAST.
This is to avoid potentially skipping all last-nonzero mv modes,
if golden is used as a reference but skipped completely for the
current block.
This has no effect currenty, as we always consider testing golden
mode for each block.
Change-Id: I3182cf44664081935a90ed43aa7b32e710e60e22
Fixed formatting bug introduced by the fix to BUG=webm:1322
( Iedc4477aef1746aa0a4f84d88a1156296fd3ba87)
Change-Id: I715ee446c0e8584967ab87ba4e355759dd394187
vp9_init_macroblockd() resets the error_info to cm's global copy; this
needs to be set to the thread-level target to avoid jumping to the
incorrect stack, resulting in hang or crash.
broken since:
1f4a6c8 vp9/tile_worker_hook: add multiple tile decoding
includes v1.5.0, v1.6.0
BUG=629481
Change-Id: Icbf1696b25ba8c479e845fbf227b3c3ca73542f5
vpx_config.asm and idct_neon.asm.S are required since:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter
Change-Id: If5959a25edb370dd7dcdca71c96e9a5aad0840ce
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.
BUG=webm:1294
Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
A past patch made it so that every frame that had a decode error
caused a corrupted frame to be counted. Unfortunately it was possible
to get both a decode error and a corrupt frame for the same frame
and thus double count an error. This code makes that impossible.
Change-Id: Iea973727422a3bf093ffda72fa358a285736048b
Permits skipping 0, 1/2 or 3/4 of the frames, corresponding to
temporal layers 2, 1 and 0 of a 3-temporal-layer encoding. 1/2
corresponds to TL0 in a 2-layer encoding.
Change-Id: I7f6d131f63707e5262fc67d111bfb3a751ede90d
Allow for passing in the layer bitrates at command line.
Fix to allow passing in bitrate for each spatial-temporal layer.
Change to some default values for 1 pass cbr mode:
spatial scale and qp-max/min.
Small fixes to some build warnings.
Change-Id: I3f9a776262712480a6570bb863a835b2fc49935a
This change is a step in a larger change to the way boost and interval are
determined for ARF and Key frames.
This patch contains some pluming for the general case but focuses on the
key frame boost calculation. This now relies more heavily on the rate at
which the error score increases between the primary and secondary reference
frame. This seems to be less fragile when dealing with different frame sizes.
For example larger image formats tend in the first pass to see a higher
% of intra coded blocks and the use of this number in calculating the frame
decay factor was leading to much lower boost numbers for 4K, for example,
than the same clip coded at 2K.
This change does give overall gains but they are MUCH larger for the 4K Netflix
set. For the 4K Netflix set the average gain is around 3% with some clips > 20%
whereas for the same set at 2K the average gain is 0.5-1%.
In general for small image formats the boost is most often reduced a little whereas
4K clips the boost is increased. There are some -ve cases such as Akiyo at 352x288
where the reduced boost hurts the metrics, especially for SSIM, even while
the set as a whole improves. This is most notable at very low Q and may be the
subject of a future patch.
Some common code for KF and ARF was separated in this patch for the purposes of
tuning but may later be re-merged if appropriate.
Change-Id: Iaa15ac5a58d2be89181100d95cef6a8dc4b12d0d
Fixes a case where recode is not triggered based on the value
of maxq passed into the recode loop test function.
BUG=b/32375284
Change-Id: I15ad985d0525c68e0443cfaf842440d2754b2266
The result of the transform is added to the destination buffers. In the
existing tests the destination buffer is always empty so that portion of
the code was never exercised.
Change-Id: I1858c4fed2274f1b9faf834d2ba4186a4510492a
Switch to using correctly sized inputs and outputs. This simplifies
adding tests with varying strides.
Change-Id: I716a0d8173dcf6a86d56656ac9d3101b7ec27642
Removed a couple of adjustments that no longer move the needle
much but complicate the process of tuning.
Change-Id: Ie320f5cf155e6aac14a4757ea9ada2cd59f27590
Modified the encoder multi-thread test so that it included cpu-used=0 and
frame-parallel=0.
frame_parallel_decoding_mode is 1 by default, which disables probability
updating and gives lower encoding quality. Current VP9 multi-threading
encoder and decoder support probability updating. To test this part, we
should turn on it in the unit test, namely, setting frame-parallel to 0.
Change-Id: Ia1f86e01f0de628f50d819ae31509de3e1b6c755
This patch modified the motion search counts used in:
https://chromium-review.googlesource.com/#/c/305640/
These 2 counts were originally added as thread data, and used to
make decisions in motion search. The tile encoding order can be
inconsistent while using different number of threads, which can
cause bitstream mismatch. Here moved them to tile data to solve
the issue.
BUG=webm:1322
Change-Id: Iedc4477aef1746aa0a4f84d88a1156296fd3ba87
dst += stride behaving better with gcc/clang
Expanding inline function dc_SIZExSIZE() save intructions for
vpx_dc_predictor_SIZExSIZE_neon().
Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
A failure to decode is most likely equivalent to a corrupt
frame for the purpose of returning a failure.
Change-Id: Ie53db2b8130b40b725841f5f7a299d63aa56913d
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
BUG=webm:1309
Change-Id: I8a80da7c5089b837d0df79a5c49d5e3022dfc8ec
In variance partition low resolutions may use varianace based on
4x4 average for better partitioning.
Increase the threshold for doing this at speed = 8.
Improves speed by ~5%, with little loss, < 1%, on RTC_derf set.
Change-Id: Ib5ec420832ccff887a06cb5e1d2c73199b093941
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.
BUG=webm:1303
Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
Control already exists for vp9, adding it to vp8.
Usage is only when error_resilient is off.
Added a datarate unittest for non-zero boost.
Change-Id: I4296055ebe2f4f048e8210f344531f6486ac9e35
This reverts commit 9e8efa5b18.
this change causes ubsan warnings, failures in
vpxenc_vp9_webm_rt_multithread_tiled
BUG=webm:1309
Change-Id: I020c7be985c771bfff4b3de1afe51cc8edb980da
git log --no-merges 32d5ac4..9732ae9
9732ae9 EbmlElementSize: quiet uint64->int32 conv warning
da04eba SetProjectionPrivate: quiet uint64->size_t conv warning
6db32d5 mkvparser,Projection::Parse: fix int->bool conv
3bb0dfa cosmetics: fix a couple lint warnings
0e179d6 update .clang-format
fc5f88d Fix temp files being left on system.
c04a134 Add support for overriding PixelWidth and PixelHeight.
c0160e0 Add support to explicitly set segment duration.
02bc809 Add support to estimate file duration.
c97e3e7 Add support to output sub-sample encryption information.
26f4344 MakeUID: quiet unused param warning in Android builds
d6af52a Change check to fix compile error.
1720020 webm_parser: Add Mesh value for ProjectionType
78f2c5a webm_parser: Use ./ prefix for includes
da62f65 webm_parser: Remove webm/ prefix from public includes
e15e8f2 webm_parser: Update README build instructions
5023f2b mkvmuxer: Fix Colour::Valid()
cf16204 mkvmuxer_tests: Actually test cue points in the cue point test.
93e9fb3 Validate Colour element values.
8036925 mkvparser_tests: Add Projection element test.
f52d38c mkvparser_tests: Add Colour element test.
826436a mkvparser: minor SeekHead::Entry clean up.
24fb44a mkvmuxer_tests: Add Projection element test.
1e0a8ea mkvmuxer_tests: Add Colour element test.
0278616 mkvmuxer: Colour accessors/mutators.
2346f8f Add mkvparser wrapper functions.
54d6b6b webm_info: Add Projection element support.
65fee06 mkvmuxer_sample: Add support for Projection element.
9a3f2b5 mkvparser_sample: Add support for Projection element.
41e814a mkvparser: Add Projection element support.
483a0ff mkvmuxer: Add Projection element support.
676a713 Add support for the Projection element
725f362 mkvmuxer: Fix memory leak when Colour is set multiple times.
fa182de mkvparser_sample: Add output of audio track codec private size.
8f521f2 mkvparser_tests: Add invalid BlockGroup test.
39137d7 Remove docs saying binary elements default to 0
80685d3 Do not skip over unknown elements at the root level
c147504 Fix legacy Makefile.
58711e8 mkvparser_sample: Fix version info string.
837746f mkvparser_tests: Add invalid block test.
207cd80 Disambiguate sample sources and targets.
a112d71 mkvparser_tests: Refactor invalid file loading code.
5dea33e Disambiguate test source and target names.
125049e parser_tests: Add another truncated chapter string test.
1de8d4c parser_tests: Add truncated chapter string test.
ff8c2b6 parser_tests: Move cue validation to test_util.
4b0690f parser_tests: Add invalid lacing test.
9828e39 mkvmuxer: Set default doc type version to 4.
5495a59 webm_parser: Reference more files in CMakeLists.txt.
0c0ecd0 vpxpes_parser: Add start code emulation prevention support.
639a4bc webm2pes: Remove debug printfs().
9a51102 webm2pes: fflush() in the correct conversion function.
dc7f155 webm2pes: Track total bytes written.
d518128 webm_parser: Enable usage of werror.
e1fe762 webm2pes: Add test for mux/demux of large input.
1b24a79 vpxpes_parser: Read and store PTS when present.
6cf0a0f vpxpes_parser: Store frame payloads.
25d2602 webm_parser: Convert style to match the rest of libwebm
24be76d webm2pes: Replace VpxFrame with VideoFrame.
b451c3b Add a basic video frame storage class.
05c90eb libwebm_util: Clarify error text in superframe parser.
e6415af webm2pes: Make WritePesPacket() a public method.
8f840dd webm2pes: Move frame read out of PES packet write method.
448af97 webm2pes: Restore frame fragmentation support.
f8bb714 cmake: Integrate new parsing API and tests.
cb8ce0b Add a new incremental parsing API
900d322 vpxpes_parser/webm2pes: BCMV and PTS fixes.
4b73545 webm2pes: Add start code emulation prevention.
82903f3 Add column tiles and frame parallel to webm_info
5d91edf style_clean_up: Remove unnecessary parentheses
a95aa4b vp9_level_stats: correct total_uncompressed_bits_ calculation
f46566f mkvreader: Fix shorten-64-to-32 warning in 32 bit builds.
76630ca mkvwriter: Fix shorten-64-to-32 warning in 32 bit builds.
a8ffbd4 webm2pes: Fix format specifier warnings.
faf89d4 Add MaxLumaSampleRate grace percent to stats.
d31e6c9 Fix profile 2 in vp9_header_parser.
bd3ab3a Add flag to estimate last frame's duration to stats.
c182ed9 Fix lint issue in hdr_util.h
cc62ecd Add test for Cluster memory leak
196708a Change MaxLumaSampleRate to be based on frame resolution.
cbd676b mkvmuxer: Fix leak when a Cluster isn't finalized
9a235e0 mkvmuxer: Set doctype to matroska when muxing non-WebM codecs.
47f2843 Add parsing support for new features in CodecPrivate.
e3c9576 Add VP9 level output to webm_info.
5cf549f cmake: Log compiler flag at check time.
bbaaf2d Add class to gather VP9 level stats.
8bb68c2 Add file to parse data from VP9 frames.
296429a Add support to parse VP9 profile.
df3412f Add support for setting VP9 profile and level to sample_muxer.
87832d4 mkvmuxer: Fix Segment::Finalize in kLive mode
6df3e56 mkvmuxerutil.hpp: Add using directives for overloaded size utils.
ec47928 mkvmuxerutil: Revert to using mkvmuxertypes.
a1dc4f2 Fix parsing of VP9 level.
4e3d037 Add support to output Colour elements to webm_info.
d3656fd muxer_tests: ignore iwyu re gtest-message.h
e76dd5e Fix file name in mkvmuxertypes shim.
1be5889 Add temporary include shims at old file locations.
039df94 Add TEST_TMPDIR environment variable
Change-Id: I84bc1401b0aad71ad6727b687f1bede9953a7a08
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
Add stronger condition for splitting 64x64, for low noise content.
This reduces dragging artifact near moving head.
Little/no change in metrics on RTC set.
Change-Id: I39b38cfd20f2ece53ff49c2aaf76ba9f82761be1
fc5f88d Fix temp files being left on system.
c04a134 Add support for overriding PixelWidth and PixelHeight.
c0160e0 Add support to explicitly set segment duration.
02bc809 Add support to estimate file duration.
c97e3e7 Add support to output sub-sample encryption information.
26f4344 MakeUID: quiet unused param warning in Android builds
d6af52a Change check to fix compile error.
1720020 webm_parser: Add Mesh value for ProjectionType
78f2c5a webm_parser: Use ./ prefix for includes
da62f65 webm_parser: Remove webm/ prefix from public includes
e15e8f2 webm_parser: Update README build instructions
5023f2b mkvmuxer: Fix Colour::Valid()
cf16204 mkvmuxer_tests: Actually test cue points in the cue point test.
93e9fb3 Validate Colour element values.
8036925 mkvparser_tests: Add Projection element test.
f52d38c mkvparser_tests: Add Colour element test.
826436a mkvparser: minor SeekHead::Entry clean up.
24fb44a mkvmuxer_tests: Add Projection element test.
1e0a8ea mkvmuxer_tests: Add Colour element test.
0278616 mkvmuxer: Colour accessors/mutators.
2346f8f Add mkvparser wrapper functions.
54d6b6b webm_info: Add Projection element support.
65fee06 mkvmuxer_sample: Add support for Projection element.
9a3f2b5 mkvparser_sample: Add support for Projection element.
41e814a mkvparser: Add Projection element support.
483a0ff mkvmuxer: Add Projection element support.
676a713 Add support for the Projection element
725f362 mkvmuxer: Fix memory leak when Colour is set multiple times.
fa182de mkvparser_sample: Add output of audio track codec private size.
8f521f2 mkvparser_tests: Add invalid BlockGroup test.
39137d7 Remove docs saying binary elements default to 0
c147504 Fix legacy Makefile.
80685d3 Do not skip over unknown elements at the root level
58711e8 mkvparser_sample: Fix version info string.
837746f mkvparser_tests: Add invalid block test.
207cd80 Disambiguate sample sources and targets.
a112d71 mkvparser_tests: Refactor invalid file loading code.
5dea33e Disambiguate test source and target names.
125049e parser_tests: Add another truncated chapter string test.
1de8d4c parser_tests: Add truncated chapter string test.
ff8c2b6 parser_tests: Move cue validation to test_util.
4b0690f parser_tests: Add invalid lacing test.
9828e39 mkvmuxer: Set default doc type version to 4.
5495a59 webm_parser: Reference more files in CMakeLists.txt.
0c0ecd0 vpxpes_parser: Add start code emulation prevention support.
639a4bc webm2pes: Remove debug printfs().
9a51102 webm2pes: fflush() in the correct conversion function.
dc7f155 webm2pes: Track total bytes written.
d518128 webm_parser: Enable usage of werror.
e1fe762 webm2pes: Add test for mux/demux of large input.
1b24a79 vpxpes_parser: Read and store PTS when present.
6cf0a0f vpxpes_parser: Store frame payloads.
25d2602 webm_parser: Convert style to match the rest of libwebm
24be76d webm2pes: Replace VpxFrame with VideoFrame.
b451c3b Add a basic video frame storage class.
05c90eb libwebm_util: Clarify error text in superframe parser.
e6415af webm2pes: Make WritePesPacket() a public method.
8f840dd webm2pes: Move frame read out of PES packet write method.
448af97 webm2pes: Restore frame fragmentation support.
f8bb714 cmake: Integrate new parsing API and tests.
cb8ce0b Add a new incremental parsing API
900d322 vpxpes_parser/webm2pes: BCMV and PTS fixes.
4b73545 webm2pes: Add start code emulation prevention.
82903f3 Add column tiles and frame parallel to webm_info
5d91edf style_clean_up: Remove unnecessary parentheses
a95aa4b vp9_level_stats: correct total_uncompressed_bits_ calculation
f46566f mkvreader: Fix shorten-64-to-32 warning in 32 bit builds.
76630ca mkvwriter: Fix shorten-64-to-32 warning in 32 bit builds.
a8ffbd4 webm2pes: Fix format specifier warnings.
faf89d4 Add MaxLumaSampleRate grace percent to stats.
d31e6c9 Fix profile 2 in vp9_header_parser.
bd3ab3a Add flag to estimate last frame's duration to stats.
c182ed9 Fix lint issue in hdr_util.h
cc62ecd Add test for Cluster memory leak
196708a Change MaxLumaSampleRate to be based on frame resolution.
cbd676b mkvmuxer: Fix leak when a Cluster isn't finalized
47f2843 Add parsing support for new features in CodecPrivate.
9a235e0 mkvmuxer: Set doctype to matroska when muxing non-WebM codecs.
e3c9576 Add VP9 level output to webm_info.
bbaaf2d Add class to gather VP9 level stats.
5cf549f cmake: Log compiler flag at check time.
8bb68c2 Add file to parse data from VP9 frames.
df3412f Add support for setting VP9 profile and level to sample_muxer.
296429a Add support to parse VP9 profile.
87832d4 mkvmuxer: Fix Segment::Finalize in kLive mode
6df3e56 mkvmuxerutil.hpp: Add using directives for overloaded size utils.
ec47928 mkvmuxerutil: Revert to using mkvmuxertypes.
4e3d037 Add support to output Colour elements to webm_info.
a1dc4f2 Fix parsing of VP9 level.
039df94 Add TEST_TMPDIR environment variable
d3656fd muxer_tests: ignore iwyu re gtest-message.h
e76dd5e Fix file name in mkvmuxertypes shim.
1be5889 Add temporary include shims at old file locations.
Change-Id: I6a1026814560be80d604a5ecb9b66406a1186dd9
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
BUG=webm:1309
Change-Id: Ia2c982da56697756e12f02643f589189b3271d98
Fix unit_tests_ubsan failure for VP8/DatarateTestLarge.DenoiserOffOn.
Failure was triggered by commit: df66f8e8.
Change-Id: I7cc5bd309e85950cfc5755e01d0eb942d9ca6984
For 1 pass vbr real-time mode:
Allow for the usage of alt-ref frame when non-zero lag-in-frames is used.
Use non-filtered alt-ref, and select usage based on fast scene/content
analysis/detection within the lag of frames.
Positive gains on ytlive set: overall avgPSNR ~3-4%.
Several clips are up between 5-14%, a few clips are neutral/small change.
Current speed decrease is about ~5-10%.
Use the flag USE_ALTREF_FOR_ONE_PASS to enable this feature
(off by default for now).
Change-Id: I802d2bf3d44f9cf01f6d15c76be9c90192314769
Put limit on gf interval based on lag, and allow
for the adjustment on next gf group also on key frame.
Small/neutral change on ytlive metrics.
Change only affects 1 pass vbr real-time mode.
Change-Id: I339c8f4398848698b6e10fe9482c52ca661b94a5
Updated code to process in 8bit as saturation/clipping takes care of
overflow
Removed unused macro
Change-Id: I113df60286fb28b216df800d95b2d3695ef71440
In 1 pass CBR, with error_resilience off, allow for
special logic to change the default gf behaviour.
In this CL: boost is turned off and the gf period
is set to a multiple of cyclic refresh period.
Change only affect 1 pass CBR mode, i.e, when the flag
gf_update_onepass_cbr is set.
Including the previous change (3ec8e11: to allow cyclic refresh
for error_resilience off), comparing metrics on RTC set for
error_resilience off vs on: avgPSNR/SSIM up by ~6%.
Change-Id: Id5b3fb62a4f04de5a805bd1b418f2b349574e0bc
Due to change in command line to sample encoder from:
7eff8f3 Update to vpx_temporal_svc_encoder command line.
This caused the tests in vpx_temporal_svc_encoder.sh to fail.
Change-Id: Ic667da81955ad117d04610af21877fed1d4f188f
these are compatible as they only load one element of the input so the
larger size of tran_low_t makes no difference in little endian builds.
note the asm is incompatible with big-endian, but there are other points of
failure there so currently it's considered unsupported.
BUG=webm:1294
Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028
It only handles the realloc constraint (preserving low elements) by
serendipity, and we don't actually rely on that behavior anyway.
Meanwhile the calls may do extra copying that gets immediately clobbered
by the callers.
Change-Id: I8dfa89e4a81084b084889c27bd272fdf85184e8d
vp8_short_inv_walsh4x4_msa - Optimized to process in short vector type
Updated below functions to store exact number of bytes in output rather than complete vector
idct4x4_addblk_msa
idct4x4_addconst_msa
dequant_idct4x4_addblk_msa
dequant_idct4x4_addblk_2x_msa
dequant_idct_addconst_2x_msa
Change-Id: Ic1b3752e2421dc7d70a082dcdaab9d140d7e5d9c
cyclic_refresh was tied to error_resilience mode.
Allow it to be on also for 1 pass CBR mode even if
error_resilience is off.
Other option to use new control for this, but prefer to avoid
that for now.
Change-Id: I3625b292ee059a890e31338b514e211bf0ab5c3e
This change will make the highbd txfm input range check more comprehensive
The 25-bit highbd input range is composed by
12 signal input bits + 7 bits for 2D forward transform amplification + 5 bits for
1D inverse transform amplification + 1 bit for contingency in rounding and quantizing
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=651625
Change-Id: I04c0796edd7653f8d463fba5dc418132986131e7
These caused the following warning with GCC 5:
warning: logical not is only applied to the left hand side of
comparison [-Wlogical-not-parentheses]
assert(!is_compound == (cm->reference_mode == SINGLE_REFERENCE));
Change-Id: If296aabb2311ceb7d903b395c1549ef81c2cbf9b
(cherry picked from commit c6cf7a6111)
these functions are incompatible currently and unreferenced in rtcd,
exclude them from the build.
BUG=webm:1294
Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().
Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
This patch sets the 16x16 src_diff to zero and ensures correct calculation
of this_error for block sizes smaller than 16x16.
Change-Id: I7b7c02d267433c9f22c8ac9b8d5df2f499175172
change_config() may be called often in real-time application,
to update bitrate/framerate or qp-max/min.
No need to do update_frame_size() unless frame size has changed.
Change-Id: I23a51deade1e03adc91c468f9ffde3235298770c
For real-time mode at speed 8: turn off MINIMAL_LF at speed 8,
for non-screen content mode.
Visually better, avgPSNR/SSIM on rtc set go up by ~4-5%.
Speed decrease of about ~3%.
Change-Id: I8eb69330f02e0ceece1507d43cfc8a049a1d8291
to get_binary_prob(). the only other caller mode_mv_merge_probs() does
its own test on 0.
BUG=chromium:639712
Change-Id: I1178688706baeca2883f7aadbc254abb219a44ce
+ inline the function directly as there was only one consumer
(get_prob())
this is an attempt to reduce the amount of branches to workaround an amd
bug. this change is mildly faster or neutral across x86-64, arm.
http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf
665 Integer Divide Instruction May Cause Unpredictable Behavior
BUG=chromium:639712
Suggested-by: Pascal Massimino <pascal.massimino@gmail.com>
Change-Id: Ia91823aded79aab469dd68095d44300e8df04ed2
Inline function called from test/dct16x16_test.cc wouldn't build due to:
invalid operands of types ‘__gnu_cxx::__enable_if<true, double>::__type
{aka double}’ and ‘int’ to binary ‘operator>>’
return (abs(ref->row) >> 3) < COMPANDED_MVREF_THRESH &&
this converts the test to abs() < COMPANDED_MVREF_THRESH << 3 which
hides the promotion issue.
Regression from commit de993a847f
BUG=webm:1291
Change-Id: I73b5943d07d5b61b709d299114216a2371a8fd62
provides msvc-like warnings for implicit conversions from 64-bit to
32-bit types
--enable-vp9-highbitdepth still requires some work
this also skips CXXFLAGS for now as some work would be needed to cleanup
third_party/*.cc or split it from test/*.cc where it comes to flags.
Change-Id: Ic9a095b73286eba5ed39bfc27ff69593748cbbf4
The original commit never set any 'specialize' line:
61311e6103
It appears the sadx4 version of function uses sdx4df calls to speed up
the search. There are no sse3 versions of the sdx4df functions, but
there are sse2 and msa versions.
There is a neon version of vpx_sad16x16x4d but not any of the smaller
versions. Perhaps if they existed this function could be expanded to use
them.
Change-Id: I936d7d6b1a3ff6dcd5a4d2322272708c47cdec13
* changes:
Expand -Wextra to more of the library
mips: clean up wextra warnings
Add compiler flag -Wsign-compare
Add compiler warning flag -Wextra and fix related warnings.
Remove unused zbin variable:
warning: unused parameter ‘zbin’
Use int for loop variables to avoid unsigned conversion:
warning: comparison between signed and unsigned integer expressions
Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad
Also, fix the warnings generated by this flag.
(cherry picked from commit ebeb1155d4fa6d28e2f40c92265245f8df097fcb)
From AOM. Don't actually add -Wsign-compare. It will be covered by
-Wextra.
Switch to vpx_integer.h from df9c9d6d4c43f02c58d4e776c53323788e013cbc
BUG=webm:1069
Change-Id: I1dc6e61caa5d56af4a55b6692ab620bb3144652a
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.
Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16
Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h
Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.
Does not enable -Wextra yet. There are more issues to fix.
BUG=webm:1069
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
Current version does not build with options:
--enable-vp9-highbitdepth --enable-coefficient-range-checking
Change-Id: Ic3285f1a3e0d6be88da7f2cd8fa5a631368dd03b
Reduce the filt_guess for 1 pass cbr on inter-frames.
This reduces visual artifact seen in rtc clip (jimred.vga),
and improves metrics on rtc set.
Metrics on rtc set for cbr mode overall positive, most clips are up:
Speed 7 rtc: avgPSNR/SSIM up by: ~2.6/3.9%
Speed 8 rtc: avgPSNR/SSIM up by: ~1.3/2.5%
Change-Id: Ia4eccea1c19d65b583516df28823cd756c49464f
Added a cap on the maximum boost for an arf based on interval length.
Fixed bug where by the image size was not accounted for in determining
two of the motion breakout thresholds.
Overall small gains of 0.2-0.4% psnr but on large image format clips with
slow zooms the gain may be as much as 20% or more (e.g. in_to_tree
at 1080P)
Change-Id: Id0a47391203026742daa9c97afac5705fd8c4dfb
The value 35468 changes sign when stored in int16_t:
implicit conversion from 'int' to 'int16_t' (aka 'short')
changes value from 35468 to -30068
This negation requires adding back the original value to compensate.
Shifting the value keeps the value positive and saves a post-vqdmulh
shift.
This technique is used in webp and idct_dequant_full_2x_neon
BUG=b/28027557
Change-Id: I0c5ce09bea170fe08061856c2af6f841a557e0c3
This restores d9dce2f48e
Switched to using signed shift-and-narrow. Instead of saturating
negative results to 0, it was saturating them to 255.
BUG=webm:817
BUG=webm:1273
Change-Id: I571095336aa4182e3288b17924fcaaece42b0a49
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings
Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
Do nothing in vp9_highbd_iht#x#_##_add_c when input magnitude is beyond
20 bits. Note that, sign bit is not included here.
In the 20 bits, we use 12 bits for input signal, 7 bits for forward
transform amplification, and 1 bit for contingency in rounding and
quantizing
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286
Change-Id: I332c6f68df4614fc2e7d2dc4c5bb0d0cff8a245c
When filtering it needs 6 pixels: 2 prior to the source, the source, and
3 after the source.
When filtering 16 wide, that means 21. To accomplish this the SSE2 reads
[-2] to [5], [6] to [13], and [14] to [21], a total of 24 bytes (reading
in groups of 8 is easy)
The filter then shifts this last set to the top half of the register and
uses 'or' to combine it with the previous set.
Valgrind detected an issue reading pixels [19], [20] and [21]:
Address 0x7f581c2 is 434 bytes inside a block of size 441 alloc'd
Note: we only need pixels [16], [17], and [18] as context for [15].
To fix this, it now reads 8 bytes starting at [11], which re-loads [11]
through [13], but stops at [18] and does not over-read any values.
This is shifted by 5 and 'or'd with xmm1. Although the lower bits are
not cleared, they overlap directly with [11] through [13], so 'or'
produces the correct results.
Change-Id: I0c89c03afa660fc9b0108ac055d7bd403e493320
On 32 bit machines 'new' does not always appear to allocate sufficiently
aligned buffers, causing intermittent test failures.
Change-Id: I0db4fc73782012e4eef71dc0fb540e74fdbfcebe
the --enable-postproc-visualizer configure option remains as a no-op as
do the control names and values for compatibility
+ remove the corresponding debug flags from vpxdec: --pp-*
Change-Id: I4a001cd9962b59560d7d6bda6272d4ff32b8d37c
similar to changes that were done in vp9 for encoded frame size
reporting. has the side-effect of quieting a -Wshorten-64-to-32 warning.
Change-Id: I89f74cb617fc29334ee351dc8dfaa3b8cfd4e5af
The referenced bug was fixed by saving neon registers. That this had any
effect was coincidental.
Both chromium and Android build with clang and neither uses this flag.
Change-Id: I470247d6fd9226fc207b42a187105581a94badc3
The vp9_mv_class0_tree is a balanced tree with two leafs and can
simply be coded as a boolean with probability class0[0].
Change-Id: If294dac825a5f945371092c74aa8e3f84cd962b6
(cherry picked from commit be8a8ab62ebdd111c6f2e9a33b15630570671eba)
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.
BUG=b/30970831
Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
The code only has issues when xoffset == 0 and yoffset == 0 which
represents a simple copy. Presumably this case does not need to be
handled because the issue has existed since 2010.
BUG=webm:1287
Change-Id: Ic47e2653f3b729e99b40e53d8d2d8d1501edaaa9
Build out the sixtap_predict test because the filters are
interchangeable. Add verbose failures and border checking.
Change-Id: I962f50041750dca6f8d0cd35a943424cf82ddcb1
This reverts commit d9dce2f48e.
Appears to be failing the SixtapPredict tests in some configurations and possibly test vectors as well.
Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.
It is still ~5x faster than C in the unaligned case and doing both
filters.
BUG=webm:892
BUG=webm:1273
Change-Id: Icf7167189391b46202f47233bb585c24c42bcc36
postproc.c is overloaded and used for both postproc and internal stats.
If only --enable-internal-stats is specified there are issues with
non-existent struct members and unused functions.
Change-Id: I82367f1ffce659c3918c9f964dbce94a716fbb89
All the other test which do not use 'pass' (which appears to be almost
all of them) do this.
Cleans -Wextra/-Wunused-parameter:
unused parameter ‘pass’
Change-Id: I1ff3acf3f3d1e831f94dcb00ea36337afe0aefe0
Remove the experiment LIMIT_QP_ONEPASS_VBR_LAG, as its
not currently used and no plan to use in near future.
Change-Id: Ib069f8d7225195be04b765d0ab477510dfba6a3b
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.
The store, when unaligned, has a version that is ~25% slower but safe
when xoffset = 0 (second pass filter only). When the first pass filter
(or both) are in play, the new version is almost identical in speed.
Worst case performance (both filters, unaligned stores) is roughly 3-4x
faster than C.
BUG=webm:817
BUG=webm:1273
Change-Id: I1e490e94453e0872151fe0dafb05557463f6247d
Use the canonical 'vpx_codec_dec_cfg_t()' as opposed to 'vp9_zero()'
which just hammered everything to 0.
Change-Id: Id820efef700ad92a625797f8fd58e465b15eeca4
For some reason allocated_decoding_thread_count is signed, but decoding_thread_count is not.
Cleans -Wextra/-Wsign-compare:
comparison between signed and unsigned integer expressions
Change-Id: Id0ada78100acff27c1c4ed7493c563d13c55cdcd
Use vp9_zero() to set every element.
Cleans -Wextra/-Wmissing-field-initializers:
missing initializer for member ‘vpx_codec_dec_cfg::w’
missing initializer for member ‘vpx_codec_dec_cfg::h’
Change-Id: I5b41ce7d55a912e29b1d4c3e840cea80e8510fbe
This change was reverted before due to a hangouts encode-time
regression investigation. But since then this change has been
cleared of causing any noticeable regression.
This mode reduces some false detection, and uses the
same model as in vp9.
Change-Id: I9c82a748c5f601d0aca9f61ee218abfbd58c62bd
* changes:
Enable -Wundef by default
Define VP8_TEMPORAL_ALT_REF to !CONFIG_REALTIME_ONLY
Remove CONFIG_DEBUG guards from assert()
Remove unused function vpx_de_mblock
Fix -Wundef warning for OUTPUT_FPF
Fix -Wundef warning for __SANITIZE_ADDRESS__
Added casts to remove warnings:
BUG=webm:1274
In regards to the safety of these casts they are of two types:-
- Normalized bits per (16x16) MB stored in a 32 bit int (This is safe as bits
per MB even with << 9 normalization cant overflow 32 bits. Even raw 12
bits hdr source even would only be 29 bits :- (4+4+12+9) and the encoder
imposes much stricter limits than this on max bit rate.
- Cast as part of variance calculations. There is an internal cast up to 64 bit
for the Sum X Sum calculation, but after normalization dividing by the number
of points the result will always be <= the SSE value.
Change-Id: I4e700236ed83d6b2b1955e92e84c3b1978b9eaa0
Previously VP8_TEMPORAL_ALT_REF was only defined for non-realtime-only
builds. However, its value was checked with #if, not #ifdef.
Fixes -Wundef warnings.
BUG=webm:1069
Change-Id: If78d8731298f3f0d3662ffa25f973e7adaf67152
Using a tighter resize constraint on undershoot seems to help
results (especially SSIM) as significant undershoot on a frame
seems to have more of a damaging impact than overshoot.
This patch has been tuned so that in local testing using the
derf set it is encode speed neutral for speed setting 2.
Average quality result for speed 2 (psnr,ssim) were as follows:-
lowres 0.039, 0.453
midres 0.249, 0.853
hdres 0.159, 0.659
NetFlix -0.241, 0.360
Change-Id: Ie8d3a0d7d6f7ea89d9965d1821be17f8bda85062
Fixes windows build issue:
==> tests::VS10_x64 is broken
LINK : warning C4742: 'kYvuI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : warning C4744: 'kYvuI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : warning C4742: 'kYuvI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : warning C4744: 'kYuvI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : warning C4742: 'kYvuI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxenc.vcxproj]
LINK : warning C4744: 'kYvuI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxenc.vcxproj]
LINK : warning C4742: 'kYuvI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxenc.vcxproj]
LINK : warning C4744: 'kYuvI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxenc.vcxproj]
LINK : error C2220: warning treated as error - no 'executable' file generated [.build-x86_64-win64-vs10\vpxdec.vcxproj]
LINK : error C2220: warning treated as error - no 'executable' file generated [.build-x86_64-win64-vs10\vpxenc.vcxproj]
Change-Id: Ic3c4fff9209f5a52ff8f8ff321548d49ba09ec06
removes the need for an intermediate cast to int, which was missing in
the call added in:
69c5ba1 vpx_mem: Refactor code
quiets a visual studio warning:
C4146: unary minus operator applied to unsigned type, result still
unsigned
Change-Id: I76c4003416759c6c76b78f74de7c0d2ba5071216
This patch is to address concerns that changes to allow
recodes on the first frame in each ARF group do not give a
good enough speed quality trade off for speed 2. Though the
average impact on encode speed is 1-2%, for some hard clips
it is > 5% rise. For speed 1 this is less an issue and for Speed 0
the previous patch actually improves speed.
Change-Id: Ie1bcefdbfdf846d3f4428590173f621465dffe3a
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.
The corresponding fix for vp10 is in
Ica3d625d6672b3c47e0e208b45eede29b9004030.
Change-Id: Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1
Current commit is just an API template for the rest of the code, and
I will add inner logic later.
Altref frames generate a lot of bitrate and at the same time
other frames refer to them a lot, so it makes sense to apply
special compensation-based adaptive quantization scheme for altref
frames. E.g., for blocks that are good predictors for the future
apply rate-control chosen quantizer while for bad predictors apply
worse one.
Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
Previously Tx domain rd was used in all cases above speed 0.
Coefficient optimization was only enabled for best and speed 0.
This patch selectively sets these features at other speed settings
based on block complexity.
For the Netflix and HD sets in particular the quality gains are
large compared to the speed hit. At speed 1 the average psnr
gain in the NF set is > 2.5% with one clip coming in at 18%
and some points almost 30%. Average gains for the lower
resolution test sets are around 1%.
The gains are biggest at low Q so some further optimization
may be possible.
Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576
fixes SSE2/AddNoiseTest.CheckCvsAssembly/0 with -funsigned-char.
visibly broken since:
0dc69c7 postproc : fix function parameters for noise functions.
where the types diverged (char vs. int8)
but likely the return changed in:
2ca24b0 postproc - move filling of noise buffer to vpx_dsp.
when multiple implementations were merged.
Change-Id: I176ca1f170217f05ba7872b0c4de63e41949e999
set a max allocable size to prevent overflows in 32-bit and extremely
large allocation attempts in 64-bit. this could be amended to allow size
or num parameters to be 64-bits with the correct size being used at each
call site.
BUG=webm:819
Change-Id: Ia81004d6c4279680714c4488b4f6cf287ab396a5
vpx_realloc was allocating 1 byte more than needed every time.
Fixed this, and took this opportunity to do a small refactoring.
Change-Id: I38fcb62b698894acbbab43466c1decd12f906789
(cherry picked from aom: 2a876b4 aom_realloc correction.)
In the future this option will activate adaptive quantization special
for altref frames. Encoder will create the adaptive quantization map
on the basis of lookahead buffers similarity which is the estimate of
the future motion compensation performance.
Change-Id: Ia0088b3babb0f9a4899c79d8d819947ba5a03df2
This function only exists as a shortcut to subpixel variance with
predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical
and both for "hv"
Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.
BUG=webm:1273
Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
decoding the same invalid keyframe twice would result in a crash as the
second time through the decoder would be assumed to have been
initialized as there was no resolution change. in this case the
resolution was itself invalid (0x6), but vp8_peek_si() was only failing
in the case of 0x0.
invalid-vp80-00-comprehensive-018.ivf.2kf_0x6.ivf tests this case by
duplicating the first keyframe and additionally adds a valid one to
ensure decoding can resume without error.
BUG=b/30593765
Change-Id: If0859035908b7870d67a7f3f646b5a080252eb6d
Disabled the split mode while encoding 4k video to speed
up the encoder.
Borg test result on 4k set:
Overall PSNR: +0.029%; SSIM: +0.009%.
Average encoder speedup at speed 2 is 2.5%.
Change-Id: I1519c658f07c3ac838affbe5aff0ed9b94f3f8f4
Adjusted speed 2 features to speed up 4k video encoding.
BDBR results from borg test:
PSNR: +0.313%; SSIM: +0.268%.
Average speedup: 8.5%
Change-Id: I1e2695a01fb3f3817c1df4480e184c2aed8f2eba
this fixes a crash in vp9_dec_setup_mi() via
vp9_init_context_buffers() should decoding continue and the decoder
resyncs on a smaller frame
BUG=b/30593752
Change-Id: I9ce8d94abe89bcd058697e8bd8599690e61bd380
Bias towards base_mv and skip 1/4 pixel motion search when using base mv.
2~3% speed up for 2 spatial layers, 3~5% speed up for 3 spatial layers.
PSNR loss:
(2 layers) 0.07dB for gips_stationary, 0.04dB for gips_motion;
(3 layers) 0.07dB for gips_stationary, 0.06dB for gips_motion.
Change-Id: I773acbda080c301cabe8cd259f842bcc5b8bc999
Add option, for newmv-last, to limit the rd-threshold update for early exit,
under a source varianace condition.
This can improve visual quality in low texture moving areas,
like forehead/faces.
Also add bias against golden to improve the speed/fps,
will little/negligible loss in quality.
Only affects CBR mode, non-svc, non-screen-content.
Change-Id: I3a5229eee860c71499a6fd464c450b167b07534d
The flag was added because Apple clang and Chromium clang disagreed
for certain versions of instructions.
qsubaddx, qaddsubx, ldrneb and ldrneh were used in armv6 assembly
which was removed in d55724fae9
vqshrun was used in some neon assembly but superseded by
dcbfacbb98
.include was used for obj_int_extract/asm_offsets and removed in
6eec73a747
Change-Id: I32f4c9b536d0318482101c0b8e91e42b8f545f18
Changes the default recode rule for Speed 0 and best quality
from ALLOW_RECODE to ALLOW_RECODE_KFARFGF.
Tested on the NF, hdres, midres and lowres test sets, this setting
when combined with patch I40cb559... now performs "as well" in
metrics terms (in fact it came out a tiny amount better overall)
but encode time is 9.6% faster (measured as the average
from 27 mid rate local encodes on clips in the derf/lowres set.
Change-Id: I8c781c0cdfa3a9929cd9406d15582fce47d6ae3b
Allow recodes for the first inter frame in each arf group
even when the recode rule is set to ALLOW_RECODE_KFARFGF.
Small gains of 0.05%.
Change-Id: I40cb559d36a2bf0ebf5cf758c3f92e452b480577
disable clang-format for bilinear_filters_avx2
restores the row layout prior to:
099bd7f vpx_dsp: apply clang-format
but keeps the justification used by clang-format
Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7
This patch fixed a motion vector out of range bug:
vpxenc: ../libvpx/vp9/encoder/vp9_mcomp.c:69:
mv_cost: Assertion `mv->col >= -((1 << (11 + 1 + 2)) - 1) &&
mv->col < ((1 << (11 + 1 + 2)) - 1)' failed.
For blocks that returned without having full-pixel search, the original
MV limits were not restored, which caused the failure. Moved the set
MV limit function down to fix the bug.
Change-Id: Id7d798fc7214e95c6e4846c588f0233fcf1a4223
_beginthreadex does not align the stack on 16-byte boundary as expected
by gcc.
On x86 targets, the force_align_arg_pointer attribute may be applied to
individual function definitions, generating an alternate prologue and
epilogue that realigns the run-time stack if necessary. This supports
mixing legacy codes that run with a 4-byte aligned stack with modern
codes that keep a 16-byte stack for SSE compatibility.
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
Change-Id: Ie4e4ab32948c238fa87054d5664189972ca6708e
Signed-off-by: Aleksey Vasenev <margtu-fivt@ya.ru>
prevents use of an uninitialized value in the deconstructor should the
test fail before tmpfile_ is set.
Change-Id: I8b49fd05f0d05e055fdf653bd46983d30f466a68
applied against a x86_64 configure with and without
--enable-vp9-highbitdepth
clang-tidy-3.7.1 \
-checks='-*,google-readability-braces-around-statements' \
-header-filter='.*' -fix
+ clang-format afterward
Change-Id: Ia2993ec64cf1eb3505d3bfb39068d9e44cfbce8d
Extract the duplicated data generation code in OperationCheck() of
Loop8Test6Param and Loop8Test9Param, and put in function InitInput().
Change-Id: Ied39ba4ee86b50501cc5d10ebf54f5333c4708f0
This patch fixed a motion vector(MV) out of range bug, which was caused
by not restoring the original values of the MV min/max thresholds after
the sub8x8 full pixel motion search. It occurred rarely and only was seen
while encoding a 4k clip for 200 frames.
BUG=webm:1271
Change-Id: Ibc4e0de80846f297431923cef8a0c80fe8dcc6a5
* changes:
Use common transpose for vpx_idct32x32_1024_add_neon
Use common transpose for vpx_idct8x8_[12|64]_add_neon
Use common transpose for vp9_iht8x8_add_neon
Use common transpose for vpx_idct16x16_[10|256]_add_neon
The code was expanding to Q registers so that vqrshn could be used, for
vector quad round shift and narrow. If 4 values are added together,
there is a shift by 2. If 8 values, a shift by 3. Since this accounts
for any possibility of overflow, we can skip the narrowing shift.
This allows keeping the values in D registers and casting the 16 bit
value to 8 bits.
Change-Id: I8d9cfa07176271f492c116ffa6a7b351af0b8751
The neon intrinsics are not able to load just the 4 values that are
used. In vpx_dsp/arm/intrapred_neon.c:dc_4x4 it loads 8 values for both
the 'above' and 'left' computations, but only uses the sum of the first
4 values.
BUG=webm:1268
Change-Id: I937113d7e3a21e25bebde3593de0446bf6b0115a
Increase the minimum distance.
Reduces the overshoot somewhat on some clips,
small gain in avgPSNR (~0.1%) on ytlive set.
Change-Id: Id5ddde20c2907dbdb536e79542eff775019c142b
This error was introduced by the patch:
8ce67d7 vp9 svc: Enable different speed setting for each spatial layer.
To use svc, svc_param should be cleared to 0 at the beginning.
Change-Id: I222f03ddae8a50e84b4690b78263abb742fae91e
Move best index into the token state. Shrink it down to one byte. This
is more cache friendly (access are group together) and uses less total
memory.
Results in 4% fewer cycles in optimize_b().
Change-Id: I75db484fb3dc82f59928d54b659d79c80ee40452
everything outside of third_party should follow 'PointerAlignment:
right' i.e., associate the '*' with the variable
+ add a note about the clang-format that generated this file
Change-Id: I13e3f4f5fb6e22a8fa7fc3d06879c995b7c41a39
- make Check() void as the EXPECT's are sufficient to document failure
cumulatively this has the effect of avoiding reporting incorrect Check()
failures due to earlier test failures.
Change-Id: I2cf775449f18c90c1506b8eadd7067adbc3ea046
The LLVM trunk has reached 4.0 and now __clang_major__ is not enough
to distinguish between old XCode Clang and the new 'real' Clang.
Using __apple_build_version__ allows to make this distinction.
BUG=chromium:631144
Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9
Shutdown all threads before reclaiming any memory. The frame-level
parallel decoder may access data from another worker.
BUG=webm:1259
Change-Id: I26856ebd1f77cc4a4545331baa19bbf3e01c4ea4
1 - stops de allocating before threads are closed.
2 - limits threads to mb_rows when mb_rows < partitions
BUG=webm:851
Change-Id: I7ead53e80cc0f8c2e4c1c53506eff8431de2a37e
This commit changes the call in vp9 encoder from vp9_deblock() to
vp9_post_proc_frame() to ensure the data structures used in the call
are properly allocated. This fixes an encoder crash when configured
with --enable-internal-stats.
Change-Id: I2393b336c0f566665336df4f1ba91c405eb56764
bitstream.c: asserts are disabled when CONFIG_DEBUG is unset
vp8_dx_iface.c: split |s into 2 statements across #if bounds
Change-Id: I307d1e969134db5c9c0edd7690589b6b29116cbd
allows 'make test_libvpx', etc. some reworking of the makefiles would be
needed to avoid hard coding targets here.
Change-Id: I18982dbf691e7d36ab8bcf5934bab9340687b061
remove some (but not all yet!) tuple mis-use, and revamp the code a lot.
Factorize some common chores into MainTestClass.
Change-Id: Id37b7330eebe80d19b9d12a454f24ff9be6b1116
Allow usage of lookahead for VBR in real-time mode, for 1 pass vbr.
Current usage is for fast checking of future scene cuts/changes,
and adjusting rate control (gf interval and active_worst/target size).
Added unittests (datarate) for 1 pass vbr mode, with non-zero lag.
Added an experimental option to limit QP based on lookahead.
Overall positive gain in metrics on ytlive set:
avgPNSR/SSIM up on average ~1-3%; several clips up by 5, 7%.
Change-Id: I960d57dfc89de121c4824b9a9bf88d2814e74b56
Chromium changed the upstream default to --squash but this conflicts
with libvpx historical defaults.
Change-Id: I80f2f2b48e2ba08e02184b50e6d5f8f5e76fec24
remove some (but not all yet!) tuple mis-use, and revamp the code a lot.
Factorize some common chores into MainTestClass.
Change-Id: Ia14f3924140e8545e4f10d0504475681baae8336
* changes:
vpx_thread: use CreateThread for windows phone
vpx_thread: use WaitForSingleObjectEx if available
vpx_thread: use InitializeCriticalSectionEx if available
vpx_thread: use native windows cond var if available
vpx_thread.[hc]: update webp source reference
This change eliminates redundant computation in the two stage
downscaling, which saves ~1% encoding time in 3-layer svc encoding.
Change-Id: Ib4b218811b68499a740af1f9b7b5a5445e28d671
Modify the gfu_boost and af_ratio setting based on the
average frame motion level.
Change only affects 1 pass vbr.
Metrics overall positive on ytlive set.
On average up by ~1%, several clips up by 2-4%.
Change-Id: Ic18c49eb2df74cb4986b63cdb11be36d86ab5e8d
Use the precise context to estimate the zero token cost in trellis
optimization process. This improves the speed 0 coding performance
by 0.15% for lowres and 0.1% for midres. It improves the speed 1
coding performance by 0.2% for midres and hdres.
Change-Id: I59c7c08702fc79dc4f8534b64ca594da909e2c91
This commit allows the inter prediction residual to use uniform
quantization followed by trellis coefficient optimization in
speed 0. It improves the coding performance by
lowres 0.79%
midres 1.07%
hdres 1.44%
Change-Id: I46ef8cfe042a4ccc7a0055515012cd6cbf5c9619
Move the operations that update the context buffers outside this
function. The coeff_cost() takes all input as const value and returns
the coefficient cost.
This makes preparation for the next coefficient optimization CLs.
Change-Id: I850eec6e5470b91ea84646ff26b9231b09f70a0c
Replace the existing mv bias with a bias only for
NEWMV, and based on the motion vector difference of
its top/left neighbors.
For cbr non-screen-content mode.
Change-Id: I8a8cf56347cfa23e9ffd8ead69eec8746c8f9e09
Use a measure of noise energy to adjust Q estimate and
arf filter strength.
Gains 0.3-0.5% on Lowres and |Netflix sets.
Hdres and Midres neutral.
Change-Id: Ic0de552e7b6763e70eeeaa3651619831b423e151
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.
Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
Safer to have the decoder operate normally and have
better-hw-compatibility only implement encoding changes.
Fixes some test failures.
Change-Id: I0dd70d002e4e893992f0cd59774b9363e6f7fe76
For real time CBR mode, use model_rd_for_sb_y for 32x32 if the sb is
a skin sb to avoid visual regression on the slowly moving face.
Refer to the cl: https://chromium-review.googlesource.com/#/c/356020/
Change-Id: I42c36666b2b474ce5ee274239d52ae8ab400fd46
The transform block row and column positions are always available
outside the callees. There is no need to re-compute these values
again. This approach has been used by the decoder. This commit
removes txfrm_block_to_raster_xy() function.
Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108
BUG=b/29583578
original webp change:
commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:41:26 2015 -0800
thread: use CreateThread for windows phone
_beginthreadex is unavailable for winrt/uwp
Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
Change-Id: Iade8fff6367b45534986c77ebe61abeb45bce0f8
BUG=b/29583578
original webp change:
commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:40:26 2015 -0800
thread: use WaitForSingleObjectEx if available
Windows XP and up
Change-Id: Ie1a46a82722b8624437c8aba0aa4566a4b0b3f57
100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
Change-Id: If165c38b378c6e0c55e17a1b071efd3ec3e7dcdd
BUG=b/29583578
original webp change:
commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:38:46 2015 -0800
thread: use InitializeCriticalSectionEx if available
Windows Vista / Server 2008 and up
Change-Id: I32c5b4e5384d614c5a821ef511293ff014c67966
100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
Change-Id: I9ce49b3a86857267e504cd8ceab503b7b441d614
BUG=b/29583578
original webp change:
commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 19:49:58 2015 -0800
thread: use native windows cond var if available
Vista / Server 2008 and up. no speed difference observed.
Change-Id: Ice19704777cb679b290dc107a751a0f36dd0c0a9
100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
Change-Id: Iede7ae8a7184e4b17a4050b33956918fc84e15b5
these are debug-only modules that can be added in manually when needed.
leave a reference in vp8_common.mk / vp9_common.mk for easy addition.
quiets -Wmissing-prototypes warning
BUG=b/29584271
Change-Id: Ifc8637d877edfbd562b34dc5c540428bba7951fc
Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
Change-Id: I4cf8f23b86d7ebd85ffd2630dcfbd799c0b88101
The scaling of the threshold for 10 and 12 bit here appears
to be in the wrong direction. For 10 and 12 bit we expect sse
values to be higher and hence the threshold used should be
scaled up not down.
Change-Id: I2678116652b539aef48100e0f22873edd4f5a786
This function seems to scale the threshold for testing an
SSE value in the wrong direction for 10 and 12 bit inputs.
Also for a true SSE the scalings should probably be << 4 and 8
Change-Id: Iba8047b3f70d04aa46d9688a824f3d49c1c58e90
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.
+ factorized some common expressions
Change-Id: I171c3faaa5606d098e984baa9aa74bb36042f57f
Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.
The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).
BUG=b:29583530
Change-Id: Ib935e97b37ffb22d7af72ba0f04564ae6280f1fd
inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)
this prevents an empty CONFIG_VS_VERSION and avoids make failure
Change-Id: I529d52eca59329e2715309efd63d80f0e1fed462
For real time CBR mode, use model_rd_for_sb_y for 32x32 if the mode is
newmv last, which is less aggressive in skipping transform and
quantization, to avoid quality regression in some conditions.
Change-Id: Ifa30be587f2a8a4a7f182a172de6ce277c0f8556
Speed test shows the new vertical filters have degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
the vertical filters activated code. Now just simply active the code
without degradation on Celeron. Later there should be 2 set of vertical
filters ssse3 functions, and let jump table to choose based on CPU type.
Change-Id: Iba2f1f2fe059a9d142c396d03a6b8d2d3b981e87
Due to rounding, hbd variance may become negative. This commit put in
check and clamp of negative values to 0.
Change-Id: I610d9c8aa2d4eebe7bc5f2c5624a9e3cadad4c94
For forced key frames in particular this helps to make them
blend better with the surrounding frames where noise tends
to be suppressed by a combination of quantization and alt
ref filtering.
Currently disabled by default under and IFDEF flag pending
wider testing.
Change-Id: I971b5cc2b2a4b9e1f11fe06c67ef073f01b25056
Bug fix: The crash is caused by not allocating buffer for prev_mip in
postproc_state and prev_mip in postproc_state is only used for MFQE,
ohter postproc modules, deblocking and etc., should not use it.
BUG=webm:1251
Change-Id: I3120d2f50603b4a2d400e92d583960a513953a28
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
For real-time mode, increase variance threshold for 32x32 blocks in
var-based partitioning for resolution >= 720p, so that it is more
likely to stay at 32x32 for high resolution which accelerates the
encoding speed with little/no PSNR drop.
PSNR effect on different speed settings:
speed 8 rtc: 0.02 overall PSNR drop, 0.285% SSIM drop
speed 7 rtc: 0.196% overall PSNR increase, 0.066% SSIM increase
speed 5 rtc_derf: no effect.
Speed up:
gips_motion_WHD, 1mbps: 2.5% faster on speed 7, 2.6% faster on speed8
gips_stat_WHD, 1mbps: 4.6% faster on speed 7, 5.6% faster on speed8
Change-Id: Ie7c33c4d2dd7d09294917e031357fc5476c3a4bb
Avoids a segfault in high-bitdepth builds.
This restores the condition to its state prior to:
7991241 vp9: Change the scheme for modeling rd for bsize 32x32.
BUG=webm:1250
Change-Id: I6183d5b34cb89dfbf27b7bb589812148a72cd7de
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some condtions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.
PSNR effect on different speed settings:
speed 8 rtc: 0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc: 0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop
Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed8
BUG=webm:1250
Change-Id: I818babce5b8549b4b1a7c3978df8591bffde7173
Use quotes whenever possible and {} always for variables.
Replace multiple set_all calls with *able_feature().
Change-Id: If579d3f718bd4133cf1592b4554a8ed00cf9f2d3
decoder_peek_si_internal could potentially read more bytes than
what actually exists in the input buffer. We check for the buffer
size to be at least 8, but we try to read up to 10 bytes in the
worst case. A well crafted file could thus cause a segfault.
Likely change that introduced this bug was:
https://chromium-review.googlesource.com/#/c/70439 (git hash:
7c43fb6)
BUG=chromium:621095
Change-Id: Id74880cfdded44caaa45bbdbaac859c09d3db752
When building without multithreading and for a non-arm, non-x86 system,
ctx is unused.
Cleans up -Wextra warning:
unused parameter ‘ctx’ [-Werror=unused-parameter]
Change-Id: Ifddff89d2ebd45f7d71e3d415a8f2415dd818957
'duration' is not used in realtime-only mode:
Cleans up -Wextra warning:
unused parameter 'duration' [-Wunused-parameter]
Change-Id: I827dfe59ebcdc72c5a93fdf7e5aca063433914b1
In vp9_pick_inter_mode(), instead of using
vp9_get_pred_context_switchable_interp(xd) to assign filter_ref,
we use a less strict condition on assigning filter_ref.
This is to reduce the probabily of entering the flow of not
assigning filter_ref and then skipping filter search.
Overall PSNR gain 0.074% for rtc dataset
Details:
Low Mid High
0.185% -0.008% -0.082%
Change-Id: Id5c5ab38d3766c213d5681e17b4d1afd1529e676
Allows building simple targets with sane default flags.
For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
--platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
BUG=webm:1143
Change-Id: I06f5a7564f5382cf1a4bad41aef4308566c53adf
Speed test shows the new vertical filters have degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
the vertical filters activated code. Now just simply active the code
without degradation on Celeron. Later there should be 2 set of vertical
filters ssse3 functions, and let jump table to choose based on CPU type.
Change-Id: I37e3e9c5694737d9134a6bce6698d3e43f8fc962
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some condtions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.
PSNR effect on different speed setting:
speed 8 rtc: 0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc: 0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop
Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed8
Change-Id: I902f62def225ea01c145d7e5a93497398b8f5edf
Due to rounding used computation, HDB variance computation may produce
slightly negative values. This commit adds clamping to make sure
output variance values for 10 and 12 to be non-negative.
Change-Id: Id679aa55a4c201958c4c7d28cd8733b9246a71c8
This commit adds an encoder workaround to support better
compatibility with a non-compliant hardware vp9 profile 2 decoder.
The known issue with this decoder is:
The decoder assumes a wrong value, 127 instead of the correct
value of 511 and 2047, for any assumed top-left corner pixel in
UV planes for 10 and 12 bit, respectively. Such assumed
top-left corner pixel is used for INTRA prediction when a real
decoded/reconstructed pixel is not avalable, e.g. when it is
located inside the row above the top row or inside the column
left to the leftest column of a video image.
Change-Id: Ic15a938a3107e1b85e96cb7903a5c4220986b99d
decoding is done if the decoder is available, with errors handled
accordingly. the encoded frame count should be sufficient for this test.
+ remove HandleDecodeResult() as it's redundant given the base
implementation
BUG=webm:1233
Change-Id: I513c1c3475c58a746f4df627491bdc392fe21416
development has moved to the nextgenv2 branch and a snapshot from here
was used to seed aomedia
BUG=b/29457125
Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It removes two memset operations on quantized
and dequantized coefficient sets. This improves the unit speed
by 10%.
Change-Id: I23f47c6e14582520a7f952f03ce8f72183e7f0e6
Each time a codec is enabled or disabled with the umbrella
--enable-vpN flag, set the encoder and decoder configurations as well.
This was done as a post-processing step but doing that lost the order of
the arguments.
BUG=webm:1205
Change-Id: Ic629bfdd06acc04bc5a7227309f36bba54dad8b1
Since combining VPX_DL_REALTIME with VPX_RC_FIRST_PASS is basically
nonsense, ignore the user's pass setting when this happens and
behave as if the requested encode is a single pass encode.
BUG=webm:1233
Change-Id: I5ee4c4e5838c4ca6d24988890aae490b10826db2
The logic can be incorporated into configure.sh
Removes a dependency on ios-version.sh which was not part of DIST-SRCS
and removes a warning from 'make dist' sub builds:
../src/build/make/configure.sh: line 787:
../src/build/make/ios-version.sh: No such file or directory
Change-Id: Ic38314708eb278dd9d2a9769a670da32f6126637
This value is signed in vp9/10
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (cpi->frames_since_golden == (cpi->current_gf_interval >> 1))
~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: Ie137724982f3a46c8c1820548c1960d62a4e96f2
left_above_mv and above_block_mv return as_int
as_int is defined as uint32_t in vp8/common/mv.h
Cleans up -Wextra warnings:
signed and unsigned type in conditional expression
this_mv->as_int = col ? d[-1].bmi.mv.as_int : left_block_mv(mic, i);
^
this_mv->as_int = row ? d[-4].bmi.mv.as_int : above_block_mv(mic, i, mis);
^
left_mv.as_int = col ? d[-1].bmi.mv.as_int :
^
Change-Id: Ia043764e4ce93d2152d2269b1c7b28b5d5f814cf
This commit change to use int64_t to represent the sum of pixel
differences, which can be negative.
This fixes a number of ubsan warnings.
BUG=webm:1219
Change-Id: I885f245ae895ab92ca5f3b9848d37024b07aac98
Use ~15 instead of 0x..F0
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (((cm->Width + 15) & 0xfffffff0) !=
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
comparison of integers of different signs: 'unsigned int' and 'int'
((cm->Height + 15) & 0xfffffff0) !=
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
Change-Id: Iac25839cde3425b7b9db7f33740dc46a551b7546
For VBR: (1) allow newmv mode for golden ref to
select interpolation filter (as in last ref case), and
(2) don't use the more aggressive tx-skip testing logic for large blocks.
Only affects 1 pass real-time vbr mode (speed >= 5).
PSNR/SSIM metrics on ytlive set are all positive, ~0.5-2% gain.
Change-Id: I0ffbb0a9755563a5acd6230c58236e4f19a47266
This change is only for real-time mode if short_circuit_low_temp_var
is on. Add bias to last frame in choosing ref frame for partitioning,
when y_sad and y_sad_g are close. It speeds up real-time encoding by
0.5% on some clips with less than 0.1% overall PSNR drop on rtc test set.
Change-Id: I2a2110fe36455f3d8f0fc404aef2228f512e8df8
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
int n = (int)VPXMIN(sizeof(clear_buffer), data_end - data);
^ ~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~
Change-Id: I964355ceae6b39e22c0196294b25e28387f84945
Defined as unsigned in VP8_CONFIG
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (cpi->oxcf.number_of_layers != prev_number_of_layers)
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
Change-Id: I969e64cd2bfda6e61c564476dbd35b892b177646
The vpx_roi_map_t and vpx_active_map_t structures use unsigned rows
and cols but VP8_COMMON uses signed values for mb_rows and mb_cols.
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
~~~~ ^ ~~~~~~~~~~~~~~~~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
Change-Id: If1f118c20ffefd2530fbd371e6787cc8a6c31f0a
Mode is signed
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (ctx->oxcf.Mode != new_qc)
~~~~~~~~~~~~~~ ^ ~~~~~~
Change-Id: I5cf81c40b103e688a31e1339511f5c9eb27edd38
1. Skip golden non-zeromv and newmv-last for bsize >= 16x16 if the
temporal variance obtained from choose_partitioning is very low.
2. Skip horz and vert INTRA mode for speed 8.
This change works best on the clips with little noise and with some
motion (e.g. gips_motion which has > 5% speed up). PSNR drop is 1.78%
on rtc test set, no obvious visual quality regression found.
Change-Id: Ib43b5b20e67809d03c5a6890818ddff59e1fc94a
Move initialization of a some new "twopass" values
to the function vp9_init_second_pass() and some other
small changes.
Remove #if GROUP_ADAPTIVE_MAXQ as this is always
enabled now.
Change-Id: I1dbec2fd7c419779848aa987c4cd7824d4df8456
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
Change-Id: I580813093ee46284fde7954520dfcb1188f79268
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
Change-Id: I502fd707823c4faaa7f587c9cc0312f057e04904
On scene-cut detected frames (i.e., high_source_sad = 1), use
nonrd_pick_partition (over choose_part + select_part), as
the nonrd_pick partitioning is generally better.
Small positive increase in metrics on ytlive set (~0.5 - 1%).
Negligle overall speed decrease, as its only used on scene-cut frames.
Only affects 1 pass vbr mode, speed = 5.
Change-Id: I07c89cbdc75f5bb16eb8e0e2773ead0980d2de5c
This reverts commit be12fefa4b
and commit 057c1c4034.
Also, the mismatch between the avx version and the
c version has been fixed.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
For a rt encode using 1080p@60fps material, up to 11% performance
improvement overall was seen.
Change-Id: Icd1f216209ebc6fc0b8da885f32f356fa4355ed0
The eob of a block is not perperly set when skip_recode is true,
thus triggering assert(eob <= default_eob) to fail.
Change-Id: Ifecbe33dce2dc4903e0a80bd384dc09bf0dd8a44
Code cleaup, use existing rolling_actual/target metrics instead,
set threshold to get same/similar effect.
Little/no change in metrics on ytlive set.
Change-Id: I74f3c3d0a143a9cf20dc9c3dee54c0f7e6a97a51
Add a max condition and lower the min value.
No change in behavior (metrics for yt live set) for the
default min/max_gf_interval=4/16 settings.
Small positive change when min/max_gf_interval=7/16
(for 60fps clips on ytlive set).
Change-Id: I1c1d72425c86c69419ea43fb9730130e81062f91
add an upper bound to the framerate denominator above which 30fps will
be reported; fixes warning in corrupt / fuzzed files
Change-Id: I46a6a6f34ab756535cd009fe12273d83dcc1e9f1
Provides more comprehensive coverage for --enable-coefficient-checking.
The intent is to make the --enable-coefficient-checking option
consistent with the VP9 spec.
Change-Id: I12d0120756d17572ca2b2d7e6a2ab9d8071d8d58
Error messages:
..\vp9\common\vp9_loopfilter.c(1312): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1313): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1312): error C2220: warning treated as
error - no 'object' file generated
[.build-x86_64-win64-vs10\vpx.vcxproj]
Change-Id: Ia69260611997cd2ba41c7184a85ecead740a7c07
Increase in the damping used in adjusting the active Q range.
This does hurt rate accuracy a little in a few extreme cases
especially if the clip is very short*, but helps metrics.
* Note that the adjustment is applied at the GF/ARF group level based
on what happened in the last group. Hence for very short clips where
the length of a single group may be a significant % of the clip length
there is still scope for some drift that cannot be accommodated.
In practice most data points in our test sets are now much closer to target
than was previously the case with default settings and in some cases are
better even than they were with the command line undershoot and overshoot
parameter was set very low (e.g. 2%). For example in bridge_close at high rates
the old mechanism was unable to adapt enough to prevent extreme overshoot.
Change-Id: I634f8f0e015b5ee64a9f0ccaa2bcfdbc1d360489
Change to the calculation of the error divisor used in
get_twopass_worst_quality(). This follows on from other
changes to the rate control that impact the output of this
function.
Change-Id: I414fa9aa1e6a68a64dccea17c3712f44b8a0c10c
Changes to the function the redistributes bits from overshoot
or undershoot throughout the rest of the clip to respond more
quickly.
Change-Id: I90f10900cdd82cf2ce1d8da4b6f91eb5934310da
Added a factor based on the bit spend in the last arf group vs the
target to adjust the choice of the active worst quality in subsequent
groups.
Helps clips where previously there was a big overshoot or undershoot
to adapt and get closer to the target rate.
Change-Id: I67034b801679b99024409489a2273ea6fe23b8e6
The use of this value is preventing rate adjustment on clips
or sections that have very little motion but high noise and
this can give rise to some sections with massive overshoot.
Change-Id: I9a65c7c1148dc5d3a7d8b23e50fc1733f3661621
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).
Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd
Skip intra-mode and some inter-modes (newmv, nearmv, nearestmv) for
golden frame if the variance got from choose_partitioning is very low.
Only for 1 pass real-time CBR mode and bsize >= 32x32, it has ~2.5%
speed up with less than 0.1% PSNR drop for rtc test set. Don't see
visual regression.
Change-Id: I70efbc95a1007231ae36f02c5b2fbf6cd35077ad
Reduce operations and jumps. perf shows CPU time reduced from 1.9% to
1.6% when decoding fdJc1_IBKJA.248.webm on Xeon E5.
Will apply the changes to vp10 after code review.
Change-Id: I9351509922855d8896ddef1ed093b3ca12619a61
For non-rd pickmode:
best_pred_sad, computed for NEWMV-last, is only used for
skipping golden non-zero modes. Add condition to avoid this
computation if not used (i.e, if golden nonzero modes are not used).
And remove code for computing best_pred_sad for NEWMV-golden,
since that sad is not used.
No change in behavior; small speed gain (~1%) for svc encodes.
Change-Id: Ic2cbdef6c4e9a233a57c0db0eeac8ad5fcead366
convert the random value to int16 before subtracting 256 from it; quiets
a ubsan (sanitize=integer) warning
BUG=webm:1225
Change-Id: Ibc2c5a21f30e112bd6c180f7d6a033327c38d0df
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.
Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.
Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.
Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
C does not allow for shifting into the sign bit of a signed
integer, and the two instances here become signed ints via
promotion. Explcitly cast them to unsigned MEM_VALUE_T to
avoid the problem.
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=614648
Change-Id: I51165361a8c6cbb5c378cf7e4e0f4b80b3ad9a6e
Followed the code style of other lpf fuctions.
These 2 functions put 2 rows of data in a single xmm register,
so they have similar but not identical filter operations,
and cannot share the same macros.
Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
- Avoid excessive copying
- Don't both searching if no update can possibly offer savings
- Simplify the interface
- Remove the confusing vp9_cost_upd256 macro
Change-Id: Id9d9676a361fd1203b27e930cd29c23b2813ce59
Apple's version format specification is strictly checked on app
store submission, even for embedded frameworks:
http://apple.co/1WgelY1
The build version number should be a string comprised of
three non-negative, period-separated integers with the
first integer being greater than zero. The string should
only contain numeric (0-9) and period (.) characters.
So that's room for "1.5.0" but not for "1.5.0-906-g656f9c4".
The full version returned from 'version.sh --bare' is now
embedded under a 'VPXFullVersion' custom key in the Info.plist,
so it can still be extracted from the resulting framework.
Change-Id: If34a58d02e407379d1f1859fda533ef7f983170b
vp9_diamond_search_sad_avx was disabled in:
057c1c4 disable vp9_diamond_search_sad_avx
this removes a missing prototype warning as the prototype is no longer
included in vp9_rtcd.h. the file can be restored if someone gets around
to fixing the issue.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
Change-Id: Ia9fda4b81c53dc5fba7c31d780d761f886940b52
Many codes require -mstackrealign flags. Although -mstackrealign has
been already added to CFLAGS of some modules, SIGSEGV occurs in other
modules than those modules.
The best way may be to find causes and to fix them. However, we
cannot know those causes until SIGSEGV occur really. In addition, if
SIGSEGV occurs in other programs, it will be fatal.
So adding -mstackrealign flags to CFLAGS unconditionally is
reasonable.
Change-Id: I999ef597a6afe97f5e7cc7bffaa866537c3eedd2
This reverts commit 2468163e07.
causes valgrind errors for overread of buffer in SubpelVarianceTest
Change-Id: I448e52c76f815ac199305b71f7d169f2bc167679
Move the logic for rechecking zeromv on denoised block out to simplify
the function. To simplify the param passing, add a new structure
VP9_PICKMODE_CTX_DEN which is only used when denoiser is enabled.
Change-Id: Iaa9b4396dfcb8147236c02d4a1868a09103a4476
This commit clarifies integer value range for vairables used in
several variance functions, also change to use proper type
conversion to reflect the value ranges.
Change-Id: Ic3234b83a912ce1ad12d1b254f3378763e15cc5c
The inlining mirrors what was done with the low bit depth
inter_predictor. And the new highbd_inter_predictor name is more
consistent with other high bit depth functions.
Change-Id: I96437f745759aeec6260c6e39a974bf36f1c211c
Rename and change to how its updated.
Only affects 1 pass vbr.
Small change in metrics (< ~0.1%) on ytlive set.
Change-Id: Ibb1fe485699b6c4a8194951c8f229abe2f64b9a5
Also allows use of --enable-shared when configuring for Mac OS X,
producing a bare .dylib.
Enabling the shared framework bumps the iOS deployment target to 8.0,
the minimum required to support dynamic framework deployment in apps.
When not using --enable-shared, a static library for iOS 6.0+ will still
be built.
Minimum version settings have been moved into ios-version.sh so they
can be updated in a single place.
As with the static build, unless header search paths are manually
tweaked, users must add a VPX prefix on includes, such as:
#include <VPX/vpx/vpx_decoder.h>
A module map for headers is not yet included as inttypes.h is not
modular; this means that VPX cannot be used directly in Swift code,
but can still be pulled in through an Objective-C wrapper.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1092
Change-Id: I28fb06ce65e48ed167a88c14a7bfb2861989317e
In motion estimation stage for subpel motion, subpel variance is
computed use bilinear interpolation. The motion vector precision
used is at 1/8 pel and three bits are used to represent the x and y
subpel offsets. Based on this, the half pel check should be against
4, not 8.
Change-Id: I1f56fa1fa3f2f5e19a20d27983efe628557f170e
there are sse2 equivalents which is a reasonable modern baseline
Removed mmx variance functions:
vpx_get_mb_ss_mmx()
vpx_get8x8var_mmx()
vpx_get4x4var_mmx()
vpx_variance4x4_mmx()
vpx_variance8x8_mmx()
vpx_mse16x16_mmx()
vpx_variance16x16_mmx()
vpx_variance16x8_mmx()
vpx_variance8x16_mmx()
Change-Id: Iffaf85344c6676a3dd337c0645a2dd5deb2f86a1
Added actual and absolute rate miss values to the opsnr.stt
stats output line.
Changes to the borg graphing may be needed before merge.
Change-Id: I1e9d548ce445d29002f0c59ebfd3957a6f15e702
Bug found by Yunqing relating to the correction for size at 8K and
above in get_twopass_worst_quality().
The basis for the correction was changed to the linear size relative to
1080P as a baseline and the adjustment has been clamped to prevent
problems at extreme images sizes.
For 1080P the results on our test sets were neutral but the low res and
mid res sets saw a small gain (0.1%-0.2% average).
I would also expect some gains on 4k and larger content where the
previous correction was overly aggressive.
Change-Id: I30b026b5f4535e9601e3178d738066459d19c8fb
Add control API VP9E_SET_TARGET_LEVEL that allows the encoder to
control the output bitstream level and/or keep level related
statistics.
Usage:
255 do not care about level (default)
0 keep level related stats only
10 target for level 1
11 target for level 1.1
.
.
.
62 target for level 6.2
Usage for vpxenc:
--target-level=0/255/10/11...
Change-Id: I31d1aeca19358b893e7577b4e63748c8e614034a
For at least some of the implementations of sdx8f, such as
vpx_sad4x4x8_sse4_1, aligned moves are used to move the results into the
array.
Change-Id: I83df5a8e657b44e906d0d8b0bc154f1e5660f7f9
block_variance: This operates on 8x8s and would be safe with a int32 *
int32 to uint32 multiply, but this is potentially unsafe for 12-bit
input. Unfortunately the code already segfaults on 12-bit input:
https://bugs.chromium.org/p/webm/issues/detail?id=1223
calculate_variance: This operates on up to a 32x32 of 8x8s and can
overflow even with 8-bit input (log2((256*32*32)**2) == 36).
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1220
Change-Id: I1ca4ff6092db9a7580da371ee9a21f403fdadc40
Reduce factor for setting base-qp for active_best_quality (for inter-frames).
Small increase in metrics on yt live set.
Change-Id: I9cf0ac797783aeddbfaf1ff510696c9035d7c5ee
This change makes the c match the assembly and removes the todo's
associated with getting this to work.
Change-Id: Ie32e9ebb584a9d60399662d8bcb71b74fbd19d1e
These implementations rely on casting the pointers to load the data.
Clang implemented optimizations which automatically add alignment hints
to such loads. The 4x4 filters do not guarantee the necessary alignment
so the resulting assembly is broken.
https://llvm.org/bugs/show_bug.cgi?id=24421
BUG=webm:817
BUG=webm:892
Change-Id: I608885299f1f86ff83653b65e0e40d0ae87fb3fe
* changes:
vp9_frame_scale_ssse3.c: make 2 functions static
vp9_pickmode.c: make function static
vp9_noise_estimate.c: make function static
vp9_aq_360.c: add missing include
vp9_idct_intrin_sse2: add missing vp9_rtcd.h include
vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include
Makes the delta-qp stop little earlier on areas that have been refreshed enough.
This helps to reduce some pulsing artifact on noisy flat areas observed in some
noisy vc-clips.
Threshold changes only take effect for sources where noise level is estimated to
be >= medium level.
Only affects 1 pass CBR, non-screen content case.
Change-Id: Iacf557f6aa8abbcd6782c02ff2e6c14891960850
For 1 pass vbr mode:
Refactor to move the logic for gf setting based on up-coming
key frames to a separate function, so same logic can be used for
scene-cuts/changes.
Change-Id: Ic4ede308e08ba869bb62e4566e19ea31222c5229
Makes the noise estimation react little faster.
Little/no change in metrics.
Change only affects 1 pass cbr.
Change-Id: I13f0daa90ecbf9d49eb1cf2e48febd9d92292940
When building a dynamic framework with Swift compatibility, can't
include any headers that aren't in another module or you get an
error like this from Xcode on the including project:
Include of non-modular header inside framework
For some reason the system inttypes.h is not in a module, unlike
other standard C library headers... but it doesn't seem to be
actually needed on Darwin, so removing it doesn't appear to
be a problem.
Change-Id: I11d264483c54feefd9d2edf573afaef34ddcd0f2
When using git submodules, .git may be a file instead of a directory.
The -d test was failing in that case; switched to -e.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1215
Change-Id: Iedf0e92bfeb003b28a415945dc729e6ce58c4fe4
"qc" in vp{9,10}_token_state is used to save quantized coefficients, this
commit changes the type from short to tran_low_t to properly reflect
the value range for highbitdepth build.
This fixes an out-of-range bug when optimize_b is used in highbitdepth
build.
Change-Id: Ibf330879e6ac6ae8f099e085caa9d3d9a889fde8
This is an actual overflow where the result of the calculation is
materially changed, not just a negative value that is stored in an
unsigned.
Caught with fsanitize=integer on the VP9/AqSegmentTest.TestNoMisMatchAQ2/1 test.
Change-Id: I514b0ef4ae7ad50e3e08c0079aa204d59fa679aa
In so doing this fixes a couple of bugs:
vpx_plane_add_noise.c needed to subtract a clamp instead of add.
And the assembly (mmx sse) had assumptions that parameters were
continuous in memory which was not true.
Change-Id: I76f2c43cf54bfc838eb2edf8a443eaaa7565d7b5
- iOS SDKs no longer ship with armv6 support.
- Our minimum iOS version means all target devices have neon.
- Remove armv6 darwin LD workaround.
- This removes a TODO.
Change-Id: I2fcb5b82c96213364275475be021c7dd8459d5c0
Move skin superblock force split out of this function as well
as some minor code refactors. Checked bitexact for different speed
settings and different resolutions.
Change-Id: I6078cbe88dd9ce6c0b69470a8a0a8f8d2274161b
this avoids the decoder test which was only correct for vp9, vp10 was
missed in the earlier change
Change-Id: Ib789c906d440c0e4169052cf64c74d5e4b196caa
First, we only set use_4x4_partition for key frame where we don't
denoise; second, envision we have small partitions, we should pass the
actual block size to denoiser and make an early termination if needed.
Change-Id: I331f42046d792b17360723d17ff817d601394658
Wrap around behavior is enforced manually and we use the values in
arithmetic involving negative integers.
Change-Id: I199706b6f3af91f4fb6fe2ef302fbbc6d0cf5785
The product always fits in uint32_t, but the operands don't.
An optimizing compiler should generate the wraparound code.
(Verified with clang).
Change-Id: I25eb64df99152992bc898b8ccbb01d55c8d16e3c
ADL will look this up from the callsite namespace iff it is declared
before the callsite or from the parent namespace of the class type (the
global namespace).
This patch has been tested on MSVS 2015 and clang-3.8.
Change-Id: I00ba74712c9b617b9d81761abed1e14d8f25d8e3
Block size passed into denoiser filter is always >= BLOCK_8X8 (in
vp9_pick_inter_mode), it is not necessary to check smaller block
size. Passed the bitexact test on clips with different resolutions and
noise levels.
Change-Id: I19fa3195d18c27d9e5de60dc11cff1522ef3714e
Fix will reset the consec_zero_mv map on non-skipped blocks with non-zero mv.
Adjust thresholds on consec_zero_mv in noise estimation and skin detection,
as more possible reset on map means lower thresholds should be used.
Change-Id: Ibe8520057472b3609585260b51b6f95a38fb777d
webm_read_frame is the only function now which requires
documentation for what the return value means (other two are quite
obvious - file_is_webm and webm_guess_framerate).
Change-Id: I7a4f7d8097b1d748812b2ee251ee718a0b5ce836
In VP9 internal denoiser, motion magnitude is computed from
best_sse_mv, which should be set to 0 at the begining. This bug may
cause visual aritifact in denoiser. Also, delete two improper comments.
Change-Id: I8710d2acba23320bc85cf72af17d65245c19438b
Need to check that sse for non-zero mv has been set for the current block
(i.e., check that nonzero-mv is tested as a mode, so newmv_sse != UINT_MAX)
before forcing to not use zero-mv for denoising.
Also increase some thresholds (sse and sse_diff) for high noise case,
and use shift operaton instead of multiplication on a threshold computation.
Change-Id: Iae7339475d57240316b7fa8b887c4ee3c0d0dbec
Resolved two TODO items.
Force a minimum value of 1.0 for frame duration as per section duration.
Column inactive zone is currently set to 0 as most of the serious issues
relating to inactive regions relate to letter boxing.
Change-Id: Ifbab3acf2c089d7305620a7ff7ed7c3536cc9235
In Aq mode 1 the segment and AQ delta for each block is based
on spatial variance. There may be a net imbalance between blocks
that have lower Q than the baseline value and those that have higher Q.
This patch monitors that imbalance and extends the allowed baseline
Q range for the frame to accommodate adjustment of that baseline value
to compensate.
Change-Id: Iae8a48c7c01fe2af94a141e149d03acf467237ca
So it can be used even with aq-mode=3 not enabled.
Also cleans up some code in the places where its used.
No change in behavior.
Change-Id: Ib6b265308dbd483f691200da9a0be4da4b380dbc
Removed this todo because of another todo which says none of this code
should exist. It should be integrated into the block by block encode
process as per the decoder.
Change-Id: I076bd15140a060e69c014dd7d7cd07fea260aba3
For 1 pass vbr mode.
Increase the gf interval for case where average Q is close to
max and high overshoot is detected.
Small increase in overall avg_psnr/sssim metrics (~0.2/0.1%) for ytlive,
but improves the low-end (low bitrate) for several clips (less overshoot).
Change-Id: Ifba40f25b4861b2e0d9832c82d5359a6a3dce9f2
More even spacing near key frame and avoid gf on scene cut
if its close to key frame.
Small increase in metrics for ytlive set (which uses key-period=150).
(~0.2% gain)
Change only affects 1 pass vbr mode.
Change-Id: If1e5a59baf1e0befbaf998522fbc47d94ac5b5df
Change only affects 1 pass vbr.
Use a q value somewhat larger (~6%) than avg_frame_qindex[INTER]
as basis for active_best_quality for inter-frames.
And use the minium of this (avg_frame_qindex) and the active_worst_quality.
This reduces some overshoot in ytlive clips.
Overall small but positive average increase in metrics (up on average ~0.2%).
Change-Id: Icdbaae7872d5675fd38a13c0ec6ce0e2e3b919ce
This was never hooked up for the 32x32_34 case as the neon_asm version
in 3f7c12da, when the intrinsics version was added.
Change-Id: Ic7db4ce5850c637315f9fe9e2de93a4f8cf9e320
Change recursive weight for average_source_sad and
put some constraint on spacing between detected scene-cuts.
Change only affects 1 pass real-time mode.
Change-Id: I1917e748d845e244812d11aec2a9d755372ec182
Correct the setting of Q basis of GF/ARF in 1 pass vbr.
Existing logic would switch to using avg_QP of key frame if
avg_QP of inter is less than active worst (even if key frame is
not last frame).
Instead fix the logic (as per the comment) to use the lower of
active_worst_quality and avg_Q for inter as basis for GF/ARF
active_best_quality (unless last frame was key frame).
Increase in metrics: AvgPSNR/SSIM up by ~0.7/0.3 on ytlive set.
Change-Id: I9a628378ec6684bfda9457ebfc2384ef6d8579f7
Adjustment to stop excessive prediction decay triggered by blocks
or frames with extremely low spatial complexity which rendered the
comparison of intra and inter coded errors meaningless.
This was causing much shorter than expected groups on some 4k
test content.
Change-Id: I3f2c64200ef6dcef4721fc9f2ec09e480056ffc2
Uses a metric on fraction of smooth blocks derived from first pass
stats in a frame to adjust down the cq_level modestly in the cq mode.
The current implementation does not add much complexity, and is
fairly light in the adaptation.
Change-Id: Ic484e810d5bd51b7bb6b8945f378c7c3d9d27053
Adjust the motion decay component to account for image size.
This has very little impact for smaller image sizes.
Average bdrate results for our HD test sets:-
Hdres set: opsnr +0,92%, Fast SSIM +1.6%
Netflix hd set: opsnr + 1.5%, Fast SSIM +3.1%
There are a couple of notable -ve clips such as cyclist and sunflower
which seem to be better with a shorter interval but also a few very big
wins such as Jets >12% psnr 22% Fast SSIM and from the Netflix
Netflix set PierSeaside 9.7% psnr and 18.2% Fast SSIM.
Change-Id: Ie43aaedaa74331ed83d624a13548094ac64fed9e
Change only affects 1 pass vbr mode, speed >=5.
Increase min_thresh, decrease boost, and set a min/max
value for gf_interval.
Change-Id: I9c1e1a1ab0c5780064eb62714ee39a72ea4d2107
Trap the case where we end up with a very short arf group just before
a key frame. Such a group often has poor quality and may cause pulsing.
For example if the KF is 17 frames away we are better doing two mid-size
groups of 9 and 8 than a group of 15 followed by a group of 2.
This becomes more and more important when coding with a short forced
kf interval though it may not impact our standard tests much.
Change-Id: I29d83d6637b203eac69be320dd35a7401a4678c1
This reverts commit 74aaa2389e. Unstable
under valgrind because of uninitialized reads. Limiting the bad bisect range.
Change-Id: I45b32f0ee0ba45795e7efb9947fb805830c8ce0e
- Use arithmetic AND (&) instead of logical AND (&&) to
generate correct testing input.
- Fix variance reference function to be consistent with
our codebase implementation.
- Refer to the following issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1166
Change-Id: I8c1ebb03e22dc9e1dcd96bdf935fc126cee71307
Avoid copy-block when denoising is at LowLow level (i.e., no denoising is done).
Instead, don't enter denoiser at all, and when level goes back up over kLowLow
do a reset in denoiser.
Change-Id: I0544adf58f4dd51ecc4a4607fcb0353bfbbb7a59
only output[0] needs to be set, store_output is more involved than a
movdqa in the high bitdepth case
Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
Avoid doing the mcomp in denoiser if we don't denoise the
block (because of motion/SSE/skin threshold, etc).
This can reduce encoding time (with denoiser enabled) by ~1.5-2%.
Change-Id: Ia699b68dfd37b89cdf3a82b8aa40e8c8f98a3d4f
This make it more likely clean/low-noise content will
be set as LowLow, and hence no denoising will be done.
Also set early exit on denoising for small blocks.
Change-Id: I4a72bba3e6c5e2d523d304c39deacc9c39bf216c
Some cleanup and bugfix: pass mi_row/mi_col (not mv_col/mv_row)
to build_inter_predictors. This only affects case where
the frame is resized, but since denoising is not done on resized
frames, the fix has not effect currently.
Change-Id: I36617a7f0b43b6f49976745f15d400977e6ffa46
Switch to use new skin model.
And fix condition for denoising skin block.
Previous condition did not denoise skin blocks if the selected
mode was non-zero motion in current frame. Modify condition to
also force no denoising if that mode was not selected as zero motion
now and for at least "x" past frames in a row (x = 2).
Change-Id: I00753e3fe45b9a308a7ef43c58f11868e3bfc6b0
not strictly necessary, but allows projects using '-Wconversion
-Wno-sign-conversion' to reuse these headers.
Change-Id: Id1398d726c90173ccba9aea66798fcef6f20fa23
Change only affects 1 pass, vbr, speed = 5 (real-time mode).
Some improvement for high motion content.
AvgPSNR/SSIM metrics for ytlive set all up, on average ~2%,
some clips (high motion ones) up 4/5%.
Encoder speed down: on mynintendo_x1.1280_720.y4m: 47fps -> 44fps.
Change-Id: I9e3eaa6392dcb6b5b44ee6f43004f97ba859bc11
the vpx_decoder layer guarantees that when called directly this won't
receive NULL data and the reuse via decode() is protected by a NULL data
check and 0 size check (NULL data and non-zero data size is protected by
the vpx_decoder layer).
Change-Id: I7437fb5ca4e4aa431963d55b909d4d920f339be3
The mv is clamped in dec_find_mv_refs() to a smaller region
than the clamp in dec_find_best_ref_mvs(). See clamp_mv_ref
and clamp_mv2.
Change-Id: I47dd5f7fa8b42f2cc593559b4d7c782fe7bcb1db
In multi-thread case, the encoder may crash if using encoder option
tile-rows > 0. To prevent that, force tile-rows=0 in this situation.
This is a workaround for WebM issue 1095:
https://bugs.chromium.org/p/webm/issues/detail?id=1095
The further fix can be done by adding synchronizations after a tile
row is encoded. But this will hurt multi-threaded encoder performance.
So, it is recommended to use tile-rows=0 while encoding with threads
> 1.
Change-Id: I656cbcc200f8d0410d09530e7981ad8f32fe7bc9
This patch was to fix a reported Hangouts deadlock/freezing issue
in VP8 encoder(issue 27232610). The original encoder loopfilter
synchronization happened in the following frame, which was prone
to causing problems in some complex use cases. This patch simplified
the synchronization logic.
More testing needs to be done.
Change-Id: I38fd3f35d11f98fae1e44546aa5e4c6d6e19c4be
Allow the encode loop to select from a wider range of Q values
when encoding normal (non arf or kf) frames.
This change is targeted at improving psycho-visual quality in some
easy sections that are currently not getting enough bits.
This is likely to be a little worse from a metrics perspective and may also
have a small impact on encode speed in cases where extra recode
iterations are triggered.
Change-Id: I667eebf33c753bcbcf8b93596467369e5708b889
Adds a second threshold for recodes even on frames where
recode is normally disabled if there is a big rate miss.
Change-Id: Ifd4a34707da55ec15eb7cfb87de4644b8d76deb2
Fix the threshold for forcing refresh of golden frame based
on high motion. The current comparison was incorrect and
prevented this (force update of gf on high motion) from being used.
For now keep this logic under a flag (and off for now) so as to
not change behavior, until further testing.
Change-Id: Ib5f0082159a428b0603b9534e4bcb6f83e4ccb25
+5.857% BD-RATE on SCREEN_CONTENT
Leaving this off for non-screen content because:
+25.300% on TWITCH120
+37.833% BD-RATE on RTC
Change-Id: Ie0a312182d6cc859fb04298e4cd81d02b39e23fe
For 1 pass vbr mode: Increase the period of gf update on scene
cut (keep it same as orginal/default setting for now).
Change-Id: I679c3bd21152f6c4e486c8098d931c00e1d26b5f
This is the identical change submitted for vp8 here:
https://chromium-review.googlesource.com/#/c/274107/
Tested this change on Mac OSX (10.10) and Linux
(Linux Mint 17 / Ubuntu 14.04) and in both cases:
- downloaded and compiled latest source for libvpx and ffmpeg
- confirmed ffmpeg would build sub-second frame rate webm files
via the previous patch
- confirmed ffmpeg would *not* build fps < 1 for vp9
- made this change, recompiled libvpn and ffmpeg
- confirmed ffmpeg would now create the same webm with
fps < 1
- confirmed the resulting file would play and was vp9 (e.g.
would not play in Firefox (Linux version complained it was
VP9 but mostly could play it) or older vlc, etc., but does
play just fine in Google Chrome and a newer version of vlc.
Sorry I didn't catch this last time - but this seems a solid
change and it's handy to be able to create frame rates
less than one second.
-jk
Change-Id: I38fa32148de8c4c359f228cf08b9a4b83b5a52fb
The change https://chromium-review.googlesource.com/#/c/329181/
also changed behavior for cbr mode, which causes some regression
in screenshare test in webrtc.
Resetting the specific change to leave the cbr behavior
unchanged for now.
Change-Id: I52df158806422f86398e1d2f522e92067d8325eb
Some adjustments to inter-mode selection for vbr mode.
Condition some of the bias to low/zero motion on cbr mode, and
don't use int_pro_motion_estimation for golden ref
(treat it same as last ref).
Change only affect 1 pass vbr mode, speed >=5 (non-rd pickmode).
Encoding time increase within ~5%.
Avg PSNR/SSIM on RTC set increase by ~2%, all clips up,
ranging from 0.5 to 4%.
Change-Id: I0048d0104a8816773d91a2b1484d601169d9bad7
Don't advance the svc frame counters on dropped frame,
since this can break the referencing scheme and lead
to a crash/assert.
Updated svc-datarate unittest to add a lower bitrate test.
Change only affects 1 pass cbr svc, with frame dropper enabled.
Change-Id: Ibb7530b7a587a9344d46898d9286fd9e2ef0779c
Use the superframe counter to set the key frame, and force
it to the key frame on base spatial layer only.
Also, update svc frame counters under frame dropping.
Update unittest: add specific tests with short key frame period.
https://bugs.chromium.org/p/webm/issues/detail?id=1150
Change-Id: I5b1c9a09253e6e5fbfce51b4cf603ae22d422b01
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of souce.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
Change-Id: I9d73f92a4a96927d0f1d6bf75315c1e60513226a
Use sharp filter to generate motion compensated reference for
temporal filtering. It improves the average coding performance of
VP9 speed 0:
derf 0.34%
hevcmr 0.38%
stdhd 0.58%
Change-Id: I1772a051be545de8c343055274e5ca0929d19cda
This commit back ports the fix from
https://chromium-review.googlesource.com/#/c/326940
It corrects the block partition context fetching in rate-distortion
optimization. It improves the average coding performance of speed 0:
derf 0.098%
hevcmr 0.102%
stdhd 0.282%
Change-Id: I8bcc6fe40ba5c6b50a6136daac116dcc738937ec
The double pointer in xd->mi handles this for us.
Cuts encode_suberblock()'s self time in half at rt speed 8.
Change-Id: I820dae24efdbf9a140bbeae82e4e2a5850317766
* changes:
x86/convolve.h: remove redundant check in FUN_CONV_2D
x86/convolve.h: replace while w/if for w < 16
x86/convolve.h: change filter[] || chains to |
restore the value for VP9 to 9999 to satisfy the current test
expectations; without this
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/8 will overshoot.
Change-Id: I88dad574ae4ab10f923579824c7347ff468c7045
This reverts commit f51f0998e1.
This causes datarate tests to fail. Some are due to the new default
keyframe distance, another causes an assert even forcing 9999:
[ RUN ] VP9/DatarateOnePassCbrSvc.OnePassCbrSvc3SpatialLayers/0
test_libvpx:
vpx_dsp/x86/vpx_subpixel_8t_intrin_ssse3.c:853: scaledconvolve2d:
Assertion `y_step_q4 <= 32' failed.
Change-Id: I4ee4fea97f47e4f1a23b82a62e6afc6280961e38
Reset the scale factors before build_inter_predictors.
Add datarate tests for 3 spatial layers, which exposed this issue.
Change-Id: I7f81efbe44345ecea9fdd5f639a4cca76aed3874
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of souce.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
Change-Id: Ifcf526ec2aaf3e5fa7924588d9dd8660bf02fb46
some configurations may fail if AltRefTest is undefined though
VP8_INSTANTIATE_TEST_CASE is defined away.
Change-Id: I7272775a506718336bd6cee2225cf83bd72fede5
the same as vp8, with the same reasoning from:
2a0d7b1 Reduce the default kf_max_dist to 128.
see also:
https://trac.ffmpeg.org/ticket/4904https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815673
+ restore vpxenc behavior of taking the library default rather than
forcing 5s
This change also exposes an issue with one-pass svc in cbr mode, keep
the old default in datarate_test.cc for now.
Change-Id: Id6d1244f42490b06fefc1a7b4e12a423a1f83e88
* changes:
x86inc.asm: only set visibility for chromium builds
Only use .text sections for aout
Use .text instead of .rodata on macho
Copy PIC handling code from x86_abi_support
Set 'private_extern' visibility for macho targets
Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
Use libvpx defines to set name mangling rules
Customize x86inc.asm for libvpx
Update x86inc.asm from x264
Use the existing scene/content change detection to better
update/adjust golden frame refresh.
Change only affects 1 pass real-time vbr mode, speed >=5.
Change-Id: I2963a5bb7ca4a19f8cf8511b0a925e502f60e014
this restores the previous version's behavior avoiding issues with
builds that may split sources on directory boundaries; protected
visibility may work in this case.
Change-Id: Ie759bd96c9ea5b45613f450dffa6e67eb45f5a8b
The read only sections are getting stripped on some OS X builds. As a
result, random data is used in place of the intended tables.
Change-Id: I4629c90d9e0ae4d4efc193a93be6fb93809ae895
Don't initialize first pass costs for a number of symbols where first
pass probabilities aren't initialized.
This brings a 1.22x first pass speedup.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter
Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter
Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
move to encoder_encode() as vp10_get_compressed_data() allocates data and
would require some modification to make its error return meaningful.
Change-Id: Ia5267c35d16ccd42b6da6d2136402b13e28f9159
move to encoder_encode() as vp9_get_compressed_data() allocates data and
would require some modification to make its error return meaningful.
Change-Id: I8ddc390a1441afd0ff937842fa4ad1053c956133
Add frame-level condition for reference masking: under external or
internal dynamic resize, allow for reference masking if none of
the references have been scaled.
Peviously, reference masking was turned off for the stream if dynamic
resize feature was enabled or an external resize event occurred.
reference_masking gives speed up with little/no loss in compression.
For speed 7 on rtc set: encoding time decreases by about 5-7%,
avgPSNR/SSIM goes down ~0.2%.
Change-Id: Ie4444577451ef954414d8fb4b2c99d65cadf1746
This commit fixes issue 1141. The issue was triggered in multi-tile
encoding. The change properly saves and restores the block context
information in the real-time mode selection process. It removes
several redundant memcpy operations in sub8x8 intra block mode search.
Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010
External dynamic resize with swapping width and height was
not handled properly.
Fix is to re-init loop-filter under certain condtions.
Modify unittest to test this case.
Without this change test will fail.
Relates to: https://bugs.chromium.org/p/webm/issues/detail?id=1140
Change-Id: I7d81ca7fe0783b3bc103a52a7b7cf073a96be26e
allocations done within this function are protected with
vpx_internal_error; adding the setjmp fixes a crash in
vp10_lookahead_push() under low memory conditions.
Change-Id: I5515017cd71b218840c506791b3a517da7ffc93e
allocations done within this function are protected with
vpx_internal_error; adding the setjmp fixes a crash in
vp9_lookahead_push() under low memory conditions.
Change-Id: I4b79dca37cc7fadc4b7633f0db44c0e406799bc6
An issue exists with reference_masking in non-rd pickmode for spatial
scaling. It was kept off for internal dynamic resizing and svc, this
change is to keep it off also for external dynamic resizing.
Update to external resize test, and update TODO to re-enable this
at frame level when references have same scale as source.
Change-Id: If880a643572127def703ee5b2d16fd41bdbf256c
For dynamic resizing (whether the new codec size is determined internally
or externally set by user), we should for now keep rc.resize_allowed enabled.
This prevent the use of referene_masking for real-time mode
(in set_rt_speed_feature()).
Change-Id: Ibb7c3ff35be88afdf1a3c6db6693521766f177a3
to vp9_setup_pre_planes(), preventing the function
unscaled_value() from being called. unscaled_value()
returns the same value that was passed in. See
scaled_buffer_offset() in vp9_reconinter.h.
Change-Id: I2a6fbaf07972c2f212834929d29a2cbe72e399c3
The bit to error transformation got doubled as a result of going from
8-bit to 9-bit costs (change d13385c).
Use defines to derive the scale numbers and comment some of the fields.
derf: -0.023 BDRATE
hevcmr: +0.067 BDRATE
stdhd: +0.098 BDRATE
(These are substantially smaller than than the original gains from 8 to
9 bit costing.)
Change-Id: I6a2b3b029b2f1415e4f90a05709b2333ec0eea9b
When the codec frame size is the same as the reference frame size,
release the scaled reference before assigning it a new buf_idx.
Only affects 1 pass non-svc mode, where the scaled references are
release only under certain conditions (to prevent un-needed scaling
of the references every frame).
Modified a unittest that can trigger this bug without this change.
https://code.google.com/p/chromium/issues/detail?id=582598
Change-Id: I9a884e36ebd7608b1641ec2a469e20a4f829cf43
If the application changes frame size (external size changes),
and aq-mode=3 is on, reset the cyclic refresh.
Modify the TestExternalResize unittest (longer run with more resize
actions). Without this change an assert would be triggered on this
longer test.
Change-Id: I0eefd2cd7ffa0c557cca96ae30d607034a2599ce
Fixes an issue where the tx_type was not set correctly for
sub8x8 inter and intra blocks. In the current syntax, for
sub8x8 blocks, there is still a single tx_type that is
transmitted. Ideally, this should be searched for the best
rd performance, albeit at the expense of encode speed.
For now, we just set it to DCT_DCT. Previously it was left
incorrectly as what was used for the previous non sub8x8
block.
derflr: BDRATE -0.277%
Change-Id: If76ba903bfbfd4d374cf1ac7d1daee50e92f0edd
Make this consistent with regular block size rate-distortion
optimization. It improves the compression performance:
derf 0.055%
hevcmr 0.129%
Change-Id: I112fe734f592c21bc7aa6efb7e3f269c4214ee7b
For 1 pass real-time mode. No change in behavior as only last
and golden are used as references in 1 pass real-time mode.
Change-Id: Ie4655014eee1a8b271542f29d74b2c6f7fed54c9
the results along the top and left border are then stored with a moving
window into the vector.
~40-67% faster on ARM, ~40-77+% on x86 depending on the block size.
Change-Id: Iab369aa2946a3ae4eb7290d512868fe5db92dbc8
delete apply_cyclic_refresh_bitrate(). unused since:
3472cbb vp9 aq-mode=3: Keep it on even at low bitrates.
Change-Id: I0fac9a31b59504e31000ac3a8f0b68e8d4320113
The definition is for the number of frames to check to determine the
recent decay rate, further to determine the next key frame in the
first pass of the encoder.
Change-Id: Ic696d6eb518a86fa296842273cf8767ef0b0e27a
when INLINE is defined and mips is not being targeted. otherwise keep
the old --enable-extra-warnings behavior
Change-Id: Iba576edbe5fca03efa56ce99eee11f9cafc573ad
-use larger threshold on y (as in vp8).
-add distance threshold for each cluster
-use larger skin distance threshold for first cluster
-add some early exist checks.
Keep default setting to model=0.
Change-Id: I1044b99ade4bb1f215a860a019a4d84cee2f7715
It improves the compression performance of VP9 by 0.1% across all
test sets. No speed change is observed.
Change-Id: I59338c5c9e67bae22188f35fc3afbfe2a6bba6b0
The postproc vp9_denoise() is a spatial denoise/blur function.
It was not intended to be used if temporal denoising is enabled.
Change-Id: I97d2dcb941e7cc49bbafce99d9286beb2693249d
Put check to avoid possible out of bounds when looping
over the blocks to estimate noise level.
No change in behavior.
Change-Id: I4b7b19b7edee0ae1c35b9dc0700b1bf9b304d7f5
* changes:
configure: extend armv7 hf target autodetect
configure: remove default CROSS for arm targets
configure: avoid default when CROSS is set to null
This commit changes SSSE3 optimized idct8x8 functions to work with
highbitdepth build.
With this commit and the previous one that enabled SSSE3 idct32x32
functions, tests showed virtually no difference on decoding speed for
file fdJc1_IBKJA.248.webm for the build with -enable-vp9-highbitdpeth
option and the build without the option.
Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
This commit changes the SSSE3 assembly functions for idct32x32 to
support highbitdepth build.
On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference
between hbd and lbd build from between 3-4% to 1-2%.
Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
the lookahead buffer allocation is deferred to receipt of the first
frame to allow profile changes. if the encoder was flushed before
supplying any frames the encoder would crash trying to dereference the
NULL buffer. vp8 is unaffected.
fixes mozilla bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1237848
Change-Id: Icee4b64de760476eee0d33b568f0a1010335ff13
Use multiple clusters instead of one and decrease
the distance thresholds.
Add a define to switch between models.
Default is set to existing (1 cluster) model.
Change-Id: I802cd9bb565437ae8983ef39453939f5d5073bb1
If a superblock contains alot of "skin" then force split
of 64x64 partition, and make some adjustments in mode selection.
This helps to reduce artifacts on moving face/skin areas at low bitrates.
Little/no change in metrics: avgPSNR/SSIM down by ~0.12%.
Small encoding time increase < 1%.
Change-Id: Ic57f52148c3716f391419fab0530d916e4c1d186
For aqmode=3, golden period update is set based on period of cyclic refresh.
Put a limit on max golden period update, for now set to 40.
And fix comment.
Change-Id: Icb61dd87c796cce2a5f5f7331c6a129540994696
Limit oscilation detection in the case where overshoot is very very
large.
This keeps the 9-bit cost patch from breaking the DownUp reisze test.
The patch pushed us to an 11% undershoot right before a scene cut
causing a 1200% overshoot. (Whereas before we were undershooting by
only 6% before overshooting by 1200%).
Change-Id: Id90ccfab8aba872ccadc45b73b3bb097b895677f
In inter mode search skip all modes except NEARESTMV and DC_PRED.
10% less encode latency for large frames using the chromium remoting_perftests.
+0.313% BDRATE on the screencast set at speed -6.
Change-Id: Ib97a39dd8bcdeab545509e0e02d78ce7033f8c63
Remove comment(s) and enable frame-dropper for tests.
Frame dropper for 1 pass svc was fixed a while ago:
https://chromium-review.googlesource.com/#/c/309230/
Change-Id: I5fd3192825b22e562db9210d3dc7b246a1799d8d
Make it consistent with the comment/intended behavior,
that is, only denoise if current block is zero_mv.
Change-Id: I3909761e802e80089752a493ab3646dc32698ded
Changes to mode selection for 1 pass SVC mode:
use base layer motion vector, changes to intra-prediction.
Change-Id: I3e883aa04db521cfa026a0b12c9478ea35a344c9
This patch fixes a bug that causes the loop filter search to reset to
a low value or zero after each arf overlay frame. We expect the overlay
frames to need little or no loop filtering but this should not propagate.
Change-Id: I895b28474cf200f20d82793f3de40b60b19579fd
This is a pure-refactor in preparation to potentially raise the bit-cost
resolution.
Verified at good speed 0 and rt speed -6.
Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
More aggresive on avoiding denoising on skin.
May supplement this later by adding condtion onn consec_zeromv.
Change-Id: Ied92b332f9b24e821d2009f81d1565758588d9a5
Different quality levels are used for different regions in
the frame depending on how far they are vertically from the
center. Specifically, three segments are used based on the
mi_row index with respect number to the number of mi_rows in
the frame.
Change-Id: Ifc8b777bc58ea8521dffc4640360c67d99f8d381
This reverts commit ea48370a50, reversing
changes made to 15939cb2d7.
The commit was insufficiently tested and causes failures.
Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
This commit adds the logic for segmentation map initialization and
disable temporal update of segmentation map when error-resilient
mode is on. It fixes the enc/dec mistmates (release build) and
assertions(debug) when both aq-mode and error-resilient are on.
Change-Id: Id2155e8b28962cf1f64494f4df0c8d79499b6890
Prior to this patch, read_inter_block_mode_info() would
find the nearmv and nearestmv for all modes. Now it does not
search for ZEROMV modes and breaks out early for NEARMV and
NEWMV modes.
Change-Id: Ifa7b1eaf58bb03b9c7792ea5012fef477527d0fd
There are flaws in current implementation of VP8 multithreading encoder
and decoder as reported in the following issue:
https://code.google.com/p/chromium/issues/detail?id=158922
Although the data race warnings are harmless, and wouldn't cause real
problems while encoding and decoding videos, it is better to fix the
warnings so that VP8 code could pass the TSan test.
To synchronize the thread-shared data access and maintain the speed
(i.e. decoding speed), use multiple mutexes based on mb_rows to reduce
the number of synchronizations needed, make the reads and writes of
the shared data protected, and reduce the number of mb_col writes by
nsync times.
The decoder speed tests showed < 3% speed loss while using 2 ~ 4
threads.
Change-Id: Ie296defffcd86a693188b668270d811964227882
The nominal tx_type for a given mode is used as a context
to encode the actual tx_type for intra.
Results:
derflr: -0.241% BDRATE
hevcmr: -0.366% BDRATE
Change-Id: Icfe7b0a58d79bc6497a06e3441779afec6e01e21
This commit enables encoder to avoid 8x4 and 4x8 partitions for
scaled reference frames when libvpx is configured and built with
--enable-better-hw-compatibility
Change-Id: I02ad65c386f5855f4325d72570c49164ed52f413
Move the logic for forcing zero_mode after the
(ref_frame & flag_list) check.
This was causing an memory leak under msan:
https://bugs.chromium.org/p/webrtc/issues/detail?id=5402
Change-Id: Ie9d243369f8ed7c332f46178275945331da4fd85
Under --enable-better-hw-compabibility, this commit adds the asserts
that no mv clamping is applied for scaled references, so when built
with this configure option, decoder will assert if an input bitstream
triggger mv clamping for scaled reference frames.
Change-Id: I786e86a2bbbfb5bc2d2b706a31b0ffa8fe2eb0cb
This commit adds a new configure option:
--enable-better-hw-compatibility
The purpose of the configure option is to provide information on known
hardware decoder implementation bugs, so encoder implementers may
choose to implement their encoders in a way to avoid triggering these
decoder bugs.
The WebM team were made aware of that a number of hardware decoders
have trouble in handling the combination of scaled frame reference
frame and 8x4 or 4x8 partitions. This commit added asserts to vp9
decoder, so when built with above configure option, the decoder can
assert if an input bitstream triggers such decoder bug.
Change-Id: I386204cfa80ed16b50ebde57f886121ed76200bf
Add function to compute skin map for a given block, as its
used in several places (cyclic refresh, noise estimation, and denoising).
Change-Id: Ied622908df43b6927f7fafc6c019d1867f2a24eb
Set initial values for these parameters in the vp9_init_layer_context().
This also fixes an issue in the svc-bypass mode when frame flags are
passed via the vpx_codec_encode().
Change-Id: I0968f04672f8d3d2fe2cea6b8a23f79f80d7a8b1
Otherwise, per-segment lossless might mean that some segments are not
lossless and they could still want to use another mode. The per-block
tx points remain uncoded on blocks where (per the segment id) the Q
value implies lossless.
Change-Id: If210206ab1fe3dd11976797370c77f961f13dfa0
For coding block sizes <=16X16, if the block is determined to be skin,
then always allow for that block to be candidate for refresh. So if that
block happens to be on the boost segment(s), segment won't get reset to 0
and delta-q will be applied.
PSNR/SSIM metrics neutral (little/no change) on RTC clips.
Speed increase small/negligible (< 1%).
Some visual improvement on faces in a few RTC clips.
Change-Id: I6bf0fce6f39d820b491ce05d7c017ad168fce7d6
arm-none-linux-gnueabi- is an anachronism and makes building on native
arm platforms more difficult. further, many distros include alternative
cross compilers, e.g., arm-linux-gnueabihf-, so the choice is best left
up to the user.
Change-Id: Id8aaf820ed112b85db2b8518d0e9d8abee1ad85c
avoids picking up defaults if CROSS is forcibly set empty as in:
$ CROSS= ./configure ...
BUG=1121
Change-Id: I6af91959288dede01efe3e5945698ab249eb6ec3
reduce the register count by 1 to avoid xmm6 and unnecessarily
penalizing the other users of the base macro
Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
In 32-bit build with --enable-shared, there is a lot of
register pressure and register src_strideq is reused.
The code needs to use the stack based version of src_stride,
but this doesn't compile when used in an lea instruction.
This patch also fixes a related segmentation fault caused by the
implementation using src_strideq even though it has been
reused.
This patch also fixes the HBD subpel variance tests that fail
when compiled without disable-optimizations.
These failures were caused by local variables in the assembler
routines colliding with the caller's stack frame.
Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
H/V intra mode was only enabled for bsize < 16x16,
enable it also for bsize=16x16.
Metrics are neutral with this change:
Overall very small gain (0.1%), small visual gain on some RTC clips.
Change-Id: Ib2d7a44382433bfc11cf324aa3cc5c382ea9e088
For testing implemented a fixed pattern and delta, 1 pass,
fixed Q, low delay mode.
This has not in any way been tuned or optimized.
Change-Id: Icf9b57c3bb16cc5c0726d5229009212af36eb6d9
(copied from VP9)
The one pass VBR mode selects a Q range based on a
moving average of recent Q values. This calculation
should have been excluding arf overlay frames as these
are usually coded at the highest allowed value. Their
inclusion skews the average and can cause it to drift
upwards even when the clip as a whole is undershooting.
As such it can undermine correct adaptation of the allowed
Q range especially for easy content.
Change-Id: I9e12da84e12917e836b6e53ca4dfe4f150b9efb1
For testing implemented a fixed pattern and delta, 1 pass,
fixed Q, low delay mode.
This has not in any way been tuned or optimized.
Change-Id: Idf5ee179b277fa15d07a97f14f2ce5bbaae80a04
The one pass VBR mode selects a Q range based on a
moving average of recent Q values. This calculation
should have been excluding arf overlay frames as these
are usually coded at the highest allowed value. Their
inclusion skews the average and can cause it to drift
upwards even when the clip as a whole is undershooting.
As such it can undermine correct adaptation of the allowed
Q range especially for easy content.
Change-Id: I7d10fe4227262376aa2dc2a7aec0f1fd82bf11f9
The culprit is on the decode side xd->lossless[i] setup was in wrong
location where segment features are not yet decoded.
Also on the encoder side, transform mode was not set consistently
between when tx_mode is selected and how tx_mode is enforced in
tx size selection.
Change-Id: I4c4c32188fda7530cadab9b46d4201f33f7ceca3
Keep track of frame indexes for the references, and
constrain inter mode search for reference with same
temporal alignment.
Improves speed by about ~15%, no noticeable loss in
compression performance.
Change-Id: I5c407a8acca921234060c4fcef4afd7d734201c8
Lower the threshold for splitting 32x32->16x16 based on average variance,
and add lower bound condition for this split to occur. This prevents
unneccassry splitting for areas with very low variance.
Change-Id: Ibeb33b3d993632c2019f296eb87ef3b7e3568189
For non-rd variannce partition, speed >= 5:
Adjustments to reduce dragging artifcat of background area near
slow moving boundary.
-Decrease base threshold under low source noise conditions.
-Add condition to split 64x64/32x32 based on average variances
of lower level blocks.
PSNR/SSIM metrics go down ~0.7/0.9% on average on RTC set.
Visually helps to reduce dragging artifact on some rtc clips.
Change-Id: If1f0a1aef1ddacd67464520ca070e167abf82fac
Reallocate the xmm register usage so that no ARCH_X86_64 required.
Reduce memory access to the left neighbor by half.
Speed up by single digit on big core machine.
Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
This commit makes the sub8x8 block rate-distortion optimization
scheme use precise motion compensated prediction to compute the rd
cost. It fixes a potential buffer overflow issue related to sub8x8
motion search on scaled reference frame.
Change-Id: I4274992ef4f54eaacfde60db045e269c13aaa2de
GET_GOT modifies the stack pointer so the offset for left's address will
be wrong if loaded afterword.
Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by single digit in ./test_intra_pred_speed on big core
machines.
Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
Xcode 7 refuses to link to x86 and x86_64 code that's built for
iphone sim, so add an extra command line flag that forces iosbuild
to use darwin15 targets.
Change-Id: I2228d458f5cccf4d26866040380a974f88d9d360
This commit enables the new temporal filter system for VP9. For
speed 1, it improves the compression performance:
derf 0.54%
stdhd 1.62%
Change-Id: I041760044def943e464345223790d4efad70b91e
This change has been imported from VP9 and
alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most natural video clips, however, where the step search
is performing well, the quality gain and speed impact are small.
Change-Id: Iac24152ae239f42a246f39ee5f00fe62d193cb98
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.
Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
Fix copied over from VP9 master to VP10 master.
Do not reset the alt ref active flag when overlaying the middle
arf(s) of a multi arf group.
Change-Id: I1b7392107e7c675640d5ee1624012f39cc374c58
use CONFIG_VP[89] to protect white-box tests and drop redundant
uses of CONFIG_VP9 in variable assignments within that block
Change-Id: Id3c6cf5c7822aa161b19768b295f58829a1c6447
For non-rd variance partition: Adjust variance threhsold based
on noise level estimate. This change allows the adjustment to be
updated more frequently.
Change-Id: Ie2abf63bf3f1ee54d0bc4ff497298801fdb92b0d
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
The range_check is not used because the bit range
in fdct# is not correct. Since we are going to merge in a new version
of fdct# from nextgenv2, we won't fix the incorrect bit range now.
Change-Id: I54f27a6507f27bf475af302b4dbedc71c5385118
For low resolutions, whem 4x4downsample is used for variance,
use the same force split (that is used for 8x8downsample) for 16x16 blocks.
No change in metrics. Small improvement visually.
Change-Id: I915b9895902d0b9a41e75d37fee1bf3714d2366d
the loop filter level is transmitted as 6-bits + sign so needs to be clamped in
the delta + absolute case.
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1224363
Change-Id: Icbdca4fdbf043466429bd5c9d59dbe913bf153bc
the quantizer is transmitted as 7-bits + sign so needs to be clamped in
the delta + absolute case.
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1224361
Change-Id: I9115f5d1d5cf7e0a1d149d79486d9d17de9b9639
This is so we may update level at any time (e.g., to be used
for setting thresholds in variance-based partition).
Change-Id: I32caad2271b8e03017a531f9ea456a6dbb9d49c7
Under certain denoising conditons, check for re-evaluation of
zero_last mode if best mode was golden reference.
Change-Id: Ic6cdfd175eef2f7d68606300c7173ab6654b3f6e
Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed
with the same instruction size.
Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
For non-rd variance partition: only allow minmax computation
(which currently has no arm-neon optimization) for speeds < 8.
Performance loss is small: On RTC set with speed 8, few clips lose ~2/3%,
average loss is < 1%.
Change-Id: Ia9414f4d0b77dc83c3e73ca8de5d903f64b425ce
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
Change initial state of noise level, and only update
denoiser with noise level when estimate is done.
Change-Id: If44090d29949d3e4927e855d88241634cdb395dc
For denoising, and for noise level above threshold, re-evaluate
ZEROMV for mode selection after denoising.
Current change only does this check if selected best mode (before denoising)
was intra.
Change-Id: I4b1435b68d26c78f7597b995ee7bff0ddd5f9511
Always round sum error and sum square error toward zero in variance
calculations. This prevents variance from becoming negative.
Avoiding rounding variance at all might be better but would be far
more invasive.
Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
This change makes sure last reference with zero mv
is always checked for mode selection.
No change in metrics.
Change-Id: Iaf01877bf34272b966c78bfe18daad882a0a419e
the final sum may use up to 26 bits
+ add a unit test
+ disable the sse2 as the result will rollover; this will be fixed in a
future commit
Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
Change on affects 1 pass CBR.
On key frame, temporal layer_id is reset to 0 for 1 pass CBR,
but since "layer" is reset, the svc.layer_context[layer].is_key_frame
was not correspondingly set properly.
Change-Id: I08f6da0a55ac7429ccfbaddfb7be14479e43543b
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
--disable-XXX has the effect of disabling all extensions above it, e.g.,
--disable-ssse3 disables ssse3-avx2.
Change-Id: If02b44ca71ee12e4acb12010db8593a7989f2a9d
Small changes to the best quality default speed trade off.
Some speedup settings are worth while even for best quality as they
have only a very small impact on quality but a significant impact on
encode time.
These changes give as much as a further 50-60% increase in encode
speed for my test animations clip with minimal impact on quality.
For this sequence these changes improve the best quality encode speed
to about the same level as good quality speed 0 in Q3 2015 whilst
retaining the large quality gain of over 1 db
For many natural videos though the quality difference from good 0
to best is much smaller.
Change-Id: I28b3840009d77e129817a78a7c41e29cb03e1132
This is simpler than the previous scheme, which tried to allocate
the CRITICAL_SECTION struct in a thread-safe manner before it
could use it to run the wrapped function in a thread-safe manner.
Change-Id: I172e5544e5f16403a3a0e5e2b9104b1292a0d786
This change alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.
Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
This reverts commit 380a5519cc.
This causes an assertion failure in debug_check_frame_counts() which
probably isn't valid with this change; leaving the investigation for
later now.
Change-Id: Ieda5ca811ed2fa50a0cc6935919a8d10dca996e0
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
the return value of enabled, which may be empty, is handled by the for
loop. this avoids making an unnecessarily long command line which may
fail in certain cases.
Change-Id: Ib88ecbbe2c0f6d7debb600b4caed4884497263b1
Change is only for real-time mode, speed >= 5, and non-screen content mode.
Add bias to zero/low motion for big blocks, if noise estimation
is enabled and noise level is above threshold.
Change-Id: I3a0a4608ede6aa535bda6eca528d20f8aba738e7
For 1 pass CBR mode: increase waiting time after key frame
before we start sampling rate control behavior for determining
resize. This change need to disable one internal resize(DownUp)
temporally since it requires a longer clip to do so.
Change-Id: If21beda1be23f169ee541ab4dd642f718347887a
Use same setting for speed 5 (as it is for speed > 5).
Change is only for real-time (non-rd) mode.
Change-Id: I830250eac654328373cb318baa89d4f0e63942e1
Reduces Linux perf estimated cycle count for pack_mb_tokens on a
lossless encode on my desktop from 61858501855 to 48154040219 or from
26% of the overall profile to 21%.
Change-Id: I9ca3426d7e3272bc7f7030abda4f0d0cec87fb4a
This reverts commit f1342a7b07.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
Add threshold/condition on spatial_variance and brightness level.
Modification to normalization of block variance.
Change resolution limit below which we disable noise estimation.
Change-Id: If5be08a26ceda351242d8a58d2f0bc88c0a918f0
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
Change is only for real-time mode, speed > 5, and non-screen content mode.
Bias is based on block size and motion vector level (motion above some threshold).
Helps to improves stability in background from lightning changes.
PSNR/SSIM metrics on RTC set almost no change/neutral (within +/- 0.1).
Change-Id: I7eac13c1ae10be4ab1f40acc7f9f1df5653ece9d
Only use non-zero threshold(s) for breakout if
the motion level of the current tested mode is low.
Change-Id: I22aae961cc42371b49d3f648560181cc54708502
Source noise level estimate is also useful for
setting variance encoder parameters (variance thresholds,
qp-delta, mode selection, etc), so allow it to be used also
if denoising is not on.
Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
(cherry picked from commit ca163b85bb)
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
Change-Id: I587c44dd61c1f3767543c0126376f881889935af
Width and height of downscaling resolution should not be lower
than min_width and min_height which can be set as needed, both
are 180 for now.
Change-Id: I34d06704ea51affbdd814246e22ee8d41d991f00
This reverts commit 7f56cb2978.
It causes uninitialized reads in the first pass setting up later cost tables.
Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9
Remove delta index 254 from probability remapping and subexp coding.
Saves 1-bit when the delta index is 129.
Change-Id: I88aba565fc766b1769165be458d2efd3ce45817e
Adjust variance threshold, delta-qp, and intra penalty cost,
based on estimated noise level in source.
Replace denoising_on with a level value=L/M/H.
Change-Id: I0c017dae75a5d897367d2c42dec26f2f37e447c1
The option exists specifically to allow for configurations
where the build environment is different from the configure
environment.
Change-Id: I95196fa3c49700251d10ff5d256dc7380e39d0c4
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://code.google.com/p/webm/issues/detail?id=1089
Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572
The custom LCG is based on the POSIX recommend constants for a 16-bit
rand(). This implementation uses less computation than typical standard
library procedures which have been extended for 32-bit support, is
guaranteed to be reentrant, and identical everywhere.
Change-Id: I3140bbd566f44ab820d131c584a5d4ec6134c5a0
Ref: http://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html
Bug relating to issue:- http://b/25090786
base_frame_target is supposed to track the idealized bit
allocation based on error score and not the actual bits
allocated to each frame.
The clamping of this value based on the VBR min and max pct values
was causing a bug where in some cases the loop that adjusts the
active max quantizer for each GF group was running out of bits at
the end of a KF group. This caused a spike in Q and some ugly artifacts.
A second change makes sure that the calculation of the active
Q range for a group DOES, however, take account of clamping.
Change-Id: I31035e97d18853530b0874b433c1da7703f607d1
Periodically estiamte noise level in source, and only denoise
if estimated noise level is above threshold.
Change-Id: I54f967b3003b0c14d0b1d3dc83cb82ce8cc2d381
Add the row and column index to the argument list of unit functions
called by foreach_transformed_block wrapper. This avoids the
repeated internal parsing according to the block index.
Change-Id: Ie7508acdac0b498487564639bc5cc6378a8a0df7
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.
Further 2 optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
This causes the output of find_ref_mvs() to always be unique or zero.
A nice side-effect of this is that it also causes the output of
find_ref_mvs_sub8x8() to be unique-or-zero, and it will not ignore
available candidate MVs under certain conditions.
See issue 1012.
Change-Id: If4792789cb7885dbc9db420001d95f9b91b63bfa
Added optimization of the 8 bit assembly quantizer routines. This makes
these functions up to 100% faster, depending on encoding parameters.
This patch maskes the encoder faster in both the high bitdepth and 8bit
configurations. In the high bitdepth configuration, it effects profile 0
only.
Based on my profiling using 1080p input the net gain is between 1-3% for
the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
depending on target bitrate. The difference between the 8 bit and high
bitdepth configurations for the same encoder run is reduced by 1% in all
cases I have profiled.
Change-Id: I86714a6b7364da20cd468cd784247009663a5140
VP8E_UPD_ENTROPY, VP8E_UPD_REFERENCE and VP8E_USE_REFERENCE have been
deprecated since the initial public release
Change-Id: Ied16b441eec13434d85f1ab115d49ccaf5f2f7b0
Some more testing of this patch would probably be useful, but I
think the basics of it should work fine now.
See issue 1035.
Change-Id: I4a36d58f671c5391cb09d564581784a00ed26245
This experiment allows using full above/right edges for all transform
sizes whenever available (for d45/d63), and adds bottom/left edges for
d207.
See issue 1043.
Change-Id: I5cf7f345e783e8539bb6b6d2c9972fb1d6d0a78b
In VP9, the ref MV had to point to a block that itself fully resided
within the visible image, i.e. all borders of the image had to be
within the visible borders of the coded frame. This is somewhat
illogical, and had obscure side effects, e.g. clamping of fairly
reasonable motion vectors such as 0,0 were clipped to negative values
if the block was overhanging on frame edges (such as the last rows
on 1080p content), which makes no sense whatsoever.
Instead, relax clamping constraints such that the ref MVs are allowed
to point to blocks exactly outside the visible edges in both Y as well
as UV planes, including the 8tap filter edges (that's why the offset is
8 pixels + block size).
See issue 1037.
Change-Id: I2683eb2a18b24955e4dcce36c2940aa2ba3a1061
This has various benefits:
- simplify implementations because we don't have to switch between
multiple probability tables depending on frametype
- allows fw subexp and bw adaptivity for partitions/uvmode in keyframes
See issue 1040 point 5.
Change-Id: Ia566aa2863252d130cee9deedcf123bb2a0d3765
Locate them (code-wise) in frame_context, and have them be updated
as any other probability using the subexp forward and adaptive bw
updates.
See issue 1040 point 1.
TODOs:
- real-world default probabilities
- why is counts sometimes NULL in the decoder? Does that mean bw
adaptivity updates only work on some frames? (I haven't looked
very closely yet, maybe this is a red herring.)
Change-Id: I23b57b4e5e7574b75f16eb64823b29c22fbab42e
Account for rounding in distortion calculation in k-means;
carry out rounding before duplicates removal of base colors;
replace numbers with macros;
use prefix increment.
Slight coding gain (<0.1%) on screen_content testset.
Change-Id: Ie8bd241266da6b82c7b2874befc3a0c72b4fcd8c
Adjust the qp threshold and consec_zeromv threshold for
limiting cyclic refresh. Also increase the refresh period
when the limit amount is significant, and some code-cleanup.
Small gain in PSNR/SSIM metrics: ~0.25/0.3 gain on RTC set, speed 7.
Change only affects non-screen content.
Change-Id: I1ced87a89a132684c071e722616e445b2d18236a
Adjust the qp threshold based on the denoising setting; not allow
to scale directly from original resolution to one half and vise versa.
Change-Id: I032a9b22f8e1c88de6bb81cf8351367223a3e40d
For the re-encoding (at max-qp) on the detected high-content change:
update rate correction factor, reset rate over/under-shoot flags,
and update/reset the rate control for layered coding.
Change-Id: I5dc72bb235427344dc87b5235f2b0f31704a034a
Changes to the breakout behavior for partition selection.
The biggest impact is on speed 0 where encode speed in
some cases more than doubles with typically less than 1%
impact on quality.
Speed 0 encode speed impact examples
Animation test clip: +128%
Park Joy: +59%
Old town Cross: + 109%
Change-Id: I222720657e56cede1b2a5539096f788ffb2df3a1
This change (in a new config experiment: universal_hp) removes the
bitstream parsing dependency of the HP MV bit on the ref MV to be
coded. It also cleans up clearing of the HP bit in near/nearestMV,
since HP is always on if it's set in the frame header.
This admittedly doesn't clean up the crap that could be cleaned up,
but that's mostly because I think this needs some careful review;
not so much for coding style, but more from hardware people and from
the codec team on what we/you want. It would also be nice to get some
actual numbers on the real quality impact of this change. If, for
example, hardware people come up and tell us they don't actually care
anymore, we should probably just this code as-is and do nothing (i.e.
discard this patch).
See issue 1036.
Change-Id: Ic9b106f34422aa0f79de0c28125b72d566bd511a
This actually has no effect whatsoever, since the input MVs themselves
are clamped by clamp_mv_ref() already, which is significantly more
restrictive in its bounds.
Change-Id: I4a3a7b2b121ee422c56428c2a12d930c3813c06e
We only write EOSB tokens if we write tokens (i.e. not for skip blocks),
and we write EOSB tokens per-plane instead of per block.
Change-Id: I8d7ee99f8ec50eb7ae809f9f9282c1c91dbf6537
Add palette mode for keyframe luma channel. Palette mode is enabled
when using "--tune-content=screen" in encoding config parameters.
on screen_content testset: +6.89%
on derlr : +0.00%
Design doc (WIP):
https://goo.gl/lD4yJw
Change-Id: Ib368b216bfd3ea21c6c27436934ad87afdaa6f88
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
The artifact occurs periodically when VP9 denoiser is on and
refresh_golden_frame happen. When refresh_golden_frame happen,
we should copy the frame buffer instead of swapping the pointers.
Change-Id: Ib3204c4b04db28ecf439c6d9e61f3d146f04196d
this reduces the number of synchronizations in decode_tiles_mt() and
improves overall performance when the number of threads is less than the
number of tiles
Change-Id: Iaee6082673dc187ffe0e3d91a701d1e470c62924
Small code cleanup. consec_zeromv refresh threshold
does not need to be computed for every super-block.
No change in behavior.
Change-Id: I8c4b1b28072f42b01d917fff6d1f62722f1e1554
The serial decode check is too strict for tile-threaded decoding as
there is no guarantee on the decode order nor which specific error
will take precedence. Currently a tile-level error is not forwarded so
the frame will simply be marked corrupt.
Change-Id: I51cf1e39e44bedeac93746154b36a4ccb2f059b1
Use the existing VP9_SET_SVC control to set the
first spatial layer to encode.
Since we loop over all spatial layers inside the encoder, the
setting of spatial_layer_id via VP9_SET_SVC has no relevance.
Use it instead to set the first_spatial_layer_to_encode,
which allows an application to skip encoding lower layer(s).
Change only affects the 1 pass CBR SVC.
Change-Id: I5d63ab713c3e250fdf42c637f38d5ec8f60cd1fb
When configured with high bit detpth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSSE3 optimisations.
Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5
We have historically added new bits to cat6 whenever we added a new
transform size (or bitdepth, for that matter). However, we have
always coded these new bits regardless of the actual transform size,
which means that for smaller transforms, we code bits that cannot
possibly be set. The coding (quality) impact of this is negligible,
but the bigger issue is that this allows creating bitstreams with
coefficient values that are nonsensible and can cause int overflows,
which then de facto become part of the bitstream spec. By not coding
these bits, we remove this possibility.
See issue 1065.
Change-Id: Ib3186eca2df6a7a15ddc60c8b55af182aadd964d
This is identical to what the tile size does for the last tile. See
issue 1042 (which covers generalizing the superframe/tile concepts).
Change-Id: I1f187d2e3b984e424e3b6d79201b8723069e1a50
See issue 1051. 6 bits is fairly arbitrary but at least allows writing
delta Q values that are fairly normal in other codecs. I can extend to
8 if people want full range, although I personally don't have any need
for that.
Change-Id: I0a5a7c3d9b8eb3de4418430ab0e925d4a08cd7a0
The resolution check fixs the issue which resets resize_pending
unnecessarily and causes not-bitexact with previous one-step version.
Change-Id: I4e7660b3c8f34f59781e2e61ca30d61080c322de
When configured with high bit detpth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSE2 optimisation
(but not the SSSE3 optimisation).
Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9
Temporary fix to denoiser when dynamic resizing is on.
-Reallocate denoiser buffers on resized frame.
-Force golden update on resized frame.
-Don't denoise resized frame, and copy source into denoised buffers.
Change-Id: Ife7638173b76a1c49eac7da4f2a30c9c1f4e2000
For screen-content mode, with frame dropper off, put a limit
on how low encoder buffer can go.
Under hard slide changes, the buffer level can go too low and then
take long time to come back up (in particular when frame-dropping
is not used), which will affect the active_worst and target frame size.
Change-Id: Ie9fca097e05cd71141f978ec687f852daf9de332
Dynamic resizing now support two-steps scaling: first go down to
3/4 and then 1/2. This feature is under a flag which controls the
switch between two-steps scaling and one-step scaling (1/2 only).
Change-Id: I3a6c1d3d5668cf8e016a0a02aeca737565604a0f
This is more a proof of concept than anything else. The problem here
isn't so much how to code it, but rather where to place the resulting
code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10
specific intra pred functions to live there, or in vp10/?
See issue 1015.
Change-Id: I675f7badcc8e18fd99a9553910ecf3ddf81f0a05
The x86 simd expects this. Identical alignment can be found in vp9
and vp10 also. Fixes crashes on 32bit x86 systems.
Change-Id: I229c88d8f696acbef5337c8fa9503528df4e1c40
I've added a few new functions (d45e, d63e, he, ve) to cover the
filtered h/v 4x4 predictors that are vp8-specific, the "correct"
d45 with the correctly filtered bottom-right pixel (as opposed to
the unfiltered version in vp9), and the "broken" d63 with weirdly
filtered bottom-right pixels (which is correctly filtered in vp9).
There may be a minor performance impact on all systems because we
have to do an extra copy of the Above pixel array to incorporate
the topleft pixel in the same array (thus fitting the vpx_dsp API).
In addition, armv6 will have a more serious performance impact b/c
I removed the armv6/vp8-specific assembly. I'm not sure anyone
cares...
Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
define NOMINMAX to allow the std:: versions to be used; min/max will be
defined transitively via windows.h otherwise
Change-Id: I692b03fa3e70b7a53962d3fd209498f70f712fed
vp9_filter_block_plane_ss11() and vp9_filter_block_plane_non420()
are only called for the uv planes.
Change-Id: Iacd3b3242c8ce581edd37c8f06d95efc8a0f88a3
The loopfilter masks are now built in the decode loop.
This is done so we can eventually reduce the number of
MODE_INFO structs required by the decoder.
The encoder builds the masks for the entire frame prior
to calling the loopfilter.
Change-Id: Ia2146b07e0acb8c50203e586dfae0c4c5b316f11
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.
Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
Update rate correction factor when we drop the frame due to overshoot.
Only affects when the drop_overshoot feature is on: screen_content_mode = 2.
Change-Id: I67e24de979b4c74744151d2ceb3cd75fec2a1e7a
In the decoder, map this to the output variable vpx_image_t.r_w/h.
This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
which doesn't work with parallel frame decoding. In the encoder,
map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
a w/h pair argument in a int[2] (identical to VP9D_GET_DISPLAY_SIZE).
Also add render_size to the encoder_param_get_to_decoder unit test.
See issue 1030.
Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8
The name "display_*" (or "d_*") is used for non-compatible information
(that is, the cropped frame dimensions in pixels, as opposed to the
intended screen rendering surface size). Therefore, continuing to use
display_* would be confusing to end users. Instead, rename the field
to render_*, so that struct vpx_image can include it.
Change-Id: Iab8d2eae96492b71c4ea60c4bce8121cb2a1fe2d
In practice, this fixes the issue that if you have an odd number of
mi_cols, on the full right of the image, the UV int4x4 loopfilter
will be skipped over odd cols as well as odd rows (because it holds a
single variable for both edges).
See issue 1016.
Change-Id: Id53b501cbff9323a8239ed4775ae01fe91874b7e
Use the existing QP condition on limiting cyclic refresh, and add
addiitonal condition that block has been encoded with zero/small motion
x frames in row (where x is at least several times the refresh period).
Additional condition only affect non-screen content mode.
This helps to improve visual stability for noisy input, where on steady
background areas the application of delta_qp may lead to encoding the noise.
Also added a change to use the true skip (after encoding) to update the
last QP.
Change-Id: I234a1128d017d284cf767fdb58ef6c59d809f679
When the iOS SDK major version is 9 or higher:
- Pass -fembed-bitcode to compiler, assembler, and linker.
- Add a warning for simulator targets since yasm doesn't know
what -fembed-bitcode means, and exits with an error.
BUG=https://code.google.com/p/webm/issues/detail?id=1075
Change-Id: I38c997a0225e53c5dd1b4ddf7935d21362953f76
Always add IOS_VERSION_MIN to darwin arm cflags. The warning occured
because the default (9.0) does not match the value set by configure
(6.0).
BUG=https://code.google.com/p/webm/issues/detail?id=1075
Change-Id: Ia9085ceeca10e057f9eb781c14f07581bb6280a5
- Use the iphoneos SDK path (instead of macosx).
- Detect iOS SDK major version and disable media (armv6) when using
iOS SDK version 9 or higher.
BUG=https://code.google.com/p/webm/issues/detail?id=1075
Change-Id: I12f77dbeee4c0084e8322f6841813da8b5e91c16
these have been supported in tile-threaded decoding since:
b3b7645 vp9_dthread: remove frame_parallel_decoding_mode requirement
Change-Id: Ia5a752db9be937153cf4830d9258752136356d1b
Limit transform size for intra to 16x16, for non-screen content mode.
Little/no change in speed or metrics.
32x32 intra block is rarley selected in RTC (non-screen content) case,
but some visual improvement can be seen in some example,
e.g., captured_video_dark_whd.yuv.
Change-Id: I68e2db87875343b3fb9bb407a7709f0088f84072
remove static from fdct4/8/16/32 in vp10/encoder/dct.c
add prefix vp10_ to fdct4/8/16/32
add vp10/encoder/dct.h
Change-Id: I644827a191c1a7761850ec0b1da705638b618c66
Reallocation of mi buffer fails if change size on the first frame and
change config in subsequent frames. Add a condition for resolution
check to avoid assertion failure.
BUG=1074
Change-Id: Ie26ed816a57fa871ba27a72db9805baaaeaba9f3
Reference frame masking logic may skip checking zeromv-last mode.
Fix to avoid this and make sure zero-last is always checked.
No noticeable change in speed, and PSNR/SSIM metrics on RTC set overall
neutral (very small gain ~0.02).
Small visual improvement on few RTC clips.
Change-Id: I26eacdc449126424001a4a64e5ac31949f064417
the range check in dct.c (abs(input[i]) < (1 << bit)) will fail in many
cases. this was broken at the time this check was added
BUG=1076
Change-Id: I3df8c7a555e95567d73ac16acda997096ab8d6e2
the range check in dct.c (abs(input[i]) < (1 << bit)) will fail in the
25-29 range. this was broken at the time this check was added
Change-Id: I8ca9607f6cbdc8be7f47696ffeabbab3ac5727e2
Shortcut arg for --extra-configure-args --enable-examples. Enables
the examples, and thus ensures that all versions of libvpx that
iosbuild.sh produces can actually be linked.
Change-Id: I2ddda094361bf0ac77f8d2ae542e4dc7b2cab158
fixes build on windows x64; previously 'heightq' i.e., the 64-bit register
was accessed when only the 32-bit value was needed. given this is from a
stack variable the upper bits were undefined.
+ bump register/xmm counts; users of SETUP_LOCAL_VARS touch xmm13 in
64-bit builds and filter_block1d16_v* uses one extra temp variable
Change-Id: I9c768c0b2047481d1d3b11c2e16b2f8de6eb0d80
This commit removes mbmi_ext_base pointer from MACROBLOCK struct.
Its use case can be fully covered by cpi->mbmi_ext_base pointer.
Change-Id: I155351609336cf5b6145ed13c21b105052727f30
Add SVC codec control to set the frame flags and buffer indices
for each spatial layer of the current (super)frame to be encoded.
This allows the application to set (and change on the fly) the
reference frame configuration for spatial layers.
Added an example layer pattern (spatial and temporal layers)
in vp9_spatial_svc_encoder for the bypass_mode using new control.
Change-Id: I05f941897cae13fb9275b939d11f93941cb73bee
This means that we don't reconstruct in 4x4 dimensions, but in
blocksize dimensions, e.g. 4x8 or 8x4. This may in some cases lead
to performance improvements. Also, if we decide to re-introduce
scalable coding support, this would fix the fact that you need to
re-scale the MV halfway the block in sub8x8 non-4x4 blocks.
See issue 1013.
Change-Id: If39c890cad20dff96635720d8c75b910cafac495
In vp9, the bottom MV would be the average of the topright and
bottomleft luma MV (instead of the bottomleft/bottomright luma MV).
See issue 993.
Change-Id: Ic91c0b195950e7b32fc26c84c04788a09321e391
This has virtually no effect on coding efficiency, but it is more
logical from a theoretical perspective (since it makes no sense to
me that you would exclude a MV from a list just because it's sign-
inversed value is identical to a value already in a list), and it
also makes the code simpler (it removes a duplicate value check in
cases where signbias is equal between the two MVs being compared).
See issue 662.
Change-Id: I23e607c6de150b9f11d1372fb2868b813c322d37
For reading, this makes the operation branchless, although it still
requires two shifts. For writing, this makes the operation as fast
as writing an unsigned value, branchlessly. This is also how other
codecs typically code signed, non-arithcoded bitstream elements.
See issue 1039.
Change-Id: I6a8182cc88a16842fb431688c38f6b52d7f24ead
The implicitly changed value would be used for contextualizing future
skip flags of neighbour blocks (bottom/right), which is certainly not
what was intended. The original code stems from vp8, and was useful
in cases where coding of the skip flag was disabled. In vp9, the skip
flag is always coded. The result of this change is that for bitstream
parsing purposes, decoding of the skip flag becomes independent of
decoding of block coefficients.
See issue 1014.
Change-Id: I8629e6abe76f7c1d649f28cd6fe22a675ce4a15d
In decoder, export (eventually) into vpx_image_t.range field. In
encoder, use oxcf->color_range to set it (same way as for
color_space).
See issue 1059.
Change-Id: Ieabbb2a785fa58cc4044bd54eee66f328f3906ce
Verify the dynamic resizer behavior for real time, 1 pass CBR mode.
Start at low target bitrate, raise the bitrate in the middle of the
clip, verify that scaling-up does occur after bitrate changed.
Change-Id: I7ad8c9a4c8288387d897dd6bdda592f142d8870c
Verify the dynamic resizer behavior for real time, 1 pass CBR mode.
Run at low bitrate, with resize_allowed = 1, and verify that we get
one resize down event.
Change-Id: Ic347be60972fa87f7d68310da2a055679788929d
For 1 pass CBR spatial-SVC:
Add cyclic refresh parameters to the svc-layer context.
This allows cyclic refresh (aq-mode=3) to be applied to
the whole super-frame (all spatial layers).
This gives a performance improvement for spatial layer encoding.
Addd the aq_mode mode on/off setting as command line option.
Change-Id: Ib9c3b5ba3cb7851bfb8c37d4f911664bef38e165
Fixes temporal scalability. Updates were inadvertently turned
off for two pass svc causing crashes due to gf_group.index
growing unchecked.
Change-Id: Iff759946bf61bbde70630347cc8fa4d51a8c2d2f
The counts didn't take usehp into account, which means that if the
scope of the refmv is too large for the hp bit to be coded, the value
(always 1) is still included in the stats. Therefore, the final
counts will not reflect the entropy of the coded bits, but rather the
entropy of the combination of coded bits and the implied value (which
is always 1). Fix that by only including counts if the hp bit is
actually coded.
See issue 1060.
Change-Id: I19a3adda4a8662a05f08a9e58d7e56ff979be11e
The normative (convolve8) filter is optimized/faster than
the nonnormative one. Pass usage of scaler (normative/nonomorative)
to vp9_scale_if_required(), and always use normative one for 1 pass.
Change-Id: I2b71d9ff18b3c7499b058d1325a9554de993dd52
Unify the style of fdct4() fdct8() fdct16()
Add fdct32()
Add range_check() at each stage
Add unit test at ../../test/vp10_dct_test.cc
Change-Id: I13f76d9046c3ea473c82024b09a5bc8662e2c28e
Upstream hash: 476366249e1fda7710a389cd41c57db42305e0d4
Changes from upstream since last update:
4763662 mkvparser: fix type warnings
267f71c mkvparser: SafeArrayAlloc fix type warning
f1a99d5 mkvparser: s/LONG_LONG_MAX/LLONG_MAX/ for compatibility
bff1aa5 mkvparser: add msvc compatibility for isnan/isinf
Change-Id: Ie0375e564fc74b3b296744d0039830d2f77b83b6
See issue 1030. The value of frame_parallel_decoding_mode was ignored
in vp9 if refresh_frame_context was 0, so instead make it a 3-member
enum where the dependency is obviously stated.
Change-Id: I37f0177e5759f54e2e6cc6217023d5681de92438
In vp9, [0] and [1] had identical meaning, so merge them into a
single value. Make it impossible to code RESET_FRAME_CONTEXT_NONE
for intra_only frames, since that is a non-sensical combination.
See issue 1030.
Change-Id: If450c74162d35ca63a9d279beaa53ff9cdd6612b
1) copy following files from vpx_dsp/ to vp10/common/
vp10_inv_txfm.c
vp10_inv_txfm.h
vp10_inv_txfm_sse2.c
vp10_inv_txfm_sse2.h
2) change the function prefix "vpx_" to "vp10_" in above files
3) add unit test at vp10_inv_txfm_test.cc
Change-Id: I206f10f60c8b27d872c84b7482c3bb1d1cb4b913
This condition is not effectively in use. The actual reference
frame masking is done in other route.
Change-Id: Ia59c843bcac7243dada92f0f67658d7ce43df5e8
remove 'u' and specify all objects to allow objects with the same
basename to be added and a incremental rebuild to succeed
fixes issue #1067
Change-Id: Id0ebc89be826a026f1bbf21b4e32a2b1af45154d
Take out speed features that affect the compression performance
to simplify the coding route. This commit removes the motion field
mode search used in speed 3.
Change-Id: Ifdf6862cb1ece8261125a56d9d89bcef60758c00
WebM files could have CodecId missing in the track headers. Treat those files as
unknown input file type in vpxdec.
Fixes issue #1064.
Change-Id: I6c3bb7b4bd3a4f5c244312482a5996f8b68db3f3
avoid duplicating internal structures and include vp9_dx_iface.c
directly. these had fallen out of sync after the frame-parallel branch
merge.
Change-Id: I604cfbffa95abe2a1c8e906a696f32436b1422ed
this file needs to be reworked to remove the duplication of codec
internals + allow for divergence of vp9 and vp10
Change-Id: I6266b94ccfbc24dae30148f134804b52aa411b88
prevents an int -> vpx_img_fmt_t conversion warning with high-bitdepth
as it modifies the image format
Change-Id: Ie3135d031565312613a036a1e6937abb59760a7e
Access scaled reference frame in the sub8x8 rate-distortion
optimization loop only when the current test mode is an inter mode.
This prevents an ioc warning triggered by sending intra_frame index
to fetch scaled reference frame.
Change-Id: I6177ecc946651dd86c7ce362e3f65c4074444604
For 3 temporal layers, reduce somewhat the
cyclic_refresh_mode_max_mbs_perframe parameter, from 20% to ~14%.
Small increase in PSNR/SSIM metrics.
Change-Id: Ia216fa5474048f1ef7fe3db88cd60dfef2a1bf8a
Change settings for 1 pass CBR.
And only use SET_SVC control(s) if there is at least 1 layer (spatial or temporal).
This allows sample encoder to also work for 1 layer case.
Change-Id: I5b0a33c25afb2f24a3a8aa4ec8ade9afc87cd702
This commit allows the encoder to include sub8x8 inter mode with
scaled reference frame in the rate-distortion optimization scheme.
Change-Id: Ibbe9678801592826ef22566566dcdeeb008350d5
Sync the encoder's buffer offset calculation for sub8x8 block motion
compensated prediction with scaled reference frame to match the
decoder's behavior. This resolves an enc/dec mismatch issue when
sub8x8 inter mode with scaled is turned on.
Change-Id: I4bab3672b007a5ae0c992f8a701341892d2458b0
The fields are always coded in the frame itself, so there is never any
dependency on past frames. In practice, this fixes sign_bias being
ignored when error_resilient_mode=1.
See issue 1011.
Change-Id: I9d134ef6b445ced4d100fa735ce579855a0fa5af
the check performed within the while was redundant; simply place the
accumulation after all tiles are decoded.
Change-Id: I6a74e87257c775fd8bfc8ac4511e4a6ad8f18346
This is based on the original patch optimized for 32bit
platforms by Tamar/Ilya and now uses the x86inc style asm.
The assembly was also modified to support 64bit platforms.
Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2
These frame types cannot make bitstream parsing depend on previous
frames, so the hypothetical combinations of e.g. keyframe=1 and
update_map=0 or keyframe=1 and temporal_update=1 are non-sensical.
Therefore, make it impossible to code such combinations in the vp10
bitstream header.
See issue 1044.
Change-Id: I3f0a83d5c7e3989541a469a909471424a285239d
If the encoder dynamic resize is triggered and change config()
is then called, it will reset the current (resized) codec width/height
back to the the config (unresized) width/height (which will then
prevent the resizing action from occurring in encoder_loop).
Avoid this by checking for a change in the config width/height
before resetting the cm->width/height.
Change-Id: Id9d50c0ee8a943abe4b6c72bbaa02d9696f93177
In VP9, the order for frame header was: [0] smooth, [1] regular, [2]
sharp, [3] bilinear. Per-block, the order was [0] regular, [1] smooth
and [2] sharp. For VP10, swap smooth/regular in the frame header so
that the block ordering and frame header ordering are interchangeable.
See issue #1046.
Change-Id: Ic9ec5964874375e40cd59bef50b489a76cbe4365
Unify the style of fdct4() fdct8() fdct16()
Add fdct32()
Add range_check() at each stage
Add unit test at ../../test/vp10_dct_test.cc
Change-Id: I9e912b2c5683862e65c5a21abc3e1c260cca4576
* changes:
test: limit the valid image size on OS/2
configure: add -Zhigh-mem to LDFLAGS on OS/2
configure: disable PIC on OS/2
Makefile: add $(STACKREALIGN) to CFLAGS for vp9_reconintra.c
x86inc.asm: fix NASM compilation
* changes:
Only build multithreaded functions on mt builds.
Don't build calc_psnr for high bit depth.
Enable missing dual lpf test
Remove unused VP10 functions.
Mark VP10 functions as 'INLINE'
Remove unused functions from test files
Only build append_negative_gtest_filter when it is used.
Add INLINE decoration to static test functions
* changes:
vp9_mcomp: make search functions private
vp9_mbgraph: use vp9_full_pixel_search(HEX)
vp9_temporal_filter: use vp9_full_pixel_search(HEX)
vp9_firstpass: make vp9_init_subsampling private
vp9_encoder: make vp9_alloc_compressor_data private
If either the encoder or the decoder is enabled, CONFIG_VP<N> will be
set. This simplifies the conditional and passes the chromium update
script when CONFIG_ values are passed in with 'yes' and 'no' values.
This was failing because it was checking against empty strings but
they are set to 'no'
Change-Id: I02ecd557210088ba1458cd0e89eead5666f6597a
Spatial/temporal svc code was removed. Verified using Borg test,
and the results before and after the change are matching.
Change-Id: I4c2ee5cd560428e3e50be02e57e5871ef4246390
For one pass CBR: only check for updating refresh_golden
if ext_refresh_frame_flags_pending is not set (i.e., == 0).
And move the resetting of ext_refresh_frame_flags_pending = 0
down to after the encode_loop (and account for dropped frames).
This is to prevent changing refresh_golden flga when the user
supplies the reference/update flags.
Change-Id: I4d87b3e705ba43f243667e367503b585c61e2a54
* changes:
Only build ssse3 filter functions on 64 bit
Clean up unused function warnings in vp8 encoder
Clean up unused function warnings in vp8 onyx_if.c
These were lost in the great sub pixel variance move of
6a82f0d7fb
Not having these functions caused a ~10% performance regression in
some realtime vp8 encodes.
Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
previously any flags added while setting up the toolchain would
override the user selections; environment variables could be treated
similarly
Change-Id: Ibfcc644137d8e579af554d19a38d4020019a7a34
Mark functions in findnearmv.h, invtrans.h and setupintrarecon.h
with INLINE.
Hide function in postproc.h behind the same #if as it's callers.
Change-Id: Ic1e014a943d2aca280f137019218b9d4f1443d61
This is a leftover of the XMA code which was removed a long time ago.
Found while looking for unused functions.
Change-Id: I07a3d542ae55440af59380dcdcf9a6c11cdfcb75
This commit adds clamp of new vectors similar to the logic in RD loop.
Such clamp is not necessary from the perspective of VP8 bitstream, but
is added to improve ChromeCast mirroring's robustness.
Change-Id: I42f6adbc60ffce283b994869364230858632d6fa
For each block in pickinter: use average of four middle
pixels (instead of single pixel) to set skin map.
This can help a little in reducing false skin detection in
some cases.
Change-Id: Ic247af75e9c2948b08ab977a39e061adacd8ec97
In high bitdepth setting, the rate multipier may be set as 0. In
lossless mode, the RD cost would always be 0, resulting in bad
partition and prediction mode choices.
Change-Id: I297014dd8bfa8a07ff0ab480119f75678300ff68
This patch just fixes the test for the time being, but does not
actually solve the underlying issue, which still needs investigation.
Change-Id: I54a35de839723f5b499b57e38dd2bdd400adc427
Switch to use the normative (convolve8) filter for source scaling,
only for 1/2x1/2 scaling for now. This is faster and has better
quality than either the vpx_scale_frame or the nonnormative scaler.
Remove the vp9_scale_if_required_fast, which is now not used.
Change-Id: I2f7d73950589d19baafb1fa650eac987d531bcc8
Define it as a function of reference frame types to provide
scalability for multiple reference frames.
Change-Id: I77b856c96916f352bc31004b9266b3f24e19bd0f
this restores the previous version's behavior avoiding issues with
builds that may split sources on directory boundaries; protected
visibility may work in this case.
Change-Id: If37c70d9bd81de85a8e112457b9819a5cac6129d
For 1 pass CBR mode under screen content mode:
if pre-analysis (source temporal-sad) indicates significant
change in content, then check the projected frame size after
encode_frame(), and if size is above threshold, force re-encode
of that frame at max QP.
Change-Id: I91e66d9f3167aff2ffcc6f16f47f19f1c21dc688
Only test for using golden as reference for variance partition
selection if it is used as a reference for that frame.
For temporal layers, golden may not be a reference on a given frame,
even though it was for some previous frame. If it is not a reference
for current frame, don't check/use it for partition selection.
Change-Id: I6b0f2bd36aebbb5903077c9a0a66d80f1de9a7b1
CONFIG_VP9_HIGHBITDEPTH is currently used by both vp9 and vp10, but in
many place outside vp9/vp10, the macro was used in conjunction of
CONFIG_VP9. This created a dependency on vp9 for vp10 to build. This
commit removes the dependency by use CONFIG_VP9_HIGHBITDEPTH only in
these places.
Change-Id: I8cc007fc9cf132394c6498ce6759e606b64a6ad0
This commit renames the vp10 encoder, decoder, and common interface
file names from vp9_ prefix to vp10_ prefix.
Change-Id: Iafb5d786e4b428d2b9bf097123bd86c4fa9ded24
For speed 7, real-time mode: Base layer frames are further apart
(for #temporal layers = 3, this is every 4 frames) so worth keeping
same motion search parameters (as in speed 6) on the base layer frames.
Change-Id: Idebf49dda6ef4f3d9a55aee55129a68253f692fb
gcc-based builds will allow a 0-element array, but visual studio builds
will not; this change hides the encoder and decoder specific symbols as
modules using them are selected based on the configuration.
Change-Id: Ic16ba9d12241070ec689dc5880164c14a4f7ca44
* changes:
Only use .text sections for aout
Use newer x86inc.asm
Use .text instead of .rodata on macho
Copy PIC handling code from x86_abi_support
Set 'private_extern' visibility for macho targets
Avoid 'amdnop' when building with nasm
Catch all elf formats
Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
Use libvpx defines to set name mangling rules
Customize x86inc.asm for libvpx
Rename updated version of x86inc.asm
Use "private_prefix" instead of "program_name" and make vpx the default
prefix.
Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458
The read only sections are getting stripped on some OS X builds. As a
result, random data is used in place of the intended tables.
Change-Id: I58c18a53e503f093ee268451698c5761e6c32540
Other implementations of x86inc.asm have more comprehensive nasm
workarounds. This is the only thing that was changed for the previous
import to libvpx. See if we can still get away with it.
Change-Id: I3ef6fe9a4816461c89431a82b7e4a08b4b948d39
Make sure all variants get correct visibility and SECTION notes.
libvpx only pass elf32 and elf64 to the assembler, never just elf.
Change-Id: I7c36c115bf52436c9afe61985c859a2081948271
Revision a95584945dd9ce3acc66c6cd8f6796bc4404d40d
from git://git.videolan.org/x264.git
Temporarily name file x86inc.asm until all necessary local patches are
applied.
Change-Id: I9c7d0ed4d3ed900ae2d5db0abbcc048a2892c9b8
Use the correct period (in terms of cr->percent_refresh) for the condition
of larger delta-qp following key frame.
And account for larger interval for temporal layers.
Change-Id: Ibb43f5200f9b1eeb8bbb8211327b08ecda3c3b8a
Re-investigated the second-level sub-pixel motion search. Improved the
way of choosing search points. Rewrote the second-level search code.
At speed 0, the borg tests showed:
1. for stdhd set, Avg PSNR gain: 0.216%; Overall PSNR gain: 0.196%;
SSIM gain: 0.206%. Only 1 out of 15 clips showed PSNR loss.
2. for derf set, Avg PSNR gain: 0.171%; Overall PSNR gain: 0.192%;
SSIM gain: 0.207%. Only 3 out of 30 clips showed PSNR losses.
Added the condition for third-point checking, namely, less points
were checked. Speed tests showed no speed loss(Avg 0.3% speedup at
speed 0).
Change-Id: I6284ebb3fa7ba63be8528184c49e06757211a7f1
when configuring with mips32-android-gcc HAVE_MIPS32 would be set, but the
ndk does not set -mips32r2 for APP_ABI=mips which results in BSwap32 failing
to build; refine the check in endian_inl.h
Change-Id: I22893fe61f29111eb902d961b500b2174596268d
The test file compiler fails if one uses --disable-vp8-decoder
--enable-vp9-decoder. It effectively turns on CONFIG_VP8 and
CONFIG_DECODERS, but turns off CONFIG_VP8_DECODER, which causes
compiler error at test_vector_test.cc.
This commit fixes this issue by adding vp8/9 decoder flags to
the decoder behavior test, respectively.
Change-Id: I097ff8fd5e12715a94a565a82e54503885eb7187
-For ambient qp in active_worst setting: increase the initial
averaging time (from very first frame) to account for avg_qp of key_frame.
-In postencode on key frame: update the last_q/avg_q[key_frame] for
all temporal layers.
Change-Id: I5313153d350b1045b4835ce948dfffb7d2039b52
Condition usage of rc.frames_since_golden to non-svc mode.
rc.frames_since_golden, which is used in non-svc mode to add second reference,
was causing, under certain condiiton, the turning off of golden reference
for svc case.
Change-Id: Icec644d235d0471e56d8ff73d6c37278bd6ecd3b
and FUN_CONV_2D macros. The predict lut now handles
this case. The encoder now calls vpx_scaled_2d() instead
of vpx_convolve8() for scaling.
Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
This reverts commit a5e97d874b.
Additionally:
Revert "vpx_convolve_copy_sse2: fix win64"
This reverts commit 22a8474fe7.
This change performs poorly on various x86_64 devices affecting
performance by 1-3% at 1080P. Performance on chromebook like devices was
mixed neutral to slightly negative, so there should be minimal change
there.
Change-Id: I95831233b4b84ee96369baa192a2d4cc7639658c
This commit clears all the vp9_ prefix use case in vpx_dsp. It gets
the vp9 folder ready to branch out vp10.
Change-Id: I2906eec179ee792b4af8c9b4161313653050e931
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.
Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
Choose a different diagonal point to check when the two costs are
the same, making it consistent with the way we choose the best mv.
This slightly changes the encoding result, and the derflr set borg
test at speed 0 shows 0.027% Overall PSNR gain, 0.024% Avg PSNR
gain, and 0.043% SSIM gain.
Change-Id: Ic8ee3a6767394866d159e4f9e1c777604dd73c17
If the current best mv(namely, the search center) is still the best mv
after the first level search, the second level checks is skipped. This
patch doesn't change the bitstream. At speed 0, it speeds up the encoder
by 1% - 2%.
Change-Id: I054c91b884d3f7aef157436c061744562bd6506d
Add a guard to exclud dspr2 inverse transform files from vpx_dsp
make file, when high bit-depth is turned on. This fixes the jenkins
nightly build.
Change-Id: Ibacd86563af1ec4810c550905b3fa0397baeeafc
Changes:
b6de61a Adds support for simple tags
75a6d2d sample_muxer: Don't write huge files.
cec1f85 mkvmuxer: remove unused timecode_scale variable
8a61b40 Merge "mkvparser: Tiny whitespace fix."
7affc5c clang-format re-run
d6d04ac mkvmuxer: use generic Cluster::AddFrame
4928b0b Merge "mkvmuxer: Write Block key frames correctly."
c2e4a46 Merge "sample_muxer: Use AddGenericFrame to add frames."
e97f296 mkvparser: Tiny whitespace fix.
d66ba44 Merge "Add support to parse DisplayUnit."
deb41c2 Add support to parse DisplayUnit.
42e5660 Fix issues on EBML lacing block parsing
fe1e9bb Fix block parsing to not allow frame_size = 0
2cb6a28 Change assertions to checks when parsing TrackPositions
d04580f Fixes issues on Block Group parsing
c3550fd mkvmuxer: Write Block key frames correctly.
5dd0e40 Merge "mkvmuxer: Set is_key to true for metadata blocks."
8e96863 mkvmuxer: Set is_key to true for metadata blocks.
a9e4819 sample_muxer: Use AddGenericFrame to add frames.
5a3be73 Change assertions to checks when load CuePoints
f99f3b2 mkvmuxerutil::EbmlDateElementSize: remove value param
ff572b5 Frame::IsValid: fix track_number check
b6311dc mkvmuxer: Refactor to remove a lot of duplicate code
256cd02 Merge "mkvmuxer: DiscardPadding should be signed integer."
16c8e78 mkvmuxer: s/frame/data in all AddFrame* functions.
c5e511c mkvmuxer: DiscardPadding should be signed integer.
4baaa2c Add framework build script: iosbuild.sh
3d06eb1 PATENTS: fix a typo: constitutes -> constitute
d3849c2 mkvparser: Dead code removal.
f439e52 Change assertions to checks when preloading Cues
d3a44cd Fix track transversal when listing Cues on sample
c6255af Tweak .gitignore so git status is clean after checkout and
build: - added missing underscore to sample_muxer - added cmake and make
related files
b5229c7 Makefile.unix: s/samplemuxer/sample_muxer/
e3616a6 Add support to parse stereo mode, display width and display
height in mkvparser
a4b68f8 parser: Fix bug in Chapters::Atom::Parse()
bab0a00 cmake: Set library and project name the proper way on Windows.
feeb9b1 Set library name to match Windows expectations.
b9a549b Fix CMakefile to generate libwebm.a
b386aa5 Add CMakeLists.txt and msvc_runtime.cmake.
b0f8a81 parser: Fix memory leak in Chapter parsing
f06e152 mkvmuxer: Fix MoveCuesBeforeClustersHelper recursive call.
27bb747 allow subtitle tracks with ContentEncodings
623d182 DoLoadCluster: tolerate empty clusters
1156da8 Update PATENTS to reflect s/VP8/WebM/g
0d4cb40 mkvmuxerutil: Use rand() in MSVC builds.
e12fff0 mkvmuxer: Overload WriteEbmlHeader for backward compatibility
a321704 mkvmuxer: write correct DocTypeVersion
574045e mkvmuxer: fix DiscardPadding
8be6397 Include crop elements when calculating size of Video element
8f2d1b3 mkvparser: fix DiscardPadding extraction
1c36c24 mkvmuxer: fix style guide violations
568504e Merge "UUIDs can have their high bit set"
acf788b Add support for CropLeft, CropRight, CropTop and CropBottom
elements.
418188b Merge "muxer: codec_id is a mandatory element"
07688c9 mkvmuxer: Reject frames if invalid track number is passed.
2a63e47 muxer: codec_id is a mandatory element
d13c017 UUIDs can have their high bit set
Change-Id: Iba28acb1ff774349d03e565f2641ddea132cf1e7
fixes link under vs9; this is the same change as:
dbf6e3f gen_msvs_vcxproj.sh: Avoid object name collisions.
Change-Id: I2a188c9024d0605e60e5e03ddcef1a25e7e53585
Chromium puts all the yasm output in the same directory. Looking at ways
to improve this but in the meantime get rid of collisions.
Change-Id: I923c5231d14e895ab96521eb89807ede868a0753
Ssim_vars is used to accumulate stats based 4x4 pixel blocks, this
commit changes the allocations size to be based on mi_rows and mi_cols
to avoid out-of-bound memory access for larger size videos. The hard
coded 720x480 can only work for image size up to 2880x1920.
Change-Id: Id9d07f3f777385b448ac88a6034b7472e4cf3c79
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.
Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
This got erroneously changed during the refactor. This fixes
SvcTest.TwoPassEncode2TemporalLayersWithMultipleFrameContextsAndTiles.
Change-Id: Ifa5ab0e098396c5e2d10478db87df256eadfa4c7
This function suffers from a couple problems in small core(tablets):
-The load of the next iteration is blocked by the store of previous iteration
-4k aliasing (between future store and older loads)
-current small core machine are in-order machine and because of it the store will spin the rehabQ until the load is finished
fixed by:
- prefetching 2 lines ahead
- unroll copy of 2 rows of block
- pre-load all xmm regiters before the loop, final stores after the loop
The function is optimized by:
copy_convolve_sse2 64x64 - 16%
copy_convolve_sse2 32x32 - 52%
copy_convolve_sse2 16x16 - 6%
copy_convolve_sse2 8x8 - 2.5%
copy_convolve_sse2 4x4 - 2.7%
credit goes to Tom Craver(tom.r.craver@intel.com) and Ilya Albrekht(ilya.albrekht@intel.com)
Change-Id: I63d3428799c50b2bf7b5677c8268bacb9fc29671
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/drsp2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
Don't run rate_block (cost_coeffs) if distortion alone is enough to
surpass best_rd.
This decreases 2nd pass runtime on HD at speed 2 by about 2%. There is
zero effect on output if tx_cache is removed.
Change-Id: Ia3b1cc77bfbe6ee988c395fde06c0eb92940b784
1. The RD scores obtained during the tx size selection were stored in the
tx cache, and used to help make the tx decision for the following frames.
This wasn't used anymore in VP9 encoder. Recovered the related decision
making code from 1.5+ years ago, and borg tests didn't show any quality
gain. This patch removed it to lower the complexity.
2. An optimization was done after the above refactoring. If the tx_mode
is not TX_MODE_SELECT, we only need to test the chosen tx size instead
of all posible tx sizes. This gave a 1.5% average speed gain at speed 2,
and a 1% average speed gain at speed 3.
Change-Id: Id8cd650e066a8cef33829d8c15388a8138adc78c
The forward 32x32 2D-DCT functions are aligned in vpx_dsp folder.
The vp9_dct.h file is not effectively used now.
Change-Id: Ie7946b6fdd784b8e91496242337bc9002c75c281
Replace the duplicate coefficient definition in neon implementations
of inverse transform with those from vpx_dsp/txfm_common.h
Change-Id: I4cd9bd9569ab1793dfdbb6f16d80bcb581599f0d
This commit replaces vp9_idct.h with txfm_common.h in many SIMD
implementation files for precise file dependency.
Change-Id: If73dd726bb16537e7494f28538b0a169810f9756
Separate the common coefficient constant into vpx_dsp/txfm_common.h.
Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
This clears the use case of vp9_idct.h in vpx_dsp folder.
Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
requires r10e or newer:
Android NDK, Revision 10e (May 2015)
...
Other bug fixes:
...
- Fixed .asm support for ABI x86_64.
Change-Id: I51ec9a5f77c982b7412d922e896348a83ae2d7d6
Avoid scaling the references if they have already been scaled.
Change only affects 1 pass non-svc mode for now.
Change-Id: I204f4079c026cba7adce7a7f855d072f6139ccec
The RD and load save/code grabs it as groups of four. In practice there
is no change to physical allocations becaquse this is backed by a 16-byte
memalign.
Change-Id: I01e89769872300e23227e03dd24a6e229f482025
Add vpx_dsp_rtcd.h to the header file list. The od_bin_fdct8x8()
here depends on forward 8x8 2D-DCT.
Change-Id: I1d71edc71f07069808823d2445c1cafd285e1b94
This commit factors out common macro definitions from the forward
and inverse transform implementations into vpx_dsp. It removes
the duplicate macro definitions from encoder and decoder folders.
Change-Id: I92301acbd3317075e9c5f03328a25abb123bca78
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.
Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
vp9_itrans*_dspr2.c aren't necessary for high bitdepth builds and
notably vp9_itrans8_dspr2.c fails in various configurations using a
codesourcery toolchain:
vp9_itrans8_dspr2.c:31:5: can't find a register in class 'GR_REGS' while reloading 'asm'
Change-Id: I2ac76203e65cc643cb835ab50e95701896d92a1a
This test places 128 in positions that would not be found
in the VP9 filter tables. The ssse3 code packs this table
into chars and uses the pmaddubsw instruction, which treats
the value as signed. The ssse3 code checks for 128 in
position 3, skipping the ssse3 code if found, and calls
vp9_convolve8_c(). vp9_convolve8_c() is also used for scaling.
ChangeFilterWorks breaks the ssse3 scaling code found in other
commits.
Change-Id: I1f5a76834bc35180b9094c48f9421bdb19d3d1cb
Eliminates the byte by byte read from bool decoder, by reading
in a size_t and then shifting it into place.
Change-Id: I0ed8c7b6f942847e79cc90105dc1d2b5b3deb0d6
This commit limits the scope of 1-D DCT and ADST functions within
vp9_dct.c and makes them static. This largely clears out the cross
referencing issue between vp9_dct.c and the SIMD optimizations.
Change-Id: If7cac478b11bb32328ccf70a9f60b709dad43d7f
The SSE2 version high bit-depth forward hybrid transforms are
essentially using the C functions via cross referencing to 1-D
functions in vp9_dct.c. This commit unifies the two versions and
removes the unnecessary dependency.
Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
Separate the hybrid transform case from 2D-DCT case. This will
allow us to clear up cross dependency between c and SIMD
implementations later.
Change-Id: Iaa499e8b096850a1c5a0c50a3b6e63e15d0184bf
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32
vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32
The purpose of doing that was to allow these functions to be shared
by multiple codecs.
Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
This commit moves the loop filter dspr2 implementation from vp9 to
vpx_dsp directory. It also fixes header file format issues.
Change-Id: I09203ed4bd267d7fd76bb79a6ee84a37646206b2
Remove the use of drop_frames_water_mark, as this is used for
frame dropping control. Use fixed threshold for now on buffer underflow.
Change-Id: If0ddda9f7f6fa96067cdcb0eccb42e17bda37c32
For vp9 decoder build without profile 2 and profile 3 support, this
commit changes to report error "Unsupported bitstream profile" for
input streams in profile 2 or 3, rather than other misleading error
information.
In addition, one of the invalid files in unit tests is actually coded
profile 2, this commit makes it tested only when the decoder is built
with vp9-highbitdepth.
This fixes issue #1028.
Change-Id: I8b6c1210787c8f89c703a546687dcf973ac20fc0
The various tap loop filter operations are common functions across
codec. This commit moves them along with SIMD optimizations to
vpx_dsp folder.
Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
In aq-mode=3 under a resizing action (i.e., resize_pending != 0),
force an update of the golden reference frame.
Change-Id: I14806f6db71b5f8c827678cc5e1fc913c138a9a4
Fix bug in setting this flag for animated content.
The bug did cause quality to increase because far
more frames are not boosted than boosted.
However, the speed trade off to gain is a lot less
favorable and the behavior was not as intended.
Change-Id: I89fb70419c88b26f40b3534de0481730a1b3fcfa
Move the clamp functions to vpx_dsp_common.h file. Clear out the
dependency of vp9_loopfilter_filters.c on vp9_common.h file.
Change-Id: I9c4b928bcd7f597106b5aa96354356d3775a3431
Use drop_frames_water_mark for threshold on buffer underflow,
and change threshold for resize down.
Change-Id: I2de19adce50abe9bcdc0b107528cec8cc1857fcc
The fast scaling for 1 pass mode was being used only on the
first frame after resizing event (because resize_scale_num/den
is set to 1 and only changed for first frame following resize event).
Change-Id: I723b63e21823eb858f25f5662d2bbe4f1842e61f
Proper use/update of resize_state and resize_pending to constrain
the total amount of downsizing to be at most one scale down, for now.
Change-Id: Id18fc32499f2fbdbec16728dcdc9e4eac09098f0
From Change Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
There is still an issue relating to one animated test clip with repeat
patterns where this change effectively increase the default maximum
arf interval by +1. This can be examined seperately.
Change-Id: Idd01d5480fc45202d8a059a0c3afc0997cc5bdd1
Flaten the intra block decoding process. It removes the legacy
foreach_transformed_block use in the decoder. This saves cycles
spent on retrieving the transform block position.
Change-Id: I21969afa50bb0a8ca292ef72f3569f33f663ef00
This commit simplifies the intra block boundary condition logic.
It removes the block index from the argument set.
Change-Id: If00142512eb88992613d6609356dfd73ba390138
Eliminates the byte by byte read from bool decoder, by reading
in a size_t and then shifting it into place.
Change-Id: Id89241977103fc3b973e4ed172a5cbf246998e5d
The clamp calls with INT32_MIN and INT32_MAX have no effect at all on
int values passed in, therefore this commit removes those effectless
clamps and also adds more const intermediate results to make the code
more readable.
Change-Id: I66d8811f58bb74ec31cbec9a6c441983a662352e
Rework the inter mode transform block decoding loop. Replace the
block index with the row and col index as the input argument. It
saves function call to compute the row and col index according to
the block index and overall block size, and many if statements
associated with the transform block position relative to the coding
block. For the test bit-stream pedestrian_area 1080p at 5 Mbps,
the decoding speed goes up from 81.13 fps to 81.92 fps.
Note that the intra coded block decoding needs more refactoring
work than the inter ones. So keep it using foreach_transforme_block
as for now.
Change-Id: I5622bdae7be28ed5af96693274057f55ba9b4fb4
The encoder gets its dqcoeff from the context tree. In the decoder move
it to directly after MACROBLOCKD.
Change-Id: I46c9b76f26956a360d17de0b26ecb994dae34ecb
Even if the recode loop is not enabled for the current frame type
trap the case where the projected size of a a frame is above the
maximum allowed in recode_loop_test()
Change-Id: I453004694b8f8699e3c2a83252e9f83adccdda4e
Changes to allow more use of rectangular partitions at
speeds 1 and 2 for content classed by the first pass as
animation and for blocks near the active image edge.
This has quite a big impact in quality for the animated
test sequence but also hurts encode speed for speed 2.
For other content types the impact on both speed and
quality is small.
Added some plumbing for detection of internal vertical
image edges.
Change-Id: I3fc48de2349f8cb87946caaf0b06dbb0ea261a9a
Change speed features / behavior for split mode when there
is an internal active edge (e.g. formatting bars).
Remove some threshold constraints in rd code near the active
edge of the image.
Add some plumbing for left and right active edge detection.
Patch set 5. Limit rd pass through for sub 8x8 to internal active edges.
This takes away any speed penalty for most clips but keeps the enhanced
edge coding for the more critical case of internal image edges
Change-Id: If644e4762874de4fe9cbb0a66211953fa74c13a5
Replace block index with transform type in the argument list. This
allows to save an extra fetch to the prediction mode. For pedestrian
area 1080p coded at 5 Mbps with single tile, the average decoding
speed goes up from 80.55 fps (before the refactoring series) to
81.13 fps.
Change-Id: Icbebf84ce63c19c0c92f3690ed201f6c3eab7881
If the pre-selected partition size (from variance partition) is
32x32, also apply nonrd partition search for 32x32 and 16x16 size.
Overall small positive gain in metrics, average ~1%.
Some visual improvement, for lower resolutions.
Change-Id: I69cb425bda94f7d13d34c451ab30e9276335a30e
The decoding process handles detokenization and reconstruction per
transform block sequentially. There is no need to offset the dqcoeff
buffer according to the transform block index. This allows to
reduce the memory spill and improve cache performance.
Change-Id: Ibb8bfe532a7a08fcabaf6d42cbec1e986901d32d
This commit replaces the vp8_ prefixed subtract function with the
common vpx_subtract_block function. It removes redundant SIMD
optimization codes and unit tests.
Change-Id: I42e086c32c93c6125e452dcaa6ed04337fe028d9
inline the code directly in read_mv_component(), the only place where it
was being used; this removes a function call in a hot function
Change-Id: I66f99c0c9ce3bc310101dbca4a470f023cc6fb55
To aid version management for integration with ffmpeg by use
of:
#ifdef VPX_CTRL_<CTRL_ID>
...
#endif
Change-Id: If550e06de4d3aa3685881f312ce6a86fa9de083b
Adds two new vp9 parameters --min-gf-interval and --max-gf-interval
to enable testing based on frequency of alt-ref frames.
Also adds a unit-test to test enforcement of min-gf-interval.
For both these parameters the default value is 0, which indicates
they are picked by the encoder, based on resolution and framerate
considerations. If they are greater than zero, the specified
parameter is honored.
(Additional note by paulwilkins)
Note that there is a slight oddity in that key frames are also GFs and
considered part of GF only group. However they are treated as not
being part of an arf group because for arf groups the previous GF is
assumed to be the terminal or overlay frame for the previous group.
(end note)
Change-Id: Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
This reverts commit a42df86c03.
this change causes MSA/VP9SubpelVarianceTest.Ref and
MSA/VP9SubpelVarianceTest.ExtremeRef failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
This reverts commit 61774ad1c4.
this change causes MSA/VP9SubpelAvgVarianceTest.Ref failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
Change-Id: I7fb520c12b2a3b212d5e84b7619a380a48e49bb0
The vp9_lpf_vertical_16_dual function optimized for x86 32bit target. The hot code in that function was caused by the call to the transpose8x16.
The gcc generated assembly created uneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, in addition to hoisting the consumer
instruction closer to the producer instructions, we eliminated most of the fills and spills and improve the function-level performance by 17%.
credit for writing the function as well as finding the root cause goes to Erik Niemeyer (erik.a.niemeyer@intel.com)
Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
Added code to reduce the minimum partition size searched
for super blocks at or straddling the edge of the image.
If the first pass has detected formatting bars the "active" edge
may not be the real edge.
Change-Id: I9c4bdd1477e60f162a75fac95ba6be7c3521e05c
Correct the ARF boost calculations to partly discount
inactive or very low energy regions of the image.
Examples (formatting bars and 0 energy areas of animated clips).
Change-Id: I241af058d10aba8c67a4deca36deb913047d4561
This commit moves the primitive multi-threading files from vp9
folder to vpx_thread, which will be accessible by all vpx codec.
Change-Id: Ib51e66e9c69801c10631fab56d35a0c0aaed5883
the max value of the lookup in expanded form is:
(((1 << 7) - 1) << 1) - 65 + 1 + 64 = 254
remove the clamp [0, 253] and add one table entry
Change-Id: I0b5d0c66702fdb0b8f1cc9ab9b0dac66326e85a6
to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
the offending assembly code was deleted in:
08e38f0 VP8 for ARMv8 by using NEON intrinsics 14
the intrinsics currently pass.
fixes issue #725
Change-Id: I43e4263bef21f9d9008c51ffdfa39fcf10b8e776
2014-11-05 18:10:59 -08:00
986 changed files with 177037 additions and 169146 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.