Control already exists for vp9, adding it to vp8.
Usage is only when error_resilient is off.
Added a datarate unittest for non-zero boost.
Change-Id: I4296055ebe2f4f048e8210f344531f6486ac9e35
In 1 pass CBR, with error_resilience off, allow for
special logic to change the default gf behaviour.
In this CL: boost is turned off and the gf period
is set to a multiple of cyclic refresh period.
Change only affect 1 pass CBR mode, i.e, when the flag
gf_update_onepass_cbr is set.
Including the previous change (3ec8e11: to allow cyclic refresh
for error_resilience off), comparing metrics on RTC set for
error_resilience off vs on: avgPSNR/SSIM up by ~6%.
Change-Id: Id5b3fb62a4f04de5a805bd1b418f2b349574e0bc
vp8_short_inv_walsh4x4_msa - Optimized to process in short vector type
Updated below functions to store exact number of bytes in output rather than complete vector
idct4x4_addblk_msa
idct4x4_addconst_msa
dequant_idct4x4_addblk_msa
dequant_idct4x4_addblk_2x_msa
dequant_idct_addconst_2x_msa
Change-Id: Ic1b3752e2421dc7d70a082dcdaab9d140d7e5d9c
cyclic_refresh was tied to error_resilience mode.
Allow it to be on also for 1 pass CBR mode even if
error_resilience is off.
Other option to use new control for this, but prefer to avoid
that for now.
Change-Id: I3625b292ee059a890e31338b514e211bf0ab5c3e
The original commit never set any 'specialize' line:
61311e6103
It appears the sadx4 version of function uses sdx4df calls to speed up
the search. There are no sse3 versions of the sdx4df functions, but
there are sse2 and msa versions.
There is a neon version of vpx_sad16x16x4d but not any of the smaller
versions. Perhaps if they existed this function could be expanded to use
them.
Change-Id: I936d7d6b1a3ff6dcd5a4d2322272708c47cdec13
Remove unused zbin variable:
warning: unused parameter ‘zbin’
Use int for loop variables to avoid unsigned conversion:
warning: comparison between signed and unsigned integer expressions
Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad
The value 35468 changes sign when stored in int16_t:
implicit conversion from 'int' to 'int16_t' (aka 'short')
changes value from 35468 to -30068
This negation requires adding back the original value to compensate.
Shifting the value keeps the value positive and saves a post-vqdmulh
shift.
This technique is used in webp and idct_dequant_full_2x_neon
BUG=b/28027557
Change-Id: I0c5ce09bea170fe08061856c2af6f841a557e0c3
This restores d9dce2f48e
Switched to using signed shift-and-narrow. Instead of saturating
negative results to 0, it was saturating them to 255.
BUG=webm:817
BUG=webm:1273
Change-Id: I571095336aa4182e3288b17924fcaaece42b0a49
When filtering it needs 6 pixels: 2 prior to the source, the source, and
3 after the source.
When filtering 16 wide, that means 21. To accomplish this the SSE2 reads
[-2] to [5], [6] to [13], and [14] to [21], a total of 24 bytes (reading
in groups of 8 is easy)
The filter then shifts this last set to the top half of the register and
uses 'or' to combine it with the previous set.
Valgrind detected an issue reading pixels [19], [20] and [21]:
Address 0x7f581c2 is 434 bytes inside a block of size 441 alloc'd
Note: we only need pixels [16], [17], and [18] as context for [15].
To fix this, it now reads 8 bytes starting at [11], which re-loads [11]
through [13], but stops at [18] and does not over-read any values.
This is shifted by 5 and 'or'd with xmm1. Although the lower bits are
not cleared, they overlap directly with [11] through [13], so 'or'
produces the correct results.
Change-Id: I0c89c03afa660fc9b0108ac055d7bd403e493320
the --enable-postproc-visualizer configure option remains as a no-op as
do the control names and values for compatibility
+ remove the corresponding debug flags from vpxdec: --pp-*
Change-Id: I4a001cd9962b59560d7d6bda6272d4ff32b8d37c
similar to changes that were done in vp9 for encoded frame size
reporting. has the side-effect of quieting a -Wshorten-64-to-32 warning.
Change-Id: I89f74cb617fc29334ee351dc8dfaa3b8cfd4e5af
The code only has issues when xoffset == 0 and yoffset == 0 which
represents a simple copy. Presumably this case does not need to be
handled because the issue has existed since 2010.
BUG=webm:1287
Change-Id: Ic47e2653f3b729e99b40e53d8d2d8d1501edaaa9
This reverts commit d9dce2f48e.
Appears to be failing the SixtapPredict tests in some configurations and possibly test vectors as well.
Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.
It is still ~5x faster than C in the unaligned case and doing both
filters.
BUG=webm:892
BUG=webm:1273
Change-Id: Icf7167189391b46202f47233bb585c24c42bcc36
postproc.c is overloaded and used for both postproc and internal stats.
If only --enable-internal-stats is specified there are issues with
non-existent struct members and unused functions.
Change-Id: I82367f1ffce659c3918c9f964dbce94a716fbb89
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.
The store, when unaligned, has a version that is ~25% slower but safe
when xoffset = 0 (second pass filter only). When the first pass filter
(or both) are in play, the new version is almost identical in speed.
Worst case performance (both filters, unaligned stores) is roughly 3-4x
faster than C.
BUG=webm:817
BUG=webm:1273
Change-Id: I1e490e94453e0872151fe0dafb05557463f6247d
For some reason allocated_decoding_thread_count is signed, but decoding_thread_count is not.
Cleans -Wextra/-Wsign-compare:
comparison between signed and unsigned integer expressions
Change-Id: Id0ada78100acff27c1c4ed7493c563d13c55cdcd