The value 35468 changes sign when stored in int16_t:
implicit conversion from 'int' to 'int16_t' (aka 'short')
changes value from 35468 to -30068
This negation requires adding back the original value to compensate.
Shifting the value keeps the value positive and saves a post-vqdmulh
shift.
This technique is used in webp and idct_dequant_full_2x_neon
BUG=b/28027557
Change-Id: I0c5ce09bea170fe08061856c2af6f841a557e0c3
This restores d9dce2f48e
Switched to using signed shift-and-narrow. Instead of saturating
negative results to 0, it was saturating them to 255.
BUG=webm:817
BUG=webm:1273
Change-Id: I571095336aa4182e3288b17924fcaaece42b0a49
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings
Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
Do nothing in vp9_highbd_iht#x#_##_add_c when input magnitude is beyond
20 bits. Note that, sign bit is not included here.
In the 20 bits, we use 12 bits for input signal, 7 bits for forward
transform amplification, and 1 bit for contingency in rounding and
quantizing
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286
Change-Id: I332c6f68df4614fc2e7d2dc4c5bb0d0cff8a245c
When filtering it needs 6 pixels: 2 prior to the source, the source, and
3 after the source.
When filtering 16 wide, that means 21. To accomplish this the SSE2 reads
[-2] to [5], [6] to [13], and [14] to [21], a total of 24 bytes (reading
in groups of 8 is easy)
The filter then shifts this last set to the top half of the register and
uses 'or' to combine it with the previous set.
Valgrind detected an issue reading pixels [19], [20] and [21]:
Address 0x7f581c2 is 434 bytes inside a block of size 441 alloc'd
Note: we only need pixels [16], [17], and [18] as context for [15].
To fix this, it now reads 8 bytes starting at [11], which re-loads [11]
through [13], but stops at [18] and does not over-read any values.
This is shifted by 5 and 'or'd with xmm1. Although the lower bits are
not cleared, they overlap directly with [11] through [13], so 'or'
produces the correct results.
Change-Id: I0c89c03afa660fc9b0108ac055d7bd403e493320
On 32 bit machines 'new' does not always appear to allocate sufficiently
aligned buffers, causing intermittent test failures.
Change-Id: I0db4fc73782012e4eef71dc0fb540e74fdbfcebe
the --enable-postproc-visualizer configure option remains as a no-op as
do the control names and values for compatibility
+ remove the corresponding debug flags from vpxdec: --pp-*
Change-Id: I4a001cd9962b59560d7d6bda6272d4ff32b8d37c
similar to changes that were done in vp9 for encoded frame size
reporting. has the side-effect of quieting a -Wshorten-64-to-32 warning.
Change-Id: I89f74cb617fc29334ee351dc8dfaa3b8cfd4e5af
The referenced bug was fixed by saving neon registers. That this had any
effect was coincidental.
Both chromium and Android build with clang and neither uses this flag.
Change-Id: I470247d6fd9226fc207b42a187105581a94badc3
The vp9_mv_class0_tree is a balanced tree with two leafs and can
simply be coded as a boolean with probability class0[0].
Change-Id: If294dac825a5f945371092c74aa8e3f84cd962b6
(cherry picked from commit be8a8ab62ebdd111c6f2e9a33b15630570671eba)
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.
BUG=b/30970831
Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
The code only has issues when xoffset == 0 and yoffset == 0 which
represents a simple copy. Presumably this case does not need to be
handled because the issue has existed since 2010.
BUG=webm:1287
Change-Id: Ic47e2653f3b729e99b40e53d8d2d8d1501edaaa9
Build out the sixtap_predict test because the filters are
interchangeable. Add verbose failures and border checking.
Change-Id: I962f50041750dca6f8d0cd35a943424cf82ddcb1
This reverts commit d9dce2f48e.
Appears to be failing the SixtapPredict tests in some configurations and possibly test vectors as well.
Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.
It is still ~5x faster than C in the unaligned case and doing both
filters.
BUG=webm:892
BUG=webm:1273
Change-Id: Icf7167189391b46202f47233bb585c24c42bcc36