When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. Current condition was based on
the speed level. This change is to make the condition at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).
Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04
This reverts commit c9266b8547.
Disable source_sad when resolution > 1080P. The test should
pass now.
BUG=webm:1452
Change-Id: I72dde88e66590ff9e41da5e5dd83f5550a83f082
This reverts commit aa1c4cd140.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
previously the optimized path was skipped in this range
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
Move the source_sad feature to speed 6 (from speed 7), and
add speed feature to switch from the variance-based partition
to reference_partition (which uses nonrd-pickmode for bsize selection)
if source_sad is high.
Currently used only for speed 6 for resoln <= 360p.
About 4-5% improvement on 360p in RTC set.
Some speed slowdown, but still ~30% faster than speed 5.
Change-Id: Ib0330ee5fe9fdd2608aed91359a2a339d967491c
This reverts commit 03f5e300d6.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
For 8-bit the subtrahend is small enough to fit into uint32_t.
For 10/12-bit apply:
63a37d16f Prevent negative variance
previously:
47b9a0912 Resolve -Wshorten-64-to-32 in highbd variance.
c0241664a Resolve -Wshorten-64-to-32 in variance.
Change-Id: I181c85f0b9a03da37c2e8b89482d48aa3dbc0aee
Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'
Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12
Backend specific optimization for PPC VSX reads 16 bytes, whereas arm neon /
sse2 only reads <= 8 bytes. Although the extra bytes read are actually never
used, this is not a warrant for groping around. Fixed by allocating more when
building for VSX. This is reported by asan.
Also note - PPC does have assembly that loads 64-bit content from memory - lxsdx
loads one 64-bit doubleword (whereas lxvd2x loads two 64-bit doubleword) from
memory. However, we only have "vec_vsx_ld" builtins that mapped to lxvd2x, no
builtins to lxsdx. The only way to access lxsdx is through inline assembly,
which does not fit well in the origin paradigm.
Refer:
vsx:
vpx_tm_predictor_4x4_vsx @ third_party/libvpx/git_root/vpx_dsp/ppc/intrapred_vsx.c
neon:
vpx_tm_predictor_4x4_neon @ third_party/libvpx/git_root/vpx_dsp/arm/intrapred_neon_asm.asm
sse2:
tm_predictor_4x4 @ third_party/libvpx/git_root/vpx_dsp/x86/intrapred_sse2.asm
BUG=b/63112600
Tested:
asan tests passed.
Change-Id: I5f74b56e35c05b67851de8b5530aece213f2ce9d
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.
Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
0.007% regression on rtc and 0.004% gain on rtc_derf.
1 thread on QVGA,VGA and HD has ~0.2% speed regression while 2 threads has
~0.2% speed gain on Google Pixel.
Change-Id: Ia4a6ec904df670d7001e35e070b01e34149d23dc
Officially the quant structures are 8 elements, with one dc element and
7 repeated ac elements. The low bit depth optimizations take advantage
of this to fill the xmm registers. The high bit depth version manually
duplicates the values.
If all the optimizations were unified, the structure sizes could be
greatly reduced.
Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae
Use only valid values for quantize inputs. These were determined by
looping over vp9_init_quantizer and looking for max and min values.
This allows extending the test to the low bit depth functions which were
not designed to handle all possible inputs but only valid inputs.
Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e