This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.
Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.
Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().
Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.
Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16
Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h
Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.
Does not enable -Wextra yet. There are more issues to fix.
BUG=webm:1069
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.
BUG=b/30970831
Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
This function only exists as a shortcut to subpixel variance with
predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical
and both for "hv"
Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.
BUG=webm:1273
Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
disable clang-format for bilinear_filters_avx2
restores the row layout prior to:
099bd7f vpx_dsp: apply clang-format
but keeps the justification used by clang-format
Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7
The LLVM trunk has reached 4.0 and now __clang_major__ is not enough
to distinguish between old XCode Clang and the new 'real' Clang.
Using __apple_build_version__ allows to make this distinction.
BUG=chromium:631144
Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.
Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
Speed test shows the new vertical filters have degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
the vertical filters activated code. Now just simply active the code
without degradation on Celeron. Later there should be 2 set of vertical
filters ssse3 functions, and let jump table to choose based on CPU type.
Change-Id: Iba2f1f2fe059a9d142c396d03a6b8d2d3b981e87
Due to rounding, hbd variance may become negative. This commit put in
check and clamp of negative values to 0.
Change-Id: I610d9c8aa2d4eebe7bc5f2c5624a9e3cadad4c94