This commit brings in all up-to-date changes from master that are
applicable to nextgenv2. Because the VP10 code was removed from master,
we had to cherry-pick the following commits to get those changes:
Add default flags for arm64/armv8 builds
Allows building simple targets with sane default flags.
For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
--platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
BUG=webm:1143
vpx_lpf_horizontal_4_sse2: Remove dead load.
Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e
Fail early when android target does not include --sdk-path
Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c
configure: clean up var style and set_all usage
Use quotes whenever possible and {} always for variables.
Replace multiple set_all calls with *able_feature().
Conflicts:
build/make/configure.sh
vp9-svc: Remove some unneeded code/comment.
datarate_test,DatarateTestLarge: normalize bits type
quiets a msvc warning:
conversion from 'const int64_t' to 'size_t', possible loss of data
mips added p6600 cpu support
Removed -funroll-loops
psnr.c: use int64_t for sum of differences
Since the values can be negative.
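A minimal sketch of why the accumulator is signed. The helper name
sum_of_differences is hypothetical and is not the function in psnr.c;
it only shows that per-pixel differences can drive the running sum
below zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the psnr.c code): accumulates a[i] - b[i].
 * The accumulator is int64_t because individual differences, and
 * therefore the running sum, can be negative. */
static int64_t sum_of_differences(const uint16_t *a, const uint16_t *b,
                                  size_t n) {
  int64_t sum = 0;
  size_t i;
  assert(a != NULL && b != NULL);
  for (i = 0; i < n; ++i) {
    sum += (int64_t)a[i] - (int64_t)b[i];
  }
  return sum;
}
```

With an unsigned accumulator the same inputs would wrap around to a
very large positive value instead of a small negative one.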
*.asm: normalize label format
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Prevent negative variance
Due to rounding, hbd variance may become negative. This commit adds a
check and clamps negative values to 0.
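A rough sketch of the clamp, assuming the usual sse - sum^2 / n form
of the variance. The name clamped_variance and its signature are
illustrative only; the actual hbd functions differ in how sse and sum
are scaled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: variance = sse - sum^2 / n, clamped at 0.
 * Assumes sse and sum were rounded independently beforehand, so
 * sum^2 / n can end up slightly larger than sse. */
static uint32_t clamped_variance(uint32_t sse, int32_t sum, uint32_t n) {
  int64_t var;
  assert(n > 0);
  /* Compute in 64 bits so the subtraction can go negative safely. */
  var = (int64_t)sse - ((int64_t)sum * sum) / (int64_t)n;
  return (var >= 0) ? (uint32_t)var : 0;
}
```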
configure: remove old visual studio support (<2010)
BUG=b/29583530
Conflicts:
configure
configure: restore vs_version variable
inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)
this prevents an empty CONFIG_VS_VERSION and avoids make failure
Require x86inc.asm
Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.
The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).
BUG=b:29583530
convolve_test: fix byte offsets in hbd build
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.
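To illustrate the difference, here is a sketch with stand-in macros.
The SKETCH_* names are hypothetical, and the shift-by-one pointer
tagging is an assumption about how the conversion works; only the
effect of parenthesizing (x) is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not the library's definitions. */
#define SKETCH_TO_SHORTPTR(x) ((uint16_t *)(((uintptr_t)(x)) << 1))
#define SKETCH_TO_BYTEPTR(x) ((uint8_t *)(((uintptr_t)(x)) >> 1))
/* Without (x), the cast binds to the first token of the argument only. */
#define SKETCH_TO_BYTEPTR_UNPAREN(x) ((uint8_t *)(((uintptr_t)x) >> 1))

/* Distance, in 16-bit elements, from base to the restored pointer. */
static ptrdiff_t element_offset(const uint8_t *tagged, const uint16_t *base) {
  const uintptr_t restored = (uintptr_t)SKETCH_TO_SHORTPTR(tagged);
  return (ptrdiff_t)((restored - (uintptr_t)base) / sizeof(uint16_t));
}

/* Offset applied after converting: off counts 16-bit elements. */
static ptrdiff_t offset_after_conversion(uint16_t *base, int off) {
  assert(((uintptr_t)base & 1) == 0 && off >= 0);
  return element_offset(SKETCH_TO_BYTEPTR(base) + off, base);
}

/* Offset inside the unparenthesized macro: the addition happens on the
 * integer address, so off is silently treated as a byte count. */
static ptrdiff_t offset_inside_unparenthesized(uint16_t *base, int off) {
  assert(((uintptr_t)base & 1) == 0 && off >= 0);
  return element_offset(SKETCH_TO_BYTEPTR_UNPAREN(base + off), base);
}
```

With an offset of 8, the first form lands 8 elements in, while the
unparenthesized form lands only 4 elements in.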
+ factorized some common expressions
Conflicts:
test/convolve_test.cc
vpx_dsp: remove x86inc.asm distinction
BUG=b:29583530
Conflicts:
vpx_dsp/vpx_dsp.mk
vpx_dsp/vpx_dsp_rtcd_defs.pl
vpx_dsp/x86/highbd_variance_sse2.c
vpx_dsp/x86/variance_sse2.c
test: remove x86inc.asm distinction
BUG=b:29583530
Conflicts:
test/vp9_subtract_test.cc
configure: remove x86inc.asm distinction
BUG=b:29583530
Change-Id: I59a1192142e89a6a36b906f65a491a734e603617
Update vpx subpixel 1d filter ssse3 asm
Speed tests show that the new vertical filters are slower on Celeron
Chromebooks. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical filter code is activated. For now, simply activate the
code that does not degrade on Celeron. Later there should be 2 sets of
vertical filter ssse3 functions, with a jump table choosing by CPU type.
Improve vpx_filter_block1d* by replacing paddsw+psrlw with pmulhrsw
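A scalar model of why one pmulhrsw can stand in for an add followed
by a shift. The 7-bit rounding shift and the multiplier 256 are
assumptions chosen for illustration, not values taken from the asm;
the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmulhrsw: (a * b + 0x4000) >> 15. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b) {
  const int32_t product = (int32_t)a * (int32_t)b;
  return (int16_t)((product + 0x4000) >> 15);
}

/* Add-then-shift rounding by 7 bits. Restricted to non-negative inputs
 * that cannot saturate, to keep the comparison simple. */
static int16_t round_shift7(int16_t x) {
  assert(x >= 0 && x <= INT16_MAX - 64);
  return (int16_t)((x + 64) >> 7);
}
```

Multiplying by 256 gives (256 * x + 16384) >> 15, which is the same
as (x + 64) >> 7, so pmulhrsw_lane(x, 256) matches round_shift7(x)
over that input range.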
Make set_reference control API work in VP9
Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
Conflicts:
examples.mk
examples/vpx_cx_set_ref.c
test/cx_set_ref.sh
vp9/decoder/vp9_decoder.c
deblock filter : moved from vp8 code branch
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.
vpx_thread.[hc]: update webp source reference
+ drop the blob hash; the new reference will be recorded in the
commit message
BUG=b/29583578
vpx_thread: use native windows cond var if available
BUG=b/29583578
original webp change:
commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 19:49:58 2015 -0800
thread: use native windows cond var if available
Vista / Server 2008 and up. no speed difference observed.
100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use InitializeCriticalSectionEx if available
BUG=b/29583578
original webp change:
commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:38:46 2015 -0800
thread: use InitializeCriticalSectionEx if available
Windows Vista / Server 2008 and up
100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use WaitForSingleObjectEx if available
BUG=b/29583578
original webp change:
commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:40:26 2015 -0800
thread: use WaitForSingleObjectEx if available
Windows XP and up
100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use CreateThread for windows phone
BUG=b/29583578
original webp change:
commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:41:26 2015 -0800
thread: use CreateThread for windows phone
_beginthreadex is unavailable for winrt/uwp
Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vp9_postproc.c missing extern.
BUG=webm:1256
deblock: missing const on extern const.
postproc - move filling of noise buffer to vpx_dsp.
Fix encoder crashes for odd size input
clean-up vp9_intrapred_test
remove tuple and overkill VP9IntraPredBase class.
postproc: noise style fixes.
gtest-all.cc: quiet an unused variable warning
under windows / mingw builds
vp9_intrapred_test: follow-up cleanup
address a few comments from ce050afaf3e288895c3bee4160336e2d2133b6ea
Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8
This commit cleans up the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.
Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.
Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
Updated sources according to improved version of common MSA macros.
Enabled idct MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.
Change-Id: I1f488ab2c741f6c622b7a855388a202168082209
Made minor restructuring/styling changes to the sources, such as generic
macro definitions, their use to reduce code lines, and better code
alignment.
Disabled all MSA hooks and tests
Change-Id: Ic6f2dce0b501f46b80c06c46c0fe2043d557b190
this macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
is used with calls to assembly, while generic c-code doesn't rely on it,
so in a c-only build without an alignment attribute the code will
function as expected.
Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
The rotation computation using 2X of cos(pi/16) has the potential to
overflow 32 bits; this commit disables the function to allow further
investigation and optimization.
Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
This reverts commit 7d07f512cd87446eef541e9af4af19b1e8c6342a.
this breaks visual studio builds:
'#' : invalid character : possibly the result of a macro expansion
Change-Id: I77170d549afb71e75a878fa0f6acd204fe8d9e67
The test filter is not a prefix matcher. It requires test type to
contain no more than the optimization type. In this example, SSSE3_64
fails to match and the test is not skipped even when SSSE3 is not
available.
Change-Id: Ia74229a167c88da4e6da169012a7a77d438c3f75
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation through the use of the configure
options: --enable-experimental --enable-emulate-hardware.
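As a sketch of the idea, a WRAPLOW-style wrap keeps only the low 16
bits of an intermediate value, the way a 16-bit hardware register
would. The function wrap_low16 below is a hypothetical stand-in and
not the macro's actual definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a WRAPLOW-style wrap: keep the low 16 bits
 * of x and sign-extend them, mimicking a 16-bit hardware register. */
static int32_t wrap_low16(int64_t x) {
  const uint16_t low = (uint16_t)((uint64_t)x & 0xFFFF);
  const int32_t result = (low & 0x8000) ? (int32_t)low - 0x10000
                                        : (int32_t)low;
  assert(result >= INT16_MIN && result <= INT16_MAX);
  return result;
}
```

Values already inside the int16_t range pass through unchanged; only
out-of-range intermediates wrap, which is where a C model without the
wrap would diverge from hardware.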
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also includes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
Adds various high bitdepth transform functions and tests.
Many of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform coefficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map to int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
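The mapping described above can be sketched as follows. The macro
name CONFIG_VP9_HIGHBITDEPTH is assumed to correspond to the
configure flag, and scale_coeff is a hypothetical helper showing an
intermediate stage widening before it narrows back.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: this macro mirrors the configure flag; off by default. */
#ifndef CONFIG_VP9_HIGHBITDEPTH
#define CONFIG_VP9_HIGHBITDEPTH 0
#endif

#if CONFIG_VP9_HIGHBITDEPTH
typedef int64_t tran_high_t; /* intermediate stages */
typedef int32_t tran_low_t;  /* final coefficients */
#else
typedef int32_t tran_high_t;
typedef int16_t tran_low_t;
#endif

/* Hypothetical helper: multiply by a Q14 factor in the wide type, then
 * round and narrow back to the coefficient type. */
static tran_low_t scale_coeff(tran_low_t coeff, int16_t factor_q14) {
  const tran_high_t wide = (tran_high_t)coeff * factor_q14;
  assert(factor_q14 >= 0);
  return (tran_low_t)((wide + (1 << 13)) >> 14);
}
```

Code written against the two typedefs compiles for either setting,
with the wide type always twice the size of the narrow one.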
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
used to wrap API functions to ensure full environment consistency as
opposed to the renamed ASM_REGISTER_STATE_CHECK which is used with
assembly functions.
currently checks the FPU tag word in x86/x86_64 gcc builds to ensure
emms has been called.
Change-Id: Ie241772dbf903d33d516a1add4c8c6783f2e1490
This commit enables unit test for SSSE3 16x16 inverse 2D-DCT with
10 non-zero coefficients. It includes a new test condition to
cover the potential overflow issue due to extremely coarse quantization.
Change-Id: I945e16f05dfbe19500f0da5f15990feba8e26d99
This commit enables the SSSE3 implementation of the inverse 2D-DCT
with only the first 10 coefficients non-zero. It reduces the runtime
of the SSE2 version from 745 cycles to 538 cycles, i.e., a 27% speed-up.
Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT can handle cases with eob below 12.
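A sketch of how a caller might pick a kernel from eob. The enum, the
function name, and the exact thresholds are assumptions for
illustration; the inclusive bound of 12 is inferred from the first 12
scan positions lying in the top-left 4x4 block.

```c
#include <assert.h>

/* Hypothetical kernel selector, not the codec's dispatch code. */
typedef enum {
  IDCT8X8_DC_ONLY, /* only the DC coefficient is non-zero */
  IDCT8X8_PARTIAL, /* all non-zero coefficients in the top-left 4x4 */
  IDCT8X8_FULL     /* general case */
} Idct8x8Kernel;

/* Assumed threshold: the first 12 scan positions are in the 4x4 block. */
enum { PARTIAL_EOB_LIMIT = 12 };

static Idct8x8Kernel choose_idct8x8_kernel(int eob) {
  assert(eob >= 1 && eob <= 64);
  if (eob == 1) return IDCT8X8_DC_ONLY;
  if (eob <= PARTIAL_EOB_LIMIT) return IDCT8X8_PARTIAL;
  return IDCT8X8_FULL;
}
```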
The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).
Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2