generic-library/vpx

Author	SHA1	Message	Date
Frank Galligan	ec1d8387e1	Add 64x64 sub_pel_variance Neon function On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334	2015-01-14 08:36:24 -08:00
Frank Galligan	74d40cd507	Add 64x variance Neon functions Add optimized Neon functions of: vp9_variance32x64 vp9_variance64x32 vp9_variance64x64 On Nexus 7 speed -5 and -6 saw about a 4% increase in perf. Speeds -7 and -8 saw about a 6% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa	2015-01-13 15:08:13 -08:00
Peter de Rivaz	48032bfcdb	Added sse2 acceleration for highbitdepth variance Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit `d7422b2b1e`) (cherry picked from commit `6d741e4d76`)	2014-11-14 15:18:53 -08:00
Scott LaVarnway	fe2cc873dc	VP8 encoder for ARMv8 by using NEON intrinsics 1 Add vp8_mse16x16_neon.c - vp8_mse16x16_neon - vp8_get4x4sse_cs_neon Change-Id: I108952f60a9ae50613f0ce3903c2c81df19d99d0 Signed-off-by: James Yu <james.yu@linaro.org>	2014-09-15 12:04:09 -07:00
Dmitry Kovalev	1f19ebbab6	Replacing vp9_get_mb_ss_sse2 asm implementation with intrinsics. Change-Id: Ib4f5dd733eb2939b108070a01e83da5d9990bac0	2014-09-06 00:10:25 -07:00
Dmitry Kovalev	202edb3d23	Actually resetting random generator for all variance test cases. Calling Reset(int) method instead of overloaded operator()(int). Adding underscore at the end of class member name. Change-Id: I01934e7bc056d4b594e5d05d693328febd34ac3c	2014-09-04 12:24:52 -07:00
Dmitry Kovalev	12cd6f421d	Removing variance MMX code. Removed functions: * vp9_mse16x16_mmx * vp9_get_mb_ss_mmx * vp9_get4x4var_mmx * vp9_get8x8var_mmx * vp9_variance4x4_mmx * vp9_variance8x8_mmx * vp9_variance16x16_mmx * vp9_variance16x8_mmx * vp9_variance8x16_mmx They all have SSE2 equivalent. Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615	2014-08-29 10:26:42 -07:00
levytamar82	69a5f5ecf7	Fix bug 807. In the sub_pixel_variance function the dst is aligned to 16 bytes and not to 32 bytes - now load unaligned data. Change-Id: I2e0b9745543697efc56fefa32857ea10117af135	2014-08-07 18:51:02 -07:00
Scott LaVarnway	98165ec074	Neon version of vp9_sub_pixel_variance8x8(), vp9_variance8x8(), and vp9_get8x8var(). On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~1.2%. Change-Id: I8a66ac2a0f550b407caa27816833bdc563395102	2014-08-01 11:35:55 -07:00
Scott LaVarnway	d39448e2d4	Neon version of vp9_sub_pixel_variance32x32(), vp9_variance32x32(), and vp9_get32x32var(). Change-Id: I8137e2540e50984744da59ae3a41e94f8af4a548	2014-07-31 08:00:36 -07:00
Scott LaVarnway	521cf7e879	Neon version of vp9_sub_pixel_variance16x16(), vp9_variance16x16(), and vp9_get16x16var(). On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~16.7%. Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb	2014-07-30 08:17:32 -07:00
Yunqing Wang	5c93e62e0a	Allocate aligned source in variance test The source buffer is an aligned buffer in VP9. Added the alignment to make it consistent with libvpx. Change-Id: I3ebb9d2e8555ed532951da479dd5cbbb8812e02d	2014-07-24 17:11:58 -07:00
James Zern	29e1b1a4b0	tests: add API_REGISTER_STATE_CHECK used to wrap API functions to ensure full environment consistency as opposed to the renamed ASM_REGISTER_STATE_CHECK which is used with assembly functions. currently checks the FPU tag word in x86/x86_64 gcc builds to ensure emms has been called. Change-Id: Ie241772dbf903d33d516a1add4c8c6783f2e1490	2014-07-10 12:40:31 -07:00
James Zern	520cb3f39f	vp9_sub_pixel_variance: disable avx2 variants tests failing under Win32/Win64 + variance_test: add missing avx2 functions (partially disabled) Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d	2014-06-10 16:11:15 -07:00
James Zern	6e5e75fa21	Revert "Removing redundant variables from variance_test.cc." This reverts commit `4725ab7e51`. The constants are necessary to avoid breakage in vs9 builds: warning C4180: qualifier applied to function type has no meaning; ignored error C2436: 'f2_' : member function or nested class in constructor initializer list while compiling class template member function 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>::tuple(const int &,const int &,unsigned int (__cdecl &))' ..\test\variance_test.cc : see reference to class template instantiation 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>' being compiled Change-Id: Ia218b74fc473d40f02fee84cb7009adfbe82e5a7	2014-05-08 14:35:40 -07:00
Dmitry Kovalev	4725ab7e51	Removing redundant variables from variance_test.cc. Change-Id: Icd44bce1c9d292f6e6f4d5157b694f6170b7b289	2014-05-07 14:40:21 -07:00
James Zern	d5e07a8451	variance_test: add NEON functions note not all functions have NEON implementations: - variance4x4_neon Change-Id: I03c1ba21f3b02aa2482d7ca8feedc3ef74b5947f	2014-02-26 19:25:02 -08:00
James Zern	002ad40897	test/: remove unnecessary extern "C"s Change-Id: I826655a708010149de231ca31a2e3ba4f1842c0c	2014-01-23 19:42:59 -08:00
James Zern	a0fcbcfa5f	fix vp8-only build Change-Id: Id9ce44f3364dd57b30ea491d956a2a0d6186be05	2013-09-17 18:47:25 -07:00
Yaowu Xu	afffa3d9b0	cleanup cpplint warnings Suggested by James Zern to clear out cpplint warnings for all unit test code. Change-Id: I731a3fa4d2a257eb9ef733426ba84286fbd7ea34	2013-09-06 10:13:49 -07:00
Jim Bankoski	5b307886fb	variance x86inc guards also fixed bug in sad calcs Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d	2013-08-06 14:17:13 -07:00
James Zern	e247ab09a6	variance_test: add missing ClearSystemState... ...to recently added SubpelVarianceTest Change-Id: I8775e39fd5dbfba81ad42b79b47bf6dd6ca8cc0e	2013-06-26 18:32:21 -07:00
Ronald S. Bultje	ac6ea2ab91	Allocate memory using appropriate expected alignment in unit tests. Fixes crashes of test_libvpx on 32-bit Linux. Change-Id: If94e7628a86b788ca26c004861dee2f162e47ed6	2013-06-21 17:03:57 -07:00
James Zern	cc774c8bb0	variance_test: use REGISTER_STATE_CHECK Change-Id: Id54ad9a781634f075e990d5bade5be8490959975	2013-06-21 14:30:08 -07:00
Ronald S. Bultje	1e6a32f1af	SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to 3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't perfectly interleaved, and can probably be improved further in the future. I've marked this with a few TODOs/FIXMEs in the code. Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9	2013-06-20 15:59:48 -07:00
Ronald S. Bultje	8fb6c58191	Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available): sse2 4x4: 99 -> 82 cycles sse2 4x8: 128 cycles sse2 8x4: 121 cycles sse2 8x8: 149 -> 129 cycles sse2 8x16: 235 -> 245 cycles (?) sse2 16x8: 269 -> 203 cycles sse2 16x16: 441 -> 349 cycles sse2 16x32: 641 cycles sse2 32x16: 643 cycles sse2 32x32: 1733 -> 1154 cycles sse2 32x64: 2247 cycles sse2 64x32: 2323 cycles sse2 64x64: 6984 -> 4442 cycles ssse3 4x4: 100 cycles (?) ssse3 4x8: 103 cycles ssse3 8x4: 71 cycles ssse3 8x8: 147 cycles ssse3 8x16: 158 cycles ssse3 16x8: 188 -> 162 cycles ssse3 16x16: 316 -> 273 cycles ssse3 16x32: 535 cycles ssse3 32x16: 564 cycles ssse3 32x32: 973 cycles ssse3 32x64: 1930 cycles ssse3 64x32: 1922 cycles ssse3 64x64: 3760 cycles Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d	2013-06-20 09:34:25 -07:00
James Zern	5b756748fd	tests: clear system state after non-API calls add ClearSystemState() to reset MMX registers avoiding corrupting subsequent tests. Change-Id: I668deb09aa7aa467709776e5819f936910698bc0	2013-06-18 11:32:27 -07:00
Yunqing Wang	f4fcfe3075	Optimize variance functions Added SSE2 version of variance functions for super blocks. Change-Id: Ibeaae8771ca21c99d41dd74067574a51e97b412d	2013-05-22 10:29:38 -07:00
James Zern	1711cf2dbb	add vp8 variance test Change-Id: I4e94ee2c4e2360d6a11a454c323f2899c1bb6f72	2013-02-22 16:25:14 -08:00
John Koleszar	fcccbcbb39	Add vp9_ prefix to all vp9 files Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc	2012-11-27 14:12:30 -08:00
John Koleszar	a9c7597adc	support building vp8 and vp9 into a single lib Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d	2012-11-15 10:46:17 -08:00
James Zern	984734436d	Fix variance (signed integer) overflow In the variance calculations the difference is summed and later squared. When the sum exceeds sqrt(2^31) the value is treated as a negative when it is shifted which gives incorrect results. To fix this we force the multiplication to be unsigned. The alternative fix is to shift sum down by 4 before multiplying. However that will reduce precision. For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and change). This change is based on: `1698234` Missed some variance casts `fea3556` Fix variance overflow Change-Id: I2c61856cca9db54b9b81de83b4505ea81a050a0f	2012-11-06 23:06:44 -08:00
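
The overflow described in commit `984734436d` can be illustrated with a minimal C sketch. It is not the libvpx source: the function names `variance16x16_sketch` and `variance16x16_ref` are hypothetical, and the formula `sse - (sum * sum >> 8)` is assumed from the commit message's description of a 16x16 block (256 pixels, hence the shift by 8). The point is only that casting to unsigned before the multiply keeps the product within 32 bits, since 65280 squared is below 2^32 even though it exceeds 2^31.

```c
#include <stdint.h>

/* Hypothetical helper, not the libvpx function: variance of a 16x16 block
 * given the sum of pixel differences and the sum of squared differences.
 * Casting before the multiply forces unsigned 32-bit arithmetic, so a sum
 * with magnitude above 46340 no longer overflows a signed int. */
static unsigned int variance16x16_sketch(int sum, unsigned int sse) {
  return sse - (((unsigned int)sum * (unsigned int)sum) >> 8);
}

/* Reference computed in 64-bit arithmetic, used only for comparison. */
static unsigned int variance16x16_ref(int sum, unsigned int sse) {
  int64_t sq = (int64_t)sum * (int64_t)sum;
  return (unsigned int)((uint64_t)sse - (uint64_t)(sq >> 8));
}
```

For the worst case named in the commit message, a constant difference of 255 over all 256 pixels gives a sum of 65280 and a sum of squares of 16646400, and both helpers return a variance of 0. A negative sum of the same magnitude gives the same result, because the unsigned product wraps modulo 2^32 to the same value as the square of its magnitude.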

32 Commits