Johann
c3bdffb0a5
Move variance functions to vpx_dsp
...
subpel functions will be moved in another patch.
Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
2015-05-26 12:01:52 -07:00
James Zern
43d5cc7fe1
vp9_variance_sse2: sync function signatures
...
+ include vp9_rtcd.h
silences missing prototype warnings
Change-Id: I77902f07a454029baad4fe5fe6fc37c65644e6f7
2015-05-15 10:43:47 -07:00
Yunqing Wang
10d5e09c87
Fix issues in 32bit PIC enabled build
...
This patch was to fix issue 924:
https://code.google.com/p/webm/issues/detail?id=924
The SECTION_RODATA macro was modified to support macho32 format.
The sub-pixel functions were modified to pass in 2 more parameters
to handle the global offsets for PIC build.
Change-Id: I3bfcd336bcae945edf300bca4ab40376a2628cd4
2015-01-27 22:20:21 -08:00
Dmitry Kovalev
1f19ebbab6
Replacing vp9_get_mb_ss_sse2 asm implementation with intrinsics.
...
Change-Id: Ib4f5dd733eb2939b108070a01e83da5d9990bac0
2014-09-06 00:10:25 -07:00
Dmitry Kovalev
48197f0a70
Adding sse2 variant for vp9_mse{8x8, 8x16, 16x8}.
...
Change-Id: I6786d25ce4f32b8d8912f2d239a45ca15b310c4b
2014-09-03 19:02:14 -07:00
Dmitry Kovalev
6f6bd282c9
Replacing asm 16x16 variance calculation with intrinsics.
...
New code is 20% faster for 64-bit and 15% faster for 32-bit. Compiled
using clang.
Change-Id: Icfea461238411001fd093561293dbfedfbf8d0bb
2014-09-02 13:54:34 -07:00
Dmitry Kovalev
0b721db543
Replacing asm 8x8 variance calculation with intrinsics.
...
New code is 10% faster for 64-bit and 25% faster for 32-bit. Compiled
using clang.
Change-Id: I8ba1544c30dd6f3ca479db806384317549650dfc
2014-08-29 17:28:31 -07:00
Dmitry Kovalev
dcac083cf3
Implementing 4x4 variance calculation with SSE2.
...
New SSE2 function is three times faster than MMX one.
Change-Id: I4f387ce9f75b88379176ec7bdc62d86eb5f70fbe
2014-08-28 15:01:16 -07:00
Dmitry Kovalev
a789bfec87
Cleaning up vp9_variance_sse2.c.
...
Change-Id: I5ec336848f6489c31cf2b645026fa2025db07466
2014-05-27 13:53:19 -07:00
Dmitry Kovalev
72ab966d5e
Removing vp9_pragmas.h.
...
Change-Id: I9120a87e27e73e496932d11716937e2fad246521
2014-05-22 13:46:31 -07:00
Dmitry Kovalev
94f5491c46
Removing half-variance asm functions which are not used.
...
Corresponding C functions were removed in
I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3
Change-Id: I50a5575065a7a9e41904eb2161afd739def927db
2014-04-30 12:21:54 -07:00
Dmitry Kovalev
6e01079cc0
Removing unused vp9_variance_halfpixvar*() functions.
...
Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3
2014-04-25 11:50:07 -07:00
Yaowu Xu
5511968f21
Removed several unused functions.
...
Change-Id: Ib9e27298c575afc02a98b593bc6ad60762064d9b
2014-03-17 14:09:29 -07:00
Jim Bankoski
25ecb1f0b3
cpplint vp9_variance_sse2.c
...
Change-Id: Ifce8f5b57a1ea8952e8a67c5b92a127a061899fa
2013-10-04 14:15:06 -07:00
Jim Bankoski
5b307886fb
variance x86inc guards
...
also fixed bug in sad calcs
Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d
2013-08-06 14:17:13 -07:00
Ronald S. Bultje
1e6a32f1af
SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance().
...
Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
perfectly interleaved, and can probably be improved further in the
future. I've marked this with a few TODOs/FIXMEs in the code.
Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
2013-06-20 15:59:48 -07:00
Ronald S. Bultje
8fb6c58191
Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
...
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):
sse2 4x4: 99 -> 82 cycles
sse2 4x8: 128 cycles
sse2 8x4: 121 cycles
sse2 8x8: 149 -> 129 cycles
sse2 8x16: 235 -> 245 cycles (?)
sse2 16x8: 269 -> 203 cycles
sse2 16x16: 441 -> 349 cycles
sse2 16x32: 641 cycles
sse2 32x16: 643 cycles
sse2 32x32: 1733 -> 1154 cycles
sse2 32x64: 2247 cycles
sse2 64x32: 2323 cycles
sse2 64x64: 6984 -> 4442 cycles
ssse3 4x4: 100 cycles (?)
ssse3 4x8: 103 cycles
ssse3 8x4: 71 cycles
ssse3 8x8: 147 cycles
ssse3 8x16: 158 cycles
ssse3 16x8: 188 -> 162 cycles
ssse3 16x16: 316 -> 273 cycles
ssse3 16x32: 535 cycles
ssse3 32x16: 564 cycles
ssse3 32x32: 973 cycles
ssse3 32x64: 1930 cycles
ssse3 64x32: 1922 cycles
ssse3 64x64: 3760 cycles
Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-20 09:34:25 -07:00
Yunqing Wang
f4fcfe3075
Optimize variance functions
...
Added SSE2 version of variance functions for super blocks.
Change-Id: Ibeaae8771ca21c99d41dd74067574a51e97b412d
2013-05-22 10:29:38 -07:00
Ronald S. Bultje
a788e0fe63
Add sse2 versions of sub_pixel_variance{32x32,64x64}.
...
7.5% faster overall encoding.
Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f
2013-02-06 11:20:59 -08:00
John Koleszar
4a4d2aa55c
vp9_bilinear_filters_mmx: add missing extern specifiers
...
Change-Id: Ibabf18947f90cb4f45052763ebf44cfb8209bd8b
2012-12-05 08:27:48 -08:00
John Koleszar
fcccbcbb39
Add vp9_ prefix to all vp9 files
...
Support for gyp which doesn't support multiple objects in the same
static library having the same basename.
Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
2012-11-27 14:12:30 -08:00