Commit Graph

3 Commits

Author SHA1 Message Date
James Zern
caecedc92f vp9_subpixel_8t_intrin_avx2: fix build w/clang 3.4+
clang reports itself as gcc-4.2.1 (e.g., in 3.3 and 3.4); add a specific
clang version check for _mm256_broadcastsi128_si256

fixes issue #720

Change-Id: I5c8e3c27fdea05d8a5b050e8cb74894b595f4709
2014-03-06 10:55:44 -08:00
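
The version check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names version_at_least and clang_34_or_newer are hypothetical. It relies only on the fact that clang defines __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ as 4, 2 and 1 for compatibility, so a gcc-style version test cannot distinguish clang releases and __clang_major__ / __clang_minor__ have to be consulted first.

```c
/* Hypothetical helper: true when (major, minor) >= (want_major, want_minor). */
static int version_at_least(int major, int minor,
                            int want_major, int want_minor) {
  return major > want_major ||
         (major == want_major && minor >= want_minor);
}

/*
 * Hypothetical helper: returns 1 when built with clang 3.4 or newer.
 * __clang__ is tested before any __GNUC__ check because clang also
 * defines the __GNUC__ macros (as 4.2.1), which would otherwise make
 * every clang release look like the same old gcc.
 */
static int clang_34_or_newer(void) {
#if defined(__clang__)
  return version_at_least(__clang_major__, __clang_minor__, 3, 4);
#else
  return 0; /* not clang: fall through to a gcc or other compiler check */
#endif
}
```

In real code the same condition would normally live in an #if directive that selects the intrinsic spelling at compile time; the function form here only makes the comparison easy to read and test.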
James Zern
a96af49bab vp9_subpixel_8t_intrin_avx2.c: make some tables static
+ fix formatting

Change-Id: Ia62610bff3d63855104366d7860749b6a3cf4577
2014-02-18 20:40:40 -08:00
levytamar82
876c72a093 AVX2 Convolve Optimization
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_h8 was optimized for AVX2 by reducing the number of
loop strides by half; two strides are processed in parallel.
vp9_filter_block1d16_v8 was optimized in the same way; in addition, some of
the loads are now done outside of the loop, which prevents redundant
loads.
This optimization gives a 43% function-level gain and a 1.3% user-level gain.
The code can now also be compiled on Windows.

Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
2014-02-12 20:45:31 -07:00
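
The two ideas in this commit message (handling two strides per loop iteration and moving loads out of the loop) can be illustrated with a scalar sketch. This is not the AVX2 code from the commit: it uses plain C, and the names filter_v8_ref, filter_v8_two_rows and clip_round are hypothetical. It assumes an 8-tap filter whose coefficients sum to 128 with a rounding shift of 7 bits, an even number of output rows, and a source buffer holding rows + 7 input rows.

```c
#include <stdint.h>

#define TAPS 8

/* Round, shift by 7 bits and clamp to the 8-bit pixel range. */
static uint8_t clip_round(int sum) {
  int v;
  if (sum < 0) return 0;
  v = sum >> 7;
  return (uint8_t)(v > 255 ? 255 : v);
}

/* Reference: one output row per iteration, all 8 inputs reloaded each time. */
static void filter_v8_ref(const uint8_t *src, int src_stride, uint8_t *dst,
                          int dst_stride, int width, int rows,
                          const int16_t *filter) {
  for (int y = 0; y < rows; ++y) {
    for (int x = 0; x < width; ++x) {
      int sum = 64; /* rounding term for the 7-bit shift */
      for (int k = 0; k < TAPS; ++k)
        sum += src[(y + k) * src_stride + x] * filter[k];
      dst[y * dst_stride + x] = clip_round(sum);
    }
  }
}

/* Two output rows per iteration; rows must be even. */
static void filter_v8_two_rows(const uint8_t *src, int src_stride,
                               uint8_t *dst, int dst_stride, int width,
                               int rows, const int16_t *filter) {
  for (int x = 0; x < width; ++x) {
    int win[TAPS + 1];
    /* Loads hoisted out of the row loop: the first 7 input rows. */
    for (int k = 0; k < TAPS - 1; ++k) win[k] = src[k * src_stride + x];
    for (int y = 0; y < rows; y += 2) {
      int sum0 = 64, sum1 = 64;
      /* Only two new input rows are loaded per iteration. */
      win[7] = src[(y + 7) * src_stride + x];
      win[8] = src[(y + 8) * src_stride + x];
      for (int k = 0; k < TAPS; ++k) {
        sum0 += win[k] * filter[k];     /* output row y */
        sum1 += win[k + 1] * filter[k]; /* output row y + 1 */
      }
      dst[y * dst_stride + x] = clip_round(sum0);
      dst[(y + 1) * dst_stride + x] = clip_round(sum1);
      /* Slide the window down by two rows, reusing earlier loads. */
      for (int k = 0; k < TAPS - 1; ++k) win[k] = win[k + 2];
    }
  }
}
```

Both functions produce the same output; the second runs the row loop half as many times and loads each input pixel once instead of eight times. An AVX2 version would apply the same restructuring to whole vectors of pixels rather than to single columns.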