vpx/vp9/common/x86
levytamar82 8f9d94ec17 SSSE3 Optimization for Atom processors using new instruction selection and ordering
The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction, which has a 3-cycle latency and slows execution when issued in blocks of 5 or more on Atom processors.
Replacing the PSHUFB instructions with more efficient single-cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR) improves performance.
In the original code, the PSHUFB mask uses every byte, and the bytes are copied consecutively.
The same result is produced more efficiently by PUNPCKLBW and PUNPCKHBW, with PALIGNR concatenating the intermediate results and shifting them right to bring in the next consecutive 16 bytes as the final result.

For example:
filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7
REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15
PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
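
The example above can be checked with a small scalar model in C. This is only a sketch: the vec16 type and the model_* helpers are hypothetical names introduced here to mimic the documented byte-level behavior of the four instructions. It is not the assembly from vp9_subpixel_8t_ssse3.asm and says nothing about cycle counts.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte register type, used only for this illustration. */
typedef struct { uint8_t b[16]; } vec16;

/* PSHUFB dst, mask: result byte i is dst[mask[i] & 0x0F], or 0 if mask bit 7 is set. */
vec16 model_pshufb(vec16 dst, vec16 mask) {
  vec16 out;
  for (int i = 0; i < 16; ++i)
    out.b[i] = (mask.b[i] & 0x80) ? 0 : dst.b[mask.b[i] & 0x0F];
  return out;
}

/* PUNPCKLBW dst, src: interleave the low 8 bytes of dst and src. */
vec16 model_punpcklbw(vec16 dst, vec16 src) {
  vec16 out;
  for (int i = 0; i < 8; ++i) {
    out.b[2 * i] = dst.b[i];
    out.b[2 * i + 1] = src.b[i];
  }
  return out;
}

/* PUNPCKHBW dst, src: interleave the high 8 bytes of dst and src. */
vec16 model_punpckhbw(vec16 dst, vec16 src) {
  vec16 out;
  for (int i = 0; i < 8; ++i) {
    out.b[2 * i] = dst.b[8 + i];
    out.b[2 * i + 1] = src.b[8 + i];
  }
  return out;
}

/* PALIGNR dst, src, imm: concatenate dst (high) with src (low), shift the
   32-byte value right by imm bytes, and keep the low 16 bytes. */
vec16 model_palignr(vec16 dst, vec16 src, unsigned imm) {
  uint8_t cat[32];
  vec16 out;
  memcpy(cat, src.b, 16);
  memcpy(cat + 16, dst.b, 16);
  for (unsigned i = 0; i < 16; ++i)
    out.b[i] = (i + imm < 32) ? cat[i + imm] : 0;
  return out;
}

/* The replacement sequence from the example: two unpacks, then PALIGNR by 1. */
vec16 shuffle_via_unpack(vec16 reg) {
  vec16 reg1 = model_punpcklbw(reg, reg);
  vec16 reg2 = model_punpckhbw(reg, reg);
  return model_palignr(reg2, reg1, 1);
}

/* Returns 1 when PSHUFB with the example filter mask and the replacement
   sequence produce the same 16 bytes for the given input. */
int paths_agree(vec16 reg) {
  static const uint8_t filter[16] = {0, 1, 1, 2, 2, 3, 3, 4,
                                     4, 5, 5, 6, 6, 7, 7, 8};
  vec16 mask;
  memcpy(mask.b, filter, 16);
  vec16 a = model_pshufb(reg, mask);
  vec16 b = shuffle_via_unpack(reg);
  return memcmp(a.b, b.b, 16) == 0;
}
```

Calling paths_agree with any 16-byte input compares the two paths. Other shuffle masks in the function would need their own PALIGNR shift amounts, which this sketch does not cover.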

This optimization improved the function's performance by 23% and produced a 3% user-level gain on 1080p content on Atom processors.
As expected, no performance impact was observed on Core processors.

Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0
2014-12-08 13:11:01 -07:00
..
vp9_asm_stubs.c Rename highbitdepth functions to use highbd prefix 2014-10-09 14:40:40 -07:00
vp9_copy_sse2.asm Fix encoder uninitialized read errors reported by drmemory 2014-04-09 09:59:15 -07:00
vp9_high_intrapred_sse2.asm Rename highbitdepth functions to use highbd prefix 2014-10-09 14:40:40 -07:00
vp9_high_loopfilter_intrin_sse2.c Rename highbitdepth functions to use highbd prefix 2014-10-09 14:40:40 -07:00
vp9_high_subpixel_8t_sse2.asm Rename highbitdepth functions to use highbd prefix 2014-10-09 14:40:40 -07:00
vp9_high_subpixel_bilinear_sse2.asm Rename highbitdepth functions to use highbd prefix 2014-10-09 14:40:40 -07:00
vp9_idct_intrin_sse2.c Added high bitdepth sse2 transform functions 2014-12-02 11:16:24 -08:00
vp9_idct_intrin_sse2.h Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs 2014-05-28 10:53:33 -07:00
vp9_idct_intrin_ssse3.c Fix visual studio 2013 compiler warnings 2014-11-05 13:47:28 -08:00
vp9_idct_ssse3_x86_64.asm Renames x86_64 specific asm files 2014-05-21 13:55:56 -07:00
vp9_intrapred_sse2.asm Fix x86inc.asm to build PIC code correctly 2013-09-18 13:45:46 -07:00
vp9_intrapred_ssse3.asm vp9 ssse3 d207_predictor_32x32: add missing GLOBAL() 2013-11-01 20:33:22 -07:00
vp9_loopfilter_intrin_avx2.c WORKAROUND FIX FOR GCC4.9.1 2014-11-01 11:27:28 -07:00
vp9_loopfilter_intrin_sse2.c FIX: vp9_loopfilter_intrin_sse2.c 2014-09-18 13:09:13 -07:00
vp9_loopfilter_mmx.asm minor spelling cleanup in comments 2014-02-12 16:32:51 -08:00
vp9_postproc_sse2.asm Use lrand48 on Android 2014-06-12 19:57:25 -07:00
vp9_subpixel_8t_intrin_avx2.c Fix bug 804 2014-08-07 15:09:24 -07:00
vp9_subpixel_8t_intrin_ssse3.c Fix decoder mismatch in sub-pixel SSSE3 intrinsic filters 2014-05-23 11:52:20 -07:00
vp9_subpixel_8t_sse2.asm SSE2 8-tap sub-pixel filter optimization 2013-10-10 14:12:47 -07:00
vp9_subpixel_8t_ssse3.asm SSSE3 Optimization for Atom processors using new instruction selection and ordering 2014-12-08 13:11:01 -07:00
vp9_subpixel_bilinear_sse2.asm Optimize bilinear sub-pixel filters in sse2 2014-02-03 10:34:45 -08:00
vp9_subpixel_bilinear_ssse3.asm Optimize bilinear sub-pixel filters in ssse3 2014-02-04 08:01:55 -08:00