52dac5d1cb
Optimize two functions to process 32 elements in parallel instead of 16: 1. vp9_sub_pixel_variance64x64 2. vp9_sub_pixel_variance32x32. Both functions previously called vp9_sub_pixel_variance16xh_ssse3; they now call vp9_sub_pixel_variance32xh_avx2, which is written in AVX2 and processes 32 elements in parallel. This optimization gave a 70% function-level gain and a 2% user-level gain. Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde |
||
---|---|---|
common | ||
decoder | ||
encoder | ||
exports_dec | ||
exports_enc | ||
vp9_common.mk | ||
vp9_cx_iface.c | ||
vp9_dx_iface.c | ||
vp9_iface_common.h | ||
vp9cx.mk | ||
vp9dx.mk |
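
The restructuring described in the commit message can be illustrated with a minimal scalar sketch. It is not the code from this commit: the helper names `strip_sum_sse` and `variance64x64_strips` are hypothetical, the sub-pixel filtering step is omitted, and a plain C loop stands in for the SSSE3 and AVX2 kernels. It only shows why a 64-wide block can be covered by two 32-wide strips instead of four 16-wide strips while producing the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a "WxH strip" kernel. It accumulates the
 * sum of differences and the sum of squared differences over a strip that is
 * w pixels wide and h rows tall. */
static void strip_sum_sse(const uint8_t *src, int src_stride,
                          const uint8_t *ref, int ref_stride,
                          int w, int h, int *sum, unsigned int *sse) {
  int s = 0;
  unsigned int q = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = src[r * src_stride + c] - ref[r * ref_stride + c];
      s += d;
      q += (unsigned int)(d * d);
    }
  }
  *sum = s;
  *sse = q;
}

/* Hypothetical 64x64 wrapper: walks the block in vertical strips of width
 * strip_w, adds up the per-strip totals, then applies
 * variance = sse - sum^2 / (64 * 64). */
static unsigned int variance64x64_strips(const uint8_t *src, int src_stride,
                                         const uint8_t *ref, int ref_stride,
                                         int strip_w, unsigned int *sse_out) {
  int sum = 0;
  unsigned int sse = 0;
  assert(strip_w > 0 && 64 % strip_w == 0);
  for (int x = 0; x < 64; x += strip_w) {
    int s;
    unsigned int q;
    strip_sum_sse(src + x, src_stride, ref + x, ref_stride, strip_w, 64, &s, &q);
    sum += s;
    sse += q;
  }
  *sse_out = sse;
  /* 64 * 64 = 4096 = 2^12, so the division is a right shift by 12. */
  return sse - (unsigned int)(((int64_t)sum * sum) >> 12);
}
```

Calling `variance64x64_strips` with `strip_w` set to 16 makes four strip calls and with 32 makes two; both return the same variance and SSE. Any speedup in the real code would come from the wider vector kernel doing more work per call, which this scalar sketch does not model.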