Rename updated version of x86inc.asm
Use "private_prefix" instead of "program_name" and make vpx the default
prefix.
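
As a rough illustration of what a configurable symbol prefix does, here is a
C preprocessor analogue. x86inc.asm itself does this with assembler macros;
the macro names PRIVATE_PREFIX, PASTE and MANGLE and the function clamp_u8
below are hypothetical and are not taken from x86inc.asm.

```c
/* Default prefix; a build could override it with -DPRIVATE_PREFIX=other. */
#ifndef PRIVATE_PREFIX
#define PRIVATE_PREFIX vpx
#endif

/* Two-level paste so PRIVATE_PREFIX is expanded before ## is applied. */
#define PASTE_(a, b) a##_##b
#define PASTE(a, b) PASTE_(a, b)
#define MANGLE(name) PASTE(PRIVATE_PREFIX, name)

/* With the default prefix this defines the symbol vpx_clamp_u8. */
int MANGLE(clamp_u8)(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }
```

With the default in place, callers refer to vpx_clamp_u8 directly; changing
the prefix renames every symbol defined through MANGLE in one step.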
Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458
This reverts commit a5e97d874b.
Additionally:
Revert "vpx_convolve_copy_sse2: fix win64"
This reverts commit 22a8474fe7.
The reverted change performs poorly on various x86_64 devices, reducing
performance by 1-3% at 1080p. Results on Chromebook-like devices were
mixed, from neutral to slightly negative, so the revert should make
minimal difference there.
Change-Id: I95831233b4b84ee96369baa192a2d4cc7639658c
This function suffers from a couple of problems on small cores (tablets):
- The load of the next iteration is blocked by the store of the previous iteration.
- 4K aliasing (between a future store and older loads).
- Current small-core machines are in-order machines, so the store will spin the rehabQ until the load is finished.
Fixed by:
- Prefetching 2 lines ahead.
- Unrolling the copy to 2 rows of the block per iteration.
- Pre-loading all xmm registers before the loop, with the final stores after the loop.
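
The restructuring listed above can be sketched in C with SSE2 intrinsics.
This is only an illustration, not the assembly from this change: it assumes
an x86 target with SSE2, a block exactly 16 bytes wide, and an even row
count, and the name copy_w16_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128 */
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_T0 */

/* Copy an h-row, 16-byte-wide block, two rows per iteration. Loads for the
 * next pair of rows are issued before the stores of the current pair. */
static void copy_w16_sketch(const uint8_t *src, ptrdiff_t src_stride,
                            uint8_t *dst, ptrdiff_t dst_stride, int h) {
  assert(h >= 2 && h % 2 == 0);

  /* Pre-load the first two rows before entering the loop. */
  __m128i r0 = _mm_loadu_si128((const __m128i *)src);
  __m128i r1 = _mm_loadu_si128((const __m128i *)(src + src_stride));
  src += 2 * src_stride;

  for (int y = 2; y < h; y += 2) {
    /* Prefetch the two rows after the pair loaded in this iteration. */
    if (y + 2 < h) {
      _mm_prefetch((const char *)(src + 2 * src_stride), _MM_HINT_T0);
      _mm_prefetch((const char *)(src + 3 * src_stride), _MM_HINT_T0);
    }

    /* Load the next two rows first ... */
    __m128i n0 = _mm_loadu_si128((const __m128i *)src);
    __m128i n1 = _mm_loadu_si128((const __m128i *)(src + src_stride));

    /* ... then store the two rows loaded earlier. */
    _mm_storeu_si128((__m128i *)dst, r0);
    _mm_storeu_si128((__m128i *)(dst + dst_stride), r1);

    src += 2 * src_stride;
    dst += 2 * dst_stride;
    r0 = n0;
    r1 = n1;
  }

  /* Final stores after the loop. */
  _mm_storeu_si128((__m128i *)dst, r0);
  _mm_storeu_si128((__m128i *)(dst + dst_stride), r1);
}
```

Whether the compiler keeps this load/store order is not guaranteed, which is
one reason to write such a routine in assembly.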
The function's performance improves by:
copy_convolve_sse2 64x64 - 16%
copy_convolve_sse2 32x32 - 52%
copy_convolve_sse2 16x16 - 6%
copy_convolve_sse2 8x8 - 2.5%
copy_convolve_sse2 4x4 - 2.7%
Credit goes to Tom Craver (tom.r.craver@intel.com) and Ilya Albrekht (ilya.albrekht@intel.com).
Change-Id: I63d3428799c50b2bf7b5677c8268bacb9fc29671
In essence, this change refactors the code for both the interpolation
filtering and the convolution. It moves all of the affected files and
renames the code from the vp9_ prefix to the vpx_ prefix accordingly,
for the following architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/dspr2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46