8 Commits

Author SHA1 Message Date
Sindre Aamås
1995e03d91 [Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.

Fall back to a generic approach for ratios > 4. Note that the generic
approach can be backported to SSE2.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied; which AFAICT is safe with
the current usage of these routines.

Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2,
~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios
> 4 when not memory-bound on Haswell as compared with the current SSE2
implementation.
2016-05-23 20:23:31 +02:00
Guangwei Wang
64657d3cfd add new c and assembly functions to optimize downsampler when downscale equal 1:3/1:4 2015-09-11 16:45:40 +08:00
Martin Storsjö
51efa57a3d Convert tabs to spaces in vertically aligned code 2015-06-10 10:21:29 +03:00
Martin Storsjö
43767cddb6 Remove tabs from commented out code 2015-06-10 10:21:21 +03:00
Sijia Chen
4e89e71e8f reformat cpp files for unit tests 2014-10-23 17:54:33 +08:00
ruil2
3ff145e839 rename namespace and funciton name to avoid conflicts with old library 2014-09-17 15:50:59 +08:00
Martin Storsjö
c3710c4130 Avoid warnings about comparison between signed and unsigned
There's no need for these variables to be explicitly unsigned.
This also matches the function above.
2014-08-28 11:13:38 +03:00
zhiliang wang
93af7bfc64 Add UT for Downsample functions. 2014-08-27 15:40:14 +08:00