3f31aff4dc
Use a combination of table lookups and pshufb to convert coefficients to zero run/level format. Two 16-entry lookup tables are used, for a total of 192 bytes of tables. (The existing SSE2 version uses a 2048-byte table.) Speedup is ~1.5x-3x compared with the SSE2 version on Haswell; the speedup is greater for input with many trailing zeros. Because it uses popcnt, this version requires SSE4.2. The popcnt could be replaced with a small LUT and accumulation, which would reduce the requirement to SSSE3.

| Name |
|---|
| api |
| build |
| common |
| decoder |
| encoder |
| encoder_binary_comparison |
| processing |
| utils |
| BaseDecoderTest.h |
| BaseEncoderTest.h |
| sha1.h |
| test_stdint.h |
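
For readers unfamiliar with the terms in the commit message, here is a minimal scalar sketch in C of what a zero run/level conversion computes, together with the "small LUT and accumulation" idea suggested as a popcnt replacement. It is not the code from this commit and uses no pshufb. All names (`kNibbleBits`, `popcount16_lut`, `nonzero_mask16`, `run_level_scalar`) are hypothetical, and the 16-coefficient block size and the last-to-first output order are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-entry table: number of set bits in each 4-bit value. */
static const uint8_t kNibbleBits[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                        1, 2, 2, 3, 2, 3, 3, 4};

/* Counts the set bits of a 16-bit mask by accumulating four table lookups.
 * This stands in for the popcnt instruction mentioned in the commit message. */
static int popcount16_lut(uint16_t mask) {
  return kNibbleBits[mask & 0xF] + kNibbleBits[(mask >> 4) & 0xF] +
         kNibbleBits[(mask >> 8) & 0xF] + kNibbleBits[(mask >> 12) & 0xF];
}

/* Builds a mask whose bit i is set when coeff[i] is nonzero. */
static uint16_t nonzero_mask16(const int16_t coeff[16]) {
  uint16_t mask = 0;
  for (int i = 0; i < 16; ++i) {
    if (coeff[i] != 0) mask |= (uint16_t)(1u << i);
  }
  return mask;
}

/* Scalar run/level conversion (assumed layout), walking from the last
 * coefficient to the first. level[k] receives the k-th nonzero value found,
 * run[k] the number of zeros immediately before it in scan order.
 * Trailing zeros are skipped. Returns the sum of all runs. */
static int run_level_scalar(const int16_t coeff[16], uint8_t run[16],
                            int16_t level[16], int *total_coeff) {
  assert(total_coeff != NULL);
  int i = 15, n = 0, total_zeros = 0;
  while (i >= 0 && coeff[i] == 0) --i;   /* skip trailing zeros */
  while (i >= 0) {
    int zeros = 0;
    level[n] = coeff[i--];               /* record the nonzero level */
    while (i >= 0 && coeff[i] == 0) {    /* count the zeros before it */
      ++zeros;
      --i;
    }
    run[n++] = (uint8_t)zeros;
    total_zeros += zeros;
  }
  *total_coeff = n;
  return total_zeros;
}
```

In this sketch the coefficient count written to `*total_coeff` always equals `popcount16_lut(nonzero_mask16(coeff))`, which is the quantity a vectorized version would presumably obtain from popcnt on a comparison mask.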