vp9_short_idct10_16x16_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut many
unnecessary calculations in order to save instructions.
Change-Id: I6e30a3fee1ece5af7f258532416d0bfddd1143f0
vp9_short_idct10_8x8_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut several
unnecessary calculations in order to save instructions.
Change-Id: I34fda95e29082b789aded97c2df193991c2d9195
Enable SSE2 implementation of high precision 32x32 forward DCT. The
intermediate stacks are of 32-bits. The run-time goes down from
32126 cycles to 13442 cycles.
Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
Enable use_x86inc as a commandline option. Fix Bug with sse2 when
x86inc is disabled. Adds Sad asm protection to x86inc protection
Change-Id: Iee0f9dd235ea10e8ace512eb362ba9bebe8c9df6
Support enabling it or disabling it. Moved read out to configure.sh
so that its done once instead of in make and in config.
Change-Id: I73a9190cf31de9f03e8a577f478fa522f8c01c8b
Chromium does not support 32bit builds for Mac which use x86inc.asm.
Make the files which include it work if 64bit or not PIC enabled
starting with vp9_copy_sse2.asm
Consolidate these targets in vp9_rtcd_defs.sh
Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248
The inverse 32x32 transform detects all zero entries and skips the
computations accordingly per 8 rows in the first 1-D operation. The
function vp9_short_idct10_32x32_add performs differently and is not
used anywhere, hence removed.
Change-Id: Ic4fad422debbde7b6b6ffed47c69fbd4268a906c
This commit provides special handle on 16x16 inverse 2D-DCT, where
only DC coefficient is quantized to be non-zero value.
Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.
Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
Add SSE2 implementation to handle the special case of inverse 2D-DCT
where only DC coefficient is non-zero.
Change-Id: I2c6a59e21e5e77b8cf39a4af5eecf4d5ade32e2f
Call the individually optimized horizontal and vertical functions. This
implementation abuses the temp buffer.
This will be replaced with a custom optimized function.
Over 2x speedup.
Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd
This commit enables SSE2 implementation of 16x16 inverse ADST/DCT
hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles.
This provides about 1% encoding speed-up at speed 0.
Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
This commit enables SSE2 implementation of 8x8 inverse ADST/DCT
transform. The runtime goes from 1216 cycles -> 266 cycles.
For bus_cif at 2000 kbps, the overall runtime reduces from
253707ms -> 248430ms, i.e., 2% speed-up at speed 0.
Change-Id: Ib0372e17e9162d7b11a10d653b1c8be547c878fb
Super basic conversion from the other implementations. Any changes to
one should be trivial to copy over keep in sync.
Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8
Independent horizontal and vertical implementations.
Requires that blocks be built from 4x4 and [xy]_step_q4 == 16
6-10% improvement. CIF improved the least.
Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
Enable SSE2 4x4 inverse ADST/DCT transform. The runtime goes from
292 cycles down to 89 cycles. Running bus_cif at 2000 kbps, the
overall runtime of speed 0 goes from 301s to 295s (2% speed-up).
Change-Id: I24098136e7fee7ab2fbf1c11755bdf2ca37f3628
Where possible, do the 16 pixel wide filter while doing the horizontal
filtering pass. The same approach can be taken for the mbloop_filter
when that's implemented. Doing so on the vertical pass is a little more
involved, but possible.
Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320
This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2
operations. It reduces the runtime from 5433 cycles to 1621 cycles, at
no compression performance loss.
Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230