159 Commits

Author SHA1 Message Date
Yunqing Wang
e731b2ba2c Merge "Improve vp9_idct4x4_1_add_sse2" 2013-11-08 12:00:36 -08:00
James Zern
2d980b803a vp9 ssse3 d207_predictor_32x32: add missing GLOBAL()
removes a textrel for sh_b23456789abcdefff

Change-Id: I80cb9dfd8e49a0fe884c8ff76472275b3a00cb57
2013-11-01 20:33:22 -07:00
Tamar Levy
54f9205653 mb_lpf_horizontal_edge AVX2 optimization
This CL contains two AVX2 optimized loop filter functions,
mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.

Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
2013-10-31 10:26:15 -06:00
Yunqing Wang
47665452f0 Merge "Add 32x32 idct function for eob<=34 case" 2013-10-25 09:34:46 -07:00
Yunqing Wang
f88315cb29 Add 32x32 idct function for eob<=34 case
When only upper-left 8x8 area has non-zero dct coefficients, we
could skip 1D IDCT for 9th to 32th rows to save operations. This
function is called when eob <= 34.

Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
2013-10-24 16:13:21 -07:00
Dmitry Kovalev
fa143dbc8e Renaming vp9_short_fdct8x8 to vp9_fdct8x8.
For consistency with idct function names.

Change-Id: I7b6af2f92c66eff56f84ed29edc3a66af8dc421f
2013-10-23 10:52:33 -07:00
Abo Talib Mahfoodh
908a992d7f Improve vp9_idct4x4_1_add_sse2
Simple modification to reduce number of cycles in the
function.
Original function number of cycles: 973
Modified function number of cycles: 835
Improvment factor: 1.165

Tested with: park_joy_420_720p50.y4m

Change-Id: Ic5857272ea3aafe21d5ef9a69258d78c688f69bd
2013-10-22 09:35:36 -04:00
Yunqing Wang
dd51042802 Fix d207 intra prediction SSSE3 functions
This patch fixed a bug that caused 32bit PIC build mismatch. The
stack pointer was modified after "GET_GOT". Loading left pointer
from a hard-coded position gave wrong result.

Change-Id: Iea0aec6f917b12a6b3393ffc986bad74510248cc
2013-10-18 17:00:18 -07:00
Jingning Han
bf187d1b2d Merge "Fix a few indent format issues in buffer defs" 2013-10-15 16:23:50 -07:00
Jingning Han
0a66541619 Fix a few indent format issues in buffer defs
Change-Id: Iac55891ac9e6f13718c9f822aa099b5ca491832a
2013-10-15 11:51:09 -07:00
Dmitry Kovalev
65f118d72f Making input pointer of any inverse transform constant.
Also renaming dest_stride to stride in some places.

Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940
2013-10-11 18:27:12 -07:00
Dmitry Kovalev
7ef573914d Consistent names for inverse hybrid transforms (1 of 2).
Renames:
  vp9_short_iht4x4_add     -> vp9_iht4x4_16_add
  vp9_short_iht8x8_add     -> vp9_iht8x8_64_add
  vp9_short_iht16x16_add_c -> vp9_iht16x16_256_add

Change-Id: Ibca7a188fd062b196787ac5efc1ea545e7f166c0
2013-10-11 13:31:32 -07:00
Dmitry Kovalev
9c8f3063b1 Merge "Removing vp9_idct4_1d_sse2 function." 2013-10-11 10:43:56 -07:00
Yunqing Wang
57b97b56f6 Code cleanup
Minor code cleanup.

Change-Id: I47c1f794842d4570bb39cfd23b80f54f5606bba6
2013-10-11 09:08:41 -07:00
Yunqing Wang
3a0b59e3fd Merge "SSE2 8-tap sub-pixel filter optimization" 2013-10-11 08:44:56 -07:00
Dmitry Kovalev
ddf1b76205 Removing vp9_idct4_1d_sse2 function.
We have two SSE2-optimized functions for idct4_1d:
  vp9_idct4_1d_sse2 <-- removing this one
  idct4_1d_sse2

vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
  vp9_idct4x4_16_add_c   -> vp9_idct4x4_16_add_see2
  idct8_1d               -> vp9_idct8x8_{16, 10, 1}_see2
  vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_see2

Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
2013-10-10 16:50:43 -07:00
Scott LaVarnway
83936e8cd5 d207 intra prediction ssse3 using bytes
byte version of ronalds d207 ssse3 optimizations
(commit: f891f84d3ba9345b0074e682f0fea09b8ddf4f1e)

Change-Id: If15f71a589ea16f78ac86a501b0c5c6231dc9af1
2013-10-10 15:50:31 -07:00
Dmitry Kovalev
2be3b84aed Merge "Giving consistent names to IDCT 32x32 functions." 2013-10-10 15:31:25 -07:00
Yunqing Wang
86528586a3 Merge "d153 intra prediction (32x32) ssse3 using bytes" 2013-10-10 15:16:45 -07:00
Yunqing Wang
3fb728c749 SSE2 8-tap sub-pixel filter optimization
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.

Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
2013-10-10 14:12:47 -07:00
Dmitry Kovalev
1e766b50e2 Giving consistent names to IDCT 32x32 functions.
Renames:
  vp9_short_idct32x32_add   -> vp9_idct32x32_1024_add
  vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add
  vp9_idct_add_32x32        -> vp9_idct32x32_add

Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
2013-10-10 11:27:39 -07:00
Dmitry Kovalev
b096c5a336 Giving consistent names to IDCT 16x16 functions.
Renames:
  vp9_short_idct16x16_add    -> vp9_idct16x16_256_add
  vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add
  vp9_short_idct16x16_1_add  -> vp9_idct16x16_1_add
  vp9_idct_add_16x16         -> vp9_idct16x16_add

Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3
2013-10-07 14:31:10 -07:00
Dmitry Kovalev
2ae93a776b Merge "Giving consistent names to IDCT 8x8 functions." 2013-10-07 14:19:50 -07:00
Scott LaVarnway
a2a3b4a479 d153 intra prediction (32x32) ssse3 using bytes
Change-Id: Ie2c0d84ff9f6294084d65f4380e1f30c09e681c9
2013-10-07 11:21:10 -04:00
Jim Bankoski
bf893e84bd Merge changes I8a106dd6,Iec442603
* changes:
  d153 intra prediction (16x16) ssse3 using bytes
  d153 intra prediction ssse3 using bytes
2013-10-06 20:11:24 -07:00
Dmitry Kovalev
c6ad70d5f1 Giving consistent names to IDCT 8x8 functions.
Renames:
  vp9_short_idct8x8_add    -> vp9_idct8x8_64_add
  vp9_short_idct8x8_1_add  -> vp9_idct8x8_1_add
  vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add
  vp9_idct_add_8x8         -> vp9_idct8x8_add

Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
2013-10-06 00:24:09 -07:00
Dmitry Kovalev
3a0602578e Giving consistent names to IDCT/IWHT functions.
The idea is to have the following names for each transform size:

vp9_idct4x4_add
  vp9_idct4x4_1_add
  vp9_idct4x4_10_add
  vp9_idct4x4_16_add

vp9_idct8x8_add
  vp9_idct8x8_1_add
  vp9_idct8x8_10_add
  vp9_idct8x8_64_add

etc for 16x16, 32x32

The actual list of renames in this patch:

vp9_idct_add_lossless     -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add   -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add

vp9_idct_add            -> vp9_idct4x4_add
vp9_short_idct4x4_add   -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add

Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
2013-10-04 14:17:06 -07:00
Yunqing Wang
134dfea878 Merge "Rewrite HORIZx4 and HORIZx8 in subpixel filter functions" 2013-10-03 12:17:47 -07:00
Yunqing Wang
ed22179a82 Rewrite HORIZx4 and HORIZx8 in subpixel filter functions
In subpixel filters, prefetched source data, unrolled loops,
and interleaved instructions.

In HORIZx4, integrated the idea in Scott's CL (commit:
d22a504d11a15dc3eab666859db0046b5a7d75c5), which was suggested by
Erik/Tamar from Intel. Further tweaking was done to combine row 0,
2, and row 1, 3 in registers to do more 2-row-in-1 operations until
the last add.

Test showed a ~2% decoder speedup.

Change-Id: Ib53d04ede8166c38c3dc744da8c6f737ce26a0e3
2013-10-03 09:04:02 -07:00
Scott LaVarnway
20a09d928a d153 intra prediction (16x16) ssse3 using bytes
Change-Id: I8a106dd61b0a2520fae792d87d6348e662649b2d
2013-10-02 16:34:05 -04:00
Dmitry Kovalev
3c4e9e341f Adding SSE2 optimized vp9_short_idct32x32_1_add function.
Change-Id: I4b1c6bb9ff615f5872b96ed07dbf0f5e18e63643
2013-10-01 18:34:36 -07:00
Yunqing Wang
03698aa6d8 Merge "Modify HORIZx16 macro in subpixel filter functions" 2013-10-01 14:18:10 -07:00
Yunqing Wang
df8e156432 Modify HORIZx16 macro in subpixel filter functions
Interleaved the instructions, reduced register dependency, and
prefetched the source data. This improved the decoder speed
by 0.6% - 2%.

Change-Id: I568067aa0c629b2e58219326899c82aedf7eccca
2013-10-01 12:49:25 -07:00
Scott LaVarnway
27b390e1a1 d153 intra prediction ssse3 using bytes
byte version of ronalds d153 ssse3 optimizations for
4x4 and 8x8
(commit: fc91a2a112238a1aee568f3b840585de4e928fca)

Change-Id: Iec4426032311483f615fd9e0dceba3ee85ddebd7
2013-10-01 09:05:20 -04:00
Jim Bankoski
152fd59964 fixed cpp lint issue in vp9_postproc_x86
Change-Id: I2b2af1dd9f5c29c05e28a4fd51fa58ccc4071477
2013-09-29 18:44:58 -07:00
Jim Bankoski
ec421b7810 nolintify intrinsic idct file
Change-Id: Id2cc5c829399a2afdf7a8a82615a4e272c814986
2013-09-29 18:42:24 -07:00
Dmitry Kovalev
3fab2125ff Renaming vp9_short_idct10_8x8_add to vp9_short_idct8x8_10_add.
Making name consistent with vp9_short_idct8x8 and vp9_short_idct8x8_1.

Change-Id: I99e0be040ec893f9571dcf090e18f98dc58339f5
2013-09-27 15:26:27 -07:00
Dmitry Kovalev
db60c02c9e Merge "Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10." 2013-09-27 13:08:52 -07:00
Dmitry Kovalev
15a36a0a0d Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10.
Making function name consistent with vp9_short_idct16x16 and
vp9_short_idct16x16_1.

Change-Id: I70e54be9e6b9a1dddab0de470686591e96d05517
2013-09-26 14:01:25 -07:00
Scott LaVarnway
208658490c d63 intra prediction ssse3 using bytes
byte version of ronalds d63 ssse3 optimizations
(commit: c5a1c8cf3541cf3665fee981b36d22c9fbd4191e)

Change-Id: Ifd3e6d454a2246085f23eabb38518a930321e807
2013-09-25 16:16:44 -04:00
Yunqing Wang
9d901217c6 Fix x86inc.asm to build PIC code correctly
Current x86inc.asm didn't handle 32bit PIC build properly.
TEXTRELs were seen in the library built. The PIC macros from
libvpx's x86_abi_support.asm was used to fix this problem.
The assembly code was modified to use the macros.

Notes: We need this fix in for decoder building. Functions in
encoder will be fixed later.

Change-Id: Ifa548d37b1d0bc7d0528db75009cc18cd5eb1838
2013-09-18 13:45:46 -07:00
James Zern
2d58761993 Revert "Improved 8t filters"
This is incompatible with most toolchains other than gcc.

Revert "Deleted #include <inttypes.h>"

This reverts commit 4d018be950ef8b056a7c797a22ee58012443df26.

This reverts commit d22a504d11a15dc3eab666859db0046b5a7d75c5.

Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
2013-09-13 15:13:06 -07:00
Paul Wilkins
4d018be950 Deleted #include <inttypes.h>
This seems not to be needed and is not supported
in the Windows build.

Change-Id: Iaca3bbf8cca283aee6bc336cb31ba9dd4610322b
2013-09-12 13:43:07 +01:00
Scott LaVarnway
d22a504d11 Improved 8t filters
Reformatted version of a patch submitted by Erik/Tamar
from Intel.  For the test clips used, the decoder
performance improved by ~2%.

Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
2013-09-11 13:56:32 -04:00
Scott LaVarnway
22dc946a7e Improved mb_lpf_horizontal_edge_w_sse2_8
This patch is a reformatted version of optimizations done by
engineers at Intel (Erik/Tamar) who have been providing
performance feedback for VP9.  For the test clips used (720p, 1080p),
up to 1.2% performance improvement was seen.

Change-Id: Ic1a7149098740079d5453b564da6fbfdd0b2f3d2
2013-08-29 08:30:17 -04:00
Jingning Han
9d67495f72 Optimize 32x32 2D inverse DCT for speed-up
This commit exploits the sparsity of quantized coefficient matrix.
It detects each 32x8 array and skip the corresponding inverse
transformation if all entries are zero.

For ped1080p at 8000 kbps, this on average reduces the runtime of
32x32 inverse 2D-DCT SSE2 function from 6256 cycles -> 5200
cycles. It makes the overall encoding process about 2% faster at
speed 0. The speed-up is more pronounceable for the decoding process.

Change-Id: If20056c3566bd117642a76f8884c83e8bc8efbcf
2013-07-31 17:13:31 -07:00
Jingning Han
a7c4de22e1 16x16 inverse 2D-DCT with DC only
This commit provides special handle on 16x16 inverse 2D-DCT, where
only DC coefficient is quantized to be non-zero value.

Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
2013-07-29 14:45:53 -07:00
Ronald S. Bultje
6f3054b65d Merge "d45 intra prediction SSSE3 optimizations." 2013-07-26 17:21:09 -07:00
Jingning Han
325e0aa650 Special handle on DC only inverse 8x8 2D-DCT
This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.

Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
2013-07-26 14:16:51 -07:00
Ronald S. Bultje
94b0c6791d d45 intra prediction SSSE3 optimizations.
Change-Id: Ie48035ff4f93c41f8a9b3023e6444fd10432d8fb
2013-07-26 13:30:02 -07:00