Commit Graph

122 Commits

Author SHA1 Message Date
Yunqing Wang
3a0b59e3fd Merge "SSE2 8-tap sub-pixel filter optimization" 2013-10-11 08:44:56 -07:00
Yunqing Wang
3fb728c749 SSE2 8-tap sub-pixel filter optimization
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.

Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
2013-10-10 14:12:47 -07:00
Dmitry Kovalev
9a1250e3e0 Merge "Moving all scan/iscan code into separate vp9_scan.{h, c} files." 2013-10-10 10:45:07 -07:00
Parag Salasakar
eeb5b62dc1 mips dsp-ase r2 vp9 decoder bilinear convolve optimizations
Change-Id: Ic31b4ef85e65070b4f8b9f26e068ccfaae00c4f0
2013-10-09 18:05:27 +05:30
Dmitry Kovalev
e3597c6af7 Moving all scan/iscan code into separate vp9_scan.{h, c} files.
Now we have entropy code separate from scan/iscan code. The next step
in future is to move iscan code from common part to the encoder.

Change-Id: Id9732f7d80aec00af35c1d58d1137c4c96c91451
2013-10-07 13:55:56 -07:00
Parag Salasakar
40edab5e39 mips dsp-ase r2 vp9 decoder convolve module optimizations
Change-Id: I401536778e3c68ba2b3ae3955c689d005e1f1d59
2013-10-02 16:58:37 -07:00
Dmitry Kovalev
efbacc9f89 Merge "Removing vp9_subpelvar.h from common." 2013-09-29 12:00:46 -07:00
Christian Duvivier
b1b4ba1bdd Properly save neon registers.
Replace current code which corrupts the stack by
duplicate of vp8 code to save and restore neon
registers.

Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a
2013-09-27 14:25:33 -07:00
Christian Duvivier
5b1dc1515f Fix a bunch of TODO from vp9_short_idct32x32_add_neon.
- full ASM version, no more C gateway file.
- integrate combine-add with last step of 2nd pass.
- remove a few push/pop pairs.
- some instruction reordering to hide latency.

Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6
2013-09-25 21:15:19 -07:00
Dmitry Kovalev
64eff7f360 Removing vp9_subpelvar.h from common.
Moving all code from that file to vp9_variace_c.c in the encoder.

Change-Id: Ic803d5b4c78d5191e4d25541b3df97337878fc3e
2013-09-25 16:10:43 -07:00
James Zern
2d58761993 Revert "Improved 8t filters"
This is incompatible with most toolchains other than gcc.

Revert "Deleted #include <inttypes.h>"

This reverts commit 4d018be950.

This reverts commit d22a504d11.

Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
2013-09-13 15:13:06 -07:00
hkuang
86fb12b600 Merge "Add neon optimize iht8x8 which is 282% faster than C." 2013-09-12 15:42:44 -07:00
hkuang
182366c736 Add neon optimize iht8x8 which is 282% faster than C.
Change-Id: I963dd4a6e8671957403ccbb9a16ea7de703e3530
2013-09-12 11:49:05 -07:00
Christian Duvivier
6a501462f8 First draft of vp9_short_idct32x32_add_neon.
Lots of TODO which will be taken care in upcoming changes. As is,
about 6x faster than C version.

Change-Id: Ie2557b72fd2d8edca376dbf400a4d173aa5e63e0
2013-09-11 15:19:38 -07:00
Scott LaVarnway
d22a504d11 Improved 8t filters
Reformatted version of a patch submitted by Erik/Tamar
from Intel.  For the test clips used, the decoder
performance improved by ~2%.

Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
2013-09-11 13:56:32 -04:00
hkuang
3c05bda058 Merge "Add neon optimize vp9_short_iht4x4_add." 2013-09-04 13:35:09 -07:00
hkuang
3b8614a8f6 Add neon optimize vp9_short_iht4x4_add.
Change-Id: I42c497b68ae1ee645b59c9968ad805db0a43e37e
2013-09-04 12:37:58 -07:00
Jim Bankoski
79401542f7 make vp9 postproc a config option
Vp9 postproc is disabled for now as its not been shown to help and
may be merged with vp8.

Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057
2013-09-04 10:02:08 -07:00
hkuang
3a679e56b2 Add neon optimize vp9_short_idct16x16_1_add.
Change-Id: Ib9354c1d975d03e8081df20d50b6a77dfe2dc7e5
2013-08-27 14:00:27 -07:00
hkuang
36e9b82080 Add neon optimize vp9_short_idct8x8_1_add.
Change-Id: I0b15d5e3b0eb97abb9ab5ec08e88b61f8723aaf4
2013-08-26 16:28:57 -07:00
hkuang
69384f4fad Add neon optimize vp9_short_idct4x4_1_add.
Change-Id: I6ecb5c4a1a472feb8e84e9f3352b536d5e28a4a5
2013-08-26 15:55:16 -07:00
Johann
a9aa7d07d0 Merge "vp9: neon: add vp9_convolve_avg_neon" 2013-08-15 14:55:15 -07:00
Johann
63e140eaa7 Merge "vp9: neon: add vp9_convolve_copy_neon" 2013-08-15 14:55:08 -07:00
hkuang
39f42c8713 Merge "Add neon optimize vp9_short_idct16x16_add." 2013-08-14 14:16:20 -07:00
hkuang
cf6beea661 Add neon optimize vp9_short_idct16x16_add.
Change-Id: I27134b9a5cace2bdad53534562c91d829b48838d
2013-08-14 13:52:16 -07:00
Mans Rullgard
0f1deccf86 vp9: neon: add vp9_convolve_avg_neon
Change-Id: I33cff9ac4f2234558f6f87729f9b2e88a33fbf58
2013-08-14 16:27:55 +01:00
Mans Rullgard
635ba269be vp9: neon: add vp9_convolve_copy_neon
Change-Id: I15adbbda15d1842e9f15f21878a5ffbb75c3c0c9
2013-08-14 16:27:55 +01:00
Dmitry Kovalev
8ffe85ad00 Moving scale_factors and related code to separate files.
Change-Id: I531829e5aee2a4a7a112d528ecccbddf052d0e74
2013-08-09 14:07:09 -07:00
Christian Duvivier
78182538d6 Neon version of vp9_short_idct4x4_add.
Change-Id: Idec4cae0cb9b3a29835fd2750d354c1393d47aa4
2013-08-06 18:41:27 -07:00
Jim Bankoski
6eb1254b88 sse3 intrapred x86inc protected
Change-Id: I4a3c83119cdf8a205920034c8019d855d5504605
2013-08-06 14:17:13 -07:00
Jim Bankoski
25ec1375c9 intrapred x86inc guards
Change-Id: If0399d8e11f4ebe75a5c91abb8d6a52a7709065b
2013-08-06 09:39:30 -07:00
Jim Bankoski
c3809f3de5 Begin to restrict x86inc.asm usage
Chromium does not support 32bit builds for Mac which use x86inc.asm.
Make the files which include it work if 64bit or not PIC enabled
starting with vp9_copy_sse2.asm

Consolidate these targets in vp9_rtcd_defs.sh

Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248
2013-08-05 12:07:30 -07:00
Mans Rullgard
d85ae87183 vp9: neon: add vp9_mb_lpf_* functions
Change-Id: I13e0880df234f15abc4cc7c57fe84488d5d46a75
2013-08-02 08:10:50 -07:00
hkuang
d757de744c Add neon optimize vp9_short_idct8x8_add.
Change-Id: Ic32acf3e2939c6d12d9c2bf192a5f5da59705fda
2013-07-18 16:40:41 -07:00
Johann
9ca66ec050 Merge "vp9_convolve8_neon placeholder" 2013-07-17 10:09:00 -07:00
Johann
59dc4e9cdd vp9_convolve8_neon placeholder
Call the individually optimized horizontal and vertical functions. This
implementation abuses the temp buffer.

This will be replaced with a custom optimized function.

Over 2x speedup.

Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd
2013-07-17 08:39:27 -07:00
James Zern
98e132bde0 Merge changes I40454d26,I892e76d5,I865ab3f9,I4a4bec17,I61c4351e,I37eb3559,I1031c556,I8c8f1f42
* changes:
  delete vp9_loopfilter_sse2.asm
  vp9_loopfilter_intrin_sse2: cosmetics: fix indent
  delete x86/vp9_loopfilter_x86.h
  vp9_loopfilter_intrin_sse2: make some funcs static
  vp9_loopfilter_intrin_sse2: remove unused uv funcs
  vp9_loopfilter: remove uv function typedef
  filter_block_plane: reuse some constants
  vp9_loopfilter.c: make some functions static
2013-07-16 14:25:32 -07:00
James Zern
50015f6eba delete vp9_loopfilter_sse2.asm
sse2 functions are provided by vp9_loopfilter_intrin_sse2.c

Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b
2013-07-16 13:09:16 -07:00
James Zern
af58254267 delete x86/vp9_loopfilter_x86.h
also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h

Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405
2013-07-16 13:09:05 -07:00
Dmitry Kovalev
baf0c959c7 Moving vp9_kf_default_bmode_probs to vp9_entropymode.c.
Removing vp9_modelcontext.c.

Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427
2013-07-16 10:54:34 -07:00
Johann
a15bebfc0a vp9_convolve8_[horiz|vert]_avg
Super basic conversion from the other implementations. Any changes to
one should be trivial to copy over keep in sync.

Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8
2013-07-12 16:21:33 -07:00
Johann
158c80cbb0 convolve8 optimizations for neon
Independent horizontal and vertical implementations.

Requires that blocks be built from 4x4 and [xy]_step_q4 == 16

6-10% improvement. CIF improved the least.

Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
2013-07-11 11:08:19 -07:00
hkuang
c9b25dcae4 Add neon optimize vp9_dc_only_idct_add.
Change-Id: Iae84ab945cc9662a0ddd839aa2b9ca59f2ae5423
2013-07-11 10:30:47 -07:00
Ronald S. Bultje
decead7336 Replace copy_memNxM functions with a generic copy/avg function.
Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa
2013-07-10 18:27:24 -07:00
Ronald S. Bultje
3f210f10eb Remove unused iwalsh4x4 MMX/SSE2 functions.
Change-Id: I2d22577911a37ed7d8c7e08cac20764842267652
2013-07-10 14:52:47 -07:00
Ronald S. Bultje
48c53233fd Remove unused 16x3/3x16 sad SSE2 functions.
Change-Id: I30a597c0cc366e34c9a3e2afe32d70e044f95ca4
2013-07-10 14:52:47 -07:00
Ronald S. Bultje
e6f955251f Merge "SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction." 2013-07-10 14:52:23 -07:00
Ronald S. Bultje
89810bfd71 Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction." 2013-07-10 10:13:16 -07:00
Dmitry Kovalev
20986c81b3 Merge "Removing vp9_maskingmv.c and corresponding assembly file." 2013-07-10 10:05:06 -07:00
Ronald S. Bultje
7fd643264a SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction.
Change-Id: Iad70966b986f65259329070e258f76ef0af816b4
2013-07-10 09:28:03 -07:00