Commit Graph

241 Commits

Author SHA1 Message Date
hkuang
86fb12b600 Merge "Add neon optimize iht8x8 which is 282% faster than C." 2013-09-12 15:42:44 -07:00
hkuang
182366c736 Add neon optimize iht8x8 which is 282% faster than C.
Change-Id: I963dd4a6e8671957403ccbb9a16ea7de703e3530
2013-09-12 11:49:05 -07:00
Christian Duvivier
6a501462f8 First draft of vp9_short_idct32x32_add_neon.
Lots of TODO which will be taken care in upcoming changes. As is,
about 6x faster than C version.

Change-Id: Ie2557b72fd2d8edca376dbf400a4d173aa5e63e0
2013-09-11 15:19:38 -07:00
Jingning Han
1c263d6918 Merge "Use saturated addition in SSSE3 of 32x32 quant" 2013-09-05 14:09:40 -07:00
Jingning Han
458c2833c0 Use saturated addition in SSSE3 of 32x32 quant
The 32x32 forward transform can potentially reach peak coefficient
value close to 32700, while the rounding factor can go upto 610.
This could cause overflow issue in the SSSE3 implementation of 32x32
quantization process.

This commit resolves this issue by replacing the addition operations
with saturated addition operations in 32x32 block quantization.

Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70
2013-09-05 12:49:12 -07:00
hkuang
3c05bda058 Merge "Add neon optimize vp9_short_iht4x4_add." 2013-09-04 13:35:09 -07:00
hkuang
3b8614a8f6 Add neon optimize vp9_short_iht4x4_add.
Change-Id: I42c497b68ae1ee645b59c9968ad805db0a43e37e
2013-09-04 12:37:58 -07:00
Jim Bankoski
bb2313db28 Merge "make vp9 postproc a config option" 2013-09-04 10:35:26 -07:00
Jim Bankoski
79401542f7 make vp9 postproc a config option
Vp9 postproc is disabled for now as its not been shown to help and
may be merged with vp8.

Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057
2013-09-04 10:02:08 -07:00
Jingning Han
3cf46fa591 Fix 32x32 forward transform SSE2 version
This commit fixed the potential overflow issue in the SSE2
implementation of 32x32 forward DCT. It resolved the corrupted
coded frames in the border of scenes.

Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9
2013-08-31 18:47:08 -07:00
Jingning Han
c86c5443eb Merge "Fix overflow issue in SSSE3 32x32 quantization" 2013-08-29 16:49:04 -07:00
Jingning Han
abff678866 Fix overflow issue in SSSE3 32x32 quantization
The 32x32 quantization process can potentially have the intermediate
stacks over 16-bit range, thereby causing enc/dec mismatch. This commit
fixes this overflow issue in the SSSE3 implementation, as well as the
prototype, of 32x32 quantization.

This fixes issue 607 from webm@googlecode.

Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806
2013-08-29 11:00:54 -07:00
hkuang
3a679e56b2 Add neon optimize vp9_short_idct16x16_1_add.
Change-Id: Ib9354c1d975d03e8081df20d50b6a77dfe2dc7e5
2013-08-27 14:00:27 -07:00
hkuang
36e9b82080 Add neon optimize vp9_short_idct8x8_1_add.
Change-Id: I0b15d5e3b0eb97abb9ab5ec08e88b61f8723aaf4
2013-08-26 16:28:57 -07:00
hkuang
69384f4fad Add neon optimize vp9_short_idct4x4_1_add.
Change-Id: I6ecb5c4a1a472feb8e84e9f3352b536d5e28a4a5
2013-08-26 15:55:16 -07:00
Jingning Han
166dc85bed Temporarily disable SSSE3 quant_32x32
Make the current head working properly, while working on fixing an
issue in the SSSE3 implementation of 32x32 quantization.

Change-Id: Ic029da3fd7f1f5e58bc641341cbd226ec49a16bc
2013-08-26 10:45:59 -07:00
hkuang
4082bf9d7c Add neon optimize vp9_short_idct10_16x16_add.
vp9_short_idct10_16x16_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut many
unnecessary calculations in order to save instructions.

Change-Id: I6e30a3fee1ece5af7f258532416d0bfddd1143f0
2013-08-22 15:53:22 -07:00
James Zern
ac12f3926b Merge "vp9 rtcd: remove non-existent sad functions" 2013-08-21 13:55:59 -07:00
James Zern
ae455fabd8 vp9 rtcd: remove non-existent sad functions
vp9_sad32x3, vp9_sad3x32

+ remove unnecessary sad include from vp9_findnearmv.c

Change-Id: Idef2a89cadc3fec64eff82ba9be60ffff50b3468
2013-08-20 18:07:53 -07:00
hkuang
37cda6dc4c Add neon optimize vp9_short_idct10_8x8_add.
vp9_short_idct10_8x8_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut several
unnecessary calculations in order to save instructions.

Change-Id: I34fda95e29082b789aded97c2df193991c2d9195
2013-08-20 11:51:07 -07:00
Dmitry Kovalev
1462433370 Merge "Renaming d27 predictor to d207." 2013-08-16 12:07:24 -07:00
Johann
a9aa7d07d0 Merge "vp9: neon: add vp9_convolve_avg_neon" 2013-08-15 14:55:15 -07:00
Johann
63e140eaa7 Merge "vp9: neon: add vp9_convolve_copy_neon" 2013-08-15 14:55:08 -07:00
Dmitry Kovalev
81d7bd50f5 Renaming d27 predictor to d207.
27 degrees intra predictor is actually 207 degrees, so renaming it.

Change-Id: Ife96a910437eb80ccdc0b7a5b7a62c77542ae5be
2013-08-15 11:09:49 -07:00
hkuang
39f42c8713 Merge "Add neon optimize vp9_short_idct16x16_add." 2013-08-14 14:16:20 -07:00
hkuang
cf6beea661 Add neon optimize vp9_short_idct16x16_add.
Change-Id: I27134b9a5cace2bdad53534562c91d829b48838d
2013-08-14 13:52:16 -07:00
Dmitry Kovalev
f2c073efaa Adding const to arguments of intra prediction functions.
Adding const to above and left pointers. Cleanup.

Change-Id: I51e195fa2e2923048043fe68b4e38a47ee82cda1
2013-08-14 10:35:56 -07:00
Mans Rullgard
0f1deccf86 vp9: neon: add vp9_convolve_avg_neon
Change-Id: I33cff9ac4f2234558f6f87729f9b2e88a33fbf58
2013-08-14 16:27:55 +01:00
Mans Rullgard
635ba269be vp9: neon: add vp9_convolve_copy_neon
Change-Id: I15adbbda15d1842e9f15f21878a5ffbb75c3c0c9
2013-08-14 16:27:55 +01:00
Jingning Han
78136edcdc SSE2 high precision 32x32 forward DCT
Enable SSE2 implementation of high precision 32x32 forward DCT. The
intermediate stacks are of 32-bits. The run-time goes down from
32126 cycles to 13442 cycles.

Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
2013-08-12 16:52:53 -07:00
Christian Duvivier
78182538d6 Neon version of vp9_short_idct4x4_add.
Change-Id: Idec4cae0cb9b3a29835fd2750d354c1393d47aa4
2013-08-06 18:41:27 -07:00
Jim Bankoski
5b307886fb variance x86inc guards
also fixed bug in sad calcs

Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d
2013-08-06 14:17:13 -07:00
Jim Bankoski
6eb1254b88 sse3 intrapred x86inc protected
Change-Id: I4a3c83119cdf8a205920034c8019d855d5504605
2013-08-06 14:17:13 -07:00
Jim Bankoski
c9126e0b30 sad + miscellaneous updates
Enable use_x86inc as a commandline option.  Fix Bug with sse2 when
x86inc is disabled. Adds Sad asm protection to x86inc protection

Change-Id: Iee0f9dd235ea10e8ace512eb362ba9bebe8c9df6
2013-08-06 12:16:04 -07:00
Jim Bankoski
efc94102f0 Merge "intrapred x86inc guards" 2013-08-06 10:39:19 -07:00
Jim Bankoski
25ec1375c9 intrapred x86inc guards
Change-Id: If0399d8e11f4ebe75a5c91abb8d6a52a7709065b
2013-08-06 09:39:30 -07:00
Jim Bankoski
62c6aa884d block error / x86inc mods
Change-Id: Icb607745634e10b9bac5019d06661ece09fcdb40
2013-08-06 06:23:38 -07:00
Jim Bankoski
a93b115cd6 reworked config for use_x86_inc
Support enabling it or disabling it.  Moved read out to configure.sh
so that its done once instead of in make and in config.

Change-Id: I73a9190cf31de9f03e8a577f478fa522f8c01c8b
2013-08-05 17:35:25 -07:00
Jim Bankoski
f4837579d1 fixed script problem with config_force_x86_inc
Change-Id: I226e5094d216b09dc47fa5511a66e2d314608000
2013-08-05 14:48:20 -07:00
Jim Bankoski
c3809f3de5 Begin to restrict x86inc.asm usage
Chromium does not support 32bit builds for Mac which use x86inc.asm.
Make the files which include it work if 64bit or not PIC enabled
starting with vp9_copy_sse2.asm

Consolidate these targets in vp9_rtcd_defs.sh

Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248
2013-08-05 12:07:30 -07:00
Dmitry Kovalev
5d86f3886d Moving struct loop_filter_info from *.h to *.c file.
Change-Id: I3fe90eb40088a5b07bdc7d66d93ffe6ef99943d5
2013-08-02 11:53:49 -07:00
Mans Rullgard
d85ae87183 vp9: neon: add vp9_mb_lpf_* functions
Change-Id: I13e0880df234f15abc4cc7c57fe84488d5d46a75
2013-08-02 08:10:50 -07:00
Jingning Han
67719abde1 Remove unused vp9_short_idct10_32x32_add
The inverse 32x32 transform detects all zero entries and skips the
computations accordingly per 8 rows in the first 1-D operation. The
function vp9_short_idct10_32x32_add performs differently and is not
used anywhere, hence removed.

Change-Id: Ic4fad422debbde7b6b6ffed47c69fbd4268a906c
2013-08-01 12:45:16 -07:00
Jingning Han
a7c4de22e1 16x16 inverse 2D-DCT with DC only
This commit provides special handle on 16x16 inverse 2D-DCT, where
only DC coefficient is quantized to be non-zero value.

Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
2013-07-29 14:45:53 -07:00
Ronald S. Bultje
6f3054b65d Merge "d45 intra prediction SSSE3 optimizations." 2013-07-26 17:21:09 -07:00
Jingning Han
325e0aa650 Special handle on DC only inverse 8x8 2D-DCT
This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.

Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
2013-07-26 14:16:51 -07:00
Ronald S. Bultje
94b0c6791d d45 intra prediction SSSE3 optimizations.
Change-Id: Ie48035ff4f93c41f8a9b3023e6444fd10432d8fb
2013-07-26 13:30:02 -07:00
Jingning Han
384e37e32b SSE2 inverse 4x4 2D-DCT with DC only
Add SSE2 implementation to handle the special case of inverse 2D-DCT
where only DC coefficient is non-zero.

Change-Id: I2c6a59e21e5e77b8cf39a4af5eecf4d5ade32e2f
2013-07-24 23:19:56 -07:00
Jingning Han
d2de1ca37b Merge vp9_dc_only_idct_add and vp9_short_idct4x4_1
They share the same functionality, so merging together.

Change-Id: I98a0386fcee052cb854f9ff90c283c1b844bcb79
2013-07-24 16:51:15 -07:00
hkuang
d757de744c Add neon optimize vp9_short_idct8x8_add.
Change-Id: Ic32acf3e2939c6d12d9c2bf192a5f5da59705fda
2013-07-18 16:40:41 -07:00