Commit Graph

252 Commits

Author SHA1 Message Date
Parag Salasakar
481fb7640c mips msa vp9 common headers added
Change-Id: Ia31ada59172eb1818e1eb91009f83cbb1f581223
2015-04-09 15:35:12 +05:30
James Zern
a98f6c0254 vp9/neon: skip some files in high-bitdepth build
exclude files that only contain functions for non-high-bitdepth builds.
this removes some warnings related to missing prototypes

Change-Id: Ic6642998c46a7b808c6c53b2f9c34bcd4d037abe
2015-03-31 18:06:21 -07:00
Yunqing Wang
41063137c3 Rename loopfilter_thread files to thread_common files
Renames the files to allow more common thread code to
be moved to vp9/common.

Change-Id: I7386e64e221086e3cdc087e79812f993c423413b
2015-02-06 10:03:31 -08:00
JackyChen
65f60f8e8c Merge "SSE2 code for the filter in MFQE." 2015-01-23 11:08:16 -08:00
JackyChen
09673deba9 SSE2 code for the filter in MFQE.
The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8
side. In our testing, we achieve 2X speed by adopting this change.

Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716
2015-01-18 16:07:59 -08:00
Yunqing Wang
e76eaf05b1 vp9_ethread: add parallel loopfilter
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder
   to common;
3. Merged multi-threaded loopfilter code, and made encoder/
   decoder call same function to reduce code duplication.

Encoder tests showed that 1% - 2% speedup was seen for good-quality
2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6%
speedup using 4 threads were seen for real-time mode(at speed 7).

Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
2015-01-16 17:19:27 -08:00
Johann
377b6682f9 Disable vp9 _8_ loopfilters
Investigating https://code.google.com/p/chromium/issues/detail?id=443839

Change-Id: Ibb7485d835c5aa5e1d40f31715596ba8d208eedb
2015-01-06 19:26:11 -08:00
Johann
b1ba4cc394 Rearrange loopfilter functions
Separate functions and rename files. This will make it easier to disable
some functions later to help work around a compiler issue in chromium.

Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5
2015-01-06 19:26:11 -08:00
James Yu
aeeaa67987 VP9 common for ARMv8 by using NEON intrinsics 15
Re-write
- vp9_lpf_horizontal_4_dual_neon
in vp9_loopfilter_16_neon.c

Change-Id: Ie14f63d352f9564ad01db3939a61d91cf6d21a31
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-16 20:00:26 -08:00
James Yu
aa8dd897c1 VP9 common for ARMv8 by using NEON intrinsics 16
Add vp9_reconintra_neon.c
- vp9_v_predictor_4x4_neon
- vp9_v_predictor_8x8_neon
- vp9_v_predictor_16x16_neon
- vp9_v_predictor_32x32_neon
- vp9_h_predictor_4x4_neon
- vp9_h_predictor_8x8_neon
- vp9_h_predictor_16x16_neon
- vp9_h_predictor_32x32_neon
- vp9_tm_predictor_4x4_neon
- vp9_tm_predictor_8x8_neon
- vp9_tm_predictor_16x16_neon
- vp9_tm_predictor_32x32_neon

Change-Id: Ib5d54a4766a1b5127169045659974f33aa98376d
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-16 12:57:52 -08:00
James Yu
ba05a4c640 VP9 common for ARMv8 by using NEON intrinsics 19
Delete vp9_dc_only_idct_add_neon.c

The function was merged with vp9_short_idct4x4_1_add (later
vp9_idct4x4_1_add) in d2de1ca and should have been deleted then.

Change-Id: Ie58ba3dd9dc7330a8f1238dd7dd71c9ed4639b94
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-16 11:14:12 -08:00
James Yu
4f856cd7fa VP9 common for ARMv8 by using NEON intrinsics 06
Add vp9_iht8x8_add_neon.c
- vp9_iht8x8_64_add_neon

The assembly did not previously implement tx_type 0
BUG=716

Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-15 12:18:06 -08:00
James Yu
6b71013277 VP9 common for ARMv8 by using NEON intrinsics 05
Add vp9_iht4x4_add_neon.c
- vp9_iht4x4_16_add_neon

The assembly did not previously implement tx_type 0
BUG=715

Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-15 12:16:19 -08:00
JackyChen
3425d6c83e Merge "Multiframe Quality Enhancement(MFQE) in VP9." 2014-12-11 16:24:08 -08:00
JackyChen
7ac3e3c1d6 Multiframe Quality Enhancement(MFQE) in VP9.
It is the first version of MFQE in VP9. There are a few TODOs included
in this version.
Usage: Add flag --enable-vp9-postproc to config the project.
In decoder, use flag --mfqe in the command line to enable
MFQE in postproc.
Note: Need to have key frame with low quality to see the effect of this
new patch. In my experiment, I fixed the qindex to 200 in key frame.

Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396
2014-12-11 09:19:39 -08:00
James Yu
3f7c12dab9 VP9 common for ARMv8 by using NEON intrinsics 18
Add vp9_idct32x32_add_neon.c
- vp9_idct32x32_1024_add_neon

Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 18:20:04 -08:00
James Yu
3cfed4bf76 VP9 common for ARMv8 by using NEON intrinsics 14
Add vp9_idct16x16_add_neon.c
- vp9_idct16x16_256_add_neon_pass1
- vp9_idct16x16_256_add_neon_pass2
- vp9_idct16x16_10_add_neon_pass1
- vp9_idct16x16_10_add_neon_pass2

Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 18:19:54 -08:00
James Yu
ce76aeb00d VP9 common for ARMv8 by using NEON intrinsics 13
Add vp9_idct8x8_add_neon.c
- vp9_idct8x8_64_add_neon
- vp9_idct8x8_10_add_neon

Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 14:56:54 -08:00
James Yu
8c25f4af6a VP9 common for ARMv8 by using NEON intrinsics 12
Add vp9_idct4x4_add_neon.c
- vp9_idct4x4_16_add_neon

Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 14:49:59 -08:00
James Yu
420f58f2d2 VP9 common for ARMv8 by using NEON intrinsics 11
Add vp9_idct16x16_1_add_neon.c
- vp9_idct16x16_1_add_neon

Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 13:06:58 -08:00
James Yu
030ca4d0e5 VP9 common for ARMv8 by using NEON intrinsics 10
Add vp9_idct32x32_1_add_neon.c
- vp9_idct32x32_1_add_neon

Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 13:04:29 -08:00
James Yu
2772b45ac0 VP9 common for ARMv8 by using NEON intrinsics 09
Add vp9_idct8x8_1_add_neon.c
- vp9_idct8x8_1_add_neon

Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 12:52:33 -08:00
James Yu
9114f0afdb VP9 common for ARMv8 by using NEON intrinsics 08
Add vp9_idct4x4_1_add_neon.c
- vp9_idct4x4_1_add_neon

Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 12:49:28 -08:00
James Yu
01fc6f51e0 VP9 common for ARMv8 by using NEON intrinsics 07
Add vp9_convolve8_neon.c
- vp9_convolve8_horiz_neon
- vp9_convolve8_vert_neon

Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
893534a996 VP9 common for ARMv8 by using NEON intrinsics 04
Add vp9_convolve8_avg_neon.c
- vp9_convolve8_avg_horiz_neon
- vp9_convolve8_avg_vert_neon

Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
d12757f5c6 VP9 common for ARMv8 by using NEON intrinsics 03
Add vp9_copy_neon.c
- vp9_convolve_copy_neon

Change-Id: I291fc5423d06240876411bbceab03eae5ef585be
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:02:46 -08:00
Scott LaVarnway
617382a2e3 VP9 common for ARMv8 by using NEON intrinsics 02
Add vp9_avg_neon.c
- vp9_convolve_avg_neon

Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 19:00:21 -08:00
James Yu
5b098b1825 VP9 common for ARMv8 by using NEON intrinsics 01
Add vp9_loopfilter_neon.c
- vp9_lpf_horizontal_4_neon
- vp9_lpf_vertical_4_neon
- vp9_lpf_horizontal_8_neon
- vp9_lpf_vertical_8_neon

Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 12:26:56 -08:00
Deb Mukherjee
931ed516ba High bit-depth loop/arf/postproc filter functions
Adds high-bitdepth loopfilter, temporal filter and postproc functions

Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3
2014-09-23 16:20:43 -07:00
Deb Mukherjee
0d3c3d3ce7 Adds high bitdepth convolve, interpred & scaling
Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3
2014-09-18 07:26:17 -07:00
Deb Mukherjee
81a8138fc3 Adding high-bitdepth intra prediction functions
Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111
2014-09-16 15:04:39 -07:00
Dmitry Kovalev
1100e262c5 Removing postproc mmx code.
Removed functions:
* vp9_post_proc_down_and_across_mmx
* vp9_mbpost_proc_down_mmx
* vp9_plane_add_noise_mmx

They all have sse2 equivalent.

Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff
2014-09-05 11:52:50 -07:00
Johann
7516abc7dc Remove vp9_postproc_x86.h
This configuration has moved to vp9_rtcd_defs.pl

Change-Id: I71a31dbb8d79df226b60dd834324a5af69956c51
2014-08-05 15:46:13 -07:00
hkuang
337e8015c9 Move vp9_thread.* to common.
Prepare for frame parallel decoding, the reference count buffers
need to be protected by mutex. Move vp9_thread.* to common
folder so that those buffers could use cross-platform mutex
from vp9_thread.*.

Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
2014-07-07 14:52:19 -07:00
Jingning Han
59c3f446fe Merge "Inverse 16x16 2D-DCT SSSE3 implementation" 2014-05-23 16:01:22 -07:00
Jingning Han
48b0891370 Inverse 16x16 2D-DCT SSSE3 implementation
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.

Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
2014-05-23 15:09:35 -07:00
Dmitry Kovalev
72ab966d5e Removing vp9_pragmas.h.
Change-Id: I9120a87e27e73e496932d11716937e2fad246521
2014-05-22 13:46:31 -07:00
Deb Mukherjee
e272273443 Renames x86_64 specific asm files
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.

Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
2014-05-21 13:55:56 -07:00
Johann
ce23931a3f Only build neon assembly for armv7 targets
Allow selectively building just the intrinsics for armv8

Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
2014-05-12 08:52:02 -07:00
Jingning Han
52ae97b6aa SSSE3 implementation of full inverse 8x8 2D-DCT
This commit enables SSSE3 version full inverse 8x8 2D-DCT and
reconstruction. It makes the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.

Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
2014-05-05 10:49:27 -07:00
Dmitry Kovalev
3f1ab25812 Removing vp9_onyx.h and moving its content to the encoder.
Change-Id: I03451c88536bc498edddbe0cd9773ff79da085c2
2014-03-05 23:33:22 -08:00
James Zern
805078a1bf build: convert rtcd.sh to perl
significantly speeds up file generation.

the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.

---
Linux
    [CREATE] vpx_scale_rtcd.h
real    0m0.485s ->    0m0.022s
    [CREATE] vp8_rtcd.h
real    0m4.619s ->    0m0.060s
    [CREATE] vp9_rtcd.h
real    0m10.102s ->    0m0.087s

Windows
    [CREATE] vpx_scale_rtcd.h
real    0m8.360s ->    0m0.080s
    [CREATE] vp8_rtcd.h
real    1m8.083s ->    0m0.160s
    [CREATE] vp9_rtcd.h
real    2m6.489s ->    0m0.233s

Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
2014-03-03 14:47:11 -08:00
Dmitry Kovalev
2c594a5275 Removing vp9_systemdependent.c.
Change-Id: I7b9738a7113c0c4687e5d320581ff69d98a8b271
2014-02-26 18:07:23 -08:00
levytamar82
3068d7d944 SSSE3 convolution optimization
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization is done only for 64 bit

Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
2014-02-14 15:08:42 -07:00
levytamar82
876c72a093 AVX2 Convolve Optimization
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of
loop strides by half, two strides were processed in parallel.
vp9_filter_block1d16_v8 was also optimized in the same way also some of the
loads were being done outside of the loop and by that preventing redundant
loads.
This Optimization gives 43% function level gain and 1.3% user level gain.
Now can be compiled in Windows

Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
2014-02-12 20:45:31 -07:00
Frank Galligan
d51ca0db00 Merge "Add get release decoder frame buffer functions." 2014-02-11 08:19:37 -08:00
James Zern
66bfc69bfc Merge "*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/" 2014-02-10 15:39:28 -08:00
Frank Galligan
e8e152799b Add get release decoder frame buffer functions.
This CL changes libvpx to call a function when a frame buffer
is needed for decode. Libvpx will call a release callback when
no other frames reference the frame buffer. This CL adds a
default implementation of the frame buffer callbacks. Currently
only VP9 is supported. A future CL will add support for
applications to supply their own frame buffer callbacks.

Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
2014-02-10 14:08:11 -08:00
James Zern
7cf0c783c1 *.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/
CONFIG_USE_X86INC is available to every makefile, there's no need to
duplicate its value with USE_X86INC

Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
2014-02-04 20:04:38 -08:00
Yunqing Wang
d1961e6fbf Optimize bilinear sub-pixel filters in ssse3
This patch added ssse3 optimization of bilinear sub-pixel filters.
The real time encoder was speeded up by ~1%.

Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
2014-02-04 08:01:55 -08:00
Dmitry Kovalev
282f36adc4 Merge "Removing "_short" suffix from arm transform file names." 2014-02-03 14:28:47 -08:00
Yunqing Wang
2488cb34bc Optimize bilinear sub-pixel filters in sse2
Using bilinear filters could speed up the codec in real-time mode.
This patch added sse2 optimizations of bilinear filters that
operate on different-sized blocks.

Tests showed that the real-time encoder was speeded up by 3%.

Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
2014-02-03 10:34:45 -08:00
Jim Bankoski
9dec7712ab static function convert to inline or global vp9_blockd.h
Change-Id: Ifdd951f24932839f06d1c700371662511dde6ebe
2014-01-31 19:50:40 -08:00
Dmitry Kovalev
c49b08c9a1 Removing "_short" suffix from arm transform file names.
Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507
2014-01-31 17:19:02 -08:00
Yunqing Wang
d2bb0c51d3 Revert "Revert "Revert "SSSE3 convolution optimization"""
This reverts commit f9404f2406.

This patch caused some ASAN error.

Change-Id: If15b7e581310e19061d111c69f2931809662ed19
2014-01-16 16:11:46 -08:00
Yunqing Wang
f9404f2406 Revert "Revert "SSSE3 convolution optimization""
This reverts commit b645257121.

Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45
2014-01-13 12:29:55 -08:00
Paul Wilkins
b645257121 Revert "SSSE3 convolution optimization"
This reverts commit 511d218c60.

In current form intrinsics break borg build.

Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
2014-01-10 13:38:26 +00:00
Yunqing Wang
f3b9b97c0e Merge "SSSE3 convolution optimization" 2014-01-09 12:39:47 -08:00
levytamar82
511d218c60 SSSE3 convolution optimization
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization done only for 64bit.

Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
2014-01-09 12:27:51 -07:00
hkuang
5be0ed30dc Merge "Add initial intra frame neon optimization. 1~2% gain." 2014-01-08 14:41:43 -08:00
hkuang
691111aacf Add initial intra frame neon optimization. 1~2% gain.
More intra optimizations will be added.

Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
2014-01-08 11:58:42 -08:00
Dmitry Kovalev
987810ad95 Removing vp9_findnearmv.{h, c} files.
Moving all code from that files to vp9_mvref_common.{h, c}.

Change-Id: Ibc4afcb8cea6847166ff411130e93611ebe63b20
2013-12-19 17:39:57 -08:00
Dmitry Kovalev
b5c9261832 Converting vp9_treecoder.h to vp9_prob.{h, c}
Moving vp9_norm probability table from vp9_entropy.c to vp9_prob.c

Change-Id: Ie757b73860c6f43130790c332b292e2a1a81b788
2013-12-16 12:53:09 -08:00
Dmitry Kovalev
4ac6a2552b Moving vp9_tree_probs_from_distribution() to encoder.
Writing custom coeff branch count calculation (which is much clearer) in
adapt_coef_probs() function. Removing vp9_treecoder.c file.

Change-Id: I8880fb7a39996c8bcf6cd0acf9898a8c712ba91f
2013-12-05 18:13:26 -08:00
Dmitry Kovalev
4afd141a05 Removing vp9_default_coef_probs.h file.
Moving all probability tables from removed file to vp9_entropy.c.

Change-Id: I12846f1da778c3016d96b82e53384d4634883430
2013-12-04 17:04:35 -08:00
Frank Galligan
b4874e2c82 Fix 16 wide neon horz loopfilter.
Multiply by 3 was on 8bit vectors when it should have been on
16bit vectors.

Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc
2013-11-26 10:02:40 -08:00
Frank Galligan
97d1258375 Revert "Add 16 wide neon horz loopfilter."
The change caused mismatches with some test vectors on neon.

Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/

Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0
2013-11-21 14:01:33 -08:00
Frank Galligan
2dd77580c0 Merge "Add 16 wide neon horz loopfilter." 2013-11-21 10:29:30 -08:00
Frank Galligan
98de15137e Add 16 wide neon horz loopfilter.
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about 0.5% decode speed increase.

Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
2013-11-21 09:39:36 -08:00
Yaowu Xu
30b03050a2 Move vp9_sadmxn.h from common to encoder
Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b
2013-11-19 12:46:08 -08:00
Yaowu Xu
a42ab027fd Merge "Move vp9_extend.{h,c} from common to encoder" 2013-11-18 15:43:32 -08:00
Yaowu Xu
1c61e1960d Move vp9_extend.{h,c} from common to encoder
Since they used in encoder only. This commit also re-order includes
for the files that include vp9_extend.h

Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
2013-11-18 12:43:36 -08:00
Yunqing Wang
64f728caef Do horizontal loopfiltering in parallel
This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.

Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.

Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
2013-11-15 16:18:43 -08:00
Johann
4da2a8b718 Merge "mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)" 2013-11-13 09:00:09 -08:00
Parag Salasakar
1530a6b77f mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)
Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c
2013-11-13 11:17:14 +05:30
Parag Salasakar
248cf6f69f mips dsp-ase r2 vp9 decoder loopfilter module optimizations (rebase)
Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d
2013-11-13 10:53:16 +05:30
hkuang
a6462990e6 Merge "Add back vp9_short_idct32x32_1_add_neon which is deleted in cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3." 2013-11-07 14:42:29 -08:00
hkuang
6b16f63332 Add back vp9_short_idct32x32_1_add_neon which is deleted in
cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3.

Change-Id: I034848cf05031618818f7df2e7f9c35102686948
2013-11-05 14:57:32 -08:00
Tamar Levy
54f9205653 mb_lpf_horizontal_edge AVX2 optimization
This CL contains two AVX2 optimized loop filter functions,
mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.

Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
2013-10-31 10:26:15 -06:00
Parag Salasakar
1699eb0bf6 mips dsp-ase r2 vp9 decoder idct module optimizations (rebase)
Change-Id: Iedcdb8867084f328f4fce2fadb968e0984217308
2013-10-24 11:29:04 +05:30
Yunqing Wang
3a0b59e3fd Merge "SSE2 8-tap sub-pixel filter optimization" 2013-10-11 08:44:56 -07:00
Yunqing Wang
3fb728c749 SSE2 8-tap sub-pixel filter optimization
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.

Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
2013-10-10 14:12:47 -07:00
Dmitry Kovalev
9a1250e3e0 Merge "Moving all scan/iscan code into separate vp9_scan.{h, c} files." 2013-10-10 10:45:07 -07:00
Parag Salasakar
eeb5b62dc1 mips dsp-ase r2 vp9 decoder bilinear convolve optimizations
Change-Id: Ic31b4ef85e65070b4f8b9f26e068ccfaae00c4f0
2013-10-09 18:05:27 +05:30
Dmitry Kovalev
e3597c6af7 Moving all scan/iscan code into separate vp9_scan.{h, c} files.
Now we have entropy code separate from scan/iscan code. The next step
in future is to move iscan code from common part to the encoder.

Change-Id: Id9732f7d80aec00af35c1d58d1137c4c96c91451
2013-10-07 13:55:56 -07:00
Parag Salasakar
40edab5e39 mips dsp-ase r2 vp9 decoder convolve module optimizations
Change-Id: I401536778e3c68ba2b3ae3955c689d005e1f1d59
2013-10-02 16:58:37 -07:00
Dmitry Kovalev
efbacc9f89 Merge "Removing vp9_subpelvar.h from common." 2013-09-29 12:00:46 -07:00
Christian Duvivier
b1b4ba1bdd Properly save neon registers.
Replace current code which corrupts the stack by
duplicate of vp8 code to save and restore neon
registers.

Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a
2013-09-27 14:25:33 -07:00
Christian Duvivier
5b1dc1515f Fix a bunch of TODO from vp9_short_idct32x32_add_neon.
- full ASM version, no more C gateway file.
- integrate combine-add with last step of 2nd pass.
- remove a few push/pop pairs.
- some instruction reordering to hide latency.

Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6
2013-09-25 21:15:19 -07:00
Dmitry Kovalev
64eff7f360 Removing vp9_subpelvar.h from common.
Moving all code from that file to vp9_variace_c.c in the encoder.

Change-Id: Ic803d5b4c78d5191e4d25541b3df97337878fc3e
2013-09-25 16:10:43 -07:00
James Zern
2d58761993 Revert "Improved 8t filters"
This is incompatible with most toolchains other than gcc.

Revert "Deleted #include <inttypes.h>"

This reverts commit 4d018be950.

This reverts commit d22a504d11.

Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
2013-09-13 15:13:06 -07:00
hkuang
86fb12b600 Merge "Add neon optimize iht8x8 which is 282% faster than C." 2013-09-12 15:42:44 -07:00
hkuang
182366c736 Add neon optimize iht8x8 which is 282% faster than C.
Change-Id: I963dd4a6e8671957403ccbb9a16ea7de703e3530
2013-09-12 11:49:05 -07:00
Christian Duvivier
6a501462f8 First draft of vp9_short_idct32x32_add_neon.
Lots of TODO which will be taken care in upcoming changes. As is,
about 6x faster than C version.

Change-Id: Ie2557b72fd2d8edca376dbf400a4d173aa5e63e0
2013-09-11 15:19:38 -07:00
Scott LaVarnway
d22a504d11 Improved 8t filters
Reformatted version of a patch submitted by Erik/Tamar
from Intel.  For the test clips used, the decoder
performance improved by ~2%.

Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
2013-09-11 13:56:32 -04:00
hkuang
3c05bda058 Merge "Add neon optimize vp9_short_iht4x4_add." 2013-09-04 13:35:09 -07:00
hkuang
3b8614a8f6 Add neon optimize vp9_short_iht4x4_add.
Change-Id: I42c497b68ae1ee645b59c9968ad805db0a43e37e
2013-09-04 12:37:58 -07:00
Jim Bankoski
79401542f7 make vp9 postproc a config option
Vp9 postproc is disabled for now as its not been shown to help and
may be merged with vp8.

Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057
2013-09-04 10:02:08 -07:00
hkuang
3a679e56b2 Add neon optimize vp9_short_idct16x16_1_add.
Change-Id: Ib9354c1d975d03e8081df20d50b6a77dfe2dc7e5
2013-08-27 14:00:27 -07:00
hkuang
36e9b82080 Add neon optimize vp9_short_idct8x8_1_add.
Change-Id: I0b15d5e3b0eb97abb9ab5ec08e88b61f8723aaf4
2013-08-26 16:28:57 -07:00