Commit Graph

117 Commits

Author SHA1 Message Date
Johann
b1ba4cc394 Rearrange loopfilter functions
Separate functions and rename files. This will make it easier to disable
some functions later to help work around a compiler issue in chromium.

Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5
2015-01-06 19:26:11 -08:00
James Yu
aeeaa67987 VP9 common for ARMv8 by using NEON intrinsics 15
Re-write
- vp9_lpf_horizontal_4_dual_neon
in vp9_loopfilter_16_neon.c

Change-Id: Ie14f63d352f9564ad01db3939a61d91cf6d21a31
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-16 20:00:26 -08:00
Johann
ebc1951c7c Merge "Use defines for inline and __builtin_prefetch" 2014-12-16 18:04:04 -08:00
Johann
2fdbf70d40 Use defines for inline and __builtin_prefetch
These were established for compatibility. Make sure to use them.

Most frequently they manifest as issues on Visual Studio builds.

Change-Id: I39d764d2eb341b999d7a6132cb44b2acfc511160
2014-12-16 15:21:19 -08:00
James Yu
aa8dd897c1 VP9 common for ARMv8 by using NEON intrinsics 16
Add vp9_reconintra_neon.c
- vp9_v_predictor_4x4_neon
- vp9_v_predictor_8x8_neon
- vp9_v_predictor_16x16_neon
- vp9_v_predictor_32x32_neon
- vp9_h_predictor_4x4_neon
- vp9_h_predictor_8x8_neon
- vp9_h_predictor_16x16_neon
- vp9_h_predictor_32x32_neon
- vp9_tm_predictor_4x4_neon
- vp9_tm_predictor_8x8_neon
- vp9_tm_predictor_16x16_neon
- vp9_tm_predictor_32x32_neon

Change-Id: Ib5d54a4766a1b5127169045659974f33aa98376d
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-16 12:57:52 -08:00
James Yu
ba05a4c640 VP9 common for ARMv8 by using NEON intrinsics 19
Delete vp9_dc_only_idct_add_neon.c

The function was merged with vp9_short_idct4x4_1_add (later
vp9_idct4x4_1_add) in d2de1ca and should have been deleted then.

Change-Id: Ie58ba3dd9dc7330a8f1238dd7dd71c9ed4639b94
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-16 11:14:12 -08:00
James Yu
4f856cd7fa VP9 common for ARMv8 by using NEON intrinsics 06
Add vp9_iht8x8_add_neon.c
- vp9_iht8x8_64_add_neon

The assembly did not previously implement tx_type 0
BUG=716

Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-15 12:18:06 -08:00
James Yu
6b71013277 VP9 common for ARMv8 by using NEON intrinsics 05
Add vp9_iht4x4_add_neon.c
- vp9_iht4x4_16_add_neon

The assembly did not previously implement tx_type 0
BUG=715

Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-15 12:16:19 -08:00
James Yu
3f7c12dab9 VP9 common for ARMv8 by using NEON intrinsics 18
Add vp9_idct32x32_add_neon.c
- vp9_idct32x32_1024_add_neon

Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 18:20:04 -08:00
James Yu
3cfed4bf76 VP9 common for ARMv8 by using NEON intrinsics 14
Add vp9_idct16x16_add_neon.c
- vp9_idct16x16_256_add_neon_pass1
- vp9_idct16x16_256_add_neon_pass2
- vp9_idct16x16_10_add_neon_pass1
- vp9_idct16x16_10_add_neon_pass2

Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 18:19:54 -08:00
James Yu
ce76aeb00d VP9 common for ARMv8 by using NEON intrinsics 13
Add vp9_idct8x8_add_neon.c
- vp9_idct8x8_64_add_neon
- vp9_idct8x8_10_add_neon

Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 14:56:54 -08:00
James Yu
8c25f4af6a VP9 common for ARMv8 by using NEON intrinsics 12
Add vp9_idct4x4_add_neon.c
- vp9_idct4x4_16_add_neon

Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 14:49:59 -08:00
James Yu
420f58f2d2 VP9 common for ARMv8 by using NEON intrinsics 11
Add vp9_idct16x16_1_add_neon.c
- vp9_idct16x16_1_add_neon

Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 13:06:58 -08:00
James Yu
030ca4d0e5 VP9 common for ARMv8 by using NEON intrinsics 10
Add vp9_idct32x32_1_add_neon.c
- vp9_idct32x32_1_add_neon

Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 13:04:29 -08:00
James Yu
2772b45ac0 VP9 common for ARMv8 by using NEON intrinsics 09
Add vp9_idct8x8_1_add_neon.c
- vp9_idct8x8_1_add_neon

Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 12:52:33 -08:00
James Yu
9114f0afdb VP9 common for ARMv8 by using NEON intrinsics 08
Add vp9_idct4x4_1_add_neon.c
- vp9_idct4x4_1_add_neon

Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 12:49:28 -08:00
James Yu
01fc6f51e0 VP9 common for ARMv8 by using NEON intrinsics 07
Add vp9_convolve8_neon.c
- vp9_convolve8_horiz_neon
- vp9_convolve8_vert_neon

Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
893534a996 VP9 common for ARMv8 by using NEON intrinsics 04
Add vp9_convolve8_avg_neon.c
- vp9_convolve8_avg_horiz_neon
- vp9_convolve8_avg_vert_neon

Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
d12757f5c6 VP9 common for ARMv8 by using NEON intrinsics 03
Add vp9_copy_neon.c
- vp9_convolve_copy_neon

Change-Id: I291fc5423d06240876411bbceab03eae5ef585be
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:02:46 -08:00
Scott LaVarnway
617382a2e3 VP9 common for ARMv8 by using NEON intrinsics 02
Add vp9_avg_neon.c
- vp9_convolve_avg_neon

Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 19:00:21 -08:00
James Yu
5b098b1825 VP9 common for ARMv8 by using NEON intrinsics 01
Add vp9_loopfilter_neon.c
- vp9_lpf_horizontal_4_neon
- vp9_lpf_vertical_4_neon
- vp9_lpf_horizontal_8_neon
- vp9_lpf_vertical_8_neon

Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 12:26:56 -08:00
Frank Galligan
95a568b3a8 Fix Neon convolve profiling
When profiling, gprof can't distinguish between matching labels in
different files.

Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f
2014-10-22 10:51:53 -07:00
Johann
1fc2b0fd00 Merge "Include type defines" 2014-06-20 11:29:19 -07:00
Johann
d658216276 Don't return value for void functions
Clears "warning: 'return' with a value, in function returning void"

Change-Id: I93972610d67e243ec772a1021d2fdfcfc689c8c2
2014-06-20 11:26:44 -07:00
Johann
baef0b89da Include type defines
Clears error: unknown type name 'uint8_t'

Change-Id: I9b6eff66a5c69bc24aeaeb5ade29255a164ef0e2
2014-06-20 11:26:13 -07:00
Jingning Han
41a350a83d Change eob threshold for partial inverse 8x8 2D-DCT to 12
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT allows to handle cases with eob below 12.

The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).

Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
2014-05-08 09:48:58 -07:00
hkuang
edcbbf2ee3 Merge "Fix a bug in neon that has not save and restore q4-q7 registers." 2014-02-28 09:48:26 -08:00
hkuang
f3d8e315ac Fix a bug in neon that has not save and restore q4-q7 registers.
Change-Id: Ie21b5ae89100389b80f919710839084f935a8545
2014-02-27 14:06:52 -08:00
James Yu
e486488ce8 Replace vqshrun by vqmovun if shift #0 bit
Change-Id: Ifabb8c7ec0c327fea9d6739cab10addb060ff435
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-14 21:03:40 -08:00
Johann
4378503665 Merge "Remove redundant arm neon instructions." 2014-02-14 20:02:51 -08:00
Yaowu Xu
ecf392a155 Merge "minor spelling cleanup in comments" 2014-02-14 14:29:35 -08:00
Frank Galligan
b41acbf9bb Fix neon wide loopfilter for filter8 only branch
The current code removed the check to only perform the filter8.

Change-Id: Ie54e19a77745042a5660eab986d9ef1c42e82410
2014-02-12 18:36:17 -08:00
Andrew Russell
549c31f8ae minor spelling cleanup in comments
Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06
2014-02-12 16:32:51 -08:00
James Yu
619f29cdb0 Remove redundant arm neon instructions.
Change-Id: I1fabad59747eb5f68c64275a36c3a1d94daf32a3
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-11 21:19:12 -08:00
Martin Storsjo
03bc491721 arm: Consistently use braces around doubleword arguments to vld
This isn't strictly necessary, but makes the file more consistent
with the other arm assembly source files.

Change-Id: I245c9677d89e0ab3f31991e473764858af35b180
2014-02-05 13:24:25 +02:00
Martin Storsjo
c2bb1aa544 arm: Use {} around quadword arguments to vld
This fixes building for iOS.

Change-Id: Ice082648c02a3faf93891f7ddc122875e2bdc9cb
2014-02-05 13:24:17 +02:00
Dmitry Kovalev
c49b08c9a1 Removing "_short" suffix from arm transform file names.
Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507
2014-01-31 17:19:02 -08:00
hkuang
770454f3a8 Add vp9_tm_predictor_32x32 neon implementation
which is 7.8 times faster than C.

Change-Id: I858ef4ec09202a07d445da8db702783d6d9d7321
2014-01-27 16:01:07 -08:00
hkuang
05d2081d38 Fix the vp9_tm_predictor_8x8_neon.
Change-Id: I832cf83871044bfee7b7e57dbd31bae05cbd53e9
2014-01-27 10:17:20 -08:00
Frank Galligan
183361dadb Merge "Optimize vp9_tm_predictor_8x8_neon function" 2014-01-24 16:21:56 -08:00
Frank Galligan
56a8a0b54b Optimize vp9_tm_predictor_8x8_neon function
Change-Id: Ia12aae491202098ff66366145aa0c3da38dc97e5
2014-01-24 11:07:14 -08:00
hkuang
3633ffcbf7 Add vp9_tm_predictor_16x16 neon implementation
which is 3.5 times faster than C.

Change-Id: I24439ba7a2971829c11620f34848facf2c916678
2014-01-24 10:22:58 -08:00
hkuang
97826df96b Add tm_predictor_8x8 neon implementation.
Change-Id: I76c2720546b737cb63018a8ab6a3ff62a291786d
2014-01-22 13:43:20 -08:00
hkuang
2a2d8c140f Merge "Add vp9_tm_predictor_4x4 neon implementation" 2014-01-16 10:18:12 -08:00
hkuang
f2ef389256 Add vp9_tm_predictor_4x4 neon implementation
Change-Id: I10c423bde7ea5a3bac9f14f35c73b6bc31c8f3e3
2014-01-15 11:51:36 -08:00
hkuang
5be0ed30dc Merge "Add initial intra frame neon optimization. 1~2% gain." 2014-01-08 14:41:43 -08:00
hkuang
691111aacf Add initial intra frame neon optimization. 1~2% gain.
More intra optimizations will be added.

Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
2014-01-08 11:58:42 -08:00
Jim Bankoski
b720ba165f rename loop filter functions
This renames all the loop filter functions so that they no
longer refer to mb

Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b
2013-12-17 17:34:34 -08:00
Frank Galligan
b4874e2c82 Fix 16 wide neon horz loopfilter.
Multiply by 3 was on 8bit vectors when it should have been on
16bit vectors.

Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc
2013-11-26 10:02:40 -08:00
Yunqing Wang
ed36720b66 Do vertical loopfiltering in parallel
This patch followed "Add filter_selectively_vert_row2 to enable
parallel loopfiltering" commit, and added x86 SSE2 optimization
to do 16-pixel filtering in parallel. For other optimizations
(neon and dspr2), current 16-pixel functions were done by calling
8-pixel functions twice, and real 16-pixel functions could be added
later.

Decoder speedup:
tulip clip:     2% speed gain;
old_town_cross: 1.2% speed gain;
bus:            2% speed gain.

Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
2013-11-22 10:04:51 -08:00