generic-library/vpx

Author	SHA1	Message	Date
James Yu	8c25f4af6a	VP9 common for ARMv8 by using NEON intrinsics 12 Add vp9_idct4x4_add_neon.c - vp9_idct4x4_16_add_neon Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:49:59 -08:00
James Yu	420f58f2d2	VP9 common for ARMv8 by using NEON intrinsics 11 Add vp9_idct16x16_1_add_neon.c - vp9_idct16x16_1_add_neon Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:06:58 -08:00
James Yu	030ca4d0e5	VP9 common for ARMv8 by using NEON intrinsics 10 Add vp9_idct32x32_1_add_neon.c - vp9_idct32x32_1_add_neon Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:04:29 -08:00
James Yu	2772b45ac0	VP9 common for ARMv8 by using NEON intrinsics 09 Add vp9_idct8x8_1_add_neon.c - vp9_idct8x8_1_add_neon Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:52:33 -08:00
James Yu	9114f0afdb	VP9 common for ARMv8 by using NEON intrinsics 08 Add vp9_idct4x4_1_add_neon.c - vp9_idct4x4_1_add_neon Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:49:28 -08:00
James Yu	01fc6f51e0	VP9 common for ARMv8 by using NEON intrinsics 07 Add vp9_convolve8_neon.c - vp9_convolve8_horiz_neon - vp9_convolve8_vert_neon Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	893534a996	VP9 common for ARMv8 by using NEON intrinsics 04 Add vp9_convolve8_avg_neon.c - vp9_convolve8_avg_horiz_neon - vp9_convolve8_avg_vert_neon Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	d12757f5c6	VP9 common for ARMv8 by using NEON intrinsics 03 Add vp9_copy_neon.c - vp9_convolve_copy_neon Change-Id: I291fc5423d06240876411bbceab03eae5ef585be Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:02:46 -08:00
Scott LaVarnway	617382a2e3	VP9 common for ARMv8 by using NEON intrinsics 02 Add vp9_avg_neon.c - vp9_convolve_avg_neon Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 19:00:21 -08:00
James Yu	5b098b1825	VP9 common for ARMv8 by using NEON intrinsics 01 Add vp9_loopfilter_neon.c - vp9_lpf_horizontal_4_neon - vp9_lpf_vertical_4_neon - vp9_lpf_horizontal_8_neon - vp9_lpf_vertical_8_neon Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 12:26:56 -08:00
Frank Galligan	95a568b3a8	Fix Neon convolve profiling When profiling, gprof can't distinguish between matching labels in different files. Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f	2014-10-22 10:51:53 -07:00
Johann	1fc2b0fd00	Merge "Include type defines"	2014-06-20 11:29:19 -07:00
Johann	d658216276	Don't return value for void functions Clears "warning: 'return' with a value, in function returning void" Change-Id: I93972610d67e243ec772a1021d2fdfcfc689c8c2	2014-06-20 11:26:44 -07:00
Johann	baef0b89da	Include type defines Clears error: unknown type name 'uint8_t' Change-Id: I9b6eff66a5c69bc24aeaeb5ade29255a164ef0e2	2014-06-20 11:26:13 -07:00
Jingning Han	41a350a83d	Change eob threshold for partial inverse 8x8 2D-DCT to 12 The scanning order has the first 12 coefficients of the 8x8 2D-DCT sitting in the top left 4x4 block. Hence the partial inverse 8x8 2D-DCT allows to handle cases with eob below 12. The overall runtime of the inverse 8x8 2D-DCT unit is reduced from 166 cycles (using SSE2) to 150 cycles (using SSSE3). Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2	2014-05-08 09:48:58 -07:00
hkuang	edcbbf2ee3	Merge "Fix a bug in neon that has not save and restore q4-q7 registers."	2014-02-28 09:48:26 -08:00
hkuang	f3d8e315ac	Fix a bug in neon that has not save and restore q4-q7 registers. Change-Id: Ie21b5ae89100389b80f919710839084f935a8545	2014-02-27 14:06:52 -08:00
James Yu	e486488ce8	Replace vqshrun by vqmovun if shift #0 bit Change-Id: Ifabb8c7ec0c327fea9d6739cab10addb060ff435 Signed-off-by: James Yu <james.yu@linaro.org>	2014-02-14 21:03:40 -08:00
Johann	4378503665	Merge "Remove redundant arm neon instructions."	2014-02-14 20:02:51 -08:00
Yaowu Xu	ecf392a155	Merge "minor spelling cleanup in comments"	2014-02-14 14:29:35 -08:00
Frank Galligan	b41acbf9bb	Fix neon wide loopfilter for filter8 only branch The current code removed the check to only perform the filter8. Change-Id: Ie54e19a77745042a5660eab986d9ef1c42e82410	2014-02-12 18:36:17 -08:00
Andrew Russell	549c31f8ae	minor spelling cleanup in comments Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06	2014-02-12 16:32:51 -08:00
James Yu	619f29cdb0	Remove redundant arm neon instructions. Change-Id: I1fabad59747eb5f68c64275a36c3a1d94daf32a3 Signed-off-by: James Yu <james.yu@linaro.org>	2014-02-11 21:19:12 -08:00
Martin Storsjo	03bc491721	arm: Consistently use braces around doubleword arguments to vld This isn't strictly necessary, but makes the file more consistent with the other arm assembly source files. Change-Id: I245c9677d89e0ab3f31991e473764858af35b180	2014-02-05 13:24:25 +02:00
Martin Storsjo	c2bb1aa544	arm: Use {} around quadword arguments to vld This fixes building for iOS. Change-Id: Ice082648c02a3faf93891f7ddc122875e2bdc9cb	2014-02-05 13:24:17 +02:00
Dmitry Kovalev	c49b08c9a1	Removing "_short" suffix from arm transform file names. Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507	2014-01-31 17:19:02 -08:00
hkuang	770454f3a8	Add vp9_tm_predictor_32x32 neon implementation which is 7.8 times faster than C. Change-Id: I858ef4ec09202a07d445da8db702783d6d9d7321	2014-01-27 16:01:07 -08:00
hkuang	05d2081d38	Fix the vp9_tm_predictor_8x8_neon. Change-Id: I832cf83871044bfee7b7e57dbd31bae05cbd53e9	2014-01-27 10:17:20 -08:00
Frank Galligan	183361dadb	Merge "Optimize vp9_tm_predictor_8x8_neon function"	2014-01-24 16:21:56 -08:00
Frank Galligan	56a8a0b54b	Optimize vp9_tm_predictor_8x8_neon function Change-Id: Ia12aae491202098ff66366145aa0c3da38dc97e5	2014-01-24 11:07:14 -08:00
hkuang	3633ffcbf7	Add vp9_tm_predictor_16x16 neon implementation which is 3.5 times faster than C. Change-Id: I24439ba7a2971829c11620f34848facf2c916678	2014-01-24 10:22:58 -08:00
hkuang	97826df96b	Add tm_predictor_8x8 neon implementation. Change-Id: I76c2720546b737cb63018a8ab6a3ff62a291786d	2014-01-22 13:43:20 -08:00
hkuang	2a2d8c140f	Merge "Add vp9_tm_predictor_4x4 neon implementation"	2014-01-16 10:18:12 -08:00
hkuang	f2ef389256	Add vp9_tm_predictor_4x4 neon implementation Change-Id: I10c423bde7ea5a3bac9f14f35c73b6bc31c8f3e3	2014-01-15 11:51:36 -08:00
hkuang	5be0ed30dc	Merge "Add initial intra frame neon optimization. 1~2% gain."	2014-01-08 14:41:43 -08:00
hkuang	691111aacf	Add initial intra frame neon optimization. 1~2% gain. More intra optimizations will be added. Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8	2014-01-08 11:58:42 -08:00
Jim Bankoski	b720ba165f	rename loop filter functions This renames all the loop filter functions so that they no longer refer to mb Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b	2013-12-17 17:34:34 -08:00
Frank Galligan	b4874e2c82	Fix 16 wide neon horz loopfilter. Multiply by 3 was on 8bit vectors when it should have been on 16bit vectors. Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc	2013-11-26 10:02:40 -08:00
Yunqing Wang	ed36720b66	Do vertical loopfiltering in parallel This patch followed "Add filter_selectively_vert_row2 to enable parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. For other optimizations (neon and dspr2), current 16-pixel functions were done by calling 8-pixel functions twice, and real 16-pixel functions could be added later. Decoder speedup: tulip clip: 2% speed gain; old_town_cross: 1.2% speed gain; bus: 2% speed gain. Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7	2013-11-22 10:04:51 -08:00
Frank Galligan	97d1258375	Revert "Add 16 wide neon horz loopfilter." The change caused mismatches with some test vectors on neon. Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/ Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0	2013-11-21 14:01:33 -08:00
Frank Galligan	98de15137e	Add 16 wide neon horz loopfilter. Add support to do 16 pixel horizontal filtering in Neon. Nexus devices saw about 0.5% decode speed increase. Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d	2013-11-21 09:39:36 -08:00
Yunqing Wang	64f728caef	Do horizontal loopfiltering in parallel This patch followed "Rewrite filter_selectively_horiz for parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. Also, corrected the declaration of aligned arrays. For 8-pixel-in-parallel case, improved the calculation of the masks and filters. Updated the threshold loading since the thresholds were already duplicated. Updated neon C functions to call neon loopfilters twice. Using tulip clip, tests showed it gave a ~1.5% decoder speed gain. Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35	2013-11-15 16:18:43 -08:00
Johann	e72d49a97a	Use lowercase 'b' to branch iOS doesn't recognize B: bad instruction `B idct32_pass_loop' Change-Id: I3cf6aede4639f1d9efa97f7962fa287ba6feaaef	2013-11-12 10:41:06 -08:00
hkuang	c689a126ed	Fix a bug in the assembly code. Change-Id: Ic416e3f8a11e82ee298e6f709b2119a9ddf1e2f8	2013-11-11 12:49:12 -08:00
hkuang	6b16f63332	Add back vp9_short_idct32x32_1_add_neon which is deleted in cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3. Change-Id: I034848cf05031618818f7df2e7f9c35102686948	2013-11-05 14:57:32 -08:00
Dmitry Kovalev	65f118d72f	Making input pointer of any inverse transform constant. Also renaming dest_stride to stride in some places. Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940	2013-10-11 18:27:12 -07:00
Dmitry Kovalev	7ef573914d	Consistent names for inverse hybrid transforms (1 of 2). Renames: vp9_short_iht4x4_add -> vp9_iht4x4_16_add vp9_short_iht8x8_add -> vp9_iht8x8_64_add vp9_short_iht16x16_add_c -> vp9_iht16x16_256_add Change-Id: Ibca7a188fd062b196787ac5efc1ea545e7f166c0	2013-10-11 13:31:32 -07:00
Dmitry Kovalev	1e766b50e2	Giving consistent names to IDCT 32x32 functions. Renames: vp9_short_idct32x32_add -> vp9_idct32x32_1024_add vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add vp9_idct_add_32x32 -> vp9_idct32x32_add Change-Id: Id85306f5814bac6c47463a6b5901a93082510666	2013-10-10 11:27:39 -07:00
Dmitry Kovalev	b096c5a336	Giving consistent names to IDCT 16x16 functions. Renames: vp9_short_idct16x16_add -> vp9_idct16x16_256_add vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add vp9_short_idct16x16_1_add -> vp9_idct16x16_1_add vp9_idct_add_16x16 -> vp9_idct16x16_add Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3	2013-10-07 14:31:10 -07:00
Dmitry Kovalev	c6ad70d5f1	Giving consistent names to IDCT 8x8 functions. Renames: vp9_short_idct8x8_add -> vp9_idct8x8_64_add vp9_short_idct8x8_1_add -> vp9_idct8x8_1_add vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add vp9_idct_add_8x8 -> vp9_idct8x8_add Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1	2013-10-06 00:24:09 -07:00
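
The message on b4874e2c82 ("Multiply by 3 was on 8bit vectors when it should have been on 16bit vectors") describes a lane-width problem that can be shown in portable C. The sketch below is not the libvpx code. All function names are hypothetical, and scalar arithmetic stands in for a single NEON lane. It assumes the multiplied value is a signed 8-bit pixel difference that is saturated back to 8 bits afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturate a 16-bit value to the int8_t range. */
int8_t saturate_s8(int16_t v) {
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Multiply by 3 while keeping only the low 8 bits of the product,
 * which is what happens when the multiply runs in an 8-bit lane. */
int8_t times3_in_8bit_lane(int8_t diff) {
    uint8_t low = (uint8_t)(diff * 3);             /* discard the high bits */
    return (int8_t)(low < 128 ? low : low - 256);  /* reinterpret as signed */
}

/* Multiply by 3 in a 16-bit lane, then saturate back to 8 bits.
 * The product of an int8_t and 3 always fits in int16_t. */
int8_t times3_in_16bit_lane(int8_t diff) {
    int16_t wide = (int16_t)(diff * 3);
    return saturate_s8(wide);
}

/* Small differences agree; large ones wrap in the 8-bit version. */
void run_examples(void) {
    assert(times3_in_8bit_lane(10) == times3_in_16bit_lane(10));
    assert(times3_in_8bit_lane(50) != times3_in_16bit_lane(50));
}
```

With a difference of 50, the 8-bit version wraps 150 around to -106, while the widened version saturates to 127. The two agree only while the product stays within the signed 8-bit range, which may explain why a bug of this kind shows up as mismatches on only some test vectors.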

106 Commits