generic-library/vpx

Author	SHA1	Message	Date
Johann	911e96a4eb	Revert "Revert "VP8 for ARMv8 by using NEON intrinsics 06" This reverts commit `81ad047ee5`. Revert "VP8 for ARMv8 by using NEON intrinsics 15" This reverts commit 727af7cebe3698b8493ba6c1360b0a6606c310fb." This reverts commit `920f803f2e` Change-Id: I410d9036214a1b18427cca70b4bc6d8239740737	2014-08-20 09:41:50 -07:00
James Zern	74ed33cf6e	vp8_bilinear_predict4x4_neon: init src vectors quiets uninitialized warnings on the first load. Change-Id: I58a5af337087d96b4eaea8991a0f85c4ba58aebe	2014-07-11 00:05:25 -07:00
James Zern	5e30127c7a	vp8_sixtap_predict4x4_neon: init src vectors quiets uninitialized warnings on the first load. Change-Id: Ied9b03928537a9ed2cd414b9e8a0be00191b0f32	2014-07-10 23:48:47 -07:00
Johann	2f6f955a17	Remove intermediate step in vp8_dequantize_b With the intrinsics it is no longer necessary to have a stub/helper function. Change-Id: I3695961c3c94f1bb750d3b7b29716e509ebba482	2014-05-14 12:24:18 -07:00
Johann	920f803f2e	Revert "VP8 for ARMv8 by using NEON intrinsics 06" This reverts commit `81ad047ee5`. Revert "VP8 for ARMv8 by using NEON intrinsics 15" This reverts commit `727af7cebe`. This exposes a bug in gcc 4.9 regarding register allocation. Will reland when 4.9 is fixed. Change-Id: I2d8a04e4edde93719280e41550f4c0765608ec4d	2014-05-13 13:21:17 -07:00
Johann	4bffb75ba3	Merge "Revert "VP8 for ARMv8 by using NEON intrinsics 10""	2014-05-07 06:47:48 -07:00
Johann	3a695015ad	Merge "arm: Use a correct neon vector type for 64 bit integers"	2014-05-07 06:34:25 -07:00
Martin Storsjo	d5d82a5e1a	arm: Add a no-op define of __builtin_prefetch for MSVC Both GCC and RVCT/ARMCC support __builtin_prefetch, but MSVC doesn't. Change-Id: I44e1eecead61bc88d8fdfd3fef03d76d4f5afe08	2014-05-07 10:43:24 +03:00
Martin Storsjo	82a83c4fe0	arm: Use a correct neon vector type for 64 bit integers This fixes building with MSVC. Change-Id: I763ba8855c8083d82c8b477d3a297e310e93a335	2014-05-07 10:22:40 +03:00
Johann	677fb5123e	Revert "VP8 for ARMv8 by using NEON intrinsics 10" This reverts commit `c500fc22c1` There is an issue with gcc 4.6 in the Android NDK: loopfiltersimpleverticaledge_neon.c: In function 'vp8_loop_filter_bvs_neon': loopfiltersimpleverticaledge_neon.c:176:1: error: insn does not satisfy its constraints: Change-Id: I95b6509d12f075890308914cc691b813d2e5cd9f	2014-05-06 14:28:00 -07:00
Johann	928ff03889	Revert "VP8 for ARMv8 by using NEON intrinsics 08" This reverts commit `a5d79f43b9` There is an issue with gcc 4.6 in the Android NDK: loopfilter_neon.c: In function 'vp8_loop_filter_vertical_edge_y_neon': loopfilter_neon.c:394:1: error: insn does not satisfy its constraints: Change-Id: I2b8c6ee3fa595c152ac3a5c08dd79bd9770c7b52	2014-05-06 13:20:24 -07:00
Johann	34843e9784	Merge "VP8 for ARMv8 by using NEON intrinsics 16"	2014-05-05 13:09:43 -07:00
Johann	ad42b09f2f	Merge "VP8 for ARMv8 by using NEON intrinsics 15"	2014-05-05 13:09:12 -07:00
Johann	1b7291d52c	Merge "VP8 for ARMv8 by using NEON intrinsics 14"	2014-05-05 07:08:08 -07:00
Johann	a7355f3bbb	Merge changes Iaf7d6b0a,Iece0bf56 * changes: Use INLINE and include vpx_config.h instead of plain 'inline' Use vreinterpret instead of casting neon vector types	2014-05-05 05:36:54 -07:00
Martin Storsjo	7afed9a1b6	Use INLINE and include vpx_config.h instead of plain 'inline' This fixes compilation with MSVC. Change-Id: Iaf7d6b0a0134968a6addf315fde6d852f298db8c	2014-05-04 22:42:13 +03:00
Martin Storsjo	dfb8fc917a	Use vreinterpret instead of casting neon vector types MSVC doesn't support casting neon vector types but requires using vreinterpret. Change-Id: Iece0bf5632567efd7f37f527abea38afeab4926d	2014-05-04 22:40:57 +03:00
James Yu	4ea9cf3e2d	VP8 for ARMv8 by using NEON intrinsics 16 Add variance_neon.c - vp8_variance16x16_neon - vp8_variance16x8_neon - vp8_variance8x16_neon - vp8_variance8x8_neon Change-Id: Idfb9c96134a1c6a696a98ce68b4f7ed593a00660 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-03 19:07:40 -07:00
James Yu	727af7cebe	VP8 for ARMv8 by using NEON intrinsics 15 Add idct_dequant_0_2x_neon.c - idct_dequant_0_2x_neon Change-Id: I8e129172ef1b2517cf72ff267788921f1a792586 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-03 19:07:33 -07:00
James Yu	08e38f06db	VP8 for ARMv8 by using NEON intrinsics 14 Add sixtappredict_neon.c - vp8_sixtap_predict16x16_neon - vp8_sixtap_predict8x8_neon - vp8_sixtap_predict8x4_neon - vp8_sixtap_predict4x4_neon Change-Id: I3b02fce48ae2e6c6099041ba5ddd7b090f1463b9 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-03 19:07:12 -07:00
James Yu	18e9caad47	VP8 for ARMv8 by using NEON intrinsics 13 Add shortidct4x4llm_neon.c - vp8_short_idct4x4llm_neon Change-Id: I5a734bbffca8dacf8633c2b0ff07b98aa2f438ba Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-03 19:07:05 -07:00
James Yu	feaf766bd0	VP8 for ARMv8 by using NEON intrinsics 12 Add sad_neon.c - vp8_sad16x16_neon - vp8_sad16x8_neon - vp8_sad8x8_neon - vp8_sad8x16_neon - vp8_sad4x4_neon Change-Id: I08eaae49ec03fb91b394354660a5df0367cea311 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-03 04:54:39 -07:00
James Yu	4a8336fa9d	VP8 for ARMv8 by using NEON intrinsics 11 Add mbloopfilter_neon.c - vp8_mbloop_filter_horizontal_edge_y_neon - vp8_mbloop_filter_horizontal_edge_uv_neon - vp8_mbloop_filter_vertical_edge_y_neon - vp8_mbloop_filter_vertical_edge_uv_neon Change-Id: Ia9084e0892d4d49412d9cf2b165a0f719f2382d7 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-03 04:54:33 -07:00
James Yu	c500fc22c1	VP8 for ARMv8 by using NEON intrinsics 10 Add loopfiltersimpleverticaledge_neon.c - vp8_loop_filter_bvs_neon - vp8_loop_filter_mbvs_neon Change-Id: I7cf0a161ad4ae37c881b94cc0122f895d3baae79 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-03 04:11:00 -07:00
James Yu	55c95f2d2c	VP8 for ARMv8 by using NEON intrinsics 09 Add loopfiltersimplehorizontaledge_neon.c - vp8_loop_filter_bhs_neon - vp8_loop_filter_mbhs_neon Change-Id: I77f9721b20585da8bf3869a3850ff0ae4b4bfeea Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-03 04:10:45 -07:00
James Yu	a5d79f43b9	VP8 for ARMv8 by using NEON intrinsics 08 Add loopfilter_neon.c - vp8_loop_filter_horizontal_edge_y_neon - vp8_loop_filter_horizontal_edge_uv_neon - vp8_loop_filter_vertical_edge_y_neon - vp8_loop_filter_vertical_edge_uv_neon Change-Id: I50b57dedabd42d2a3c183c1738cc5346f0e71ed8 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-02 09:32:11 -07:00
James Yu	930557be10	VP8 for ARMv8 by using NEON intrinsics 07 Add iwalsh_neon.c - vp8_short_inv_walsh4x4_neon Change-Id: I8beda6ce11ad8ce9e80cc0a38d40161938359162 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-02 09:24:54 -07:00
James Yu	81ad047ee5	VP8 for ARMv8 by using NEON intrinsics 06 Add idct_dequant_full_2x_neon.c - idct_dequant_full_2x_neon ==== Summary of apply VP8 decode patch series ==== Benchmark on Samsung Chromebook, Cortex-A15, 1.7GHz, Dual core Toolchain: linaro-1.13.1-4.8-2014.01 Compile argument: CROSS=arm-linux-gnueabihf- ../libvpx/configure --target=armv7-linux-gcc --prefix=$HOME/out --enable-shared --cpu=cortex-a7 Test argument: vpxdec --summary --noblit ./tears_of_steel_1080p.webm NEON assembly 46.68 (fps) Apply patch 06 46.65, -0.03 Apply patch 07 46.86, +0.21 Apply patch 08 46.58, -0.28 Apply patch 09 46.57, -0.01 Apply patch 10 46.51, -0.06 Apply patch 11 46.13, -0.38 Apply patch 12 45.42, -0.71 Apply patch 13 46.06, +0.64 Apply patch 14 45.19, -0.87 Apply patch 15 45.93, +0.74 Apply patch 16 45.48, -0.45 Apply patch 17 45.84, +0.36 Apply patch 18 45.91, +0.07 <= With all NEON intrinsics patches Total -0.77 fps, 1.65% performance regression Change-Id: I77bfc9eaccfb97b8d401e949ceff8795e26ca6b7 Signed-off-by: James Yu <james.yu@linaro.org>	2014-05-02 11:57:47 +08:00
Yunqing Wang	096eaba728	Remove VP8 save_reg_neon function This patch did a cleanup following the commit "Save NEON registers in VP8 NEON functions". The pushing/poping of callee-saved NEON registers was moved into individual NEON functions. Therefore, we don't need to save those registers at the beginning of codec. The related code was removed. Change-Id: I5648166514fc9beffb780aa138495597731f49ea	2014-04-29 16:13:24 -07:00
Yunqing Wang	33df6d1fc1	Save NEON registers in VP8 NEON functions The recent compiler can generate optimized code that uses NEON registers for various operations besides floating-point operations. Therefore, only saving callee-saved registers d8 - d15 at the beginning of the encoder/decoder is not enough anymore. This patch added register saving code in VP8 NEON functions that use those registers. Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31	2014-04-28 14:51:53 -07:00
James Yu	fb5d281bb6	VP8 for ARMv8 by using NEON intrinsics 05 Add dequantizeb_neon.c - vp8_dequantize_b_loop_neon vpxdec --summary --noblit ../videos/tears_of_steel_1080p.webm Before => After, 13.25 => 13.23 (fps) Change-Id: Iebe3b0c6ed2359c778b0570763c5681ae25fef0c Signed-off-by: James Yu <james.yu@linaro.org>	2014-02-26 10:16:00 +08:00
James Yu	28b2f82f97	VP8 for ARMv8 by using NEON intrinsics 04 Add dequant_idct_neon.c - vp8_dequant_idct_add_neon vpxdec --summary --noblit ../videos/tears_of_steel_1080p.webm Before => After, 13.25 => 13.22 (fps) Change-Id: Id48f39e1da58dd3d8d37658e94989411997f4f7c Signed-off-by: James Yu <james.yu@linaro.org>	2014-02-26 09:59:23 +08:00
James Yu	d749ab6221	VP8 for ARMv8 by using NEON intrinsics 03 Add dc_only_idct_add_neon.c - vp8_dc_only_idct_add_neon vpxdec --summary --noblit ../videos/tears_of_steel_1080p.webm Before => After, 13.25 => 13.24 (fps) Change-Id: I5e9e277ec3a3ca67e13c8cc4c324a6fbe8a897fc Signed-off-by: James Yu <james.yu@linaro.org>	2014-02-26 09:28:29 +08:00
James Yu	300a3bfc73	VP8 for ARMv8 by using NEON intrinsics 02 Add copymem_neon.c - vp8_copy_mem16x16_neon - vp8_copy_mem8x8_neon - vp8_copy_mem8x4_neon vpxdec --summary --noblit ../videos/tears_of_steel_1080p.webm Before => After, 13.25 => 13.25 (fps) Change-Id: Ib956b5a20522ff57dc8a580bf0aef7b252bddba6 Signed-off-by: James Yu <james.yu@linaro.org>	2014-02-23 22:56:53 +08:00
Johann	dadf350551	Apply neon flags to intrinsic files Filter out files ending in _neon.c and append .neon so the Android build system knows to apply -mfpu=neon Change-Id: Ib67277e5920bfcaeda7c4aa16cd1001b11d59305	2014-01-10 12:16:59 -08:00
James Yu	79395e16cf	VP8 for ARMv8 by using NEON intrinsics 01 Add bilinearpredict_neon_intrinsics.c - vp8_bilinear_predict4x4_neon - vp8_bilinear_predict8x4_neon - vp8_bilinear_predict8x8_neon - vp8_bilinear_predict16x16_neon Change-Id: I33dfa502881219841b442dda32b73220e51b716b Signed-off-by: James Yu <james.yu@linaro.org>	2014-01-09 09:56:22 -08:00
Martin Storsjo	fba3110b09	arm: Move the definition of bilinear_taps_coeff to within the section Previously, the microsoft arm assembler errored out, saying that bilinear_taps_coeff was an undefined symbol. Change-Id: Ib938f0b454c41ccbd801e70a7c9acc0fa04e3c55	2013-05-22 01:50:59 +03:00
John Koleszar	a9c7597adc	support building vp8 and vp9 into a single lib Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d	2012-11-15 10:46:17 -08:00
John Koleszar	0164a1cc5b	Fix pedantic compiler warnings Allows building the library with the gcc -pedantic option, for improved portabilty. In particular, this commit removes usage of C99/C++ style single-line comments and dynamic struct initializers. This is a continuation of the work done in commit `97b766a46`, which removed most of these warnings for decode only builds. Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966	2012-06-11 15:14:58 -07:00
Timothy B. Terriberry	e50c842755	Fix TEXTRELs in the ARM asm. Besides imposing a performance penalty at startup in most configurations, these relocations break the dynamic linker for native Fennec, since it does not support them at all. Change-Id: Id5dc768609354ebb4379966eb61a7313e6fd18de	2012-05-02 10:36:01 -07:00
Johann	e50f96a4a3	Move SAD and variance functions to common The MFQE function of the postprocessor depends on these Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a	2012-03-05 16:50:33 -08:00
John Koleszar	f103dcefaf	RTCD: add subpixel functions This commit continues the process of converting to the new RTCD system. Change-Id: I6c519ab61e4f4e0ebcc796f2df061f945c48cefe	2012-01-30 12:08:29 -08:00
John Koleszar	ab77b4e898	RTCD: add remaining IDCT functions This commit continues the process of converting to the new RTCD system. Change-Id: I03c4dbf30dfd3558b0e256ff9d3ff4c012aadc80	2012-01-30 12:08:22 -08:00
John Koleszar	a910049aea	New RTCD implementation This is a proof of concept RTCD implementation to replace the current system of nested includes, prototypes, INVOKE macros, etc. Currently only the decoder specific functions are implemented in the new system. Additional functions will be added in subsequent commits. Overview: RTCD "functions" are implemented as either a global function pointer or a macro (when only one eligible specialization available). Functions which have RTCD specializations are listed using a simple DSL identifying the function's base name, its prototype, and the architecture extensions that specializations are available for. Advantages over the old system: - No INVOKE macros. A call to an RTCD function looks like an ordinary function call. - No need to pass vtables around. - If there is only one eligible function to call, the function is called directly, rather than indirecting through a function pointer. - Supports the notion of "required" extensions, so in combination with the above, on x86_64 if the best function available is sse2 or lower it will be called directly, since all x86_64 platforms implement sse2. - Elides all references to functions which will never be called, which could reduce binary size. For example if sse2 is required and there are both mmx and sse2 implementations of a certain function, the code will have no link time references to the mmx code. - Significantly easier to add a new function, just one file to edit. Disadvantages: - Requires global writable data (though this is not a new requirement) - 1 new generated source file. Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55	2012-01-30 12:06:27 -08:00
Attila Nagy	294aa37745	Rename save_neon_reg.asm as save_reg_neon.asm Easier to filter out all NEON asm. Change-Id: I0022dae8321a9608e864b09d4181414c5fff4610	2012-01-26 09:44:00 +02:00
Scott LaVarnway	a53d5a4c44	Moved dequant idct into common These functions are now used by the encoder. This is WIP with the goal of creating a common idct/add for the encoder and decoder. A boost of 1.8% was seen for the HD rt test clip used. [Tero] Added needed changes to ARM side. Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf	2011-12-15 14:23:41 -05:00
Scott LaVarnway	4a91541c94	Modified the inverse walsh to output directly to the dqcoeff or qcoeff buffer. The encoder would populate the dc coeffs of the y blocks as a separate stage (recon_dcblock) and the decoder would use a special version of the idct. This change eliminates the extra copy and reduces the code footprint. [Tero] Added needed changes to armv6 and NEON assembly. Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba	2011-11-25 09:24:04 +02:00
Scott LaVarnway	ed9c66f584	Remove usage of predict buffer for decode Instead of using the predict buffer, the decoder now writes the predictor into the recon buffer. For blocks with eob=0, unnecessary idcts can be eliminated. This gave a performance boost of ~1.8% for the HD clips used. Tero: Added needed changes to ARM side and scheduled some assembly code to prevent interlocks. Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3) into this commit because of similarities in the idct functions. Patch Set 7: EC bug fix. Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6	2011-10-18 12:06:50 -04:00
Attila Nagy	1a7d25a484	Replace vpx_ports/config.h with vpx_config.h Just a clean-up. Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a	2011-09-22 13:33:54 +03:00
Attila Nagy	283b0e25ac	Update armv7 loopfilter to new interface Change-Id: I65105a9c63832669237e6a6a7fcb4ea3ea683346	2011-07-12 12:12:25 +03:00

1 2

65 Commits