Commit Graph

80 Commits

Author SHA1 Message Date
James Yu
fb5d281bb6 VP8 for ARMv8 by using NEON intrinsics 05
Add dequantizeb_neon.c
- vp8_dequantize_b_loop_neon

vpxdec  --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.23 (fps)

Change-Id: Iebe3b0c6ed2359c778b0570763c5681ae25fef0c
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-26 10:16:00 +08:00
James Yu
28b2f82f97 VP8 for ARMv8 by using NEON intrinsics 04
Add dequant_idct_neon.c
- vp8_dequant_idct_add_neon

vpxdec  --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.22 (fps)

Change-Id: Id48f39e1da58dd3d8d37658e94989411997f4f7c
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-26 09:59:23 +08:00
James Yu
d749ab6221 VP8 for ARMv8 by using NEON intrinsics 03
Add dc_only_idct_add_neon.c
- vp8_dc_only_idct_add_neon

vpxdec  --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.24 (fps)

Change-Id: I5e9e277ec3a3ca67e13c8cc4c324a6fbe8a897fc
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-26 09:28:29 +08:00
James Yu
300a3bfc73 VP8 for ARMv8 by using NEON intrinsics 02
Add copymem_neon.c
- vp8_copy_mem16x16_neon
- vp8_copy_mem8x8_neon
- vp8_copy_mem8x4_neon

vpxdec  --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.25 (fps)

Change-Id: Ib956b5a20522ff57dc8a580bf0aef7b252bddba6
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-23 22:56:53 +08:00
Andrew Russell
549c31f8ae minor spelling cleanup in comments
Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06
2014-02-12 16:32:51 -08:00
James Zern
aceba82c41 vp8/common: add extern "C" to headers
Change-Id: I13b434b1e6621e31962b08831c3587c039368c83
2014-01-23 16:21:24 -08:00
Johann
dadf350551 Apply neon flags to intrinsic files
Filter out files ending in _neon.c and append .neon so the Android build
system knows to apply -mfpu=neon

Change-Id: Ib67277e5920bfcaeda7c4aa16cd1001b11d59305
2014-01-10 12:16:59 -08:00
James Yu
79395e16cf VP8 for ARMv8 by using NEON intrinsics 01
Add bilinearpredict_neon_intrinsics.c
- vp8_bilinear_predict4x4_neon
- vp8_bilinear_predict8x4_neon
- vp8_bilinear_predict8x8_neon
- vp8_bilinear_predict16x16_neon

Change-Id: I33dfa502881219841b442dda32b73220e51b716b
Signed-off-by: James Yu <james.yu@linaro.org>
2014-01-09 09:56:22 -08:00
James Zern
e903cacf5b vp8/common: normalize include guards
Change-Id: Ia8789a8f864e0edc0bf94f00f6430846f86911c3
2013-12-16 19:40:54 -08:00
Martin Storsjo
fba3110b09 arm: Move the definition of bilinear_taps_coeff to within the section
Previously, the microsoft arm assembler errored out, saying that
bilinear_taps_coeff was an undefined symbol.

Change-Id: Ib938f0b454c41ccbd801e70a7c9acc0fa04e3c55
2013-05-22 01:50:59 +03:00
Martin Storsjo
b9ed185659 arm: Explicitly write both target registers for ldrd
The microsoft assembler can't handle the second register being
implicit.

Change-Id: Ia831953a78a25fd6b2082474f05fdb78d96cdf78
2013-05-22 01:50:58 +03:00
John Koleszar
a9c7597adc support building vp8 and vp9 into a single lib
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
2012-11-15 10:46:17 -08:00
Johann
aa165c8c5d Update armv6 vp8_intra4x4_predict
Change-Id: I52a3b0a4a42e5af91b987e19523df07c8f467847
2012-08-08 10:57:33 -07:00
Johann
a82c58c40f Change vp8_intra4x4_predict call sites
Use the _d variant from the decoder. It moves the pointer calculations
to the caller.

Change-Id: Iae2a793433ef082980a3ffa0a1cabf0264a6a24d
2012-08-01 10:48:46 -07:00
John Koleszar
0164a1cc5b Fix pedantic compiler warnings
Allows building the library with the gcc -pedantic option, for improved
portabilty. In particular, this commit removes usage of C99/C++ style
single-line comments and dynamic struct initializers. This is a
continuation of the work done in commit 97b766a46, which removed most
of these warnings for decode only builds.

Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966
2012-06-11 15:14:58 -07:00
John Koleszar
d8216b19b6 Merge "Fix compiler warnings" into eider 2012-05-02 16:22:34 -07:00
Timothy B. Terriberry
e50c842755 Fix TEXTRELs in the ARM asm.
Besides imposing a performance penalty at startup in most
 configurations, these relocations break the dynamic linker for
 native Fennec, since it does not support them at all.

Change-Id: Id5dc768609354ebb4379966eb61a7313e6fd18de
2012-05-02 10:36:01 -07:00
Attila Nagy
14c9fce8e4 Fix compiler warnings
Fix code for following warnings:
-Wimplicit-function-declaration
-Wuninitialized
-Wunused-but-set-variable
-Wunused-variable

Change-Id: I2be434f22fdecb903198e8b0711255b4c1a2947a
2012-05-02 10:57:57 +03:00
Johann
e50f96a4a3 Move SAD and variance functions to common
The MFQE function of the postprocessor depends on these

Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a
2012-03-05 16:50:33 -08:00
John Koleszar
f103dcefaf RTCD: add subpixel functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: I6c519ab61e4f4e0ebcc796f2df061f945c48cefe
2012-01-30 12:08:29 -08:00
John Koleszar
fdb61a4531 RTCD: add recon functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: I9bfcf9bef65c3d4ba0fb9a3e1532bad1463a10d6
2012-01-30 12:08:28 -08:00
John Koleszar
ab77b4e898 RTCD: add remaining IDCT functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: I03c4dbf30dfd3558b0e256ff9d3ff4c012aadc80
2012-01-30 12:08:22 -08:00
John Koleszar
55f74c59c7 RTCD: add loopfilter functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: Ic8a4047d72ff3a54ec98977dd90e70c13213db71
2012-01-30 12:06:31 -08:00
John Koleszar
a910049aea New RTCD implementation
This is a proof of concept RTCD implementation to replace the current
system of nested includes, prototypes, INVOKE macros, etc. Currently
only the decoder specific functions are implemented in the new system.
Additional functions will be added in subsequent commits.

Overview:
  RTCD "functions" are implemented as either a global function pointer
  or a macro (when only one eligible specialization available).
  Functions which have RTCD specializations are listed using a simple
  DSL identifying the function's base name, its prototype, and the
  architecture extensions that specializations are available for.

Advantages over the old system:
  - No INVOKE macros. A call to an RTCD function looks like an ordinary
    function call.
  - No need to pass vtables around.
  - If there is only one eligible function to call, the function is
    called directly, rather than indirecting through a function pointer.
  - Supports the notion of "required" extensions, so in combination with
    the above, on x86_64 if the best function available is sse2 or lower
    it will be called directly, since all x86_64 platforms implement
    sse2.
  - Elides all references to functions which will never be called, which
    could reduce binary size. For example if sse2 is required and there
    are both mmx and sse2 implementations of a certain function, the
    code will have no link time references to the mmx code.
  - Significantly easier to add a new function, just one file to edit.

Disadvantages:
  - Requires global writable data (though this is not a new requirement)
  - 1 new generated source file.

Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55
2012-01-30 12:06:27 -08:00
Attila Nagy
294aa37745 Rename save_neon_reg.asm as save_reg_neon.asm
Easier to filter out all NEON asm.

Change-Id: I0022dae8321a9608e864b09d4181414c5fff4610
2012-01-26 09:44:00 +02:00
Fritz Koenig
892102842a Disconnect ARM tgt_isa from dsp extensions
A processor with ARMv7 instructions does not
necessarily have NEON dsp extensions.  This CL
has the added side effect of allowing the ability
to enable/disable the dsp extensions cleanly.

Change-Id: Ie1e879b8fe131885bc3d4138a0acc9ffe73a36df
2012-01-20 10:38:15 -08:00
Scott LaVarnway
5f25d4c175 Reduced the size of Y1Dequant and friends to [128][2]
This patch removes the local copies of the dequantize
constants and implements John's idea as described
in "Make a local copy of the dequantized data" commit.

Change-Id: Ic6b7d681f00bf63263f71ff1e39ab2f80729e8b2
2012-01-06 11:12:00 -08:00
John Koleszar
0c2f8e77cc Remove useless g_common.h
This file declared a bunch of nonexistent, unreferenced global
function pointers.

Change-Id: Ic26bb8c7712deba754c49fc01f383b53afc9e728
2011-12-21 15:02:23 -08:00
Scott LaVarnway
a53d5a4c44 Moved dequant idct into common
These functions are now used by the encoder.
This is WIP with the goal of creating a common idct/add for
the encoder and decoder.  A boost of 1.8% was seen for
the HD rt test clip used.

[Tero] Added needed changes to ARM side.

Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf
2011-12-15 14:23:41 -05:00
Scott LaVarnway
4a91541c94 Modified the inverse walsh to output directly
to the dqcoeff or qcoeff buffer.  The encoder would
populate the dc coeffs of the y blocks as a separate
stage (recon_dcblock) and the decoder would use a special
version of the idct.  This change eliminates the extra copy
and reduces the code footprint.

[Tero] Added needed changes to armv6 and NEON assembly.

Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba
2011-11-25 09:24:04 +02:00
Tero Rintaluoma
5a2fd63a2a ARMv6 optimized Intra4x4 prediction
Added ARM optimized intra 4x4 prediction
 - 2x faster on Profiler compared to C-code compiled with -O3
 - Function interface changed a little to improve BLOCKD structure
   access

Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
2011-11-09 09:13:51 +02:00
Scott LaVarnway
ed9c66f584 Remove usage of predict buffer for decode
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer.  For blocks with eob=0,
unnecessary idcts can be eliminated.  This gave a performance
boost of ~1.8% for the HD clips used.

Tero: Added needed changes to ARM side and scheduled some
      assembly code to prevent interlocks.

Patch Set 6:  Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.

Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
2011-10-18 12:06:50 -04:00
Attila Nagy
1a7d25a484 Replace vpx_ports/config.h with vpx_config.h
Just a clean-up.

Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a
2011-09-22 13:33:54 +03:00
Johann
8f910594bd Merge "Update armv6 loopfilter to new interface" 2011-07-13 04:09:55 -07:00
Johann
1a219c22b1 Merge "Update armv7 loopfilter to new interface" 2011-07-13 04:09:42 -07:00
Attila Nagy
c231b0175d Update armv6 loopfilter to new interface
Change-Id: I5fe581d797571a7a9432fbd17fc557591d0c1afa
2011-07-12 12:14:51 +03:00
Attila Nagy
283b0e25ac Update armv7 loopfilter to new interface
Change-Id: I65105a9c63832669237e6a6a7fcb4ea3ea683346
2011-07-12 12:12:25 +03:00
Johann
6611f66978 clean up warnings when building arm with rtcd
Change-Id: I3683cb87e9cb7c36fc22c1d70f0799c7c46a21df
2011-06-29 10:51:41 -04:00
Johann
dc004e8c17 Merge "Avoid text relocations in ARM vp8 decoder" 2011-06-28 16:34:10 -07:00
Mike Hommey
e3f850ee05 Avoid text relocations in ARM vp8 decoder
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.

Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.

Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
2011-06-28 09:11:40 +02:00
Taekhyun Kim
458fb8f491 utilize preload in ARMv6 MC/LPF/Copy routines
About 9~10% decoding perf improvement on non-Neon ARM cpus

Change-Id: I7dc2a026764e84e9c2faf282b4ae113090326837
2011-06-17 14:04:53 -07:00
Attila Nagy
f96d56c4aa Fixed iwalsh_neon build problems with RVDS4.1
rvct 4.1 was complaining about vstmia.16, store multiple expects 64 data type.
optimized the implementation.

Change-Id: I0701052cabd685c375637bbc3796ff6d88f5972c
2011-05-19 10:27:26 +03:00
Attila Nagy
a6aa389d2f Loopfilter NEON: Use VMOV for constant vectors instead of VLD.
Change-Id: I562b6e01c32bb51d00f3b95faf757fc7dc29a3a3
2011-05-04 11:29:23 +03:00
Johann
01527e743f remove simpler_lpf
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers

stop tracking the option in two places. use filter_type exclusively

Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
2011-04-25 17:37:41 -04:00
John Koleszar
27972d2c1d Move build_intra_predictors_mby to RTCD framework
The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s
functions had global function pointers rather than using the RTCD
framework. This can show up as a potential data race with tools such as
helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935
for an example.

Change-Id: I29c407f828ac2bddfc039f852f138de5de888534
2011-03-11 13:04:50 -05:00
John Koleszar
02321de0f2 Fix relative include paths
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.

Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-10 15:09:44 -05:00
Tero Rintaluoma
cb14764fab Adds armv6 optimized variance calculation
Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates
ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6
and adds new assembly file for variance16x16 calculation.
 - vp8_filter_block2d_bil_first_pass_armv6   (integrated)
 - vp8_filter_block2d_bil_second_pass_armv6  (integrated)
 - vp8_variance16x16_armv6 (new)
 - bilinearfilter_arm.h (new)
Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db
2011-02-09 10:23:43 -05:00
Johann
e5aaac24bb clean up bilinear filter
make reference version of bilinear_filters short.
use reference versions of bilinear_filters and sub_pel_filters when
possible.

recognize that Width was being passed into
filter_block2d_bil_first_pass multiple times. ARM version had already
fixed this. propegate to C.

change references to src_pixels_per_line to src_pitch and standardize on
src/dst (instead of input/output).

recognize that first_pass is only run in the verticle and second_pass
only horizontal. ARM version had already fixed this. propegate to C

Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
2011-02-08 17:42:54 -05:00
Johann
3273c7b679 move one of the offset files
common/arm/vpx_asm_offsets moves up a level. prepare for muxing with
encoder/arm/vpx_vp8_enc_asm_offsets

Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
2011-02-07 11:35:30 -05:00
Johann
bb9c95ea53 remove unused dboolhuff code
we were holding on to this "just in case." purge it instead

Change-Id: I77a367b36d0821d731019f2566ecfffdae1d4b8a
2011-02-04 16:00:00 -05:00