Commit Graph

166 Commits

Author SHA1 Message Date
James Yu
4a8336fa9d VP8 for ARMv8 by using NEON intrinsics 11
Add mbloopfilter_neon.c
- vp8_mbloop_filter_horizontal_edge_y_neon
- vp8_mbloop_filter_horizontal_edge_uv_neon
- vp8_mbloop_filter_vertical_edge_y_neon
- vp8_mbloop_filter_vertical_edge_uv_neon

Change-Id: Ia9084e0892d4d49412d9cf2b165a0f719f2382d7
Signed-off-by: James Yu <james.yu@linaro.org>
2014-05-03 04:54:33 -07:00
James Yu
c500fc22c1 VP8 for ARMv8 by using NEON intrinsics 10
Add loopfiltersimpleverticaledge_neon.c
- vp8_loop_filter_bvs_neon
- vp8_loop_filter_mbvs_neon

Change-Id: I7cf0a161ad4ae37c881b94cc0122f895d3baae79
Signed-off-by: James Yu <james.yu@linaro.org>
2014-05-03 04:11:00 -07:00
James Yu
55c95f2d2c VP8 for ARMv8 by using NEON intrinsics 09
Add loopfiltersimplehorizontaledge_neon.c
- vp8_loop_filter_bhs_neon
- vp8_loop_filter_mbhs_neon

Change-Id: I77f9721b20585da8bf3869a3850ff0ae4b4bfeea
Signed-off-by: James Yu <james.yu@linaro.org>
2014-05-03 04:10:45 -07:00
James Yu
a5d79f43b9 VP8 for ARMv8 by using NEON intrinsics 08
Add loopfilter_neon.c
- vp8_loop_filter_horizontal_edge_y_neon
- vp8_loop_filter_horizontal_edge_uv_neon
- vp8_loop_filter_vertical_edge_y_neon
- vp8_loop_filter_vertical_edge_uv_neon

Change-Id: I50b57dedabd42d2a3c183c1738cc5346f0e71ed8
Signed-off-by: James Yu <james.yu@linaro.org>
2014-05-02 09:32:11 -07:00
James Yu
930557be10 VP8 for ARMv8 by using NEON intrinsics 07
Add iwalsh_neon.c
- vp8_short_inv_walsh4x4_neon

Change-Id: I8beda6ce11ad8ce9e80cc0a38d40161938359162
Signed-off-by: James Yu <james.yu@linaro.org>
2014-05-02 09:24:54 -07:00
James Yu
81ad047ee5 VP8 for ARMv8 by using NEON intrinsics 06
Add idct_dequant_full_2x_neon.c
- idct_dequant_full_2x_neon

==== Summary of apply VP8 decode patch series ====
Benchmark on Samsung Chromebook, Cortex-A15, 1.7GHz, Dual core
Toolchain: linaro-1.13.1-4.8-2014.01
Compile argument: CROSS=arm-linux-gnueabihf- ../libvpx/configure
                     --target=armv7-linux-gcc --prefix=$HOME/out
                     --enable-shared --cpu=cortex-a7
Test argument: vpxdec --summary --noblit ./tears_of_steel_1080p.webm

NEON assembly   46.68 (fps)
Apply patch 06  46.65, -0.03
Apply patch 07  46.86, +0.21
Apply patch 08  46.58, -0.28
Apply patch 09  46.57, -0.01
Apply patch 10  46.51, -0.06
Apply patch 11  46.13, -0.38
Apply patch 12  45.42, -0.71
Apply patch 13  46.06, +0.64
Apply patch 14  45.19, -0.87
Apply patch 15  45.93, +0.74
Apply patch 16  45.48, -0.45
Apply patch 17  45.84, +0.36
Apply patch 18  45.91, +0.07  <= With all NEON intrinsics patches
                 Total -0.77 fps, 1.65% performance regression

Change-Id: I77bfc9eaccfb97b8d401e949ceff8795e26ca6b7
Signed-off-by: James Yu <james.yu@linaro.org>
2014-05-02 11:57:47 +08:00
Yunqing Wang
096eaba728 Remove VP8 save_reg_neon function
This patch did a cleanup following the commit "Save NEON registers
in VP8 NEON functions". The pushing/poping of callee-saved NEON
registers was moved into individual NEON functions. Therefore,
we don't need to save those registers at the beginning of codec.
The related code was removed.

Change-Id: I5648166514fc9beffb780aa138495597731f49ea
2014-04-29 16:13:24 -07:00
James Zern
805078a1bf build: convert rtcd.sh to perl
significantly speeds up file generation.

the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.

---
Linux
    [CREATE] vpx_scale_rtcd.h
real    0m0.485s ->    0m0.022s
    [CREATE] vp8_rtcd.h
real    0m4.619s ->    0m0.060s
    [CREATE] vp9_rtcd.h
real    0m10.102s ->    0m0.087s

Windows
    [CREATE] vpx_scale_rtcd.h
real    0m8.360s ->    0m0.080s
    [CREATE] vp8_rtcd.h
real    1m8.083s ->    0m0.160s
    [CREATE] vp9_rtcd.h
real    2m6.489s ->    0m0.233s

Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
2014-03-03 14:47:11 -08:00
James Yu
fb5d281bb6 VP8 for ARMv8 by using NEON intrinsics 05
Add dequantizeb_neon.c
- vp8_dequantize_b_loop_neon

vpxdec  --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.23 (fps)

Change-Id: Iebe3b0c6ed2359c778b0570763c5681ae25fef0c
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-26 10:16:00 +08:00
James Yu
28b2f82f97 VP8 for ARMv8 by using NEON intrinsics 04
Add dequant_idct_neon.c
- vp8_dequant_idct_add_neon

vpxdec  --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.22 (fps)

Change-Id: Id48f39e1da58dd3d8d37658e94989411997f4f7c
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-26 09:59:23 +08:00
James Yu
d749ab6221 VP8 for ARMv8 by using NEON intrinsics 03
Add dc_only_idct_add_neon.c
- vp8_dc_only_idct_add_neon

vpxdec  --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.24 (fps)

Change-Id: I5e9e277ec3a3ca67e13c8cc4c324a6fbe8a897fc
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-26 09:28:29 +08:00
James Yu
300a3bfc73 VP8 for ARMv8 by using NEON intrinsics 02
Add copymem_neon.c
- vp8_copy_mem16x16_neon
- vp8_copy_mem8x8_neon
- vp8_copy_mem8x4_neon

vpxdec  --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.25 (fps)

Change-Id: Ib956b5a20522ff57dc8a580bf0aef7b252bddba6
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-23 22:56:53 +08:00
Johann
dadf350551 Apply neon flags to intrinsic files
Filter out files ending in _neon.c and append .neon so the Android build
system knows to apply -mfpu=neon

Change-Id: Ib67277e5920bfcaeda7c4aa16cd1001b11d59305
2014-01-10 12:16:59 -08:00
James Yu
79395e16cf VP8 for ARMv8 by using NEON intrinsics 01
Add bilinearpredict_neon_intrinsics.c
- vp8_bilinear_predict4x4_neon
- vp8_bilinear_predict8x4_neon
- vp8_bilinear_predict8x8_neon
- vp8_bilinear_predict16x16_neon

Change-Id: I33dfa502881219841b442dda32b73220e51b716b
Signed-off-by: James Yu <james.yu@linaro.org>
2014-01-09 09:56:22 -08:00
James Zern
f89335f7ca remove unused VP8 com/dec asm offsets
Change-Id: Ib3b26ee27f04b2dcbbd32b3127afb45e9f50cfcf
2013-07-09 14:33:49 -07:00
James Zern
08348d9cab prefix vp8 asm_{com,dec,enc}_offsets files
make them symmetrical with the generated output and their vp9
counterparts

Change-Id: I72cc97c4d33d713dff620a6d7cc25955266216fc
2013-03-02 14:45:40 -08:00
John Koleszar
a9c7597adc support building vp8 and vp9 into a single lib
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
2012-11-15 10:46:17 -08:00
John Koleszar
7b8dfcb5a2 Rough merge of master into experimental
Creates a merge between the master and experimental branches. Fixes a
number of conflicts in the build system to allow *either* VP8 or VP9
to be built. Specifically either:

  $ configure --disable-vp9 $ configure --disable-vp8
  --disable-unit-tests

VP9 still exports its symbols and files as VP8, so that will be
resolved in the next commit.

Unit tests are broken in VP9, but this isn't a new issue. They are
fixed upstream on origin/experimental as of this writing, but rebasing
this merge proved difficult, so will tackle that in a second merge
commit.

Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
2012-11-07 11:30:16 -08:00
Ronald S. Bultje
4b2c2b9aa4 Rename vp8/ codec directory to vp9/.
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4
2012-11-01 16:31:22 -07:00
Ronald S. Bultje
6c280c2299 Adjust style to match Google Coding Style a little more closely.
Most of these were picked up by jenkins in the commit that changed
the vp8 namespace to vp9 in common/.

Change-Id: I5cbd56ffc753b92ef805133cda6acc1713a13878
2012-11-01 10:03:48 -07:00
Ronald S. Bultje
8166657109 Make implicit_segmentation-related code an experiment.
This way, the code is not compiled in by default, thus decreasing
overall binary size.

Change-Id: I85cac8f5a22a51a7d99c820ef6d6ed179d4106a0
2012-10-29 21:15:42 -07:00
Scott LaVarnway
ce811f87c4 Faster 8t filtering
Quickly modified the ssse3 sixtap filters to support eight taps.  For the test
clip used, a 23+% boost in decoder performance was seen.  We can
revisit later and improve further.

Change-Id: I5f59860459e80d6fa23e6cc0fd91296a969f5240
2012-10-25 17:24:50 -07:00
Scott LaVarnway
9ba2efd034 Added sse2 instrinsic version of vp8_sad16x3
3.7% boost in decoder performance for the clip used.

Change-Id: I74f28486a9352b472b36e21b5eaf30eff35e9199
2012-10-25 12:16:08 -07:00
Scott LaVarnway
d36ecb42da Added rtcd support vp8_sad16x3 and vp8_sad3x16
Change-Id: I5bca7b7a4b230082d36ac6fb84db84137ad177d7
2012-10-22 13:45:42 -07:00
Scott LaVarnway
085433c2d0 sse2 intrinsic version of vp8_mbloop_filter_vertical_edge()
First sse2 version of vp8_mbloop_filter_vertical_edge().  For now,
intrinsics are being used until the bitstream is finalized.  This function
will be revisited later for further performance improvements.

For the test clip used, a 34+% decoder performance improvement
was seen.  This will vary depending on material.

Change-Id: I455b438bc8d8af76cf7533ac42eda5f689b21f7c
2012-10-19 15:52:12 -07:00
Scott LaVarnway
992b5e2d95 sse2 intrinsic version of vp8_mbloop_filter_horizontal_edge()
First sse2 version of vp8_mbloop_filter_horizontal_edge().  For now,
intrinsics are being used until the bitstream is finalized.  This function
will be revisited later for further performance improvements.
For the test clip used, a 31+% decoder performance improvement
was seen.  This will vary depending on material.

Change-Id: I03ed3a7182478bdd1f094644ff3e0442625600e7
2012-10-18 14:29:26 -07:00
Jim Bankoski
ffff213463 removed obselete build dependency
this commit fixes the build on windows with visual studio 2008.

Change-Id: I0baa4044e9e54237da29f2e17332ea6f766dbbec
2012-10-17 09:22:05 -07:00
Paul Wilkins
2d60bee1fb New Motion Reference Search
Alternative strategy for finding a list of candidate motion
vectors to use as reference values in mv coding and as
nearest and near.

Sort by sad in vp8_find_best_ref_mvs() rather than just
pick the best. Allow 0,0 as a best ref option but not a
nearest or near unless there are no alternatives.

Encode/Decode verified on at least some clips.

Some commented out experimental and stats code still in place.

Gain over existing code averages about 1% on derf (alll metrics)
with improvement on all clips. Other test results pending.

The entropy coding of the mode (nearest/near etc) still
depends upon and requires the old "findnear" code so
this needs looking at and may provide room for further gains.

Change-Id: I871d7cba1d1c379c4bad9bcccce1fb19c46b8247
2012-08-24 18:08:21 +01:00
John Koleszar
b43ed7a5b1 Merge "remove rotation experiment" into experimental 2012-08-22 10:01:39 -07:00
Christian Duvivier
63ef9c40a4 SSE2 version of vectorized 8-tap filtering.
About 20% overall encoder speedup (vs. about 30% for sse4 version).

Change-Id: Ibf608a6a1bc94b14ec47e8046d3206b275b5a8bd
2012-08-21 15:26:14 -07:00
John Koleszar
5055a1610d remove rotation experiment
This is being reimplemented more generically in terms of affine
transforms.

Change-Id: I9300bfde5f8b93c708c64f59427087720f8ed782
2012-08-21 10:09:56 -07:00
Christian Duvivier
5a34e0eb89 First partial snapshot of vectorized 8-tap filtering.
About 3.5x faster, 30% overall encoder speedup. Rest of optimizations
will come soon (see TODO section in filter_sse4.c).

Change-Id: If18108048bfd5345fc942e8574e4c7f58e0e86e0
2012-08-15 17:55:06 -07:00
Christian Duvivier
707b65bd16 Partial import of "New RTCD implementation" from master branch.
Latest version of all scripts/makefile but rtcd_defs.sh is empty, all
existing functions are still selected using the old/current way.

Change-Id: Ib92946a48a31d6c8d1d7359eca524bc1d3e66174
2012-08-08 16:43:48 -07:00
Johann
aa165c8c5d Update armv6 vp8_intra4x4_predict
Change-Id: I52a3b0a4a42e5af91b987e19523df07c8f467847
2012-08-08 10:57:33 -07:00
Johann
a497cb59cd Rename vp8_intra4x4_predict_d
predict_d has become canonical. Remove previous helper function.

Disable ARM assembly pending update.

Change-Id: Idd84ac8a28f9b0221ea97904a77de1e705d06a7d
2012-08-01 11:17:57 -07:00
Dragan Mrdjan
07ff7fa811 VP8 optimizations for MIPS dspr2
Signed-off-by: Raghu Gandham <raghu@mips.com>

Change-Id: I3a8bca425cd3dab746a6328c8fc8843c8e87aea6
2012-07-10 10:01:54 -07:00
Yaowu Xu
e9818bb697 changed the way that default probs for 8x8 is set.
The commit changed how baseline 8x8 coefficient probabilities are
initialized, to be consistent with the initialization of baseline
4x4 coefficient probabilities.

The commit does not have any effect on compression.

Change-Id: Ifb3902b5dc0b0c2e6dc3aa5d4a6589d528e58355
2012-05-23 10:09:41 -07:00
John Koleszar
2d225689d3 Move all tests to test/ directory
Consolodate the unit tests under vp8/ to the test/ directory

Change-Id: I6d6a0fb60f5e3874a4d6710e9e121dd3e81a93db
2012-05-22 15:00:10 -07:00
John Koleszar
e82d261d10 Build unit tests monolithically
Rework unit tests  to have a single executable rather than many, which
should avoid pollution of the visual studio project namespace, improve
build times, and make it easier to use the gtest test sharding system
when we get these going on the continuous build cluster.

Change-Id: If4c3e5d4b3515522869de6c89455c2a64697cca6
2012-05-22 14:37:30 -07:00
Scott LaVarnway
317d4244cb Makes all mode token tables const part 2
(see Change I9b2ccc88: Makes all mode token tables const)
Further remove runtime table initialization and use
precalculated const data.  Data footprint reduced
by 4112 bytes.

Change-Id: Ia3ae9fc19f77316b045cabff01f6e5f0876a86ab
2012-04-19 17:35:20 -04:00
Yaowu Xu
3f5feb7d13 fixed .mk files to reflect add/remove of a header file
In a previous commit, the duplicate of headerfile defaultcoefcounts.h
was identified. This commit updates the .mk file to ensure configure
and make works properly for all platforms.

Change-Id: I31a39c809a734ba438ee53db700f252e9a03eddd
2012-03-12 14:51:54 -07:00
Johann
fd903902ef RFC: Reorganize MFQE loops
Break MFQE code into it's own file.

It is currently only valid for 16x16 and 8x8 Y blocks. It also filters
4x4 U/V blocks.

Refactor filtering and add associated assembly. Limited test cases show
--mfqe introduces a penalty of ~20% with HD content. The assembly
reduces the penalty to ~15%

Change-Id: I4b8de6b5cdff5413037de5b6c42f437033ee55bf
2012-03-06 15:20:03 -08:00
Johann
e50f96a4a3 Move SAD and variance functions to common
The MFQE function of the postprocessor depends on these

Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a
2012-03-05 16:50:33 -08:00
James Berry
0c1cec2205 Add unit tests for idctllm_test and idctllm_mmx
add unit tests for vp8_short_idct4x4llm_c

Change-Id: I472b7c0baa365ba25dc99a3f6efccc816d27c941
2012-02-21 14:52:36 -05:00
Makoto Kato
7989bb7fe7 Support Android x86 NDK build
On Android NDK, rand() is inlined function.  But, on our SSE optimization,
we need symbol for rand()

Change-Id: I42ab00e3255208ba95d7f9b9a8a3605ff58da8e1
2012-02-16 12:03:30 -08:00
Paul Wilkins
2615ca5d41 Removal of threading code.
For the experimental branch we are trying to slim the codebase
down removing features such as threading for now which complicate
the process of development and testing.

Change-Id: I657c0246aef4d1fa8c8ffc6a1adfeee45bce8e24
2012-02-10 16:23:59 +00:00
Paul Wilkins
b2f64dff7d Added common prediction modules.
This function adds the common prediction modules,  some data structures
and a config option but does not use them.

It also corrects a bug in clearing down  the MODE_INFO border and introduces
a new element that indicates if an entry corresponds to an "in image" macro block
or is part of the border.

Change-Id: Ib69eec0876173ebe9d1de9df9537d0b2447702e0
2012-01-31 12:53:36 +00:00
John Koleszar
f103dcefaf RTCD: add subpixel functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: I6c519ab61e4f4e0ebcc796f2df061f945c48cefe
2012-01-30 12:08:29 -08:00
John Koleszar
2a8f57f50d RTCD: add postproc functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: If54eb5cb5d1b0cac6c4c0633a9e99c93ca860ba2
2012-01-30 12:08:29 -08:00
John Koleszar
fdb61a4531 RTCD: add recon functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: I9bfcf9bef65c3d4ba0fb9a3e1532bad1463a10d6
2012-01-30 12:08:28 -08:00
John Koleszar
ab77b4e898 RTCD: add remaining IDCT functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: I03c4dbf30dfd3558b0e256ff9d3ff4c012aadc80
2012-01-30 12:08:22 -08:00
John Koleszar
55f74c59c7 RTCD: add loopfilter functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: Ic8a4047d72ff3a54ec98977dd90e70c13213db71
2012-01-30 12:06:31 -08:00
John Koleszar
a910049aea New RTCD implementation
This is a proof of concept RTCD implementation to replace the current
system of nested includes, prototypes, INVOKE macros, etc. Currently
only the decoder specific functions are implemented in the new system.
Additional functions will be added in subsequent commits.

Overview:
  RTCD "functions" are implemented as either a global function pointer
  or a macro (when only one eligible specialization available).
  Functions which have RTCD specializations are listed using a simple
  DSL identifying the function's base name, its prototype, and the
  architecture extensions that specializations are available for.

Advantages over the old system:
  - No INVOKE macros. A call to an RTCD function looks like an ordinary
    function call.
  - No need to pass vtables around.
  - If there is only one eligible function to call, the function is
    called directly, rather than indirecting through a function pointer.
  - Supports the notion of "required" extensions, so in combination with
    the above, on x86_64 if the best function available is sse2 or lower
    it will be called directly, since all x86_64 platforms implement
    sse2.
  - Elides all references to functions which will never be called, which
    could reduce binary size. For example if sse2 is required and there
    are both mmx and sse2 implementations of a certain function, the
    code will have no link time references to the mmx code.
  - Significantly easier to add a new function, just one file to edit.

Disadvantages:
  - Requires global writable data (though this is not a new requirement)
  - 1 new generated source file.

Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55
2012-01-30 12:06:27 -08:00
Attila Nagy
294aa37745 Rename save_neon_reg.asm as save_reg_neon.asm
Easier to filter out all NEON asm.

Change-Id: I0022dae8321a9608e864b09d4181414c5fff4610
2012-01-26 09:44:00 +02:00
Jim Bankoski
91325b8fe7 vpn common -> implicit segmentation
This introduces base functions for introducing implicit segmentation.
The code that actually stores the results to the segment map isn't
here yet.   This just prints out the segmentation map results
if you call it.

Uses connected component labeling technique on mbmi info so that only
if 2 mbs are horizontally or vertically touching do they get the same
segment.

vp8next - plumbing for rotation

code to produce taps for rotation ( tapify. py ),  code
for predicting using rotation ( predict_rotated.c ) ,  code
for finding the best rotation find_rotation.c.

didn't checkin code that uses this in the codec.   still work
in progress.

Fixed copyright notice

Change-Id: I450c13cfa41ab2fcb699f3897760370b4935fdf8
2012-01-24 11:20:13 -08:00
Fritz Koenig
892102842a Disconnect ARM tgt_isa from dsp extensions
A processor with ARMv7 instructions does not
necessarily have NEON dsp extensions.  This CL
has the added side effect of allowing the ability
to enable/disable the dsp extensions cleanly.

Change-Id: Ie1e879b8fe131885bc3d4138a0acc9ffe73a36df
2012-01-20 10:38:15 -08:00
Scott LaVarnway
33d9ea5471 Merge "Remove useless g_common.h" 2012-01-03 09:48:35 -08:00
John Koleszar
f56918ba9c Remove legacy integer types
Remove BOOL, INTn, UINTn, etc, in favor of C99-style fixed width
types.

Change-Id: I396636212fb5edd6b347d43cc940186d8cd1e7b5
2011-12-22 09:58:40 -08:00
John Koleszar
0c2f8e77cc Remove useless g_common.h
This file declared a bunch of nonexistent, unreferenced global
function pointers.

Change-Id: Ic26bb8c7712deba754c49fc01f383b53afc9e728
2011-12-21 15:02:23 -08:00
John Koleszar
056bcc8771 remove armv6 files from armv5 build
Make bilinearfilter_arm.c compiled only when HAVE_ARMV6, as its definitions
are v6 only. This is normally not a problem for static builds as the file
is elided at link time, but this was not being done properly for the
--enable-shared --enable-pic build.

Change-Id: Ic800a7cde751f74f22555c5b247f99f9df5e550d
2011-12-19 13:51:11 -08:00
Scott LaVarnway
a53d5a4c44 Moved dequant idct into common
These functions are now used by the encoder.
This is WIP with the goal of creating a common idct/add for
the encoder and decoder.  A boost of 1.8% was seen for
the HD rt test clip used.

[Tero] Added needed changes to ARM side.

Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf
2011-12-15 14:23:41 -05:00
Johann
f2cd4ded22 Move shared data to shared location
Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2
file is bad

Moving towards allowing --disable-mmx

Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
2011-11-18 16:23:14 -08:00
Tero Rintaluoma
5a2fd63a2a ARMv6 optimized Intra4x4 prediction
Added ARM optimized intra 4x4 prediction
 - 2x faster on Profiler compared to C-code compiled with -O3
 - Function interface changed a little to improve BLOCKD structure
   access

Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
2011-11-09 09:13:51 +02:00
Paul Wilkins
01ce04bc06 Further segment feature extensions.
This quite large check in includes the following:

Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
This is used as a basis for a simple segmentation for the normal frames
in a gf/arf group. This code also uses satd functions from Yaowu.

Adds functionality for coding the latest possible position of an EOB for
blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
Where the EOB position is 0 this acts like "skip" and the normal coding
of skip at the per mb level is disabled.

Added functions (seg_common.c) for setting and reading segment feature
elements. These may want to be optimized away at some point but while the
mecahnism is in a state of flux they provide a single location for making
changes and keep things a bit cleaner.

This is still proof of concept code. Currently the tested feature set:-

Quantizer,
Loop Filter level,
Reference frame,
Prediction Mode,
EOB end stop.

TBD:-

Add functions for setting and reading the feature data with range
and validity checking.

Handling of signed and unsigned feature data. At the moment all is assumed
to be signed and a sign bit is coded but many cannot be negative.

Correct handling of EOB feature with intra coded blocks.

Testing/trapping of legal/illegal ref frame and mode combinations.

Transform size switch plus merge and test with 8c8 DCT work

Merge and test with Sumans Segmenation coding optimizations

Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
2011-10-24 15:52:18 +01:00
Scott LaVarnway
ed9c66f584 Remove usage of predict buffer for decode
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer.  For blocks with eob=0,
unnecessary idcts can be eliminated.  This gave a performance
boost of ~1.8% for the HD clips used.

Tero: Added needed changes to ARM side and scheduled some
      assembly code to prevent interlocks.

Patch Set 6:  Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.

Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
2011-10-18 12:06:50 -04:00
Johann
3556deaca3 combine loopfilter data access
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).

This implementation is x86_64 only. We retain the previous
implementation for x86.

Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.

Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
2011-09-30 07:38:35 -07:00
John Koleszar
4a6ac727fe Install missing default_coef_probs.h
Make sure that this header is listed as one of the sources, so that it
will be installed if necessary.

Change-Id: I2427e494488126b179151dc21043c1e2c8ba5991
2011-09-22 11:08:24 -04:00
John Koleszar
180b0306cc Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/common/defaultcoefcounts.h
	vp8/common/entropy.c
	vp8/encoder/bitstream.c

Change-Id: Idd4990c80d5b5494ac036254694015fab449bc08
2011-08-25 08:36:19 -04:00
Scott LaVarnway
19987dcbfa Faster vp8_default_coef_probs
Copies from a generated table instead of building the
default coeff probabilities during runtime.

Change-Id: I4d9551ea3a2d7d4a4f7ce9eda006495221a8de50
2011-08-16 16:21:21 -04:00
John Koleszar
a16cd74ba1 Merge remote branch 'internal/upstream-experimental' into HEAD
Conflicts:
	vp8/decoder/detokenize.c
	vp8/decoder/onyxd_if.c
	vp8/vp8_common.mk

Change-Id: Ifca1108186a8bc715da86a44021ee2fa5550b5b8
2011-08-11 13:01:45 -04:00
James Berry
27ee521753 include asm_com/dec_offsets for make dist
Change-Id: Ia1ad66066a24c01915cd9e3ff75c7e070cc984c8
2011-08-02 13:42:03 -04:00
Johann
3e8c6d3d35 include the arm header files in make dist
Change-Id: Ibcf5b4b14153f65ce1b53c3bfba87ad2feb17bbd
2011-08-01 17:20:21 -04:00
John Koleszar
9dfd006017 Merge remote branch 'internal/upstream-experimental' into HEAD
Conflicts:
	vp8/encoder/bitstream.c

Change-Id: I44c00f98dcb99eb728ce4f5256aefb135a711a74
2011-06-30 08:46:49 -04:00
Stefan Holmer
4cb0ebe5b2 Adding support for independent partitions
Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.

Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
2011-06-28 11:10:17 -04:00
John Koleszar
f86e14d8dc Merge remote branch 'internal/upstream' into HEAD 2011-06-28 00:05:04 -04:00
Attila Nagy
6f23f24afe configuration, support disabling any subset of ARM arch
Useful for leaving out any version specific asm files.

Change-Id: I233514410eb9d7ca88d2d2c839673122c507fa99
2011-06-21 10:39:01 +03:00
John Koleszar
e1b90ce862 Merge remote branch 'internal/upstream' into HEAD 2011-04-28 00:05:07 -04:00
Ronald S. Bultje
1083fe4999 SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
decoding

before
10.425
10.432
10.423
=10.426

after:
10.405
10.416
10.398
=10.406, 0.2% faster

encoding

before
14.252
14.331
14.250
14.223
14.241
14.220
14.221
=14.248

after
14.095
14.090
14.085
14.095
14.064
14.081
14.089
=14.086, 1.1% faster

Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2
2011-04-27 11:31:27 -07:00
John Koleszar
51bcf621c1 Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/decoder/decodemv.c
	vp8/decoder/onyxd_if.c
	vp8/encoder/ratectrl.c
	vp8/encoder/rdopt.c

Change-Id: Ia1c1c5e589f4200822d12378c7749ba62bd17ae2
2011-03-23 00:27:52 -04:00
John Koleszar
429dc676b1 Increase static linkage, remove unused functions
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.

These symbols were identified by:

  $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
    | sort | grep '^ *1 '

Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-03-17 20:53:47 -04:00
John Koleszar
820b2b927f Merge remote branch 'internal/upstream' into HEAD 2011-03-10 00:05:04 -05:00
John Koleszar
5c24071504 Add missing filter.h to build system
Missing file causes 'make dist' to not include a complete copy of the
source.

Change-Id: I3f55aeb5a86d0e81234e4e4588cb8086ba4cfc4a
2011-03-09 13:43:31 -05:00
John Koleszar
b21fe3b278 Merge remote branch 'internal/upstream' into HEAD 2011-02-19 00:05:44 -05:00
John Koleszar
c764c2a20f Merge "clean up unused files" 2011-02-18 06:33:05 -08:00
John Koleszar
3ed8fe8778 remove unused vp8_predict_dc function
Change-Id: I64fa47889c54cfed094a674c49ef0996d49bdd42
2011-02-18 09:12:20 -05:00
John Koleszar
cbf923b12c clean up unused files
Removed a number of files that were unused or little-used.

Change-Id: If9ae5e5b11390077581a9a879e8a0defe709f5da
2011-02-18 09:09:49 -05:00
John Koleszar
f13212b728 Merge remote branch 'internal/upstream' into HEAD 2011-02-18 00:05:13 -05:00
John Koleszar
64aebb6c7a Merge remote branch 'internal/upstream' into HEAD 2011-02-11 00:05:19 -05:00
John Koleszar
02321de0f2 Fix relative include paths
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.

Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-10 15:09:44 -05:00
John Koleszar
ec3b8f1f32 Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/decoder/onyxd_int.h

Change-Id: Id9aa577f03e37b4f406ba3b593c3c4330812a49e
2011-02-10 14:26:40 -05:00
Tero Rintaluoma
cb14764fab Adds armv6 optimized variance calculation
Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates
ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6
and adds new assembly file for variance16x16 calculation.
 - vp8_filter_block2d_bil_first_pass_armv6   (integrated)
 - vp8_filter_block2d_bil_second_pass_armv6  (integrated)
 - vp8_variance16x16_armv6 (new)
 - bilinearfilter_arm.h (new)
Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db
2011-02-09 10:23:43 -05:00
John Koleszar
b2ad177942 Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/vp8_common.mk

Change-Id: I2094ddf20834c0b7dfe912feac6a79500bb8cce2
2011-02-09 08:34:48 -05:00
Johann
e5aaac24bb clean up bilinear filter
make reference version of bilinear_filters short.
use reference versions of bilinear_filters and sub_pel_filters when
possible.

recognize that Width was being passed into
filter_block2d_bil_first_pass multiple times. ARM version had already
fixed this. propegate to C.

change references to src_pixels_per_line to src_pitch and standardize on
src/dst (instead of input/output).

recognize that first_pass is only run in the verticle and second_pass
only horizontal. ARM version had already fixed this. propegate to C

Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
2011-02-08 17:42:54 -05:00
Johann
40dcae9c2e clarify *_offsets.asm differences
it's difficult to mux the *_offsets.c files because of header conflicts.
make three instead, name them consistently and partititon the contents
to allow building them as required.

Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
2011-02-08 16:35:43 -05:00
Johann
3273c7b679 move one of the offset files
common/arm/vpx_asm_offsets moves up a level. prepare for muxing with
encoder/arm/vpx_vp8_enc_asm_offsets

Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
2011-02-07 11:35:30 -05:00
John Koleszar
7211ac407b Merge remote branch 'internal/upstream' into HEAD 2010-12-14 00:05:07 -05:00
John Koleszar
b1aa54ab26 remove unused temporal preproc code
This code is unused, as the current preproc implementation uses the
same spatial filter that postproc uses.

Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
2010-12-13 16:47:59 -05:00
Jim Bankoski
b4a3602f66 changes to start experimenting with color segmentation prediction modes. 2010-11-16 14:38:40 -05:00
John Koleszar
d6c67f02c9 make vp8_recon16x16mb{,y} RTCD functions
ARM NEON has a platform specific version of vp8_recon16x16mb, though
it's just a stub to extract the various parameters from the
MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using
that function's prototype directly will be a better long term solution,
but it's quite an invasive change.

Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620
2010-10-26 13:23:36 -04:00
John Koleszar
19638c2309 arm: move unrolled loops back to generic code
Some of the ARM functions differed from their generic counterparts
only by unrolling their loops. Since this change may be useful
on other platforms, or might even supercede the looped version
in the generic case, move it back to the generic file.

This code is left under #if ARCH_ARM for now, but it may be worth
considering a different (possibly new) conditional for these. If
it turns out that this should be runtime selectable, these
functions will have to move to the RTCD infrastructure. Don't want
to take that step at this time without more profile data.

Change-Id: I4612fdbc606fbebba4971a690fb743ad184ff15f
2010-10-26 09:51:35 -04:00