Commit Graph

107 Commits

Author SHA1 Message Date
James Yu
eed005b076 VP8 encoder for ARMv8 by using NEON intrinsics 6
Add shortfdct_neon.c
- vp8_short_fdct4x4_neon
- vp8_short_fdct8x4_neon

Change-Id: I90152c803b484f5fab839473d632c50af0524e68
Signed-off-by: James Yu <james.yu@linaro.org>
2014-08-20 09:25:29 -07:00
James Yu
6d6fdd9c3d VP8 encoder for ARMv8 by using NEON intrinsics 3
Add subtract_neon.c
- vp8_subtract_b_neon
- vp8_subtract_mby_neon
- vp8_subtract_mbuv_neon

Change-Id: If9a17a093478552e3e3276eeaa3f098b9021d08c
Signed-off-by: James Yu <james.yu@linaro.org>
2014-08-20 09:20:55 -07:00
Scott LaVarnway
8013aaa10b VP8 encoder for ARMv8 by using NEON intrinsics 2
Add vp8_shortwalsh4x4_neon.c
- vp8_short_walsh4x4_neon

Change-Id: Ica5f584be608c9e636f62db14f563757e94be09b
Signed-off-by: James Yu <james.yu@linaro.org>
2014-08-20 09:19:23 -07:00
Marco Paniconi
7788c62286 Fix clang compiler warning in denoising_neon.
Issue: https://code.google.com/p/webm/issues/detail?id=829

Change-Id: I580308f8aa4af194b5d8990a9692ebd18db68ee8
2014-07-23 09:59:27 -07:00
Scott LaVarnway
a4b7ae7e82 Neon version of vp8_denoiser_filter_uv()
The encoder performance improved by 5% (vs "C")
for the test clip used.

Change-Id: I866b35eb2a06092edce7b37fc409562d0dacd7e7
2014-06-27 11:03:58 -07:00
Scott LaVarnway
4d9b9fa508 Neon match to vp8 temporal denoiser fix
Now match the "C" version of "Fix to reduce block
artifacts from vp8 temporal denoiser."
(see change id Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f)

Change-Id: I99e569bb6af4ae3532621127e12bf917a48ba08e
2014-05-28 13:32:52 -07:00
Scott LaVarnway
03de5a38e2 neon matches "C" when using increase_denoising
If increase_denoising is set,
vp8_denoiser_filter_neon() produced incorrect results.

Change-Id: I645f78e48b8f6657fa8a4b69d2c4d3488a0581dc
2014-05-26 08:06:25 -07:00
Marco Paniconi
6da66e1114 vp8: Add increase_denoising parameter to denoiser.
Change-Id: I96ed73e109c4f89dd06f3583cf7ecf9277401fae
2014-05-16 15:06:59 -07:00
Marco Paniconi
96d1946e87 Revert "Revert "Remove struct params from vp8_denoiser_filter""
This reverts commit 06e6d56fa1

Change-Id: If95598385b693945d6b144d03b6da8f6a57dac98
2014-05-14 10:55:53 -07:00
Frank Galligan
06e6d56fa1 Revert "Remove struct params from vp8_denoiser_filter"
This reverts commit e516a42527

Change-Id: I7c78712acc737ad5f580181cdab3aa76b23f3ca5
2014-05-07 16:19:20 -07:00
Scott LaVarnway
e516a42527 Remove struct params from vp8_denoiser_filter
This eliminates the asm_offsets dependency for future
all-assembly versions of this function.

Change-Id: I3227073ecfcb8ee6e593934fab941e9081abdda0
2014-05-02 10:31:52 -07:00
Scott LaVarnway
dea687f733 Merge "Improved intrinsic version of vp8_denoiser_filter_neon" 2014-05-02 09:59:59 -07:00
Scott LaVarnway
ff209de82b Improved intrinsic version of vp8_denoiser_filter_neon
Used horizonal add instructions instead of adding
byte lanes.  The encoder performance improved by
~4% for the test clip used.

Change-Id: Iaddd10403fcffb5b3f53b1f591ab2fe0ff002c08
2014-04-30 06:58:16 -07:00
Yunqing Wang
33df6d1fc1 Save NEON registers in VP8 NEON functions
The recent compiler can generate optimized code that uses NEON registers
for various operations besides floating-point operations. Therefore,
only saving callee-saved registers d8 - d15 at the beginning of the
encoder/decoder is not enough anymore. This patch added register saving
code in VP8 NEON functions that use those registers.

Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31
2014-04-28 14:51:53 -07:00
Martin Storsjo
e5647d6826 arm: Use vreinterpret instead of a plain cast for converting between neon vector types
This fixes building with MSVC for arm.

Change-Id: Iffae0408e0c68760e87e96b9e17d9df8e8cadb1a
2014-01-22 11:28:37 +02:00
Christian Duvivier
b52db6b7e8 ARM NEON version of denoiser.
Change-Id: I951abd4ad0078f78949f3cb79453ac334fb82a7e
2014-01-02 10:51:05 -08:00
Johann
4d5f1955de Remove type from vmvn
datatype is optional for the instruction but clang refuses it.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHIJIHC.html

It is still required when using an immediate.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGGEEB.html

Change-Id: I0fae956c8c0fa3f97578ce80abea247f7fc88705
2013-05-23 13:02:44 -07:00
John Koleszar
a9c7597adc support building vp8 and vp9 into a single lib
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
2012-11-15 10:46:17 -08:00
John Koleszar
7b8dfcb5a2 Rough merge of master into experimental
Creates a merge between the master and experimental branches. Fixes a
number of conflicts in the build system to allow *either* VP8 or VP9
to be built. Specifically either:

  $ configure --disable-vp9 $ configure --disable-vp8
  --disable-unit-tests

VP9 still exports its symbols and files as VP8, so that will be
resolved in the next commit.

Unit tests are broken in VP9, but this isn't a new issue. They are
fixed upstream on origin/experimental as of this writing, but rebasing
this merge proved difficult, so will tackle that in a second merge
commit.

Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
2012-11-07 11:30:16 -08:00
Ronald S. Bultje
4b2c2b9aa4 Rename vp8/ codec directory to vp9/.
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4
2012-11-01 16:31:22 -07:00
Ronald S. Bultje
6a4b1e5958 Remove vp8 in local symbols.
For non-static functions, change the prefix to vp9_. For static functions,
remove the prefix. Also fix some comments, remove unused code or unused
function prototypes.

Change-Id: I1f8be05362f66060fe421c3d4c9a906fdf835de5
2012-11-01 10:03:43 -07:00
Ronald S. Bultje
982deebb5e Change name of common top-level structures from VP8 to VP9.
This change encompasses VP8_PTR, VP8_COMP, VP8D_COMP, VP8_COMMON,
VP8Decompressor and VP8Common.

Change-Id: I514ef4ad4e682370f36d656af1c09ee20da216ad
2012-10-31 10:15:08 -07:00
Ronald S. Bultje
f88558fb1d Change encoder vp8_ and vp8cx_ public symbol prefixes to vp9_.
Change-Id: Ie2e3652591b010ded10c216501ce24fd95d0aec5
2012-10-30 22:07:07 -07:00
Jim Bankoski
818ee904a9 remove fdct invoke macros
Remove the fdct invoke macro calls

Change-Id: Ica2431c655819fa012133ee7abc75a16761e5fd6
2012-10-29 11:25:56 -07:00
Deb Mukherjee
7d0656537b Merging in the sixteenth subpel uv experiment
Merges this experiment in to make it easier to run tests on
filter precision, vectorized implementation etc.

Also removes an experimental filter.

Change-Id: I1e8706bb6d4fc469815123939e9c6e0b5ae945cd
2012-08-08 16:57:43 -07:00
John Koleszar
c6b9039fd9 Restyle code
Approximate the Google style guide[1] so that that there's a written
document to follow and tools to check compliance[2].

[1]: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
[2]: http://google-styleguide.googlecode.com/svn/trunk/cpplint/cpplint.py

Change-Id: Idf40e3d8dddcc72150f6af127b13e5dab838685f
2012-07-17 11:46:03 -07:00
John Koleszar
d8216b19b6 Merge "Fix compiler warnings" into eider 2012-05-02 16:22:34 -07:00
Timothy B. Terriberry
e50c842755 Fix TEXTRELs in the ARM asm.
Besides imposing a performance penalty at startup in most
 configurations, these relocations break the dynamic linker for
 native Fennec, since it does not support them at all.

Change-Id: Id5dc768609354ebb4379966eb61a7313e6fd18de
2012-05-02 10:36:01 -07:00
Attila Nagy
14c9fce8e4 Fix compiler warnings
Fix code for following warnings:
-Wimplicit-function-declaration
-Wuninitialized
-Wunused-but-set-variable
-Wunused-variable

Change-Id: I2be434f22fdecb903198e8b0711255b4c1a2947a
2012-05-02 10:57:57 +03:00
John Koleszar
3f8349467a remove unused BOOL_CODER::value
Change-Id: Ic7782707afed38c3ec7e996a4a11dc2d55226691
2012-03-29 13:56:48 -07:00
Yaowu Xu
6035da5448 WebM Experimental Codec Branch Snapshot
This is a code snapshot of experimental work currently ongoing for a
next-generation codec.

The codebase has been cut down considerably from the libvpx baseline.
For example, we are currently only supporting VBR 2-pass rate control
and have removed most of the code relating to coding speed, threading,
error resilience, partitions and various other features.  This is in
part to make the codebase easier to work on and experiment with, but
also because we want to have an open discussion about how the bitstream
will be structured and partitioned and not have that conversation
constrained by past work.

Our basic working pattern has been to initially encapsulate experiments
using configure options linked to #IF CONFIG_XXX statements in the
code. Once experiments have matured and we are reasonably happy that
they give benefit and can be merged without breaking other experiments,
we remove the conditional compile statements and merge them in.

Current changes include:
* Temporal coding experiment for segments (though still only 4 max, it
  will likely be increased).
* Segment feature experiment - to allow various bits of information to
  be coded at the segment level. Features tested so far include mode
  and reference frame information, limiting end of block offset and
  transform size, alongside Q and loop filter parameters, but this set
  is very fluid.
* Support for 8x8 transform - 8x8 dct with 2nd order 2x2 haar is used
  in MBs using 16x16 prediction modes within inter frames.
* Compound prediction (combination of signals from existing predictors
  to create a new predictor).
* 8 tap interpolation filters and 1/8th pel motion vectors.
* Loop filter modifications.
* Various entropy modifications and changes to how entropy contexts and
  updates are handled.
* Extended quantizer range matched to transform precision improvements.

There are also ongoing further experiments that we hope to merge in the
near future: For example, coding of motion and other aspects of the
prediction signal to better support larger image formats, use of larger
block sizes (e.g. 32x32 and up) and lossless non-transform based coding
options (especially for key frames). It is our hope that we will be
able to make regular updates and we will warmly welcome community
contributions.

Please be warned that, at this stage, the codebase is currently slower
than VP8 stable branch as most new code has not been optimized, and
even the 'C' has been deliberately written to be simple and obvious,
not fast.

The following graphs have the initial test results, numbers in the
tables measure the compression improvement in terms of percentage. The
build has  the following optional experiments configured:
--enable-experimental --enable-enhanced_interp --enable-uvintra
--enable-high_precision_mv --enable-sixteenth_subpel_uv

CIF Size clips:
http://getwebm.org/tmp/cif/
HD size clips:
http://getwebm.org/tmp/hd/
(stable_20120309 represents encoding results of WebM master branch
build as of commit#7a15907)

They were encoded using the following encode parameters:
--good --cpu-used=0 -t 0 --lag-in-frames=25 --min-q=0 --max-q=63
--end-usage=0 --auto-alt-ref=1 -p 2 --pass=2 --kf-max-dist=9999
--kf-min-dist=0 --drop-frame=0 --static-thresh=0 --bias-pct=50
--minsection-pct=0 --maxsection-pct=800 --sharpness=0
--arnr-maxframes=7 --arnr-strength=3(for HD,6 for CIF)
--arnr-type=3

Change-Id: I5c62ed09cfff5815a2bb34e7820d6a810c23183c
2012-03-15 07:36:47 -07:00
Johann
e50f96a4a3 Move SAD and variance functions to common
The MFQE function of the postprocessor depends on these

Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a
2012-03-05 16:50:33 -08:00
Johann
fea3556e20 Fix variance overflow
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31) the value is treated as a negative when
it is shifted which gives incorrect results.

To fix this we cast the result of the multiplication as unsigned.

The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.

For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).

PPC change is untested.

Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
2012-02-09 12:38:31 -08:00
John Koleszar
8aae246089 RTCD: finalize removal of old RTCD system
This is the final commit in the series converting to the new RTCD
system. It removes the encoder csystemdependent files and the remaining
global function pointers that didn't conform to the old RTCD system.

Change-Id: I9649706f1bb89f0cbf431ab0e3e7552d37be4d8e
2012-01-30 12:10:48 -08:00
John Koleszar
be8af188d0 RTCD: add block subtraction functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: Id8a287fdd4bd050ea4452e1582ad85520f3081be
2012-01-30 12:10:47 -08:00
John Koleszar
61311e6103 RTCD: add quantizer functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: Iba9df4c03a508e51c37201c621be43523fae87d9
2012-01-30 12:10:46 -08:00
John Koleszar
510e0ab467 RTCD: add FDCT functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: I3f9c07db65eb206f6363d21bdb80e871570da767
2012-01-30 12:10:42 -08:00
John Koleszar
83a91e789c RTCD: add variance functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: Ie5c1aa480637e98dc3918fb562ff45c37a66c538
2012-01-30 12:08:30 -08:00
John Koleszar
f103dcefaf RTCD: add subpixel functions
This commit continues the process of converting to the new RTCD
system.

Change-Id: I6c519ab61e4f4e0ebcc796f2df061f945c48cefe
2012-01-30 12:08:29 -08:00
Fritz Koenig
892102842a Disconnect ARM tgt_isa from dsp extensions
A processor with ARMv7 instructions does not
necessarily have NEON dsp extensions.  This CL
has the added side effect of allowing the ability
to enable/disable the dsp extensions cleanly.

Change-Id: Ie1e879b8fe131885bc3d4138a0acc9ffe73a36df
2012-01-20 10:38:15 -08:00
John Koleszar
056bcc8771 remove armv6 files from armv5 build
Make bilinearfilter_arm.c compiled only when HAVE_ARMV6, as its definitions
are v6 only. This is normally not a problem for static builds as the file
is elided at link time, but this was not being done properly for the
--enable-shared --enable-pic build.

Change-Id: Ic800a7cde751f74f22555c5b247f99f9df5e550d
2011-12-19 13:51:11 -08:00
Attila Nagy
97259b460c Fix encoder partitioned output on ARM
API was not returning correct partition sizes on arm targets.
The armv5 token packing functions were not storing the information to the
partition size table.
As a fix, have one boolcoder instance allocated for each partition so
that partition sizes are internally available after all partitions
were encoded. This will also allow more flexibility in producing
several partitions in parallel.

Use buffer validation (overflow check) in all ARM bitpacking
functions.

Change-Id: I31c8a11d8a7613676f0ff50928cb2a2ab14fd169
2011-11-23 12:29:43 +02:00
Scott LaVarnway
edd98b7310 Added predictor stride argument(s) to subtract functions
Patch set 2: 64 bit build fix
Patch set 3: 64 bit crash fix

[Tero]
Patch set 4: Updated ARMv6 and NEON assembly.
             Added also minor NEON optimizations to subtract
             functions.

Patch set 5: x86 stride bug fix

Change-Id: I1fcca93e90c89b89ddc204e1c18f208682675c15
2011-11-15 12:53:01 -05:00
Scott LaVarnway
46639567a0 Merge "Change use of eob in the encoder" 2011-11-03 08:06:06 -07:00
Tero Rintaluoma
e4f2ec7a52 Change use of eob in the encoder
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.

Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
2011-11-03 16:08:09 +02:00
Attila Nagy
9452dce181 Fix ARM build problem introduced by CL I3fab6f2b
Update ARM asm implementation of vp8_start_encode to new definition.

Change-Id: Ic44791c969e351082331ba6146c3384c01a0dfad
2011-10-27 09:06:45 +03:00
Attila Nagy
de82809444 Reduce partial frame copy in encoder's pick_filter_level_fast
The partial frame copy function used to copy an extra 8 lines above
and  below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.

Define the "magic" fraction number for partial filtering in
loopfilter.h .

Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
2011-10-26 15:25:07 +03:00
Attila Nagy
a5cd42feb9 Fix: vp8cx_pack_tokens_into_partitions_armv5 crash
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.

Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
2011-10-14 10:53:04 +03:00
Johann
9f41a8b0aa Merge "Replace vpx_ports/config.h with vpx_config.h" 2011-09-22 09:30:18 -07:00
Attila Nagy
1a7d25a484 Replace vpx_ports/config.h with vpx_config.h
Just a clean-up.

Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a
2011-09-22 13:33:54 +03:00