Compare commits

29 Commits

Author SHA1 Message Date
Jingning Han
ac50b75e50 Use balanced model for intra prediction mode coding
This commit replaces the previous table-based intra mode coding with
a more balanced entropy coding system. It reduces the decoder lookup
table size by 1 KB. Key frame compression performance is about even
on average: a few test points improve by over 5%, and most are fairly
close to the lookup table approach.

Change-Id: I47154276c0a6a22ae87de8845bc2d494681b95f6
2015-06-23 16:42:56 -07:00
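
A minimal sketch of the trade-off, not the actual patch: with a tree-structured model, the bit cost of an intra mode is derived from a handful of per-node probabilities rather than read from a large precomputed per-context table, which is what shrinks the decoder-side storage. All names below are illustrative.

#include <math.h>

/* Each mode is a path of binary decisions through a small tree; its cost
 * in bits is the sum of -log2 of the branch probabilities along the path.
 * (A real tree indexes nodes, not path steps; this is simplified.) */
static double mode_bits(const double *node_prob, const int *path, int len) {
  double bits = 0.0;
  for (int i = 0; i < len; ++i) {
    const double p = node_prob[i];              /* P(branch bit == 0) */
    bits += -log2(path[i] ? 1.0 - p : p);
  }
  return bits;
}
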
Jingning Han
81c389e790 Make tx partition entropy coder account for block size
This commit allows the entropy coder for transform block partitions
to account for the transform block's position relative to the coding
block size.

Change-Id: I2b5019c378bfb58c11b926fa50c0db1933f35852
2015-06-18 21:56:30 +00:00
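
A hypothetical context derivation in the spirit of this change (names illustrative): the probability model for a partition bit is indexed by how deep the current transform block sits relative to the block's maximum transform size.

/* Depth 0 means the transform spans the whole block; larger values mean
 * the partition has already been split at least once. */
static int txfm_partition_ctx(int max_tx_size, int cur_tx_size) {
  return max_tx_size - cur_tx_size;  /* index into a per-depth model */
}
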
Jingning Han
0a42a1efd4 Add max_tx_size to MB_MODE_INFO
Refactor the recursive transform block partition to avoid repeatedly
computing the maximum transform block size per block.

Change-Id: Ib408c78dc6923fe7d337dc937e74f2701ac63859
2015-06-18 14:54:49 -07:00
Jingning Han
2aa2ef4094 Make loop filter support variable transform block size
This commit refactors the loop filter implementation to make it
support recursive transform block partition.

Change-Id: Ica2daa9cb54730cff7770ee2c2d7ffdb240ff418
2015-06-16 18:56:47 -07:00
Jingning Han
85c220b2c4 Turn on loop filter
Temporarily use a univariate transform size for the loop filter.
Compared to the VP9 master branch with the loop filter turned on, the
compression gains are:

derf  0.671%
mr    0.749%
stdhd 0.886%
hr    1.394%

The encoding speed is currently about 1.3X that of speed 0.

Change-Id: I64788f894e70fde14c5be3159501bedf836e5998
2015-06-16 08:49:13 -07:00
Jingning Han
7cbea06386 Update transform block partition information for intra blocks
If a block is coded in an intra mode, update the transform block
partition information to the maximum transform block size.

Change-Id: I5ea440c700fc887ff2fe84fabde77a9d896d16f4
2015-06-15 15:53:19 -07:00
Jingning Han
a4fd58a761 Refactor tx_block_rd_b() to compute per block rd cost
This commit makes tx_block_rd_b() compute the rate and distortion
costs per transform block, instead of accumulating them.

Change-Id: Iff5adc4c27cc54f8e6eb3abd95f8d88ba00f462c
2015-06-15 09:08:00 -07:00
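
The shape of the refactor as a hedged sketch (types and names illustrative): the helper returns one block's rate and distortion so the caller decides how to aggregate, instead of adding into running totals passed by pointer.

#include <stdint.h>

typedef struct {
  int rate;      /* token cost of this one transform block */
  int64_t dist;  /* reconstruction distortion of this block */
} BlockRD;

/* Before: tx_block_rd_b(..., int *rate_acc, int64_t *dist_acc)
 * After (sketch): return the per-block cost; the caller accumulates. */
static BlockRD tx_block_rd_b(void /* block coords, tx size, ... */) {
  BlockRD rd = { 0, 0 };
  /* transform, quantize, and cost exactly one transform block here */
  return rd;
}
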
Jingning Han
e272e5b8fb Skip redundant flag reset
If the skip flag is already set, there is no need to further check
for the all-zero block case. This improves encoding speed with no
change in coding statistics.

Change-Id: Icab997ca2977e650351a47ff1314def5ac4ecb1d
2015-06-12 11:44:01 -07:00
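
Illustrative shape of the early-out (the skip parameter stands in for the encoder's skip flag and is an assumption): once skip is already set, the all-zero check could only confirm what is known, so it is bypassed.

static void encode_block(int skip) {
  if (skip) return;  /* already skipped: nothing left to zero out */
  /* ... otherwise evaluate whether an all-zero block is cheaper ... */
}
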
Jingning Han
5180368403 Allow encoder to force all zero coefficient block
This commit allows the encoder to force an all-zero quantized
coefficient block per transform block when that provides a better
rate-distortion trade-off.

Change-Id: I5b57b28cccd257ebfaf7c1749dda7be482abc834
2015-06-12 09:18:10 -07:00
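
A hedged sketch of the decision, with an RD cost macro written in the style of vp9_rd.h (RM = rdmult, DM = rddiv); rate_zero/dist_zero would come from costing a no-coefficient block against the prediction alone:

#include <stdint.h>
#include <string.h>

#define RDCOST(RM, DM, R, D) \
  (((128 + ((int64_t)(R)) * (RM)) >> 8) + ((int64_t)(D) << (DM)))

/* Keep the coded coefficients only if they beat the all-zero option. */
static int maybe_force_zero(int rdmult, int rddiv,
                            int rate_coded, int64_t dist_coded,
                            int rate_zero, int64_t dist_zero,
                            int16_t *qcoeff, int n) {
  const int64_t rd_coded = RDCOST(rdmult, rddiv, rate_coded, dist_coded);
  const int64_t rd_zero = RDCOST(rdmult, rddiv, rate_zero, dist_zero);
  if (rd_zero < rd_coded) {
    memset(qcoeff, 0, sizeof(*qcoeff) * (size_t)n);
    return 1;  /* transform block forced to all-zero */
  }
  return 0;
}
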
Jingning Han
63c0d8df9f Assign largest transform block size to skip block
If a block has all of its coefficients quantized to zero, the codec
assumes that it uses the largest transform block size.

Change-Id: I1a32527e50026e8e4759ad8de474189cd20e89c8
2015-06-11 11:01:44 -07:00
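
Sketch of the rule (mbmi, bsize, and the max_txsize_lookup[] table follow VP9 naming conventions but are assumptions here): a block with no coded coefficients gets the largest transform size its block size allows, so later stages such as the loop filter and context models see a consistent tx_size.

if (mbmi->skip)
  mbmi->tx_size = max_txsize_lookup[bsize];  /* largest tx for this block */
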
Jingning Han
9ce132ac37 Refactor transform block partition entropy coding
This commit refactors the transform block partition entropy
coding process to improve the encoding speed. There is no change
in the compression statistics.

Change-Id: I237466fd95c1b888df432babfa36e01f74240eef
2015-06-11 09:41:20 -07:00
Jingning Han
9692042493 Refactor transform block partition update process
Unify the transform block partition update process used in the
rate-distortion optimization and encoding stages.

Change-Id: I4e5f2b6d2482c53ceadb7c8743435158f229a82c
2015-06-10 10:01:31 -07:00
Jingning Han
87a0d5436b Account for context information for partition rate estimate
This commit allows the encoder to account for boundary block
information when estimating the transform block partition rate cost
in the rate-distortion optimization scheme.

Change-Id: Idb79cf936d96cdd15bcba27e47318295413a5f5d
2015-06-09 15:53:55 -07:00
Jingning Han
948c6d882e Enable transform block partition entropy coding
Select the probability model for transform block partition coding
conditioned on the neighbor transform block sizes.

Change-Id: Ib701296e59009bad97dbd21d8dcd58bc5e552f39
2015-06-09 12:30:52 -07:00
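
A hypothetical neighbor context in the spirit of the commit (names illustrative): the model index counts how many of the above/left neighbors chose a transform smaller than the size being coded.

/* Returns 0, 1, or 2. */
static int neighbor_tx_ctx(int above_tx, int left_tx, int cur_tx) {
  return (above_tx < cur_tx) + (left_tx < cur_tx);
}
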
Jingning Han
79d6b8fc85 Properly handle boundary block rate distortion computation
This commit makes the encoder properly compute the rate-distortion
cost for blocks that partially cover extended pixels.

Change-Id: I44529af6f76925cdc0f6b24a5d190b51b0813983
2015-06-09 11:14:24 -07:00
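
Sketch of the boundary handling, under the assumption that only visible pixels should contribute: rows and columns that fall in the extended (padded) region are clipped out of the distortion sum.

#include <stdint.h>

static int64_t boundary_sse(const unsigned char *src,
                            const unsigned char *rec, int stride,
                            int bw, int bh, int x, int y,
                            int frame_w, int frame_h) {
  const int max_c = bw < frame_w - x ? bw : frame_w - x;  /* visible cols */
  const int max_r = bh < frame_h - y ? bh : frame_h - y;  /* visible rows */
  int64_t sse = 0;
  for (int r = 0; r < max_r; ++r) {
    for (int c = 0; c < max_c; ++c) {
      const int d = src[r * stride + c] - rec[r * stride + c];
      sse += d * d;
    }
  }
  return sse;
}
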
Jingning Han
b54dd00f53 Align the intra and inter mode cost measurement
This commit aligns the measurement method used to evaluate both
intra and inter modes.

Change-Id: I8071584ce87fa3c5401800363daa0e670de29af5
2015-06-05 11:37:21 -07:00
Jingning Han
3239e22a42 Conditionally use recursive transform block partition search
If the frame header is set to use a fixed transform block size, use
the univariate transform block partition search flow.

Change-Id: Ic422ecb6565642cd8ddb96dc67a37109ef3ce90f
2015-06-03 11:14:26 -07:00
Jingning Han
a96f2ca319 Rework the rate and distortion computation pipeline
This allows the encoder to use more precise rate and distortion
costs for mode decision.

Change-Id: I7cfd676a88531a194b9a509375feea8365e5ef12
2015-06-02 23:15:09 -07:00
Jingning Han
0207dcde4a Fix rate estimate issue in transform block partition coding
This commit fixes the over-count issue in the recursive transform
block partition rate cost estimation. It improves the compression
performance by about 0.45%.

Change-Id: I01ccda954ed0e120263977472c1c759c3c67170c
2015-06-02 18:51:03 -07:00
Jingning Han
33f05e90fe Enable rate-distortion optimization for transform partition
This commit enables rate-distortion optimization of the recursive
transform block partition for inter mode blocks, based on the luma
component. The chroma components infer their transform block size
decisions from the luma component.

Change-Id: I907cc52af888a606b718e087e717b189fa505748
2015-06-01 16:50:36 -07:00
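
A hedged sketch of the recursive search with a toy cost model (real code transforms, quantizes, and costs tokens at each node): the RD of coding the block with one transform is compared against its four split quadrants plus the split flag's rate, and the cheaper option wins.

#include <stdint.h>

enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32 };

static int64_t rd_for_tx(int tx_size, int row, int col) {
  (void)row; (void)col;
  return 1000 >> tx_size;  /* toy stand-in for transform + quantize + cost */
}

static int64_t search_tx_partition(int tx_size, int row, int col) {
  const int64_t rd_whole = rd_for_tx(tx_size, row, col);
  if (tx_size == TX_4X4) return rd_whole;  /* cannot split further */
  const int half = 2 << tx_size;           /* quadrant offset in pixels */
  int64_t rd_split = 10;                   /* toy split-flag cost */
  for (int i = 0; i < 4; ++i)
    rd_split += search_tx_partition(tx_size - 1, row + (i >> 1) * half,
                                    col + (i & 1) * half);
  return rd_whole < rd_split ? rd_whole : rd_split;
}
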
Jingning Han
0451c6b6dd Refactor per block rate distortion estimate
Move the rate-distortion estimation function outside the recursion
as a standalone module.

Change-Id: I662199223c256664bcd312084b3aebffb8a8034b
2015-06-01 12:41:45 -07:00
Jingning Han
d4b8dd76c4 Make chroma component RD estimate support transform partition
This commit makes the rate-distortion estimation of the chroma
components support the recursive transform block partition
inferred from the luma component mode decisions.

Change-Id: I2e038bebf558da406e966015952ad1058bdf4766
2015-06-01 11:15:15 -07:00
Jingning Han
cd4aca5959 Add decoder support to recursive transform block partition
It allows the decoder to recursively parse and use the transform
block size for inter-coded blocks.

Change-Id: I12ceea48ab35501ac1a3447142deb2a334eff3b8
2015-05-22 16:45:34 -07:00
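
The decoder mirrors that recursion. A minimal sketch with stand-in read_bit() and reconstruct() helpers (both assumptions, not libvpx API):

#include <stdio.h>

enum { TX_4X4 };

static int read_bit(void) { return 0; }  /* stand-in boolean decoder */

static void reconstruct(int tx_size, int row, int col) {
  printf("tx=%d at (%d,%d)\n", tx_size, row, col);  /* parse + inverse tx */
}

static void decode_tx_block(int tx_size, int row, int col) {
  if (tx_size > TX_4X4 && read_bit()) {  /* split flag set: recurse */
    const int half = 2 << tx_size;
    for (int i = 0; i < 4; ++i)
      decode_tx_block(tx_size - 1, row + (i >> 1) * half,
                      col + (i & 1) * half);
  } else {
    reconstruct(tx_size, row, col);
  }
}
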
Jingning Han
64f3820f80 Refactor bit-stream syntax support to transform partition
Make the bitstream syntax element coding ready to support variable
transform coding block sizes.

Change-Id: I07ae4ab62d1ecd46c4a5ae45702fc14bd1d4b07d
2015-05-22 12:13:29 -07:00
Jingning Han
6fc13b5cc2 Inter block transform coding partition syntax elements
Allocate a memory buffer to store the transform coding partition
information of inter prediction mode blocks.

Change-Id: I428b1dd0b26e8eaf24030a833554ceb4479c5551
2015-05-22 10:57:36 -07:00
Jingning Han
df2042dc1e Synchronize encoding process and tokenization handle
Make the encoding and tokenization processes support the recursive
transform block partition coding scheme.

Change-Id: I47283cc6ee9c383059950623ece60a0fcce82e00
2015-05-21 18:51:27 -07:00
Jingning Han
a15cf9a5b7 Synchronize tokenization and detokenization process
Keep the encoder and decoder synchronized for recursive
tokenization coding.

Change-Id: I84c5f3dfc3ee9982ab57e658ffe6cb17a949eda2
2015-05-22 01:45:31 +00:00
Jingning Han
bf99a00340 Arrange tokenization order to support recursive txfm block coding
Make the encoder packetize transform blocks in recursive order.
Note that the block index with respect to the coding block remains
identical.

Change-Id: I07c6d2017f4f150274aff46c05388a7fd47cd920
2015-05-21 18:43:37 -07:00
Jingning Han
5f6fe83ac5 Syntax coding support for transform block coding
This commit redesigns the bitstream syntax to support recursive
transform block partition. It disables the decoder vector unit
tests.

Change-Id: I6cac24c4f1e44f29ffcc9b87ba1167eeb32d1b69
2015-05-18 15:43:02 -07:00
215 changed files with 15395 additions and 8759 deletions

.mailmap

@@ -1,26 +1,18 @@
Adrian Grange <agrange@google.com>
Alex Converse <aconverse@google.com> <alex.converse@gmail.com>
Alexis Ballier <aballier@gentoo.org> <alexis.ballier@gmail.com>
Alpha Lam <hclam@google.com> <hclam@chromium.org>
Deb Mukherjee <debargha@google.com>
Erik Niemeyer <erik.a.niemeyer@intel.com> <erik.a.niemeyer@gmail.com>
Guillaume Martres <gmartres@google.com> <smarter3@gmail.com>
Hangyu Kuang <hkuang@google.com>
Jim Bankoski <jimbankoski@google.com>
John Koleszar <jkoleszar@google.com>
Johann Koenig <johannkoenig@google.com>
Johann Koenig <johannkoenig@google.com> <johann.koenig@duck.com>
John Koleszar <jkoleszar@google.com>
Joshua Litt <joshualitt@google.com> <joshualitt@chromium.org>
Marco Paniconi <marpan@google.com>
Marco Paniconi <marpan@google.com> <marpan@chromium.org>
Johann Koenig <johannkoenig@google.com> <johannkoenig@dhcp-172-19-7-52.mtv.corp.google.com>
Pascal Massimino <pascal.massimino@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>
Ralph Giles <giles@xiph.org> <giles@mozilla.com>
Sami Pietilä <samipietila@google.com>
Tamar Levy <tamar.levy@intel.com>
Tamar Levy <tamar.levy@intel.com> <levytamar82@gmail.com>
Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
Timothy B. Terriberry <tterribe@xiph.org> Tim Terriberry <tterriberry@mozilla.com>
Tom Finegan <tomfinegan@google.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>
Ralph Giles <giles@xiph.org> <giles@mozilla.com>
Alpha Lam <hclam@google.com> <hclam@chromium.org>
Deb Mukherjee <debargha@google.com>
Yaowu Xu <yaowu@google.com> <yaowu@xuyaowu.com>

AUTHORS

@@ -3,11 +3,10 @@
Aaron Watry <awatry@gmail.com>
Abo Talib Mahfoodh <ab.mahfoodh@gmail.com>
Adam Xu <adam@xuyaowu.com>
Adrian Grange <agrange@google.com>
Ahmad Sharif <asharif@google.com>
Alexander Voronov <avoronov@graphics.cs.msu.ru>
Alex Converse <aconverse@google.com>
Alex Converse <alex.converse@gmail.com>
Alexis Ballier <aballier@gentoo.org>
Alok Ahuja <waveletcoeff@gmail.com>
Alpha Lam <hclam@google.com>
@@ -15,58 +14,44 @@ A.Mahfoodh <ab.mahfoodh@gmail.com>
Ami Fischman <fischman@chromium.org>
Andoni Morales Alastruey <ylatuya@gmail.com>
Andres Mejia <mcitadel@gmail.com>
Andrew Russell <anrussell@google.com>
Aron Rosenberg <arosenberg@logitech.com>
Attila Nagy <attilanagy@google.com>
changjun.yang <changjun.yang@intel.com>
Charles 'Buck' Krasic <ckrasic@google.com>
chm <chm@rock-chips.com>
Christian Duvivier <cduvivier@google.com>
Daniel Kang <ddkang@google.com>
Deb Mukherjee <debargha@google.com>
Dim Temp <dimtemp0@gmail.com>
Dmitry Kovalev <dkovalev@google.com>
Dragan Mrdjan <dmrdjan@mips.com>
Ehsan Akhgari <ehsan.akhgari@gmail.com>
Erik Niemeyer <erik.a.niemeyer@intel.com>
Erik Niemeyer <erik.a.niemeyer@gmail.com>
Fabio Pedretti <fabio.ped@libero.it>
Frank Galligan <fgalligan@google.com>
Fredrik Söderquist <fs@opera.com>
Fritz Koenig <frkoenig@google.com>
Gaute Strokkenes <gaute.strokkenes@broadcom.com>
Giuseppe Scrivano <gscrivano@gnu.org>
Gordana Cmiljanovic <gordana.cmiljanovic@imgtec.com>
Guillaume Martres <gmartres@google.com>
Guillermo Ballester Valor <gbvalor@gmail.com>
Hangyu Kuang <hkuang@google.com>
Hanno Böck <hanno@hboeck.de>
Henrik Lundin <hlundin@google.com>
Hui Su <huisu@google.com>
Ivan Maltz <ivanmaltz@google.com>
Jacek Caban <cjacek@gmail.com>
JackyChen <jackychen@google.com>
James Berry <jamesberry@google.com>
James Yu <james.yu@linaro.org>
James Zern <jzern@google.com>
Jan Gerber <j@mailb.org>
Jan Kratochvil <jan.kratochvil@redhat.com>
Janne Salonen <jsalonen@google.com>
Jeff Faust <jfaust@google.com>
Jeff Muizelaar <jmuizelaar@mozilla.com>
Jeff Petkau <jpet@chromium.org>
Jia Jia <jia.jia@linaro.org>
Jim Bankoski <jimbankoski@google.com>
Jingning Han <jingning@google.com>
Joey Parrish <joeyparrish@google.com>
Johann Koenig <johannkoenig@google.com>
John Koleszar <jkoleszar@google.com>
John Stark <jhnstrk@gmail.com>
Joshua Bleecher Snyder <josh@treelinelabs.com>
Joshua Litt <joshualitt@google.com>
Justin Clift <justin@salasaga.org>
Justin Lebar <justin.lebar@gmail.com>
KO Myung-Hun <komh@chollian.net>
Lawrence Velázquez <larryv@macports.org>
Lou Quillio <louquillio@google.com>
Luca Barbato <lu_zero@gentoo.org>
Makoto Kato <makoto.kt@gmail.com>
@@ -80,7 +65,6 @@ Michael Kohler <michaelkohler@live.com>
Mike Frysinger <vapier@chromium.org>
Mike Hommey <mhommey@mozilla.com>
Mikhal Shemer <mikhal@google.com>
Minghai Shang <minghai@google.com>
Morton Jonuschat <yabawock@gmail.com>
Parag Salasakar <img.mips1@gmail.com>
Pascal Massimino <pascal.massimino@gmail.com>
@@ -88,8 +72,6 @@ Patrik Westin <patrik.westin@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk>
Paweł Hajdan <phajdan@google.com>
Pengchong Jin <pengchong@google.com>
Peter de Rivaz <peter.derivaz@gmail.com>
Philip Jägenstedt <philipj@opera.com>
Priit Laes <plaes@plaes.org>
Rafael Ávila de Espíndola <rafael.espindola@gmail.com>
@@ -97,29 +79,22 @@ Rafaël Carré <funman@videolan.org>
Ralph Giles <giles@xiph.org>
Rob Bradford <rob@linux.intel.com>
Ronald S. Bultje <rbultje@google.com>
Rui Ueyama <ruiu@google.com>
Sami Pietilä <samipietila@google.com>
Scott Graham <scottmg@chromium.org>
Scott LaVarnway <slavarnway@google.com>
Sean McGovern <gseanmcg@gmail.com>
Sergey Ulanov <sergeyu@chromium.org>
Shimon Doodkin <helpmepro1@gmail.com>
Stefan Holmer <holmer@google.com>
Suman Sunkara <sunkaras@google.com>
Taekhyun Kim <takim@nvidia.com>
Takanori MATSUURA <t.matsuu@gmail.com>
Tamar Levy <tamar.levy@intel.com>
Tao Bai <michaelbai@chromium.org>
Tero Rintaluoma <teror@google.com>
Thijs Vermeir <thijsvermeir@gmail.com>
Tim Kopp <tkopp@google.com>
Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com>
Vignesh Venkatasubramanian <vigneshv@google.com>
Yaowu Xu <yaowu@google.com>
Yongzhe Wang <yongzhe@google.com>
Yunqing Wang <yunqingwang@google.com>
Zoe Liu <zoeliu@google.com>
Google Inc.
The Mozilla Foundation
The Xiph.Org Foundation

CHANGELOG

@@ -1,26 +1,3 @@
2015-04-03 v1.4.0 "Indian Runner Duck"
This release includes significant improvements to the VP9 codec.
- Upgrading:
This release is ABI incompatible with 1.3.0. It drops the compatibility
layer, requiring VPX_IMG_FMT_* instead of IMG_FMT_*, and adds several codec
controls for VP9.
- Enhancements:
Faster VP9 encoding and decoding
Multithreaded VP9 decoding (tile and frame-based)
Multithreaded VP9 encoding - on by default
YUV 4:2:2 and 4:4:4 support in VP9
10 and 12bit support in VP9
64bit ARM support by replacing ARM assembly with intrinsics
- Bug Fixes:
Fixes a VP9 bitstream issue in Profile 1. This only affected non-YUV 4:2:0
files.
- Known Issues:
Frame Parallel decoding fails for segmented and non-420 files.
2013-11-15 v1.3.0 "Forest"
This release introduces the VP9 codec in a backward-compatible way.
All existing users of VP8 can continue to use the library without

PATENTS

@@ -17,7 +17,7 @@ or agree to the institution of patent litigation or any other patent
enforcement activity against any entity (including a cross-claim or
counterclaim in a lawsuit) alleging that any of these implementations of WebM
or any code incorporated within any of these implementations of WebM
constitute direct or contributory patent infringement, or inducement of
constitutes direct or contributory patent infringement, or inducement of
patent infringement, then any patent rights granted to you under this License
for these implementations of WebM shall terminate as of the date such
litigation is filed.

README

@@ -1,4 +1,4 @@
README - 23 March 2015
README - 30 May 2014
Welcome to the WebM VP8/VP9 Codec SDK!
@@ -62,6 +62,12 @@ COMPILING THE APPLICATIONS/LIBRARIES:
armv7s-darwin-gcc
mips32-linux-gcc
mips64-linux-gcc
ppc32-darwin8-gcc
ppc32-darwin9-gcc
ppc32-linux-gcc
ppc64-darwin8-gcc
ppc64-darwin9-gcc
ppc64-linux-gcc
sparc-solaris-gcc
x86-android-gcc
x86-darwin8-gcc
@@ -72,7 +78,6 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86-darwin11-gcc
x86-darwin12-gcc
x86-darwin13-gcc
x86-darwin14-gcc
x86-iphonesimulator-gcc
x86-linux-gcc
x86-linux-icc
@@ -90,7 +95,6 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86_64-darwin11-gcc
x86_64-darwin12-gcc
x86_64-darwin13-gcc
x86_64-darwin14-gcc
x86_64-iphonesimulator-gcc
x86_64-linux-gcc
x86_64-linux-icc
@@ -107,7 +111,6 @@ COMPILING THE APPLICATIONS/LIBRARIES:
universal-darwin11-gcc
universal-darwin12-gcc
universal-darwin13-gcc
universal-darwin14-gcc
generic-gnu
The generic-gnu target, in conjunction with the CROSS environment variable,


@@ -158,6 +158,8 @@ LOCAL_CFLAGS += \
LOCAL_MODULE := libvpx
LOCAL_LDLIBS := -llog
ifeq ($(CONFIG_RUNTIME_CPU_DETECT),yes)
LOCAL_STATIC_LIBRARIES := cpufeatures
endif
@@ -182,11 +184,7 @@ clean:
@$(RM) -r $(ASM_CNV_PATH)
@$(RM) $(CLEAN-OBJS)
ifeq ($(ENABLE_SHARED),1)
include $(BUILD_SHARED_LIBRARY)
else
include $(BUILD_STATIC_LIBRARY)
endif
include $(BUILD_SHARED_LIBRARY)
ifeq ($(CONFIG_RUNTIME_CPU_DETECT),yes)
$(call import-module,cpufeatures)


@@ -383,8 +383,8 @@ LIBS=$(call enabled,LIBS)
.libs: $(LIBS)
@touch $@
$(foreach lib,$(filter %_g.a,$(LIBS)),$(eval $(call archive_template,$(lib))))
$(foreach lib,$(filter %so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR).$(SO_VERSION_PATCH),$(LIBS)),$(eval $(call so_template,$(lib))))
$(foreach lib,$(filter %$(SO_VERSION_MAJOR).dylib,$(LIBS)),$(eval $(call dl_template,$(lib))))
$(foreach lib,$(filter %so.$(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH),$(LIBS)),$(eval $(call so_template,$(lib))))
$(foreach lib,$(filter %$(VERSION_MAJOR).dylib,$(LIBS)),$(eval $(call dl_template,$(lib))))
INSTALL-LIBS=$(call cond_enabled,CONFIG_INSTALL_LIBS,INSTALL-LIBS)
ifeq ($(MAKECMDGOALS),dist)


@@ -640,6 +640,12 @@ process_common_toolchain() {
*i[3456]86*)
tgt_isa=x86
;;
*powerpc64*)
tgt_isa=ppc64
;;
*powerpc*)
tgt_isa=ppc32
;;
*sparc*)
tgt_isa=sparc
;;
@@ -1036,30 +1042,25 @@ EOF
disable_feature fast_unaligned
fi
if enabled runtime_cpu_detect; then
disable_feature runtime_cpu_detect
fi
if [ -n "${tune_cpu}" ]; then
case ${tune_cpu} in
p5600)
check_add_cflags -mips32r5 -funroll-loops -mload-store-pairs
check_add_cflags -msched-weight -mhard-float -mfp64
check_add_asflags -mips32r5 -mhard-float -mfp64
check_add_ldflags -mfp64
add_cflags -mips32r5 -funroll-loops -mload-store-pairs
add_cflags -msched-weight -mhard-float
add_asflags -mips32r5 -mhard-float
;;
i6400)
check_add_cflags -mips64r6 -mabi=64 -funroll-loops -msched-weight
check_add_cflags -mload-store-pairs -mhard-float -mfp64
check_add_asflags -mips64r6 -mabi=64 -mhard-float -mfp64
check_add_ldflags -mips64r6 -mabi=64 -mfp64
add_cflags -mips64r6 -mabi=64 -funroll-loops -mload-store-pairs
add_cflags -msched-weight -mhard-float
add_asflags -mips64r6 -mabi=64 -mhard-float
add_ldflags -mips64r6 -mabi=64
;;
esac
if enabled msa; then
add_cflags -mmsa
add_asflags -mmsa
add_ldflags -mmsa
add_cflags -mmsa -mfp64 -flax-vector-conversions
add_asflags -mmsa -mfp64 -flax-vector-conversions
add_ldflags -mmsa -mfp64 -flax-vector-conversions
disable_feature fast_unaligned
fi
@@ -1069,6 +1070,29 @@ EOF
check_add_asflags -march=${tgt_isa}
check_add_asflags -KPIC
;;
ppc*)
enable_feature ppc
bits=${tgt_isa##ppc}
link_with_cc=gcc
setup_gnu_toolchain
add_asflags -force_cpusubtype_ALL -I"\$(dir \$<)darwin"
soft_enable altivec
enabled altivec && add_cflags -maltivec
case "$tgt_os" in
linux*)
add_asflags -maltivec -mregnames -I"\$(dir \$<)linux"
;;
darwin*)
darwin_arch="-arch ppc"
enabled ppc64 && darwin_arch="${darwin_arch}64"
add_cflags ${darwin_arch} -m${bits} -fasm-blocks
add_asflags ${darwin_arch} -force_cpusubtype_ALL -I"\$(dir \$<)darwin"
add_ldflags ${darwin_arch} -m${bits}
enabled altivec && add_cflags -faltivec
;;
esac
;;
x86*)
case ${tgt_os} in
win*)
@@ -1305,15 +1329,11 @@ EOF
# only for MIPS platforms
case ${toolchain} in
mips*)
if enabled big_endian; then
if enabled dspr2; then
if enabled dspr2; then
if enabled big_endian; then
echo "dspr2 optimizations are available only for little endian platforms"
disable_feature dspr2
fi
if enabled msa; then
echo "msa optimizations are available only for little endian platforms"
disable_feature msa
fi
fi
;;
esac

configure

@@ -40,6 +40,7 @@ Advanced options:
${toggle_vp8} VP8 codec support
${toggle_vp9} VP9 codec support
${toggle_internal_stats} output of encoder internal stats for debug, if supported (encoders)
${toggle_mem_tracker} track memory usage
${toggle_postproc} postprocessing
${toggle_vp9_postproc} vp9 specific postprocessing
${toggle_multithread} multithreaded encoding and decoding
@@ -111,6 +112,12 @@ all_platforms="${all_platforms} armv7-win32-vs12"
all_platforms="${all_platforms} armv7s-darwin-gcc"
all_platforms="${all_platforms} mips32-linux-gcc"
all_platforms="${all_platforms} mips64-linux-gcc"
all_platforms="${all_platforms} ppc32-darwin8-gcc"
all_platforms="${all_platforms} ppc32-darwin9-gcc"
all_platforms="${all_platforms} ppc32-linux-gcc"
all_platforms="${all_platforms} ppc64-darwin8-gcc"
all_platforms="${all_platforms} ppc64-darwin9-gcc"
all_platforms="${all_platforms} ppc64-linux-gcc"
all_platforms="${all_platforms} sparc-solaris-gcc"
all_platforms="${all_platforms} x86-android-gcc"
all_platforms="${all_platforms} x86-darwin8-gcc"
@@ -240,6 +247,8 @@ ARCH_LIST="
mips
x86
x86_64
ppc32
ppc64
"
ARCH_EXT_LIST="
edsp
@@ -260,6 +269,8 @@ ARCH_EXT_LIST="
sse4_1
avx
avx2
altivec
"
HAVE_LIST="
${ARCH_EXT_LIST}
@@ -295,6 +306,9 @@ CONFIG_LIST="
codec_srcs
debug_libs
fast_unaligned
mem_manager
mem_tracker
mem_checks
dequant_tokens
dc_recon
@@ -369,6 +383,7 @@ CMDLINE_SELECT="
${CODECS}
${CODEC_FAMILIES}
static_msvcrt
mem_tracker
spatial_resampling
realtime_only
onthefly_bitpacking
@@ -606,6 +621,12 @@ process_toolchain() {
universal-darwin*)
darwin_ver=${tgt_os##darwin}
# Snow Leopard (10.6/darwin10) dropped support for PPC
# Include PPC support for all prior versions
if [ $darwin_ver -lt 10 ]; then
fat_bin_archs="$fat_bin_archs ppc32-${tgt_os}-gcc"
fi
# Tiger (10.4/darwin8) brought support for x86
if [ $darwin_ver -ge 8 ]; then
fat_bin_archs="$fat_bin_archs x86-${tgt_os}-${tgt_cc}"
@@ -706,7 +727,7 @@ process_toolchain() {
esac
# Other toolchain specific defaults
case $toolchain in x86*|universal*) soft_enable postproc;; esac
case $toolchain in x86*|ppc*|universal*) soft_enable postproc;; esac
if enabled postproc_visualizer; then
enabled postproc || die "postproc_visualizer requires postproc to be enabled"


@@ -674,14 +674,14 @@ int main(int argc, char **argv) {
if (strncmp(encoder->name, "vp8", 3) == 0) {
vpx_codec_control(&codec, VP8E_SET_CPUUSED, -speed);
vpx_codec_control(&codec, VP8E_SET_NOISE_SENSITIVITY, kDenoiserOff);
vpx_codec_control(&codec, VP8E_SET_STATIC_THRESHOLD, 0);
vpx_codec_control(&codec, VP8E_SET_NOISE_SENSITIVITY, kDenoiserOnYOnly);
vpx_codec_control(&codec, VP8E_SET_STATIC_THRESHOLD, 1);
} else if (strncmp(encoder->name, "vp9", 3) == 0) {
vpx_codec_control(&codec, VP8E_SET_CPUUSED, speed);
vpx_codec_control(&codec, VP9E_SET_AQ_MODE, 3);
vpx_codec_control(&codec, VP9E_SET_FRAME_PERIODIC_BOOST, 0);
vpx_codec_control(&codec, VP9E_SET_NOISE_SENSITIVITY, 0);
vpx_codec_control(&codec, VP8E_SET_STATIC_THRESHOLD, 0);
vpx_codec_control(&codec, VP8E_SET_STATIC_THRESHOLD, 1);
vpx_codec_control(&codec, VP9E_SET_TILE_COLUMNS, (cfg.g_threads >> 1));
if (vpx_codec_control(&codec, VP9E_SET_SVC, layering_mode > 0 ? 1: 0)) {
die_codec(&codec, "Failed to set SVC");

libs.mk

@@ -230,27 +230,25 @@ $(BUILD_PFX)libvpx_g.a: $(LIBVPX_OBJS)
BUILD_LIBVPX_SO := $(if $(BUILD_LIBVPX),$(CONFIG_SHARED))
SO_VERSION_MAJOR := 2
SO_VERSION_MINOR := 0
SO_VERSION_PATCH := 0
ifeq ($(filter darwin%,$(TGT_OS)),$(TGT_OS))
LIBVPX_SO := libvpx.$(SO_VERSION_MAJOR).dylib
LIBVPX_SO := libvpx.$(VERSION_MAJOR).dylib
EXPORT_FILE := libvpx.syms
LIBVPX_SO_SYMLINKS := $(addprefix $(LIBSUBDIR)/, \
libvpx.dylib )
else
LIBVPX_SO := libvpx.so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR).$(SO_VERSION_PATCH)
LIBVPX_SO := libvpx.so.$(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH)
EXPORT_FILE := libvpx.ver
SYM_LINK := libvpx.so
LIBVPX_SO_SYMLINKS := $(addprefix $(LIBSUBDIR)/, \
libvpx.so libvpx.so.$(SO_VERSION_MAJOR) \
libvpx.so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR))
libvpx.so libvpx.so.$(VERSION_MAJOR) \
libvpx.so.$(VERSION_MAJOR).$(VERSION_MINOR))
endif
LIBS-$(BUILD_LIBVPX_SO) += $(BUILD_PFX)$(LIBVPX_SO)\
$(notdir $(LIBVPX_SO_SYMLINKS))
$(BUILD_PFX)$(LIBVPX_SO): $(LIBVPX_OBJS) $(EXPORT_FILE)
$(BUILD_PFX)$(LIBVPX_SO): extralibs += -lm
$(BUILD_PFX)$(LIBVPX_SO): SONAME = libvpx.so.$(SO_VERSION_MAJOR)
$(BUILD_PFX)$(LIBVPX_SO): SONAME = libvpx.so.$(VERSION_MAJOR)
$(BUILD_PFX)$(LIBVPX_SO): EXPORTS_FILE = $(EXPORT_FILE)
libvpx.ver: $(call enabled,CODEC_EXPORTS)


@@ -40,13 +40,7 @@ include $(CLEAR_VARS)
LOCAL_ARM_MODE := arm
LOCAL_MODULE := libvpx_test
LOCAL_STATIC_LIBRARIES := gtest libwebm
ifeq ($(ENABLE_SHARED),1)
LOCAL_SHARED_LIBRARIES := vpx
else
LOCAL_STATIC_LIBRARIES += vpx
endif
LOCAL_SHARED_LIBRARIES := vpx
include $(LOCAL_PATH)/test/test.mk
LOCAL_C_INCLUDES := $(BINDINGS_DIR)
FILTERED_SRC := $(sort $(filter %.cc %.c, $(LIBVPX_TEST_SRCS-yes)))


@@ -1,229 +0,0 @@
/*
* Copyright (c) 2012 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <string.h>
#include <limits.h>
#include <stdio.h>
#include "./vpx_config.h"
#if CONFIG_VP9_ENCODER
#include "./vp9_rtcd.h"
#endif
#include "test/acm_random.h"
#include "test/clear_system_state.h"
#include "test/register_state_check.h"
#include "test/util.h"
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "vpx_mem/vpx_mem.h"
extern "C"
double vp9_get_blockiness(const unsigned char *img1, int img1_pitch,
const unsigned char *img2, int img2_pitch,
int width, int height);
using libvpx_test::ACMRandom;
namespace {
class BlockinessTestBase : public ::testing::Test {
public:
BlockinessTestBase(int width, int height) : width_(width), height_(height) {}
static void SetUpTestCase() {
source_data_ = reinterpret_cast<uint8_t*>(
vpx_memalign(kDataAlignment, kDataBufferSize));
reference_data_ = reinterpret_cast<uint8_t*>(
vpx_memalign(kDataAlignment, kDataBufferSize));
}
static void TearDownTestCase() {
vpx_free(source_data_);
source_data_ = NULL;
vpx_free(reference_data_);
reference_data_ = NULL;
}
virtual void TearDown() {
libvpx_test::ClearSystemState();
}
protected:
// Handle frames up to 640x480
static const int kDataAlignment = 16;
static const int kDataBufferSize = 640*480;
virtual void SetUp() {
source_stride_ = (width_ + 31) & ~31;
reference_stride_ = width_ * 2;
rnd_.Reset(ACMRandom::DeterministicSeed());
}
void FillConstant(uint8_t *data, int stride, uint8_t fill_constant,
int width, int height) {
for (int h = 0; h < height; ++h) {
for (int w = 0; w < width; ++w) {
data[h * stride + w] = fill_constant;
}
}
}
void FillConstant(uint8_t *data, int stride, uint8_t fill_constant) {
FillConstant(data, stride, fill_constant, width_, height_);
}
void FillRandom(uint8_t *data, int stride, int width, int height) {
for (int h = 0; h < height; ++h) {
for (int w = 0; w < width; ++w) {
data[h * stride + w] = rnd_.Rand8();
}
}
}
void FillRandom(uint8_t *data, int stride) {
FillRandom(data, stride, width_, height_);
}
void FillRandomBlocky(uint8_t *data, int stride) {
for (int h = 0; h < height_; h += 4) {
for (int w = 0; w < width_; w += 4) {
FillRandom(data + h * stride + w, stride, 4, 4);
}
}
}
void FillCheckerboard(uint8_t *data, int stride) {
for (int h = 0; h < height_; h += 4) {
for (int w = 0; w < width_; w += 4) {
if (((h/4) ^ (w/4)) & 1)
FillConstant(data + h * stride + w, stride, 255, 4, 4);
else
FillConstant(data + h * stride + w, stride, 0, 4, 4);
}
}
}
void Blur(uint8_t *data, int stride, int taps) {
int sum = 0;
int half_taps = taps / 2;
for (int h = 0; h < height_; ++h) {
for (int w = 0; w < taps; ++w) {
sum += data[w + h * stride];
}
for (int w = taps; w < width_; ++w) {
sum += data[w + h * stride] - data[w - taps + h * stride];
data[w - half_taps + h * stride] = (sum + half_taps) / taps;
}
}
for (int w = 0; w < width_; ++w) {
for (int h = 0; h < taps; ++h) {
sum += data[h + w * stride];
}
for (int h = taps; h < height_; ++h) {
sum += data[w + h * stride] - data[(h - taps) * stride + w];
data[(h - half_taps) * stride + w] = (sum + half_taps) / taps;
}
}
}
int width_, height_;
static uint8_t* source_data_;
int source_stride_;
static uint8_t* reference_data_;
int reference_stride_;
ACMRandom rnd_;
};
#if CONFIG_VP9_ENCODER
typedef std::tr1::tuple<int, int> BlockinessParam;
class BlockinessVP9Test
: public BlockinessTestBase,
public ::testing::WithParamInterface<BlockinessParam> {
public:
BlockinessVP9Test() : BlockinessTestBase(GET_PARAM(0), GET_PARAM(1)) {}
protected:
int CheckBlockiness() {
return vp9_get_blockiness(source_data_, source_stride_,
reference_data_, reference_stride_,
width_, height_);
}
};
#endif // CONFIG_VP9_ENCODER
uint8_t* BlockinessTestBase::source_data_ = NULL;
uint8_t* BlockinessTestBase::reference_data_ = NULL;
#if CONFIG_VP9_ENCODER
TEST_P(BlockinessVP9Test, SourceBlockierThanReference) {
// Source is blockier than reference.
FillRandomBlocky(source_data_, source_stride_);
FillConstant(reference_data_, reference_stride_, 128);
int super_blocky = CheckBlockiness();
EXPECT_EQ(0, super_blocky) << "Blocky source should produce 0 blockiness.";
}
TEST_P(BlockinessVP9Test, ReferenceBlockierThanSource) {
// Source is blockier than reference.
FillConstant(source_data_, source_stride_, 128);
FillRandomBlocky(reference_data_, reference_stride_);
int super_blocky = CheckBlockiness();
EXPECT_GT(super_blocky, 0.0)
<< "Blocky reference should score high for blockiness.";
}
TEST_P(BlockinessVP9Test, BlurringDecreasesBlockiness) {
// Source is blockier than reference.
FillConstant(source_data_, source_stride_, 128);
FillRandomBlocky(reference_data_, reference_stride_);
int super_blocky = CheckBlockiness();
Blur(reference_data_, reference_stride_, 4);
int less_blocky = CheckBlockiness();
EXPECT_GT(super_blocky, less_blocky)
<< "A straight blur should decrease blockiness.";
}
TEST_P(BlockinessVP9Test, WorstCaseBlockiness) {
// Source is blockier than reference.
FillConstant(source_data_, source_stride_, 128);
FillCheckerboard(reference_data_, reference_stride_);
int super_blocky = CheckBlockiness();
Blur(reference_data_, reference_stride_, 4);
int less_blocky = CheckBlockiness();
EXPECT_GT(super_blocky, less_blocky)
<< "A straight blur should decrease blockiness.";
}
#endif // CONFIG_VP9_ENCODER
using std::tr1::make_tuple;
//------------------------------------------------------------------------------
// C functions
#if CONFIG_VP9_ENCODER
const BlockinessParam c_vp9_tests[] = {
make_tuple(320, 240),
make_tuple(318, 242),
make_tuple(318, 238),
};
INSTANTIATE_TEST_CASE_P(C, BlockinessVP9Test, ::testing::ValuesIn(c_vp9_tests));
#endif
} // namespace


@@ -21,13 +21,13 @@
namespace {
const int kLegacyByteAlignment = 0;
const int kLegacyYPlaneByteAlignment = 32;
const int kNumPlanesToCheck = 3;
const char kVP9TestFile[] = "vp90-2-02-size-lf-1920x1080.webm";
const char kVP9Md5File[] = "vp90-2-02-size-lf-1920x1080.webm.md5";
//const int kLegacyByteAlignment = 0;
//const int kLegacyYPlaneByteAlignment = 32;
//const int kNumPlanesToCheck = 3;
//const char kVP9TestFile[] = "vp90-2-02-size-lf-1920x1080.webm";
//const char kVP9Md5File[] = "vp90-2-02-size-lf-1920x1080.webm.md5";
#if CONFIG_WEBM_IO
#if CONFIG_WEBM_IO && 0
struct ByteAlignmentTestParam {
int byte_alignment;


@@ -1,224 +0,0 @@
/*
* Copyright (c) 2012 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <string.h>
#include <limits.h>
#include <stdio.h>
#include "./vpx_config.h"
#if CONFIG_VP9_ENCODER
#include "./vp9_rtcd.h"
#endif
#include "test/acm_random.h"
#include "test/clear_system_state.h"
#include "test/register_state_check.h"
#include "test/util.h"
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "vp9/encoder/vp9_ssim.h"
#include "vpx_mem/vpx_mem.h"
extern "C"
double vp9_get_ssim_metrics(uint8_t *img1, int img1_pitch,
uint8_t *img2, int img2_pitch,
int width, int height,
Ssimv *sv2, Metrics *m,
int do_inconsistency);
using libvpx_test::ACMRandom;
namespace {
class ConsistencyTestBase : public ::testing::Test {
public:
ConsistencyTestBase(int width, int height) : width_(width), height_(height) {}
static void SetUpTestCase() {
source_data_[0] = reinterpret_cast<uint8_t*>(
vpx_memalign(kDataAlignment, kDataBufferSize));
reference_data_[0] = reinterpret_cast<uint8_t*>(
vpx_memalign(kDataAlignment, kDataBufferSize));
source_data_[1] = reinterpret_cast<uint8_t*>(
vpx_memalign(kDataAlignment, kDataBufferSize));
reference_data_[1] = reinterpret_cast<uint8_t*>(
vpx_memalign(kDataAlignment, kDataBufferSize));
ssim_array_ = new Ssimv[kDataBufferSize / 16];
}
static void ClearSsim() {
memset(ssim_array_, 0, kDataBufferSize / 16);
}
static void TearDownTestCase() {
vpx_free(source_data_[0]);
source_data_[0] = NULL;
vpx_free(reference_data_[0]);
reference_data_[0] = NULL;
vpx_free(source_data_[1]);
source_data_[1] = NULL;
vpx_free(reference_data_[1]);
reference_data_[1] = NULL;
delete ssim_array_;
}
virtual void TearDown() {
libvpx_test::ClearSystemState();
}
protected:
// Handle frames up to 640x480
static const int kDataAlignment = 16;
static const int kDataBufferSize = 640*480;
virtual void SetUp() {
source_stride_ = (width_ + 31) & ~31;
reference_stride_ = width_ * 2;
rnd_.Reset(ACMRandom::DeterministicSeed());
}
void FillRandom(uint8_t *data, int stride, int width, int height) {
for (int h = 0; h < height; ++h) {
for (int w = 0; w < width; ++w) {
data[h * stride + w] = rnd_.Rand8();
}
}
}
void FillRandom(uint8_t *data, int stride) {
FillRandom(data, stride, width_, height_);
}
void Copy(uint8_t *reference, uint8_t *source) {
memcpy(reference, source, kDataBufferSize);
}
void Blur(uint8_t *data, int stride, int taps) {
int sum = 0;
int half_taps = taps / 2;
for (int h = 0; h < height_; ++h) {
for (int w = 0; w < taps; ++w) {
sum += data[w + h * stride];
}
for (int w = taps; w < width_; ++w) {
sum += data[w + h * stride] - data[w - taps + h * stride];
data[w - half_taps + h * stride] = (sum + half_taps) / taps;
}
}
for (int w = 0; w < width_; ++w) {
for (int h = 0; h < taps; ++h) {
sum += data[h + w * stride];
}
for (int h = taps; h < height_; ++h) {
sum += data[w + h * stride] - data[(h - taps) * stride + w];
data[(h - half_taps) * stride + w] = (sum + half_taps) / taps;
}
}
}
int width_, height_;
static uint8_t* source_data_[2];
int source_stride_;
static uint8_t* reference_data_[2];
int reference_stride_;
static Ssimv *ssim_array_;
Metrics metrics_;
ACMRandom rnd_;
};
#if CONFIG_VP9_ENCODER
typedef std::tr1::tuple<int, int> ConsistencyParam;
class ConsistencyVP9Test
: public ConsistencyTestBase,
public ::testing::WithParamInterface<ConsistencyParam> {
public:
ConsistencyVP9Test() : ConsistencyTestBase(GET_PARAM(0), GET_PARAM(1)) {}
protected:
double CheckConsistency(int frame) {
EXPECT_LT(frame, 2)<< "Frame to check has to be less than 2.";
return
vp9_get_ssim_metrics(source_data_[frame], source_stride_,
reference_data_[frame], reference_stride_,
width_, height_, ssim_array_, &metrics_, 1);
}
};
#endif // CONFIG_VP9_ENCODER
uint8_t* ConsistencyTestBase::source_data_[2] = {NULL, NULL};
uint8_t* ConsistencyTestBase::reference_data_[2] = {NULL, NULL};
Ssimv* ConsistencyTestBase::ssim_array_ = NULL;
#if CONFIG_VP9_ENCODER
TEST_P(ConsistencyVP9Test, ConsistencyIsZero) {
FillRandom(source_data_[0], source_stride_);
Copy(source_data_[1], source_data_[0]);
Copy(reference_data_[0], source_data_[0]);
Blur(reference_data_[0], reference_stride_, 3);
Copy(reference_data_[1], source_data_[0]);
Blur(reference_data_[1], reference_stride_, 3);
double inconsistency = CheckConsistency(1);
inconsistency = CheckConsistency(0);
EXPECT_EQ(inconsistency, 0.0)
<< "Should have 0 inconsistency if they are exactly the same.";
// If sources are not consistent reference frames inconsistency should
// be less than if the source is consistent.
FillRandom(source_data_[0], source_stride_);
FillRandom(source_data_[1], source_stride_);
FillRandom(reference_data_[0], reference_stride_);
FillRandom(reference_data_[1], reference_stride_);
CheckConsistency(0);
inconsistency = CheckConsistency(1);
Copy(source_data_[1], source_data_[0]);
CheckConsistency(0);
double inconsistency2 = CheckConsistency(1);
EXPECT_LT(inconsistency, inconsistency2)
<< "Should have less inconsistency if source itself is inconsistent.";
// Less of a blur should be less inconsistent than more blur coming off a
// a frame with no blur.
ClearSsim();
FillRandom(source_data_[0], source_stride_);
Copy(source_data_[1], source_data_[0]);
Copy(reference_data_[0], source_data_[0]);
Copy(reference_data_[1], source_data_[0]);
Blur(reference_data_[1], reference_stride_, 4);
CheckConsistency(0);
inconsistency = CheckConsistency(1);
ClearSsim();
Copy(reference_data_[1], source_data_[0]);
Blur(reference_data_[1], reference_stride_, 8);
CheckConsistency(0);
inconsistency2 = CheckConsistency(1);
EXPECT_LT(inconsistency, inconsistency2)
<< "Stronger Blur should produce more inconsistency.";
}
#endif // CONFIG_VP9_ENCODER
using std::tr1::make_tuple;
//------------------------------------------------------------------------------
// C functions
#if CONFIG_VP9_ENCODER
const ConsistencyParam c_vp9_tests[] = {
make_tuple(320, 240),
make_tuple(318, 242),
make_tuple(318, 238),
};
INSTANTIATE_TEST_CASE_P(C, ConsistencyVP9Test,
::testing::ValuesIn(c_vp9_tests));
#endif
} // namespace


@@ -398,9 +398,9 @@ class ConvolveTest : public ::testing::TestWithParam<ConvolveParam> {
}
void CopyOutputToRef() {
memcpy(output_ref_, output_, kOutputBufferSize);
vpx_memcpy(output_ref_, output_, kOutputBufferSize);
#if CONFIG_VP9_HIGHBITDEPTH
memcpy(output16_ref_, output16_, kOutputBufferSize);
vpx_memcpy(output16_ref_, output16_, kOutputBufferSize);
#endif
}
@@ -1814,27 +1814,4 @@ INSTANTIATE_TEST_CASE_P(DSPR2, ConvolveTest, ::testing::Values(
make_tuple(32, 64, &convolve8_dspr2),
make_tuple(64, 64, &convolve8_dspr2)));
#endif
#if HAVE_MSA
const ConvolveFunctions convolve8_msa(
vp9_convolve_copy_msa, vp9_convolve_avg_msa,
vp9_convolve8_horiz_msa, vp9_convolve8_avg_horiz_c,
vp9_convolve8_vert_msa, vp9_convolve8_avg_vert_c,
vp9_convolve8_msa, vp9_convolve8_avg_c, 0);
INSTANTIATE_TEST_CASE_P(MSA, ConvolveTest, ::testing::Values(
make_tuple(4, 4, &convolve8_msa),
make_tuple(8, 4, &convolve8_msa),
make_tuple(4, 8, &convolve8_msa),
make_tuple(8, 8, &convolve8_msa),
make_tuple(16, 8, &convolve8_msa),
make_tuple(8, 16, &convolve8_msa),
make_tuple(16, 16, &convolve8_msa),
make_tuple(32, 16, &convolve8_msa),
make_tuple(16, 32, &convolve8_msa),
make_tuple(32, 32, &convolve8_msa),
make_tuple(64, 32, &convolve8_msa),
make_tuple(32, 64, &convolve8_msa),
make_tuple(64, 64, &convolve8_msa)));
#endif // HAVE_MSA
} // namespace


@@ -502,11 +502,11 @@ class Trans16x16TestBase {
fwd_txfm_ref(input_extreme_block, output_ref_block, pitch_, tx_type_);
// clear reconstructed pixel buffers
memset(dst, 0, kNumCoeffs * sizeof(uint8_t));
memset(ref, 0, kNumCoeffs * sizeof(uint8_t));
vpx_memset(dst, 0, kNumCoeffs * sizeof(uint8_t));
vpx_memset(ref, 0, kNumCoeffs * sizeof(uint8_t));
#if CONFIG_VP9_HIGHBITDEPTH
memset(dst16, 0, kNumCoeffs * sizeof(uint16_t));
memset(ref16, 0, kNumCoeffs * sizeof(uint16_t));
vpx_memset(dst16, 0, kNumCoeffs * sizeof(uint16_t));
vpx_memset(ref16, 0, kNumCoeffs * sizeof(uint16_t));
#endif
// quantization with maximum allowed step sizes
@@ -933,4 +933,12 @@ INSTANTIATE_TEST_CASE_P(
make_tuple(&idct16x16_12,
&idct16x16_256_add_12_sse2, 3167, VPX_BITS_12)));
#endif // HAVE_SSE2 && CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
#if HAVE_SSSE3 && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(
SSSE3, Trans16x16DCT,
::testing::Values(
make_tuple(&vp9_fdct16x16_c, &vp9_idct16x16_256_add_ssse3, 0,
VPX_BITS_8)));
#endif // HAVE_SSSE3 && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
} // namespace


@@ -29,6 +29,8 @@ void Encoder::InitEncoder(VideoSource *video) {
cfg_.g_timebase = video->timebase();
cfg_.rc_twopass_stats_in = stats_->buf();
// Default to 1 thread.
cfg_.g_threads = 1;
res = vpx_codec_enc_init(&encoder_, CodecInterface(), &cfg_,
init_flags_);
ASSERT_EQ(VPX_CODEC_OK, res) << EncoderError();


@@ -183,10 +183,7 @@ class EncoderTest {
protected:
explicit EncoderTest(const CodecFactory *codec)
: codec_(codec), abort_(false), init_flags_(0), frame_flags_(0),
last_pts_(0) {
// Default to 1 thread.
cfg_.g_threads = 1;
}
last_pts_(0) {}
virtual ~EncoderTest() {}


@@ -398,7 +398,7 @@ TEST_P(ExternalFrameBufferMD5Test, ExtFBMD5Match) {
delete video;
}
#if CONFIG_WEBM_IO
#if CONFIG_WEBM_IO && 0
TEST_F(ExternalFrameBufferTest, MinFrameBuffers) {
// Minimum number of external frame buffers for VP9 is
// #VP9_MAXIMUM_REF_BUFFERS + #VPX_MAXIMUM_WORK_BUFFERS.
@@ -481,8 +481,8 @@ TEST_F(ExternalFrameBufferTest, SetAfterDecode) {
}
#endif // CONFIG_WEBM_IO
VP9_INSTANTIATE_TEST_CASE(ExternalFrameBufferMD5Test,
::testing::ValuesIn(libvpx_test::kVP9TestVectors,
libvpx_test::kVP9TestVectors +
libvpx_test::kNumVP9TestVectors));
//VP9_INSTANTIATE_TEST_CASE(ExternalFrameBufferMD5Test,
// ::testing::ValuesIn(libvpx_test::kVP9TestVectors,
// libvpx_test::kVP9TestVectors +
// libvpx_test::kNumVP9TestVectors));
} // namespace


@@ -110,23 +110,23 @@ TEST_P(InvalidFileTest, ReturnCode) {
RunTest();
}
const DecodeParam kVP9InvalidFileTests[] = {
{1, "invalid-vp90-02-v2.webm"},
{1, "invalid-vp90-2-00-quantizer-00.webm.ivf.s5861_r01-05_b6-.v2.ivf"},
{1, "invalid-vp90-03-v3.webm"},
{1, "invalid-vp90-2-00-quantizer-11.webm.ivf.s52984_r01-05_b6-.ivf"},
{1, "invalid-vp90-2-00-quantizer-11.webm.ivf.s52984_r01-05_b6-z.ivf"},
{1, "invalid-vp90-2-12-droppable_1.ivf.s3676_r01-05_b6-.ivf"},
{1, "invalid-vp90-2-05-resize.ivf.s59293_r01-05_b6-.ivf"},
{1, "invalid-vp90-2-09-subpixel-00.ivf.s20492_r01-05_b6-.v2.ivf"},
{1, "invalid-vp91-2-mixedrefcsp-444to420.ivf"},
{1, "invalid-vp90-2-12-droppable_1.ivf.s73804_r01-05_b6-.ivf"},
{1, "invalid-vp90-2-03-size-224x196.webm.ivf.s44156_r01-05_b6-.ivf"},
{1, "invalid-vp90-2-03-size-202x210.webm.ivf.s113306_r01-05_b6-.ivf"},
};
//const DecodeParam kVP9InvalidFileTests[] = {
// {1, "invalid-vp90-02-v2.webm"},
// {1, "invalid-vp90-2-00-quantizer-00.webm.ivf.s5861_r01-05_b6-.v2.ivf"},
// {1, "invalid-vp90-03-v3.webm"},
// {1, "invalid-vp90-2-00-quantizer-11.webm.ivf.s52984_r01-05_b6-.ivf"},
// {1, "invalid-vp90-2-00-quantizer-11.webm.ivf.s52984_r01-05_b6-z.ivf"},
// {1, "invalid-vp90-2-12-droppable_1.ivf.s3676_r01-05_b6-.ivf"},
// {1, "invalid-vp90-2-05-resize.ivf.s59293_r01-05_b6-.ivf"},
// {1, "invalid-vp90-2-09-subpixel-00.ivf.s20492_r01-05_b6-.v2.ivf"},
// {1, "invalid-vp91-2-mixedrefcsp-444to420.ivf"},
// {1, "invalid-vp90-2-12-droppable_1.ivf.s73804_r01-05_b6-.ivf"},
// {1, "invalid-vp90-2-03-size-224x196.webm.ivf.s44156_r01-05_b6-.ivf"},
// {1, "invalid-vp90-2-03-size-202x210.webm.ivf.s113306_r01-05_b6-.ivf"},
//};
VP9_INSTANTIATE_TEST_CASE(InvalidFileTest,
::testing::ValuesIn(kVP9InvalidFileTests));
//VP9_INSTANTIATE_TEST_CASE(InvalidFileTest,
// ::testing::ValuesIn(kVP9InvalidFileTests));
// This class will include test vectors that are expected to fail
// peek. However they are still expected to have no fatal failures.
@@ -142,26 +142,26 @@ TEST_P(InvalidFileInvalidPeekTest, ReturnCode) {
RunTest();
}
const DecodeParam kVP9InvalidFileInvalidPeekTests[] = {
{1, "invalid-vp90-01-v2.webm"},
};
//const DecodeParam kVP9InvalidFileInvalidPeekTests[] = {
// {1, "invalid-vp90-01-v2.webm"},
//};
VP9_INSTANTIATE_TEST_CASE(InvalidFileInvalidPeekTest,
::testing::ValuesIn(kVP9InvalidFileInvalidPeekTests));
//VP9_INSTANTIATE_TEST_CASE(InvalidFileInvalidPeekTest,
// ::testing::ValuesIn(kVP9InvalidFileInvalidPeekTests));
const DecodeParam kMultiThreadedVP9InvalidFileTests[] = {
{4, "invalid-vp90-2-08-tile_1x4_frame_parallel_all_key.webm"},
{4, "invalid-"
"vp90-2-08-tile_1x2_frame_parallel.webm.ivf.s47039_r01-05_b6-.ivf"},
{4, "invalid-vp90-2-08-tile_1x8_frame_parallel.webm.ivf.s288_r01-05_b6-.ivf"},
{2, "invalid-vp90-2-09-aq2.webm.ivf.s3984_r01-05_b6-.v2.ivf"},
{4, "invalid-vp90-2-09-subpixel-00.ivf.s19552_r01-05_b6-.v2.ivf"},
};
//const DecodeParam kMultiThreadedVP9InvalidFileTests[] = {
// {4, "invalid-vp90-2-08-tile_1x4_frame_parallel_all_key.webm"},
// {4, "invalid-"
// "vp90-2-08-tile_1x2_frame_parallel.webm.ivf.s47039_r01-05_b6-.ivf"},
// {4, "invalid-vp90-2-08-tile_1x8_frame_parallel.webm.ivf.s288_r01-05_b6-.ivf"},
// {2, "invalid-vp90-2-09-aq2.webm.ivf.s3984_r01-05_b6-.v2.ivf"},
// {4, "invalid-vp90-2-09-subpixel-00.ivf.s19552_r01-05_b6-.v2.ivf"},
//};
INSTANTIATE_TEST_CASE_P(
VP9MultiThreaded, InvalidFileTest,
::testing::Combine(
::testing::Values(
static_cast<const libvpx_test::CodecFactory*>(&libvpx_test::kVP9)),
::testing::ValuesIn(kMultiThreadedVP9InvalidFileTests)));
//INSTANTIATE_TEST_CASE_P(
// VP9MultiThreaded, InvalidFileTest,
// ::testing::Combine(
// ::testing::Values(
// static_cast<const libvpx_test::CodecFactory*>(&libvpx_test::kVP9)),
// ::testing::ValuesIn(kMultiThreadedVP9InvalidFileTests)));
} // namespace


@@ -52,7 +52,7 @@ typedef void (*dual_loop_op_t)(uint8_t *s, int p, const uint8_t *blimit0,
const uint8_t *thresh1);
#endif // CONFIG_VP9_HIGHBITDEPTH
typedef std::tr1::tuple<loop_op_t, loop_op_t, int, int> loop8_param_t;
typedef std::tr1::tuple<loop_op_t, loop_op_t, int> loop8_param_t;
typedef std::tr1::tuple<dual_loop_op_t, dual_loop_op_t, int> dualloop8_param_t;
#if HAVE_SSE2
@@ -144,7 +144,6 @@ class Loop8Test6Param : public ::testing::TestWithParam<loop8_param_t> {
loopfilter_op_ = GET_PARAM(0);
ref_loopfilter_op_ = GET_PARAM(1);
bit_depth_ = GET_PARAM(2);
count_ = GET_PARAM(3);
mask_ = (1 << bit_depth_) - 1;
}
@@ -152,7 +151,6 @@ class Loop8Test6Param : public ::testing::TestWithParam<loop8_param_t> {
protected:
int bit_depth_;
int count_;
int mask_;
loop_op_t loopfilter_op_;
loop_op_t ref_loopfilter_op_;
@@ -208,6 +206,7 @@ TEST_P(Loop8Test6Param, OperationCheck) {
tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp
};
int32_t p = kNumCoeffs/32;
int count = 1;
uint16_t tmp_s[kNumCoeffs];
int j = 0;
@@ -239,13 +238,13 @@ TEST_P(Loop8Test6Param, OperationCheck) {
ref_s[j] = s[j];
}
#if CONFIG_VP9_HIGHBITDEPTH
ref_loopfilter_op_(ref_s + 8 + p * 8, p, blimit, limit, thresh, count_, bd);
ref_loopfilter_op_(ref_s + 8 + p * 8, p, blimit, limit, thresh, count, bd);
ASM_REGISTER_STATE_CHECK(
loopfilter_op_(s + 8 + p * 8, p, blimit, limit, thresh, count_, bd));
loopfilter_op_(s + 8 + p * 8, p, blimit, limit, thresh, count, bd));
#else
ref_loopfilter_op_(ref_s+8+p*8, p, blimit, limit, thresh, count_);
ref_loopfilter_op_(ref_s+8+p*8, p, blimit, limit, thresh, count);
ASM_REGISTER_STATE_CHECK(
loopfilter_op_(s + 8 + p * 8, p, blimit, limit, thresh, count_));
loopfilter_op_(s + 8 + p * 8, p, blimit, limit, thresh, count));
#endif // CONFIG_VP9_HIGHBITDEPTH
for (int j = 0; j < kNumCoeffs; ++j) {
@@ -280,8 +279,8 @@ TEST_P(Loop8Test6Param, ValueCheck) {
// function of sharpness_lvl and the loopfilter lvl as:
// block_inside_limit = lvl >> ((sharpness_lvl > 0) + (sharpness_lvl > 4));
// ...
// memset(lfi->lfthr[lvl].mblim, (2 * (lvl + 2) + block_inside_limit),
// SIMD_WIDTH);
// vpx_memset(lfi->lfthr[lvl].mblim, (2 * (lvl + 2) + block_inside_limit),
// SIMD_WIDTH);
// This means that the largest value for mblim will occur when sharpness_lvl
// is equal to 0, and lvl is equal to its greatest value (MAX_LOOP_FILTER).
// In this case block_inside_limit will be equal to MAX_LOOP_FILTER and
@@ -306,18 +305,19 @@ TEST_P(Loop8Test6Param, ValueCheck) {
tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp
};
int32_t p = kNumCoeffs / 32;
int count = 1;
for (int j = 0; j < kNumCoeffs; ++j) {
s[j] = rnd.Rand16() & mask_;
ref_s[j] = s[j];
}
#if CONFIG_VP9_HIGHBITDEPTH
ref_loopfilter_op_(ref_s + 8 + p * 8, p, blimit, limit, thresh, count_, bd);
ref_loopfilter_op_(ref_s + 8 + p * 8, p, blimit, limit, thresh, count, bd);
ASM_REGISTER_STATE_CHECK(
loopfilter_op_(s + 8 + p * 8, p, blimit, limit, thresh, count_, bd));
loopfilter_op_(s + 8 + p * 8, p, blimit, limit, thresh, count, bd));
#else
ref_loopfilter_op_(ref_s+8+p*8, p, blimit, limit, thresh, count_);
ref_loopfilter_op_(ref_s+8+p*8, p, blimit, limit, thresh, count);
ASM_REGISTER_STATE_CHECK(
loopfilter_op_(s + 8 + p * 8, p, blimit, limit, thresh, count_));
loopfilter_op_(s + 8 + p * 8, p, blimit, limit, thresh, count));
#endif // CONFIG_VP9_HIGHBITDEPTH
for (int j = 0; j < kNumCoeffs; ++j) {
err_count += ref_s[j] != s[j];
@@ -521,62 +521,55 @@ INSTANTIATE_TEST_CASE_P(
SSE2, Loop8Test6Param,
::testing::Values(
make_tuple(&vp9_highbd_lpf_horizontal_4_sse2,
&vp9_highbd_lpf_horizontal_4_c, 8, 1),
&vp9_highbd_lpf_horizontal_4_c, 8),
make_tuple(&vp9_highbd_lpf_vertical_4_sse2,
&vp9_highbd_lpf_vertical_4_c, 8, 1),
&vp9_highbd_lpf_vertical_4_c, 8),
make_tuple(&vp9_highbd_lpf_horizontal_8_sse2,
&vp9_highbd_lpf_horizontal_8_c, 8, 1),
&vp9_highbd_lpf_horizontal_8_c, 8),
make_tuple(&vp9_highbd_lpf_horizontal_16_sse2,
&vp9_highbd_lpf_horizontal_16_c, 8, 1),
make_tuple(&vp9_highbd_lpf_horizontal_16_sse2,
&vp9_highbd_lpf_horizontal_16_c, 8, 2),
&vp9_highbd_lpf_horizontal_16_c, 8),
make_tuple(&vp9_highbd_lpf_vertical_8_sse2,
&vp9_highbd_lpf_vertical_8_c, 8, 1),
&vp9_highbd_lpf_vertical_8_c, 8),
make_tuple(&wrapper_vertical_16_sse2,
&wrapper_vertical_16_c, 8, 1),
&wrapper_vertical_16_c, 8),
make_tuple(&vp9_highbd_lpf_horizontal_4_sse2,
&vp9_highbd_lpf_horizontal_4_c, 10, 1),
&vp9_highbd_lpf_horizontal_4_c, 10),
make_tuple(&vp9_highbd_lpf_vertical_4_sse2,
&vp9_highbd_lpf_vertical_4_c, 10, 1),
&vp9_highbd_lpf_vertical_4_c, 10),
make_tuple(&vp9_highbd_lpf_horizontal_8_sse2,
&vp9_highbd_lpf_horizontal_8_c, 10, 1),
&vp9_highbd_lpf_horizontal_8_c, 10),
make_tuple(&vp9_highbd_lpf_horizontal_16_sse2,
&vp9_highbd_lpf_horizontal_16_c, 10, 1),
make_tuple(&vp9_highbd_lpf_horizontal_16_sse2,
&vp9_highbd_lpf_horizontal_16_c, 10, 2),
&vp9_highbd_lpf_horizontal_16_c, 10),
make_tuple(&vp9_highbd_lpf_vertical_8_sse2,
&vp9_highbd_lpf_vertical_8_c, 10, 1),
&vp9_highbd_lpf_vertical_8_c, 10),
make_tuple(&wrapper_vertical_16_sse2,
&wrapper_vertical_16_c, 10, 1),
&wrapper_vertical_16_c, 10),
make_tuple(&vp9_highbd_lpf_horizontal_4_sse2,
&vp9_highbd_lpf_horizontal_4_c, 12, 1),
&vp9_highbd_lpf_horizontal_4_c, 12),
make_tuple(&vp9_highbd_lpf_vertical_4_sse2,
&vp9_highbd_lpf_vertical_4_c, 12, 1),
&vp9_highbd_lpf_vertical_4_c, 12),
make_tuple(&vp9_highbd_lpf_horizontal_8_sse2,
&vp9_highbd_lpf_horizontal_8_c, 12, 1),
&vp9_highbd_lpf_horizontal_8_c, 12),
make_tuple(&vp9_highbd_lpf_horizontal_16_sse2,
&vp9_highbd_lpf_horizontal_16_c, 12, 1),
make_tuple(&vp9_highbd_lpf_horizontal_16_sse2,
&vp9_highbd_lpf_horizontal_16_c, 12, 2),
&vp9_highbd_lpf_horizontal_16_c, 12),
make_tuple(&vp9_highbd_lpf_vertical_8_sse2,
&vp9_highbd_lpf_vertical_8_c, 12, 1),
&vp9_highbd_lpf_vertical_8_c, 12),
make_tuple(&wrapper_vertical_16_sse2,
&wrapper_vertical_16_c, 12, 1),
&wrapper_vertical_16_c, 12),
make_tuple(&wrapper_vertical_16_dual_sse2,
&wrapper_vertical_16_dual_c, 8, 1),
&wrapper_vertical_16_dual_c, 8),
make_tuple(&wrapper_vertical_16_dual_sse2,
&wrapper_vertical_16_dual_c, 10, 1),
&wrapper_vertical_16_dual_c, 10),
make_tuple(&wrapper_vertical_16_dual_sse2,
&wrapper_vertical_16_dual_c, 12, 1)));
&wrapper_vertical_16_dual_c, 12)));
#else
INSTANTIATE_TEST_CASE_P(
SSE2, Loop8Test6Param,
::testing::Values(
make_tuple(&vp9_lpf_horizontal_8_sse2, &vp9_lpf_horizontal_8_c, 8, 1),
make_tuple(&vp9_lpf_horizontal_16_sse2, &vp9_lpf_horizontal_16_c, 8, 1),
make_tuple(&vp9_lpf_horizontal_16_sse2, &vp9_lpf_horizontal_16_c, 8, 2),
make_tuple(&vp9_lpf_vertical_8_sse2, &vp9_lpf_vertical_8_c, 8, 1),
make_tuple(&wrapper_vertical_16_sse2, &wrapper_vertical_16_c, 8, 1)));
make_tuple(&vp9_lpf_horizontal_8_sse2, &vp9_lpf_horizontal_8_c, 8),
make_tuple(&vp9_lpf_horizontal_16_sse2, &vp9_lpf_horizontal_16_c, 8),
make_tuple(&vp9_lpf_vertical_8_sse2, &vp9_lpf_vertical_8_c, 8),
make_tuple(&wrapper_vertical_16_sse2, &wrapper_vertical_16_c, 8)));
#endif // CONFIG_VP9_HIGHBITDEPTH
#endif
@@ -584,9 +577,7 @@ INSTANTIATE_TEST_CASE_P(
INSTANTIATE_TEST_CASE_P(
AVX2, Loop8Test6Param,
::testing::Values(
make_tuple(&vp9_lpf_horizontal_16_avx2, &vp9_lpf_horizontal_16_c, 8, 1),
make_tuple(&vp9_lpf_horizontal_16_avx2, &vp9_lpf_horizontal_16_c, 8,
2)));
make_tuple(&vp9_lpf_horizontal_16_avx2, &vp9_lpf_horizontal_16_c, 8)));
#endif
#if HAVE_SSE2
@@ -644,22 +635,20 @@ INSTANTIATE_TEST_CASE_P(
// Using #if inside the macro is unsupported on MSVS but the tests are not
// currently built for MSVS with ARM and NEON.
make_tuple(&vp9_lpf_horizontal_16_neon,
&vp9_lpf_horizontal_16_c, 8, 1),
make_tuple(&vp9_lpf_horizontal_16_neon,
&vp9_lpf_horizontal_16_c, 8, 2),
&vp9_lpf_horizontal_16_c, 8),
make_tuple(&wrapper_vertical_16_neon,
&wrapper_vertical_16_c, 8, 1),
&wrapper_vertical_16_c, 8),
make_tuple(&wrapper_vertical_16_dual_neon,
&wrapper_vertical_16_dual_c, 8, 1),
&wrapper_vertical_16_dual_c, 8),
make_tuple(&vp9_lpf_horizontal_8_neon,
&vp9_lpf_horizontal_8_c, 8, 1),
&vp9_lpf_horizontal_8_c, 8),
make_tuple(&vp9_lpf_vertical_8_neon,
&vp9_lpf_vertical_8_c, 8, 1),
&vp9_lpf_vertical_8_c, 8),
#endif // HAVE_NEON_ASM
make_tuple(&vp9_lpf_horizontal_4_neon,
&vp9_lpf_horizontal_4_c, 8, 1),
&vp9_lpf_horizontal_4_c, 8),
make_tuple(&vp9_lpf_vertical_4_neon,
&vp9_lpf_vertical_4_c, 8, 1)));
&vp9_lpf_vertical_4_c, 8)));
INSTANTIATE_TEST_CASE_P(
NEON, Loop8Test9Param,
::testing::Values(


@@ -230,7 +230,7 @@ INSTANTIATE_TEST_CASE_P(
&vp9_idct4x4_1_add_c,
TX_4X4, 1)));
#if HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
#if HAVE_NEON
INSTANTIATE_TEST_CASE_P(
NEON, PartialIDctTest,
::testing::Values(
@@ -258,7 +258,7 @@ INSTANTIATE_TEST_CASE_P(
&vp9_idct4x4_16_add_c,
&vp9_idct4x4_1_add_neon,
TX_4X4, 1)));
#endif // HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
#endif // HAVE_NEON
#if HAVE_SSE2 && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(
@@ -305,4 +305,13 @@ INSTANTIATE_TEST_CASE_P(
TX_8X8, 12)));
#endif
#if HAVE_SSSE3 && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(
SSSE3, PartialIDctTest,
::testing::Values(
make_tuple(&vp9_fdct16x16_c,
&vp9_idct16x16_256_add_c,
&vp9_idct16x16_10_add_ssse3,
TX_16X16, 10)));
#endif
} // namespace


@@ -63,12 +63,12 @@ TEST_P(VP8PostProcessingFilterTest, FilterOutputCheck) {
uint8_t *const dst_image_ptr = dst_image + 8;
uint8_t *const flimits =
reinterpret_cast<uint8_t *>(vpx_memalign(16, block_width));
(void)memset(flimits, 255, block_width);
(void)vpx_memset(flimits, 255, block_width);
// Initialize pixels in the input:
// block pixels to value 1,
// border pixels to value 10.
(void)memset(src_image, 10, input_size);
(void)vpx_memset(src_image, 10, input_size);
uint8_t *pixel_ptr = src_image_ptr;
for (int i = 0; i < block_height; ++i) {
for (int j = 0; j < block_width; ++j) {
@@ -78,7 +78,7 @@ TEST_P(VP8PostProcessingFilterTest, FilterOutputCheck) {
}
// Initialize pixels in the output to 99.
(void)memset(dst_image, 99, output_size);
(void)vpx_memset(dst_image, 99, output_size);
ASM_REGISTER_STATE_CHECK(
GetParam()(src_image_ptr, dst_image_ptr, input_stride,


@@ -56,7 +56,7 @@ class QuantizeTestBase {
// The full configuration is necessary to generate the quantization tables.
VP8_CONFIG vp8_config;
memset(&vp8_config, 0, sizeof(vp8_config));
vpx_memset(&vp8_config, 0, sizeof(vp8_config));
vp8_comp_ = vp8_create_compressor(&vp8_config);
@@ -69,7 +69,8 @@ class QuantizeTestBase {
// Copy macroblockd from the reference to get pre-set-up dequant values.
macroblockd_dst_ = reinterpret_cast<MACROBLOCKD *>(
vpx_memalign(32, sizeof(*macroblockd_dst_)));
memcpy(macroblockd_dst_, &vp8_comp_->mb.e_mbd, sizeof(*macroblockd_dst_));
vpx_memcpy(macroblockd_dst_, &vp8_comp_->mb.e_mbd,
sizeof(*macroblockd_dst_));
// Fix block pointers - currently they point to the blocks in the reference
// structure.
vp8_setup_block_dptrs(macroblockd_dst_);
@@ -78,7 +79,8 @@ class QuantizeTestBase {
void UpdateQuantizer(int q) {
vp8_set_quantizer(vp8_comp_, q);
memcpy(macroblockd_dst_, &vp8_comp_->mb.e_mbd, sizeof(*macroblockd_dst_));
vpx_memcpy(macroblockd_dst_, &vp8_comp_->mb.e_mbd,
sizeof(*macroblockd_dst_));
vp8_setup_block_dptrs(macroblockd_dst_);
}

View File

@@ -53,7 +53,7 @@ TEST(VP8RoiMapTest, ParameterCheck) {
cpi.common.mb_rows = 240 >> 4;
cpi.common.mb_cols = 320 >> 4;
const int mbs = (cpi.common.mb_rows * cpi.common.mb_cols);
memset(cpi.segment_feature_data, 0, sizeof(cpi.segment_feature_data));
vpx_memset(cpi.segment_feature_data, 0, sizeof(cpi.segment_feature_data));
// Segment map
cpi.segmentation_map = reinterpret_cast<unsigned char *>(vpx_calloc(mbs, 1));
@@ -61,9 +61,9 @@ TEST(VP8RoiMapTest, ParameterCheck) {
// Allocate memory for the source memory map.
unsigned char *roi_map =
reinterpret_cast<unsigned char *>(vpx_calloc(mbs, 1));
memset(&roi_map[mbs >> 2], 1, (mbs >> 2));
memset(&roi_map[mbs >> 1], 2, (mbs >> 2));
memset(&roi_map[mbs -(mbs >> 2)], 3, (mbs >> 2));
vpx_memset(&roi_map[mbs >> 2], 1, (mbs >> 2));
vpx_memset(&roi_map[mbs >> 1], 2, (mbs >> 2));
vpx_memset(&roi_map[mbs -(mbs >> 2)], 3, (mbs >> 2));
// Do a test call with valid parameters.
int roi_retval = vp8_set_roimap(&cpi, roi_map, cpi.common.mb_rows,

View File

@@ -63,9 +63,6 @@ class SvcTest : public ::testing::Test {
vpx_codec_dec_cfg_t dec_cfg = vpx_codec_dec_cfg_t();
VP9CodecFactory codec_factory;
decoder_ = codec_factory.CreateDecoder(dec_cfg, 0);
tile_columns_ = 0;
tile_rows_ = 0;
}
virtual void TearDown() {
@@ -78,8 +75,6 @@ class SvcTest : public ::testing::Test {
vpx_svc_init(&svc_, &codec_, vpx_codec_vp9_cx(), &codec_enc_);
EXPECT_EQ(VPX_CODEC_OK, res);
vpx_codec_control(&codec_, VP8E_SET_CPUUSED, 4); // Make the test faster
vpx_codec_control(&codec_, VP9E_SET_TILE_COLUMNS, tile_columns_);
vpx_codec_control(&codec_, VP9E_SET_TILE_ROWS, tile_rows_);
codec_initialized_ = true;
}
@@ -113,8 +108,7 @@ class SvcTest : public ::testing::Test {
codec_enc_.g_pass = VPX_RC_FIRST_PASS;
InitializeEncoder();
libvpx_test::I420VideoSource video(test_file_name_,
codec_enc_.g_w, codec_enc_.g_h,
libvpx_test::I420VideoSource video(test_file_name_, kWidth, kHeight,
codec_enc_.g_timebase.den,
codec_enc_.g_timebase.num, 0, 30);
video.Begin();
@@ -182,8 +176,7 @@ class SvcTest : public ::testing::Test {
}
InitializeEncoder();
libvpx_test::I420VideoSource video(test_file_name_,
codec_enc_.g_w, codec_enc_.g_h,
libvpx_test::I420VideoSource video(test_file_name_, kWidth, kHeight,
codec_enc_.g_timebase.den,
codec_enc_.g_timebase.num, 0, 30);
video.Begin();
@@ -317,8 +310,6 @@ class SvcTest : public ::testing::Test {
std::string test_file_name_;
bool codec_initialized_;
Decoder *decoder_;
int tile_columns_;
int tile_rows_;
};
TEST_F(SvcTest, SvcInit) {
@@ -746,51 +737,4 @@ TEST_F(SvcTest,
FreeBitstreamBuffers(&outputs[0], 10);
}
TEST_F(SvcTest, TwoPassEncode2TemporalLayersWithTiles) {
// First pass encode
std::string stats_buf;
vpx_svc_set_options(&svc_, "scale-factors=1/1");
svc_.temporal_layers = 2;
Pass1EncodeNFrames(10, 1, &stats_buf);
// Second pass encode
codec_enc_.g_pass = VPX_RC_LAST_PASS;
svc_.temporal_layers = 2;
vpx_svc_set_options(&svc_, "auto-alt-refs=1 scale-factors=1/1");
codec_enc_.g_w = 704;
codec_enc_.g_h = 144;
tile_columns_ = 1;
tile_rows_ = 1;
vpx_fixed_buf outputs[10];
memset(&outputs[0], 0, sizeof(outputs));
Pass2EncodeNFrames(&stats_buf, 10, 1, &outputs[0]);
DecodeNFrames(&outputs[0], 10);
FreeBitstreamBuffers(&outputs[0], 10);
}
TEST_F(SvcTest,
TwoPassEncode2TemporalLayersWithMultipleFrameContextsAndTiles) {
// First pass encode
std::string stats_buf;
vpx_svc_set_options(&svc_, "scale-factors=1/1");
svc_.temporal_layers = 2;
Pass1EncodeNFrames(10, 1, &stats_buf);
// Second pass encode
codec_enc_.g_pass = VPX_RC_LAST_PASS;
svc_.temporal_layers = 2;
codec_enc_.g_error_resilient = 0;
codec_enc_.g_w = 704;
codec_enc_.g_h = 144;
tile_columns_ = 1;
tile_rows_ = 1;
vpx_svc_set_options(&svc_, "auto-alt-refs=1 scale-factors=1/1 "
"multi-frame-contexts=1");
vpx_fixed_buf outputs[10];
memset(&outputs[0], 0, sizeof(outputs));
Pass2EncodeNFrames(&stats_buf, 10, 1, &outputs[0]);
DecodeNFrames(&outputs[0], 10);
FreeBitstreamBuffers(&outputs[0], 10);
}
} // namespace
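The two deleted ...WithTiles tests were the only users of the tile_columns_/tile_rows_ members removed above; they configured encoder tiling through codec controls. The pattern, as a sketch (both controls take log2 values, so 1 means two tiles):

/* Sketch: tile configuration via VP9 encoder controls. */
vpx_codec_control(&codec_, VP9E_SET_TILE_COLUMNS, 1);  /* 2^1 tile columns */
vpx_codec_control(&codec_, VP9E_SET_TILE_ROWS, 1);     /* 2^1 tile rows */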

View File

@@ -150,9 +150,6 @@ LIBVPX_TEST_SRCS-$(CONFIG_VP9) += vp9_intrapred_test.cc
ifeq ($(CONFIG_VP9_ENCODER),yes)
LIBVPX_TEST_SRCS-$(CONFIG_SPATIAL_SVC) += svc_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_INTERNAL_STATS) += blockiness_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_INTERNAL_STATS) += consistency_test.cc
endif
ifeq ($(CONFIG_VP9_ENCODER)$(CONFIG_VP9_TEMPORAL_DENOISING),yesyes)

View File

@@ -15,11 +15,10 @@
extern "C" {
#if CONFIG_VP8
extern void vp8_rtcd();
#endif // CONFIG_VP8
#endif
#if CONFIG_VP9
extern void vp9_rtcd();
#endif // CONFIG_VP9
extern void vpx_scale_rtcd();
#endif
}
#include "third_party/googletest/src/include/gtest/gtest.h"
@@ -60,12 +59,11 @@ int main(int argc, char **argv) {
#if CONFIG_VP8
vp8_rtcd();
#endif // CONFIG_VP8
#endif
#if CONFIG_VP9
vp9_rtcd();
#endif // CONFIG_VP9
vpx_scale_rtcd();
#endif // !CONFIG_SHARED
#endif
#endif
return RUN_ALL_TESTS();
}

View File

@@ -145,28 +145,28 @@ VP8_INSTANTIATE_TEST_CASE(
libvpx_test::kNumVP8TestVectors)));
// Test VP9 decode in serial mode with single thread.
VP9_INSTANTIATE_TEST_CASE(
TestVectorTest,
::testing::Combine(
::testing::Values(0), // Serial Mode.
::testing::Values(1), // Single thread.
::testing::ValuesIn(libvpx_test::kVP9TestVectors,
libvpx_test::kVP9TestVectors +
libvpx_test::kNumVP9TestVectors)));
//VP9_INSTANTIATE_TEST_CASE(
// TestVectorTest,
// ::testing::Combine(
// ::testing::Values(0), // Serial Mode.
// ::testing::Values(1), // Single thread.
// ::testing::ValuesIn(libvpx_test::kVP9TestVectors,
// libvpx_test::kVP9TestVectors +
// libvpx_test::kNumVP9TestVectors)));
#if CONFIG_VP9_DECODER
// Test VP9 decode in frame parallel mode with different number of threads.
INSTANTIATE_TEST_CASE_P(
VP9MultiThreadedFrameParallel, TestVectorTest,
::testing::Combine(
::testing::Values(
static_cast<const libvpx_test::CodecFactory *>(&libvpx_test::kVP9)),
::testing::Combine(
::testing::Values(1), // Frame Parallel mode.
::testing::Range(2, 9), // With 2 ~ 8 threads.
::testing::ValuesIn(libvpx_test::kVP9TestVectors,
libvpx_test::kVP9TestVectors +
libvpx_test::kNumVP9TestVectors))));
#endif
//#if CONFIG_VP9_DECODER
//// Test VP9 decode in frame parallel mode with different number of threads.
//INSTANTIATE_TEST_CASE_P(
// VP9MultiThreadedFrameParallel, TestVectorTest,
// ::testing::Combine(
// ::testing::Values(
// static_cast<const libvpx_test::CodecFactory *>(&libvpx_test::kVP9)),
// ::testing::Combine(
// ::testing::Values(1), // Frame Parallel mode.
// ::testing::Range(2, 9), // With 2 ~ 8 threads.
// ::testing::ValuesIn(libvpx_test::kVP9TestVectors,
// libvpx_test::kVP9TestVectors +
// libvpx_test::kNumVP9TestVectors))));
//#endif
} // namespace

View File

@@ -402,7 +402,6 @@ VP9_IVF_FILE="${LIBVPX_TEST_DATA_PATH}/vp90-2-09-subpixel-00.ivf"
VP9_WEBM_FILE="${LIBVPX_TEST_DATA_PATH}/vp90-2-00-quantizer-00.webm"
VP9_FPM_WEBM_FILE="${LIBVPX_TEST_DATA_PATH}/vp90-2-07-frame_parallel-1.webm"
VP9_LT_50_FRAMES_WEBM_FILE="${LIBVPX_TEST_DATA_PATH}/vp90-2-02-size-32x08.webm"
YUV_RAW_INPUT="${LIBVPX_TEST_DATA_PATH}/hantro_collage_w352h288.yuv"
YUV_RAW_INPUT_WIDTH=352

View File

@@ -30,7 +30,7 @@ namespace {
using std::string;
using libvpx_test::ACMRandom;
#if CONFIG_WEBM_IO
#if CONFIG_WEBM_IO && 0
void CheckUserPrivateData(void *user_priv, int *target) {
// actual pointer value should be the same as expected.

View File

@@ -43,29 +43,29 @@ void test_decrypt_cb(void *decrypt_state, const uint8_t *input,
namespace libvpx_test {
TEST(TestDecrypt, DecryptWorksVp9) {
libvpx_test::IVFVideoSource video("vp90-2-05-resize.ivf");
video.Init();
vpx_codec_dec_cfg_t dec_cfg = vpx_codec_dec_cfg_t();
VP9Decoder decoder(dec_cfg, 0);
video.Begin();
// no decryption
vpx_codec_err_t res = decoder.DecodeFrame(video.cxdata(), video.frame_size());
ASSERT_EQ(VPX_CODEC_OK, res) << decoder.DecodeError();
// decrypt frame
video.Next();
std::vector<uint8_t> encrypted(video.frame_size());
encrypt_buffer(video.cxdata(), &encrypted[0], video.frame_size(), 0);
vpx_decrypt_init di = { test_decrypt_cb, &encrypted[0] };
decoder.Control(VPXD_SET_DECRYPTOR, &di);
res = decoder.DecodeFrame(&encrypted[0], encrypted.size());
ASSERT_EQ(VPX_CODEC_OK, res) << decoder.DecodeError();
}
//TEST(TestDecrypt, DecryptWorksVp9) {
// libvpx_test::IVFVideoSource video("vp90-2-05-resize.ivf");
// video.Init();
//
// vpx_codec_dec_cfg_t dec_cfg = vpx_codec_dec_cfg_t();
// VP9Decoder decoder(dec_cfg, 0);
//
// video.Begin();
//
// // no decryption
// vpx_codec_err_t res = decoder.DecodeFrame(video.cxdata(), video.frame_size());
// ASSERT_EQ(VPX_CODEC_OK, res) << decoder.DecodeError();
//
// // decrypt frame
// video.Next();
//
// std::vector<uint8_t> encrypted(video.frame_size());
// encrypt_buffer(video.cxdata(), &encrypted[0], video.frame_size(), 0);
// vpx_decrypt_init di = { test_decrypt_cb, &encrypted[0] };
// decoder.Control(VPXD_SET_DECRYPTOR, &di);
//
// res = decoder.DecodeFrame(&encrypted[0], encrypted.size());
// ASSERT_EQ(VPX_CODEC_OK, res) << decoder.DecodeError();
//}
} // namespace libvpx_test

View File

@@ -27,9 +27,9 @@ namespace {
using std::string;
#if CONFIG_WEBM_IO
#if CONFIG_WEBM_IO && 0
struct PauseFileList {
struct FileList {
const char *name;
// md5 sum for decoded frames which does not include skipped frames.
const char *expected_md5;
@@ -39,8 +39,7 @@ struct PauseFileList {
// Decodes |filename| with |num_threads|. Pause at the specified frame_num,
// seek to next key frame and then continue decoding until the end. Return
// the md5 of the decoded frames which does not include skipped frames.
string DecodeFileWithPause(const string &filename, int num_threads,
int pause_num) {
string DecodeFile(const string &filename, int num_threads, int pause_num) {
libvpx_test::WebMVideoSource video(filename);
video.Init();
int in_frames = 0;
@@ -93,12 +92,12 @@ string DecodeFileWithPause(const string &filename, int num_threads,
return string(md5.Get());
}
void DecodeFilesWithPause(const PauseFileList files[]) {
for (const PauseFileList *iter = files; iter->name != NULL; ++iter) {
void DecodeFiles(const FileList files[]) {
for (const FileList *iter = files; iter->name != NULL; ++iter) {
SCOPED_TRACE(iter->name);
for (int t = 2; t <= 8; ++t) {
EXPECT_EQ(iter->expected_md5,
DecodeFileWithPause(iter->name, t, iter->pause_frame_num))
DecodeFile(iter->name, t, iter->pause_frame_num))
<< "threads = " << t;
}
}
@@ -107,19 +106,19 @@ void DecodeFilesWithPause(const PauseFileList files[]) {
TEST(VP9MultiThreadedFrameParallel, PauseSeekResume) {
// vp90-2-07-frame_parallel-1.webm is a 40 frame video file with
// one key frame for every ten frames.
static const PauseFileList files[] = {
static const FileList files[] = {
{ "vp90-2-07-frame_parallel-1.webm",
"6ea7c3875d67252e7caf2bc6e75b36b1", 6 },
"6ea7c3875d67252e7caf2bc6e75b36b1", 6},
{ "vp90-2-07-frame_parallel-1.webm",
"4bb634160c7356a8d7d4299b6dc83a45", 12 },
"4bb634160c7356a8d7d4299b6dc83a45", 12},
{ "vp90-2-07-frame_parallel-1.webm",
"89772591e6ef461f9fa754f916c78ed8", 26 },
{ NULL, NULL, 0 },
"89772591e6ef461f9fa754f916c78ed8", 26},
{ NULL, NULL, 0},
};
DecodeFilesWithPause(files);
DecodeFiles(files);
}
struct FileList {
struct InvalidFileList {
const char *name;
// md5 sum for decoded frames which does not include corrupted frames.
const char *expected_md5;
@@ -129,8 +128,8 @@ struct FileList {
// Decodes |filename| with |num_threads|. Return the md5 of the decoded
// frames which does not include corrupted frames.
string DecodeFile(const string &filename, int num_threads,
int expected_frame_count) {
string DecodeInvalidFile(const string &filename, int num_threads,
int expected_frame_count) {
libvpx_test::WebMVideoSource video(filename);
video.Init();
@@ -174,47 +173,37 @@ string DecodeFile(const string &filename, int num_threads,
return string(md5.Get());
}
void DecodeFiles(const FileList files[]) {
for (const FileList *iter = files; iter->name != NULL; ++iter) {
void DecodeInvalidFiles(const InvalidFileList files[]) {
for (const InvalidFileList *iter = files; iter->name != NULL; ++iter) {
SCOPED_TRACE(iter->name);
for (int t = 2; t <= 8; ++t) {
EXPECT_EQ(iter->expected_md5,
DecodeFile(iter->name, t, iter->expected_frame_count))
DecodeInvalidFile(iter->name, t, iter->expected_frame_count))
<< "threads = " << t;
}
}
}
TEST(VP9MultiThreadedFrameParallel, InvalidFileTest) {
static const FileList files[] = {
static const InvalidFileList files[] = {
// invalid-vp90-2-07-frame_parallel-1.webm is a 40 frame video file with
// one key frame for every ten frames. The 11th frame has corrupted data.
{ "invalid-vp90-2-07-frame_parallel-1.webm",
"0549d0f45f60deaef8eb708e6c0eb6cb", 30 },
"0549d0f45f60deaef8eb708e6c0eb6cb", 30},
// invalid-vp90-2-07-frame_parallel-2.webm is a 40 frame video file with
// one key frame for every ten frames. The 1st and 31st frames have
// corrupted data.
{ "invalid-vp90-2-07-frame_parallel-2.webm",
"6a1f3cf6f9e7a364212fadb9580d525e", 20 },
"6a1f3cf6f9e7a364212fadb9580d525e", 20},
// invalid-vp90-2-07-frame_parallel-3.webm is a 40 frame video file with
// one key frame for every ten frames. The 5th and 13th frames have
// corrupted data.
{ "invalid-vp90-2-07-frame_parallel-3.webm",
"8256544308de926b0681e04685b98677", 27 },
{ NULL, NULL, 0 },
"8256544308de926b0681e04685b98677", 27},
{ NULL, NULL, 0},
};
DecodeFiles(files);
DecodeInvalidFiles(files);
}
TEST(VP9MultiThreadedFrameParallel, ValidFileTest) {
static const FileList files[] = {
#if CONFIG_VP9_HIGHBITDEPTH
{ "vp92-2-20-10bit-yuv420.webm",
"a16b99df180c584e8db2ffeda987d293", 10 },
#endif
{ NULL, NULL, 0 },
};
DecodeFiles(files);
}
#endif // CONFIG_WEBM_IO
} // namespace
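All of the file tables in this test share one iteration idiom: a { NULL, NULL, 0 } sentinel ends each array, and the decode loop walks entries until it hits it, trying every thread count from 2 through 8. A stripped-down sketch of that pattern:

// Sketch of the sentinel-terminated table walk used by DecodeFiles,
// DecodeInvalidFiles, and their renamed counterparts above.
struct Entry { const char *name; const char *expected_md5; int count; };
static void WalkTable(const Entry files[]) {
  for (const Entry *iter = files; iter->name != NULL; ++iter) {
    for (int t = 2; t <= 8; ++t) {
      // decode iter->name with t threads and compare against expected_md5
    }
  }
}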

View File

@@ -152,7 +152,7 @@ TEST(VP9WorkerThreadTest, TestInterfaceAPI) {
// -----------------------------------------------------------------------------
// Multi-threaded decode tests
#if CONFIG_WEBM_IO
#if CONFIG_WEBM_IO && 0
struct FileList {
const char *name;
const char *expected_md5;

View File

@@ -33,10 +33,10 @@ class VpxScaleBase {
void ResetImage(int width, int height) {
width_ = width;
height_ = height;
memset(&img_, 0, sizeof(img_));
vpx_memset(&img_, 0, sizeof(img_));
ASSERT_EQ(0, vp8_yv12_alloc_frame_buffer(&img_, width_, height_,
VP8BORDERINPIXELS));
memset(img_.buffer_alloc, kBufFiller, img_.frame_size);
vpx_memset(img_.buffer_alloc, kBufFiller, img_.frame_size);
FillPlane(img_.y_buffer, img_.y_crop_width, img_.y_crop_height,
img_.y_stride);
FillPlane(img_.u_buffer, img_.uv_crop_width, img_.uv_crop_height,
@@ -44,15 +44,15 @@ class VpxScaleBase {
FillPlane(img_.v_buffer, img_.uv_crop_width, img_.uv_crop_height,
img_.uv_stride);
memset(&ref_img_, 0, sizeof(ref_img_));
vpx_memset(&ref_img_, 0, sizeof(ref_img_));
ASSERT_EQ(0, vp8_yv12_alloc_frame_buffer(&ref_img_, width_, height_,
VP8BORDERINPIXELS));
memset(ref_img_.buffer_alloc, kBufFiller, ref_img_.frame_size);
vpx_memset(ref_img_.buffer_alloc, kBufFiller, ref_img_.frame_size);
memset(&cpy_img_, 0, sizeof(cpy_img_));
vpx_memset(&cpy_img_, 0, sizeof(cpy_img_));
ASSERT_EQ(0, vp8_yv12_alloc_frame_buffer(&cpy_img_, width_, height_,
VP8BORDERINPIXELS));
memset(cpy_img_.buffer_alloc, kBufFiller, cpy_img_.frame_size);
vpx_memset(cpy_img_.buffer_alloc, kBufFiller, cpy_img_.frame_size);
ReferenceCopyFrame();
}
@@ -87,8 +87,8 @@ class VpxScaleBase {
// Fill the border pixels from the nearest image pixel.
for (int y = 0; y < crop_height; ++y) {
memset(left, left[padding], padding);
memset(right, right[-1], right_extend);
vpx_memset(left, left[padding], padding);
vpx_memset(right, right[-1], right_extend);
left += stride;
right += stride;
}
@@ -101,13 +101,13 @@ class VpxScaleBase {
// The first row was already extended to the left and right. Copy it up.
for (int y = 0; y < padding; ++y) {
memcpy(top, left, extend_width);
vpx_memcpy(top, left, extend_width);
top += stride;
}
uint8_t *bottom = left + (crop_height * stride);
for (int y = 0; y < bottom_extend; ++y) {
memcpy(bottom, left + (crop_height - 1) * stride, extend_width);
vpx_memcpy(bottom, left + (crop_height - 1) * stride, extend_width);
bottom += stride;
}
}

View File

@@ -17,8 +17,7 @@
# Environment check: Make sure input is available.
vpxdec_verify_environment() {
if [ ! -e "${VP8_IVF_FILE}" ] || [ ! -e "${VP9_WEBM_FILE}" ] || \
[ ! -e "${VP9_FPM_WEBM_FILE}" ] || \
[ ! -e "${VP9_LT_50_FRAMES_WEBM_FILE}" ] ; then
[ ! -e "${VP9_FPM_WEBM_FILE}" ] ; then
elog "Libvpx test data must exist in LIBVPX_TEST_DATA_PATH."
return 1
fi
@@ -88,29 +87,12 @@ vpxdec_vp9_webm_frame_parallel() {
--frame-parallel
done
fi
}
vpxdec_vp9_webm_less_than_50_frames() {
# ensure that reaching eof in webm_guess_framerate doesn't result in invalid
# frames in actual webm_read_frame calls.
if [ "$(vpxdec_can_decode_vp9)" = "yes" ] && \
[ "$(webm_io_available)" = "yes" ]; then
local readonly decoder="$(vpx_tool_path vpxdec)"
local readonly expected=10
local readonly num_frames=$(${VPX_TEST_PREFIX} "${decoder}" \
"${VP9_LT_50_FRAMES_WEBM_FILE}" --summary --noblit 2>&1 \
| awk '/^[0-9]+ decoded frames/ { print $1 }')
if [ "$num_frames" -ne "$expected" ]; then
elog "Output frames ($num_frames) != expected ($expected)"
return 1
fi
fi
}
vpxdec_tests="vpxdec_vp8_ivf
vpxdec_vp8_ivf_pipe_input
vpxdec_vp9_webm
vpxdec_vp9_webm_frame_parallel
vpxdec_vp9_webm_less_than_50_frames"
vpxdec_vp9_webm_frame_parallel"
run_tests vpxdec_verify_environment "${vpxdec_tests}"

View File

@@ -9,4 +9,3 @@ defines that help automatically allow assembly to work cross-platform.
Local Modifications:
Some modifications to allow PIC to work with x86inc.
Conditionally define program_name to allow overriding.

View File

@@ -36,9 +36,7 @@
%include "vpx_config.asm"
%ifndef program_name
%define program_name vp9
%endif
%define UNIX64 0

View File

@@ -103,9 +103,9 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
goto allocation_fail;
oci->post_proc_buffer_int_used = 0;
memset(&oci->postproc_state, 0, sizeof(oci->postproc_state));
memset(oci->post_proc_buffer.buffer_alloc, 128,
oci->post_proc_buffer.frame_size);
vpx_memset(&oci->postproc_state, 0, sizeof(oci->postproc_state));
vpx_memset(oci->post_proc_buffer.buffer_alloc, 128,
oci->post_proc_buffer.frame_size);
/* Allocate buffer to store post-processing filter coefficients.
*
@@ -176,7 +176,7 @@ void vp8_create_common(VP8_COMMON *oci)
oci->clamp_type = RECON_CLAMP_REQUIRED;
/* Initialize reference frame sign bias structure to defaults */
memset(oci->ref_frame_sign_bias, 0, sizeof(oci->ref_frame_sign_bias));
vpx_memset(oci->ref_frame_sign_bias, 0, sizeof(oci->ref_frame_sign_bias));
/* Default disable buffer to buffer copying */
oci->copy_buffer_to_gf = 0;

View File

@@ -165,7 +165,7 @@ vp8_dequant_idct_loop2_v6
str r1, [r2], r12 ; store output to dst
bne vp8_dequant_idct_loop2_v6
; memset
; vpx_memset
sub r0, r0, #32
add sp, sp, #4

View File

@@ -29,19 +29,19 @@ extern "C" {
#define vp8_copy( Dest, Src) { \
assert( sizeof( Dest) == sizeof( Src)); \
memcpy( Dest, Src, sizeof( Src)); \
vpx_memcpy( Dest, Src, sizeof( Src)); \
}
/* Use this for variably-sized arrays. */
#define vp8_copy_array( Dest, Src, N) { \
assert( sizeof( *Dest) == sizeof( *Src)); \
memcpy( Dest, Src, N * sizeof( *Src)); \
vpx_memcpy( Dest, Src, N * sizeof( *Src)); \
}
#define vp8_zero( Dest) memset( &Dest, 0, sizeof( Dest));
#define vp8_zero( Dest) vpx_memset( &Dest, 0, sizeof( Dest));
#define vp8_zero_array( Dest, N) memset( Dest, 0, N * sizeof( *Dest));
#define vp8_zero_array( Dest, N) vpx_memset( Dest, 0, N * sizeof( *Dest));
#ifdef __cplusplus
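The assert in these macros turns a size mismatch into a runtime failure before the copy happens. A usage sketch (the destination array here is hypothetical):

/* Sketch: vp8_copy demands sizeof(Dest) == sizeof(Src), so both sides
 * must be true arrays of the same total size, not pointers. */
vp8_prob ymode[VP8_YMODES - 1];     /* hypothetical destination */
vp8_copy(ymode, x->fc.ymode_prob);  /* expands to a size-checked copy */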

View File

@@ -81,6 +81,7 @@ void vp8_print_modes_and_motion_vectors(MODE_INFO *mi, int rows, int cols, int f
fprintf(mvs, "\n");
/* print out the block modes */
mb_index = 0;
fprintf(mvs, "Mbs for Frame %d\n", frame);
{
int b_row;
@@ -128,6 +129,7 @@ void vp8_print_modes_and_motion_vectors(MODE_INFO *mi, int rows, int cols, int f
/* print out the block modes */
mb_index = 0;
fprintf(mvs, "MVs for Frame %d\n", frame);
{
int b_row;

View File

@@ -38,6 +38,6 @@ void vp8_dequant_idct_add_c(short *input, short *dq,
vp8_short_idct4x4llm_c(input, dest, stride, dest, stride);
memset(input, 0, 32);
vpx_memset(input, 0, 32);
}

View File

@@ -183,6 +183,7 @@ const vp8_extra_bit_struct vp8_extra_bits[12] =
void vp8_default_coef_probs(VP8_COMMON *pc)
{
memcpy(pc->fc.coef_probs, default_coef_probs, sizeof(default_coef_probs));
vpx_memcpy(pc->fc.coef_probs, default_coef_probs,
sizeof(default_coef_probs));
}

View File

@@ -159,13 +159,13 @@ const vp8_tree_index vp8_small_mvtree [14] =
void vp8_init_mbmode_probs(VP8_COMMON *x)
{
memcpy(x->fc.ymode_prob, vp8_ymode_prob, sizeof(vp8_ymode_prob));
memcpy(x->fc.uv_mode_prob, vp8_uv_mode_prob, sizeof(vp8_uv_mode_prob));
memcpy(x->fc.sub_mv_ref_prob, sub_mv_ref_prob, sizeof(sub_mv_ref_prob));
vpx_memcpy(x->fc.ymode_prob, vp8_ymode_prob, sizeof(vp8_ymode_prob));
vpx_memcpy(x->fc.uv_mode_prob, vp8_uv_mode_prob, sizeof(vp8_uv_mode_prob));
vpx_memcpy(x->fc.sub_mv_ref_prob, sub_mv_ref_prob, sizeof(sub_mv_ref_prob));
}
void vp8_default_bmode_probs(vp8_prob p [VP8_BINTRAMODES-1])
{
memcpy(p, vp8_bmode_prob, sizeof(vp8_bmode_prob));
vpx_memcpy(p, vp8_bmode_prob, sizeof(vp8_bmode_prob));
}

View File

@@ -40,9 +40,9 @@ static void copy_and_extend_plane
for (i = 0; i < h; i++)
{
memset(dest_ptr1, src_ptr1[0], el);
memcpy(dest_ptr1 + el, src_ptr1, w);
memset(dest_ptr2, src_ptr2[0], er);
vpx_memset(dest_ptr1, src_ptr1[0], el);
vpx_memcpy(dest_ptr1 + el, src_ptr1, w);
vpx_memset(dest_ptr2, src_ptr2[0], er);
src_ptr1 += sp;
src_ptr2 += sp;
dest_ptr1 += dp;
@@ -60,13 +60,13 @@ static void copy_and_extend_plane
for (i = 0; i < et; i++)
{
memcpy(dest_ptr1, src_ptr1, linesize);
vpx_memcpy(dest_ptr1, src_ptr1, linesize);
dest_ptr1 += dp;
}
for (i = 0; i < eb; i++)
{
memcpy(dest_ptr2, src_ptr2, linesize);
vpx_memcpy(dest_ptr2, src_ptr2, linesize);
dest_ptr2 += dp;
}
}

View File

@@ -33,7 +33,7 @@ void vp8_dequant_idct_add_y_block_c
else
{
vp8_dc_only_idct_add_c (q[0]*dq[0], dst, stride, dst, stride);
memset(q, 0, 2 * sizeof(q[0]));
vpx_memset(q, 0, 2 * sizeof(q[0]));
}
q += 16;
@@ -59,7 +59,7 @@ void vp8_dequant_idct_add_uv_block_c
else
{
vp8_dc_only_idct_add_c (q[0]*dq[0], dstu, stride, dstu, stride);
memset(q, 0, 2 * sizeof(q[0]));
vpx_memset(q, 0, 2 * sizeof(q[0]));
}
q += 16;
@@ -78,7 +78,7 @@ void vp8_dequant_idct_add_uv_block_c
else
{
vp8_dc_only_idct_add_c (q[0]*dq[0], dstv, stride, dstv, stride);
memset(q, 0, 2 * sizeof(q[0]));
vpx_memset(q, 0, 2 * sizeof(q[0]));
}
q += 16;

View File

@@ -82,10 +82,11 @@ void vp8_loop_filter_update_sharpness(loop_filter_info_n *lfi,
if (block_inside_limit < 1)
block_inside_limit = 1;
memset(lfi->lim[i], block_inside_limit, SIMD_WIDTH);
memset(lfi->blim[i], (2 * filt_lvl + block_inside_limit), SIMD_WIDTH);
memset(lfi->mblim[i], (2 * (filt_lvl + 2) + block_inside_limit),
SIMD_WIDTH);
vpx_memset(lfi->lim[i], block_inside_limit, SIMD_WIDTH);
vpx_memset(lfi->blim[i], (2 * filt_lvl + block_inside_limit),
SIMD_WIDTH);
vpx_memset(lfi->mblim[i], (2 * (filt_lvl + 2) + block_inside_limit),
SIMD_WIDTH);
}
}
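As a worked example of the vectors built above (block_inside_limit itself also depends on the sharpness setting, so treat these as illustrative numbers): with filt_lvl = 32 and block_inside_limit = 32, every byte of the SIMD_WIDTH-wide vectors becomes lim = 32, blim = 2*32 + 32 = 96, and mblim = 2*(32 + 2) + 32 = 100.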
@@ -104,7 +105,7 @@ void vp8_loop_filter_init(VP8_COMMON *cm)
/* init hev threshold const vectors */
for(i = 0; i < 4 ; i++)
{
memset(lfi->hev_thr[i], i, SIMD_WIDTH);
vpx_memset(lfi->hev_thr[i], i, SIMD_WIDTH);
}
}
@@ -150,7 +151,7 @@ void vp8_loop_filter_frame_init(VP8_COMMON *cm,
/* we could get rid of this if we assume that deltas are set to
* zero when not in use; encoder always uses deltas
*/
memset(lfi->lvl[seg][0], lvl_seg, 4 * 4 );
vpx_memset(lfi->lvl[seg][0], lvl_seg, 4 * 4 );
continue;
}

View File

@@ -153,11 +153,11 @@ static void multiframe_quality_enhance_block
actd = (vp8_variance16x16(yd, yd_stride, VP8_ZEROS, 0, &sse)+128)>>8;
act = (vp8_variance16x16(y, y_stride, VP8_ZEROS, 0, &sse)+128)>>8;
#ifdef USE_SSD
vp8_variance16x16(y, y_stride, yd, yd_stride, &sse);
sad = (vp8_variance16x16(y, y_stride, yd, yd_stride, &sse));
sad = (sse + 128)>>8;
vp8_variance8x8(u, uv_stride, ud, uvd_stride, &sse);
usad = (vp8_variance8x8(u, uv_stride, ud, uvd_stride, &sse));
usad = (sse + 32)>>6;
vp8_variance8x8(v, uv_stride, vd, uvd_stride, &sse);
vsad = (vp8_variance8x8(v, uv_stride, vd, uvd_stride, &sse));
vsad = (sse + 32)>>6;
#else
sad = (vp8_sad16x16(y, y_stride, yd, yd_stride, UINT_MAX) + 128) >> 8;
@@ -170,11 +170,11 @@ static void multiframe_quality_enhance_block
actd = (vp8_variance8x8(yd, yd_stride, VP8_ZEROS, 0, &sse)+32)>>6;
act = (vp8_variance8x8(y, y_stride, VP8_ZEROS, 0, &sse)+32)>>6;
#ifdef USE_SSD
vp8_variance8x8(y, y_stride, yd, yd_stride, &sse);
sad = (vp8_variance8x8(y, y_stride, yd, yd_stride, &sse));
sad = (sse + 32)>>6;
vp8_variance4x4(u, uv_stride, ud, uvd_stride, &sse);
usad = (vp8_variance4x4(u, uv_stride, ud, uvd_stride, &sse));
usad = (sse + 8)>>4;
vp8_variance4x4(v, uv_stride, vd, uvd_stride, &sse);
vsad = (vp8_variance4x4(v, uv_stride, vd, uvd_stride, &sse));
vsad = (sse + 8)>>4;
#else
sad = (vp8_sad8x8(y, y_stride, yd, yd_stride, UINT_MAX) + 32) >> 6;
@@ -231,9 +231,9 @@ static void multiframe_quality_enhance_block
{
vp8_copy_mem8x8(y, y_stride, yd, yd_stride);
for (up = u, udp = ud, i = 0; i < uvblksize; ++i, up += uv_stride, udp += uvd_stride)
memcpy(udp, up, uvblksize);
vpx_memcpy(udp, up, uvblksize);
for (vp = v, vdp = vd, i = 0; i < uvblksize; ++i, vp += uv_stride, vdp += uvd_stride)
memcpy(vdp, vp, uvblksize);
vpx_memcpy(vdp, vp, uvblksize);
}
}
}
@@ -341,8 +341,8 @@ void vp8_multiframe_quality_enhance
for (k = 0; k < 4; ++k, up += show->uv_stride, udp += dest->uv_stride,
vp += show->uv_stride, vdp += dest->uv_stride)
{
memcpy(udp, up, 4);
memcpy(vdp, vp, 4);
vpx_memcpy(udp, up, 4);
vpx_memcpy(vdp, vp, 4);
}
}
}
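A note on the USE_SSD arithmetic above: (sse + 128) >> 8 is division by 256 = 16*16 with rounding, (sse + 32) >> 6 divides by 64 = 8*8, and (sse + 8) >> 4 divides by 16 = 4*4. Under USE_SSD, then, the sad/usad/vsad variables actually hold the rounded mean squared difference per pixel of the block rather than a true sum of absolute differences.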

View File

@@ -26,7 +26,7 @@ void vp8_dequant_idct_add_dspr2(short *input, short *dq,
vp8_short_idct4x4llm_dspr2(input, dest, stride, dest, stride);
memset(input, 0, 32);
vpx_memset(input, 0, 32);
}

View File

@@ -355,8 +355,8 @@ void vp8_deblock(VP8_COMMON *cm,
else
mb_ppl = (unsigned char)ppl;
memset(ylptr, mb_ppl, 16);
memset(uvlptr, mb_ppl, 8);
vpx_memset(ylptr, mb_ppl, 16);
vpx_memset(uvlptr, mb_ppl, 8);
ylptr += 16;
uvlptr += 8;
@@ -403,7 +403,7 @@ void vp8_de_noise(VP8_COMMON *cm,
(void) low_var_thresh;
(void) flag;
memset(limits, (unsigned char)ppl, 16 * mb_cols);
vpx_memset(limits, (unsigned char)ppl, 16 * mb_cols);
/* TODO: The original code doesn't filter the 2 outer rows and columns. */
for (mbr = 0; mbr < mb_rows; mbr++)
@@ -763,7 +763,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
/* ensure that postproc is set to all 0's so that post proc
* doesn't pull random data in from edge
*/
memset((&oci->post_proc_buffer_int)->buffer_alloc,128,(&oci->post_proc_buffer)->frame_size);
vpx_memset((&oci->post_proc_buffer_int)->buffer_alloc,128,(&oci->post_proc_buffer)->frame_size);
}
}

View File

@@ -0,0 +1,47 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl copy_mem16x16_ppc
;# r3 unsigned char *src
;# r4 int src_stride
;# r5 unsigned char *dst
;# r6 int dst_stride
;# Make the assumption that input will not be aligned,
;# but the output will be. So two reads and a perm
;# for the input, but only one store for the output.
copy_mem16x16_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xe000
mtspr 256, r12 ;# set VRSAVE
li r10, 16
mtctr r10
cp_16x16_loop:
lvsl v0, 0, r3 ;# permutate value for alignment
lvx v1, 0, r3
lvx v2, r10, r3
vperm v1, v1, v2, v0
stvx v1, 0, r5
add r3, r3, r4 ;# increment source pointer
add r5, r5, r6 ;# increment destination pointer
bdnz cp_16x16_loop
mtspr 256, r11 ;# reset old VRSAVE
blr
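For readers not fluent in AltiVec: lvsl/vperm implement the unaligned 16-byte load and stvx the aligned store, so the routine above is equivalent to this plain C copy (a sketch; cf. the generic vp8_copy_mem16x16_c):

#include <string.h>
/* Sketch: C equivalent of copy_mem16x16_ppc. */
static void copy_mem16x16(const unsigned char *src, int src_stride,
                          unsigned char *dst, int dst_stride) {
  int r;
  for (r = 0; r < 16; ++r) {
    memcpy(dst, src, 16);  /* one 16-byte row per loop iteration */
    src += src_stride;
    dst += dst_stride;
  }
}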

File diff suppressed because it is too large

View File

@@ -0,0 +1,677 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl bilinear_predict4x4_ppc
.globl bilinear_predict8x4_ppc
.globl bilinear_predict8x8_ppc
.globl bilinear_predict16x16_ppc
.macro load_c V, LABEL, OFF, R0, R1
lis \R0, \LABEL@ha
la \R1, \LABEL@l(\R0)
lvx \V, \OFF, \R1
.endm
.macro load_vfilter V0, V1
load_c \V0, vfilter_b, r6, r9, r10
addi r6, r6, 16
lvx \V1, r6, r10
.endm
.macro HProlog jump_label
;# load up horizontal filter
slwi. r5, r5, 4 ;# index into horizontal filter array
;# index to the next set of vectors in the row.
li r10, 16
li r12, 32
;# downshift by 7 ( divide by 128 ) at the end
vspltish v19, 7
;# If there isn't any filtering to be done for the horizontal, then
;# just skip to the second pass.
beq \jump_label
load_c v20, hfilter_b, r5, r9, r0
;# setup constants
;# v14 permutation value for alignment
load_c v28, b_hperm_b, 0, r9, r0
;# rounding added in on the multiply
vspltisw v21, 8
vspltisw v18, 3
vslw v18, v21, v18 ;# 0x00000040000000400000004000000040
slwi. r6, r6, 5 ;# index into vertical filter array
.endm
;# Filters a horizontal line
;# expects:
;# r3 src_ptr
;# r4 pitch
;# r10 16
;# r12 32
;# v17 perm input
;# v18 rounding
;# v19 shift
;# v20 filter taps
;# v21 tmp
;# v22 tmp
;# v23 tmp
;# v24 tmp
;# v25 tmp
;# v26 tmp
;# v27 tmp
;# v28 perm output
;#
.macro HFilter V
vperm v24, v21, v21, v10 ;# v20 = 0123 1234 2345 3456
vperm v25, v21, v21, v11 ;# v21 = 4567 5678 6789 789A
vmsummbm v24, v20, v24, v18
vmsummbm v25, v20, v25, v18
vpkswus v24, v24, v25 ;# v24 = 0 4 8 C 1 5 9 D (16-bit)
vsrh v24, v24, v19 ;# divide v0, v1 by 128
vpkuhus \V, v24, v24 ;# \V = scrambled 8-bit result
.endm
.macro hfilter_8 V, increment_counter
lvsl v17, 0, r3 ;# permutate value for alignment
;# input to filter is 9 bytes wide, output is 8 bytes.
lvx v21, 0, r3
lvx v22, r10, r3
.if \increment_counter
add r3, r3, r4
.endif
vperm v21, v21, v22, v17
HFilter \V
.endm
.macro load_and_align_8 V, increment_counter
lvsl v17, 0, r3 ;# permutate value for alignment
;# input to filter is 21 bytes wide, output is 16 bytes.
;# input can span three vectors if not aligned correctly.
lvx v21, 0, r3
lvx v22, r10, r3
.if \increment_counter
add r3, r3, r4
.endif
vperm \V, v21, v22, v17
.endm
.macro write_aligned_8 V, increment_counter
stvx \V, 0, r7
.if \increment_counter
add r7, r7, r8
.endif
.endm
.macro vfilter_16 P0 P1
vmuleub v22, \P0, v20 ;# 64 + 4 positive taps
vadduhm v22, v18, v22
vmuloub v23, \P0, v20
vadduhm v23, v18, v23
vmuleub v24, \P1, v21
vadduhm v22, v22, v24 ;# Re = evens, saturation unnecessary
vmuloub v25, \P1, v21
vadduhm v23, v23, v25 ;# Ro = odds
vsrh v22, v22, v19 ;# divide by 128
vsrh v23, v23, v19 ;# v16 v17 = evens, odds
vmrghh \P0, v22, v23 ;# v18 v19 = 16-bit result in order
vmrglh v23, v22, v23
vpkuhus \P0, \P0, v23 ;# P0 = 8-bit result
.endm
.macro w_8x8 V, D, R, P
stvx \V, 0, r1
lwz \R, 0(r1)
stw \R, 0(r7)
lwz \R, 4(r1)
stw \R, 4(r7)
add \D, \D, \P
.endm
.align 2
;# r3 unsigned char * src
;# r4 int src_pitch
;# r5 int x_offset
;# r6 int y_offset
;# r7 unsigned char * dst
;# r8 int dst_pitch
bilinear_predict4x4_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xf830
ori r12, r12, 0xfff8
mtspr 256, r12 ;# set VRSAVE
stwu r1,-32(r1) ;# create space on the stack
HProlog second_pass_4x4_pre_copy_b
;# Load up permutation constants
load_c v10, b_0123_b, 0, r9, r12
load_c v11, b_4567_b, 0, r9, r12
hfilter_8 v0, 1
hfilter_8 v1, 1
hfilter_8 v2, 1
hfilter_8 v3, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq store_out_4x4_b
hfilter_8 v4, 0
b second_pass_4x4_b
second_pass_4x4_pre_copy_b:
slwi r6, r6, 5 ;# index into vertical filter array
load_and_align_8 v0, 1
load_and_align_8 v1, 1
load_and_align_8 v2, 1
load_and_align_8 v3, 1
load_and_align_8 v4, 1
second_pass_4x4_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
store_out_4x4_b:
stvx v0, 0, r1
lwz r0, 0(r1)
stw r0, 0(r7)
add r7, r7, r8
stvx v1, 0, r1
lwz r0, 0(r1)
stw r0, 0(r7)
add r7, r7, r8
stvx v2, 0, r1
lwz r0, 0(r1)
stw r0, 0(r7)
add r7, r7, r8
stvx v3, 0, r1
lwz r0, 0(r1)
stw r0, 0(r7)
exit_4x4:
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
.align 2
;# r3 unsigned char * src
;# r4 int src_pitch
;# r5 int x_offset
;# r6 int y_offset
;# r7 unsigned char * dst
;# r8 int dst_pitch
bilinear_predict8x4_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xf830
ori r12, r12, 0xfff8
mtspr 256, r12 ;# set VRSAVE
stwu r1,-32(r1) ;# create space on the stack
HProlog second_pass_8x4_pre_copy_b
;# Load up permutation constants
load_c v10, b_0123_b, 0, r9, r12
load_c v11, b_4567_b, 0, r9, r12
hfilter_8 v0, 1
hfilter_8 v1, 1
hfilter_8 v2, 1
hfilter_8 v3, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq store_out_8x4_b
hfilter_8 v4, 0
b second_pass_8x4_b
second_pass_8x4_pre_copy_b:
slwi r6, r6, 5 ;# index into vertical filter array
load_and_align_8 v0, 1
load_and_align_8 v1, 1
load_and_align_8 v2, 1
load_and_align_8 v3, 1
load_and_align_8 v4, 1
second_pass_8x4_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
store_out_8x4_b:
cmpi cr0, r8, 8
beq cr0, store_aligned_8x4_b
w_8x8 v0, r7, r0, r8
w_8x8 v1, r7, r0, r8
w_8x8 v2, r7, r0, r8
w_8x8 v3, r7, r0, r8
b exit_8x4
store_aligned_8x4_b:
load_c v10, b_hilo_b, 0, r9, r10
vperm v0, v0, v1, v10
vperm v2, v2, v3, v10
stvx v0, 0, r7
addi r7, r7, 16
stvx v2, 0, r7
exit_8x4:
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
.align 2
;# r3 unsigned char * src
;# r4 int src_pitch
;# r5 int x_offset
;# r6 int y_offset
;# r7 unsigned char * dst
;# r8 int dst_pitch
bilinear_predict8x8_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xfff0
ori r12, r12, 0xffff
mtspr 256, r12 ;# set VRSAVE
stwu r1,-32(r1) ;# create space on the stack
HProlog second_pass_8x8_pre_copy_b
;# Load up permutation constants
load_c v10, b_0123_b, 0, r9, r12
load_c v11, b_4567_b, 0, r9, r12
hfilter_8 v0, 1
hfilter_8 v1, 1
hfilter_8 v2, 1
hfilter_8 v3, 1
hfilter_8 v4, 1
hfilter_8 v5, 1
hfilter_8 v6, 1
hfilter_8 v7, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq store_out_8x8_b
hfilter_8 v8, 0
b second_pass_8x8_b
second_pass_8x8_pre_copy_b:
slwi r6, r6, 5 ;# index into vertical filter array
load_and_align_8 v0, 1
load_and_align_8 v1, 1
load_and_align_8 v2, 1
load_and_align_8 v3, 1
load_and_align_8 v4, 1
load_and_align_8 v5, 1
load_and_align_8 v6, 1
load_and_align_8 v7, 1
load_and_align_8 v8, 0
second_pass_8x8_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
vfilter_16 v4, v5
vfilter_16 v5, v6
vfilter_16 v6, v7
vfilter_16 v7, v8
store_out_8x8_b:
cmpi cr0, r8, 8
beq cr0, store_aligned_8x8_b
w_8x8 v0, r7, r0, r8
w_8x8 v1, r7, r0, r8
w_8x8 v2, r7, r0, r8
w_8x8 v3, r7, r0, r8
w_8x8 v4, r7, r0, r8
w_8x8 v5, r7, r0, r8
w_8x8 v6, r7, r0, r8
w_8x8 v7, r7, r0, r8
b exit_8x8
store_aligned_8x8_b:
load_c v10, b_hilo_b, 0, r9, r10
vperm v0, v0, v1, v10
vperm v2, v2, v3, v10
vperm v4, v4, v5, v10
vperm v6, v6, v7, v10
stvx v0, 0, r7
addi r7, r7, 16
stvx v2, 0, r7
addi r7, r7, 16
stvx v4, 0, r7
addi r7, r7, 16
stvx v6, 0, r7
exit_8x8:
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
;# Filters a horizontal line
;# expects:
;# r3 src_ptr
;# r4 pitch
;# r10 16
;# r12 32
;# v17 perm input
;# v18 rounding
;# v19 shift
;# v20 filter taps
;# v21 tmp
;# v22 tmp
;# v23 tmp
;# v24 tmp
;# v25 tmp
;# v26 tmp
;# v27 tmp
;# v28 perm output
;#
.macro hfilter_16 V, increment_counter
lvsl v17, 0, r3 ;# permutate value for alignment
;# input to filter is 21 bytes wide, output is 16 bytes.
;# input can span three vectors if not aligned correctly.
lvx v21, 0, r3
lvx v22, r10, r3
lvx v23, r12, r3
.if \increment_counter
add r3, r3, r4
.endif
vperm v21, v21, v22, v17
vperm v22, v22, v23, v17 ;# v8 v9 = 21 input pixels left-justified
;# set 0
vmsummbm v24, v20, v21, v18 ;# taps times elements
;# set 1
vsldoi v23, v21, v22, 1
vmsummbm v25, v20, v23, v18
;# set 2
vsldoi v23, v21, v22, 2
vmsummbm v26, v20, v23, v18
;# set 3
vsldoi v23, v21, v22, 3
vmsummbm v27, v20, v23, v18
vpkswus v24, v24, v25 ;# v24 = 0 4 8 C 1 5 9 D (16-bit)
vpkswus v25, v26, v27 ;# v25 = 2 6 A E 3 7 B F
vsrh v24, v24, v19 ;# divide v0, v1 by 128
vsrh v25, v25, v19
vpkuhus \V, v24, v25 ;# \V = scrambled 8-bit result
vperm \V, \V, v0, v28 ;# \V = correctly-ordered result
.endm
.macro load_and_align_16 V, increment_counter
lvsl v17, 0, r3 ;# permutate value for alignment
;# input to filter is 21 bytes wide, output is 16 bytes.
;# input can span three vectors if not aligned correctly.
lvx v21, 0, r3
lvx v22, r10, r3
.if \increment_counter
add r3, r3, r4
.endif
vperm \V, v21, v22, v17
.endm
.macro write_16 V, increment_counter
stvx \V, 0, r7
.if \increment_counter
add r7, r7, r8
.endif
.endm
.align 2
;# r3 unsigned char * src
;# r4 int src_pitch
;# r5 int x_offset
;# r6 int y_offset
;# r7 unsigned char * dst
;# r8 int dst_pitch
bilinear_predict16x16_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xffff
ori r12, r12, 0xfff8
mtspr 256, r12 ;# set VRSAVE
HProlog second_pass_16x16_pre_copy_b
hfilter_16 v0, 1
hfilter_16 v1, 1
hfilter_16 v2, 1
hfilter_16 v3, 1
hfilter_16 v4, 1
hfilter_16 v5, 1
hfilter_16 v6, 1
hfilter_16 v7, 1
hfilter_16 v8, 1
hfilter_16 v9, 1
hfilter_16 v10, 1
hfilter_16 v11, 1
hfilter_16 v12, 1
hfilter_16 v13, 1
hfilter_16 v14, 1
hfilter_16 v15, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq store_out_16x16_b
hfilter_16 v16, 0
b second_pass_16x16_b
second_pass_16x16_pre_copy_b:
slwi r6, r6, 5 ;# index into vertical filter array
load_and_align_16 v0, 1
load_and_align_16 v1, 1
load_and_align_16 v2, 1
load_and_align_16 v3, 1
load_and_align_16 v4, 1
load_and_align_16 v5, 1
load_and_align_16 v6, 1
load_and_align_16 v7, 1
load_and_align_16 v8, 1
load_and_align_16 v9, 1
load_and_align_16 v10, 1
load_and_align_16 v11, 1
load_and_align_16 v12, 1
load_and_align_16 v13, 1
load_and_align_16 v14, 1
load_and_align_16 v15, 1
load_and_align_16 v16, 0
second_pass_16x16_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
vfilter_16 v4, v5
vfilter_16 v5, v6
vfilter_16 v6, v7
vfilter_16 v7, v8
vfilter_16 v8, v9
vfilter_16 v9, v10
vfilter_16 v10, v11
vfilter_16 v11, v12
vfilter_16 v12, v13
vfilter_16 v13, v14
vfilter_16 v14, v15
vfilter_16 v15, v16
store_out_16x16_b:
write_16 v0, 1
write_16 v1, 1
write_16 v2, 1
write_16 v3, 1
write_16 v4, 1
write_16 v5, 1
write_16 v6, 1
write_16 v7, 1
write_16 v8, 1
write_16 v9, 1
write_16 v10, 1
write_16 v11, 1
write_16 v12, 1
write_16 v13, 1
write_16 v14, 1
write_16 v15, 0
mtspr 256, r11 ;# reset old VRSAVE
blr
.data
.align 4
hfilter_b:
.byte 128, 0, 0, 0,128, 0, 0, 0,128, 0, 0, 0,128, 0, 0, 0
.byte 112, 16, 0, 0,112, 16, 0, 0,112, 16, 0, 0,112, 16, 0, 0
.byte 96, 32, 0, 0, 96, 32, 0, 0, 96, 32, 0, 0, 96, 32, 0, 0
.byte 80, 48, 0, 0, 80, 48, 0, 0, 80, 48, 0, 0, 80, 48, 0, 0
.byte 64, 64, 0, 0, 64, 64, 0, 0, 64, 64, 0, 0, 64, 64, 0, 0
.byte 48, 80, 0, 0, 48, 80, 0, 0, 48, 80, 0, 0, 48, 80, 0, 0
.byte 32, 96, 0, 0, 32, 96, 0, 0, 32, 96, 0, 0, 32, 96, 0, 0
.byte 16,112, 0, 0, 16,112, 0, 0, 16,112, 0, 0, 16,112, 0, 0
.align 4
vfilter_b:
.byte 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112
.byte 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
.byte 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96
.byte 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32
.byte 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80
.byte 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48
.byte 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
.byte 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
.byte 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48
.byte 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80
.byte 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32
.byte 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96
.byte 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
.byte 112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112
.align 4
b_hperm_b:
.byte 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
.align 4
b_0123_b:
.byte 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
.align 4
b_4567_b:
.byte 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10
b_hilo_b:
.byte 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23
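The hfilter_b/vfilter_b tables above encode the two-tap kernels (128 - 16k, 16k) in Q7 for each eighth-pel offset k. A C sketch of the separable two-pass bilinear prediction the assembly implements (names here are illustrative; cf. the generic vp8_bilinear_predict*_c routines):

/* Sketch: two-tap bilinear prediction, horizontal pass then vertical,
 * with Q7 taps, +64 rounding, and >>7 normalization as above. */
static unsigned char filt2(unsigned char a, unsigned char b, int t0, int t1) {
  return (unsigned char)((a * t0 + b * t1 + 64) >> 7);
}

static void bilinear_predict(const unsigned char *src, int src_stride,
                             int xoffset, int yoffset,  /* 0..7, eighth pel */
                             unsigned char *dst, int dst_stride,
                             int w, int h) {
  unsigned char tmp[17 * 16];  /* first pass emits h+1 rows for the second */
  const int hx0 = 128 - xoffset * 16, hx1 = xoffset * 16;
  const int vy0 = 128 - yoffset * 16, vy1 = yoffset * 16;
  int r, c;
  for (r = 0; r < h + 1; ++r)
    for (c = 0; c < w; ++c)
      tmp[r * w + c] = filt2(src[r * src_stride + c],
                             src[r * src_stride + c + 1], hx0, hx1);
  for (r = 0; r < h; ++r)
    for (c = 0; c < w; ++c)
      dst[r * dst_stride + c] = filt2(tmp[r * w + c],
                                      tmp[(r + 1) * w + c], vy0, vy1);
}

With offset 0 the taps collapse to (128, 0) and filt2 returns its first input unchanged, which is why the assembly can skip a pass entirely when there is no filtering to do in that direction.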

View File

@@ -0,0 +1,189 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl short_idct4x4llm_ppc
.macro load_c V, LABEL, OFF, R0, R1
lis \R0, \LABEL@ha
la \R1, \LABEL@l(\R0)
lvx \V, \OFF, \R1
.endm
;# r3 short *input
;# r4 short *output
;# r5 int pitch
.align 2
short_idct4x4llm_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xfff8
mtspr 256, r12 ;# set VRSAVE
load_c v8, sinpi8sqrt2, 0, r9, r10
load_c v9, cospi8sqrt2minus1, 0, r9, r10
load_c v10, hi_hi, 0, r9, r10
load_c v11, lo_lo, 0, r9, r10
load_c v12, shift_16, 0, r9, r10
li r10, 16
lvx v0, 0, r3 ;# input ip[0], ip[ 4]
lvx v1, r10, r3 ;# input ip[8], ip[12]
;# first pass
vupkhsh v2, v0
vupkhsh v3, v1
vaddsws v6, v2, v3 ;# a1 = ip[0]+ip[8]
vsubsws v7, v2, v3 ;# b1 = ip[0]-ip[8]
vupklsh v0, v0
vmulosh v4, v0, v8
vsraw v4, v4, v12
vaddsws v4, v4, v0 ;# ip[ 4] * sin(pi/8) * sqrt(2)
vupklsh v1, v1
vmulosh v5, v1, v9
vsraw v5, v5, v12 ;# ip[12] * cos(pi/8) * sqrt(2)
vaddsws v5, v5, v1
vsubsws v4, v4, v5 ;# c1
vmulosh v3, v1, v8
vsraw v3, v3, v12
vaddsws v3, v3, v1 ;# ip[12] * sin(pi/8) * sqrt(2)
vmulosh v5, v0, v9
vsraw v5, v5, v12 ;# ip[ 4] * cos(pi/8) * sqrt(2)
vaddsws v5, v5, v0
vaddsws v3, v3, v5 ;# d1
vaddsws v0, v6, v3 ;# a1 + d1
vsubsws v3, v6, v3 ;# a1 - d1
vaddsws v1, v7, v4 ;# b1 + c1
vsubsws v2, v7, v4 ;# b1 - c1
;# transpose input
vmrghw v4, v0, v1 ;# a0 b0 a1 b1
vmrghw v5, v2, v3 ;# c0 d0 c1 d1
vmrglw v6, v0, v1 ;# a2 b2 a3 b3
vmrglw v7, v2, v3 ;# c2 d2 c3 d3
vperm v0, v4, v5, v10 ;# a0 b0 c0 d0
vperm v1, v4, v5, v11 ;# a1 b1 c1 d1
vperm v2, v6, v7, v10 ;# a2 b2 c2 d2
vperm v3, v6, v7, v11 ;# a3 b3 c3 d3
;# second pass
vaddsws v6, v0, v2 ;# a1 = ip[0]+ip[8]
vsubsws v7, v0, v2 ;# b1 = ip[0]-ip[8]
vmulosh v4, v1, v8
vsraw v4, v4, v12
vaddsws v4, v4, v1 ;# ip[ 4] * sin(pi/8) * sqrt(2)
vmulosh v5, v3, v9
vsraw v5, v5, v12 ;# ip[12] * cos(pi/8) * sqrt(2)
vaddsws v5, v5, v3
vsubsws v4, v4, v5 ;# c1
vmulosh v2, v3, v8
vsraw v2, v2, v12
vaddsws v2, v2, v3 ;# ip[12] * sin(pi/8) * sqrt(2)
vmulosh v5, v1, v9
vsraw v5, v5, v12 ;# ip[ 4] * cos(pi/8) * sqrt(2)
vaddsws v5, v5, v1
vaddsws v3, v2, v5 ;# d1
vaddsws v0, v6, v3 ;# a1 + d1
vsubsws v3, v6, v3 ;# a1 - d1
vaddsws v1, v7, v4 ;# b1 + c1
vsubsws v2, v7, v4 ;# b1 - c1
vspltish v6, 4
vspltish v7, 3
vpkswss v0, v0, v1
vpkswss v1, v2, v3
vaddshs v0, v0, v6
vaddshs v1, v1, v6
vsrah v0, v0, v7
vsrah v1, v1, v7
;# transpose output
vmrghh v2, v0, v1 ;# a0 c0 a1 c1 a2 c2 a3 c3
vmrglh v3, v0, v1 ;# b0 d0 b1 d1 b2 d2 b3 d3
vmrghh v0, v2, v3 ;# a0 b0 c0 d0 a1 b1 c1 d1
vmrglh v1, v2, v3 ;# a2 b2 c2 d2 a3 b3 c3 d3
stwu r1,-416(r1) ;# create space on the stack
stvx v0, 0, r1
lwz r6, 0(r1)
stw r6, 0(r4)
lwz r6, 4(r1)
stw r6, 4(r4)
add r4, r4, r5
lwz r6, 8(r1)
stw r6, 0(r4)
lwz r6, 12(r1)
stw r6, 4(r4)
add r4, r4, r5
stvx v1, 0, r1
lwz r6, 0(r1)
stw r6, 0(r4)
lwz r6, 4(r1)
stw r6, 4(r4)
add r4, r4, r5
lwz r6, 8(r1)
stw r6, 0(r4)
lwz r6, 12(r1)
stw r6, 4(r4)
addi r1, r1, 416 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
.align 4
sinpi8sqrt2:
.short 35468, 35468, 35468, 35468, 35468, 35468, 35468, 35468
.align 4
cospi8sqrt2minus1:
.short 20091, 20091, 20091, 20091, 20091, 20091, 20091, 20091
.align 4
shift_16:
.long 16, 16, 16, 16
.align 4
hi_hi:
.byte 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23
.align 4
lo_lo:
.byte 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31
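The constant tables hold the VP8 transform multipliers in Q16: sinpi8sqrt2 = round(sqrt(2) * sin(pi/8) * 65536) = round(0.541196 * 65536) = 35468, and cospi8sqrt2minus1 = round((sqrt(2) * cos(pi/8) - 1) * 65536) = round(0.306563 * 65536) = 20091. The "- 1" in the cosine constant is why each vmulosh/vsraw pair above (a Q16 multiply via shift_16) is followed by a vaddsws that adds the original value back in.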

View File

@@ -0,0 +1,135 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "loopfilter.h"
#include "onyxc_int.h"
typedef void loop_filter_function_y_ppc
(
unsigned char *s, // source pointer
int p, // pitch
const signed char *flimit,
const signed char *limit,
const signed char *thresh
);
typedef void loop_filter_function_uv_ppc
(
unsigned char *u, // source pointer
unsigned char *v, // source pointer
int p, // pitch
const signed char *flimit,
const signed char *limit,
const signed char *thresh
);
typedef void loop_filter_function_s_ppc
(
unsigned char *s, // source pointer
int p, // pitch
const signed char *flimit
);
loop_filter_function_y_ppc mbloop_filter_horizontal_edge_y_ppc;
loop_filter_function_y_ppc mbloop_filter_vertical_edge_y_ppc;
loop_filter_function_y_ppc loop_filter_horizontal_edge_y_ppc;
loop_filter_function_y_ppc loop_filter_vertical_edge_y_ppc;
loop_filter_function_uv_ppc mbloop_filter_horizontal_edge_uv_ppc;
loop_filter_function_uv_ppc mbloop_filter_vertical_edge_uv_ppc;
loop_filter_function_uv_ppc loop_filter_horizontal_edge_uv_ppc;
loop_filter_function_uv_ppc loop_filter_vertical_edge_uv_ppc;
loop_filter_function_s_ppc loop_filter_simple_horizontal_edge_ppc;
loop_filter_function_s_ppc loop_filter_simple_vertical_edge_ppc;
// Horizontal MB filtering
void loop_filter_mbh_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi)
{
mbloop_filter_horizontal_edge_y_ppc(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr);
if (u_ptr)
mbloop_filter_horizontal_edge_uv_ppc(u_ptr, v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr);
}
void loop_filter_mbhs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)u_ptr;
(void)v_ptr;
(void)uv_stride;
loop_filter_simple_horizontal_edge_ppc(y_ptr, y_stride, lfi->mbflim);
}
// Vertical MB Filtering
void loop_filter_mbv_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi)
{
mbloop_filter_vertical_edge_y_ppc(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr);
if (u_ptr)
mbloop_filter_vertical_edge_uv_ppc(u_ptr, v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr);
}
void loop_filter_mbvs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)u_ptr;
(void)v_ptr;
(void)uv_stride;
loop_filter_simple_vertical_edge_ppc(y_ptr, y_stride, lfi->mbflim);
}
// Horizontal B Filtering
void loop_filter_bh_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi)
{
// These should all be done at once with one call, instead of 3
loop_filter_horizontal_edge_y_ppc(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr);
loop_filter_horizontal_edge_y_ppc(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr);
loop_filter_horizontal_edge_y_ppc(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr);
if (u_ptr)
loop_filter_horizontal_edge_uv_ppc(u_ptr + 4 * uv_stride, v_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr);
}
void loop_filter_bhs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)u_ptr;
(void)v_ptr;
(void)uv_stride;
loop_filter_simple_horizontal_edge_ppc(y_ptr + 4 * y_stride, y_stride, lfi->flim);
loop_filter_simple_horizontal_edge_ppc(y_ptr + 8 * y_stride, y_stride, lfi->flim);
loop_filter_simple_horizontal_edge_ppc(y_ptr + 12 * y_stride, y_stride, lfi->flim);
}
// Vertical B Filtering
void loop_filter_bv_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi)
{
loop_filter_vertical_edge_y_ppc(y_ptr, y_stride, lfi->flim, lfi->lim, lfi->thr);
if (u_ptr)
loop_filter_vertical_edge_uv_ppc(u_ptr + 4, v_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr);
}
void loop_filter_bvs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)u_ptr;
(void)v_ptr;
(void)uv_stride;
loop_filter_simple_vertical_edge_ppc(y_ptr + 4, y_stride, lfi->flim);
loop_filter_simple_vertical_edge_ppc(y_ptr + 8, y_stride, lfi->flim);
loop_filter_simple_vertical_edge_ppc(y_ptr + 12, y_stride, lfi->flim);
}

File diff suppressed because it is too large

View File

@@ -0,0 +1,59 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl save_platform_context
.globl restore_platform_context
.macro W V P
stvx \V, 0, \P
addi \P, \P, 16
.endm
.macro R V P
lvx \V, 0, \P
addi \P, \P, 16
.endm
;# r3 context_ptr
.align 2
save_platform_context:
W v20, r3
W v21, r3
W v22, r3
W v23, r3
W v24, r3
W v25, r3
W v26, r3
W v27, r3
W v28, r3
W v29, r3
W v30, r3
W v31, r3
blr
;# r3 context_ptr
.align 2
restore_platform_context:
R v20, r3
R v21, r3
R v22, r3
R v23, r3
R v24, r3
R v25, r3
R v26, r3
R v27, r3
R v28, r3
R v29, r3
R v30, r3
R v31, r3
blr

View File

@@ -0,0 +1,175 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl recon4b_ppc
.globl recon2b_ppc
.globl recon_b_ppc
.macro row_of16 Diff Pred Dst Stride
lvx v1, 0, \Pred ;# v1 = pred = p0..p15
addi \Pred, \Pred, 16 ;# next pred
vmrghb v2, v0, v1 ;# v2 = 16-bit p0..p7
lvx v3, 0, \Diff ;# v3 = d0..d7
vaddshs v2, v2, v3 ;# v2 = r0..r7
vmrglb v1, v0, v1 ;# v1 = 16-bit p8..p15
lvx v3, r8, \Diff ;# v3 = d8..d15
addi \Diff, \Diff, 32 ;# next diff
vaddshs v3, v3, v1 ;# v3 = r8..r15
vpkshus v2, v2, v3 ;# v2 = 8-bit r0..r15
stvx v2, 0, \Dst ;# to dst
add \Dst, \Dst, \Stride ;# next dst
.endm
.text
.align 2
;# r3 = short *diff_ptr,
;# r4 = unsigned char *pred_ptr,
;# r5 = unsigned char *dst_ptr,
;# r6 = int stride
recon4b_ppc:
mfspr r0, 256 ;# get old VRSAVE
stw r0, -8(r1) ;# save old VRSAVE to stack
oris r0, r0, 0xf000
mtspr 256,r0 ;# set VRSAVE
vxor v0, v0, v0
li r8, 16
row_of16 r3, r4, r5, r6
row_of16 r3, r4, r5, r6
row_of16 r3, r4, r5, r6
row_of16 r3, r4, r5, r6
lwz r12, -8(r1) ;# restore old VRSAVE from stack
mtspr 256, r12 ;# reset old VRSAVE
blr
.macro two_rows_of8 Diff Pred Dst Stride write_first_four_pels
lvx v1, 0, \Pred ;# v1 = pred = p0..p15
vmrghb v2, v0, v1 ;# v2 = 16-bit p0..p7
lvx v3, 0, \Diff ;# v3 = d0..d7
vaddshs v2, v2, v3 ;# v2 = r0..r7
vmrglb v1, v0, v1 ;# v1 = 16-bit p8..p15
lvx v3, r8, \Diff ;# v2 = d8..d15
vaddshs v3, v3, v1 ;# v3 = r8..r15
vpkshus v2, v2, v3 ;# v3 = 8-bit r0..r15
stvx v2, 0, r10 ;# 2 rows to dst from buf
lwz r0, 0(r10)
.if \write_first_four_pels
stw r0, 0(\Dst)
.else
stwux r0, \Dst, \Stride
.endif
lwz r0, 4(r10)
stw r0, 4(\Dst)
lwz r0, 8(r10)
stwux r0, \Dst, \Stride ;# advance dst to next row
lwz r0, 12(r10)
stw r0, 4(\Dst)
.endm
.align 2
;# r3 = short *diff_ptr,
;# r4 = unsigned char *pred_ptr,
;# r5 = unsigned char *dst_ptr,
;# r6 = int stride
recon2b_ppc:
mfspr r0, 256 ;# get old VRSAVE
stw r0, -8(r1) ;# save old VRSAVE to stack
oris r0, r0, 0xf000
mtspr 256,r0 ;# set VRSAVE
vxor v0, v0, v0
li r8, 16
la r10, -48(r1) ;# buf
two_rows_of8 r3, r4, r5, r6, 1
addi r4, r4, 16; ;# next pred
addi r3, r3, 32; ;# next diff
two_rows_of8 r3, r4, r5, r6, 0
lwz r12, -8(r1) ;# restore old VRSAVE from stack
mtspr 256, r12 ;# reset old VRSAVE
blr
.macro get_two_diff_rows
stw r0, 0(r10)
lwz r0, 4(r3)
stw r0, 4(r10)
lwzu r0, 32(r3)
stw r0, 8(r10)
lwz r0, 4(r3)
stw r0, 12(r10)
lvx v3, 0, r10
.endm
.align 2
;# r3 = short *diff_ptr,
;# r4 = unsigned char *pred_ptr,
;# r5 = unsigned char *dst_ptr,
;# r6 = int stride
recon_b_ppc:
mfspr r0, 256 ;# get old VRSAVE
stw r0, -8(r1) ;# save old VRSAVE to stack
oris r0, r0, 0xf000
mtspr 256,r0 ;# set VRSAVE
vxor v0, v0, v0
la r10, -48(r1) ;# buf
lwz r0, 0(r4)
stw r0, 0(r10)
lwz r0, 16(r4)
stw r0, 4(r10)
lwz r0, 32(r4)
stw r0, 8(r10)
lwz r0, 48(r4)
stw r0, 12(r10)
lvx v1, 0, r10; ;# v1 = pred = p0..p15
lwz r0, 0(r3) ;# v3 = d0..d7
get_two_diff_rows
vmrghb v2, v0, v1; ;# v2 = 16-bit p0..p7
vaddshs v2, v2, v3; ;# v2 = r0..r7
lwzu r0, 32(r3) ;# v3 = d8..d15
get_two_diff_rows
vmrglb v1, v0, v1; ;# v1 = 16-bit p8..p15
vaddshs v3, v3, v1; ;# v3 = r8..r15
vpkshus v2, v2, v3; ;# v2 = 8-bit r0..r15
stvx v2, 0, r10; ;# 16 pels to dst from buf
lwz r0, 0(r10)
stw r0, 0(r5)
lwz r0, 4(r10)
stwux r0, r5, r6
lwz r0, 8(r10)
stwux r0, r5, r6
lwz r0, 12(r10)
stwx r0, r5, r6
lwz r12, -8(r1) ;# restore old VRSAVE from stack
mtspr 256, r12 ;# reset old VRSAVE
blr
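A C sketch of the reconstruction these routines perform: prediction and residual are summed and saturated to 8 bits, which vaddshs/vpkshus compute sixteen pixels at a time above.

/* Sketch: dst = clamp(pred + diff, 0, 255) across one row. */
static void recon_row(const short *diff, const unsigned char *pred,
                      unsigned char *dst, int n) {
  int i;
  for (i = 0; i < n; ++i) {
    int v = pred[i] + diff[i];
    dst[i] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
  }
}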

View File

@@ -0,0 +1,277 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl vp8_sad16x16_ppc
.globl vp8_sad16x8_ppc
.globl vp8_sad8x16_ppc
.globl vp8_sad8x8_ppc
.globl vp8_sad4x4_ppc
.macro load_aligned_16 V R O
lvsl v3, 0, \R ;# permutate value for alignment
lvx v1, 0, \R
lvx v2, \O, \R
vperm \V, v1, v2, v3
.endm
.macro prologue
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xffc0
mtspr 256, r12 ;# set VRSAVE
stwu r1, -32(r1) ;# create space on the stack
li r10, 16 ;# load offset and loop counter
vspltisw v8, 0 ;# zero out total to start
.endm
.macro epilogue
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
.endm
.macro SAD_16
;# v6 = abs (v4 - v5)
vsububs v6, v4, v5
vsububs v7, v5, v4
vor v6, v6, v7
;# v8 += abs (v4 - v5)
vsum4ubs v8, v6, v8
.endm
.macro sad_16_loop loop_label
lvsl v3, 0, r5 ;# only needs to be done once per block
;# preload a line of data before getting into the loop
lvx v4, 0, r3
lvx v1, 0, r5
lvx v2, r10, r5
add r5, r5, r6
add r3, r3, r4
vperm v5, v1, v2, v3
.align 4
\loop_label:
;# compute difference on first row
vsububs v6, v4, v5
vsububs v7, v5, v4
;# load up next set of data
lvx v9, 0, r3
lvx v1, 0, r5
lvx v2, r10, r5
;# perform abs() of difference
vor v6, v6, v7
add r3, r3, r4
;# add to the running tally
vsum4ubs v8, v6, v8
;# now onto the next line
vperm v5, v1, v2, v3
add r5, r5, r6
lvx v4, 0, r3
;# compute difference on second row
vsububs v6, v9, v5
lvx v1, 0, r5
vsububs v7, v5, v9
lvx v2, r10, r5
vor v6, v6, v7
add r3, r3, r4
vsum4ubs v8, v6, v8
vperm v5, v1, v2, v3
add r5, r5, r6
bdnz \loop_label
vspltisw v7, 0
vsumsws v8, v8, v7
stvx v8, 0, r1
lwz r3, 12(r1)
.endm
.macro sad_8_loop loop_label
.align 4
\loop_label:
;# only one of the inputs should need to be aligned.
load_aligned_16 v4, r3, r10
load_aligned_16 v5, r5, r10
;# move onto the next line
add r3, r3, r4
add r5, r5, r6
;# only one of the inputs should need to be aligned.
load_aligned_16 v6, r3, r10
load_aligned_16 v7, r5, r10
;# move onto the next line
add r3, r3, r4
add r5, r5, r6
vmrghb v4, v4, v6
vmrghb v5, v5, v7
SAD_16
bdnz \loop_label
vspltisw v7, 0
vsumsws v8, v8, v7
stvx v8, 0, r1
lwz r3, 12(r1)
.endm
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_stride
;# r5 unsigned char *ref_ptr
;# r6 int ref_stride
;#
;# r3 return value
vp8_sad16x16_ppc:
prologue
li r9, 8
mtctr r9
sad_16_loop sad16x16_loop
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_stride
;# r5 unsigned char *ref_ptr
;# r6 int ref_stride
;#
;# r3 return value
vp8_sad16x8_ppc:
prologue
li r9, 4
mtctr r9
sad_16_loop sad16x8_loop
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_stride
;# r5 unsigned char *ref_ptr
;# r6 int ref_stride
;#
;# r3 return value
vp8_sad8x16_ppc:
prologue
li r9, 8
mtctr r9
sad_8_loop sad8x16_loop
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_stride
;# r5 unsigned char *ref_ptr
;# r6 int ref_stride
;#
;# r3 return value
vp8_sad8x8_ppc:
prologue
li r9, 4
mtctr r9
sad_8_loop sad8x8_loop
epilogue
blr
.macro transfer_4x4 I P
lwz r0, 0(\I)
add \I, \I, \P
lwz r7, 0(\I)
add \I, \I, \P
lwz r8, 0(\I)
add \I, \I, \P
lwz r9, 0(\I)
stw r0, 0(r1)
stw r7, 4(r1)
stw r8, 8(r1)
stw r9, 12(r1)
.endm
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_stride
;# r5 unsigned char *ref_ptr
;# r6 int ref_stride
;#
;# r3 return value
vp8_sad4x4_ppc:
prologue
transfer_4x4 r3, r4
lvx v4, 0, r1
transfer_4x4 r5, r6
lvx v5, 0, r1
vspltisw v8, 0 ;# zero out total to start
;# v6 = abs (v4 - v5)
vsububs v6, v4, v5
vsububs v7, v5, v4
vor v6, v6, v7
;# v8 += abs (v4 - v5)
vsum4ubs v7, v6, v8
vsumsws v7, v7, v8
stvx v7, 0, r1
lwz r3, 12(r1)
epilogue
blr


@@ -0,0 +1,165 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "subpixel.h"
#include "loopfilter.h"
#include "recon.h"
#include "onyxc_int.h"
extern void (*vp8_post_proc_down_and_across_mb_row)(
unsigned char *src_ptr,
unsigned char *dst_ptr,
int src_pixels_per_line,
int dst_pixels_per_line,
int cols,
unsigned char *f,
int size
);
extern void (*vp8_mbpost_proc_down)(unsigned char *dst, int pitch, int rows, int cols, int flimit);
extern void vp8_mbpost_proc_down_c(unsigned char *dst, int pitch, int rows, int cols, int flimit);
extern void (*vp8_mbpost_proc_across_ip)(unsigned char *src, int pitch, int rows, int cols, int flimit);
extern void vp8_mbpost_proc_across_ip_c(unsigned char *src, int pitch, int rows, int cols, int flimit);
extern void vp8_post_proc_down_and_across_mb_row_c
(
unsigned char *src_ptr,
unsigned char *dst_ptr,
int src_pixels_per_line,
int dst_pixels_per_line,
int cols,
unsigned char *f,
int size
);
void vp8_plane_add_noise_c(unsigned char *Start, unsigned int Width, unsigned int Height, int Pitch, int q, int a);
extern copy_mem_block_function *vp8_copy_mem16x16;
extern copy_mem_block_function *vp8_copy_mem8x8;
extern copy_mem_block_function *vp8_copy_mem8x4;
// PPC
extern subpixel_predict_function sixtap_predict_ppc;
extern subpixel_predict_function sixtap_predict8x4_ppc;
extern subpixel_predict_function sixtap_predict8x8_ppc;
extern subpixel_predict_function sixtap_predict16x16_ppc;
extern subpixel_predict_function bilinear_predict4x4_ppc;
extern subpixel_predict_function bilinear_predict8x4_ppc;
extern subpixel_predict_function bilinear_predict8x8_ppc;
extern subpixel_predict_function bilinear_predict16x16_ppc;
extern copy_mem_block_function copy_mem16x16_ppc;
void recon_b_ppc(short *diff_ptr, unsigned char *pred_ptr, unsigned char *dst_ptr, int stride);
void recon2b_ppc(short *diff_ptr, unsigned char *pred_ptr, unsigned char *dst_ptr, int stride);
void recon4b_ppc(short *diff_ptr, unsigned char *pred_ptr, unsigned char *dst_ptr, int stride);
extern void short_idct4x4llm_ppc(short *input, short *output, int pitch);
// Generic C
extern subpixel_predict_function vp8_sixtap_predict_c;
extern subpixel_predict_function vp8_sixtap_predict8x4_c;
extern subpixel_predict_function vp8_sixtap_predict8x8_c;
extern subpixel_predict_function vp8_sixtap_predict16x16_c;
extern subpixel_predict_function vp8_bilinear_predict4x4_c;
extern subpixel_predict_function vp8_bilinear_predict8x4_c;
extern subpixel_predict_function vp8_bilinear_predict8x8_c;
extern subpixel_predict_function vp8_bilinear_predict16x16_c;
extern copy_mem_block_function vp8_copy_mem16x16_c;
extern copy_mem_block_function vp8_copy_mem8x8_c;
extern copy_mem_block_function vp8_copy_mem8x4_c;
void vp8_recon_b_c(short *diff_ptr, unsigned char *pred_ptr, unsigned char *dst_ptr, int stride);
void vp8_recon2b_c(short *diff_ptr, unsigned char *pred_ptr, unsigned char *dst_ptr, int stride);
void vp8_recon4b_c(short *diff_ptr, unsigned char *pred_ptr, unsigned char *dst_ptr, int stride);
extern void vp8_short_idct4x4llm_1_c(short *input, short *output, int pitch);
extern void vp8_short_idct4x4llm_c(short *input, short *output, int pitch);
extern void vp8_dc_only_idct_c(short input_dc, short *output, int pitch);
// PPC
extern loop_filter_block_function loop_filter_mbv_ppc;
extern loop_filter_block_function loop_filter_bv_ppc;
extern loop_filter_block_function loop_filter_mbh_ppc;
extern loop_filter_block_function loop_filter_bh_ppc;
extern loop_filter_block_function loop_filter_mbvs_ppc;
extern loop_filter_block_function loop_filter_bvs_ppc;
extern loop_filter_block_function loop_filter_mbhs_ppc;
extern loop_filter_block_function loop_filter_bhs_ppc;
// Generic C
extern loop_filter_block_function vp8_loop_filter_mbv_c;
extern loop_filter_block_function vp8_loop_filter_bv_c;
extern loop_filter_block_function vp8_loop_filter_mbh_c;
extern loop_filter_block_function vp8_loop_filter_bh_c;
extern loop_filter_block_function vp8_loop_filter_mbvs_c;
extern loop_filter_block_function vp8_loop_filter_bvs_c;
extern loop_filter_block_function vp8_loop_filter_mbhs_c;
extern loop_filter_block_function vp8_loop_filter_bhs_c;
extern loop_filter_block_function *vp8_lf_mbvfull;
extern loop_filter_block_function *vp8_lf_mbhfull;
extern loop_filter_block_function *vp8_lf_bvfull;
extern loop_filter_block_function *vp8_lf_bhfull;
extern loop_filter_block_function *vp8_lf_mbvsimple;
extern loop_filter_block_function *vp8_lf_mbhsimple;
extern loop_filter_block_function *vp8_lf_bvsimple;
extern loop_filter_block_function *vp8_lf_bhsimple;
void vp8_clear_c(void)
{
}
void vp8_machine_specific_config(void)
{
// Pure C:
vp8_clear_system_state = vp8_clear_c;
vp8_recon_b = vp8_recon_b_c;
vp8_recon4b = vp8_recon4b_c;
vp8_recon2b = vp8_recon2b_c;
vp8_bilinear_predict16x16 = bilinear_predict16x16_ppc;
vp8_bilinear_predict8x8 = bilinear_predict8x8_ppc;
vp8_bilinear_predict8x4 = bilinear_predict8x4_ppc;
vp8_bilinear_predict = bilinear_predict4x4_ppc;
vp8_sixtap_predict16x16 = sixtap_predict16x16_ppc;
vp8_sixtap_predict8x8 = sixtap_predict8x8_ppc;
vp8_sixtap_predict8x4 = sixtap_predict8x4_ppc;
vp8_sixtap_predict = sixtap_predict_ppc;
vp8_short_idct4x4_1 = vp8_short_idct4x4llm_1_c;
vp8_short_idct4x4 = short_idct4x4llm_ppc;
vp8_dc_only_idct = vp8_dc_only_idct_c;
vp8_lf_mbvfull = loop_filter_mbv_ppc;
vp8_lf_bvfull = loop_filter_bv_ppc;
vp8_lf_mbhfull = loop_filter_mbh_ppc;
vp8_lf_bhfull = loop_filter_bh_ppc;
vp8_lf_mbvsimple = loop_filter_mbvs_ppc;
vp8_lf_bvsimple = loop_filter_bvs_ppc;
vp8_lf_mbhsimple = loop_filter_mbhs_ppc;
vp8_lf_bhsimple = loop_filter_bhs_ppc;
vp8_post_proc_down_and_across_mb_row = vp8_post_proc_down_and_across_mb_row_c;
vp8_mbpost_proc_down = vp8_mbpost_proc_down_c;
vp8_mbpost_proc_across_ip = vp8_mbpost_proc_across_ip_c;
vp8_plane_add_noise = vp8_plane_add_noise_c;
vp8_copy_mem16x16 = copy_mem16x16_ppc;
vp8_copy_mem8x8 = vp8_copy_mem8x8_c;
vp8_copy_mem8x4 = vp8_copy_mem8x4_c;
}
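A minimal sketch of how this table is meant to be consumed: call vp8_machine_specific_config() once at startup, then run codec work through the pointers it filled in. The parameter list assumed for the sub-pixel predictor below is inferred from the r3..r8 register comments on the PPC routines and the subpixel_predict_function typedef from subpixel.h, not a declaration taken from this file:

extern void vp8_machine_specific_config(void);
extern subpixel_predict_function *vp8_sixtap_predict16x16;

static void predict_example(unsigned char *src, int src_stride,
                            unsigned char *dst, int dst_stride)
{
    vp8_machine_specific_config(); /* route each hook to PPC or C code once */
    /* assumed argument order: src, src stride, xoffset, yoffset, dst, stride */
    vp8_sixtap_predict16x16(src, src_stride, 2, 3, dst, dst_stride);
}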


@@ -0,0 +1,375 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl vp8_get8x8var_ppc
.globl vp8_get16x16var_ppc
.globl vp8_mse16x16_ppc
.globl vp8_variance16x16_ppc
.globl vp8_variance16x8_ppc
.globl vp8_variance8x16_ppc
.globl vp8_variance8x8_ppc
.globl vp8_variance4x4_ppc
.macro load_aligned_16 V R O
lvsl v3, 0, \R ;# permute vector for alignment
lvx v1, 0, \R
lvx v2, \O, \R
vperm \V, v1, v2, v3
.endm
.macro prologue
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xffc0
mtspr 256, r12 ;# set VRSAVE
stwu r1, -32(r1) ;# create space on the stack
li r10, 16 ;# load offset and loop counter
vspltisw v7, 0 ;# zero for merging
vspltisw v8, 0 ;# zero out total to start
vspltisw v9, 0 ;# zero out total for dif^2
.endm
.macro epilogue
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
.endm
.macro compute_sum_sse
;# Compute sum first. Unpack so that a signed
;# subtract can be used; only a halfword signed
;# subtract is available. Do high, then low.
vmrghb v2, v7, v4
vmrghb v3, v7, v5
vsubshs v2, v2, v3
vsum4shs v8, v2, v8
vmrglb v2, v7, v4
vmrglb v3, v7, v5
vsubshs v2, v2, v3
vsum4shs v8, v2, v8
;# Now compute sse.
vsububs v2, v4, v5
vsububs v3, v5, v4
vor v2, v2, v3
vmsumubm v9, v2, v2, v9
.endm
.macro variance_16 DS loop_label store_sum
\loop_label:
;# only one of the inputs should need to be aligned.
load_aligned_16 v4, r3, r10
load_aligned_16 v5, r5, r10
;# move onto the next line
add r3, r3, r4
add r5, r5, r6
compute_sum_sse
bdnz \loop_label
vsumsws v8, v8, v7
vsumsws v9, v9, v7
stvx v8, 0, r1
lwz r3, 12(r1)
stvx v9, 0, r1
lwz r4, 12(r1)
.if \store_sum
stw r3, 0(r8) ;# sum
.endif
stw r4, 0(r7) ;# sse
mullw r3, r3, r3 ;# sum*sum
srlwi r3, r3, \DS ;# (sum*sum) >> DS
subf r3, r3, r4 ;# sse - ((sum*sum) >> DS)
.endm
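variance_16 and variance_8 implement the identity var = E[d^2] - E[d]^2 in integer form: compute_sum_sse keeps a signed running sum of differences and a sum of squared differences, and the tail returns sse - (sum*sum >> DS), where 2^DS is the pixel count of the block (DS = 8 for 16x16, 7 for 16x8/8x16, 6 for 8x8, 4 for 4x4). A scalar sketch with illustrative names, using the same truncating 32-bit arithmetic as mullw/srlwi:

static unsigned int variance_ref(const unsigned char *src, int src_stride,
                                 const unsigned char *ref, int ref_stride,
                                 int w, int h, int shift, unsigned int *sse)
{
    int r, c, d;
    int sum = 0;
    unsigned int sq = 0;

    for (r = 0; r < h; r++, src += src_stride, ref += ref_stride)
        for (c = 0; c < w; c++)
        {
            d = src[c] - ref[c];
            sum += d;
            sq += (unsigned int)(d * d);
        }
    *sse = sq;
    return sq - (((unsigned int)sum * (unsigned int)sum) >> shift);
}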
.macro variance_8 DS loop_label store_sum
\loop_label:
;# only one of the inputs should need to be aligned.
load_aligned_16 v4, r3, r10
load_aligned_16 v5, r5, r10
;# move onto the next line
add r3, r3, r4
add r5, r5, r6
;# only one of the inputs should need to be aligned.
load_aligned_16 v6, r3, r10
load_aligned_16 v0, r5, r10
;# move onto the next line
add r3, r3, r4
add r5, r5, r6
vmrghb v4, v4, v6
vmrghb v5, v5, v0
compute_sum_sse
bdnz \loop_label
vsumsws v8, v8, v7
vsumsws v9, v9, v7
stvx v8, 0, r1
lwz r3, 12(r1)
stvx v9, 0, r1
lwz r4, 12(r1)
.if \store_sum
stw r3, 0(r8) ;# sum
.endif
stw r4, 0(r7) ;# sse
mullw r3, r3, r3 ;# sum*sum
srlwi r3, r3, \DS ;# (sum*sum) >> DS
subf r3, r3, r4 ;# sse - ((sum*sum) >> DS)
.endm
.align 2
;# r3 unsigned char *src_ptr
;# r4 int source_stride
;# r5 unsigned char *ref_ptr
;# r6 int recon_stride
;# r7 unsigned int *SSE
;# r8 int *Sum
;#
;# r3 return value
vp8_get8x8var_ppc:
prologue
li r9, 4
mtctr r9
variance_8 6, get8x8var_loop, 1
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int source_stride
;# r5 unsigned char *ref_ptr
;# r6 int recon_stride
;# r7 unsigned int *SSE
;# r8 int *Sum
;#
;# r3 return value
vp8_get16x16var_ppc:
prologue
mtctr r10
variance_16 8, get16x16var_loop, 1
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int source_stride
;# r5 unsigned char *ref_ptr
;# r6 int recon_stride
;# r7 unsigned int *sse
;#
;# r3 return value
vp8_mse16x16_ppc:
prologue
mtctr r10
mse16x16_loop:
;# only one of the inputs should need to be aligned.
load_aligned_16 v4, r3, r10
load_aligned_16 v5, r5, r10
;# move onto the next line
add r3, r3, r4
add r5, r5, r6
;# Now compute sse.
vsububs v2, v4, v5
vsububs v3, v5, v4
vor v2, v2, v3
vmsumubm v9, v2, v2, v9
bdnz mse16x16_loop
vsumsws v9, v9, v7
stvx v9, 0, r1
lwz r3, 12(r1)
stw r3, 0(r7) ;# sse
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int source_stride
;# r5 unsigned char *ref_ptr
;# r6 int recon_stride
;# r7 unsigned int *sse
;#
;# r3 return value
vp8_variance16x16_ppc:
prologue
mtctr r10
variance_16 8, variance16x16_loop, 0
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int source_stride
;# r5 unsigned char *ref_ptr
;# r6 int recon_stride
;# r7 unsigned int *sse
;#
;# r3 return value
vp8_variance16x8_ppc:
prologue
li r9, 8
mtctr r9
variance_16 7, variance16x8_loop, 0
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int source_stride
;# r5 unsigned char *ref_ptr
;# r6 int recon_stride
;# r7 unsigned int *sse
;#
;# r3 return value
vp8_variance8x16_ppc:
prologue
li r9, 8
mtctr r9
variance_8 7, variance8x16_loop, 0
epilogue
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int source_stride
;# r5 unsigned char *ref_ptr
;# r6 int recon_stride
;# r7 unsigned int *sse
;#
;# r3 return value
vp8_variance8x8_ppc:
prologue
li r9, 4
mtctr r9
variance_8 6, variance8x8_loop, 0
epilogue
blr
.macro transfer_4x4 I P
lwz r0, 0(\I)
add \I, \I, \P
lwz r10,0(\I)
add \I, \I, \P
lwz r8, 0(\I)
add \I, \I, \P
lwz r9, 0(\I)
stw r0, 0(r1)
stw r10, 4(r1)
stw r8, 8(r1)
stw r9, 12(r1)
.endm
.align 2
;# r3 unsigned char *src_ptr
;# r4 int source_stride
;# r5 unsigned char *ref_ptr
;# r6 int recon_stride
;# r7 unsigned int *sse
;#
;# r3 return value
vp8_variance4x4_ppc:
prologue
transfer_4x4 r3, r4
lvx v4, 0, r1
transfer_4x4 r5, r6
lvx v5, 0, r1
compute_sum_sse
vsumsws v8, v8, v7
vsumsws v9, v9, v7
stvx v8, 0, r1
lwz r3, 12(r1)
stvx v9, 0, r1
lwz r4, 12(r1)
stw r4, 0(r7) ;# sse
mullw r3, r3, r3 ;# sum*sum
srlwi r3, r3, 4 ;# (sum*sum) >> 4
subf r3, r3, r4 ;# sse - ((sum*sum) >> 4)
epilogue
blr


@@ -0,0 +1,865 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl vp8_sub_pixel_variance4x4_ppc
.globl vp8_sub_pixel_variance8x8_ppc
.globl vp8_sub_pixel_variance8x16_ppc
.globl vp8_sub_pixel_variance16x8_ppc
.globl vp8_sub_pixel_variance16x16_ppc
.macro load_c V, LABEL, OFF, R0, R1
lis \R0, \LABEL@ha
la \R1, \LABEL@l(\R0)
lvx \V, \OFF, \R1
.endm
.macro load_vfilter V0, V1
load_c \V0, vfilter_b, r6, r12, r10
addi r6, r6, 16
lvx \V1, r6, r10
.endm
.macro HProlog jump_label
;# load up horizontal filter
slwi. r5, r5, 4 ;# index into horizontal filter array
;# index to the next set of vectors in the row.
li r10, 16
;# downshift by 7 ( divide by 128 ) at the end
vspltish v19, 7
;# If there isn't any filtering to be done for the horizontal, then
;# just skip to the second pass.
beq \jump_label
load_c v20, hfilter_b, r5, r12, r0
;# setup constants
;# v28 = permute vector for reordering the output
load_c v28, b_hperm_b, 0, r12, r0
;# index to the next set of vectors in the row.
li r12, 32
;# rounding added in on the multiply
vspltisw v21, 8
vspltisw v18, 3
vslw v18, v21, v18 ;# 0x00000040000000400000004000000040
slwi. r6, r6, 5 ;# index into vertical filter array
.endm
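HProlog encodes the usual two-pass sub-pel structure as branches: the dot forms of slwi. scale each offset into a filter-table index and set cr0 at the same time, so a zero xoffset skips the horizontal pass and a zero yoffset skips the vertical one. Roughly, in C (every function here is an empty placeholder standing for a labelled code path in this file, not a real symbol):

static void first_pass_hfilter(void)   {} /* the hfilter_8/hfilter_16 rows     */
static void second_pass_vfilter(void)  {} /* the vfilter_16 rows               */
static void load_unfiltered_rows(void) {} /* the *_pre_copy_b paths            */
static void block_sum_sse(void)        {} /* compute_sum_sse* + variance_final */

static void subpel_variance_sketch(int xoffset, int yoffset)
{
    if (xoffset)
    {
        first_pass_hfilter();      /* filters one extra row when yoffset != 0 */
        if (yoffset)
            second_pass_vfilter();
    }
    else
    {
        load_unfiltered_rows();    /* plain copies of the source rows */
        if (yoffset)
            second_pass_vfilter();
    }
    block_sum_sse();
}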
;# Filters a horizontal line
;# expects:
;# r3 src_ptr
;# r4 pitch
;# r10 16
;# r12 32
;# v17 perm input
;# v18 rounding
;# v19 shift
;# v20 filter taps
;# v21 tmp
;# v22 tmp
;# v23 tmp
;# v24 tmp
;# v25 tmp
;# v26 tmp
;# v27 tmp
;# v28 perm output
;#
.macro hfilter_8 V, hp, lp, increment_counter
lvsl v17, 0, r3 ;# permute vector for alignment
;# input to filter is 9 bytes wide, output is 8 bytes.
lvx v21, 0, r3
lvx v22, r10, r3
.if \increment_counter
add r3, r3, r4
.endif
vperm v21, v21, v22, v17
vperm v24, v21, v21, \hp ;# v24 = 0123 1234 2345 3456
vperm v25, v21, v21, \lp ;# v25 = 4567 5678 6789 789A
vmsummbm v24, v20, v24, v18
vmsummbm v25, v20, v25, v18
vpkswus v24, v24, v25 ;# v24 = 0 4 8 C 1 5 9 D (16-bit)
vsrh v24, v24, v19 ;# divide by 128
vpkuhus \V, v24, v24 ;# \V = scrambled 8-bit result
.endm
.macro vfilter_16 P0 P1
vmuleub v22, \P0, v20 ;# 64 + 4 positive taps
vadduhm v22, v18, v22
vmuloub v23, \P0, v20
vadduhm v23, v18, v23
vmuleub v24, \P1, v21
vadduhm v22, v22, v24 ;# Re = evens, saturation unnecessary
vmuloub v25, \P1, v21
vadduhm v23, v23, v25 ;# Ro = odds
vsrh v22, v22, v19 ;# divide by 128
vsrh v23, v23, v19 ;# v22 = evens, v23 = odds
vmrghh \P0, v22, v23 ;# merge evens and odds back into order
vmrglh v23, v22, v23
vpkuhus \P0, \P0, v23 ;# P0 = 8-bit result
.endm
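Both passes apply the same first-order (bilinear) interpolation in Q7 fixed point: the tap tables at the end of the file hold pairs that sum to 128, v18 carries the rounding constant 64, and the shift by v19 (7) divides by 128. Per output pixel this is, in scalar form (frac is the 1/8-pel offset, matching the rows of hfilter_b and vfilter_b; illustrative only):

static unsigned char bilinear_pel_ref(unsigned char a, unsigned char b, int frac)
{
    int f1 = frac << 4;  /* second tap: 0, 16, 32, ..., 112 */
    int f0 = 128 - f1;   /* first tap: 128, 112, ..., 16    */
    return (unsigned char)((a * f0 + b * f1 + 64) >> 7);
}

Because the taps form a convex combination, the result already fits in a byte, so the saturating packs never actually clip here.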
.macro compute_sum_sse src, ref, sum, sse, t1, t2, z0
;# Compute sum first. Unpack so that a signed
;# subtract can be used; only a halfword signed
;# subtract is available. Do high, then low.
vmrghb \t1, \z0, \src
vmrghb \t2, \z0, \ref
vsubshs \t1, \t1, \t2
vsum4shs \sum, \t1, \sum
vmrglb \t1, \z0, \src
vmrglb \t2, \z0, \ref
vsubshs \t1, \t1, \t2
vsum4shs \sum, \t1, \sum
;# Now compute sse.
vsububs \t1, \src, \ref
vsububs \t2, \ref, \src
vor \t1, \t1, \t2
vmsumubm \sse, \t1, \t1, \sse
.endm
.macro variance_final sum, sse, z0, DS
vsumsws \sum, \sum, \z0
vsumsws \sse, \sse, \z0
stvx \sum, 0, r1
lwz r3, 12(r1)
stvx \sse, 0, r1
lwz r4, 12(r1)
stw r4, 0(r9) ;# sse
mullw r3, r3, r3 ;# sum*sum
srlwi r3, r3, \DS ;# (sum*sum) >> DS
subf r3, r3, r4 ;# sse - ((sum*sum) >> DS)
.endm
.macro compute_sum_sse_16 V, increment_counter
load_and_align_16 v16, r7, r8, \increment_counter
compute_sum_sse \V, v16, v18, v19, v20, v21, v23
.endm
.macro load_and_align_16 V, R, P, increment_counter
lvsl v17, 0, \R ;# permute vector for alignment
;# loads 16 bytes of input, which can span
;# two vectors if not aligned correctly.
lvx v21, 0, \R
lvx v22, r10, \R
.if \increment_counter
add \R, \R, \P
.endif
vperm \V, v21, v22, v17
.endm
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_pixels_per_line
;# r5 int xoffset
;# r6 int yoffset
;# r7 unsigned char *dst_ptr
;# r8 int dst_pixels_per_line
;# r9 unsigned int *sse
;#
;# r3 return value
vp8_sub_pixel_variance4x4_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xf830
ori r12, r12, 0xfff8
mtspr 256, r12 ;# set VRSAVE
stwu r1,-32(r1) ;# create space on the stack
HProlog second_pass_4x4_pre_copy_b
;# Load up permutation constants
load_c v10, b_0123_b, 0, r12, r0
load_c v11, b_4567_b, 0, r12, r0
hfilter_8 v0, v10, v11, 1
hfilter_8 v1, v10, v11, 1
hfilter_8 v2, v10, v11, 1
hfilter_8 v3, v10, v11, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq compute_sum_sse_4x4_b
hfilter_8 v4, v10, v11, 0
b second_pass_4x4_b
second_pass_4x4_pre_copy_b:
slwi r6, r6, 5 ;# index into vertical filter array
load_and_align_16 v0, r3, r4, 1
load_and_align_16 v1, r3, r4, 1
load_and_align_16 v2, r3, r4, 1
load_and_align_16 v3, r3, r4, 1
load_and_align_16 v4, r3, r4, 0
second_pass_4x4_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
compute_sum_sse_4x4_b:
vspltish v18, 0 ;# sum
vspltish v19, 0 ;# sse
vspltish v23, 0 ;# unpack
li r10, 16
load_and_align_16 v4, r7, r8, 1
load_and_align_16 v5, r7, r8, 1
load_and_align_16 v6, r7, r8, 1
load_and_align_16 v7, r7, r8, 1
vmrghb v0, v0, v1
vmrghb v1, v2, v3
vmrghb v2, v4, v5
vmrghb v3, v6, v7
load_c v10, b_hilo_b, 0, r12, r0
vperm v0, v0, v1, v10
vperm v1, v2, v3, v10
compute_sum_sse v0, v1, v18, v19, v20, v21, v23
variance_final v18, v19, v23, 4
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_pixels_per_line
;# r5 int xoffset
;# r6 int yoffset
;# r7 unsigned char *dst_ptr
;# r8 int dst_pixels_per_line
;# r9 unsigned int *sse
;#
;# r3 return value
vp8_sub_pixel_variance8x8_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xfff0
ori r12, r12, 0xffff
mtspr 256, r12 ;# set VRSAVE
stwu r1,-32(r1) ;# create space on the stack
HProlog second_pass_8x8_pre_copy_b
;# Load up permutation constants
load_c v10, b_0123_b, 0, r12, r0
load_c v11, b_4567_b, 0, r12, r0
hfilter_8 v0, v10, v11, 1
hfilter_8 v1, v10, v11, 1
hfilter_8 v2, v10, v11, 1
hfilter_8 v3, v10, v11, 1
hfilter_8 v4, v10, v11, 1
hfilter_8 v5, v10, v11, 1
hfilter_8 v6, v10, v11, 1
hfilter_8 v7, v10, v11, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq compute_sum_sse_8x8_b
hfilter_8 v8, v10, v11, 0
b second_pass_8x8_b
second_pass_8x8_pre_copy_b:
slwi. r6, r6, 5 ;# index into vertical filter array
load_and_align_16 v0, r3, r4, 1
load_and_align_16 v1, r3, r4, 1
load_and_align_16 v2, r3, r4, 1
load_and_align_16 v3, r3, r4, 1
load_and_align_16 v4, r3, r4, 1
load_and_align_16 v5, r3, r4, 1
load_and_align_16 v6, r3, r4, 1
load_and_align_16 v7, r3, r4, 1
load_and_align_16 v8, r3, r4, 0
beq compute_sum_sse_8x8_b
second_pass_8x8_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
vfilter_16 v4, v5
vfilter_16 v5, v6
vfilter_16 v6, v7
vfilter_16 v7, v8
compute_sum_sse_8x8_b:
vspltish v18, 0 ;# sum
vspltish v19, 0 ;# sse
vspltish v23, 0 ;# unpack
li r10, 16
vmrghb v0, v0, v1
vmrghb v1, v2, v3
vmrghb v2, v4, v5
vmrghb v3, v6, v7
load_and_align_16 v4, r7, r8, 1
load_and_align_16 v5, r7, r8, 1
load_and_align_16 v6, r7, r8, 1
load_and_align_16 v7, r7, r8, 1
load_and_align_16 v8, r7, r8, 1
load_and_align_16 v9, r7, r8, 1
load_and_align_16 v10, r7, r8, 1
load_and_align_16 v11, r7, r8, 0
vmrghb v4, v4, v5
vmrghb v5, v6, v7
vmrghb v6, v8, v9
vmrghb v7, v10, v11
compute_sum_sse v0, v4, v18, v19, v20, v21, v23
compute_sum_sse v1, v5, v18, v19, v20, v21, v23
compute_sum_sse v2, v6, v18, v19, v20, v21, v23
compute_sum_sse v3, v7, v18, v19, v20, v21, v23
variance_final v18, v19, v23, 6
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_pixels_per_line
;# r5 int xoffset
;# r6 int yoffset
;# r7 unsigned char *dst_ptr
;# r8 int dst_pixels_per_line
;# r9 unsigned int *sse
;#
;# r3 return value
vp8_sub_pixel_variance8x16_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xffff
ori r12, r12, 0xfffc
mtspr 256, r12 ;# set VRSAVE
stwu r1,-32(r1) ;# create space on the stack
HProlog second_pass_8x16_pre_copy_b
;# Load up permutation constants
load_c v29, b_0123_b, 0, r12, r0
load_c v30, b_4567_b, 0, r12, r0
hfilter_8 v0, v29, v30, 1
hfilter_8 v1, v29, v30, 1
hfilter_8 v2, v29, v30, 1
hfilter_8 v3, v29, v30, 1
hfilter_8 v4, v29, v30, 1
hfilter_8 v5, v29, v30, 1
hfilter_8 v6, v29, v30, 1
hfilter_8 v7, v29, v30, 1
hfilter_8 v8, v29, v30, 1
hfilter_8 v9, v29, v30, 1
hfilter_8 v10, v29, v30, 1
hfilter_8 v11, v29, v30, 1
hfilter_8 v12, v29, v30, 1
hfilter_8 v13, v29, v30, 1
hfilter_8 v14, v29, v30, 1
hfilter_8 v15, v29, v30, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq compute_sum_sse_8x16_b
hfilter_8 v16, v29, v30, 0
b second_pass_8x16_b
second_pass_8x16_pre_copy_b:
slwi. r6, r6, 5 ;# index into vertical filter array
load_and_align_16 v0, r3, r4, 1
load_and_align_16 v1, r3, r4, 1
load_and_align_16 v2, r3, r4, 1
load_and_align_16 v3, r3, r4, 1
load_and_align_16 v4, r3, r4, 1
load_and_align_16 v5, r3, r4, 1
load_and_align_16 v6, r3, r4, 1
load_and_align_16 v7, r3, r4, 1
load_and_align_16 v8, r3, r4, 1
load_and_align_16 v9, r3, r4, 1
load_and_align_16 v10, r3, r4, 1
load_and_align_16 v11, r3, r4, 1
load_and_align_16 v12, r3, r4, 1
load_and_align_16 v13, r3, r4, 1
load_and_align_16 v14, r3, r4, 1
load_and_align_16 v15, r3, r4, 1
load_and_align_16 v16, r3, r4, 0
beq compute_sum_sse_8x16_b
second_pass_8x16_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
vfilter_16 v4, v5
vfilter_16 v5, v6
vfilter_16 v6, v7
vfilter_16 v7, v8
vfilter_16 v8, v9
vfilter_16 v9, v10
vfilter_16 v10, v11
vfilter_16 v11, v12
vfilter_16 v12, v13
vfilter_16 v13, v14
vfilter_16 v14, v15
vfilter_16 v15, v16
compute_sum_sse_8x16_b:
vspltish v18, 0 ;# sum
vspltish v19, 0 ;# sse
vspltish v23, 0 ;# unpack
li r10, 16
vmrghb v0, v0, v1
vmrghb v1, v2, v3
vmrghb v2, v4, v5
vmrghb v3, v6, v7
vmrghb v4, v8, v9
vmrghb v5, v10, v11
vmrghb v6, v12, v13
vmrghb v7, v14, v15
load_and_align_16 v8, r7, r8, 1
load_and_align_16 v9, r7, r8, 1
load_and_align_16 v10, r7, r8, 1
load_and_align_16 v11, r7, r8, 1
load_and_align_16 v12, r7, r8, 1
load_and_align_16 v13, r7, r8, 1
load_and_align_16 v14, r7, r8, 1
load_and_align_16 v15, r7, r8, 1
vmrghb v8, v8, v9
vmrghb v9, v10, v11
vmrghb v10, v12, v13
vmrghb v11, v14, v15
compute_sum_sse v0, v8, v18, v19, v20, v21, v23
compute_sum_sse v1, v9, v18, v19, v20, v21, v23
compute_sum_sse v2, v10, v18, v19, v20, v21, v23
compute_sum_sse v3, v11, v18, v19, v20, v21, v23
load_and_align_16 v8, r7, r8, 1
load_and_align_16 v9, r7, r8, 1
load_and_align_16 v10, r7, r8, 1
load_and_align_16 v11, r7, r8, 1
load_and_align_16 v12, r7, r8, 1
load_and_align_16 v13, r7, r8, 1
load_and_align_16 v14, r7, r8, 1
load_and_align_16 v15, r7, r8, 0
vmrghb v8, v8, v9
vmrghb v9, v10, v11
vmrghb v10, v12, v13
vmrghb v11, v14, v15
compute_sum_sse v4, v8, v18, v19, v20, v21, v23
compute_sum_sse v5, v9, v18, v19, v20, v21, v23
compute_sum_sse v6, v10, v18, v19, v20, v21, v23
compute_sum_sse v7, v11, v18, v19, v20, v21, v23
variance_final v18, v19, v23, 7
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
;# Filters a horizontal line
;# expects:
;# r3 src_ptr
;# r4 pitch
;# r10 16
;# r12 32
;# v17 perm input
;# v18 rounding
;# v19 shift
;# v20 filter taps
;# v21 tmp
;# v22 tmp
;# v23 tmp
;# v24 tmp
;# v25 tmp
;# v26 tmp
;# v27 tmp
;# v28 perm output
;#
.macro hfilter_16 V, increment_counter
lvsl v17, 0, r3 ;# permute vector for alignment
;# input to the filter is 21 bytes wide, output is 16 bytes.
;# input can span three vectors if not aligned correctly.
lvx v21, 0, r3
lvx v22, r10, r3
lvx v23, r12, r3
.if \increment_counter
add r3, r3, r4
.endif
vperm v21, v21, v22, v17
vperm v22, v22, v23, v17 ;# v21 v22 = 21 input pixels left-justified
;# set 0
vmsummbm v24, v20, v21, v18 ;# taps times elements
;# set 1
vsldoi v23, v21, v22, 1
vmsummbm v25, v20, v23, v18
;# set 2
vsldoi v23, v21, v22, 2
vmsummbm v26, v20, v23, v18
;# set 3
vsldoi v23, v21, v22, 3
vmsummbm v27, v20, v23, v18
vpkswus v24, v24, v25 ;# v24 = 0 4 8 C 1 5 9 D (16-bit)
vpkswus v25, v26, v27 ;# v25 = 2 6 A E 3 7 B F
vsrh v24, v24, v19 ;# divide by 128
vsrh v25, v25, v19
vpkuhus \V, v24, v25 ;# \V = scrambled 8-bit result
vperm \V, \V, v0, v28 ;# \V = correctly-ordered result
.endm
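;# In scalar terms, "set" k above produces output pixels k, k+4, k+8 and
;# k+12 of the row: each vmsummbm group applies the Q7 tap pair to two
;# adjacent input bytes (the other two taps are zero), leaving the results
;# interleaved as 0 4 8 C 1 5 9 D / 2 6 A E 3 7 B F; the final vperm with
;# b_hperm_b restores plain 0..F byte order.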
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_pixels_per_line
;# r5 int xoffset
;# r6 int yoffset
;# r7 unsigned char *dst_ptr
;# r8 int dst_pixels_per_line
;# r9 unsigned int *sse
;#
;# r3 return value
vp8_sub_pixel_variance16x8_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xffff
ori r12, r12, 0xfff8
mtspr 256, r12 ;# set VRSAVE
stwu r1, -32(r1) ;# create space on the stack
HProlog second_pass_16x8_pre_copy_b
hfilter_16 v0, 1
hfilter_16 v1, 1
hfilter_16 v2, 1
hfilter_16 v3, 1
hfilter_16 v4, 1
hfilter_16 v5, 1
hfilter_16 v6, 1
hfilter_16 v7, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq compute_sum_sse_16x8_b
hfilter_16 v8, 0
b second_pass_16x8_b
second_pass_16x8_pre_copy_b:
slwi. r6, r6, 5 ;# index into vertical filter array
load_and_align_16 v0, r3, r4, 1
load_and_align_16 v1, r3, r4, 1
load_and_align_16 v2, r3, r4, 1
load_and_align_16 v3, r3, r4, 1
load_and_align_16 v4, r3, r4, 1
load_and_align_16 v5, r3, r4, 1
load_and_align_16 v6, r3, r4, 1
load_and_align_16 v7, r3, r4, 1
load_and_align_16 v8, r3, r4, 1
beq compute_sum_sse_16x8_b
second_pass_16x8_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
vfilter_16 v4, v5
vfilter_16 v5, v6
vfilter_16 v6, v7
vfilter_16 v7, v8
compute_sum_sse_16x8_b:
vspltish v18, 0 ;# sum
vspltish v19, 0 ;# sse
vspltish v23, 0 ;# unpack
li r10, 16
compute_sum_sse_16 v0, 1
compute_sum_sse_16 v1, 1
compute_sum_sse_16 v2, 1
compute_sum_sse_16 v3, 1
compute_sum_sse_16 v4, 1
compute_sum_sse_16 v5, 1
compute_sum_sse_16 v6, 1
compute_sum_sse_16 v7, 0
variance_final v18, v19, v23, 7
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
.align 2
;# r3 unsigned char *src_ptr
;# r4 int src_pixels_per_line
;# r5 int xoffset
;# r6 int yoffset
;# r7 unsigned char *dst_ptr
;# r8 int dst_pixels_per_line
;# r9 unsigned int *sse
;#
;# r3 return value
vp8_sub_pixel_variance16x16_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xffff
ori r12, r12, 0xfff8
mtspr 256, r12 ;# set VRSAVE
stwu r1, -32(r1) ;# create space on the stack
HProlog second_pass_16x16_pre_copy_b
hfilter_16 v0, 1
hfilter_16 v1, 1
hfilter_16 v2, 1
hfilter_16 v3, 1
hfilter_16 v4, 1
hfilter_16 v5, 1
hfilter_16 v6, 1
hfilter_16 v7, 1
hfilter_16 v8, 1
hfilter_16 v9, 1
hfilter_16 v10, 1
hfilter_16 v11, 1
hfilter_16 v12, 1
hfilter_16 v13, 1
hfilter_16 v14, 1
hfilter_16 v15, 1
;# Finished filtering main horizontal block. If there is no
;# vertical filtering, jump to storing the data. Otherwise
;# load up and filter the additional line that is needed
;# for the vertical filter.
beq compute_sum_sse_16x16_b
hfilter_16 v16, 0
b second_pass_16x16_b
second_pass_16x16_pre_copy_b:
slwi. r6, r6, 5 ;# index into vertical filter array
load_and_align_16 v0, r3, r4, 1
load_and_align_16 v1, r3, r4, 1
load_and_align_16 v2, r3, r4, 1
load_and_align_16 v3, r3, r4, 1
load_and_align_16 v4, r3, r4, 1
load_and_align_16 v5, r3, r4, 1
load_and_align_16 v6, r3, r4, 1
load_and_align_16 v7, r3, r4, 1
load_and_align_16 v8, r3, r4, 1
load_and_align_16 v9, r3, r4, 1
load_and_align_16 v10, r3, r4, 1
load_and_align_16 v11, r3, r4, 1
load_and_align_16 v12, r3, r4, 1
load_and_align_16 v13, r3, r4, 1
load_and_align_16 v14, r3, r4, 1
load_and_align_16 v15, r3, r4, 1
load_and_align_16 v16, r3, r4, 0
beq compute_sum_sse_16x16_b
second_pass_16x16_b:
vspltish v20, 8
vspltish v18, 3
vslh v18, v20, v18 ;# 0x0040 0040 0040 0040 0040 0040 0040 0040
load_vfilter v20, v21
vfilter_16 v0, v1
vfilter_16 v1, v2
vfilter_16 v2, v3
vfilter_16 v3, v4
vfilter_16 v4, v5
vfilter_16 v5, v6
vfilter_16 v6, v7
vfilter_16 v7, v8
vfilter_16 v8, v9
vfilter_16 v9, v10
vfilter_16 v10, v11
vfilter_16 v11, v12
vfilter_16 v12, v13
vfilter_16 v13, v14
vfilter_16 v14, v15
vfilter_16 v15, v16
compute_sum_sse_16x16_b:
vspltish v18, 0 ;# sum
vspltish v19, 0 ;# sse
vspltish v23, 0 ;# unpack
li r10, 16
compute_sum_sse_16 v0, 1
compute_sum_sse_16 v1, 1
compute_sum_sse_16 v2, 1
compute_sum_sse_16 v3, 1
compute_sum_sse_16 v4, 1
compute_sum_sse_16 v5, 1
compute_sum_sse_16 v6, 1
compute_sum_sse_16 v7, 1
compute_sum_sse_16 v8, 1
compute_sum_sse_16 v9, 1
compute_sum_sse_16 v10, 1
compute_sum_sse_16 v11, 1
compute_sum_sse_16 v12, 1
compute_sum_sse_16 v13, 1
compute_sum_sse_16 v14, 1
compute_sum_sse_16 v15, 0
variance_final v18, v19, v23, 8
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
.data
.align 4
hfilter_b:
.byte 128, 0, 0, 0,128, 0, 0, 0,128, 0, 0, 0,128, 0, 0, 0
.byte 112, 16, 0, 0,112, 16, 0, 0,112, 16, 0, 0,112, 16, 0, 0
.byte 96, 32, 0, 0, 96, 32, 0, 0, 96, 32, 0, 0, 96, 32, 0, 0
.byte 80, 48, 0, 0, 80, 48, 0, 0, 80, 48, 0, 0, 80, 48, 0, 0
.byte 64, 64, 0, 0, 64, 64, 0, 0, 64, 64, 0, 0, 64, 64, 0, 0
.byte 48, 80, 0, 0, 48, 80, 0, 0, 48, 80, 0, 0, 48, 80, 0, 0
.byte 32, 96, 0, 0, 32, 96, 0, 0, 32, 96, 0, 0, 32, 96, 0, 0
.byte 16,112, 0, 0, 16,112, 0, 0, 16,112, 0, 0, 16,112, 0, 0
.align 4
vfilter_b:
.byte 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
.byte 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.byte 112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112
.byte 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
.byte 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96
.byte 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32
.byte 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80
.byte 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48
.byte 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
.byte 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
.byte 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48
.byte 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80
.byte 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32
.byte 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96
.byte 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
.byte 112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112
.align 4
b_hperm_b:
.byte 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
.align 4
b_0123_b:
.byte 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
.align 4
b_4567_b:
.byte 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10
b_hilo_b:
.byte 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23


@@ -70,10 +70,10 @@ void vp8_build_intra_predictors_mby_s_c(MACROBLOCKD *x,
 expected_dc = 128;
 }
-/*memset(ypred_ptr, expected_dc, 256);*/
+/*vpx_memset(ypred_ptr, expected_dc, 256);*/
 for (r = 0; r < 16; r++)
 {
-memset(ypred_ptr, expected_dc, 16);
+vpx_memset(ypred_ptr, expected_dc, 16);
 ypred_ptr += y_stride;
 }
 }
@@ -98,7 +98,7 @@ void vp8_build_intra_predictors_mby_s_c(MACROBLOCKD *x,
 for (r = 0; r < 16; r++)
 {
-memset(ypred_ptr, yleft_col[r], 16);
+vpx_memset(ypred_ptr, yleft_col[r], 16);
 ypred_ptr += y_stride;
 }
@@ -202,12 +202,12 @@ void vp8_build_intra_predictors_mbuv_s_c(MACROBLOCKD *x,
 }
-/*memset(upred_ptr,expected_udc,64);*/
-/*memset(vpred_ptr,expected_vdc,64);*/
+/*vpx_memset(upred_ptr,expected_udc,64);*/
+/*vpx_memset(vpred_ptr,expected_vdc,64);*/
 for (i = 0; i < 8; i++)
 {
-memset(upred_ptr, expected_udc, 8);
-memset(vpred_ptr, expected_vdc, 8);
+vpx_memset(upred_ptr, expected_udc, 8);
+vpx_memset(vpred_ptr, expected_vdc, 8);
 upred_ptr += pred_stride;
 vpred_ptr += pred_stride;
 }
@@ -217,8 +217,8 @@ void vp8_build_intra_predictors_mbuv_s_c(MACROBLOCKD *x,
 {
 for (i = 0; i < 8; i++)
 {
-memcpy(upred_ptr, uabove_row, 8);
-memcpy(vpred_ptr, vabove_row, 8);
+vpx_memcpy(upred_ptr, uabove_row, 8);
+vpx_memcpy(vpred_ptr, vabove_row, 8);
 upred_ptr += pred_stride;
 vpred_ptr += pred_stride;
 }
@@ -229,8 +229,8 @@ void vp8_build_intra_predictors_mbuv_s_c(MACROBLOCKD *x,
 {
 for (i = 0; i < 8; i++)
 {
-memset(upred_ptr, uleft_col[i], 8);
-memset(vpred_ptr, vleft_col[i], 8);
+vpx_memset(upred_ptr, uleft_col[i], 8);
+vpx_memset(vpred_ptr, vleft_col[i], 8);
 upred_ptr += pred_stride;
 vpred_ptr += pred_stride;
 }


@@ -7,13 +7,15 @@
 * in the file PATENTS. All contributing project authors may
 * be found in the AUTHORS file in the root of the source tree.
 */
-#include "./vpx_config.h"
+#include "vpx_config.h"
 #define RTCD_C
-#include "./vp8_rtcd.h"
+#include "vp8_rtcd.h"
 #include "vpx_ports/vpx_once.h"
+extern void vpx_scale_rtcd(void);
 void vp8_rtcd()
 {
+vpx_scale_rtcd();
 once(setup_rtcd_internal);
 }


@@ -17,15 +17,15 @@ void vp8_setup_intra_recon(YV12_BUFFER_CONFIG *ybf)
 int i;
 /* set up frame new frame for intra coded blocks */
-memset(ybf->y_buffer - 1 - ybf->y_stride, 127, ybf->y_width + 5);
+vpx_memset(ybf->y_buffer - 1 - ybf->y_stride, 127, ybf->y_width + 5);
 for (i = 0; i < ybf->y_height; i++)
 ybf->y_buffer[ybf->y_stride *i - 1] = (unsigned char) 129;
-memset(ybf->u_buffer - 1 - ybf->uv_stride, 127, ybf->uv_width + 5);
+vpx_memset(ybf->u_buffer - 1 - ybf->uv_stride, 127, ybf->uv_width + 5);
 for (i = 0; i < ybf->uv_height; i++)
 ybf->u_buffer[ybf->uv_stride *i - 1] = (unsigned char) 129;
-memset(ybf->v_buffer - 1 - ybf->uv_stride, 127, ybf->uv_width + 5);
+vpx_memset(ybf->v_buffer - 1 - ybf->uv_stride, 127, ybf->uv_width + 5);
 for (i = 0; i < ybf->uv_height; i++)
 ybf->v_buffer[ybf->uv_stride *i - 1] = (unsigned char) 129;
@@ -33,7 +33,7 @@ void vp8_setup_intra_recon(YV12_BUFFER_CONFIG *ybf)
 void vp8_setup_intra_recon_top_line(YV12_BUFFER_CONFIG *ybf)
 {
-memset(ybf->y_buffer - 1 - ybf->y_stride, 127, ybf->y_width + 5);
-memset(ybf->u_buffer - 1 - ybf->uv_stride, 127, ybf->uv_width + 5);
-memset(ybf->v_buffer - 1 - ybf->uv_stride, 127, ybf->uv_width + 5);
+vpx_memset(ybf->y_buffer - 1 - ybf->y_stride, 127, ybf->y_width + 5);
+vpx_memset(ybf->u_buffer - 1 - ybf->uv_stride, 127, ybf->uv_width + 5);
+vpx_memset(ybf->v_buffer - 1 - ybf->uv_stride, 127, ybf->uv_width + 5);
 }


@@ -36,7 +36,7 @@ void vp8_dequant_idct_add_y_block_mmx
 else if (eobs[0] == 1)
 {
 vp8_dc_only_idct_add_mmx (q[0]*dq[0], dst, stride, dst, stride);
-memset(q, 0, 2 * sizeof(q[0]));
+vpx_memset(q, 0, 2 * sizeof(q[0]));
 }
 if (eobs[1] > 1)
@@ -45,7 +45,7 @@
 {
 vp8_dc_only_idct_add_mmx (q[16]*dq[0], dst+4, stride,
 dst+4, stride);
-memset(q + 16, 0, 2 * sizeof(q[0]));
+vpx_memset(q + 16, 0, 2 * sizeof(q[0]));
 }
 if (eobs[2] > 1)
@@ -54,7 +54,7 @@
 {
 vp8_dc_only_idct_add_mmx (q[32]*dq[0], dst+8, stride,
 dst+8, stride);
-memset(q + 32, 0, 2 * sizeof(q[0]));
+vpx_memset(q + 32, 0, 2 * sizeof(q[0]));
 }
 if (eobs[3] > 1)
@@ -63,7 +63,7 @@
 {
 vp8_dc_only_idct_add_mmx (q[48]*dq[0], dst+12, stride,
 dst+12, stride);
-memset(q + 48, 0, 2 * sizeof(q[0]));
+vpx_memset(q + 48, 0, 2 * sizeof(q[0]));
 }
 q += 64;
@@ -85,7 +85,7 @@ void vp8_dequant_idct_add_uv_block_mmx
 else if (eobs[0] == 1)
 {
 vp8_dc_only_idct_add_mmx (q[0]*dq[0], dstu, stride, dstu, stride);
-memset(q, 0, 2 * sizeof(q[0]));
+vpx_memset(q, 0, 2 * sizeof(q[0]));
 }
 if (eobs[1] > 1)
@@ -94,7 +94,7 @@
 {
 vp8_dc_only_idct_add_mmx (q[16]*dq[0], dstu+4, stride,
 dstu+4, stride);
-memset(q + 16, 0, 2 * sizeof(q[0]));
+vpx_memset(q + 16, 0, 2 * sizeof(q[0]));
 }
 q += 32;
@@ -109,7 +109,7 @@
 else if (eobs[0] == 1)
 {
 vp8_dc_only_idct_add_mmx (q[0]*dq[0], dstv, stride, dstv, stride);
-memset(q, 0, 2 * sizeof(q[0]));
+vpx_memset(q, 0, 2 * sizeof(q[0]));
 }
 if (eobs[1] > 1)
@@ -118,7 +118,7 @@
 {
 vp8_dc_only_idct_add_mmx (q[16]*dq[0], dstv+4, stride,
 dstv+4, stride);
-memset(q + 16, 0, 2 * sizeof(q[0]));
+vpx_memset(q + 16, 0, 2 * sizeof(q[0]));
 }
 q += 32;


@@ -142,7 +142,7 @@ static void decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 * Better to use the predictor as reconstruction.
 */
 pbi->frame_corrupt_residual = 1;
-memset(xd->qcoeff, 0, sizeof(xd->qcoeff));
+vpx_memset(xd->qcoeff, 0, sizeof(xd->qcoeff));
 vp8_conceal_corrupt_mb(xd);
@@ -151,7 +151,7 @@ static void decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 /* force idct to be skipped for B_PRED and use the
 * prediction only for reconstruction
 * */
-memset(xd->eobs, 0, 25);
+vpx_memset(xd->eobs, 0, 25);
 }
 }
 #endif
@@ -184,7 +184,7 @@ static void decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 /* clear out residual eob info */
 if(xd->mode_info_context->mbmi.mb_skip_coeff)
-memset(xd->eobs, 0, 25);
+vpx_memset(xd->eobs, 0, 25);
 intra_prediction_down_copy(xd, xd->recon_above[0] + 16);
@@ -214,7 +214,7 @@ static void decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 (b->qcoeff[0] * DQC[0],
 dst, dst_stride,
 dst, dst_stride);
-memset(b->qcoeff, 0, 2 * sizeof(b->qcoeff[0]));
+vpx_memset(b->qcoeff, 0, 2 * sizeof(b->qcoeff[0]));
 }
 }
 }
@@ -251,14 +251,14 @@ static void decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 vp8_short_inv_walsh4x4(&b->dqcoeff[0],
 xd->qcoeff);
-memset(b->qcoeff, 0, 16 * sizeof(b->qcoeff[0]));
+vpx_memset(b->qcoeff, 0, 16 * sizeof(b->qcoeff[0]));
 }
 else
 {
 b->dqcoeff[0] = b->qcoeff[0] * xd->dequant_y2[0];
 vp8_short_inv_walsh4x4_1(&b->dqcoeff[0],
 xd->qcoeff);
-memset(b->qcoeff, 0, 2 * sizeof(b->qcoeff[0]));
+vpx_memset(b->qcoeff, 0, 2 * sizeof(b->qcoeff[0]));
 }
 /* override the dc dequant constant in order to preserve the
@@ -323,7 +323,7 @@ static void yv12_extend_frame_top_c(YV12_BUFFER_CONFIG *ybf)
 for (i = 0; i < (int)Border; i++)
 {
-memcpy(dest_ptr1, src_ptr1, plane_stride);
+vpx_memcpy(dest_ptr1, src_ptr1, plane_stride);
 dest_ptr1 += plane_stride;
 }
@@ -338,7 +338,7 @@ static void yv12_extend_frame_top_c(YV12_BUFFER_CONFIG *ybf)
 for (i = 0; i < (int)(Border); i++)
 {
-memcpy(dest_ptr1, src_ptr1, plane_stride);
+vpx_memcpy(dest_ptr1, src_ptr1, plane_stride);
 dest_ptr1 += plane_stride;
 }
@@ -351,7 +351,7 @@ static void yv12_extend_frame_top_c(YV12_BUFFER_CONFIG *ybf)
 for (i = 0; i < (int)(Border); i++)
 {
-memcpy(dest_ptr1, src_ptr1, plane_stride);
+vpx_memcpy(dest_ptr1, src_ptr1, plane_stride);
 dest_ptr1 += plane_stride;
 }
 }
@@ -379,7 +379,7 @@ static void yv12_extend_frame_bottom_c(YV12_BUFFER_CONFIG *ybf)
 for (i = 0; i < (int)Border; i++)
 {
-memcpy(dest_ptr2, src_ptr2, plane_stride);
+vpx_memcpy(dest_ptr2, src_ptr2, plane_stride);
 dest_ptr2 += plane_stride;
 }
@@ -397,7 +397,7 @@ static void yv12_extend_frame_bottom_c(YV12_BUFFER_CONFIG *ybf)
 for (i = 0; i < (int)(Border); i++)
 {
-memcpy(dest_ptr2, src_ptr2, plane_stride);
+vpx_memcpy(dest_ptr2, src_ptr2, plane_stride);
 dest_ptr2 += plane_stride;
 }
@@ -411,7 +411,7 @@ static void yv12_extend_frame_bottom_c(YV12_BUFFER_CONFIG *ybf)
 for (i = 0; i < (int)(Border); i++)
 {
-memcpy(dest_ptr2, src_ptr2, plane_stride);
+vpx_memcpy(dest_ptr2, src_ptr2, plane_stride);
 dest_ptr2 += plane_stride;
 }
 }
@@ -446,8 +446,8 @@ static void yv12_extend_frame_left_right_c(YV12_BUFFER_CONFIG *ybf,
 for (i = 0; i < plane_height; i++)
 {
-memset(dest_ptr1, src_ptr1[0], Border);
-memset(dest_ptr2, src_ptr2[0], Border);
+vpx_memset(dest_ptr1, src_ptr1[0], Border);
+vpx_memset(dest_ptr2, src_ptr2[0], Border);
 src_ptr1 += plane_stride;
 src_ptr2 += plane_stride;
 dest_ptr1 += plane_stride;
@@ -470,8 +470,8 @@ static void yv12_extend_frame_left_right_c(YV12_BUFFER_CONFIG *ybf,
 for (i = 0; i < plane_height; i++)
 {
-memset(dest_ptr1, src_ptr1[0], Border);
-memset(dest_ptr2, src_ptr2[0], Border);
+vpx_memset(dest_ptr1, src_ptr1[0], Border);
+vpx_memset(dest_ptr2, src_ptr2[0], Border);
 src_ptr1 += plane_stride;
 src_ptr2 += plane_stride;
 dest_ptr1 += plane_stride;
@@ -490,8 +490,8 @@ static void yv12_extend_frame_left_right_c(YV12_BUFFER_CONFIG *ybf,
 for (i = 0; i < plane_height; i++)
 {
-memset(dest_ptr1, src_ptr1[0], Border);
-memset(dest_ptr2, src_ptr2[0], Border);
+vpx_memset(dest_ptr1, src_ptr1[0], Border);
+vpx_memset(dest_ptr2, src_ptr2[0], Border);
 src_ptr1 += plane_stride;
 src_ptr2 += plane_stride;
 dest_ptr1 += plane_stride;
@@ -568,7 +568,7 @@ static void decode_mb_rows(VP8D_COMP *pbi)
 /* reset contexts */
 xd->above_context = pc->above_context;
-memset(xd->left_context, 0, sizeof(ENTROPY_CONTEXT_PLANES));
+vpx_memset(xd->left_context, 0, sizeof(ENTROPY_CONTEXT_PLANES));
 xd->left_available = 0;
@@ -918,19 +918,19 @@ static void init_frame(VP8D_COMP *pbi)
 if (pc->frame_type == KEY_FRAME)
 {
 /* Various keyframe initializations */
-memcpy(pc->fc.mvc, vp8_default_mv_context, sizeof(vp8_default_mv_context));
+vpx_memcpy(pc->fc.mvc, vp8_default_mv_context, sizeof(vp8_default_mv_context));
 vp8_init_mbmode_probs(pc);
 vp8_default_coef_probs(pc);
 /* reset the segment feature data to 0 with delta coding (Default state). */
-memset(xd->segment_feature_data, 0, sizeof(xd->segment_feature_data));
+vpx_memset(xd->segment_feature_data, 0, sizeof(xd->segment_feature_data));
 xd->mb_segement_abs_delta = SEGMENT_DELTADATA;
 /* reset the mode ref deltasa for loop filter */
-memset(xd->ref_lf_deltas, 0, sizeof(xd->ref_lf_deltas));
-memset(xd->mode_lf_deltas, 0, sizeof(xd->mode_lf_deltas));
+vpx_memset(xd->ref_lf_deltas, 0, sizeof(xd->ref_lf_deltas));
+vpx_memset(xd->mode_lf_deltas, 0, sizeof(xd->mode_lf_deltas));
 /* All buffers are implicitly updated on key frames. */
 pc->refresh_golden_frame = 1;
@@ -1069,11 +1069,12 @@ int vp8_decode_frame(VP8D_COMP *pbi)
 pc->vert_scale = clear[6] >> 6;
 }
 data += 7;
+clear += 7;
 }
 else
 {
-memcpy(&xd->pre, yv12_fb_new, sizeof(YV12_BUFFER_CONFIG));
-memcpy(&xd->dst, yv12_fb_new, sizeof(YV12_BUFFER_CONFIG));
+vpx_memcpy(&xd->pre, yv12_fb_new, sizeof(YV12_BUFFER_CONFIG));
+vpx_memcpy(&xd->dst, yv12_fb_new, sizeof(YV12_BUFFER_CONFIG));
 }
 }
 if ((!pbi->decoded_key_frame && pc->frame_type != KEY_FRAME))
@@ -1105,7 +1106,7 @@ int vp8_decode_frame(VP8D_COMP *pbi)
 {
 xd->mb_segement_abs_delta = (unsigned char)vp8_read_bit(bc);
-memset(xd->segment_feature_data, 0, sizeof(xd->segment_feature_data));
+vpx_memset(xd->segment_feature_data, 0, sizeof(xd->segment_feature_data));
 /* For each segmentation feature (Quant and loop filter level) */
 for (i = 0; i < MB_LVL_MAX; i++)
@@ -1129,7 +1130,7 @@ int vp8_decode_frame(VP8D_COMP *pbi)
 if (xd->update_mb_segmentation_map)
 {
 /* Which macro block level features are enabled */
-memset(xd->mb_segment_tree_probs, 255, sizeof(xd->mb_segment_tree_probs));
+vpx_memset(xd->mb_segment_tree_probs, 255, sizeof(xd->mb_segment_tree_probs));
 /* Read the probs used to decode the segment id for each macro block. */
 for (i = 0; i < MB_FEATURE_TREE_PROBS; i++)
@@ -1278,7 +1279,7 @@ int vp8_decode_frame(VP8D_COMP *pbi)
 #endif
 if (pc->refresh_entropy_probs == 0)
 {
-memcpy(&pc->lfc, &pc->fc, sizeof(pc->fc));
+vpx_memcpy(&pc->lfc, &pc->fc, sizeof(pc->fc));
 }
 pc->refresh_last_frame = pc->frame_type == KEY_FRAME || vp8_read_bit(bc);
@@ -1327,7 +1328,7 @@ int vp8_decode_frame(VP8D_COMP *pbi)
 }
 /* clear out the coeff buffer */
-memset(xd->qcoeff, 0, sizeof(xd->qcoeff));
+vpx_memset(xd->qcoeff, 0, sizeof(xd->qcoeff));
 vp8_decode_mode_mvs(pbi);
@@ -1341,7 +1342,7 @@ int vp8_decode_frame(VP8D_COMP *pbi)
 }
 #endif
-memset(pc->above_context, 0, sizeof(ENTROPY_CONTEXT_PLANES) * pc->mb_cols);
+vpx_memset(pc->above_context, 0, sizeof(ENTROPY_CONTEXT_PLANES) * pc->mb_cols);
 pbi->frame_corrupt_residual = 0;
 #if CONFIG_MULTITHREAD
@@ -1380,7 +1381,7 @@ int vp8_decode_frame(VP8D_COMP *pbi)
 if (pc->refresh_entropy_probs == 0)
 {
-memcpy(&pc->fc, &pc->lfc, sizeof(pc->fc));
+vpx_memcpy(&pc->fc, &pc->lfc, sizeof(pc->fc));
 pbi->independent_partitions = prev_independent_partitions;
 }
View File

@@ -20,8 +20,8 @@ void vp8_reset_mb_tokens_context(MACROBLOCKD *x)
 ENTROPY_CONTEXT *a_ctx = ((ENTROPY_CONTEXT *)x->above_context);
 ENTROPY_CONTEXT *l_ctx = ((ENTROPY_CONTEXT *)x->left_context);
-memset(a_ctx, 0, sizeof(ENTROPY_CONTEXT_PLANES)-1);
-memset(l_ctx, 0, sizeof(ENTROPY_CONTEXT_PLANES)-1);
+vpx_memset(a_ctx, 0, sizeof(ENTROPY_CONTEXT_PLANES)-1);
+vpx_memset(l_ctx, 0, sizeof(ENTROPY_CONTEXT_PLANES)-1);
 /* Clear entropy contexts for Y2 blocks */
 if (!x->mode_info_context->mbmi.is_4x4)


@@ -350,7 +350,7 @@ static void estimate_missing_mvs(MB_OVERLAP *overlaps,
 unsigned int first_corrupt)
 {
 int mb_row, mb_col;
-memset(overlaps, 0, sizeof(MB_OVERLAP) * mb_rows * mb_cols);
+vpx_memset(overlaps, 0, sizeof(MB_OVERLAP) * mb_rows * mb_cols);
 /* First calculate the overlaps for all blocks */
 for (mb_row = 0; mb_row < mb_rows; ++mb_row)
 {


@@ -58,7 +58,7 @@ static struct VP8D_COMP * create_decompressor(VP8D_CONFIG *oxcf)
 if (!pbi)
 return NULL;
-memset(pbi, 0, sizeof(VP8D_COMP));
+vpx_memset(pbi, 0, sizeof(VP8D_COMP));
 if (setjmp(pbi->common.error.jmp))
 {


@@ -60,12 +60,12 @@ static void setup_decoding_thread_data(VP8D_COMP *pbi, MACROBLOCKD *xd, MB_ROW_D
 mbd->segmentation_enabled = xd->segmentation_enabled;
 mbd->mb_segement_abs_delta = xd->mb_segement_abs_delta;
-memcpy(mbd->segment_feature_data, xd->segment_feature_data, sizeof(xd->segment_feature_data));
+vpx_memcpy(mbd->segment_feature_data, xd->segment_feature_data, sizeof(xd->segment_feature_data));
 /*signed char ref_lf_deltas[MAX_REF_LF_DELTAS];*/
-memcpy(mbd->ref_lf_deltas, xd->ref_lf_deltas, sizeof(xd->ref_lf_deltas));
+vpx_memcpy(mbd->ref_lf_deltas, xd->ref_lf_deltas, sizeof(xd->ref_lf_deltas));
 /*signed char mode_lf_deltas[MAX_MODE_LF_DELTAS];*/
-memcpy(mbd->mode_lf_deltas, xd->mode_lf_deltas, sizeof(xd->mode_lf_deltas));
+vpx_memcpy(mbd->mode_lf_deltas, xd->mode_lf_deltas, sizeof(xd->mode_lf_deltas));
 /*unsigned char mode_ref_lf_delta_enabled;
 unsigned char mode_ref_lf_delta_update;*/
 mbd->mode_ref_lf_delta_enabled = xd->mode_ref_lf_delta_enabled;
@@ -73,10 +73,10 @@ static void setup_decoding_thread_data(VP8D_COMP *pbi, MACROBLOCKD *xd, MB_ROW_D
 mbd->current_bc = &pbi->mbc[0];
-memcpy(mbd->dequant_y1_dc, xd->dequant_y1_dc, sizeof(xd->dequant_y1_dc));
-memcpy(mbd->dequant_y1, xd->dequant_y1, sizeof(xd->dequant_y1));
-memcpy(mbd->dequant_y2, xd->dequant_y2, sizeof(xd->dequant_y2));
-memcpy(mbd->dequant_uv, xd->dequant_uv, sizeof(xd->dequant_uv));
+vpx_memcpy(mbd->dequant_y1_dc, xd->dequant_y1_dc, sizeof(xd->dequant_y1_dc));
+vpx_memcpy(mbd->dequant_y1, xd->dequant_y1, sizeof(xd->dequant_y1));
+vpx_memcpy(mbd->dequant_y2, xd->dequant_y2, sizeof(xd->dequant_y2));
+vpx_memcpy(mbd->dequant_uv, xd->dequant_uv, sizeof(xd->dequant_uv));
 mbd->fullpixel_mask = 0xffffffff;
@@ -137,7 +137,7 @@ static void mt_decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 * Better to use the predictor as reconstruction.
 */
 pbi->frame_corrupt_residual = 1;
-memset(xd->qcoeff, 0, sizeof(xd->qcoeff));
+vpx_memset(xd->qcoeff, 0, sizeof(xd->qcoeff));
 vp8_conceal_corrupt_mb(xd);
@@ -146,7 +146,7 @@ static void mt_decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 /* force idct to be skipped for B_PRED and use the
 * prediction only for reconstruction
 * */
-memset(xd->eobs, 0, 25);
+vpx_memset(xd->eobs, 0, 25);
 }
 }
 #endif
@@ -179,7 +179,7 @@ static void mt_decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 /* clear out residual eob info */
 if(xd->mode_info_context->mbmi.mb_skip_coeff)
-memset(xd->eobs, 0, 25);
+vpx_memset(xd->eobs, 0, 25);
 intra_prediction_down_copy(xd, xd->recon_above[0] + 16);
@@ -229,7 +229,7 @@ static void mt_decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 {
 vp8_dc_only_idct_add(b->qcoeff[0] * DQC[0],
 dst, dst_stride, dst, dst_stride);
-memset(b->qcoeff, 0, 2 * sizeof(b->qcoeff[0]));
+vpx_memset(b->qcoeff, 0, 2 * sizeof(b->qcoeff[0]));
 }
 }
 }
@@ -266,14 +266,14 @@ static void mt_decode_macroblock(VP8D_COMP *pbi, MACROBLOCKD *xd,
 vp8_short_inv_walsh4x4(&b->dqcoeff[0],
 xd->qcoeff);
-memset(b->qcoeff, 0, 16 * sizeof(b->qcoeff[0]));
+vpx_memset(b->qcoeff, 0, 16 * sizeof(b->qcoeff[0]));
 }
 else
 {
 b->dqcoeff[0] = b->qcoeff[0] * xd->dequant_y2[0];
 vp8_short_inv_walsh4x4_1(&b->dqcoeff[0],
 xd->qcoeff);
-memset(b->qcoeff, 0, 2 * sizeof(b->qcoeff[0]));
+vpx_memset(b->qcoeff, 0, 2 * sizeof(b->qcoeff[0]));
 }
 /* override the dc dequant constant in order to preserve the
@@ -360,7 +360,7 @@ static void mt_decode_mb_rows(VP8D_COMP *pbi, MACROBLOCKD *xd, int start_mb_row)
 /* reset contexts */
 xd->above_context = pc->above_context;
-memset(xd->left_context, 0, sizeof(ENTROPY_CONTEXT_PLANES));
+vpx_memset(xd->left_context, 0, sizeof(ENTROPY_CONTEXT_PLANES));
 xd->left_available = 0;
@@ -499,9 +499,9 @@ static void mt_decode_mb_rows(VP8D_COMP *pbi, MACROBLOCKD *xd, int start_mb_row)
 if( mb_row != pc->mb_rows-1 )
 {
 /* Save decoded MB last row data for next-row decoding */
-memcpy((pbi->mt_yabove_row[mb_row + 1] + 32 + mb_col*16), (xd->dst.y_buffer + 15 * recon_y_stride), 16);
-memcpy((pbi->mt_uabove_row[mb_row + 1] + 16 + mb_col*8), (xd->dst.u_buffer + 7 * recon_uv_stride), 8);
-memcpy((pbi->mt_vabove_row[mb_row + 1] + 16 + mb_col*8), (xd->dst.v_buffer + 7 * recon_uv_stride), 8);
+vpx_memcpy((pbi->mt_yabove_row[mb_row + 1] + 32 + mb_col*16), (xd->dst.y_buffer + 15 * recon_y_stride), 16);
+vpx_memcpy((pbi->mt_uabove_row[mb_row + 1] + 16 + mb_col*8), (xd->dst.u_buffer + 7 * recon_uv_stride), 8);
+vpx_memcpy((pbi->mt_vabove_row[mb_row + 1] + 16 + mb_col*8), (xd->dst.v_buffer + 7 * recon_uv_stride), 8);
 }
 /* save left_col for next MB decoding */
@@ -876,23 +876,23 @@ void vp8mt_decode_mb_rows( VP8D_COMP *pbi, MACROBLOCKD *xd)
 if (filter_level)
 {
 /* Set above_row buffer to 127 for decoding first MB row */
-memset(pbi->mt_yabove_row[0] + VP8BORDERINPIXELS-1, 127, yv12_fb_new->y_width + 5);
-memset(pbi->mt_uabove_row[0] + (VP8BORDERINPIXELS>>1)-1, 127, (yv12_fb_new->y_width>>1) +5);
-memset(pbi->mt_vabove_row[0] + (VP8BORDERINPIXELS>>1)-1, 127, (yv12_fb_new->y_width>>1) +5);
+vpx_memset(pbi->mt_yabove_row[0] + VP8BORDERINPIXELS-1, 127, yv12_fb_new->y_width + 5);
+vpx_memset(pbi->mt_uabove_row[0] + (VP8BORDERINPIXELS>>1)-1, 127, (yv12_fb_new->y_width>>1) +5);
+vpx_memset(pbi->mt_vabove_row[0] + (VP8BORDERINPIXELS>>1)-1, 127, (yv12_fb_new->y_width>>1) +5);
 for (j=1; j<pc->mb_rows; j++)
 {
-memset(pbi->mt_yabove_row[j] + VP8BORDERINPIXELS-1, (unsigned char)129, 1);
-memset(pbi->mt_uabove_row[j] + (VP8BORDERINPIXELS>>1)-1, (unsigned char)129, 1);
-memset(pbi->mt_vabove_row[j] + (VP8BORDERINPIXELS>>1)-1, (unsigned char)129, 1);
+vpx_memset(pbi->mt_yabove_row[j] + VP8BORDERINPIXELS-1, (unsigned char)129, 1);
+vpx_memset(pbi->mt_uabove_row[j] + (VP8BORDERINPIXELS>>1)-1, (unsigned char)129, 1);
+vpx_memset(pbi->mt_vabove_row[j] + (VP8BORDERINPIXELS>>1)-1, (unsigned char)129, 1);
 }
 /* Set left_col to 129 initially */
 for (j=0; j<pc->mb_rows; j++)
 {
-memset(pbi->mt_yleft_col[j], (unsigned char)129, 16);
-memset(pbi->mt_uleft_col[j], (unsigned char)129, 8);
-memset(pbi->mt_vleft_col[j], (unsigned char)129, 8);
+vpx_memset(pbi->mt_yleft_col[j], (unsigned char)129, 16);
+vpx_memset(pbi->mt_uleft_col[j], (unsigned char)129, 8);
+vpx_memset(pbi->mt_vleft_col[j], (unsigned char)129, 8);
 }
 /* Initialize the loop filter for this frame. */


@@ -1543,7 +1543,7 @@ void vp8_pack_bitstream(VP8_COMP *cpi, unsigned char *dest, unsigned char * dest
 if (pc->refresh_entropy_probs == 0)
 {
 /* save a copy for later refresh */
-memcpy(&cpi->common.lfc, &cpi->common.fc, sizeof(cpi->common.fc));
+vpx_memcpy(&cpi->common.lfc, &cpi->common.fc, sizeof(cpi->common.fc));
 }
 vp8_update_coef_probs(cpi);
@@ -1620,7 +1620,7 @@ void vp8_pack_bitstream(VP8_COMP *cpi, unsigned char *dest, unsigned char * dest
 /* concatenate partition buffers */
 for(i = 0; i < num_part; i++)
 {
-memmove(dp, cpi->partition_d[i+1], cpi->partition_sz[i+1]);
+vpx_memmove(dp, cpi->partition_d[i+1], cpi->partition_sz[i+1]);
 cpi->partition_d[i+1] = dp;
 dp += cpi->partition_sz[i+1];
 }


@@ -415,8 +415,8 @@ int vp8_denoiser_allocate(VP8_DENOISER *denoiser, int width, int height,
 vp8_denoiser_free(denoiser);
 return 1;
 }
-memset(denoiser->yv12_running_avg[i].buffer_alloc, 0,
-denoiser->yv12_running_avg[i].frame_size);
+vpx_memset(denoiser->yv12_running_avg[i].buffer_alloc, 0,
+denoiser->yv12_running_avg[i].frame_size);
 }
 denoiser->yv12_mc_running_avg.flags = 0;
@@ -428,19 +428,19 @@ int vp8_denoiser_allocate(VP8_DENOISER *denoiser, int width, int height,
 return 1;
 }
-memset(denoiser->yv12_mc_running_avg.buffer_alloc, 0,
-denoiser->yv12_mc_running_avg.frame_size);
+vpx_memset(denoiser->yv12_mc_running_avg.buffer_alloc, 0,
+denoiser->yv12_mc_running_avg.frame_size);
 if (vp8_yv12_alloc_frame_buffer(&denoiser->yv12_last_source, width,
 height, VP8BORDERINPIXELS) < 0) {
 vp8_denoiser_free(denoiser);
 return 1;
 }
-memset(denoiser->yv12_last_source.buffer_alloc, 0,
-denoiser->yv12_last_source.frame_size);
+vpx_memset(denoiser->yv12_last_source.buffer_alloc, 0,
+denoiser->yv12_last_source.frame_size);
 denoiser->denoise_state = vpx_calloc((num_mb_rows * num_mb_cols), 1);
-memset(denoiser->denoise_state, 0, (num_mb_rows * num_mb_cols));
+vpx_memset(denoiser->denoise_state, 0, (num_mb_rows * num_mb_cols));
 vp8_denoiser_set_parameters(denoiser, mode);
 denoiser->nmse_source_diff = 0;
 denoiser->nmse_source_diff_count = 0;

View File

@@ -155,8 +155,8 @@ static void calc_av_activity( VP8_COMP *cpi, int64_t activity_sum )
cpi->common.MBs));
/* Copy map to sort list */
memcpy( sortlist, cpi->mb_activity_map,
sizeof(unsigned int) * cpi->common.MBs );
vpx_memcpy( sortlist, cpi->mb_activity_map,
sizeof(unsigned int) * cpi->common.MBs );
/* Ripple each value down to its correct position */
@@ -665,7 +665,8 @@ static void init_encode_frame_mb_context(VP8_COMP *cpi)
x->mvc = cm->fc.mvc;
memset(cm->above_context, 0, sizeof(ENTROPY_CONTEXT_PLANES) * cm->mb_cols);
vpx_memset(cm->above_context, 0,
sizeof(ENTROPY_CONTEXT_PLANES) * cm->mb_cols);
/* Special case treatment when GF and ARF are not sensible options
* for reference
@@ -743,7 +744,7 @@ void vp8_encode_frame(VP8_COMP *cpi)
const int num_part = (1 << cm->multi_token_partition);
#endif
memset(segment_counts, 0, sizeof(segment_counts));
vpx_memset(segment_counts, 0, sizeof(segment_counts));
totalrate = 0;
if (cpi->compressor_speed == 2)
@@ -973,7 +974,7 @@ void vp8_encode_frame(VP8_COMP *cpi)
int i;
/* Set to defaults */
memset(xd->mb_segment_tree_probs, 255 , sizeof(xd->mb_segment_tree_probs));
vpx_memset(xd->mb_segment_tree_probs, 255 , sizeof(xd->mb_segment_tree_probs));
tot_count = segment_counts[0] + segment_counts[1] + segment_counts[2] + segment_counts[3];

View File

@@ -506,8 +506,8 @@ static void optimize_mb(MACROBLOCK *x)
ENTROPY_CONTEXT *ta;
ENTROPY_CONTEXT *tl;
memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
ta = (ENTROPY_CONTEXT *)&t_above;
tl = (ENTROPY_CONTEXT *)&t_left;
@@ -555,8 +555,8 @@ void vp8_optimize_mby(MACROBLOCK *x)
if (!x->e_mbd.left_context)
return;
memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
ta = (ENTROPY_CONTEXT *)&t_above;
tl = (ENTROPY_CONTEXT *)&t_left;
@@ -595,8 +595,8 @@ void vp8_optimize_mbuv(MACROBLOCK *x)
if (!x->e_mbd.left_context)
return;
memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
ta = (ENTROPY_CONTEXT *)&t_above;
tl = (ENTROPY_CONTEXT *)&t_left;
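Note: the three optimize hunks above repeat one pattern — trial tokenization mutates the above/left entropy contexts, so each function works on stack copies. A self-contained sketch of the pattern (the struct is an opaque stand-in for the real ENTROPY_CONTEXT_PLANES in blockd.h):

#include <string.h>

typedef struct { char ctx[9]; } ENTROPY_CONTEXT_PLANES;  /* stand-in */

static void optimize_on_local_copies(const ENTROPY_CONTEXT_PLANES *above,
                                     const ENTROPY_CONTEXT_PLANES *left) {
    ENTROPY_CONTEXT_PLANES t_above, t_left;
    memcpy(&t_above, above, sizeof(t_above));
    memcpy(&t_left, left, sizeof(t_left));
    /* ... trial tokenization mutates t_above/t_left only; the caller's
     * contexts are committed separately once a mode is chosen. */
    (void)t_above; (void)t_left;
}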

View File

@@ -416,13 +416,14 @@ static void setup_mbby_copy(MACROBLOCK *mbdst, MACROBLOCK *mbsrc)
zd->subpixel_predict16x16 = xd->subpixel_predict16x16;
zd->segmentation_enabled = xd->segmentation_enabled;
zd->mb_segement_abs_delta = xd->mb_segement_abs_delta;
memcpy(zd->segment_feature_data, xd->segment_feature_data,
sizeof(xd->segment_feature_data));
vpx_memcpy(zd->segment_feature_data, xd->segment_feature_data,
sizeof(xd->segment_feature_data));
memcpy(zd->dequant_y1_dc, xd->dequant_y1_dc, sizeof(xd->dequant_y1_dc));
memcpy(zd->dequant_y1, xd->dequant_y1, sizeof(xd->dequant_y1));
memcpy(zd->dequant_y2, xd->dequant_y2, sizeof(xd->dequant_y2));
memcpy(zd->dequant_uv, xd->dequant_uv, sizeof(xd->dequant_uv));
vpx_memcpy(zd->dequant_y1_dc, xd->dequant_y1_dc,
sizeof(xd->dequant_y1_dc));
vpx_memcpy(zd->dequant_y1, xd->dequant_y1, sizeof(xd->dequant_y1));
vpx_memcpy(zd->dequant_y2, xd->dequant_y2, sizeof(xd->dequant_y2));
vpx_memcpy(zd->dequant_uv, xd->dequant_uv, sizeof(xd->dequant_uv));
#if 1
/*TODO: Remove dequant from BLOCKD. This is a temporary solution until
@@ -437,14 +438,15 @@ static void setup_mbby_copy(MACROBLOCK *mbdst, MACROBLOCK *mbsrc)
#endif
memcpy(z->rd_threshes, x->rd_threshes, sizeof(x->rd_threshes));
memcpy(z->rd_thresh_mult, x->rd_thresh_mult, sizeof(x->rd_thresh_mult));
vpx_memcpy(z->rd_threshes, x->rd_threshes, sizeof(x->rd_threshes));
vpx_memcpy(z->rd_thresh_mult, x->rd_thresh_mult,
sizeof(x->rd_thresh_mult));
z->zbin_over_quant = x->zbin_over_quant;
z->zbin_mode_boost_enabled = x->zbin_mode_boost_enabled;
z->zbin_mode_boost = x->zbin_mode_boost;
memset(z->error_bins, 0, sizeof(z->error_bins));
vpx_memset(z->error_bins, 0, sizeof(z->error_bins));
}
}
@@ -470,7 +472,7 @@ void vp8cx_init_mbrthread_data(VP8_COMP *cpi,
mbd->subpixel_predict16x16 = xd->subpixel_predict16x16;
mb->gf_active_ptr = x->gf_active_ptr;
memset(mbr_ei[i].segment_counts, 0, sizeof(mbr_ei[i].segment_counts));
vpx_memset(mbr_ei[i].segment_counts, 0, sizeof(mbr_ei[i].segment_counts));
mbr_ei[i].totalrate = 0;
mb->partition_info = x->pi + x->e_mbd.mode_info_stride * (i + 1);
@@ -545,7 +547,7 @@ int vp8cx_create_encoder_threads(VP8_COMP *cpi)
vpx_malloc(sizeof(sem_t) * th_count));
CHECK_MEM_ERROR(cpi->mb_row_ei,
vpx_memalign(32, sizeof(MB_ROW_COMP) * th_count));
memset(cpi->mb_row_ei, 0, sizeof(MB_ROW_COMP) * th_count);
vpx_memset(cpi->mb_row_ei, 0, sizeof(MB_ROW_COMP) * th_count);
CHECK_MEM_ERROR(cpi->en_thread_data,
vpx_malloc(sizeof(ENCODETHREAD_DATA) * th_count));

View File

@@ -573,7 +573,7 @@ void vp8_first_pass(VP8_COMP *cpi)
{
int flag[2] = {1, 1};
vp8_initialize_rd_consts(cpi, x, vp8_dc_quant(cm->base_qindex, cm->y1dc_delta_q));
memcpy(cm->fc.mvc, vp8_default_mv_context, sizeof(vp8_default_mv_context));
vpx_memcpy(cm->fc.mvc, vp8_default_mv_context, sizeof(vp8_default_mv_context));
vp8_build_component_cost_table(cpi->mb.mvcost, (const MV_CONTEXT *) cm->fc.mvc, flag);
}
@@ -1779,7 +1779,7 @@ static void define_gf_group(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
start_pos = cpi->twopass.stats_in;
memset(&next_frame, 0, sizeof(next_frame)); /* assure clean */
vpx_memset(&next_frame, 0, sizeof(next_frame)); /* assure clean */
/* Load stats for the current frame. */
mod_frame_err = calculate_modified_err(cpi, this_frame);
@@ -1875,7 +1875,7 @@ static void define_gf_group(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
break;
}
memcpy(this_frame, &next_frame, sizeof(*this_frame));
vpx_memcpy(this_frame, &next_frame, sizeof(*this_frame));
old_boost_score = boost_score;
}
@@ -2445,7 +2445,7 @@ void vp8_second_pass(VP8_COMP *cpi)
if (cpi->twopass.frames_to_key == 0)
{
/* Define next KF group and assign bits to it */
memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
vpx_memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
find_next_key_frame(cpi, &this_frame_copy);
/* Special case: error_resilient_mode does not make much
@@ -2471,7 +2471,7 @@ void vp8_second_pass(VP8_COMP *cpi)
if (cpi->frames_till_gf_update_due == 0)
{
/* Define next gf group and assign bits to it */
memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
vpx_memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
define_gf_group(cpi, &this_frame_copy);
/* If we are going to code an altref frame at the end of the group
@@ -2487,7 +2487,7 @@ void vp8_second_pass(VP8_COMP *cpi)
* to the GF group
*/
int bak = cpi->per_frame_bandwidth;
memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
vpx_memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
assign_std_frame_bits(cpi, &this_frame_copy);
cpi->per_frame_bandwidth = bak;
}
@@ -2510,14 +2510,14 @@ void vp8_second_pass(VP8_COMP *cpi)
if (cpi->common.frame_type != KEY_FRAME)
{
/* Assign bits from those allocated to the GF group */
memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
vpx_memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
assign_std_frame_bits(cpi, &this_frame_copy);
}
}
else
{
/* Assign bits from those allocated to the GF group */
memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
vpx_memcpy(&this_frame_copy, &this_frame, sizeof(this_frame));
assign_std_frame_bits(cpi, &this_frame_copy);
}
}
@@ -2658,7 +2658,7 @@ static int test_candidate_kf(VP8_COMP *cpi, FIRSTPASS_STATS *last_frame, FIRSTP
double decay_accumulator = 1.0;
double next_iiratio;
memcpy(&local_next_frame, next_frame, sizeof(*next_frame));
vpx_memcpy(&local_next_frame, next_frame, sizeof(*next_frame));
/* Note the starting file position so we can reset to it */
start_pos = cpi->twopass.stats_in;
@@ -2735,7 +2735,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
double kf_group_coded_err = 0.0;
double recent_loop_decay[8] = {1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0};
memset(&next_frame, 0, sizeof(next_frame));
vpx_memset(&next_frame, 0, sizeof(next_frame));
vp8_clear_system_state();
start_position = cpi->twopass.stats_in;
@@ -2756,7 +2756,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
cpi->twopass.frames_to_key = 1;
/* Take a copy of the initial frame details */
memcpy(&first_frame, this_frame, sizeof(*this_frame));
vpx_memcpy(&first_frame, this_frame, sizeof(*this_frame));
cpi->twopass.kf_group_bits = 0;
cpi->twopass.kf_group_error_left = 0;
@@ -2779,7 +2779,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
kf_group_coded_err += this_frame->coded_error;
/* Load the next frame's stats. */
memcpy(&last_frame, this_frame, sizeof(*this_frame));
vpx_memcpy(&last_frame, this_frame, sizeof(*this_frame));
input_stats(cpi, this_frame);
/* Provided that we are not at the end of the file... */
@@ -2847,7 +2847,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
cpi->twopass.frames_to_key /= 2;
/* Copy first frame details */
memcpy(&tmp_frame, &first_frame, sizeof(first_frame));
vpx_memcpy(&tmp_frame, &first_frame, sizeof(first_frame));
/* Reset to the start of the group */
reset_fpf_position(cpi, start_position);
@@ -2969,6 +2969,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
*/
decay_accumulator = 1.0;
boost_score = 0.0;
loop_decay_rate = 1.00; /* Starting decay rate */
for (i = 0 ; i < cpi->twopass.frames_to_key ; i++)
{
@@ -3212,7 +3213,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
int new_width = cpi->oxcf.Width;
int new_height = cpi->oxcf.Height;
int projected_buffer_level;
int projected_buffer_level = (int)cpi->buffer_level;
int tmp_q;
double projected_bits_perframe;
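Note: the loop_decay_rate = 1.00 line added above matters because the decay rate carries across iterations of the boost loop; without the per-group reset, the previous group's final value leaks into the first frame. A minimal sketch of the accumulation (names and the per-frame inputs are assumptions, not the encoder's exact code):

#include <stddef.h>

double accumulate_boost(const double *frame_boost, const double *decay_rate,
                        size_t n) {
    double decay_accumulator = 1.0;
    double boost_score = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        boost_score += decay_accumulator * frame_boost[i];
        decay_accumulator *= decay_rate[i];  /* per-frame prediction decay */
    }
    return boost_score;
}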

View File

@@ -1978,8 +1978,8 @@ void print_mode_context(void)
#ifdef VP8_ENTROPY_STATS
void init_mv_ref_counts()
{
memset(mv_ref_ct, 0, sizeof(mv_ref_ct));
memset(mv_mode_cts, 0, sizeof(mv_mode_cts));
vpx_memset(mv_ref_ct, 0, sizeof(mv_ref_ct));
vpx_memset(mv_mode_cts, 0, sizeof(mv_mode_cts));
}
void accum_mv_refs(MB_PREDICTION_MODE m, const int ct[4])

View File

@@ -428,10 +428,10 @@ static void setup_features(VP8_COMP *cpi)
cpi->mb.e_mbd.mode_ref_lf_delta_enabled = 0;
cpi->mb.e_mbd.mode_ref_lf_delta_update = 0;
memset(cpi->mb.e_mbd.ref_lf_deltas, 0, sizeof(cpi->mb.e_mbd.ref_lf_deltas));
memset(cpi->mb.e_mbd.mode_lf_deltas, 0, sizeof(cpi->mb.e_mbd.mode_lf_deltas));
memset(cpi->mb.e_mbd.last_ref_lf_deltas, 0, sizeof(cpi->mb.e_mbd.ref_lf_deltas));
memset(cpi->mb.e_mbd.last_mode_lf_deltas, 0, sizeof(cpi->mb.e_mbd.mode_lf_deltas));
vpx_memset(cpi->mb.e_mbd.ref_lf_deltas, 0, sizeof(cpi->mb.e_mbd.ref_lf_deltas));
vpx_memset(cpi->mb.e_mbd.mode_lf_deltas, 0, sizeof(cpi->mb.e_mbd.mode_lf_deltas));
vpx_memset(cpi->mb.e_mbd.last_ref_lf_deltas, 0, sizeof(cpi->mb.e_mbd.ref_lf_deltas));
vpx_memset(cpi->mb.e_mbd.last_mode_lf_deltas, 0, sizeof(cpi->mb.e_mbd.mode_lf_deltas));
set_default_lf_deltas(cpi);
@@ -508,7 +508,7 @@ static void disable_segmentation(VP8_COMP *cpi)
static void set_segmentation_map(VP8_COMP *cpi, unsigned char *segmentation_map)
{
/* Copy in the new segmentation map */
memcpy(cpi->segmentation_map, segmentation_map, (cpi->common.mb_rows * cpi->common.mb_cols));
vpx_memcpy(cpi->segmentation_map, segmentation_map, (cpi->common.mb_rows * cpi->common.mb_cols));
/* Signal that the map should be updated. */
cpi->mb.e_mbd.update_mb_segmentation_map = 1;
@@ -530,7 +530,7 @@ static void set_segmentation_map(VP8_COMP *cpi, unsigned char *segmentation_map)
static void set_segment_data(VP8_COMP *cpi, signed char *feature_data, unsigned char abs_delta)
{
cpi->mb.e_mbd.mb_segement_abs_delta = abs_delta;
memcpy(cpi->segment_feature_data, feature_data, sizeof(cpi->segment_feature_data));
vpx_memcpy(cpi->segment_feature_data, feature_data, sizeof(cpi->segment_feature_data));
}
@@ -602,7 +602,7 @@ static void cyclic_background_refresh(VP8_COMP *cpi, int Q, int lf_adjustment)
// Set every macroblock to be eligible for update.
// For key frame this will reset seg map to 0.
memset(cpi->segmentation_map, 0, mbs_in_frame);
vpx_memset(cpi->segmentation_map, 0, mbs_in_frame);
if (cpi->common.frame_type != KEY_FRAME && block_count > 0)
{
@@ -686,8 +686,8 @@ static void set_default_lf_deltas(VP8_COMP *cpi)
cpi->mb.e_mbd.mode_ref_lf_delta_enabled = 1;
cpi->mb.e_mbd.mode_ref_lf_delta_update = 1;
memset(cpi->mb.e_mbd.ref_lf_deltas, 0, sizeof(cpi->mb.e_mbd.ref_lf_deltas));
memset(cpi->mb.e_mbd.mode_lf_deltas, 0, sizeof(cpi->mb.e_mbd.mode_lf_deltas));
vpx_memset(cpi->mb.e_mbd.ref_lf_deltas, 0, sizeof(cpi->mb.e_mbd.ref_lf_deltas));
vpx_memset(cpi->mb.e_mbd.mode_lf_deltas, 0, sizeof(cpi->mb.e_mbd.mode_lf_deltas));
/* Test of ref frame deltas */
cpi->mb.e_mbd.ref_lf_deltas[INTRA_FRAME] = 2;
@@ -1087,7 +1087,7 @@ void vp8_set_speed_features(VP8_COMP *cpi)
if (Speed >= 15)
sf->half_pixel_search = 0;
memset(cpi->mb.error_bins, 0, sizeof(cpi->mb.error_bins));
vpx_memset(cpi->mb.error_bins, 0, sizeof(cpi->mb.error_bins));
}; /* switch */
@@ -1298,7 +1298,7 @@ void vp8_alloc_compressor_data(VP8_COMP *cpi)
CHECK_MEM_ERROR(cpi->active_map,
vpx_calloc(cm->mb_rows * cm->mb_cols,
sizeof(*cpi->active_map)));
memset(cpi->active_map , 1, (cm->mb_rows * cm->mb_cols));
vpx_memset(cpi->active_map , 1, (cm->mb_rows * cm->mb_cols));
#if CONFIG_MULTITHREAD
if (width < 640)
@@ -1891,7 +1891,7 @@ struct VP8_COMP* vp8_create_compressor(VP8_CONFIG *oxcf)
cm = &cpi->common;
memset(cpi, 0, sizeof(VP8_COMP));
vpx_memset(cpi, 0, sizeof(VP8_COMP));
if (setjmp(cm->error.jmp))
{
@@ -2867,7 +2867,7 @@ static void update_alt_ref_frame_stats(VP8_COMP *cpi)
}
/* Update data structure that monitors level of reference to last GF */
memset(cpi->gf_active_flags, 1, (cm->mb_rows * cm->mb_cols));
vpx_memset(cpi->gf_active_flags, 1, (cm->mb_rows * cm->mb_cols));
cpi->gf_active_count = cm->mb_rows * cm->mb_cols;
/* this frame refreshes means next frames don't unless specified by user */
@@ -2916,7 +2916,7 @@ static void update_golden_frame_stats(VP8_COMP *cpi)
}
/* Update data structure that monitors level of reference to last GF */
memset(cpi->gf_active_flags, 1, (cm->mb_rows * cm->mb_cols));
vpx_memset(cpi->gf_active_flags, 1, (cm->mb_rows * cm->mb_cols));
cpi->gf_active_count = cm->mb_rows * cm->mb_cols;
/* this frame refreshes means next frames don't unless specified by
@@ -3830,9 +3830,9 @@ static void encode_frame_to_data_rate
}
// Reset the zero_last counter to 0 on key frame.
memset(cpi->consec_zero_last, 0, cm->mb_rows * cm->mb_cols);
memset(cpi->consec_zero_last_mvbias, 0,
(cpi->common.mb_rows * cpi->common.mb_cols));
vpx_memset(cpi->consec_zero_last, 0, cm->mb_rows * cm->mb_cols);
vpx_memset(cpi->consec_zero_last_mvbias, 0,
(cpi->common.mb_rows * cpi->common.mb_cols));
}
#if 0
@@ -4362,9 +4362,9 @@ static void encode_frame_to_data_rate
disable_segmentation(cpi);
}
// Reset the zero_last counter to 0 on key frame.
memset(cpi->consec_zero_last, 0, cm->mb_rows * cm->mb_cols);
memset(cpi->consec_zero_last_mvbias, 0,
(cpi->common.mb_rows * cpi->common.mb_cols));
vpx_memset(cpi->consec_zero_last, 0, cm->mb_rows * cm->mb_cols);
vpx_memset(cpi->consec_zero_last_mvbias, 0,
(cpi->common.mb_rows * cpi->common.mb_cols));
vp8_set_quantizer(cpi, Q);
}
@@ -4387,7 +4387,7 @@ static void encode_frame_to_data_rate
if (cm->refresh_entropy_probs == 0)
{
/* save a copy for later refresh */
memcpy(&cm->lfc, &cm->fc, sizeof(cm->fc));
vpx_memcpy(&cm->lfc, &cm->fc, sizeof(cm->fc));
}
vp8_update_coef_context(cpi);
@@ -5613,19 +5613,19 @@ int vp8_get_compressed_data(VP8_COMP *cpi, unsigned int *frame_flags, unsigned l
if (cm->refresh_entropy_probs == 0)
{
memcpy(&cm->fc, &cm->lfc, sizeof(cm->fc));
vpx_memcpy(&cm->fc, &cm->lfc, sizeof(cm->fc));
}
/* Save the contexts separately for alt ref, gold and last. */
/* (TODO jbb -> Optimize this with pointers to avoid extra copies. ) */
if(cm->refresh_alt_ref_frame)
memcpy(&cpi->lfc_a, &cm->fc, sizeof(cm->fc));
vpx_memcpy(&cpi->lfc_a, &cm->fc, sizeof(cm->fc));
if(cm->refresh_golden_frame)
memcpy(&cpi->lfc_g, &cm->fc, sizeof(cm->fc));
vpx_memcpy(&cpi->lfc_g, &cm->fc, sizeof(cm->fc));
if(cm->refresh_last_frame)
memcpy(&cpi->lfc_n, &cm->fc, sizeof(cm->fc));
vpx_memcpy(&cpi->lfc_n, &cm->fc, sizeof(cm->fc));
/* if it's a dropped frame honor the requests on subsequent frames */
if (*size > 0)
@@ -5934,7 +5934,7 @@ int vp8_set_active_map(VP8_COMP *cpi, unsigned char *map, unsigned int rows, uns
{
if (map)
{
memcpy(cpi->active_map, map, rows * cols);
vpx_memcpy(cpi->active_map, map, rows * cols);
cpi->active_map_enabled = 1;
}
else

View File

@@ -862,8 +862,8 @@ void vp8_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
mode_mv = mode_mv_sb[sign_bias];
best_ref_mv.as_int = 0;
memset(mode_mv_sb, 0, sizeof(mode_mv_sb));
memset(&best_mbmode, 0, sizeof(best_mbmode));
vpx_memset(mode_mv_sb, 0, sizeof(mode_mv_sb));
vpx_memset(&best_mbmode, 0, sizeof(best_mbmode));
/* Setup search priorities */
#if CONFIG_MULTI_RES_ENCODING
@@ -1348,8 +1348,8 @@ void vp8_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
*returndistortion = distortion2;
best_rd_sse = sse;
best_rd = this_rd;
memcpy(&best_mbmode, &x->e_mbd.mode_info_context->mbmi,
sizeof(MB_MODE_INFO));
vpx_memcpy(&best_mbmode, &x->e_mbd.mode_info_context->mbmi,
sizeof(MB_MODE_INFO));
/* Testing this mode gave rise to an improvement in best error
* score. Lower threshold a bit for next time
@@ -1487,8 +1487,8 @@ void vp8_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
if (this_rd < best_rd)
{
memcpy(&best_mbmode, &x->e_mbd.mode_info_context->mbmi,
sizeof(MB_MODE_INFO));
vpx_memcpy(&best_mbmode, &x->e_mbd.mode_info_context->mbmi,
sizeof(MB_MODE_INFO));
}
}
@@ -1512,8 +1512,8 @@ void vp8_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
/* set to the best mb mode, this copy can be skip if x->skip since it
* already has the right content */
if (!x->skip)
memcpy(&x->e_mbd.mode_info_context->mbmi, &best_mbmode,
sizeof(MB_MODE_INFO));
vpx_memcpy(&x->e_mbd.mode_info_context->mbmi, &best_mbmode,
sizeof(MB_MODE_INFO));
if (best_mbmode.mode <= B_PRED)
{

View File

@@ -49,7 +49,7 @@ static void yv12_copy_partial_frame(YV12_BUFFER_CONFIG *src_ybc,
src_y = src_ybc->y_buffer + yoffset;
dst_y = dst_ybc->y_buffer + yoffset;
memcpy(dst_y, src_y, ystride * linestocopy);
vpx_memcpy(dst_y, src_y, ystride * linestocopy);
}
static int calc_partial_ssl_err(YV12_BUFFER_CONFIG *source,
@@ -142,7 +142,7 @@ void vp8cx_pick_filter_level_fast(YV12_BUFFER_CONFIG *sd, VP8_COMP *cpi)
int min_filter_level = get_min_filter_level(cpi, cm->base_qindex);
int max_filter_level = get_max_filter_level(cpi, cm->base_qindex);
int filt_val;
int best_filt_val;
int best_filt_val = cm->filter_level;
YV12_BUFFER_CONFIG * saved_frame = cm->frame_to_show;
/* Replace unfiltered frame buffer with a new one */
@@ -274,7 +274,8 @@ void vp8cx_pick_filter_level(YV12_BUFFER_CONFIG *sd, VP8_COMP *cpi)
int filter_step;
int filt_high = 0;
int filt_mid;
/* Start search at previous frame filter level */
int filt_mid = cm->filter_level;
int filt_low = 0;
int filt_best;
int filt_direction = 0;
@@ -286,7 +287,7 @@ void vp8cx_pick_filter_level(YV12_BUFFER_CONFIG *sd, VP8_COMP *cpi)
YV12_BUFFER_CONFIG * saved_frame = cm->frame_to_show;
memset(ss_err, 0, sizeof(ss_err));
vpx_memset(ss_err, 0, sizeof(ss_err));
/* Replace unfiltered frame buffer with a new one */
cm->frame_to_show = &cpi->pick_lf_lvl_frame;
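Note: both fixes above seed the filter-level search with the previous frame's level instead of an uninitialized value. The picker then refines that seed with a halving-step search; a sketch under assumed names (the real error metric is SSE of the filtered frame against the source):

static int pick_filter_level(int prev_level, int min_lvl, int max_lvl,
                             long long (*err_at)(int level)) {
    int mid = prev_level;            /* seed: previous frame's level */
    long long best = err_at(mid);
    int step;
    for (step = (max_lvl - min_lvl) / 4; step > 0; step /= 2) {
        int lo = mid - step, hi = mid + step;
        if (lo >= min_lvl) {
            long long e = err_at(lo);
            if (e < best) { best = e; mid = lo; continue; }
        }
        if (hi <= max_lvl) {
            long long e = err_at(hi);
            if (e < best) { best = e; mid = hi; }
        }
    }
    return mid;
}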

View File

@@ -0,0 +1,160 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vp8/encoder/variance.h"
#include "vp8/encoder/onyx_int.h"
SADFunction *vp8_sad16x16;
SADFunction *vp8_sad16x8;
SADFunction *vp8_sad8x16;
SADFunction *vp8_sad8x8;
SADFunction *vp8_sad4x4;
variance_function *vp8_variance4x4;
variance_function *vp8_variance8x8;
variance_function *vp8_variance8x16;
variance_function *vp8_variance16x8;
variance_function *vp8_variance16x16;
variance_function *vp8_mse16x16;
sub_pixel_variance_function *vp8_sub_pixel_variance4x4;
sub_pixel_variance_function *vp8_sub_pixel_variance8x8;
sub_pixel_variance_function *vp8_sub_pixel_variance8x16;
sub_pixel_variance_function *vp8_sub_pixel_variance16x8;
sub_pixel_variance_function *vp8_sub_pixel_variance16x16;
int (*vp8_block_error)(short *coeff, short *dqcoeff);
int (*vp8_mbblock_error)(MACROBLOCK *mb, int dc);
int (*vp8_mbuverror)(MACROBLOCK *mb);
unsigned int (*vp8_get_mb_ss)(short *);
void (*vp8_short_fdct4x4)(short *input, short *output, int pitch);
void (*vp8_short_fdct8x4)(short *input, short *output, int pitch);
void (*vp8_fast_fdct4x4)(short *input, short *output, int pitch);
void (*vp8_fast_fdct8x4)(short *input, short *output, int pitch);
void (*short_walsh4x4)(short *input, short *output, int pitch);
void (*vp8_subtract_b)(BLOCK *be, BLOCKD *bd, int pitch);
void (*vp8_subtract_mby)(short *diff, unsigned char *src, unsigned char *pred, int stride);
void (*vp8_subtract_mbuv)(short *diff, unsigned char *usrc, unsigned char *vsrc, unsigned char *pred, int stride);
void (*vp8_fast_quantize_b)(BLOCK *b, BLOCKD *d);
unsigned int (*vp8_get4x4sse_cs)(unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr, int recon_stride);
// c imports
extern int block_error_c(short *coeff, short *dqcoeff);
extern int vp8_mbblock_error_c(MACROBLOCK *mb, int dc);
extern int vp8_mbuverror_c(MACROBLOCK *mb);
extern unsigned int vp8_get8x8var_c(unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr, int recon_stride, unsigned int *SSE, int *Sum);
extern void short_fdct4x4_c(short *input, short *output, int pitch);
extern void short_fdct8x4_c(short *input, short *output, int pitch);
extern void vp8_short_walsh4x4_c(short *input, short *output, int pitch);
extern void vp8_subtract_b_c(BLOCK *be, BLOCKD *bd, int pitch);
extern void subtract_mby_c(short *diff, unsigned char *src, unsigned char *pred, int stride);
extern void subtract_mbuv_c(short *diff, unsigned char *usrc, unsigned char *vsrc, unsigned char *pred, int stride);
extern void vp8_fast_quantize_b_c(BLOCK *b, BLOCKD *d);
extern SADFunction sad16x16_c;
extern SADFunction sad16x8_c;
extern SADFunction sad8x16_c;
extern SADFunction sad8x8_c;
extern SADFunction sad4x4_c;
extern variance_function variance16x16_c;
extern variance_function variance8x16_c;
extern variance_function variance16x8_c;
extern variance_function variance8x8_c;
extern variance_function variance4x4_c;
extern variance_function mse16x16_c;
extern sub_pixel_variance_function sub_pixel_variance4x4_c;
extern sub_pixel_variance_function sub_pixel_variance8x8_c;
extern sub_pixel_variance_function sub_pixel_variance8x16_c;
extern sub_pixel_variance_function sub_pixel_variance16x8_c;
extern sub_pixel_variance_function sub_pixel_variance16x16_c;
extern unsigned int vp8_get_mb_ss_c(short *);
extern unsigned int vp8_get4x4sse_cs_c(unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr, int recon_stride);
// ppc
extern int vp8_block_error_ppc(short *coeff, short *dqcoeff);
extern void vp8_short_fdct4x4_ppc(short *input, short *output, int pitch);
extern void vp8_short_fdct8x4_ppc(short *input, short *output, int pitch);
extern void vp8_subtract_mby_ppc(short *diff, unsigned char *src, unsigned char *pred, int stride);
extern void vp8_subtract_mbuv_ppc(short *diff, unsigned char *usrc, unsigned char *vsrc, unsigned char *pred, int stride);
extern SADFunction vp8_sad16x16_ppc;
extern SADFunction vp8_sad16x8_ppc;
extern SADFunction vp8_sad8x16_ppc;
extern SADFunction vp8_sad8x8_ppc;
extern SADFunction vp8_sad4x4_ppc;
extern variance_function vp8_variance16x16_ppc;
extern variance_function vp8_variance8x16_ppc;
extern variance_function vp8_variance16x8_ppc;
extern variance_function vp8_variance8x8_ppc;
extern variance_function vp8_variance4x4_ppc;
extern variance_function vp8_mse16x16_ppc;
extern sub_pixel_variance_function vp8_sub_pixel_variance4x4_ppc;
extern sub_pixel_variance_function vp8_sub_pixel_variance8x8_ppc;
extern sub_pixel_variance_function vp8_sub_pixel_variance8x16_ppc;
extern sub_pixel_variance_function vp8_sub_pixel_variance16x8_ppc;
extern sub_pixel_variance_function vp8_sub_pixel_variance16x16_ppc;
extern unsigned int vp8_get8x8var_ppc(unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr, int recon_stride, unsigned int *SSE, int *Sum);
extern unsigned int vp8_get16x16var_ppc(unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr, int recon_stride, unsigned int *SSE, int *Sum);
void vp8_cmachine_specific_config(void)
{
// Pure C:
vp8_mbuverror = vp8_mbuverror_c;
vp8_fast_quantize_b = vp8_fast_quantize_b_c;
vp8_short_fdct4x4 = vp8_short_fdct4x4_ppc;
vp8_short_fdct8x4 = vp8_short_fdct8x4_ppc;
vp8_fast_fdct4x4 = vp8_short_fdct4x4_ppc;
vp8_fast_fdct8x4 = vp8_short_fdct8x4_ppc;
short_walsh4x4 = vp8_short_walsh4x4_c;
vp8_variance4x4 = vp8_variance4x4_ppc;
vp8_variance8x8 = vp8_variance8x8_ppc;
vp8_variance8x16 = vp8_variance8x16_ppc;
vp8_variance16x8 = vp8_variance16x8_ppc;
vp8_variance16x16 = vp8_variance16x16_ppc;
vp8_mse16x16 = vp8_mse16x16_ppc;
vp8_sub_pixel_variance4x4 = vp8_sub_pixel_variance4x4_ppc;
vp8_sub_pixel_variance8x8 = vp8_sub_pixel_variance8x8_ppc;
vp8_sub_pixel_variance8x16 = vp8_sub_pixel_variance8x16_ppc;
vp8_sub_pixel_variance16x8 = vp8_sub_pixel_variance16x8_ppc;
vp8_sub_pixel_variance16x16 = vp8_sub_pixel_variance16x16_ppc;
vp8_get_mb_ss = vp8_get_mb_ss_c;
vp8_get4x4sse_cs = vp8_get4x4sse_cs_c;
vp8_sad16x16 = vp8_sad16x16_ppc;
vp8_sad16x8 = vp8_sad16x8_ppc;
vp8_sad8x16 = vp8_sad8x16_ppc;
vp8_sad8x8 = vp8_sad8x8_ppc;
vp8_sad4x4 = vp8_sad4x4_ppc;
vp8_block_error = vp8_block_error_ppc;
vp8_mbblock_error = vp8_mbblock_error_c;
vp8_subtract_b = vp8_subtract_b_c;
vp8_subtract_mby = vp8_subtract_mby_ppc;
vp8_subtract_mbuv = vp8_subtract_mbuv_ppc;
}
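Note: this file wires the function pointers declared at its top to either the C or the AltiVec kernels. A minimal version of the dispatch pattern, with a hypothetical capability flag (the file above assigns unconditionally):

typedef int (*block_error_fn)(short *coeff, short *dqcoeff);

extern int vp8_block_error_c(short *coeff, short *dqcoeff);
extern int vp8_block_error_ppc(short *coeff, short *dqcoeff);

block_error_fn vp8_block_error;

void machine_specific_config(int have_altivec) {  /* flag is assumed */
    vp8_block_error = have_altivec ? vp8_block_error_ppc
                                   : vp8_block_error_c;
}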

View File

@@ -0,0 +1,153 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl vp8_subtract_mbuv_ppc
.globl vp8_subtract_mby_ppc
;# r3 short *diff
;# r4 unsigned char *usrc
;# r5 unsigned char *vsrc
;# r6 unsigned char *pred
;# r7 int stride
vp8_subtract_mbuv_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xf000
mtspr 256, r12 ;# set VRSAVE
li r9, 256
add r3, r3, r9
add r3, r3, r9
add r6, r6, r9
li r10, 16
li r9, 4
mtctr r9
vspltisw v0, 0
mbu_loop:
lvsl v5, 0, r4 ;# permutate value for alignment
lvx v1, 0, r4 ;# src
lvx v2, 0, r6 ;# pred
add r4, r4, r7
addi r6, r6, 16
vperm v1, v1, v0, v5
vmrghb v3, v0, v1 ;# unpack high src to short
vmrghb v4, v0, v2 ;# unpack high pred to short
lvsl v5, 0, r4 ;# permutate value for alignment
lvx v1, 0, r4 ;# src
add r4, r4, r7
vsubshs v3, v3, v4
stvx v3, 0, r3 ;# store out diff
vperm v1, v1, v0, v5
vmrghb v3, v0, v1 ;# unpack high src to short
vmrglb v4, v0, v2 ;# unpack high pred to short
vsubshs v3, v3, v4
stvx v3, r10, r3 ;# store out diff
addi r3, r3, 32
bdnz mbu_loop
mtctr r9
mbv_loop:
lvsl v5, 0, r5 ;# permutate value for alignment
lvx v1, 0, r5 ;# src
lvx v2, 0, r6 ;# pred
add r5, r5, r7
addi r6, r6, 16
vperm v1, v1, v0, v5
vmrghb v3, v0, v1 ;# unpack high src to short
vmrghb v4, v0, v2 ;# unpack high pred to short
lvsl v5, 0, r5 ;# permutate value for alignment
lvx v1, 0, r5 ;# src
add r5, r5, r7
vsubshs v3, v3, v4
stvx v3, 0, r3 ;# store out diff
vperm v1, v1, v0, v5
vmrghb v3, v0, v1 ;# unpack high src to short
vmrglb v4, v0, v2 ;# unpack high pred to short
vsubshs v3, v3, v4
stvx v3, r10, r3 ;# store out diff
addi r3, r3, 32
bdnz mbv_loop
mtspr 256, r11 ;# reset old VRSAVE
blr
;# r3 short *diff
;# r4 unsigned char *src
;# r5 unsigned char *pred
;# r6 int stride
vp8_subtract_mby_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xf800
mtspr 256, r12 ;# set VRSAVE
li r10, 16
mtctr r10
vspltisw v0, 0
mby_loop:
lvx v1, 0, r4 ;# src
lvx v2, 0, r5 ;# pred
add r4, r4, r6
addi r5, r5, 16
vmrghb v3, v0, v1 ;# unpack high src to short
vmrghb v4, v0, v2 ;# unpack high pred to short
vsubshs v3, v3, v4
stvx v3, 0, r3 ;# store out diff
vmrglb v3, v0, v1 ;# unpack low src to short
vmrglb v4, v0, v2 ;# unpack low pred to short
vsubshs v3, v3, v4
stvx v3, r10, r3 ;# store out diff
addi r3, r3, 32
bdnz mby_loop
mtspr 256, r11 ;# reset old VRSAVE
blr
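Note: for readers who don't speak AltiVec, a C reference matching what vp8_subtract_mby_ppc computes above — the 16x16 luma residual, with the prediction buffer at a fixed stride of 16 (this mirrors the libvpx C version up to naming):

void vp8_subtract_mby_c(short *diff, unsigned char *src,
                        unsigned char *pred, int stride) {
    int r, c;
    for (r = 0; r < 16; r++) {
        for (c = 0; c < 16; c++)
            diff[c] = src[c] - pred[c];
        diff += 16;
        pred += 16;
        src += stride;
    }
}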

View File

@@ -0,0 +1,205 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl vp8_short_fdct4x4_ppc
.globl vp8_short_fdct8x4_ppc
.macro load_c V, LABEL, OFF, R0, R1
lis \R0, \LABEL@ha
la \R1, \LABEL@l(\R0)
lvx \V, \OFF, \R1
.endm
;# Forward and inverse DCTs are nearly identical; only differences are
;# in normalization (fwd is twice unitary, inv is half unitary)
;# and that they are of course transposes of each other.
;#
;# The following three accomplish most of implementation and
;# are used only by ppc_idct.c and ppc_fdct.c.
.macro prologue
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xfffc
mtspr 256, r12 ;# set VRSAVE
stwu r1,-32(r1) ;# create space on the stack
li r6, 16
load_c v0, dct_tab, 0, r9, r10
lvx v1, r6, r10
addi r10, r10, 32
lvx v2, 0, r10
lvx v3, r6, r10
load_c v4, ppc_dctperm_tab, 0, r9, r10
load_c v5, ppc_dctperm_tab, r6, r9, r10
load_c v6, round_tab, 0, r10, r9
.endm
.macro epilogue
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
.endm
;# Do horiz xf on two rows of coeffs v8 = a0 a1 a2 a3 b0 b1 b2 b3.
;# a/A are the even rows 0,2 b/B are the odd rows 1,3
;# For fwd transform, indices are horizontal positions, then frequencies.
;# For inverse transform, frequencies then positions.
;# The two resulting A0..A3 B0..B3 are later combined
;# and vertically transformed.
.macro two_rows_horiz Dst
vperm v9, v8, v8, v4 ;# v9 = a2 a3 a0 a1 b2 b3 b0 b1
vmsumshm v10, v0, v8, v6
vmsumshm v10, v1, v9, v10
vsraw v10, v10, v7 ;# v10 = A0 A1 B0 B1
vmsumshm v11, v2, v8, v6
vmsumshm v11, v3, v9, v11
vsraw v11, v11, v7 ;# v11 = A2 A3 B2 B3
vpkuwum v10, v10, v11 ;# v10 = A0 A1 B0 B1 A2 A3 B2 B3
vperm \Dst, v10, v10, v5 ;# Dest = A0 B0 A1 B1 A2 B2 A3 B3
.endm
;# Vertical xf on two rows. DCT values in comments are for inverse transform;
;# forward transform uses transpose.
.macro two_rows_vert Ceven, Codd
vspltw v8, \Ceven, 0 ;# v8 = c00 c10 or c02 c12 four times
vspltw v9, \Codd, 0 ;# v9 = c20 c30 or c22 c32 ""
vmsumshm v8, v8, v12, v6
vmsumshm v8, v9, v13, v8
vsraw v10, v8, v7
vspltw v8, \Codd, 1 ;# v8 = c01 c11 or c03 c13
vspltw v9, \Ceven, 1 ;# v9 = c21 c31 or c23 c33
vmsumshm v8, v8, v12, v6
vmsumshm v8, v9, v13, v8
vsraw v8, v8, v7
vpkuwum v8, v10, v8 ;# v8 = rows 0,1 or 2,3
.endm
.macro two_rows_h Dest
stw r0, 0(r8)
lwz r0, 4(r3)
stw r0, 4(r8)
lwzux r0, r3,r5
stw r0, 8(r8)
lwz r0, 4(r3)
stw r0, 12(r8)
lvx v8, 0,r8
two_rows_horiz \Dest
.endm
.align 2
;# r3 short *input
;# r4 short *output
;# r5 int pitch
vp8_short_fdct4x4_ppc:
prologue
vspltisw v7, 14 ;# == 14, fits in 5 signed bits
addi r8, r1, 0
lwz r0, 0(r3)
two_rows_h v12 ;# v12 = H00 H10 H01 H11 H02 H12 H03 H13
lwzux r0, r3, r5
two_rows_h v13 ;# v13 = H20 H30 H21 H31 H22 H32 H23 H33
lvx v6, r6, r9 ;# v6 = Vround
vspltisw v7, -16 ;# == 16 == -16, only low 5 bits matter
two_rows_vert v0, v1
stvx v8, 0, r4
two_rows_vert v2, v3
stvx v8, r6, r4
epilogue
blr
.align 2
;# r3 short *input
;# r4 short *output
;# r5 int pitch
vp8_short_fdct8x4_ppc:
prologue
vspltisw v7, 14 ;# == 14, fits in 5 signed bits
addi r8, r1, 0
addi r10, r3, 0
lwz r0, 0(r3)
two_rows_h v12 ;# v12 = H00 H10 H01 H11 H02 H12 H03 H13
lwzux r0, r3, r5
two_rows_h v13 ;# v13 = H20 H30 H21 H31 H22 H32 H23 H33
lvx v6, r6, r9 ;# v6 = Vround
vspltisw v7, -16 ;# == 16 == -16, only low 5 bits matter
two_rows_vert v0, v1
stvx v8, 0, r4
two_rows_vert v2, v3
stvx v8, r6, r4
;# Next block
addi r3, r10, 8
addi r4, r4, 32
lvx v6, 0, r9 ;# v6 = Hround
vspltisw v7, 14 ;# == 14, fits in 5 signed bits
addi r8, r1, 0
lwz r0, 0(r3)
two_rows_h v12 ;# v12 = H00 H10 H01 H11 H02 H12 H03 H13
lwzux r0, r3, r5
two_rows_h v13 ;# v13 = H20 H30 H21 H31 H22 H32 H23 H33
lvx v6, r6, r9 ;# v6 = Vround
vspltisw v7, -16 ;# == 16 == -16, only low 5 bits matter
two_rows_vert v0, v1
stvx v8, 0, r4
two_rows_vert v2, v3
stvx v8, r6, r4
epilogue
blr
.data
.align 4
ppc_dctperm_tab:
.byte 4,5,6,7, 0,1,2,3, 12,13,14,15, 8,9,10,11
.byte 0,1,4,5, 2,3,6,7, 8,9,12,13, 10,11,14,15
.align 4
dct_tab:
.short 23170, 23170,-12540,-30274, 23170, 23170,-12540,-30274
.short 23170, 23170, 30274, 12540, 23170, 23170, 30274, 12540
.short 23170,-23170, 30274,-12540, 23170,-23170, 30274,-12540
.short -23170, 23170, 12540,-30274,-23170, 23170, 12540,-30274
.align 4
round_tab:
.long (1 << (14-1)), (1 << (14-1)), (1 << (14-1)), (1 << (14-1))
.long (1 << (16-1)), (1 << (16-1)), (1 << (16-1)), (1 << (16-1))
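Note: the round_tab entries are plain round-to-nearest offsets for the two arithmetic shifts in the transform (by 14 horizontally and 16 vertically): add half of 2^shift before shifting down. In the asm the offset is folded in as the vmsumshm accumulator (v6); a scalar illustration:

static int round_shift(int v, int bits) {
    return (v + (1 << (bits - 1))) >> bits;  /* e.g. bits = 14 or 16 */
}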

View File

@@ -0,0 +1,51 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
.globl vp8_block_error_ppc
.align 2
;# r3 short *Coeff
;# r4 short *dqcoeff
vp8_block_error_ppc:
mfspr r11, 256 ;# get old VRSAVE
oris r12, r11, 0xf800
mtspr 256, r12 ;# set VRSAVE
stwu r1,-32(r1) ;# create space on the stack
stw r5, 12(r1) ;# transfer dc to vector register
lvx v0, 0, r3 ;# Coeff
lvx v1, 0, r4 ;# dqcoeff
li r10, 16
vspltisw v3, 0
vsubshs v0, v0, v1
vmsumshm v2, v0, v0, v3 ;# multiply differences
lvx v0, r10, r3 ;# Coeff
lvx v1, r10, r4 ;# dqcoeff
vsubshs v0, v0, v1
vmsumshm v1, v0, v0, v2 ;# multiply differences
vsumsws v1, v1, v3 ;# sum up
stvx v1, 0, r1
lwz r3, 12(r1) ;# return value
addi r1, r1, 32 ;# recover stack
mtspr 256, r11 ;# reset old VRSAVE
blr
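Note: a C reference for what vp8_block_error_ppc computes above — the sum of squared differences between original and dequantized coefficients of one 4x4 block (16 values), as in the libvpx C version:

int vp8_block_error_c(short *coeff, short *dqcoeff) {
    int i, error = 0;
    for (i = 0; i < 16; i++) {
        int diff = coeff[i] - dqcoeff[i];
        error += diff * diff;
    }
    return error;
}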

View File

@@ -65,8 +65,8 @@ void vp8_regular_quantize_b_c(BLOCK *b, BLOCKD *d)
short *dequant_ptr = d->dequant;
short zbin_oq_value = b->zbin_extra;
memset(qcoeff_ptr, 0, 32);
memset(dqcoeff_ptr, 0, 32);
vpx_memset(qcoeff_ptr, 0, 32);
vpx_memset(dqcoeff_ptr, 0, 32);
eob = -1;

View File

@@ -296,7 +296,7 @@ void vp8_setup_key_frame(VP8_COMP *cpi)
vp8_default_coef_probs(& cpi->common);
memcpy(cpi->common.fc.mvc, vp8_default_mv_context, sizeof(vp8_default_mv_context));
vpx_memcpy(cpi->common.fc.mvc, vp8_default_mv_context, sizeof(vp8_default_mv_context));
{
int flag[2] = {1, 1};
vp8_build_component_cost_table(cpi->mb.mvcost, (const MV_CONTEXT *) cpi->common.fc.mvc, flag);
@@ -305,9 +305,9 @@ void vp8_setup_key_frame(VP8_COMP *cpi)
/* Make sure we initialize separate contexts for altref,gold, and normal.
* TODO shouldn't need 3 different copies of structure to do this!
*/
memcpy(&cpi->lfc_a, &cpi->common.fc, sizeof(cpi->common.fc));
memcpy(&cpi->lfc_g, &cpi->common.fc, sizeof(cpi->common.fc));
memcpy(&cpi->lfc_n, &cpi->common.fc, sizeof(cpi->common.fc));
vpx_memcpy(&cpi->lfc_a, &cpi->common.fc, sizeof(cpi->common.fc));
vpx_memcpy(&cpi->lfc_g, &cpi->common.fc, sizeof(cpi->common.fc));
vpx_memcpy(&cpi->lfc_n, &cpi->common.fc, sizeof(cpi->common.fc));
cpi->common.filter_level = cpi->common.base_qindex * 3 / 8 ;

View File

@@ -555,8 +555,8 @@ static int vp8_rdcost_mby(MACROBLOCK *mb)
ENTROPY_CONTEXT *ta;
ENTROPY_CONTEXT *tl;
memcpy(&t_above, mb->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(&t_left, mb->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_above, mb->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_left, mb->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
ta = (ENTROPY_CONTEXT *)&t_above;
tl = (ENTROPY_CONTEXT *)&t_left;
@@ -691,7 +691,7 @@ static int rd_pick_intra4x4block(
*a = tempa;
*l = templ;
copy_predictor(best_predictor, b->predictor);
memcpy(best_dqcoeff, b->dqcoeff, 32);
vpx_memcpy(best_dqcoeff, b->dqcoeff, 32);
}
}
b->bmi.as_mode = *best_mode;
@@ -715,8 +715,8 @@ static int rd_pick_intra4x4mby_modes(MACROBLOCK *mb, int *Rate,
ENTROPY_CONTEXT *tl;
const int *bmode_costs;
memcpy(&t_above, mb->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(&t_left, mb->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_above, mb->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_left, mb->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
ta = (ENTROPY_CONTEXT *)&t_above;
tl = (ENTROPY_CONTEXT *)&t_left;
@@ -820,8 +820,8 @@ static int rd_cost_mbuv(MACROBLOCK *mb)
ENTROPY_CONTEXT *ta;
ENTROPY_CONTEXT *tl;
memcpy(&t_above, mb->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(&t_left, mb->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_above, mb->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_left, mb->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
ta = (ENTROPY_CONTEXT *)&t_above;
tl = (ENTROPY_CONTEXT *)&t_left;
@@ -1128,8 +1128,8 @@ static void rd_check_segment(VP8_COMP *cpi, MACROBLOCK *x,
ENTROPY_CONTEXT *ta_b;
ENTROPY_CONTEXT *tl_b;
memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));
ta = (ENTROPY_CONTEXT *)&t_above;
tl = (ENTROPY_CONTEXT *)&t_left;
@@ -1172,8 +1172,8 @@ static void rd_check_segment(VP8_COMP *cpi, MACROBLOCK *x,
ENTROPY_CONTEXT *ta_s;
ENTROPY_CONTEXT *tl_s;
memcpy(&t_above_s, &t_above, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(&t_left_s, &t_left, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_above_s, &t_above, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(&t_left_s, &t_left, sizeof(ENTROPY_CONTEXT_PLANES));
ta_s = (ENTROPY_CONTEXT *)&t_above_s;
tl_s = (ENTROPY_CONTEXT *)&t_left_s;
@@ -1329,14 +1329,14 @@ static void rd_check_segment(VP8_COMP *cpi, MACROBLOCK *x,
mode_selected = this_mode;
best_label_rd = this_rd;
memcpy(ta_b, ta_s, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(tl_b, tl_s, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(ta_b, ta_s, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(tl_b, tl_s, sizeof(ENTROPY_CONTEXT_PLANES));
}
} /*for each 4x4 mode*/
memcpy(ta, ta_b, sizeof(ENTROPY_CONTEXT_PLANES));
memcpy(tl, tl_b, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(ta, ta_b, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memcpy(tl, tl_b, sizeof(ENTROPY_CONTEXT_PLANES));
labels2mode(x, labels, i, mode_selected, &mode_mv[mode_selected],
bsi->ref_mv, x->mvcost);
@@ -1392,7 +1392,7 @@ static int vp8_rd_pick_best_mbsegmentation(VP8_COMP *cpi, MACROBLOCK *x,
int i;
BEST_SEG_INFO bsi;
memset(&bsi, 0, sizeof(bsi));
vpx_memset(&bsi, 0, sizeof(bsi));
bsi.segment_rd = best_rd;
bsi.ref_mv = best_ref_mv;
@@ -1661,6 +1661,7 @@ void vp8_mv_pred
mv.as_mv.row = mvx[vcnt/2];
mv.as_mv.col = mvy[vcnt/2];
find = 1;
/* sr is set to 0 to allow calling function to decide the search
* range.
*/
@@ -1925,8 +1926,8 @@ static void update_best_mode(BEST_MODE* best_mode, int this_rd,
(rd->distortion2-rd->distortion_uv));
best_mode->rd = this_rd;
memcpy(&best_mode->mbmode, &x->e_mbd.mode_info_context->mbmi, sizeof(MB_MODE_INFO));
memcpy(&best_mode->partition, x->partition_info, sizeof(PARTITION_INFO));
vpx_memcpy(&best_mode->mbmode, &x->e_mbd.mode_info_context->mbmi, sizeof(MB_MODE_INFO));
vpx_memcpy(&best_mode->partition, x->partition_info, sizeof(PARTITION_INFO));
if ((this_mode == B_PRED) || (this_mode == SPLITMV))
{
@@ -1988,9 +1989,9 @@ void vp8_rd_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
best_mode.rd = INT_MAX;
best_mode.yrd = INT_MAX;
best_mode.intra_rd = INT_MAX;
memset(mode_mv_sb, 0, sizeof(mode_mv_sb));
memset(&best_mode.mbmode, 0, sizeof(best_mode.mbmode));
memset(&best_mode.bmodes, 0, sizeof(best_mode.bmodes));
vpx_memset(mode_mv_sb, 0, sizeof(mode_mv_sb));
vpx_memset(&best_mode.mbmode, 0, sizeof(best_mode.mbmode));
vpx_memset(&best_mode.bmodes, 0, sizeof(best_mode.bmodes));
/* Setup search priorities */
get_reference_search_order(cpi, ref_frame_map);
@@ -2292,6 +2293,7 @@ void vp8_rd_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
mode_mv[NEWMV].as_int = d->bmi.mv.as_int;
/* Further step/diamond searches as necessary */
n = 0;
further_steps = (cpi->sf.max_step_search_steps - 1) - step_param;
n = num00;
@@ -2558,6 +2560,8 @@ void vp8_rd_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
intra_rd_penalty, cpi, x);
if (this_rd < best_mode.rd || x->skip)
{
/* Note index of best mode so far */
best_mode_index = mode_index;
*returnrate = rd.rate2;
*returndistortion = rd.distortion2;
update_best_mode(&best_mode, this_rd, &rd, other_cost, x);
@@ -2582,7 +2586,7 @@ void vp8_rd_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
/* macroblock modes */
memcpy(&x->e_mbd.mode_info_context->mbmi, &best_mode.mbmode, sizeof(MB_MODE_INFO));
vpx_memcpy(&x->e_mbd.mode_info_context->mbmi, &best_mode.mbmode, sizeof(MB_MODE_INFO));
if (best_mode.mbmode.mode == B_PRED)
{
@@ -2595,7 +2599,7 @@ void vp8_rd_pick_inter_mode(VP8_COMP *cpi, MACROBLOCK *x, int recon_yoffset,
for (i = 0; i < 16; i++)
xd->mode_info_context->bmi[i].mv.as_int = best_mode.bmodes[i].mv.as_int;
memcpy(x->partition_info, &best_mode.partition, sizeof(PARTITION_INFO));
vpx_memcpy(x->partition_info, &best_mode.partition, sizeof(PARTITION_INFO));
x->e_mbd.mode_info_context->mbmi.mv.as_int =
x->partition_info->bmi[15].mv.as_int;

View File

@@ -23,7 +23,7 @@ void vp8_update_gf_useage_maps(VP8_COMP *cpi, VP8_COMMON *cm, MACROBLOCK *x)
if ((cm->frame_type == KEY_FRAME) || (cm->refresh_golden_frame))
{
/* Reset Gf useage monitors */
memset(cpi->gf_active_flags, 1, (cm->mb_rows * cm->mb_cols));
vpx_memset(cpi->gf_active_flags, 1, (cm->mb_rows * cm->mb_cols));
cpi->gf_active_count = cm->mb_rows * cm->mb_cols;
}
else

View File

@@ -274,8 +274,8 @@ static void vp8_temporal_filter_iterate_c
int i, j, k;
int stride;
memset(accumulator, 0, 384*sizeof(unsigned int));
memset(count, 0, 384*sizeof(unsigned short));
vpx_memset(accumulator, 0, 384*sizeof(unsigned int));
vpx_memset(count, 0, 384*sizeof(unsigned short));
#if ALT_REF_MC_ENABLED
cpi->mb.mv_col_min = -((mb_col * 16) + (16 - 5));
@@ -502,7 +502,7 @@ void vp8_temporal_filter_prepare_c
start_frame = distance + frames_to_blur_forward;
/* Setup frame pointers, NULL indicates frame not included in filter */
memset(cpi->frames, 0, max_frames*sizeof(YV12_BUFFER_CONFIG *));
vpx_memset(cpi->frames, 0, max_frames*sizeof(YV12_BUFFER_CONFIG *));
for (frame = 0; frame < frames_to_blur; frame++)
{
int which_buffer = start_frame - frame;
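Note: a sketch of how the accumulator/count pair zeroed above is consumed once filtering ends — each output pixel is the rounded, weight-normalized sum. The encoder replaces the division with a fixed-point reciprocal table; plain division is shown here for clarity, and the function name is hypothetical:

void normalize_filtered_block(const unsigned int *accumulator,
                              const unsigned short *count,
                              unsigned char *dst, int n) {
    int k;
    for (k = 0; k < n; k++)
        dst[k] = (unsigned char)((accumulator[k] + (count[k] >> 1)) /
                                 count[k]);
}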

View File

@@ -421,7 +421,7 @@ void vp8_tokenize_mb(VP8_COMP *cpi, MACROBLOCK *x, TOKENEXTRA **t)
void init_context_counters(void)
{
memset(context_counters, 0, sizeof(context_counters));
vpx_memset(context_counters, 0, sizeof(context_counters));
}
void print_context_counters()
@@ -596,13 +596,13 @@ void vp8_fix_contexts(MACROBLOCKD *x)
/* Clear entropy contexts for Y2 blocks */
if (x->mode_info_context->mbmi.mode != B_PRED && x->mode_info_context->mbmi.mode != SPLITMV)
{
memset(x->above_context, 0, sizeof(ENTROPY_CONTEXT_PLANES));
memset(x->left_context, 0, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memset(x->above_context, 0, sizeof(ENTROPY_CONTEXT_PLANES));
vpx_memset(x->left_context, 0, sizeof(ENTROPY_CONTEXT_PLANES));
}
else
{
memset(x->above_context, 0, sizeof(ENTROPY_CONTEXT_PLANES)-1);
memset(x->left_context, 0, sizeof(ENTROPY_CONTEXT_PLANES)-1);
vpx_memset(x->above_context, 0, sizeof(ENTROPY_CONTEXT_PLANES)-1);
vpx_memset(x->left_context, 0, sizeof(ENTROPY_CONTEXT_PLANES)-1);
}
}
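Note: the sizeof(ENTROPY_CONTEXT_PLANES) - 1 in the B_PRED/SPLITMV branch above is deliberate. Assuming the layout below (as in vp8/common/blockd.h, reproduced from memory), the struct ends with the single Y2 entry; those macroblock modes carry no second-order block, so its context byte must survive the reset:

typedef char ENTROPY_CONTEXT;

typedef struct {
    ENTROPY_CONTEXT y1[4];  /* first-order luma */
    ENTROPY_CONTEXT u[2];
    ENTROPY_CONTEXT v[2];
    ENTROPY_CONTEXT y2[1];  /* the byte spared by sizeof(...) - 1 */
} ENTROPY_CONTEXT_PLANES;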

View File

@@ -35,7 +35,7 @@
void vp8_regular_quantize_b_sse2(BLOCK *b, BLOCKD *d)
{
char eob = 0;
short *zbin_boost_ptr;
short *zbin_boost_ptr = b->zrun_zbin_boost;
short *qcoeff_ptr = d->qcoeff;
DECLARE_ALIGNED_ARRAY(16, short, x, 16);
DECLARE_ALIGNED_ARRAY(16, short, y, 16);
@@ -55,7 +55,7 @@ void vp8_regular_quantize_b_sse2(BLOCK *b, BLOCKD *d)
__m128i dequant0 = _mm_load_si128((__m128i *)(d->dequant));
__m128i dequant1 = _mm_load_si128((__m128i *)(d->dequant + 8));
memset(qcoeff_ptr, 0, 32);
vpx_memset(qcoeff_ptr, 0, 32);
/* Duplicate to all lanes. */
zbin_extra = _mm_shufflelo_epi16(zbin_extra, 0);
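Note: a simplified scalar picture of the zero-bin quantizer the SSE2 code implements (rounding, quant_shift, and the exact 16-bit arithmetic are omitted). The run-length boost pointer starts at the table head — exactly the initialization the hunk above adds — widens the dead zone along a run of zeros, and rewinds after every nonzero coefficient:

void zero_bin_quantize_sketch(const short *z, const short *zbin,
                              const short *boost_tab, const short *quant,
                              short *q, int n) {
    const short *boost = boost_tab;   /* == b->zrun_zbin_boost */
    int i;
    for (i = 0; i < n; i++) {
        int mag = z[i] < 0 ? -z[i] : z[i];
        if (mag >= zbin[i] + *boost) {    /* outside the dead zone */
            q[i] = (short)((mag * quant[i]) >> 16);
            if (z[i] < 0) q[i] = (short)-q[i];
            boost = boost_tab;            /* reset after nonzero */
        } else {
            q[i] = 0;
            boost++;                      /* widen zone along the run */
        }
    }
}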

View File

@@ -10,8 +10,7 @@
#include "./vpx_config.h"
#include "./vp8_rtcd.h"
#include "./vpx_scale_rtcd.h"
#include "vp8_rtcd.h"
#include "vpx/vpx_codec.h"
#include "vpx/internal/vpx_codec_internal.h"
#include "vpx_version.h"
@@ -366,9 +365,9 @@ static vpx_codec_err_t set_vp8e_config(VP8_CONFIG *oxcf,
if (oxcf->number_of_layers > 1)
{
memcpy (oxcf->target_bitrate, cfg.ts_target_bitrate,
sizeof(cfg.ts_target_bitrate));
sizeof(cfg.ts_target_bitrate));
memcpy (oxcf->rate_decimator, cfg.ts_rate_decimator,
sizeof(cfg.ts_rate_decimator));
sizeof(cfg.ts_rate_decimator));
memcpy (oxcf->layer_id, cfg.ts_layer_id, sizeof(cfg.ts_layer_id));
}
@@ -650,7 +649,6 @@ static vpx_codec_err_t vp8e_init(vpx_codec_ctx_t *ctx,
vp8_rtcd();
vpx_scale_rtcd();
if (!ctx->priv)
{

View File

@@ -11,8 +11,7 @@
#include <stdlib.h>
#include <string.h>
#include "./vp8_rtcd.h"
#include "./vpx_scale_rtcd.h"
#include "vp8_rtcd.h"
#include "vpx/vpx_decoder.h"
#include "vpx/vp8dx.h"
#include "vpx/internal/vpx_codec_internal.h"
@@ -107,7 +106,6 @@ static vpx_codec_err_t vp8_init(vpx_codec_ctx_t *ctx,
(void) data;
vp8_rtcd();
vpx_scale_rtcd();
/* This function only allocates space for the vpx_codec_alg_priv_t
* structure. More memory may be required at the time the stream
@@ -288,8 +286,8 @@ update_fragments(vpx_codec_alg_priv_t *ctx,
if (ctx->fragments.count == 0)
{
/* New frame, reset fragment pointers and sizes */
memset((void*)ctx->fragments.ptrs, 0, sizeof(ctx->fragments.ptrs));
memset(ctx->fragments.sizes, 0, sizeof(ctx->fragments.sizes));
vpx_memset((void*)ctx->fragments.ptrs, 0, sizeof(ctx->fragments.ptrs));
vpx_memset(ctx->fragments.sizes, 0, sizeof(ctx->fragments.sizes));
}
if (ctx->fragments.enabled && !(data == NULL && data_sz == 0))
{

File diff suppressed because it is too large.

View File

@@ -1,880 +0,0 @@
/*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "./vp9_rtcd.h"
#include "vp9/common/mips/msa/vp9_convolve_msa.h"
const uint8_t mc_filt_mask_arr[16 * 3] = {
/* 8 width cases */
0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8,
/* 4 width cases */
0, 1, 1, 2, 2, 3, 3, 4, 16, 17, 17, 18, 18, 19, 19, 20,
/* 4 width cases */
8, 9, 9, 10, 10, 11, 11, 12, 24, 25, 25, 26, 26, 27, 27, 28
};
static void common_hv_8ht_8vt_4w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
uint32_t loop_cnt;
v16i8 src0, src1, src2, src3, src4, src5, src6, src7, src8, src9, src10;
v16i8 filt_horiz0, filt_horiz1, filt_horiz2, filt_horiz3;
v16u8 mask0, mask1, mask2, mask3;
v8i16 filt_horiz;
v8i16 horiz_out0, horiz_out1, horiz_out2, horiz_out3, horiz_out4;
v8i16 horiz_out5, horiz_out6, horiz_out7, horiz_out8, horiz_out9;
v8i16 tmp0, tmp1, out0, out1, out2, out3, out4;
v8i16 filt, filt_vert0, filt_vert1, filt_vert2, filt_vert3;
mask0 = LOAD_UB(&mc_filt_mask_arr[16]);
src -= (3 + 3 * src_stride);
/* rearranging filter */
filt_horiz = LOAD_SH(filter_horiz);
filt_horiz0 = (v16i8)__msa_splati_h(filt_horiz, 0);
filt_horiz1 = (v16i8)__msa_splati_h(filt_horiz, 1);
filt_horiz2 = (v16i8)__msa_splati_h(filt_horiz, 2);
filt_horiz3 = (v16i8)__msa_splati_h(filt_horiz, 3);
mask1 = mask0 + 2;
mask2 = mask0 + 4;
mask3 = mask0 + 6;
LOAD_7VECS_SB(src, src_stride, src0, src1, src2, src3, src4, src5, src6);
src += (7 * src_stride);
XORI_B_7VECS_SB(src0, src1, src2, src3, src4, src5, src6,
src0, src1, src2, src3, src4, src5, src6, 128);
horiz_out0 = HORIZ_8TAP_FILT_2VECS(src0, src1, mask0, mask1, mask2, mask3,
filt_horiz0, filt_horiz1, filt_horiz2,
filt_horiz3);
horiz_out2 = HORIZ_8TAP_FILT_2VECS(src2, src3, mask0, mask1, mask2, mask3,
filt_horiz0, filt_horiz1, filt_horiz2,
filt_horiz3);
horiz_out4 = HORIZ_8TAP_FILT_2VECS(src4, src5, mask0, mask1, mask2, mask3,
filt_horiz0, filt_horiz1, filt_horiz2,
filt_horiz3);
horiz_out5 = HORIZ_8TAP_FILT_2VECS(src5, src6, mask0, mask1, mask2, mask3,
filt_horiz0, filt_horiz1, filt_horiz2,
filt_horiz3);
horiz_out1 = (v8i16)__msa_sldi_b((v16i8)horiz_out2, (v16i8)horiz_out0, 8);
horiz_out3 = (v8i16)__msa_sldi_b((v16i8)horiz_out4, (v16i8)horiz_out2, 8);
filt = LOAD_SH(filter_vert);
filt_vert0 = __msa_splati_h(filt, 0);
filt_vert1 = __msa_splati_h(filt, 1);
filt_vert2 = __msa_splati_h(filt, 2);
filt_vert3 = __msa_splati_h(filt, 3);
out0 = (v8i16)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
out1 = (v8i16)__msa_ilvev_b((v16i8)horiz_out3, (v16i8)horiz_out2);
out2 = (v8i16)__msa_ilvev_b((v16i8)horiz_out5, (v16i8)horiz_out4);
for (loop_cnt = (height >> 2); loop_cnt--;) {
LOAD_4VECS_SB(src, src_stride, src7, src8, src9, src10);
src += (4 * src_stride);
XORI_B_4VECS_SB(src7, src8, src9, src10, src7, src8, src9, src10, 128);
horiz_out7 = HORIZ_8TAP_FILT_2VECS(src7, src8, mask0, mask1, mask2, mask3,
filt_horiz0, filt_horiz1, filt_horiz2,
filt_horiz3);
horiz_out6 = (v8i16)__msa_sldi_b((v16i8)horiz_out7, (v16i8)horiz_out5, 8);
out3 = (v8i16)__msa_ilvev_b((v16i8)horiz_out7, (v16i8)horiz_out6);
tmp0 = FILT_8TAP_DPADD_S_H(out0, out1, out2, out3, filt_vert0, filt_vert1,
filt_vert2, filt_vert3);
horiz_out9 = HORIZ_8TAP_FILT_2VECS(src9, src10, mask0, mask1, mask2, mask3,
filt_horiz0, filt_horiz1, filt_horiz2,
filt_horiz3);
horiz_out8 = (v8i16)__msa_sldi_b((v16i8)horiz_out9, (v16i8)horiz_out7, 8);
out4 = (v8i16)__msa_ilvev_b((v16i8)horiz_out9, (v16i8)horiz_out8);
tmp1 = FILT_8TAP_DPADD_S_H(out1, out2, out3, out4, filt_vert0, filt_vert1,
filt_vert2, filt_vert3);
tmp0 = SRARI_SATURATE_SIGNED_H(tmp0, FILTER_BITS, 7);
tmp1 = SRARI_SATURATE_SIGNED_H(tmp1, FILTER_BITS, 7);
PCKEV_2B_XORI128_STORE_4_BYTES_4(tmp0, tmp1, dst, dst_stride);
dst += (4 * dst_stride);
horiz_out5 = horiz_out9;
out0 = out2;
out1 = out3;
out2 = out4;
}
}
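Note: a scalar reference for one output sample of the 8-tap filter the MSA kernels in this file vectorize. The taps are centered three pixels back (matching the src -= 3 adjustment above), and FILTER_BITS is assumed to be 7 as elsewhere in vp9:

static unsigned char filt8(const unsigned char *src, const signed char *f) {
    int k, sum = 0;
    for (k = 0; k < 8; k++)
        sum += src[k - 3] * f[k];       /* taps span src[-3..+4] */
    sum = (sum + (1 << 6)) >> 7;        /* round, FILTER_BITS = 7 */
    return (unsigned char)(sum < 0 ? 0 : sum > 255 ? 255 : sum);
}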
static void common_hv_8ht_8vt_8w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
uint32_t loop_cnt;
v16i8 src0, src1, src2, src3, src4, src5, src6, src7, src8, src9, src10;
v16i8 filt_horiz0, filt_horiz1, filt_horiz2, filt_horiz3;
v8i16 filt_horiz, filt, filt_vert0, filt_vert1, filt_vert2, filt_vert3;
v16u8 mask0, mask1, mask2, mask3;
v8i16 horiz_out0, horiz_out1, horiz_out2, horiz_out3;
v8i16 horiz_out4, horiz_out5, horiz_out6, horiz_out7;
v8i16 horiz_out8, horiz_out9, horiz_out10;
v8i16 out0, out1, out2, out3, out4, out5, out6, out7, out8, out9;
v8i16 tmp0, tmp1, tmp2, tmp3;
mask0 = LOAD_UB(&mc_filt_mask_arr[0]);
src -= (3 + 3 * src_stride);
/* rearranging filter */
filt_horiz = LOAD_SH(filter_horiz);
filt_horiz0 = (v16i8)__msa_splati_h(filt_horiz, 0);
filt_horiz1 = (v16i8)__msa_splati_h(filt_horiz, 1);
filt_horiz2 = (v16i8)__msa_splati_h(filt_horiz, 2);
filt_horiz3 = (v16i8)__msa_splati_h(filt_horiz, 3);
mask1 = mask0 + 2;
mask2 = mask0 + 4;
mask3 = mask0 + 6;
LOAD_7VECS_SB(src, src_stride, src0, src1, src2, src3, src4, src5, src6);
src += (7 * src_stride);
XORI_B_7VECS_SB(src0, src1, src2, src3, src4, src5, src6,
src0, src1, src2, src3, src4, src5, src6, 128);
horiz_out0 = HORIZ_8TAP_FILT(src0, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
horiz_out1 = HORIZ_8TAP_FILT(src1, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
horiz_out2 = HORIZ_8TAP_FILT(src2, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
horiz_out3 = HORIZ_8TAP_FILT(src3, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
horiz_out4 = HORIZ_8TAP_FILT(src4, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
horiz_out5 = HORIZ_8TAP_FILT(src5, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
horiz_out6 = HORIZ_8TAP_FILT(src6, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
filt = LOAD_SH(filter_vert);
filt_vert0 = __msa_splati_h(filt, 0);
filt_vert1 = __msa_splati_h(filt, 1);
filt_vert2 = __msa_splati_h(filt, 2);
filt_vert3 = __msa_splati_h(filt, 3);
out0 = (v8i16)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
out1 = (v8i16)__msa_ilvev_b((v16i8)horiz_out3, (v16i8)horiz_out2);
out2 = (v8i16)__msa_ilvev_b((v16i8)horiz_out5, (v16i8)horiz_out4);
out4 = (v8i16)__msa_ilvev_b((v16i8)horiz_out2, (v16i8)horiz_out1);
out5 = (v8i16)__msa_ilvev_b((v16i8)horiz_out4, (v16i8)horiz_out3);
out6 = (v8i16)__msa_ilvev_b((v16i8)horiz_out6, (v16i8)horiz_out5);
for (loop_cnt = (height >> 2); loop_cnt--;) {
LOAD_4VECS_SB(src, src_stride, src7, src8, src9, src10);
src += (4 * src_stride);
XORI_B_4VECS_SB(src7, src8, src9, src10, src7, src8, src9, src10, 128);
horiz_out7 = HORIZ_8TAP_FILT(src7, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
out3 = (v8i16)__msa_ilvev_b((v16i8)horiz_out7, (v16i8)horiz_out6);
tmp0 = FILT_8TAP_DPADD_S_H(out0, out1, out2, out3, filt_vert0, filt_vert1,
filt_vert2, filt_vert3);
tmp0 = SRARI_SATURATE_SIGNED_H(tmp0, FILTER_BITS, 7);
horiz_out8 = HORIZ_8TAP_FILT(src8, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
out7 = (v8i16)__msa_ilvev_b((v16i8)horiz_out8, (v16i8)horiz_out7);
tmp1 = FILT_8TAP_DPADD_S_H(out4, out5, out6, out7, filt_vert0, filt_vert1,
filt_vert2, filt_vert3);
tmp1 = SRARI_SATURATE_SIGNED_H(tmp1, FILTER_BITS, 7);
horiz_out9 = HORIZ_8TAP_FILT(src9, mask0, mask1, mask2, mask3, filt_horiz0,
filt_horiz1, filt_horiz2, filt_horiz3);
out8 = (v8i16)__msa_ilvev_b((v16i8)horiz_out9, (v16i8)horiz_out8);
tmp2 = FILT_8TAP_DPADD_S_H(out1, out2, out3, out8, filt_vert0, filt_vert1,
filt_vert2, filt_vert3);
tmp2 = SRARI_SATURATE_SIGNED_H(tmp2, FILTER_BITS, 7);
horiz_out10 = HORIZ_8TAP_FILT(src10, mask0, mask1, mask2, mask3,
filt_horiz0, filt_horiz1, filt_horiz2,
filt_horiz3);
out9 = (v8i16)__msa_ilvev_b((v16i8)horiz_out10, (v16i8)horiz_out9);
tmp3 = FILT_8TAP_DPADD_S_H(out5, out6, out7, out9, filt_vert0, filt_vert1,
filt_vert2, filt_vert3);
tmp3 = SRARI_SATURATE_SIGNED_H(tmp3, FILTER_BITS, 7);
PCKEV_B_4_XORI128_STORE_8_BYTES_4(tmp0, tmp1, tmp2, tmp3, dst, dst_stride);
dst += (4 * dst_stride);
horiz_out6 = horiz_out10;
out0 = out2;
out1 = out3;
out2 = out8;
out4 = out6;
out5 = out7;
out6 = out9;
}
}
static void common_hv_8ht_8vt_16w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
int32_t multiple8_cnt;
for (multiple8_cnt = 2; multiple8_cnt--;) {
common_hv_8ht_8vt_8w_msa(src, src_stride, dst, dst_stride, filter_horiz,
filter_vert, height);
src += 8;
dst += 8;
}
}
static void common_hv_8ht_8vt_32w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
int32_t multiple8_cnt;
for (multiple8_cnt = 4; multiple8_cnt--;) {
common_hv_8ht_8vt_8w_msa(src, src_stride, dst, dst_stride, filter_horiz,
filter_vert, height);
src += 8;
dst += 8;
}
}
static void common_hv_8ht_8vt_64w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
int32_t multiple8_cnt;
for (multiple8_cnt = 8; multiple8_cnt--;) {
common_hv_8ht_8vt_8w_msa(src, src_stride, dst, dst_stride, filter_horiz,
filter_vert, height);
src += 8;
dst += 8;
}
}
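/* 2-tap (bilinear) horizontal + vertical kernels. The dispatcher passes
 * &filt[3], so only the two center taps are applied, with unsigned 16-bit
 * intermediates throughout. */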
static void common_hv_2ht_2vt_4x4_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz,
int8_t *filter_vert) {
uint32_t out0, out1, out2, out3;
v16i8 src0, src1, src2, src3, src4, mask;
v16u8 res0, res1, horiz_vec;
v16u8 filt_vert, filt_horiz, vec0, vec1;
v8u16 filt, tmp0, tmp1;
v8u16 horiz_out0, horiz_out1, horiz_out2, horiz_out3, horiz_out4;
mask = LOAD_SB(&mc_filt_mask_arr[16]);
/* broadcast the packed pair of bilinear taps across the vector */
filt = LOAD_UH(filter_horiz);
filt_horiz = (v16u8)__msa_splati_h((v8i16)filt, 0);
filt = LOAD_UH(filter_vert);
filt_vert = (v16u8)__msa_splati_h((v8i16)filt, 0);
LOAD_5VECS_SB(src, src_stride, src0, src1, src2, src3, src4);
horiz_vec = (v16u8)__msa_vshf_b(mask, src1, src0);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src3, src2);
horiz_out2 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out2 = SRARI_SATURATE_UNSIGNED_H(horiz_out2, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src4, src4);
horiz_out4 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out4 = SRARI_SATURATE_UNSIGNED_H(horiz_out4, FILTER_BITS, 7);
horiz_out1 = (v8u16)__msa_sldi_b((v16i8)horiz_out2, (v16i8)horiz_out0, 8);
horiz_out3 = (v8u16)__msa_pckod_d((v2i64)horiz_out4, (v2i64)horiz_out2);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
vec1 = (v16u8)__msa_ilvev_b((v16i8)horiz_out3, (v16i8)horiz_out2);
tmp0 = __msa_dotp_u_h(vec0, filt_vert);
tmp1 = __msa_dotp_u_h(vec1, filt_vert);
tmp0 = SRARI_SATURATE_UNSIGNED_H(tmp0, FILTER_BITS, 7);
tmp1 = SRARI_SATURATE_UNSIGNED_H(tmp1, FILTER_BITS, 7);
res0 = (v16u8)__msa_pckev_b((v16i8)tmp0, (v16i8)tmp0);
res1 = (v16u8)__msa_pckev_b((v16i8)tmp1, (v16i8)tmp1);
out0 = __msa_copy_u_w((v4i32)res0, 0);
out1 = __msa_copy_u_w((v4i32)res0, 1);
out2 = __msa_copy_u_w((v4i32)res1, 0);
out3 = __msa_copy_u_w((v4i32)res1, 1);
STORE_WORD(dst, out0);
dst += dst_stride;
STORE_WORD(dst, out1);
dst += dst_stride;
STORE_WORD(dst, out2);
dst += dst_stride;
STORE_WORD(dst, out3);
}
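/* 4x8 bilinear HV: nine input rows are horizontally filtered, then eight
 * vertical 2-tap outputs are produced in a single pass. */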
static void common_hv_2ht_2vt_4x8_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz,
int8_t *filter_vert) {
uint32_t out0, out1, out2, out3;
v16i8 src0, src1, src2, src3, src4, src5, src6, src7, src8, mask;
v16u8 filt_horiz, filt_vert, horiz_vec;
v16u8 vec0, vec1, vec2, vec3;
v8u16 horiz_out0, horiz_out1, horiz_out2, horiz_out3;
v8u16 vec4, vec5, vec6, vec7, filt;
v8u16 horiz_out4, horiz_out5, horiz_out6, horiz_out7, horiz_out8;
v16i8 res0, res1, res2, res3;
mask = LOAD_SB(&mc_filt_mask_arr[16]);
/* broadcast the packed pair of bilinear taps across the vector */
filt = LOAD_UH(filter_horiz);
filt_horiz = (v16u8)__msa_splati_h((v8i16)filt, 0);
filt = LOAD_UH(filter_vert);
filt_vert = (v16u8)__msa_splati_h((v8i16)filt, 0);
LOAD_8VECS_SB(src, src_stride,
src0, src1, src2, src3, src4, src5, src6, src7);
src += (8 * src_stride);
src8 = LOAD_SB(src);
horiz_vec = (v16u8)__msa_vshf_b(mask, src1, src0);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src3, src2);
horiz_out2 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out2 = SRARI_SATURATE_UNSIGNED_H(horiz_out2, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src5, src4);
horiz_out4 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out4 = SRARI_SATURATE_UNSIGNED_H(horiz_out4, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src7, src6);
horiz_out6 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out6 = SRARI_SATURATE_UNSIGNED_H(horiz_out6, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src8, src8);
horiz_out8 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out8 = SRARI_SATURATE_UNSIGNED_H(horiz_out8, FILTER_BITS, 7);
horiz_out1 = (v8u16)__msa_sldi_b((v16i8)horiz_out2, (v16i8)horiz_out0, 8);
horiz_out3 = (v8u16)__msa_sldi_b((v16i8)horiz_out4, (v16i8)horiz_out2, 8);
horiz_out5 = (v8u16)__msa_sldi_b((v16i8)horiz_out6, (v16i8)horiz_out4, 8);
horiz_out7 = (v8u16)__msa_pckod_d((v2i64)horiz_out8, (v2i64)horiz_out6);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
vec1 = (v16u8)__msa_ilvev_b((v16i8)horiz_out3, (v16i8)horiz_out2);
vec2 = (v16u8)__msa_ilvev_b((v16i8)horiz_out5, (v16i8)horiz_out4);
vec3 = (v16u8)__msa_ilvev_b((v16i8)horiz_out7, (v16i8)horiz_out6);
vec4 = __msa_dotp_u_h(vec0, filt_vert);
vec5 = __msa_dotp_u_h(vec1, filt_vert);
vec6 = __msa_dotp_u_h(vec2, filt_vert);
vec7 = __msa_dotp_u_h(vec3, filt_vert);
vec4 = SRARI_SATURATE_UNSIGNED_H(vec4, FILTER_BITS, 7);
vec5 = SRARI_SATURATE_UNSIGNED_H(vec5, FILTER_BITS, 7);
vec6 = SRARI_SATURATE_UNSIGNED_H(vec6, FILTER_BITS, 7);
vec7 = SRARI_SATURATE_UNSIGNED_H(vec7, FILTER_BITS, 7);
res0 = __msa_pckev_b((v16i8)vec4, (v16i8)vec4);
res1 = __msa_pckev_b((v16i8)vec5, (v16i8)vec5);
res2 = __msa_pckev_b((v16i8)vec6, (v16i8)vec6);
res3 = __msa_pckev_b((v16i8)vec7, (v16i8)vec7);
out0 = __msa_copy_u_w((v4i32)res0, 0);
out1 = __msa_copy_u_w((v4i32)res0, 1);
out2 = __msa_copy_u_w((v4i32)res1, 0);
out3 = __msa_copy_u_w((v4i32)res1, 1);
STORE_WORD(dst, out0);
dst += dst_stride;
STORE_WORD(dst, out1);
dst += dst_stride;
STORE_WORD(dst, out2);
dst += dst_stride;
STORE_WORD(dst, out3);
dst += dst_stride;
out0 = __msa_copy_u_w((v4i32)res2, 0);
out1 = __msa_copy_u_w((v4i32)res2, 1);
out2 = __msa_copy_u_w((v4i32)res3, 0);
out3 = __msa_copy_u_w((v4i32)res3, 1);
STORE_WORD(dst, out0);
dst += dst_stride;
STORE_WORD(dst, out1);
dst += dst_stride;
STORE_WORD(dst, out2);
dst += dst_stride;
STORE_WORD(dst, out3);
}
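/* Dispatch the 4-wide bilinear HV by block height (4 or 8). */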
static void common_hv_2ht_2vt_4w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz,
int8_t *filter_vert,
int32_t height) {
if (4 == height) {
common_hv_2ht_2vt_4x4_msa(src, src_stride, dst, dst_stride,
filter_horiz, filter_vert);
} else if (8 == height) {
common_hv_2ht_2vt_4x8_msa(src, src_stride, dst, dst_stride,
filter_horiz, filter_vert);
}
}
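/* 8x4 bilinear HV: five input rows yield four output rows. */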
static void common_hv_2ht_2vt_8x4_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz,
int8_t *filter_vert) {
v16i8 src0, src1, src2, src3, src4, mask;
v16u8 filt_horiz, filt_vert, horiz_vec;
v16u8 vec0, vec1, vec2, vec3;
v8u16 horiz_out0, horiz_out1;
v8u16 tmp0, tmp1, tmp2, tmp3;
v8i16 filt;
mask = LOAD_SB(&mc_filt_mask_arr[0]);
/* broadcast the packed pair of bilinear taps across the vector */
filt = LOAD_SH(filter_horiz);
filt_horiz = (v16u8)__msa_splati_h(filt, 0);
filt = LOAD_SH(filter_vert);
filt_vert = (v16u8)__msa_splati_h(filt, 0);
LOAD_5VECS_SB(src, src_stride, src0, src1, src2, src3, src4);
src += (5 * src_stride);
horiz_vec = (v16u8)__msa_vshf_b(mask, src0, src0);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src1, src1);
horiz_out1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out1 = SRARI_SATURATE_UNSIGNED_H(horiz_out1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
tmp0 = __msa_dotp_u_h(vec0, filt_vert);
horiz_vec = (v16u8)__msa_vshf_b(mask, src2, src2);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
vec1 = (v16u8)__msa_ilvev_b((v16i8)horiz_out0, (v16i8)horiz_out1);
tmp1 = __msa_dotp_u_h(vec1, filt_vert);
horiz_vec = (v16u8)__msa_vshf_b(mask, src3, src3);
horiz_out1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out1 = SRARI_SATURATE_UNSIGNED_H(horiz_out1, FILTER_BITS, 7);
vec2 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
tmp2 = __msa_dotp_u_h(vec2, filt_vert);
horiz_vec = (v16u8)__msa_vshf_b(mask, src4, src4);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
vec3 = (v16u8)__msa_ilvev_b((v16i8)horiz_out0, (v16i8)horiz_out1);
tmp3 = __msa_dotp_u_h(vec3, filt_vert);
tmp0 = SRARI_SATURATE_UNSIGNED_H(tmp0, FILTER_BITS, 7);
tmp1 = SRARI_SATURATE_UNSIGNED_H(tmp1, FILTER_BITS, 7);
tmp2 = SRARI_SATURATE_UNSIGNED_H(tmp2, FILTER_BITS, 7);
tmp3 = SRARI_SATURATE_UNSIGNED_H(tmp3, FILTER_BITS, 7);
PCKEV_B_STORE_8_BYTES_4(tmp0, tmp1, tmp2, tmp3, dst, dst_stride);
}
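/* 8-wide bilinear HV for heights that are multiples of 8. One row is
 * filtered ahead of the loop so each iteration can chain the vertical
 * filter across the previous horizontal result. */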
static void common_hv_2ht_2vt_8x8mult_msa(const uint8_t *src,
int32_t src_stride,
uint8_t *dst,
int32_t dst_stride,
int8_t *filter_horiz,
int8_t *filter_vert,
int32_t height) {
uint32_t loop_cnt;
v16i8 src0, src1, src2, src3, src4, mask;
v16u8 filt_horiz, filt_vert, vec0, horiz_vec;
v8u16 horiz_out0, horiz_out1;
v8u16 tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp8;
v8i16 filt;
mask = LOAD_SB(&mc_filt_mask_arr[0]);
/* broadcast the packed pair of bilinear taps across the vector */
filt = LOAD_SH(filter_horiz);
filt_horiz = (v16u8)__msa_splati_h(filt, 0);
filt = LOAD_SH(filter_vert);
filt_vert = (v16u8)__msa_splati_h(filt, 0);
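  /* Prime the vertical filter: horizontally filter the first row. */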
src0 = LOAD_SB(src);
src += src_stride;
horiz_vec = (v16u8)__msa_vshf_b(mask, src0, src0);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
for (loop_cnt = (height >> 3); loop_cnt--;) {
LOAD_4VECS_SB(src, src_stride, src1, src2, src3, src4);
src += (4 * src_stride);
horiz_vec = (v16u8)__msa_vshf_b(mask, src1, src1);
horiz_out1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out1 = SRARI_SATURATE_UNSIGNED_H(horiz_out1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
tmp1 = __msa_dotp_u_h(vec0, filt_vert);
horiz_vec = (v16u8)__msa_vshf_b(mask, src2, src2);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out0, (v16i8)horiz_out1);
tmp2 = (v8u16)__msa_dotp_u_h(vec0, filt_vert);
tmp1 = SRARI_SATURATE_UNSIGNED_H(tmp1, FILTER_BITS, 7);
tmp2 = SRARI_SATURATE_UNSIGNED_H(tmp2, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src3, src3);
horiz_out1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out1 = SRARI_SATURATE_UNSIGNED_H(horiz_out1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
tmp3 = __msa_dotp_u_h(vec0, filt_vert);
horiz_vec = (v16u8)__msa_vshf_b(mask, src4, src4);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
LOAD_4VECS_SB(src, src_stride, src1, src2, src3, src4);
src += (4 * src_stride);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out0, (v16i8)horiz_out1);
tmp4 = __msa_dotp_u_h(vec0, filt_vert);
tmp3 = SRARI_SATURATE_UNSIGNED_H(tmp3, FILTER_BITS, 7);
tmp4 = SRARI_SATURATE_UNSIGNED_H(tmp4, FILTER_BITS, 7);
PCKEV_B_STORE_8_BYTES_4(tmp1, tmp2, tmp3, tmp4, dst, dst_stride);
dst += (4 * dst_stride);
horiz_vec = (v16u8)__msa_vshf_b(mask, src1, src1);
horiz_out1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out1 = SRARI_SATURATE_UNSIGNED_H(horiz_out1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
tmp5 = __msa_dotp_u_h(vec0, filt_vert);
horiz_vec = (v16u8)__msa_vshf_b(mask, src2, src2);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out0, (v16i8)horiz_out1);
tmp6 = __msa_dotp_u_h(vec0, filt_vert);
horiz_vec = (v16u8)__msa_vshf_b(mask, src3, src3);
horiz_out1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out1 = SRARI_SATURATE_UNSIGNED_H(horiz_out1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
tmp7 = __msa_dotp_u_h(vec0, filt_vert);
horiz_vec = (v16u8)__msa_vshf_b(mask, src4, src4);
horiz_out0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_out0, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out0, (v16i8)horiz_out1);
tmp8 = __msa_dotp_u_h(vec0, filt_vert);
tmp5 = SRARI_SATURATE_UNSIGNED_H(tmp5, FILTER_BITS, 7);
tmp6 = SRARI_SATURATE_UNSIGNED_H(tmp6, FILTER_BITS, 7);
tmp7 = SRARI_SATURATE_UNSIGNED_H(tmp7, FILTER_BITS, 7);
tmp8 = SRARI_SATURATE_UNSIGNED_H(tmp8, FILTER_BITS, 7);
PCKEV_B_STORE_8_BYTES_4(tmp5, tmp6, tmp7, tmp8, dst, dst_stride);
dst += (4 * dst_stride);
}
}
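/* Dispatch the 8-wide bilinear HV: height 4, else multiples of 8. */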
static void common_hv_2ht_2vt_8w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
if (4 == height) {
common_hv_2ht_2vt_8x4_msa(src, src_stride, dst, dst_stride, filter_horiz,
filter_vert);
} else {
common_hv_2ht_2vt_8x8mult_msa(src, src_stride, dst, dst_stride,
filter_horiz, filter_vert, height);
}
}
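/* 16-wide bilinear HV: each row is filtered as two 8-pixel halves
 * (src and src + 8); four output rows per loop iteration. */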
static void common_hv_2ht_2vt_16w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
uint32_t loop_cnt;
v16i8 src0, src1, src2, src3, src4, src5, src6, src7, mask;
v16u8 filt_horiz, filt_vert, vec0, horiz_vec;
v8u16 horiz_vec0, horiz_vec1, tmp1, tmp2;
v8u16 horiz_out0, horiz_out1, horiz_out2, horiz_out3;
v8i16 filt;
mask = LOAD_SB(&mc_filt_mask_arr[0]);
/* broadcast the packed pair of bilinear taps across the vector */
filt = LOAD_SH(filter_horiz);
filt_horiz = (v16u8)__msa_splati_h(filt, 0);
filt = LOAD_SH(filter_vert);
filt_vert = (v16u8)__msa_splati_h(filt, 0);
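  /* Prime the vertical filter with both 8-pixel halves of row 0. */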
src0 = LOAD_SB(src);
src1 = LOAD_SB(src + 8);
horiz_vec = (v16u8)__msa_vshf_b(mask, src0, src0);
horiz_vec0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_vec0, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src1, src1);
horiz_vec1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out2 = SRARI_SATURATE_UNSIGNED_H(horiz_vec1, FILTER_BITS, 7);
src += src_stride;
for (loop_cnt = (height >> 2); loop_cnt--;) {
LOAD_4VECS_SB(src, src_stride, src0, src2, src4, src6);
LOAD_4VECS_SB(src + 8, src_stride, src1, src3, src5, src7);
src += (4 * src_stride);
horiz_vec = (v16u8)__msa_vshf_b(mask, src0, src0);
horiz_vec0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out1 = SRARI_SATURATE_UNSIGNED_H(horiz_vec0, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src1, src1);
horiz_vec1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out3 = SRARI_SATURATE_UNSIGNED_H(horiz_vec1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
tmp1 = __msa_dotp_u_h(vec0, filt_vert);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out3, (v16i8)horiz_out2);
tmp2 = __msa_dotp_u_h(vec0, filt_vert);
tmp1 = SRARI_SATURATE_UNSIGNED_H(tmp1, FILTER_BITS, 7);
tmp2 = SRARI_SATURATE_UNSIGNED_H(tmp2, FILTER_BITS, 7);
PCKEV_B_STORE_VEC(tmp2, tmp1, dst);
dst += dst_stride;
horiz_vec = (v16u8)__msa_vshf_b(mask, src2, src2);
horiz_vec0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_vec0, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src3, src3);
horiz_vec1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out2 = SRARI_SATURATE_UNSIGNED_H(horiz_vec1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out0, (v16i8)horiz_out1);
tmp1 = __msa_dotp_u_h(vec0, filt_vert);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out2, (v16i8)horiz_out3);
tmp2 = __msa_dotp_u_h(vec0, filt_vert);
tmp1 = SRARI_SATURATE_UNSIGNED_H(tmp1, FILTER_BITS, 7);
tmp2 = SRARI_SATURATE_UNSIGNED_H(tmp2, FILTER_BITS, 7);
PCKEV_B_STORE_VEC(tmp2, tmp1, dst);
dst += dst_stride;
horiz_vec = (v16u8)__msa_vshf_b(mask, src4, src4);
horiz_vec0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out1 = SRARI_SATURATE_UNSIGNED_H(horiz_vec0, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src5, src5);
horiz_vec1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out3 = SRARI_SATURATE_UNSIGNED_H(horiz_vec1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out1, (v16i8)horiz_out0);
tmp1 = __msa_dotp_u_h(vec0, filt_vert);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out3, (v16i8)horiz_out2);
tmp2 = __msa_dotp_u_h(vec0, filt_vert);
tmp1 = SRARI_SATURATE_UNSIGNED_H(tmp1, FILTER_BITS, 7);
tmp2 = SRARI_SATURATE_UNSIGNED_H(tmp2, FILTER_BITS, 7);
PCKEV_B_STORE_VEC(tmp2, tmp1, dst);
dst += dst_stride;
horiz_vec = (v16u8)__msa_vshf_b(mask, src6, src6);
horiz_vec0 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out0 = SRARI_SATURATE_UNSIGNED_H(horiz_vec0, FILTER_BITS, 7);
horiz_vec = (v16u8)__msa_vshf_b(mask, src7, src7);
horiz_vec1 = __msa_dotp_u_h(horiz_vec, filt_horiz);
horiz_out2 = SRARI_SATURATE_UNSIGNED_H(horiz_vec1, FILTER_BITS, 7);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out0, (v16i8)horiz_out1);
tmp1 = __msa_dotp_u_h(vec0, filt_vert);
vec0 = (v16u8)__msa_ilvev_b((v16i8)horiz_out2, (v16i8)horiz_out3);
tmp2 = __msa_dotp_u_h(vec0, filt_vert);
tmp1 = SRARI_SATURATE_UNSIGNED_H(tmp1, FILTER_BITS, 7);
tmp2 = SRARI_SATURATE_UNSIGNED_H(tmp2, FILTER_BITS, 7);
PCKEV_B_STORE_VEC(tmp2, tmp1, dst);
dst += dst_stride;
}
}
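/* 32-wide bilinear HV: two 16-column tiles. */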
static void common_hv_2ht_2vt_32w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
int32_t multiple8_cnt;
for (multiple8_cnt = 2; multiple8_cnt--;) {
common_hv_2ht_2vt_16w_msa(src, src_stride, dst, dst_stride, filter_horiz,
filter_vert, height);
src += 16;
dst += 16;
}
}
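/* 64-wide bilinear HV: four 16-column tiles. */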
static void common_hv_2ht_2vt_64w_msa(const uint8_t *src, int32_t src_stride,
uint8_t *dst, int32_t dst_stride,
int8_t *filter_horiz, int8_t *filter_vert,
int32_t height) {
int32_t multiple8_cnt;
for (multiple8_cnt = 4; multiple8_cnt--;) {
common_hv_2ht_2vt_16w_msa(src, src_stride, dst, dst_stride, filter_horiz,
filter_vert, height);
src += 16;
dst += 16;
}
}
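/* Top-level MSA entry point for the VP9 8-tap 2-D convolution. The fast
 * paths filter horizontally first, then vertically on the rounded and
 * saturated intermediate. In scalar terms each output pixel is computed
 * roughly as below (a sketch, assuming 8-tap filters anchored at tap 3
 * and FILTER_BITS == 7, with ROUND_POWER_OF_TWO and clip_pixel as in the
 * C reference code):
 *
 *   tmp[r][c] = clip_pixel(ROUND_POWER_OF_TWO(
 *                   sum(k = 0..7, src[r][c + k - 3] * filter_x[k]),
 *                   FILTER_BITS));
 *   dst[r][c] = clip_pixel(ROUND_POWER_OF_TWO(
 *                   sum(k = 0..7, tmp[r + k - 3][c] * filter_y[k]),
 *                   FILTER_BITS));
 *
 * Sub-pixel stepping (x_step_q4/y_step_q4 != 16) and widths without an
 * MSA kernel fall back to vp9_convolve8_c(). */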
void vp9_convolve8_msa(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int32_t x_step_q4,
const int16_t *filter_y, int32_t y_step_q4,
int32_t w, int32_t h) {
int8_t cnt, filt_hor[8], filt_ver[8];
if (16 != x_step_q4 || 16 != y_step_q4) {
vp9_convolve8_c(src, src_stride, dst, dst_stride,
filter_x, x_step_q4, filter_y, y_step_q4,
w, h);
return;
}
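  /* Read taps 2..3 of each filter as one 32-bit word: on a little-endian
   * target 0x00800000 means tap 2 == 0 and tap 3 == 128 (unit gain), the
   * signature of the identity filter, so the block is a plain copy. */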
if (((const int32_t *)filter_x)[1] == 0x800000 &&
((const int32_t *)filter_y)[1] == 0x800000) {
vp9_convolve_copy(src, src_stride, dst, dst_stride,
filter_x, x_step_q4, filter_y, y_step_q4,
w, h);
return;
}
for (cnt = 0; cnt < 8; ++cnt) {
filt_hor[cnt] = filter_x[cnt];
filt_ver[cnt] = filter_y[cnt];
}
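  /* Taps 0..1 read as one 32-bit word: all zero identifies VP9's bilinear
   * filters (only taps 3 and 4 are non-zero), so the 2-tap kernels are
   * used with the tap pair at offset 3. */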
if (((const int32_t *)filter_x)[0] == 0 &&
((const int32_t *)filter_y)[0] == 0) {
switch (w) {
case 4:
common_hv_2ht_2vt_4w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
&filt_hor[3], &filt_ver[3], (int32_t)h);
break;
case 8:
common_hv_2ht_2vt_8w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
&filt_hor[3], &filt_ver[3], (int32_t)h);
break;
case 16:
common_hv_2ht_2vt_16w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
&filt_hor[3], &filt_ver[3], (int32_t)h);
break;
case 32:
common_hv_2ht_2vt_32w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
&filt_hor[3], &filt_ver[3], (int32_t)h);
break;
case 64:
common_hv_2ht_2vt_64w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
&filt_hor[3], &filt_ver[3], (int32_t)h);
break;
default:
vp9_convolve8_c(src, src_stride, dst, dst_stride,
filter_x, x_step_q4, filter_y, y_step_q4,
w, h);
break;
}
} else if (((const int32_t *)filter_x)[0] == 0 ||
((const int32_t *)filter_y)[0] == 0) {
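    /* No MSA kernel for mixed 2-tap/8-tap filter pairs; use the C code. */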
vp9_convolve8_c(src, src_stride, dst, dst_stride,
filter_x, x_step_q4, filter_y, y_step_q4,
w, h);
} else {
switch (w) {
case 4:
common_hv_8ht_8vt_4w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
filt_hor, filt_ver, (int32_t)h);
break;
case 8:
common_hv_8ht_8vt_8w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
filt_hor, filt_ver, (int32_t)h);
break;
case 16:
common_hv_8ht_8vt_16w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
filt_hor, filt_ver, (int32_t)h);
break;
case 32:
common_hv_8ht_8vt_32w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
filt_hor, filt_ver, (int32_t)h);
break;
case 64:
common_hv_8ht_8vt_64w_msa(src, (int32_t)src_stride,
dst, (int32_t)dst_stride,
filt_hor, filt_ver, (int32_t)h);
break;
default:
vp9_convolve8_c(src, src_stride, dst, dst_stride,
filter_x, x_step_q4, filter_y, y_step_q4,
w, h);
break;
}
}
}
