Compare commits


12 Commits

Author SHA1 Message Date
Stefan Holmer
3cf0ef4593 Added configure option to enable error-concealment. Disabled by default.
Change-Id: I94580a5ecb13520195ea2b8a10ca11bb5a01d2a6
2011-04-29 14:08:47 +02:00
Stefan Holmer
0909b83427 Concealed MBs are always SPLITMV with partition=3. This can be optimized.
Also changed the criterion for when to skip decoding the residual,
now only skipping for blocks which are actually missing residual.
Now using mvs_corrupt_from_mb for this decision since asking the bool
decoder doesn't work (it has already finished decoding).

Change-Id: I3175f11c84ae701fc2935ebe22e1d75297072eae
2011-04-29 13:50:15 +02:00
John Koleszar
62da6700dc Update VP8DX_BOOL_DECODER_FILL to better detect EOS
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.

Change-Id: Ib5172eaa57323b939c439fff8a8ab5fa38da9b69
2011-04-29 11:22:09 +02:00
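
Note on 62da6700dc: the bookkeeping described in this message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual VP8DX_BOOL_DECODER_FILL macro: every name and constant below (sketch_reader, SKETCH_LOTS_OF_BITS, and so on) is hypothetical, and the bit window itself is not modelled, only its count. It shows why adding the virtual bits to count keeps the truncation test exact, while assigning them discards the real bits still buffered and produces a false alarm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for this sketch only. */
enum { SKETCH_WINDOW_BITS = 32, SKETCH_LOTS_OF_BITS = 0x40000000 };

typedef struct {
    size_t bytes_left; /* unread real bytes in the input */
    int count;         /* bits available: real bits plus any virtual padding */
    int padded;        /* nonzero once virtual bits have been added */
    int accumulate;    /* 1: count += LOTS (new), 0: count = LOTS (old) */
} sketch_reader;

static void sketch_fill(sketch_reader *r)
{
    /* Pull whole bytes into the window while there is room and input. */
    while (r->bytes_left > 0 && r->count <= SKETCH_WINDOW_BITS - 8) {
        r->count += 8;
        r->bytes_left--;
    }
    /* Input exhausted: pad once with virtual bits so decoding can go on. */
    if (r->bytes_left == 0 && !r->padded) {
        if (r->accumulate)
            r->count += SKETCH_LOTS_OF_BITS; /* keeps the real bits still buffered */
        else
            r->count = SKETCH_LOTS_OF_BITS;  /* forgets them */
        r->padded = 1;
    }
}

static void sketch_consume(sketch_reader *r, int nbits)
{
    assert(nbits > 0 && nbits <= 8);
    if (r->count < nbits)
        sketch_fill(r);
    r->count -= nbits;
}

/* The stream was truncated if the decoder has eaten into the virtual bits. */
static int sketch_truncated(const sketch_reader *r)
{
    return r->padded && r->count < SKETCH_LOTS_OF_BITS;
}
```

With one input byte, consuming 5 bits is not flagged in accumulate mode and consuming 4 more is; in assign mode the first 5 bits are already flagged, which is the extra tolerance the message refers to.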
Stefan Holmer
98ea0d71a4 Added more descriptive comments and did some smaller refactoring. Also changed to setting the mb_skip_coeff flag when a macroblock needs to be concealed.
Change-Id: I0bbf6de899f5b27f4a8ca0454da7e928e8b23919
2011-04-28 16:28:07 +02:00
Stefan Holmer
8d49ea12c2 Added correct handling of motion vectors outside frame boundaries.
Change-Id: Ibf81e1d188d8dd6de877e1c52761fa212e848865
2011-04-20 12:08:27 +02:00
Stefan Holmer
766ad7edb6 Reverting some of the changes done in a64b37..., moving back the bool dec
error check to vp8_decode_mb_row.

Change-Id: I717ee57efc29b8e0619d6f00d1c64d0d20114a8b
2011-04-19 16:23:05 +02:00
Stefan Holmer
20431c1354 Forgot to remove two lines in previous submit
Change-Id: Idbc0bc328cf2f99071008fd4a54ea00bac7beb94
2011-04-19 15:38:39 +02:00
Stefan Holmer
1b913c1f78 Refactored find_neighboring_blocks() and moved the test for corrupt stream
and intra concealment inside vp8_decode_macroblock to be able to capture
and conceal errors in the residual before reconstruction.

Change-Id: Id0f0bd87945a9bb1db0c20bb5467e2ff9aae5d28
2011-04-19 15:33:46 +02:00
Stefan Holmer
a64b37fdbc Added spatial motion vector interpolation. Used for intra blocks with missing residual coefficients.
Change-Id: I3e765b5dee251362d1330ebbcf9fa22d852377a1
2011-04-19 12:45:51 +02:00
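
Note on a64b37fdbc: the message does not say how the interpolation is weighted, so the following is only a rough sketch of the general idea, assuming a plain average over whichever neighbouring motion vectors are available. The names interp_neighbor and interp_average_mv are hypothetical and do not come from the change itself; the committed code may well use a different weighting.

```c
#include <assert.h>
#include <stddef.h>

/* One neighbouring block's motion vector; valid == 0 means "not usable". */
typedef struct {
    int row;
    int col;
    int valid;
} interp_neighbor;

/*
 * Average the usable neighbour vectors into (*row, *col).
 * Returns the number of neighbours used; with none, the result is zero motion.
 * Integer division truncates toward zero, which is good enough for a sketch.
 */
static int interp_average_mv(const interp_neighbor *nb, size_t n,
                             int *row, int *col)
{
    int sum_row = 0, sum_col = 0, used = 0;
    size_t i;

    assert(row != NULL && col != NULL);
    for (i = 0; i < n; ++i) {
        if (nb[i].valid) {
            sum_row += nb[i].row;
            sum_col += nb[i].col;
            ++used;
        }
    }
    if (used == 0) {
        *row = 0;
        *col = 0;
        return 0;
    }
    *row = sum_row / used;
    *col = sum_col / used;
    return used;
}
```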
Stefan Holmer
a2951d8deb Implemented a first version of the motion vector extrapolation error
concealment algorithm. Tested on foreman_cif.yuv only. Some special
cases are still not handled in a good way, for instance when receiving
intra blocks without coefficients.

Change-Id: Ie7bb41855860923b313645dacb3cf70f1e350549
2011-04-01 11:55:30 +02:00
Stefan Holmer
83a2b4e114 Added a first simple version of error-concealment
Added a first very simple version of error-concealment which simply
repeats the last decoded motion vector for corrupt MBs.

Change-Id: Ia83e111649afe11870c3c66065977bd0610c4fa1
2011-02-01 17:30:51 +01:00
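
Note on 83a2b4e114: the strategy described here, repeating the last decoded motion vector for corrupt MBs, can be sketched as follows. This is a minimal illustration, not the committed code: the types sketch_mv and sketch_mb, the corrupt flag, and the zero-motion fallback before any good vector has been seen are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    short row;
    short col;
} sketch_mv;

typedef struct {
    sketch_mv mv;
    int corrupt; /* nonzero if this macroblock could not be decoded */
} sketch_mb;

/*
 * Walk the macroblocks in decoding order. A good MB updates the
 * remembered vector; a corrupt MB inherits it. Returns how many MBs
 * were concealed.
 */
static size_t sketch_conceal_repeat_mv(sketch_mb *mbs, size_t n)
{
    sketch_mv last = {0, 0}; /* assumed fallback: zero motion */
    size_t concealed = 0;
    size_t i;

    assert(mbs != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        if (mbs[i].corrupt) {
            mbs[i].mv = last;
            ++concealed;
        } else {
            last = mbs[i].mv;
        }
    }
    return concealed;
}
```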
Henrik Lundin
1422ce5cff Error concealment in decoder
Implementing an error concealment in the VP8 decoder.

Change-Id: I63934df71191ad0b1e65c89725d9e021e1d8d93d
2011-01-20 11:22:50 +01:00
314 changed files with 24426 additions and 16711 deletions

6
.gitignore vendored

@@ -60,9 +60,3 @@
/vpx_config.h
/vpx_version.h
TAGS
vpxdec
vpxenc
.project
.cproject
*.csv
*.oclpj


@@ -1,4 +1,2 @@
Adrian Grange <agrange@google.com>
Johann Koenig <johannkoenig@google.com>
Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
Tom Finegan <tomfinegan@google.com>


@@ -4,18 +4,13 @@
Aaron Watry <awatry@gmail.com>
Adrian Grange <agrange@google.com>
Alex Converse <alex.converse@gmail.com>
Andoni Morales Alastruey <ylatuya@gmail.com>
Andres Mejia <mcitadel@gmail.com>
Attila Nagy <attilanagy@google.com>
Fabio Pedretti <fabio.ped@libero.it>
Frank Galligan <fgalligan@google.com>
Fredrik Söderquist <fs@opera.com>
Fritz Koenig <frkoenig@google.com>
Gaute Strokkenes <gaute.strokkenes@broadcom.com>
Giuseppe Scrivano <gscrivano@gnu.org>
Guillermo Ballester Valor <gbvalor@gmail.com>
Henrik Lundin <hlundin@google.com>
James Berry <jamesberry@google.com>
James Zern <jzern@google.com>
Jan Kratochvil <jan.kratochvil@redhat.com>
Jeff Muizelaar <jmuizelaar@mozilla.com>
@@ -28,14 +23,10 @@ Luca Barbato <lu_zero@gentoo.org>
Makoto Kato <makoto.kt@gmail.com>
Martin Ettl <ettl.martin78@googlemail.com>
Michael Kohler <michaelkohler@live.com>
Mikhal Shemer <mikhal@google.com>
Pascal Massimino <pascal.massimino@gmail.com>
Patrik Westin <patrik.westin@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk>
Philip Jägenstedt <philipj@opera.com>
Scott LaVarnway <slavarnway@google.com>
Tero Rintaluoma <teror@google.com>
Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com>
Yaowu Xu <yaowu@google.com>


@@ -1,80 +1,3 @@
2011-03-07 v0.9.6 "Bali"
Our second named release, focused on a faster, higher quality, encoder.
- Upgrading:
This release is backwards compatible with Aylesbury (v0.9.5). Users
of older releases should refer to the Upgrading notes in this
document for that release.
- Enhancements:
vpxenc --psnr shows a summary when encode completes
--tune=ssim option to enable activity masking
improved postproc visualizations for development
updated support for Apple iOS to SDK 4.2
query decoder to determine which reference frames were updated
implemented error tracking in the decoder
fix pipe support on windows
- Speed:
Primary focus was on good quality mode, speed 0. Average improvement
on x86 about 40%, up to 100% on user-generated content at that speed.
Best quality mode speed improved 35%, and realtime speed 10-20%. This
release also saw significant improvement in realtime encoding speed
on ARM platforms.
Improved encoder threading
Dont pick encoder filter level when loopfilter is disabled.
Avoid double copying of key frames into alt and golden buffer
FDCT optimizations.
x86 sse2 temporal filter
SSSE3 version of fast quantizer
vp8_rd_pick_best_mbsegmentation code restructure
Adjusted breakout RD for SPLITMV
Changed segmentation check order
Improved rd_pick_intra4x4block
Adds armv6 optimized variance calculation
ARMv6 optimized sad16x16
ARMv6 optimized half pixel variance calculations
Full search SAD function optimization in SSE4.1
Improve MV prediction accuracy to achieve performance gain
Improve MV prediction in vp8_pick_inter_mode() for speed>3
- Quality:
Best quality mode improved PSNR 6.3%, and SSIM 6.1%. This release
also includes support for "activity masking," which greatly improves
SSIM at the expense of PSNR. For now, this feature is available with
the --tune=ssim option. Further experimentation in this area
is ongoing. This release also introduces a new rate control mode
called "CQ," which changes the allocation of bits within a clip to
the sections where they will have the most visual impact.
Tuning for the more exact quantizer.
Relax rate control for last few frames
CQ Mode
Limit key frame quantizer for forced key frames.
KF/GF Pulsing
Add simple version of activity masking.
make rdmult adaptive for intra in quantizer RDO
cap the best quantizer for 2nd order DC
change the threshold of DC check for encode breakout
- Bug Fixes:
Fix crash on Sparc Solaris.
Fix counter of fixed keyframe distance
ARNR filter pointer update bug fix
Fixed use of motion percentage in KF/GF group calc
Changed condition for using RD in Intra Mode
Fix encoder real-time only configuration.
Fix ARM encoder crash with multiple token partitions
Fixed bug first cluster timecode of webm file is wrong.
Fixed various encoder bugs with odd-sized images
vp8e_get_preview fixed when spatial resampling enabled
quantizer: fix assertion in fast quantizer path
Allocate source buffers to be multiples of 16
Fix for manual Golden frame frequency
Fix drastic undershoot in long form content
2010-10-28 v0.9.5 "Aylesbury"
Our first named release, focused on a faster decoder, and a better encoder.

4
README

@@ -45,14 +45,18 @@ COMPILING THE APPLICATIONS/LIBRARIES:
armv5te-linux-rvct
armv5te-linux-gcc
armv5te-symbian-gcc
armv5te-wince-vs8
armv6-darwin-gcc
armv6-linux-rvct
armv6-linux-gcc
armv6-symbian-gcc
armv6-wince-vs8
iwmmxt-linux-rvct
iwmmxt-linux-gcc
iwmmxt-wince-vs8
iwmmxt2-linux-rvct
iwmmxt2-linux-gcc
iwmmxt2-wince-vs8
armv7-linux-rvct
armv7-linux-gcc
mips32-linux-gcc


@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="utf-8"?>
<VisualStudioToolFile
Name="armasm"
Version="8.00"
>
<Rules>
<CustomBuildRule
Name="ARMASM"
DisplayName="Armasm Assembler"
CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -ARCH 5&#x0D;&#x0A;"
Outputs="$(IntDir)\$(InputName).obj"
FileExtensions="*.asm"
ExecutionDescription="Assembling $(InputName).asm"
ShowOnlyRuleProperties="false"
>
<Properties>
</Properties>
</CustomBuildRule>
</Rules>
</VisualStudioToolFile>


@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="utf-8"?>
<VisualStudioToolFile
Name="armasm"
Version="8.00"
>
<Rules>
<CustomBuildRule
Name="ARMASM"
DisplayName="Armasm Assembler"
CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -ARCH 6&#x0D;&#x0A;"
Outputs="$(IntDir)\$(InputName).obj"
FileExtensions="*.asm"
ExecutionDescription="Assembling $(InputName).asm"
ShowOnlyRuleProperties="false"
>
<Properties>
</Properties>
</CustomBuildRule>
</Rules>
</VisualStudioToolFile>


@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="utf-8"?>
<VisualStudioToolFile
Name="armasm"
Version="8.00"
>
<Rules>
<CustomBuildRule
Name="ARMASM"
DisplayName="Armasm Assembler"
CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -cpu XSCALE&#x0D;&#x0A;"
Outputs="$(IntDir)\$(InputName).obj"
FileExtensions="*.asm"
ExecutionDescription="Assembling $(InputName).asm"
ShowOnlyRuleProperties="false"
>
<Properties>
</Properties>
</CustomBuildRule>
</Rules>
</VisualStudioToolFile>


@@ -0,0 +1,13 @@
@echo off
REM Copyright (c) 2010 The WebM project authors. All Rights Reserved.
REM
REM Use of this source code is governed by a BSD-style license
REM that can be found in the LICENSE file in the root of the source
REM tree. An additional intellectual property rights grant can be found
REM in the file PATENTS. All contributing project authors may
REM be found in the AUTHORS file in the root of the source tree.
echo on
cl /I ".\\" /I "..\vp6_decoder_sdk" /I "..\vp6_decoder_sdk\vpx_ports" /D "NDEBUG" /D "_WIN32_WCE=0x420" /D "UNDER_CE" /D "WIN32_PLATFORM_PSPC" /D "WINCE" /D "_LIB" /D "ARM" /D "_ARM_" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /GS- /fp:fast /GR- /Fo"Pocket_PC_2003__ARMV4_\%1/" /Fd"Pocket_PC_2003__ARMV4_\%1/vc80.pdb" /W3 /nologo /c /TC ..\vp6_decoder_sdk\vp6_decoder\algo\common\arm\dec_asm_offsets_arm.c
obj_int_extract.exe rvds "Pocket_PC_2003__ARMV4_\%1/dec_asm_offsets_arm.obj"


@@ -0,0 +1,88 @@
Microsoft Visual Studio Solution File, Format Version 9.00
# Visual Studio 2005
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "example", "example.vcproj", "{BA5FE66F-38DD-E034-F542-B1578C5FB950}"
ProjectSection(ProjectDependencies) = postProject
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74} = {DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}
{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
EndProjectSection
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "obj_int_extract", "obj_int_extract.vcproj", "{E1360C65-D375-4335-8057-7ED99CC3F9B2}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "vpx", "vpx.vcproj", "{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}"
ProjectSection(ProjectDependencies) = postProject
{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
EndProjectSection
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "xma", "xma.vcproj", "{A955FC4A-73F1-44F7-135E-30D84D32F022}"
ProjectSection(ProjectDependencies) = postProject
{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74} = {DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}
EndProjectSection
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Mixed Platforms = Debug|Mixed Platforms
Debug|Pocket PC 2003 (ARMV4) = Debug|Pocket PC 2003 (ARMV4)
Debug|Win32 = Debug|Win32
Release|Mixed Platforms = Release|Mixed Platforms
Release|Pocket PC 2003 (ARMV4) = Release|Pocket PC 2003 (ARMV4)
Release|Win32 = Release|Win32
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Mixed Platforms.ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Mixed Platforms.Build.0 = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Win32.ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Win32.Build.0 = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Mixed Platforms.ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Mixed Platforms.Build.0 = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Win32.ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Win32.Build.0 = Release|Win32
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal


@@ -152,8 +152,8 @@ endif
# Rule to extract assembly constants from C sources
#
obj_int_extract: build/make/obj_int_extract.c
$(if $(quiet),@echo " [HOSTCC] $@")
$(qexec)$(HOSTCC) -I. -I$(SRC_PATH_BARE) -o $@ $<
$(if $(quiet),echo " [HOSTCC] $@")
$(qexec)$(HOSTCC) -I. -o $@ $<
CLEAN-OBJS += obj_int_extract
#
@@ -255,7 +255,7 @@ ifeq ($(filter clean,$(MAKECMDGOALS)),)
endif
#
# Configuration dependent rules
# Configuration dependant rules
#
$(call pairmap,install_map_templates,$(INSTALL_MAPS))
@@ -331,8 +331,11 @@ ifneq ($(call enabled,DIST-SRCS),)
DIST-SRCS-$(CONFIG_MSVS) += build/make/gen_msvs_sln.sh
DIST-SRCS-$(CONFIG_MSVS) += build/x86-msvs/yasm.rules
DIST-SRCS-$(CONFIG_RVCT) += build/make/armlink_adapter.sh
# Include obj_int_extract if we use offsets from asm_*_offsets
DIST-SRCS-$(ARCH_ARM)$(ARCH_X86)$(ARCH_X86_64) += build/make/obj_int_extract.c
#
# This isn't really ARCH_ARM dependent, it's dependant on whether we're
# using assembly code or not (CONFIG_OPTIMIZATIONS maybe). Just use
# this for now.
DIST-SRCS-$(ARCH_ARM) += build/make/obj_int_extract.c
DIST-SRCS-$(ARCH_ARM) += build/make/ads2gas.pl
DIST-SRCS-yes += $(target:-$(TOOLCHAIN)=).mk
endif


@@ -17,17 +17,15 @@ for i; do
on_of=1
elif [ "$i" == "-v" ]; then
verbose=1
elif [ "$i" == "-g" ]; then
args="${args} --debug"
elif [ "$on_of" == "1" ]; then
outfile=$i
on_of=0
on_of=0
elif [ -f "$i" ]; then
infiles="$infiles $i"
elif [ "${i:0:2}" == "-l" ]; then
libs="$libs ${i#-l}"
elif [ "${i:0:2}" == "-L" ]; then
libpaths="${libpaths} ${i#-L}"
libpaths="${libpaths} ${i#-L}"
else
args="${args} ${i}"
fi


@@ -78,12 +78,11 @@ Build options:
--log=yes|no|FILE file configure log is written to [config.err]
--target=TARGET target platform tuple [generic-gnu]
--cpu=CPU optimize for a specific cpu rather than a family
--extra-cflags=ECFLAGS add ECFLAGS to CFLAGS [$CFLAGS]
${toggle_extra_warnings} emit harmless warnings (always non-fatal)
${toggle_werror} treat warnings as errors, if possible
(not available with all compilers)
${toggle_optimizations} turn on/off compiler optimization flags
${toggle_pic} turn on/off Position Independent Code
${toggle_pic} turn on/off Position Independant Code
${toggle_ccache} turn on/off compiler cache
${toggle_debug} enable/disable debug mode
${toggle_gprof} enable/disable gprof profiling instrumentation
@@ -443,9 +442,6 @@ process_common_cmdline() {
;;
--cpu=*) tune_cpu="$optval"
;;
--extra-cflags=*)
extra_cflags="${optval}"
;;
--enable-?*|--disable-?*)
eval `echo "$opt" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
echo "${CMDLINE_SELECT} ${ARCH_EXT_LIST}" | grep "^ *$option\$" >/dev/null || die_unknown $opt
@@ -624,10 +620,6 @@ process_common_toolchain() {
# Handle Solaris variants. Solaris 10 needs -lposix4
case ${toolchain} in
sparc-solaris-*)
add_extralibs -lposix4
add_cflags "-DMUST_BE_ALIGNED"
;;
*-solaris-*)
add_extralibs -lposix4
;;
@@ -668,12 +660,12 @@ process_common_toolchain() {
elif enabled armv7
then
check_add_cflags -march=armv7-a -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp #-ftree-vectorize
check_add_asflags -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp #-march=armv7-a
check_add_asflags -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp #-march=armv7-a
else
check_add_cflags -march=${tgt_isa}
check_add_asflags -march=${tgt_isa}
fi
enabled debug && add_asflags -g
asm_conversion_cmd="${source_path}/build/make/ads2gas.pl"
;;
rvct)
@@ -698,24 +690,16 @@ process_common_toolchain() {
arch_int=${tgt_isa##armv}
arch_int=${arch_int%%te}
check_add_asflags --pd "\"ARCHITECTURE SETA ${arch_int}\""
enabled debug && add_asflags -g
add_cflags --gnu
add_cflags --enum_is_int
add_cflags --wchar32
;;
esac
case ${tgt_os} in
none*)
disable multithread
disable os_support
;;
darwin*)
SDK_PATH=/Developer/Platforms/iPhoneOS.platform/Developer
TOOLCHAIN_PATH=${SDK_PATH}/usr/bin
CC=${TOOLCHAIN_PATH}/gcc
AR=${TOOLCHAIN_PATH}/ar
LD=${TOOLCHAIN_PATH}/arm-apple-darwin10-gcc-4.2.1
LD=${TOOLCHAIN_PATH}/arm-apple-darwin9-gcc-4.2.1
AS=${TOOLCHAIN_PATH}/as
STRIP=${TOOLCHAIN_PATH}/strip
NM=${TOOLCHAIN_PATH}/nm
@@ -729,18 +713,19 @@ process_common_toolchain() {
add_cflags -arch ${tgt_isa}
add_ldflags -arch_only ${tgt_isa}
add_cflags "-isysroot ${SDK_PATH}/SDKs/iPhoneOS4.3.sdk"
add_cflags "-isysroot /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS3.1.sdk"
# This should be overridable
alt_libc=${SDK_PATH}/SDKs/iPhoneOS4.3.sdk
alt_libc=${SDK_PATH}/SDKs/iPhoneOS3.1.sdk
# Add the paths for the alternate libc
for d in usr/include usr/include/gcc/darwin/4.2/ usr/lib/gcc/arm-apple-darwin10/4.2.1/include/; do
# for d in usr/include usr/include/gcc/darwin/4.0/; do
for d in usr/include usr/include/gcc/darwin/4.0/ usr/lib/gcc/arm-apple-darwin9/4.0.1/include/; do
try_dir="${alt_libc}/${d}"
[ -d "${try_dir}" ] && add_cflags -I"${try_dir}"
done
for d in lib usr/lib usr/lib/system; do
for d in lib usr/lib; do
try_dir="${alt_libc}/${d}"
[ -d "${try_dir}" ] && add_ldflags -L"${try_dir}"
done
@@ -757,9 +742,13 @@ process_common_toolchain() {
|| die "Must supply --libc when targetting *-linux-rvct"
# Set up compiler
add_cflags --gnu
add_cflags --enum_is_int
add_cflags --library_interface=aeabi_glibc
add_cflags --no_hide_all
add_cflags --wchar32
add_cflags --dwarf2
add_cflags --gnu
# Set up linker
add_ldflags --sysv --no_startup --no_ref_cpp_init
@@ -866,7 +855,7 @@ process_common_toolchain() {
setup_gnu_toolchain
add_cflags -use-msasm -use-asm
add_ldflags -i-static
enabled x86_64 && add_cflags -ipo -no-prec-div -static -xSSE2 -axSSE2
enabled x86_64 && add_cflags -ipo -no-prec-div -static -xSSE3 -axSSE3
enabled x86_64 && AR=xiar
case ${tune_cpu} in
atom*)
@@ -884,8 +873,6 @@ process_common_toolchain() {
link_with_cc=gcc
tune_cflags="-march="
setup_gnu_toolchain
#for 32 bit x86 builds, -O3 did not turn on this flag
enabled optimizations && check_add_cflags -fomit-frame-pointer
;;
esac
@@ -957,40 +944,8 @@ process_common_toolchain() {
enabled rvct && check_add_cflags -Otime
enabled small && check_add_cflags -O2 || check_add_cflags -O3
fi
if enabled opencl; then
disable multithread
echo " disabling multithread"
soft_enable opencl #Provide output to make user comfortable
enable runtime_cpu_detect
#Use dlopen() to load OpenCL when possible.
case ${toolchain} in
*darwin10*)
check_add_cflags -D__APPLE__
add_extralibs -framework OpenCL
;;
*-win32-gcc)
if check_header dlfcn.h; then
add_extralibs -ldl
enable dlopen
else
#This shouldn't be a hard-coded path in the long term
add_extralibs -L/cygdrive/c/Windows/System32 -lOpenCL
fi
;;
*)
if check_header dlfcn.h; then
add_extralibs -ldl
enable dlopen
else
add_extralibs -lOpenCL
fi
;;
esac
fi
# Position Independent Code (PIC) support, for building relocatable
# Position Independant Code (PIC) support, for building relocatable
# shared objects
enabled gcc && enabled pic && check_add_cflags -fPIC
@@ -1017,12 +972,6 @@ EOF
add_cflags -D_LARGEFILE_SOURCE
add_cflags -D_FILE_OFFSET_BITS=64
fi
# append any user defined extra cflags
if [ -n "${extra_cflags}" ] ; then
check_add_cflags ${extra_cflags} || \
die "Requested extra CFLAGS '${extra_cflags}' not supported by compiler"
fi
}
process_toolchain() {


@@ -32,8 +32,7 @@ Options:
--name=project_name Name of the project (required)
--proj-guid=GUID GUID to use for the project
--module-def=filename File containing export definitions (for DLLs)
--ver=version Version (7,8,9) of visual studio to generate for
--src-path-bare=dir Path to root of source tree
--ver=version Version (7,8) of visual studio to generate for
-Ipath/to/include Additional include directories
-DFLAG[=value] Preprocessor macros to define
-Lpath/to/lib Additional library search paths
@@ -133,7 +132,7 @@ generate_filter() {
open_tag Filter \
Name=$name \
Filter=$pats \
UniqueIdentifier=`generate_uuid` \
UniqueIdentifier=`generate_uuid`
file_list_sz=${#file_list[@]}
for i in ${!file_list[@]}; do
@@ -146,21 +145,31 @@ generate_filter() {
if [ "$pat" == "asm" ] && $asm_use_custom_step; then
for plat in "${platforms[@]}"; do
for cfg in Debug Release; do
open_tag FileConfiguration \
Name="${cfg}|${plat}" \
open_tag FileConfiguration \
Name="${cfg}|${plat}"
tag Tool \
Name="VCCustomBuildTool" \
Description="Assembling \$(InputFileName)" \
CommandLine="$(eval echo \$asm_${cfg}_cmdline)" \
Outputs="\$(InputName).obj" \
CommandLine="$(eval echo \$asm_${cfg}_cmdline)"\
Outputs="\$(InputName).obj"
close_tag FileConfiguration
done
done
fi
close_tag File
if [ "${f##*.}" == "cpp" ]; then
for plat in "${platforms[@]}"; do
for cfg in Debug Release; do
open_tag FileConfiguration \
Name="${cfg}|${plat}"
tag Tool \
Name="VCCLCompilerTool" \
CompileAs="2"
close_tag FileConfiguration
done
done
fi
close_tag File
break
fi
@@ -176,63 +185,57 @@ unset target
for opt in "$@"; do
optval="${opt#*=}"
case "$opt" in
--help|-h) show_help
;;
--target=*) target="${optval}"
;;
--out=*) outfile="$optval"
;;
--name=*) name="${optval}"
;;
--proj-guid=*) guid="${optval}"
;;
--module-def=*) link_opts="${link_opts} ModuleDefinitionFile=${optval}"
;;
--exe) proj_kind="exe"
;;
--lib) proj_kind="lib"
;;
--src-path-bare=*) src_path_bare="$optval"
;;
--static-crt) use_static_runtime=true
;;
--ver=*)
vs_ver="$optval"
case "$optval" in
[789])
;;
*) die Unrecognized Visual Studio Version in $opt
;;
esac
;;
-I*)
opt="${opt%/}"
incs="${incs}${incs:+;}&quot;${opt##-I}&quot;"
yasmincs="${yasmincs} ${opt}"
;;
-D*) defines="${defines}${defines:+;}${opt##-D}"
;;
-L*) # fudge . to $(OutDir)
if [ "${opt##-L}" == "." ]; then
libdirs="${libdirs}${libdirs:+;}&quot;\$(OutDir)&quot;"
else
# Also try directories for this platform/configuration
libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}&quot;"
libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}/\$(PlatformName)/\$(ConfigurationName)&quot;"
libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}/\$(PlatformName)&quot;"
fi
;;
-l*) libs="${libs}${libs:+ }${opt##-l}.lib"
;;
-*) die_unknown $opt
;;
*)
file_list[${#file_list[@]}]="$opt"
case "$opt" in
*.asm) uses_asm=true
;;
esac
;;
--help|-h) show_help
;;
--target=*) target="${optval}"
;;
--out=*) outfile="$optval"
;;
--name=*) name="${optval}"
;;
--proj-guid=*) guid="${optval}"
;;
--module-def=*)
link_opts="${link_opts} ModuleDefinitionFile=${optval}"
;;
--exe) proj_kind="exe"
;;
--lib) proj_kind="lib"
;;
--static-crt) use_static_runtime=true
;;
--ver=*) vs_ver="$optval"
case $optval in
[789])
;;
*) die Unrecognized Visual Studio Version in $opt
;;
esac
;;
-I*) opt="${opt%/}"
incs="${incs}${incs:+;}&quot;${opt##-I}&quot;"
yasmincs="${yasmincs} ${opt}"
;;
-D*) defines="${defines}${defines:+;}${opt##-D}"
;;
-L*) # fudge . to $(OutDir)
if [ "${opt##-L}" == "." ]; then
libdirs="${libdirs}${libdirs:+;}&quot;\$(OutDir)&quot;"
else
# Also try directories for this platform/configuration
libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}&quot;"
libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}/\$(PlatformName)/\$(ConfigurationName)&quot;"
libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}/\$(PlatformName)&quot;"
fi
;;
-l*) libs="${libs}${libs:+ }${opt##-l}.lib"
;;
-*) die_unknown $opt
;;
*) file_list[${#file_list[@]}]="$opt"
case "$opt" in
*.asm) uses_asm=true;;
esac
esac
done
outfile=${outfile:-/dev/stdout}
@@ -275,7 +278,11 @@ done
# List Keyword for this target
case "$target" in
x86*) keyword="ManagedCProj"
x86*)
keyword="ManagedCProj"
;;
arm*|iwmmx*)
keyword="Win32Proj"
;;
*) die "Unsupported target $target!"
esac
@@ -291,255 +298,402 @@ case "$target" in
asm_Debug_cmdline="yasm -Xvc -g cv8 -f \$(PlatformName) ${yasmincs} &quot;\$(InputPath)&quot;"
asm_Release_cmdline="yasm -Xvc -f \$(PlatformName) ${yasmincs} &quot;\$(InputPath)&quot;"
;;
arm*|iwmmx*)
case "${name}" in
obj_int_extract) platforms[0]="Win32"
;;
*) platforms[0]="Pocket PC 2003 (ARMV4)"
;;
esac
;;
*) die "Unsupported target $target!"
esac
# List Command-line Arguments for this target
case "$target" in
arm*|iwmmx*)
if [ "$name" == "example" ];then
ARGU="--codec vp6 --flipuv --progress _bnd.vp6"
fi
if [ "$name" == "xma" ];then
ARGU="--codec vp6 -h 240 -w 320 -v"
fi
;;
esac
generate_vcproj() {
case "$proj_kind" in
exe) vs_ConfigurationType=1
;;
*) vs_ConfigurationType=4
;;
exe) vs_ConfigurationType=1
;;
*) vs_ConfigurationType=4
;;
esac
echo "<?xml version=\"1.0\" encoding=\"Windows-1252\"?>"
open_tag VisualStudioProject \
ProjectType="Visual C++" \
Version="${vs_ver_id}" \
Name="${name}" \
ProjectGUID="{${guid}}" \
RootNamespace="${name}" \
Keyword="${keyword}" \
open_tag VisualStudioProject \
ProjectType="Visual C++" \
Version="${vs_ver_id}" \
Name="${name}" \
ProjectGUID="{${guid}}" \
RootNamespace="${name}" \
Keyword="${keyword}"
open_tag Platforms
open_tag Platforms
for plat in "${platforms[@]}"; do
tag Platform Name="$plat"
tag Platform Name="$plat"
done
close_tag Platforms
open_tag ToolFiles
open_tag ToolFiles
case "$target" in
x86*) $uses_asm && tag ToolFile RelativePath="$self_dirname/../x86-msvs/yasm.rules"
;;
arm*|iwmmx*)
if [ "$name" == "vpx" ];then
case "$target" in
armv5*)
tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmv5.rules"
;;
armv6*)
tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmv6.rules"
;;
iwmmxt*)
tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmxscale.rules"
;;
esac
fi
;;
esac
close_tag ToolFiles
open_tag Configurations
open_tag Configurations
for plat in "${platforms[@]}"; do
plat_no_ws=`echo $plat | sed 's/[^A-Za-z0-9_]/_/g'`
open_tag Configuration \
Name="Debug|$plat" \
OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
ConfigurationType="$vs_ConfigurationType" \
CharacterSet="1" \
open_tag Configuration \
Name="Debug|$plat" \
OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
ConfigurationType="$vs_ConfigurationType" \
CharacterSet="1"
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
case "$name" in
vpx) tag Tool \
Name="VCPreBuildEventTool" \
CommandLine="call obj_int_extract.bat \$(ConfigurationName)"
tag Tool \
Name="VCMIDLTool" \
TargetEnvironment="1"
tag Tool \
Name="VCCLCompilerTool" \
ExecutionBucket="7" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;DEBUG;_LIB;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
MinimalRebuild="true" \
RuntimeLibrary="1" \
BufferSecurityCheck="false" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
CompileAs="1"
tag Tool \
Name="VCResourceCompilerTool" \
PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
Culture="1033" \
AdditionalIncludeDirectories="\$(IntDir)" \
;;
example|xma) tag Tool \
Name="VCCLCompilerTool" \
ExecutionBucket="7" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;DEBUG;_CONSOLE;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
MinimalRebuild="true" \
RuntimeLibrary="1" \
BufferSecurityCheck="false" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
CompileAs="1"
tag Tool \
Name="VCResourceCompilerTool" \
PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
Culture="1033" \
AdditionalIncludeDirectories="\$(IntDir)" \
;;
obj_int_extract) tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;DEBUG;_CONSOLE" \
RuntimeLibrary="1" \
WarningLevel="3" \
DebugInformationFormat="1" \
;;
esac
fi
case "$target" in
x86*)
case "$name" in
obj_int_extract)
tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;DEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE" \
RuntimeLibrary="$debug_runtime" \
WarningLevel="3" \
Detect64BitPortabilityProblems="true" \
DebugInformationFormat="1" \
;;
vpx)
tag Tool \
Name="VCPreBuildEventTool" \
CommandLine="call obj_int_extract.bat $src_path_bare" \
x86*) tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;_DEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$debug_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
Detect64BitPortabilityProblems="true" \
tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;_DEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$debug_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="1"
;;
*)
tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;_DEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$debug_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="1"
;;
esac
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="1"
;;
esac
case "$proj_kind" in
exe)
case "$target" in
x86*)
x86*) tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$debug_libs \$(NoInherit)" \
AdditionalLibraryDirectories="$libdirs" \
GenerateDebugInformation="true" \
ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
;;
arm*|iwmmx*)
case "$name" in
obj_int_extract)
tag Tool \
Name="VCLinkerTool" \
OutputFile="${name}.exe" \
GenerateDebugInformation="true" \
obj_int_extract) tag Tool \
Name="VCLinkerTool" \
OutputFile="${name}.exe" \
GenerateDebugInformation="true"
;;
*)
tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$debug_libs \$(NoInherit)" \
AdditionalLibraryDirectories="$libdirs" \
GenerateDebugInformation="true" \
ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
*) tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$debug_libs" \
OutputFile="\$(OutDir)/${name}.exe" \
LinkIncremental="2" \
AdditionalLibraryDirectories="${libdirs};&quot;..\lib/$plat_no_ws&quot;" \
DelayLoadDLLs="\$(NOINHERIT)" \
GenerateDebugInformation="true" \
ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
SubSystem="9" \
StackReserveSize="65536" \
StackCommitSize="4096" \
EntryPointSymbol="mainWCRTStartup" \
TargetMachine="3"
;;
esac
;;
;;
esac
;;
lib)
case "$target" in
x86*)
tag Tool \
Name="VCLibrarianTool" \
OutputFile="\$(OutDir)/${name}${lib_sfx}d.lib" \
;;
arm*|iwmmx*) tag Tool \
Name="VCLibrarianTool" \
AdditionalOptions=" /subsystem:windowsce,4.20 /machine:ARM" \
OutputFile="\$(OutDir)/${name}.lib" \
;;
*) tag Tool \
Name="VCLibrarianTool" \
OutputFile="\$(OutDir)/${name}${lib_sfx}d.lib" \
;;
esac
;;
dll)
tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="\$(NoInherit)" \
LinkIncremental="2" \
GenerateDebugInformation="true" \
AssemblyDebug="1" \
TargetMachine="1" \
$link_opts \
;;
dll) tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="\$(NoInherit)" \
LinkIncremental="2" \
GenerateDebugInformation="true" \
AssemblyDebug="1" \
TargetMachine="1" \
$link_opts
esac
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
case "$name" in
vpx) tag DeploymentTool \
ForceDirty="-1" \
RegisterOutput="0"
;;
example|xma) tag DeploymentTool \
ForceDirty="-1" \
RegisterOutput="0"
tag DebuggerTool \
Arguments="${ARGU}"
;;
esac
fi
close_tag Configuration
open_tag Configuration \
Name="Release|$plat" \
OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
ConfigurationType="$vs_ConfigurationType" \
CharacterSet="1" \
WholeProgramOptimization="0" \
open_tag Configuration \
Name="Release|$plat" \
OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
ConfigurationType="$vs_ConfigurationType" \
CharacterSet="1" \
WholeProgramOptimization="0"
case "$target" in
x86*)
case "$name" in
obj_int_extract)
tag Tool \
Name="VCCLCompilerTool" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE" \
RuntimeLibrary="$release_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
Detect64BitPortabilityProblems="true" \
DebugInformationFormat="0" \
;;
vpx)
tag Tool \
Name="VCPreBuildEventTool" \
CommandLine="call obj_int_extract.bat $src_path_bare" \
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
case "$name" in
vpx) tag Tool \
Name="VCPreBuildEventTool" \
CommandLine="call obj_int_extract.bat \$(ConfigurationName)"
tag Tool \
Name="VCMIDLTool" \
TargetEnvironment="1"
tag Tool \
Name="VCCLCompilerTool" \
ExecutionBucket="7" \
Optimization="2" \
FavorSizeOrSpeed="1" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;_LIB;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
RuntimeLibrary="0" \
BufferSecurityCheck="false" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
CompileAs="1"
tag Tool \
Name="VCResourceCompilerTool" \
PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
Culture="1033" \
AdditionalIncludeDirectories="\$(IntDir)" \
;;
example|xma) tag Tool \
Name="VCCLCompilerTool" \
ExecutionBucket="7" \
Optimization="2" \
FavorSizeOrSpeed="1" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;_CONSOLE;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
RuntimeLibrary="0" \
BufferSecurityCheck="false" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
CompileAs="1"
tag Tool \
Name="VCResourceCompilerTool" \
PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
Culture="1033" \
AdditionalIncludeDirectories="\$(IntDir)" \
;;
obj_int_extract) tag Tool \
Name="VCCLCompilerTool" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE" \
RuntimeLibrary="0" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
Detect64BitPortabilityProblems="true" \
DebugInformationFormat="0" \
;;
esac
fi
tag Tool \
Name="VCCLCompilerTool" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$release_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
Detect64BitPortabilityProblems="true" \
case "$target" in
x86*) tag Tool \
Name="VCCLCompilerTool" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$release_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
Detect64BitPortabilityProblems="true"
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs"
;;
*)
tag Tool \
Name="VCCLCompilerTool" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$release_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs"
;;
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs"
;;
esac
;;
esac
case "$proj_kind" in
exe)
case "$target" in
x86*)
x86*) tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$libs \$(NoInherit)" \
AdditionalLibraryDirectories="$libdirs" \
;;
arm*|iwmmx*)
case "$name" in
obj_int_extract)
tag Tool \
Name="VCLinkerTool" \
OutputFile="${name}.exe" \
GenerateDebugInformation="true" \
obj_int_extract) tag Tool \
Name="VCLinkerTool" \
OutputFile="${name}.exe" \
LinkIncremental="1" \
GenerateDebugInformation="false" \
SubSystem="0" \
OptimizeReferences="0" \
EnableCOMDATFolding="0" \
TargetMachine="0"
;;
*)
tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$libs \$(NoInherit)" \
AdditionalLibraryDirectories="$libdirs" \
*) tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$libs" \
OutputFile="\$(OutDir)/${name}.exe" \
LinkIncremental="1" \
AdditionalLibraryDirectories="${libdirs};&quot;..\lib/$plat_no_ws&quot;" \
DelayLoadDLLs="\$(NOINHERIT)" \
GenerateDebugInformation="true" \
ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
SubSystem="9" \
StackReserveSize="65536" \
StackCommitSize="4096" \
OptimizeReferences="2" \
EnableCOMDATFolding="2" \
EntryPointSymbol="mainWCRTStartup" \
TargetMachine="3"
;;
esac
;;
;;
esac
;;
lib)
lib)
case "$target" in
x86*)
tag Tool \
Name="VCLibrarianTool" \
OutputFile="\$(OutDir)/${name}${lib_sfx}.lib" \
;;
arm*|iwmmx*) tag Tool \
Name="VCLibrarianTool" \
AdditionalOptions=" /subsystem:windowsce,4.20 /machine:ARM" \
OutputFile="\$(OutDir)/${name}.lib" \
;;
*) tag Tool \
Name="VCLibrarianTool" \
OutputFile="\$(OutDir)/${name}${lib_sfx}.lib" \
;;
esac
;;
dll) # note differences to debug version: LinkIncremental, AssemblyDebug
tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="\$(NoInherit)" \
LinkIncremental="1" \
GenerateDebugInformation="true" \
TargetMachine="1" \
$link_opts \
;;
;;
dll) # note differences to debug version: LinkIncremental, AssemblyDebug
tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="\$(NoInherit)" \
LinkIncremental="1" \
GenerateDebugInformation="true" \
TargetMachine="1" \
$link_opts
esac
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
case "$name" in
vpx) tag DeploymentTool \
ForceDirty="-1" \
RegisterOutput="0"
;;
example|xma) tag DeploymentTool \
ForceDirty="-1" \
RegisterOutput="0"
tag DebuggerTool \
Arguments="${ARGU}"
;;
esac
fi
close_tag Configuration
done
close_tag Configurations
open_tag Files
generate_filter srcs "Source Files" "c;def;odl;idl;hpj;bat;asm;asmx"
generate_filter hdrs "Header Files" "h;hm;inl;inc;xsd"
open_tag Files
generate_filter srcs "Source Files" "cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
generate_filter hdrs "Header Files" "h;hpp;hxx;hm;inl;inc;xsd"
generate_filter resrcs "Resource Files" "rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
generate_filter resrcs "Build Files" "mk"
close_tag Files

@@ -139,6 +139,9 @@ process_global() {
echo "${indent}${proj_guid}.${config}.ActiveCfg = ${config}"
echo "${indent}${proj_guid}.${config}.Build.0 = ${config}"
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
echo "${indent}${proj_guid}.${config}.Deploy.0 = ${config}"
fi
done
IFS=${IFS_bak}
done

File diff suppressed because it is too large.

@@ -1,15 +0,0 @@
REM Copyright (c) 2011 The WebM project authors. All Rights Reserved.
REM
REM Use of this source code is governed by a BSD-style license
REM that can be found in the LICENSE file in the root of the source
REM tree. An additional intellectual property rights grant can be found
REM in the file PATENTS. All contributing project authors may
REM be found in the AUTHORS file in the root of the source tree.
echo on
cl /I "./" /I "%1" /nologo /c "%1/vp8/common/asm_com_offsets.c"
cl /I "./" /I "%1" /nologo /c "%1/vp8/decoder/asm_dec_offsets.c"
cl /I "./" /I "%1" /nologo /c "%1/vp8/encoder/asm_enc_offsets.c"
obj_int_extract.exe rvds "asm_com_offsets.obj" > "asm_com_offsets.asm"
obj_int_extract.exe rvds "asm_dec_offsets.obj" > "asm_dec_offsets.asm"
obj_int_extract.exe rvds "asm_enc_offsets.obj" > "asm_enc_offsets.asm"

configure

@@ -37,10 +37,11 @@ Advanced options:
${toggle_multithread} multithreaded encoding and decoding.
${toggle_spatial_resampling} spatial sampling (scaling) support
${toggle_realtime_only} enable this option while building for real-time encoding
${toggle_error_concealment} enable this option to get a decoder which is able to conceal losses
${toggle_runtime_cpu_detect} runtime cpu detection
${toggle_shared} shared library support
${toggle_small} favor smaller size over speed
${toggle_opencl} support for OpenCL-assisted VP8 decoding (experimental)
${toggle_arm_asm_detok} assembly version of the detokenizer (ARM platforms only)
${toggle_postproc_visualizer} macro block / block level visualizers
Codecs:
@@ -79,21 +80,22 @@ EOF
# alphabetically by architecture, generic-gnu last.
all_platforms="${all_platforms} armv5te-linux-rvct"
all_platforms="${all_platforms} armv5te-linux-gcc"
all_platforms="${all_platforms} armv5te-none-rvct"
all_platforms="${all_platforms} armv5te-symbian-gcc"
all_platforms="${all_platforms} armv5te-wince-vs8"
all_platforms="${all_platforms} armv6-darwin-gcc"
all_platforms="${all_platforms} armv6-linux-rvct"
all_platforms="${all_platforms} armv6-linux-gcc"
all_platforms="${all_platforms} armv6-none-rvct"
all_platforms="${all_platforms} armv6-symbian-gcc"
all_platforms="${all_platforms} armv6-wince-vs8"
all_platforms="${all_platforms} iwmmxt-linux-rvct"
all_platforms="${all_platforms} iwmmxt-linux-gcc"
all_platforms="${all_platforms} iwmmxt-wince-vs8"
all_platforms="${all_platforms} iwmmxt2-linux-rvct"
all_platforms="${all_platforms} iwmmxt2-linux-gcc"
all_platforms="${all_platforms} iwmmxt2-wince-vs8"
all_platforms="${all_platforms} armv7-darwin-gcc" #neon Cortex-A8
all_platforms="${all_platforms} armv7-linux-rvct" #neon Cortex-A8
all_platforms="${all_platforms} armv7-linux-gcc" #neon Cortex-A8
all_platforms="${all_platforms} armv7-none-rvct" #neon Cortex-A8
all_platforms="${all_platforms} mips32-linux-gcc"
all_platforms="${all_platforms} ppc32-darwin8-gcc"
all_platforms="${all_platforms} ppc32-darwin9-gcc"
@@ -106,7 +108,6 @@ all_platforms="${all_platforms} x86-darwin8-gcc"
all_platforms="${all_platforms} x86-darwin8-icc"
all_platforms="${all_platforms} x86-darwin9-gcc"
all_platforms="${all_platforms} x86-darwin9-icc"
all_platforms="${all_platforms} x86-darwin10-gcc"
all_platforms="${all_platforms} x86-linux-gcc"
all_platforms="${all_platforms} x86-linux-icc"
all_platforms="${all_platforms} x86-solaris-gcc"
@@ -159,7 +160,6 @@ enable fast_unaligned #allow unaligned accesses, if supported by hw
enable md5
enable spatial_resampling
enable multithread
enable os_support
[ -d ${source_path}/../include ] && enable alt_tree_layout
for d in vp8; do
@@ -213,7 +213,6 @@ HAVE_LIST="
alt_tree_layout
pthread_h
sys_mman_h
dlopen
"
CONFIG_LIST="
external_build
@@ -251,11 +250,11 @@ CONFIG_LIST="
static_msvcrt
spatial_resampling
realtime_only
error_concealment
shared
small
opencl
arm_asm_detok
postproc_visualizer
os_support
"
CMDLINE_SELECT="
extra_warnings
@@ -292,9 +291,10 @@ CMDLINE_SELECT="
mem_tracker
spatial_resampling
realtime_only
error_concealment
shared
small
opencl
arm_asm_detok
postproc_visualizer
"
@@ -303,7 +303,7 @@ process_cmdline() {
optval="${opt#*=}"
case "$opt" in
--disable-codecs) for c in ${CODECS}; do disable $c; done ;;
*) process_common_cmdline "$opt"
*) process_common_cmdline $opt
;;
esac
done
@@ -382,7 +382,6 @@ process_targets() {
if [ -f "${source_path}/build/make/version.sh" ]; then
local ver=`"$source_path/build/make/version.sh" --bare $source_path`
DIST_DIR="${DIST_DIR}-${ver}"
VERSION_STRING=${ver}
ver=${ver%%-*}
VERSION_PATCH=${ver##*.}
ver=${ver%.*}
@@ -391,8 +390,6 @@ process_targets() {
VERSION_MAJOR=${ver%.*}
fi
enabled child || cat <<EOF >> config.mk
PREFIX=${prefix}
ifeq (\$(MAKECMDGOALS),dist)
DIST_DIR?=${DIST_DIR}
else
@@ -400,8 +397,6 @@ DIST_DIR?=\$(DESTDIR)${prefix}
endif
LIBSUBDIR=${libdir##${prefix}/}
VERSION_STRING=${VERSION_STRING}
VERSION_MAJOR=${VERSION_MAJOR}
VERSION_MINOR=${VERSION_MINOR}
VERSION_PATCH=${VERSION_PATCH}
@@ -496,7 +491,7 @@ process_toolchain() {
check_add_cflags -Wpointer-arith
check_add_cflags -Wtype-limits
check_add_cflags -Wcast-qual
enabled extra_warnings || check_add_cflags -Wno-unused-function
enabled extra_warnings || check_add_cflags -Wno-unused
fi
if enabled icc; then
@@ -561,6 +556,4 @@ process "$@"
cat <<EOF > ${BUILD_PFX}vpx_config.c
static const char* const cfg = "$CONFIGURE_ARGS";
const char *vpx_codec_build_config(void) {return cfg;}
static const char* const libdir = "$libdir";
const char *vpx_codec_lib_dir(void) {return libdir;}
EOF

@@ -34,8 +34,7 @@ TXT_DOX = $(call enabled,TXT_DOX)
EXAMPLE_PATH += $(SRC_PATH_BARE) #for CHANGELOG, README, etc
doxyfile: $(if $(findstring examples, $(ALL_TARGETS)),examples.doxy)
doxyfile: libs.doxy_template libs.doxy
doxyfile: libs.doxy_template libs.doxy examples.doxy
@echo " [CREATE] $@"
@cat $^ > $@
@echo "STRIP_FROM_PATH += $(SRC_PATH_BARE) $(BUILD_ROOT)" >> $@

@@ -77,6 +77,11 @@ GEN_EXAMPLES-$(CONFIG_ENCODERS) += decode_with_drops.c
endif
decode_with_drops.GUID = CE5C53C4-8DDA-438A-86ED-0DDD3CDB8D26
decode_with_drops.DESCRIPTION = Drops frames while decoding
ifeq ($(CONFIG_DECODERS),yes)
GEN_EXAMPLES-$(CONFIG_ENCODERS) += decode_with_partial_drops.c
endif
decode_partial_with_drops.GUID = CE5C53C4-8DDA-438A-86ED-0DDD3CDB8D27
decode_partial_with_drops.DESCRIPTION = Drops parts of frames while decoding
GEN_EXAMPLES-$(CONFIG_ENCODERS) += error_resilient.c
error_resilient.GUID = DF5837B9-4145-4F92-A031-44E4F832E00C
error_resilient.DESCRIPTION = Error Resiliency Feature
@@ -93,16 +98,8 @@ vp8cx_set_ref.DESCRIPTION = VP8 set encoder reference frame
# Handle extra library flags depending on codec configuration
# We should not link to math library (libm) on RVCT
# when building for bare-metal targets
ifeq ($(CONFIG_OS_SUPPORT), yes)
CODEC_EXTRA_LIBS-$(CONFIG_VP8) += m
else
ifeq ($(CONFIG_GCC), yes)
CODEC_EXTRA_LIBS-$(CONFIG_VP8) += m
endif
endif
#
# End of specified files. The rest of the build rules should happen
# automagically from here.

@@ -0,0 +1,213 @@
@TEMPLATE decoder_tmpl.c
Decode With Drops Example
=========================
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INTRODUCTION
This is an example utility which drops a series of frames, as specified
on the command line. This is useful for observing the error recovery
features of the codec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INTRODUCTION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_INCLUDES
#include <time.h>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_INCLUDES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HELPERS
struct parsed_header
{
char key_frame;
int version;
char show_frame;
int first_part_size;
};
int next_packet(struct parsed_header* hdr, int pos, int length, int mtu)
{
int size = 0;
int remaining = length - pos;
/* Uncompressed part is 3 bytes for P frames and 10 bytes for I frames */
int uncomp_part_size = (hdr->key_frame ? 10 : 3);
/* number of bytes yet to send from header and the first partition */
int remainFirst = uncomp_part_size + hdr->first_part_size - pos;
if (remainFirst > 0)
{
if (remainFirst <= mtu)
{
size = remainFirst;
}
else
{
size = mtu;
}
return size;
}
/* second partition; just split it up according to MTU */
if (remaining <= mtu)
{
size = remaining;
return size;
}
return mtu;
}
void throw_packets(unsigned char* frame, int* size, int loss_rate, int* thrown, int* kept)
{
unsigned char loss_frame[256*1024];
int pkg_size = 1;
int count = 0;
int pos = 0;
int loss_pos = 0;
struct parsed_header hdr;
unsigned int tmp;
int mtu = 100;
if (*size < 3)
{
return;
}
putc('|', stdout);
/* parse uncompressed 3 bytes */
tmp = (frame[2] << 16) | (frame[1] << 8) | frame[0];
hdr.key_frame = !(tmp & 0x1); /* inverse logic */
hdr.version = (tmp >> 1) & 0x7;
hdr.show_frame = (tmp >> 4) & 0x1;
hdr.first_part_size = (tmp >> 5) & 0x7FFFF;
/* don't drop key frames */
if (hdr.key_frame)
{
int i;
*kept = *size/mtu + ((*size % mtu > 0) ? 1 : 0); /* approximate */
for (i=0; i < *kept; i++)
putc('.', stdout);
return;
}
while ((pkg_size = next_packet(&hdr, pos, *size, mtu)) > 0)
{
int loss_event = ((rand() + 1.0)/(RAND_MAX + 1.0) < loss_rate/100.0);
if (*thrown == 0 && !loss_event)
{
memcpy(loss_frame + loss_pos, frame + pos, pkg_size);
loss_pos += pkg_size;
(*kept)++;
putc('.', stdout);
}
else
{
(*thrown)++;
putc('X', stdout);
}
pos += pkg_size;
}
memcpy(frame, loss_frame, loss_pos);
memset(frame + loss_pos, 0, *size - loss_pos);
*size = loss_pos;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HELPERS
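The helpers above read the frame type and the size of the first partition from the first three bytes of each frame. As a minimal sketch of that step in isolation, the function below unpacks the same fields with the same shifts and masks. The names `frame_tag` and `parse_frame_tag` are hypothetical and are not part of the example or of libvpx.

```c
#include <assert.h>
#include <stddef.h>

/* Fields of the 3-byte frame tag, mirroring struct parsed_header. */
struct frame_tag {
    int key_frame;       /* 1 for key frames (bit 0 of the tag is 0) */
    int version;         /* 3-bit version field */
    int show_frame;      /* 1-bit show_frame flag */
    int first_part_size; /* 19-bit size of the first partition, in bytes */
};

/* Hypothetical helper: returns 0 on success, -1 if fewer than 3 bytes
 * are available. */
static int parse_frame_tag(const unsigned char *frame, size_t size,
                           struct frame_tag *tag)
{
    unsigned int tmp;

    assert(frame != NULL && tag != NULL);
    if (size < 3)
        return -1;

    /* The tag is stored little-endian. */
    tmp = ((unsigned int)frame[2] << 16) |
          ((unsigned int)frame[1] << 8) |
          (unsigned int)frame[0];

    tag->key_frame = !(tmp & 0x1); /* inverse logic */
    tag->version = (int)((tmp >> 1) & 0x7);
    tag->show_frame = (int)((tmp >> 4) & 0x1);
    tag->first_part_size = (int)((tmp >> 5) & 0x7FFFF);
    return 0;
}
```

For instance, the bytes `0x91 0x0C 0x00` would decode as a shown inter frame of version 0 whose first partition is 100 bytes long.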
Usage
-----
This example adds a single argument to the `simple_decoder` example,
which specifies the range or pattern of frames to drop, or a packet loss
rate and an optional random seed (`L,S`). The parameter is parsed as
follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ USAGE
if(argc!=4 && argc != 5)
die("Usage: %s <infile> <outfile> <N-M|N/M|L,S>\n", argv[0]);
{
char *nptr;
n = strtol(argv[3], &nptr, 0);
mode = (*nptr == '\0' || *nptr == ',') ? 2 : (*nptr == '-') ? 1 : 0;
m = strtol(nptr+1, NULL, 0);
if((!n && !m) || (*nptr != '-' && *nptr != '/' &&
*nptr != '\0' && *nptr != ','))
die("Couldn't parse pattern %s\n", argv[3]);
}
seed = (m > 0) ? m : (unsigned int)time(NULL);
srand(seed);
thrown_frame = 0;
printf("Seed: %u\n", seed);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ USAGE
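To make the mapping from argument syntax to mode easier to follow, here is a minimal sketch of the same parsing as a standalone function. `parse_drop_arg` is a hypothetical name, not part of the example. Unlike the snippet above, the sketch only reads past the separator when one is present, which is an assumption about the intended behavior.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the parsing above.
 * Returns the mode:
 *   0 for an N/M pattern,
 *   1 for an N-M range,
 *   2 for L,S (loss rate and seed) or a bare L,
 * or -1 if the argument cannot be parsed. */
static int parse_drop_arg(const char *arg, int *n, int *m)
{
    char *nptr;

    assert(arg != NULL && n != NULL && m != NULL);

    *n = (int)strtol(arg, &nptr, 0);
    *m = 0;
    if (*nptr != '\0') /* only parse a second number if a separator exists */
        *m = (int)strtol(nptr + 1, NULL, 0);

    if ((!*n && !*m) || (*nptr != '-' && *nptr != '/' &&
                         *nptr != '\0' && *nptr != ','))
        return -1;

    if (*nptr == '\0' || *nptr == ',')
        return 2;
    return (*nptr == '-') ? 1 : 0;
}
```

With this sketch, `5-10` would select mode 1, `3/7` mode 0, and `10,42` mode 2 with a loss rate of 10 percent and seed 42.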
Dropping A Range Of Frames
--------------------------
To drop a range of frames, specify the starting frame and the ending
frame to drop, separated by a dash. The following command will drop
frames 5 through 10 (base 1).
$ ./decode_with_drops in.ivf out.i420 5-10
Dropping A Pattern Of Frames
----------------------------
To drop a pattern of frames, specify the number of frames to drop and
the number of frames after which to repeat the pattern, separated by
a forward-slash. The following command will drop 3 of 7 frames.
Specifically, it will decode 4 frames, then drop 3 frames, and then
repeat.
$ ./decode_with_drops in.ivf out.i420 3/7
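The pattern rule can be checked in isolation. The sketch below restates the test used for mode 0 in the drop decision as a standalone function, assuming a 1-based frame counter. `drop_in_pattern` is a hypothetical name, not part of the example.

```c
#include <assert.h>

/* Returns 1 if frame number frame_cnt (1-based) is dropped when
 * dropping n of every m frames, 0 otherwise. The last n frames of
 * each group of m are the ones dropped. */
static int drop_in_pattern(int frame_cnt, int n, int m)
{
    assert(frame_cnt >= 1 && m > 0); /* m == 0 would divide by zero */
    return m - (frame_cnt - 1) % m <= n;
}
```

For `3/7` this keeps frames 1 through 4, drops frames 5 through 7, and then starts over at frame 8.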
Extra Variables
---------------
This example maintains the pattern passed on the command line in the
`n`, `m`, and `mode` variables:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_VARS
int n, m, mode; /* mode 0: N/M pattern, 1: N-M range, 2: L,S packet loss */
unsigned int seed;
int thrown=0, kept=0;
int thrown_frame=0, kept_frame=0;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_VARS
Making The Drop Decision
------------------------
The example decides whether to drop the frame based on the current
frame number, immediately before decoding the frame. In packet loss
mode it instead discards individual packets of the frame at random.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PRE_DECODE
/* Decide whether to throw parts of the frame or the whole frame
depending on the drop mode */
thrown_frame = 0;
kept_frame = 0;
switch (mode)
{
case 0:
if (m - (frame_cnt-1)%m <= n)
{
frame_sz = 0;
}
break;
case 1:
if (frame_cnt >= n && frame_cnt <= m)
{
frame_sz = 0;
}
break;
case 2:
throw_packets(frame, &frame_sz, n, &thrown_frame, &kept_frame);
break;
default: break;
}
if (mode < 2)
{
if (frame_sz == 0)
{
putc('X', stdout);
thrown_frame++;
}
else
{
putc('.', stdout);
kept_frame++;
}
}
thrown += thrown_frame;
kept += kept_frame;
fflush(stdout);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PRE_DECODE
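In mode 2 the frame is handed to `throw_packets`, which walks the frame one packet at a time using `next_packet`. The sketch below shows how such a frame would be cut into packets, assuming the header plus first partition fits inside the frame. `next_packet_size` and `packetize` are hypothetical names that restate the helper logic; they are not part of the example.

```c
#include <assert.h>

/* Size of the next packet starting at byte offset pos, or 0 when the
 * frame is exhausted. header_and_first is the number of bytes in the
 * uncompressed header plus the first partition. */
static int next_packet_size(int header_and_first, int pos, int length,
                            int mtu)
{
    int remain_first = header_and_first - pos;
    int remaining = length - pos;

    assert(mtu > 0 && pos >= 0 && pos <= length);
    assert(header_and_first <= length); /* assumption of this sketch */

    /* Packets never straddle the end of the first partition. */
    if (remain_first > 0)
        return remain_first <= mtu ? remain_first : mtu;

    /* Remaining data is simply cut at MTU boundaries. */
    return remaining <= mtu ? remaining : mtu;
}

/* Writes up to max packet sizes into sizes[] and returns the count. */
static int packetize(int header_and_first, int length, int mtu,
                     int *sizes, int max)
{
    int pos = 0;
    int count = 0;
    int sz;

    while (count < max &&
           (sz = next_packet_size(header_and_first, pos, length, mtu)) > 0)
    {
        sizes[count++] = sz;
        pos += sz;
    }
    return count;
}
```

For a 400-byte inter frame with a 150-byte first partition (153 bytes including the 3-byte header) and an MTU of 100, this yields packets of 100, 53, 100, 100 and 47 bytes.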

@@ -19,7 +19,7 @@
#define VPX_CODEC_DISABLE_COMPAT 1
#include "vpx/vpx_decoder.h"
#include "vpx/vp8dx.h"
#define interface (vpx_codec_vp8_dx())
#define interface (&vpx_codec_vp8_dx_algo)
@EXTRA_INCLUDES
@@ -42,6 +42,8 @@ static void die(const char *fmt, ...) {
@DIE_CODEC
@HELPERS
int main(int argc, char **argv) {
FILE *infile, *outfile;
vpx_codec_ctx_t codec;

@@ -2,7 +2,7 @@
#define VPX_CODEC_DISABLE_COMPAT 1
#include "vpx/vpx_decoder.h"
#include "vpx/vp8dx.h"
#define interface (vpx_codec_vp8_dx())
#define interface (&vpx_codec_vp8_dx_algo)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DEC_INCLUDES

@@ -19,7 +19,7 @@
#define VPX_CODEC_DISABLE_COMPAT 1
#include "vpx/vpx_encoder.h"
#include "vpx/vp8cx.h"
#define interface (vpx_codec_vp8_cx())
#define interface (&vpx_codec_vp8_cx_algo)
#define fourcc 0x30385056
@EXTRA_INCLUDES

@@ -2,7 +2,7 @@
#define VPX_CODEC_DISABLE_COMPAT 1
#include "vpx/vpx_encoder.h"
#include "vpx/vp8cx.h"
#define interface (vpx_codec_vp8_cx())
#define interface (&vpx_codec_vp8_cx_algo)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ENC_INCLUDES

@@ -33,7 +33,7 @@ Initializing The Codec
----------------------
The decoder is initialized by the following code. This is an example for
the VP8 decoder, but the code is analogous for all algorithms. Replace
`vpx_codec_vp8_dx()` with a pointer to the interface exposed by the
`&vpx_codec_vp8_dx_algo` with a pointer to the interface exposed by the
algorithm you want to use. The `cfg` argument is left as NULL in this
example, because we want the algorithm to determine the stream
configuration (width/height) and allocate memory automatically. This

libs.mk

@@ -9,13 +9,7 @@
##
# ARM assembly files are written in RVCT-style. We use some make magic to
# filter those files to allow GCC compilation
ifeq ($(ARCH_ARM),yes)
ASM:=$(if $(filter yes,$(CONFIG_GCC)),.asm.s,.asm)
else
ASM:=.asm
endif
ASM:=$(if $(filter yes,$(CONFIG_GCC)),.asm.s,.asm)
CODEC_SRCS-yes += libs.mk
@@ -123,18 +117,6 @@ endif
else
INSTALL-LIBS-yes += $(LIBSUBDIR)/libvpx.a
INSTALL-LIBS-$(CONFIG_DEBUG_LIBS) += $(LIBSUBDIR)/libvpx_g.a
#Install the OpenCL kernels if CL enabled.
ifeq ($(CONFIG_OPENCL),yes)
INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/filter_cl.cl
INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/idctllm_cl.cl
INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/loopfilter.cl
#only install decoder CL files if VP8 decoder enabled
ifeq ($(CONFIG_VP8_DECODER),yes)
INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/decoder/opencl/dequantize_cl.cl
endif
endif #CONFIG_OPENCL=yes
endif
CODEC_SRCS=$(call enabled,CODEC_SRCS)
@@ -144,22 +126,28 @@ INSTALL-SRCS-$(CONFIG_CODEC_SRCS) += $(call enabled,CODEC_EXPORTS)
ifeq ($(CONFIG_EXTERNAL_BUILD),yes)
ifeq ($(CONFIG_MSVS),yes)
ifeq ($(ARCH_ARM),yes)
ifeq ($(HAVE_ARMV5TE),yes)
ARM_ARCH=v5
endif
ifeq ($(HAVE_ARMV6),yes)
ARM_ARCH=v6
endif
obj_int_extract.vcproj: $(SRC_PATH_BARE)/build/make/obj_int_extract.c
@cp $(SRC_PATH_BARE)/build/x86-msvs/obj_int_extract.bat .
@cp $(SRC_PATH_BARE)/build/arm-wince-vs8/obj_int_extract.bat .
@echo " [CREATE] $@"
$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh \
--exe \
--target=$(TOOLCHAIN) \
--name=obj_int_extract \
--ver=$(CONFIG_VS_VERSION) \
--proj-guid=E1360C65-D375-4335-8057-7ED99CC3F9B2 \
$(if $(CONFIG_STATIC_MSVCRT),--static-crt) \
--out=$@ $^ \
-I. \
-I"$(SRC_PATH_BARE)" \
$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh\
--exe\
--target=$(TOOLCHAIN)\
$(if $(CONFIG_STATIC_MSVCRT),--static-crt) \
--name=obj_int_extract\
--proj-guid=E1360C65-D375-4335-8057-7ED99CC3F9B2\
--out=$@ $^\
-I".&quot;;&quot;$(SRC_PATH_BARE)"
PROJECTS-$(BUILD_LIBVPX) += obj_int_extract.vcproj
PROJECTS-$(BUILD_LIBVPX) += obj_int_extract.bat
endif
vpx.def: $(call enabled,CODEC_EXPORTS)
@echo " [CREATE] $@"
@@ -170,16 +158,15 @@ CLEAN-OBJS += vpx.def
vpx.vcproj: $(CODEC_SRCS) vpx.def
@echo " [CREATE] $@"
$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh \
--lib \
--target=$(TOOLCHAIN) \
$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh\
--lib\
--target=$(TOOLCHAIN)\
$(if $(CONFIG_STATIC_MSVCRT),--static-crt) \
--name=vpx \
--proj-guid=DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74 \
--module-def=vpx.def \
--ver=$(CONFIG_VS_VERSION) \
--out=$@ $(CFLAGS) $^ \
--src-path-bare="$(SRC_PATH_BARE)" \
--name=vpx\
--proj-guid=DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74\
--module-def=vpx.def\
--ver=$(CONFIG_VS_VERSION)\
--out=$@ $(CFLAGS) $^\
PROJECTS-$(BUILD_LIBVPX) += vpx.vcproj
@@ -216,26 +203,6 @@ $(addprefix $(DIST_DIR)/,$(LIBVPX_SO_SYMLINKS)):
INSTALL-LIBS-$(CONFIG_SHARED) += $(LIBVPX_SO_SYMLINKS)
INSTALL-LIBS-$(CONFIG_SHARED) += $(LIBSUBDIR)/$(LIBVPX_SO)
LIBS-$(BUILD_LIBVPX) += vpx.pc
vpx.pc: config.mk libs.mk
@echo " [CREATE] $@"
$(qexec)echo '# pkg-config file from libvpx $(VERSION_STRING)' > $@
$(qexec)echo 'prefix=$(PREFIX)' >> $@
$(qexec)echo 'exec_prefix=$${prefix}' >> $@
$(qexec)echo 'libdir=$${prefix}/lib' >> $@
$(qexec)echo 'includedir=$${prefix}/include' >> $@
$(qexec)echo '' >> $@
$(qexec)echo 'Name: vpx' >> $@
$(qexec)echo 'Description: WebM Project VPx codec implementation' >> $@
$(qexec)echo 'Version: $(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH)' >> $@
$(qexec)echo 'Requires:' >> $@
$(qexec)echo 'Conflicts:' >> $@
$(qexec)echo 'Libs: -L$${libdir} -lvpx' >> $@
$(qexec)echo 'Cflags: -I$${includedir}' >> $@
INSTALL-LIBS-yes += $(LIBSUBDIR)/pkgconfig/vpx.pc
INSTALL_MAPS += $(LIBSUBDIR)/pkgconfig/%.pc %.pc
CLEAN-OBJS += vpx.pc
endif
LIBS-$(LIPO_LIBVPX) += libvpx.a
@@ -263,44 +230,9 @@ endif
#
# Add assembler dependencies for configuration and offsets
#
$(filter %.s.o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
#
# Calculate platform- and compiler-specific offsets for hand coded assembly
#
ifeq ($(CONFIG_EXTERNAL_BUILD),) # Visual Studio uses obj_int_extract.bat
ifeq ($(ARCH_ARM), yes)
asm_com_offsets.asm: obj_int_extract
asm_com_offsets.asm: $(VP8_PREFIX)common/asm_com_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)common/asm_com_offsets.c.o
CLEAN-OBJS += asm_com_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_com_offsets.asm
endif
ifeq ($(ARCH_ARM)$(ARCH_X86)$(ARCH_X86_64), yes)
ifeq ($(CONFIG_VP8_ENCODER), yes)
asm_enc_offsets.asm: obj_int_extract
asm_enc_offsets.asm: $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
CLEAN-OBJS += asm_enc_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_enc_offsets.asm
endif
endif
ifeq ($(ARCH_ARM), yes)
ifeq ($(CONFIG_VP8_DECODER), yes)
asm_dec_offsets.asm: obj_int_extract
asm_dec_offsets.asm: $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
CLEAN-OBJS += asm_dec_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_dec_offsets.asm
endif
endif
endif
#$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm $(BUILD_PFX)vpx_asm_offsets.asm
$(filter %.s.o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
$(filter %.asm.o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
$(shell $(SRC_PATH_BARE)/build/make/version.sh "$(SRC_PATH_BARE)" $(BUILD_PFX)vpx_version.h)
CLEAN-OBJS += $(BUILD_PFX)vpx_version.h


@@ -31,7 +31,7 @@
The WebM project is an open source project supported by its community. For
questions about this SDK, please mail the apps-devel@webmproject.org list.
To contribute, see http://www.webmproject.org/code/contribute and mail
codec-devel@webmproject.org.
vpx-devel@webmproject.org.
*/
/*!\page changelog CHANGELOG


@@ -20,6 +20,8 @@
* Still in the public domain.
*/
#include <sys/types.h> /* for stupid systems */
#include <string.h> /* for memcpy() */
#include "md5_utils.h"


@@ -9,13 +9,38 @@
##
ifeq ($(ARCH_ARM),yes)
ARM_DEVELOP=no
ARM_DEVELOP:=$(if $(filter %vpx.vcproj,$(wildcard *.vcproj)),yes)
ifeq ($(ARM_DEVELOP),yes)
vpx.sln:
@echo " [COPY] $@"
@cp $(SRC_PATH_BARE)/build/arm-wince-vs8/vpx.sln .
PROJECTS-yes += vpx.sln
else
vpx.sln: $(wildcard *.vcproj)
@echo " [CREATE] $@"
$(SRC_PATH_BARE)/build/make/gen_msvs_sln.sh \
$(if $(filter %vpx.vcproj,$^),--dep=vpxdec:vpx) \
$(if $(filter %vpx.vcproj,$^),--dep=xma:vpx) \
--ver=$(CONFIG_VS_VERSION)\
--target=$(TOOLCHAIN)\
--out=$@ $^
vpx.sln.mk: vpx.sln
@true
PROJECTS-yes += vpx.sln vpx.sln.mk
-include vpx.sln.mk
endif
else
vpx.sln: $(wildcard *.vcproj)
@echo " [CREATE] $@"
$(SRC_PATH_BARE)/build/make/gen_msvs_sln.sh \
$(if $(filter %vpx.vcproj,$^),\
$(foreach vcp,$(filter-out %vpx.vcproj %obj_int_extract.vcproj,$^),\
$(foreach vcp,$(filter-out %vpx.vcproj,$^),\
--dep=$(vcp:.vcproj=):vpx)) \
--dep=vpx:obj_int_extract \
--ver=$(CONFIG_VS_VERSION)\
--out=$@ $^
vpx.sln.mk: vpx.sln
@@ -23,6 +48,7 @@ vpx.sln.mk: vpx.sln
PROJECTS-yes += vpx.sln vpx.sln.mk
-include vpx.sln.mk
endif
# Always install this file, as it is an unconditional post-build rule.
INSTALL_MAPS += src/% $(SRC_PATH_BARE)/%


@@ -25,7 +25,7 @@
codec may write into to store details about a single instance of that codec.
Most of the context is implementation specific, and thus opaque to the
application. The context structure as seen by the application is of fixed
size, and thus can be allocated with automatic storage or dynamically
size, and thus can be allocated eith with automatic storage or dynamically
on the heap.
Most operations require an initialized codec context. Codec context
@@ -74,7 +74,7 @@
the ABI is versioned. The ABI version number must be passed at
initialization time to ensure the application is using a header file that
matches the library. The current ABI version number is stored in the
preprocessor macros #VPX_CODEC_ABI_VERSION, #VPX_ENCODER_ABI_VERSION, and
prepropcessor macros #VPX_CODEC_ABI_VERSION, #VPX_ENCODER_ABI_VERSION, and
#VPX_DECODER_ABI_VERSION. For convenience, each initialization function has
a wrapper macro that inserts the correct version number. These macros are
named like the initialization methods, but without the _ver suffix.
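The wrapper-macro convention described in this hunk can be sketched in isolation. The names below (demo_init_ver, demo_init, DEMO_ABI_VERSION) are hypothetical stand-ins and not the library's API; the sketch only shows how a macro without the _ver suffix can supply the version number the header was compiled with.

```c
#include <assert.h>

/* Hypothetical ABI version baked into this "header". */
#define DEMO_ABI_VERSION 3

/* Hypothetical init function: rejects callers built against another ABI. */
static int demo_init_ver(int *ctx, int ver)
{
    assert(ctx != 0);
    if (ver != DEMO_ABI_VERSION)
        return 1;              /* ABI mismatch */
    *ctx = 1;                  /* mark the context as initialized */
    return 0;
}

/* Wrapper macro: same name without the _ver suffix, version filled in. */
#define demo_init(ctx) demo_init_ver((ctx), DEMO_ABI_VERSION)
```

An application would normally call only the wrapper, so the version check happens without the caller spelling out the constant.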
@@ -125,7 +125,7 @@
The special value <code>0</code> is reserved to represent an infinite
deadline. In this case, the codec will perform as much processing as
possible to yield the highest quality frame.
possible to yeild the highest quality frame.
By convention, the value <code>1</code> is used to mean "return as fast as
possible."
@@ -135,7 +135,7 @@
/*! \page usage_xma External Memory Allocation
Applications that wish to have fine grained control over how and where
decoders allocate memory \ref MAY make use of the eXternal Memory Allocation
decoders allocate memory \ref MAY make use of the e_xternal Memory Allocation
(XMA) interface. Not all codecs support the XMA \ref usage_features.
To use a decoder in XMA mode, the decoder \ref MUST be initialized with the
@@ -143,7 +143,7 @@
allocate is heavily dependent on the size of the encoded video frames. The
size of the video must be known before requesting the decoder's memory map.
This stream information can be obtained with the vpx_codec_peek_stream_info()
function, which does not require a constructed decoder context. If the exact
function, which does not require a contructed decoder context. If the exact
stream is not known, a stream info structure can be created that reflects
the maximum size that the decoder instance is required to support.
@@ -175,7 +175,7 @@
\section usage_xma_seg_szalign Segment Size and Alignment
The sz (size) and align (alignment) parameters describe the required size
and alignment of the requested segment. Alignment will always be a power of
two. Applications \ref MUST honor the alignment requested. Failure to do so
two. Applications \ref MUST honor the aligment requested. Failure to do so
could result in program crashes or may incur a speed penalty.
\section usage_xma_seg_flags Segment Flags
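As an illustration of honoring a power-of-two alignment request, here is a minimal sketch. The helper names (is_power_of_two, align_up, alloc_segment) are hypothetical and not part of the XMA interface; the sketch assumes the application over-allocates with malloc() and rounds the address up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if align is a power of two, the only kind the text says is requested. */
static int is_power_of_two(size_t align)
{
    return align != 0 && (align & (align - 1)) == 0;
}

/* Rounds addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(is_power_of_two(align));
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Hypothetical helper: over-allocates by align - 1 bytes and returns an
 * aligned pointer. *base receives the pointer that must be passed to free(). */
static void *alloc_segment(size_t sz, size_t align, void **base)
{
    *base = malloc(sz + align - 1);
    if (!*base)
        return NULL;
    return (void *)align_up((uintptr_t)*base, align);
}
```

The caller keeps the base pointer for free() and hands only the aligned pointer to the decoder.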


@@ -12,21 +12,26 @@
#include "vpx_ports/config.h"
#include "blockd.h"
#include "vpx_mem/vpx_mem.h"
#include "error_concealment.h"
#include "onyxc_int.h"
#include "findnearmv.h"
#include "entropymode.h"
#include "systemdependent.h"
#include "vpxerrors.h"
extern void vp8_init_scan_order_mask();
static void update_mode_info_border(MODE_INFO *mi, int rows, int cols)
void vp8_update_mode_info_border(MODE_INFO *mi, int rows, int cols)
{
int i;
vpx_memset(mi - cols - 2, 0, sizeof(MODE_INFO) * (cols + 1));
for (i = 0; i < rows; i++)
{
/* TODO(holmer): Bug? This updates the last element of each row
* rather than the border element!
*/
vpx_memset(&mi[i*cols-1], 0, sizeof(MODE_INFO));
}
}
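The TODO above can be checked with a small index experiment. This is a sketch under one assumption not visible in this hunk: that the row stride of the MODE_INFO array is cols + 1, with mi pointing one row and one element past the start of the allocation. Plain bytes stand in for MODE_INFO, and both function names are hypothetical.

```c
#include <assert.h>

/* Assumed layout: one extra border column, so STRIDE = COLS + 1. */
enum { ROWS = 3, COLS = 4, STRIDE = COLS + 1 };

/* Marks the elements reached by the indexing used in the diff: mi[i*cols - 1]. */
static void mark_as_written(unsigned char *mi, int rows, int cols)
{
    int i;
    assert(rows > 0 && cols > 0);
    for (i = 0; i < rows; i++)
        mi[i * cols - 1] = 1;
}

/* Marks the left border element of every row: mi[i*stride - 1]. */
static void mark_left_border(unsigned char *mi, int rows, int stride)
{
    int i;
    assert(rows > 0 && stride > 0);
    for (i = 0; i < rows; i++)
        mi[i * stride - 1] = 1;
}
```

Under that assumed stride, only i = 0 reaches a border element with the cols-based index; later rows land on interior elements, which seems consistent with the concern raised in the TODO.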
@@ -43,9 +48,11 @@ void vp8_de_alloc_frame_buffers(VP8_COMMON *oci)
vpx_free(oci->above_context);
vpx_free(oci->mip);
vpx_free(oci->prev_mip);
oci->above_context = 0;
oci->mip = 0;
oci->prev_mip = 0;
}
@@ -70,7 +77,7 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
if (vp8_yv12_alloc_frame_buffer(&oci->yv12_fb[i], width, height, VP8BORDERINPIXELS) < 0)
{
vp8_de_alloc_frame_buffers(oci);
return 1;
return ALLOC_FAILURE;
}
}
@@ -87,13 +94,13 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
if (vp8_yv12_alloc_frame_buffer(&oci->temp_scale_frame, width, 16, VP8BORDERINPIXELS) < 0)
{
vp8_de_alloc_frame_buffers(oci);
return 1;
return ALLOC_FAILURE;
}
if (vp8_yv12_alloc_frame_buffer(&oci->post_proc_buffer, width, height, VP8BORDERINPIXELS) < 0)
{
vp8_de_alloc_frame_buffers(oci);
return 1;
return ALLOC_FAILURE;
}
oci->mb_rows = height >> 4;
@@ -105,21 +112,31 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
if (!oci->mip)
{
vp8_de_alloc_frame_buffers(oci);
return 1;
return ALLOC_FAILURE;
}
oci->mi = oci->mip + oci->mode_info_stride + 1;
/* allocate memory for last frame MODE_INFO array */
oci->prev_mip = vpx_calloc((oci->mb_cols + 1) * (oci->mb_rows + 1), sizeof(MODE_INFO));
if (!oci->prev_mip)
{
vp8_de_alloc_frame_buffers(oci);
return ALLOC_FAILURE;
}
oci->prev_mi = oci->prev_mip + oci->mode_info_stride + 1;
oci->above_context = vpx_calloc(sizeof(ENTROPY_CONTEXT_PLANES) * oci->mb_cols, 1);
if (!oci->above_context)
{
vp8_de_alloc_frame_buffers(oci);
return 1;
return ALLOC_FAILURE;
}
update_mode_info_border(oci->mi, oci->mb_rows, oci->mb_cols);
vp8_update_mode_info_border(oci->mi, oci->mb_rows, oci->mb_cols);
return 0;
}
@@ -130,32 +147,32 @@ void vp8_setup_version(VP8_COMMON *cm)
case 0:
cm->no_lpf = 0;
cm->simpler_lpf = 0;
cm->mcomp_filter_type = SIXTAP;
cm->use_bilinear_mc_filter = 0;
cm->full_pixel = 0;
break;
case 1:
cm->no_lpf = 0;
cm->simpler_lpf = 1;
cm->mcomp_filter_type = BILINEAR;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 0;
break;
case 2:
cm->no_lpf = 1;
cm->simpler_lpf = 0;
cm->mcomp_filter_type = BILINEAR;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 0;
break;
case 3:
cm->no_lpf = 1;
cm->simpler_lpf = 1;
cm->mcomp_filter_type = BILINEAR;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 1;
break;
default:
/*4,5,6,7 are reserved for future use*/
cm->no_lpf = 0;
cm->simpler_lpf = 0;
cm->mcomp_filter_type = SIXTAP;
cm->use_bilinear_mc_filter = 0;
cm->full_pixel = 0;
break;
}
@@ -170,7 +187,7 @@ void vp8_create_common(VP8_COMMON *oci)
oci->mb_no_coeff_skip = 1;
oci->no_lpf = 0;
oci->simpler_lpf = 0;
oci->mcomp_filter_type = SIXTAP;
oci->use_bilinear_mc_filter = 0;
oci->full_pixel = 0;
oci->multi_token_partition = ONE_PARTITION;
oci->clr_type = REG_YUV;


@@ -11,13 +11,21 @@
#include "vpx_ports/config.h"
#include "vpx_ports/arm.h"
#include "vp8/common/g_common.h"
#include "vp8/common/pragmas.h"
#include "vp8/common/subpixel.h"
#include "vp8/common/loopfilter.h"
#include "vp8/common/recon.h"
#include "vp8/common/idct.h"
#include "vp8/common/onyxc_int.h"
#include "g_common.h"
#include "pragmas.h"
#include "subpixel.h"
#include "loopfilter.h"
#include "recon.h"
#include "idct.h"
#include "onyxc_int.h"
extern void (*vp8_build_intra_predictors_mby_ptr)(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby_neon(MACROBLOCKD *x);
extern void (*vp8_build_intra_predictors_mby_s_ptr)(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby_s_neon(MACROBLOCKD *x);
void vp8_arch_arm_common_init(VP8_COMMON *ctx)
{
@@ -98,12 +106,31 @@ void vp8_arch_arm_common_init(VP8_COMMON *ctx)
rtcd->recon.recon2 = vp8_recon2b_neon;
rtcd->recon.recon4 = vp8_recon4b_neon;
rtcd->recon.recon_mb = vp8_recon_mb_neon;
rtcd->recon.build_intra_predictors_mby =
vp8_build_intra_predictors_mby_neon;
rtcd->recon.build_intra_predictors_mby_s =
vp8_build_intra_predictors_mby_s_neon;
}
#endif
#endif
#if HAVE_ARMV6
#if CONFIG_RUNTIME_CPU_DETECT
if (has_media)
#endif
{
vp8_build_intra_predictors_mby_ptr = vp8_build_intra_predictors_mby;
vp8_build_intra_predictors_mby_s_ptr = vp8_build_intra_predictors_mby_s;
}
#endif
#if HAVE_ARMV7
#if CONFIG_RUNTIME_CPU_DETECT
if (has_neon)
#endif
{
vp8_build_intra_predictors_mby_ptr =
vp8_build_intra_predictors_mby_neon;
vp8_build_intra_predictors_mby_s_ptr =
vp8_build_intra_predictors_mby_s_neon;
}
#endif
}


@@ -15,19 +15,19 @@
AREA |.text|, CODE, READONLY ; name this block of code
;-------------------------------------
; r0 unsigned char *src_ptr,
; r1 unsigned short *dst_ptr,
; r2 unsigned int src_pitch,
; r3 unsigned int height,
; stack unsigned int width,
; stack const short *vp8_filter
; r0 unsigned char *src_ptr,
; r1 unsigned short *output_ptr,
; r2 unsigned int src_pixels_per_line,
; r3 unsigned int output_height,
; stack unsigned int output_width,
; stack const short *vp8_filter
;-------------------------------------
; The output is transposed stroed in output array to make it easy for second pass filtering.
|vp8_filter_block2d_bil_first_pass_armv6| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vp8_filter address
ldr r4, [sp, #36] ; width
ldr r4, [sp, #36] ; output width
mov r12, r3 ; outer-loop counter
sub r2, r2, r4 ; src increment for height loop
@@ -38,10 +38,10 @@
ldr r5, [r11] ; load up filter coefficients
mov r3, r3, lsl #1 ; height*2
mov r3, r3, lsl #1 ; output_height*2
add r3, r3, #2 ; plus 2 to make output buffer 4-bit aligned since height is actually (height+1)
mov r11, r1 ; save dst_ptr for each row
mov r11, r1 ; save output_ptr for each row
cmp r5, #128 ; if filter coef = 128, then skip the filter
beq bil_null_1st_filter
@@ -140,17 +140,17 @@
;---------------------------------
; r0 unsigned short *src_ptr,
; r1 unsigned char *dst_ptr,
; r2 int dst_pitch,
; r3 unsigned int height,
; stack unsigned int width,
; stack const short *vp8_filter
; r1 unsigned char *output_ptr,
; r2 int output_pitch,
; r3 unsigned int output_height,
; stack unsigned int output_width,
; stack const short *vp8_filter
;---------------------------------
|vp8_filter_block2d_bil_second_pass_armv6| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vp8_filter address
ldr r4, [sp, #36] ; width
ldr r4, [sp, #36] ; output width
ldr r5, [r11] ; load up filter coefficients
mov r12, r4 ; outer-loop counter = width, since we work on transposed data matrix


@@ -243,6 +243,8 @@ skip_secondpass_hloop
ENDP
;-----------------
AREA subpelfilters8_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter8_coeff_


@@ -10,29 +10,128 @@
#include <math.h>
#include "vp8/common/filter.h"
#include "vp8/common/subpixel.h"
#include "bilinearfilter_arm.h"
#include "subpixel.h"
#define BLOCK_HEIGHT_WIDTH 4
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
static const short bilinear_filters[8][2] =
{
{ 128, 0 },
{ 112, 16 },
{ 96, 32 },
{ 80, 48 },
{ 64, 64 },
{ 48, 80 },
{ 32, 96 },
{ 16, 112 }
};
extern void vp8_filter_block2d_bil_first_pass_armv6
(
unsigned char *src_ptr,
unsigned short *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
);
extern void vp8_filter_block2d_bil_second_pass_armv6
(
unsigned short *src_ptr,
unsigned char *output_ptr,
int output_pitch,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
);
#if 0
void vp8_filter_block2d_bil_first_pass_6
(
unsigned char *src_ptr,
unsigned short *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
)
{
unsigned int i, j;
for ( i=0; i<output_height; i++ )
{
for ( j=0; j<output_width; j++ )
{
/* Apply bilinear filter */
output_ptr[j] = ( ( (int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[1] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT/2) ) >> VP8_FILTER_SHIFT;
src_ptr++;
}
/* Next row... */
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_width;
}
}
void vp8_filter_block2d_bil_second_pass_6
(
unsigned short *src_ptr,
unsigned char *output_ptr,
int output_pitch,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
)
{
unsigned int i,j;
int Temp;
for ( i=0; i<output_height; i++ )
{
for ( j=0; j<output_width; j++ )
{
/* Apply filter */
Temp = ((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[output_width] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT/2);
output_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
src_ptr++;
}
/* Next row... */
/*src_ptr += src_pixels_per_line - output_width;*/
output_ptr += output_pitch;
}
}
#endif
void vp8_filter_block2d_bil_armv6
(
unsigned char *src_ptr,
unsigned char *dst_ptr,
unsigned int src_pitch,
unsigned char *output_ptr,
unsigned int src_pixels_per_line,
unsigned int dst_pitch,
const short *HFilter,
const short *VFilter,
const short *HFilter,
const short *VFilter,
int Width,
int Height
)
{
unsigned short FData[36*16]; /* Temp data buffer used in filtering */
unsigned short FData[36*16]; /* Temp data bufffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_armv6(src_ptr, FData, src_pitch, Height + 1, Width, HFilter);
/* pixel_step = 1; */
vp8_filter_block2d_bil_first_pass_armv6(src_ptr, FData, src_pixels_per_line, Height + 1, Width, HFilter);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_armv6(FData, dst_ptr, dst_pitch, Height, Width, VFilter);
vp8_filter_block2d_bil_second_pass_armv6(FData, output_ptr, dst_pitch, Height, Width, VFilter);
}
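The per-pixel arithmetic of the disabled C reference passes above can be isolated into a single two-tap blend. This is a sketch, not code from the tree: bilinear_tap is a hypothetical name, and the constants mirror VP8_FILTER_WEIGHT and VP8_FILTER_SHIFT from this file.

```c
#include <assert.h>

#define FILTER_WEIGHT 128   /* the two taps of each filter row sum to this */
#define FILTER_SHIFT  7     /* log2(FILTER_WEIGHT) */

/* Blends neighbouring samples a and b with a two-tap filter, adding half
 * the weight before the shift so the result is rounded to nearest. */
static unsigned int bilinear_tap(unsigned char a, unsigned char b,
                                 const short *filter)
{
    assert(filter[0] + filter[1] == FILTER_WEIGHT);
    return (unsigned int)(((int)a * filter[0] +
                           (int)b * filter[1] +
                           (FILTER_WEIGHT / 2)) >> FILTER_SHIFT);
}
```

With the { 128, 0 } row the blend returns the first sample unchanged, which matches the assembly's shortcut of skipping the filter when the first coefficient is 128.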
@@ -49,8 +148,8 @@ void vp8_bilinear_predict4x4_armv6
const short *HFilter;
const short *VFilter;
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
}
@@ -68,8 +167,8 @@ void vp8_bilinear_predict8x8_armv6
const short *HFilter;
const short *VFilter;
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
}
@@ -87,8 +186,8 @@ void vp8_bilinear_predict8x4_armv6
const short *HFilter;
const short *VFilter;
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
}
@@ -106,8 +205,8 @@ void vp8_bilinear_predict16x16_armv6
const short *HFilter;
const short *VFilter;
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
}


@@ -1,35 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef BILINEARFILTER_ARM_H
#define BILINEARFILTER_ARM_H
extern void vp8_filter_block2d_bil_first_pass_armv6
(
const unsigned char *src_ptr,
unsigned short *dst_ptr,
unsigned int src_pitch,
unsigned int height,
unsigned int width,
const short *vp8_filter
);
extern void vp8_filter_block2d_bil_second_pass_armv6
(
const unsigned short *src_ptr,
unsigned char *dst_ptr,
int dst_pitch,
unsigned int height,
unsigned int width,
const short *vp8_filter
);
#endif /* BILINEARFILTER_ARM_H */


@@ -11,10 +11,26 @@
#include "vpx_ports/config.h"
#include <math.h>
#include "vp8/common/filter.h"
#include "vp8/common/subpixel.h"
#include "subpixel.h"
#include "vpx_ports/mem.h"
#define BLOCK_HEIGHT_WIDTH 4
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
DECLARE_ALIGNED(16, static const short, sub_pel_filters[8][6]) =
{
{ 0, 0, 128, 0, 0, 0 }, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
{ 0, -6, 123, 12, -1, 0 },
{ 2, -11, 108, 36, -8, 1 }, /* New 1/4 pel 6 tap filter */
{ 0, -9, 93, 50, -6, 0 },
{ 3, -16, 77, 77, -16, 3 }, /* New 1/2 pel 6 tap filter */
{ 0, -6, 50, 93, -9, 0 },
{ 1, -8, 36, 108, -11, 2 }, /* New 1/4 pel 6 tap filter */
{ 0, -1, 12, 123, -6, 0 },
};
extern void vp8_filter_block2d_first_pass_armv6
(
unsigned char *src_ptr,
@@ -77,11 +93,11 @@ void vp8_sixtap_predict_armv6
{
const short *HFilter;
const short *VFilter;
DECLARE_ALIGNED_ARRAY(4, short, FData, 12*4); /* Temp data buffer used in filtering */
DECLARE_ALIGNED_ARRAY(4, short, FData, 12*4); /* Temp data bufffer used in filtering */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
/* Vfilter is null. First pass only */
if (xoffset && !yoffset)
@@ -113,6 +129,47 @@ void vp8_sixtap_predict_armv6
}
}
#if 0
void vp8_sixtap_predict8x4_armv6
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data bufffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
/*if (xoffset && !yoffset)
{
vp8_filter_block2d_first_pass_only_armv6 ( src_ptr, dst_ptr, src_pixels_per_line, 8, dst_pitch, HFilter );
}*/
/* Hfilter is null. Second pass only */
/*else if (!xoffset && yoffset)
{
vp8_filter_block2d_second_pass_only_armv6 ( src_ptr, dst_ptr, src_pixels_per_line, 8, dst_pitch, VFilter );
}
else
{
if (yoffset & 0x1)
vp8_filter_block2d_first_pass_armv6 ( src_ptr-src_pixels_per_line, FData+1, src_pixels_per_line, 8, 7, HFilter );
else*/
vp8_filter_block2d_first_pass_armv6 ( src_ptr-(2*src_pixels_per_line), FData, src_pixels_per_line, 8, 9, HFilter );
vp8_filter_block2d_second_pass_armv6 ( FData+2, dst_ptr, dst_pitch, 4, 8, VFilter );
/*}*/
}
#endif
void vp8_sixtap_predict8x8_armv6
(
unsigned char *src_ptr,
@@ -125,10 +182,10 @@ void vp8_sixtap_predict8x8_armv6
{
const short *HFilter;
const short *VFilter;
DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data buffer used in filtering */
DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data bufffer used in filtering */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
if (xoffset && !yoffset)
{
@@ -167,10 +224,10 @@ void vp8_sixtap_predict16x16_armv6
{
const short *HFilter;
const short *VFilter;
DECLARE_ALIGNED_ARRAY(4, short, FData, 24*16); /* Temp data buffer used in filtering */
DECLARE_ALIGNED_ARRAY(4, short, FData, 24*16); /* Temp data bufffer used in filtering */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
if (xoffset && !yoffset)
{


@@ -11,8 +11,8 @@
#include "vpx_ports/config.h"
#include <math.h>
#include "vp8/common/loopfilter.h"
#include "vp8/common/onyxc_int.h"
#include "loopfilter.h"
#include "onyxc_int.h"
extern prototype_loopfilter(vp8_loop_filter_horizontal_edge_armv6);
extern prototype_loopfilter(vp8_loop_filter_vertical_edge_armv6);
@@ -41,13 +41,13 @@ void vp8_loop_filter_mbh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsig
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_armv6(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_horizontal_edge_armv6(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
if (v_ptr)
vp8_mbloop_filter_horizontal_edge_armv6(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_horizontal_edge_armv6(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
}
void vp8_loop_filter_mbhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -57,7 +57,7 @@ void vp8_loop_filter_mbhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsi
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
}
/* Vertical MB Filtering */
@@ -65,13 +65,13 @@ void vp8_loop_filter_mbv_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsig
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_armv6(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_vertical_edge_armv6(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
if (v_ptr)
vp8_mbloop_filter_vertical_edge_armv6(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_vertical_edge_armv6(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
}
void vp8_loop_filter_mbvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -81,7 +81,7 @@ void vp8_loop_filter_mbvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsi
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
}
/* Horizontal B Filtering */
@@ -94,10 +94,10 @@ void vp8_loop_filter_bh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsign
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
if (u_ptr)
vp8_loop_filter_horizontal_edge_armv6(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_horizontal_edge_armv6(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
if (v_ptr)
vp8_loop_filter_horizontal_edge_armv6(v_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_horizontal_edge_armv6(v_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
}
void vp8_loop_filter_bhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -122,10 +122,10 @@ void vp8_loop_filter_bv_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsign
vp8_loop_filter_vertical_edge_armv6(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
if (u_ptr)
vp8_loop_filter_vertical_edge_armv6(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_vertical_edge_armv6(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
if (v_ptr)
vp8_loop_filter_vertical_edge_armv6(v_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_vertical_edge_armv6(v_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
}
void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -148,10 +148,10 @@ void vp8_loop_filter_mbh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsign
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_horizontal_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_uv_neon(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, v_ptr);
vp8_mbloop_filter_horizontal_edge_uv_neon(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, v_ptr);
}
void vp8_loop_filter_mbhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -161,7 +161,7 @@ void vp8_loop_filter_mbhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsig
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
}
/* Vertical MB Filtering */
@@ -169,10 +169,10 @@ void vp8_loop_filter_mbv_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsign
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_vertical_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_uv_neon(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, v_ptr);
vp8_mbloop_filter_vertical_edge_uv_neon(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, v_ptr);
}
void vp8_loop_filter_mbvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -182,7 +182,7 @@ void vp8_loop_filter_mbvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsig
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
}
/* Horizontal B Filtering */
@@ -195,7 +195,7 @@ void vp8_loop_filter_bh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigne
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
if (u_ptr)
vp8_loop_filter_horizontal_edge_uv_neon(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, v_ptr + 4 * uv_stride);
vp8_loop_filter_horizontal_edge_uv_neon(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, v_ptr + 4 * uv_stride);
}
void vp8_loop_filter_bhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -220,7 +220,7 @@ void vp8_loop_filter_bv_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigne
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
if (u_ptr)
vp8_loop_filter_vertical_edge_uv_neon(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, v_ptr + 4);
vp8_loop_filter_vertical_edge_uv_neon(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, v_ptr + 4);
}
void vp8_loop_filter_bvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,


@@ -350,7 +350,10 @@ filt_blk2d_spo16x16_loop_neon
ENDP
;-----------------
AREA bifilters16_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_bifilter16_coeff_
DCD bifilter16_coeff
bifilter16_coeff


@@ -123,7 +123,10 @@ skip_secondpass_filter
ENDP
;-----------------
AREA bilinearfilters4_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_bifilter4_coeff_
DCD bifilter4_coeff
bifilter4_coeff


@@ -128,7 +128,10 @@ skip_secondpass_filter
ENDP
;-----------------
AREA bifilters8x4_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_bifilter8x4_coeff_
DCD bifilter8x4_coeff
bifilter8x4_coeff


@@ -176,7 +176,10 @@ skip_secondpass_filter
ENDP
;-----------------
AREA bifilters8_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_bifilter8_coeff_
DCD bifilter8_coeff
bifilter8_coeff


@@ -397,8 +397,7 @@
bx lr
ENDP ; |vp8_loop_filter_horizontal_edge_y_neon|
;-----------------
AREA loopfilter_dat, DATA, READONLY
_lf_coeff_
DCD lf_coeff
lf_coeff


@@ -104,7 +104,10 @@
ENDP ; |vp8_loop_filter_simple_horizontal_edge_neon|
;-----------------
AREA hloopfiltery_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 16 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_lfhy_coeff_
DCD lfhy_coeff
lfhy_coeff


@@ -145,7 +145,10 @@
ENDP ; |vp8_loop_filter_simple_vertical_edge_neon|
;-----------------
AREA vloopfiltery_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 16 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_vlfy_coeff_
DCD vlfy_coeff
vlfy_coeff

@@ -505,8 +505,7 @@
bx lr
ENDP ; |vp8_mbloop_filter_neon|
;-----------------
AREA mbloopfilter_dat, DATA, READONLY
_mblf_coeff_
DCD mblf_coeff
mblf_coeff

@@ -10,8 +10,8 @@
#include "vpx_ports/config.h"
#include "vp8/common/recon.h"
#include "vp8/common/blockd.h"
#include "recon.h"
#include "blockd.h"
extern void vp8_recon16x16mb_neon(unsigned char *pred_ptr, short *diff_ptr, unsigned char *dst_ptr, int ystride, unsigned char *udst_ptr, unsigned char *vdst_ptr);

@@ -113,7 +113,10 @@
ENDP
;-----------------
AREA idct4x4_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_idct_coeff_
DCD idct_coeff
idct_coeff

@@ -476,7 +476,10 @@ secondpass_only_inner_loop_neon
ENDP
;-----------------
AREA subpelfilters16_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter16_coeff_
DCD filter16_coeff
filter16_coeff

@@ -407,7 +407,10 @@ secondpass_filter4x4_only
ENDP
;-----------------
AREA subpelfilters4_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter4_coeff_
DCD filter4_coeff
filter4_coeff

@@ -458,7 +458,10 @@ secondpass_filter8x4_only
ENDP
;-----------------
AREA subpelfilters8_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter8_coeff_
DCD filter8_coeff
filter8_coeff

@@ -509,7 +509,10 @@ filt_blk2d_spo8x8_loop_neon
ENDP
;-----------------
AREA subpelfilters8_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter8_coeff_
DCD filter8_coeff
filter8_coeff

@@ -53,9 +53,6 @@ extern prototype_copy_block(vp8_copy_mem16x16_neon);
extern prototype_recon_macroblock(vp8_recon_mb_neon);
extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_neon);
extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_s_neon);
#if !CONFIG_RUNTIME_CPU_DETECT
#undef vp8_recon_recon
#define vp8_recon_recon vp8_recon_b_neon
@@ -77,13 +74,6 @@ extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_s_neon);
#undef vp8_recon_recon_mb
#define vp8_recon_recon_mb vp8_recon_mb_neon
#undef vp8_recon_build_intra_predictors_mby
#define vp8_recon_build_intra_predictors_mby vp8_build_intra_predictors_mby_neon
#undef vp8_recon_build_intra_predictors_mby_s
#define vp8_recon_build_intra_predictors_mby_s vp8_build_intra_predictors_mby_s_neon
#endif
#endif

@@ -10,10 +10,10 @@
#include "vpx_ports/config.h"
#include "vp8/common/blockd.h"
#include "vp8/common/reconintra.h"
#include "blockd.h"
#include "reconintra.h"
#include "vpx_mem/vpx_mem.h"
#include "vp8/common/recon.h"
#include "recon.h"
#if HAVE_ARMV7
extern void vp8_build_intra_predictors_mby_neon_func(

@@ -1,5 +1,5 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
@@ -12,7 +12,13 @@
#include "vpx_ports/config.h"
#include <stddef.h>
#if CONFIG_VP8_ENCODER
#include "vpx_scale/yv12config.h"
#endif
#if CONFIG_VP8_DECODER
#include "onyxd_int.h"
#endif
#define DEFINE(sym, val) int sym = val;
@@ -25,6 +31,29 @@
* {
*/
#if CONFIG_VP8_DECODER || CONFIG_VP8_ENCODER
DEFINE(yv12_buffer_config_y_width, offsetof(YV12_BUFFER_CONFIG, y_width));
DEFINE(yv12_buffer_config_y_height, offsetof(YV12_BUFFER_CONFIG, y_height));
DEFINE(yv12_buffer_config_y_stride, offsetof(YV12_BUFFER_CONFIG, y_stride));
DEFINE(yv12_buffer_config_uv_width, offsetof(YV12_BUFFER_CONFIG, uv_width));
DEFINE(yv12_buffer_config_uv_height, offsetof(YV12_BUFFER_CONFIG, uv_height));
DEFINE(yv12_buffer_config_uv_stride, offsetof(YV12_BUFFER_CONFIG, uv_stride));
DEFINE(yv12_buffer_config_y_buffer, offsetof(YV12_BUFFER_CONFIG, y_buffer));
DEFINE(yv12_buffer_config_u_buffer, offsetof(YV12_BUFFER_CONFIG, u_buffer));
DEFINE(yv12_buffer_config_v_buffer, offsetof(YV12_BUFFER_CONFIG, v_buffer));
DEFINE(yv12_buffer_config_border, offsetof(YV12_BUFFER_CONFIG, border));
#endif
#if CONFIG_VP8_DECODER
DEFINE(mb_diff, offsetof(MACROBLOCKD, diff));
DEFINE(mb_predictor, offsetof(MACROBLOCKD, predictor));
DEFINE(mb_dst_y_stride, offsetof(MACROBLOCKD, dst.y_stride));
DEFINE(mb_dst_y_buffer, offsetof(MACROBLOCKD, dst.y_buffer));
DEFINE(mb_dst_u_buffer, offsetof(MACROBLOCKD, dst.u_buffer));
DEFINE(mb_dst_v_buffer, offsetof(MACROBLOCKD, dst.v_buffer));
DEFINE(mb_up_available, offsetof(MACROBLOCKD, up_available));
DEFINE(mb_left_available, offsetof(MACROBLOCKD, left_available));
DEFINE(detok_scan, offsetof(DETOK, scan));
DEFINE(detok_ptr_block2leftabove, offsetof(DETOK, ptr_block2leftabove));
DEFINE(detok_coef_tree_ptr, offsetof(DETOK, vp8_coef_tree_ptr));
@@ -48,6 +77,7 @@ DEFINE(bool_decoder_range, offsetof(BOOL_DECODER, range));
DEFINE(tokenextrabits_min_val, offsetof(TOKENEXTRABITS, min_val));
DEFINE(tokenextrabits_length, offsetof(TOKENEXTRABITS, Length));
#endif
//add asserts for any offset that is not supported by assembly code
//add asserts for any size that is not supported by assembly code

@@ -1,49 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_ports/config.h"
#include <stddef.h>
#include "vpx_scale/yv12config.h"
#define ct_assert(name,cond) \
static void assert_##name(void) UNUSED;\
static void assert_##name(void) {switch(0){case 0:case !!(cond):;}}
#define DEFINE(sym, val) int sym = val;
/*
#define BLANK() asm volatile("\n->" : : )
*/
/*
* int main(void)
* {
*/
//vpx_scale
DEFINE(yv12_buffer_config_y_width, offsetof(YV12_BUFFER_CONFIG, y_width));
DEFINE(yv12_buffer_config_y_height, offsetof(YV12_BUFFER_CONFIG, y_height));
DEFINE(yv12_buffer_config_y_stride, offsetof(YV12_BUFFER_CONFIG, y_stride));
DEFINE(yv12_buffer_config_uv_width, offsetof(YV12_BUFFER_CONFIG, uv_width));
DEFINE(yv12_buffer_config_uv_height, offsetof(YV12_BUFFER_CONFIG, uv_height));
DEFINE(yv12_buffer_config_uv_stride, offsetof(YV12_BUFFER_CONFIG, uv_stride));
DEFINE(yv12_buffer_config_y_buffer, offsetof(YV12_BUFFER_CONFIG, y_buffer));
DEFINE(yv12_buffer_config_u_buffer, offsetof(YV12_BUFFER_CONFIG, u_buffer));
DEFINE(yv12_buffer_config_v_buffer, offsetof(YV12_BUFFER_CONFIG, v_buffer));
DEFINE(yv12_buffer_config_border, offsetof(YV12_BUFFER_CONFIG, border));
//add asserts for any offset that is not supported by assembly code
//add asserts for any size that is not supported by assembly code
/*
* return 0;
* }
*/
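
The DEFINE lines above export structure field offsets as plain integer symbols so that hand-written assembly can address fields without knowing the C layout. A minimal sketch of that idea follows. DEMO_BUFFER_CONFIG, the demo_* symbols and demo_load_int are hypothetical names introduced only for illustration, and this DEFINE variant drops the trailing semicolon of the original macro so each use site supplies its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for YV12_BUFFER_CONFIG; the field list is illustrative only. */
typedef struct
{
    int y_width;
    int y_height;
    int y_stride;
    unsigned char *y_buffer;
} DEMO_BUFFER_CONFIG;

/* Same idea as the DEFINE macro in the diff: one int symbol per field offset.
 * (Assumption: no trailing semicolon here, unlike the original macro.) */
#define DEFINE(sym, val) int sym = (int)(val)

DEFINE(demo_y_width,  offsetof(DEMO_BUFFER_CONFIG, y_width));
DEFINE(demo_y_height, offsetof(DEMO_BUFFER_CONFIG, y_height));
DEFINE(demo_y_stride, offsetof(DEMO_BUFFER_CONFIG, y_stride));
DEFINE(demo_y_buffer, offsetof(DEMO_BUFFER_CONFIG, y_buffer));

/* Read an int field the way assembly would: base address plus byte offset. */
int demo_load_int(const void *base, int byte_offset)
{
    int value;
    memcpy(&value, (const unsigned char *)base + byte_offset, sizeof value);
    return value;
}
```

Given a DEMO_BUFFER_CONFIG cfg, demo_load_int(&cfg, demo_y_stride) returns the same value as cfg.y_stride, which is the property the assembly side relies on.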

@@ -12,6 +12,8 @@
#include "blockd.h"
#include "vpx_mem/vpx_mem.h"
const int vp8_block2type[25] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 1};
const unsigned char vp8_block2left[25] =
{
0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8

@@ -14,17 +14,12 @@
void vpx_log(const char *format, ...);
#include "../../vpx_ports/config.h"
#include "../../vpx_scale/yv12config.h"
#include "vpx_ports/config.h"
#include "vpx_scale/yv12config.h"
#include "mv.h"
#include "treecoder.h"
#include "subpixel.h"
#include "../../vpx_ports/mem.h"
#include "../../vpx_config.h"
#if CONFIG_OPENCL
#include "opencl/vp8_opencl.h"
#endif
#include "vpx_ports/mem.h"
#define TRUE 1
#define FALSE 0
@@ -33,6 +28,11 @@ void vpx_log(const char *format, ...);
#define DCPREDSIMTHRESH 0
#define DCPREDCNTTHRESH 3
#define Y1CONTEXT 0
#define UCONTEXT 1
#define VCONTEXT 2
#define Y2CONTEXT 3
#define MB_FEATURE_TREE_PROBS 3
#define MAX_MB_SEGMENTS 4
@@ -48,11 +48,6 @@ typedef struct
int r, c;
} POS;
#define PLANE_TYPE_Y_NO_DC 0
#define PLANE_TYPE_Y2 1
#define PLANE_TYPE_UV 2
#define PLANE_TYPE_Y_WITH_DC 3
typedef char ENTROPY_CONTEXT;
typedef struct
@@ -63,6 +58,8 @@ typedef struct
ENTROPY_CONTEXT y2;
} ENTROPY_CONTEXT_PLANES;
extern const int vp8_block2type[25];
extern const unsigned char vp8_block2left[25];
extern const unsigned char vp8_block2above[25];
@@ -78,19 +75,19 @@ typedef enum
typedef enum
{
DC_PRED = 0, /* average of above and left pixels */
V_PRED = 1, /* vertical prediction */
H_PRED = 2, /* horizontal prediction */
TM_PRED = 3, /* Truemotion prediction */
B_PRED = 4, /* block based prediction, each block has its own prediction mode */
DC_PRED, /* average of above and left pixels */
V_PRED, /* vertical prediction */
H_PRED, /* horizontal prediction */
TM_PRED, /* Truemotion prediction */
B_PRED, /* block based prediction, each block has its own prediction mode */
NEARESTMV = 5,
NEARMV = 6,
ZEROMV = 7,
NEWMV = 8,
SPLITMV = 9,
NEARESTMV,
NEARMV,
ZEROMV,
NEWMV,
SPLITMV,
MB_MODE_COUNT = 10
MB_MODE_COUNT
} MB_PREDICTION_MODE;
/* Macroblock level features */
@@ -192,47 +189,24 @@ typedef struct
typedef struct
{
short *qcoeff_base;
int qcoeff_offset;
short *dqcoeff_base;
int dqcoeff_offset;
unsigned char *predictor_base;
int predictor_offset;
short *diff_base;
int diff_offset;
short *qcoeff;
short *dqcoeff;
unsigned char *predictor;
short *diff;
short *reference;
short *dequant;
#if CONFIG_OPENCL
cl_command_queue cl_commands; //pointer to macroblock CL command queue
cl_mem cl_diff_mem;
cl_mem cl_predictor_mem;
cl_mem cl_qcoeff_mem;
cl_mem cl_dqcoeff_mem;
cl_mem cl_eobs_mem;
cl_mem cl_dequant_mem; //Block-specific, not shared
cl_bool sixtap_filter; //Subpixel Prediction type (true=sixtap, false=bilinear)
#endif
/* 16 Y blocks, 4 U blocks, 4 V blocks each with 16 entries */
unsigned char **base_pre; //previous frame, same Macroblock, base pointer
unsigned char **base_pre;
int pre;
int pre_stride;
unsigned char **base_dst; //destination base pointer
unsigned char **base_dst;
int dst;
int dst_stride;
int eob; //only used in encoder? Decoder uses MBD.eobs
char *eobs_base; //beginning of MB.eobs
int eob;
B_MODE_INFO bmi;
@@ -242,26 +216,16 @@ typedef struct
{
DECLARE_ALIGNED(16, short, diff[400]); /* from idct diff */
DECLARE_ALIGNED(16, unsigned char, predictor[384]);
/* not used DECLARE_ALIGNED(16, short, reference[384]); */
DECLARE_ALIGNED(16, short, qcoeff[400]);
DECLARE_ALIGNED(16, short, dqcoeff[400]);
DECLARE_ALIGNED(16, char, eobs[25]);
#if CONFIG_OPENCL
cl_command_queue cl_commands; //Each macroblock gets its own command queue.
cl_mem cl_diff_mem;
cl_mem cl_predictor_mem;
cl_mem cl_qcoeff_mem;
cl_mem cl_dqcoeff_mem;
cl_mem cl_eobs_mem;
cl_bool sixtap_filter;
#endif
/* 16 Y blocks, 4 U, 4 V, 1 DC 2nd order block, each with 16 entries. */
BLOCKD block[25];
YV12_BUFFER_CONFIG pre; /* Filtered copy of previous frame reconstruction */
YV12_BUFFER_CONFIG dst; /* Destination buffer for current frame */
YV12_BUFFER_CONFIG dst;
MODE_INFO *mode_info_context;
int mode_info_stride;
@@ -311,7 +275,6 @@ typedef struct
unsigned int frames_since_golden;
unsigned int frames_till_alt_ref_frame;
vp8_subpix_fn_t subpixel_predict;
vp8_subpix_fn_t subpixel_predict8x4;
vp8_subpix_fn_t subpixel_predict8x8;

vp8/common/boolcoder.h Normal file
@@ -0,0 +1,570 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef bool_coder_h
#define bool_coder_h 1
/* Arithmetic bool coder with largish probability range.
Timothy S Murphy 6 August 2004 */
/* So as not to force users to drag in too much of my idiosyncratic C++ world,
I avoid fancy storage management. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
typedef unsigned char vp8bc_index_t; // probability index
/* There are a couple of slight variants in the details of finite-precision
arithmetic coding. May be safely ignored by most users. */
enum vp8bc_rounding
{
vp8bc_down = 0, // just like VP8
vp8bc_down_full = 1, // handles minimum probability correctly
vp8bc_up = 2
};
#if _MSC_VER
/* Note that msvc by default does not inline _anything_ (regardless of the
setting of inline_depth) and that a command-line option (-Ob1 or -Ob2)
is required to inline even the smallest functions. */
# pragma inline_depth( 255) // I mean it when I inline something
# pragma warning( disable : 4099) // No class vs. struct harassment
# pragma warning( disable : 4250) // dominance complaints
# pragma warning( disable : 4284) // operator-> in templates
# pragma warning( disable : 4800) // bool conversion
// don't let prefix ++,-- stand in for postfix, disaster would ensue
# pragma warning( error : 4620 4621)
#endif // _MSC_VER
#if __cplusplus
// Sometimes one wishes to be definite about integer lengths.
struct int_types
{
typedef const bool cbool;
typedef const signed char cchar;
typedef const short cshort;
typedef const int cint;
typedef const int clong;
typedef const double cdouble;
typedef const size_t csize_t;
typedef unsigned char uchar; // 8 bits
typedef const uchar cuchar;
typedef short int16;
typedef unsigned short uint16;
typedef const int16 cint16;
typedef const uint16 cuint16;
typedef int int32;
typedef unsigned int uint32;
typedef const int32 cint32;
typedef const uint32 cuint32;
typedef unsigned int uint;
typedef unsigned int ulong;
typedef const uint cuint;
typedef const ulong culong;
// All structs consume space, may as well have a vptr.
virtual ~int_types();
};
struct bool_coder_spec;
struct bool_coder;
struct bool_writer;
struct bool_reader;
struct bool_coder_namespace : int_types
{
typedef vp8bc_index_t Index;
typedef bool_coder_spec Spec;
typedef const Spec c_spec;
enum Rounding
{
Down = vp8bc_down,
down_full = vp8bc_down_full,
Up = vp8bc_up
};
};
// Archivable specification of a bool coder includes rounding spec
// and probability mapping table. The latter replaces a uchar j
// (0 <= j < 256) with an arbitrary uint16 tbl[j] = p.
// p/65536 is then the probability of a zero.
struct bool_coder_spec : bool_coder_namespace
{
friend struct bool_coder;
friend struct bool_writer;
friend struct bool_reader;
friend struct bool_coder_spec_float;
friend struct bool_coder_spec_explicit_table;
friend struct bool_coder_spec_exponential_table;
friend struct BPsrc;
private:
uint w; // precision
Rounding r;
uint ebits, mbits, ebias;
uint32 mmask;
Index max_index, half_index;
uint32 mantissa(Index i) const
{
assert(i < half_index);
return (1 << mbits) + (i & mmask);
}
uint exponent(Index i) const
{
assert(i < half_index);
return ebias - (i >> mbits);
}
uint16 Ptbl[256]; // kinda clunky, but so is storage management.
/* Cost in bits of encoding a zero at every probability, scaled by 2^20.
Assumes that index is at most 8 bits wide. */
uint32 Ctbl[256];
uint32 split(Index i, uint32 R) const // 1 <= split <= max( 1, R-1)
{
if (!ebias)
return 1 + (((R - 1) * Ptbl[i]) >> 16);
if (i >= half_index)
return R - split(max_index - i, R);
return 1 + (((R - 1) * mantissa(i)) >> exponent(i));
}
uint32 max_range() const
{
return (1 << w) - (r == down_full ? 0 : 1);
}
uint32 min_range() const
{
return (1 << (w - 1)) + (r == down_full ? 1 : 0);
}
uint32 Rinc() const
{
return r == Up ? 1 : 0;
}
void check_prec() const;
bool float_init(uint Ebits, uint Mbits);
void cost_init();
bool_coder_spec(
uint prec, Rounding rr, uint Ebits = 0, uint Mbits = 0
)
: w(prec), r(rr)
{
float_init(Ebits, Mbits);
}
public:
// Read complete spec from file.
bool_coder_spec(FILE *);
// Write spec to file.
void dump(FILE *) const;
// return probability index best approximating prob.
Index operator()(double prob) const;
// probability corresponding to index
double operator()(Index i) const;
Index complement(Index i) const
{
return max_index - i;
}
Index max_index() const
{
return max_index;
}
Index half_index() const
{
return half_index;
}
uint32 cost_zero(Index i) const
{
return Ctbl[i];
}
uint32 cost_one(Index i) const
{
return Ctbl[ max_index - i];
}
uint32 cost_bit(Index i, bool b) const
{
return Ctbl[b? max_index-i:i];
}
};
/* Pseudo floating-point probability specification.
At least one of Ebits and Mbits must be nonzero.
Since all arithmetic is done at 32 bits, Ebits is at most 5.
Total significant bits in index is Ebits + Mbits + 1.
Below the halfway point (i.e. when the top significant bit is 0),
the index is (e << Mbits) + m.
The exponent e is between 0 and (2**Ebits) - 1,
the mantissa m is between 0 and (2**Mbits) - 1.
Prepending an implicit 1 to the mantissa, the probability is then
(2**Mbits + m) >> (e - 2**Ebits - 1 - Mbits),
which has (1/2)**(2**Ebits + 1) as a minimum
and (1/2) * [1 - 2**(Mbits + 1)] as a maximum.
When the index is above the halfway point, the probability is the
complement of the probability associated to the complement of the index.
Note that the probability increases with the index and that, because of
the symmetry, we cannot encode probability exactly 1/2; though we
can get as close to 1/2 as we like, provided we have enough Mbits.
The latter is of course not a problem in practice, one never has
exact probabilities and entropy errors are second order, that is, the
"overcoding" of a zero will be largely compensated for by the
"undercoding" of a one (or vice-versa).
Compared to arithmetic probability specs (a la VP8), this will do better
at very high and low probabilities and worse at probabilities near 1/2,
as well as facilitating the usage of wider or narrower probability indices.
*/
struct bool_coder_spec_float : bool_coder_spec
{
bool_coder_spec_float(
uint Ebits = 3, uint Mbits = 4, Rounding rr = down_full, uint prec = 12
)
: bool_coder_spec(prec, rr, Ebits, Mbits)
{
cost_init();
}
};
struct bool_coder_spec_explicit_table : bool_coder_spec
{
bool_coder_spec_explicit_table(
cuint16 probability_table[256] = 0, // default is tbl[i] = i << 8.
Rounding = down_full,
uint precision = 16
);
};
// Construct table via multiplicative interpolation between
// p[128] = 1/2 and p[0] = (1/2)^x.
// Since we are working with 16-bit precision, x is at most 16.
// For probabilities to increase with i, we must have x > 1.
// For 0 <= i <= 128, p[i] = (1/2)^{ 1 + [1 - (i/128)]*[x-1] }.
// Finally, p[128+i] = 1 - p[128 - i].
struct bool_coder_spec_exponential_table : bool_coder_spec
{
bool_coder_spec_exponential_table(uint x, Rounding = down_full, uint prec = 16);
};
// Commonalities between writer and reader.
struct bool_coder : bool_coder_namespace
{
friend struct bool_writer;
friend struct bool_reader;
friend struct BPsrc;
private:
uint32 Low, Range;
cuint32 min_range;
cuint32 rinc;
c_spec spec;
void _reset()
{
Low = 0;
Range = spec.max_range();
}
bool_coder(c_spec &s)
: min_range(s.min_range()),
rinc(s.Rinc()),
spec(s)
{
_reset();
}
uint32 half() const
{
return 1 + ((Range - 1) >> 1);
}
public:
c_spec &Spec() const
{
return spec;
}
};
struct bool_writer : bool_coder
{
friend struct BPsrc;
private:
uchar *Bstart, *Bend, *B;
int bit_lag;
bool is_toast;
void carry();
void reset()
{
_reset();
bit_lag = 32 - spec.w;
is_toast = 0;
}
void raw(bool value, uint32 split);
public:
bool_writer(c_spec &, uchar *Dest, size_t Len);
virtual ~bool_writer();
void operator()(Index p, bool v)
{
raw(v, spec.split(p, Range));
}
uchar *buf() const
{
return Bstart;
}
size_t bytes_written() const
{
return B - Bstart;
}
// Call when done with input, flushes internal state.
// DO NOT write any more data after calling this.
bool_writer &flush();
void write_bits(int n, uint val)
{
if (n)
{
uint m = 1 << (n - 1);
do
{
raw((bool)(val & m), half());
}
while (m >>= 1);
}
}
# if 0
// We are agnostic about storage management.
// By default, overflows throw an assert but user can
// override to provide an expanding buffer using ...
virtual void overflow(uint Len) const;
// ... this function copies already-written data into new buffer
// and retains new buffer location.
void new_buffer(uchar *dest, uint Len);
// Note that storage management is the user's responsibility.
# endif
};
// This could be adjusted to use a little less lookahead.
struct bool_reader : bool_coder
{
friend struct BPsrc;
private:
cuchar *const Bstart; // for debugging
cuchar *B;
cuchar *const Bend;
cuint shf;
uint bct;
bool raw(uint32 split);
public:
bool_reader(c_spec &s, cuchar *src, size_t Len);
bool operator()(Index p)
{
return raw(spec.split(p, Range));
}
uint read_bits(int num_bits)
{
uint v = 0;
while (--num_bits >= 0)
v += v + (raw(half()) ? 1 : 0);
return v;
}
};
extern "C" {
#endif /* __cplusplus */
/* C interface */
typedef struct bool_coder_spec bool_coder_spec;
typedef struct bool_writer bool_writer;
typedef struct bool_reader bool_reader;
typedef const bool_coder_spec c_bool_coder_spec;
typedef const bool_writer c_bool_writer;
typedef const bool_reader c_bool_reader;
/* Optionally override default precision when constructing coder_specs.
Just pass a zero pointer if you don't care.
Precision is at most 16 bits for table specs, at most 23 otherwise. */
struct vp8bc_prec
{
enum vp8bc_rounding r; /* see top header file for def */
unsigned int prec; /* range precision in bits */
};
typedef const struct vp8bc_prec vp8bc_c_prec;
/* bool_coder_spec contains mapping of uchars to actual probabilities
(16 bit uints) as well as (usually immaterial) selection of
exact finite-precision algorithm used (for now, the latter can only
be overridden using the C++ interface).
See comments above the corresponding C++ constructors for discussion,
especially of exponential probability table generation. */
bool_coder_spec *vp8bc_vp8spec(); // just like vp8
bool_coder_spec *vp8bc_literal_spec(
const unsigned short prob_map[256], // 0 is like vp8 w/more precision
vp8bc_c_prec*
);
bool_coder_spec *vp8bc_float_spec(
unsigned int exponent_bits, unsigned int mantissa_bits, vp8bc_c_prec*
);
bool_coder_spec *vp8bc_exponential_spec(unsigned int min_exp, vp8bc_c_prec *);
bool_coder_spec *vp8bc_spec_from_file(FILE *);
void vp8bc_destroy_spec(c_bool_coder_spec *);
void vp8bc_spec_to_file(c_bool_coder_spec *, FILE *);
/* Nearest index to supplied probability of zero, 0 <= prob <= 1. */
vp8bc_index_t vp8bc_index(c_bool_coder_spec *, double prob);
vp8bc_index_t vp8bc_index_from_counts(
c_bool_coder_spec *p, unsigned int zero_ct, unsigned int one_ct
);
/* In case you want to look */
double vp8bc_probability(c_bool_coder_spec *, vp8bc_index_t);
/* Opposite index */
vp8bc_index_t vp8bc_complement(c_bool_coder_spec *, vp8bc_index_t);
/* Cost in bits of encoding a zero at given probability, scaled by 2^20.
(assumes that an int holds at least 32 bits). */
unsigned int vp8bc_cost_zero(c_bool_coder_spec *, vp8bc_index_t);
unsigned int vp8bc_cost_one(c_bool_coder_spec *, vp8bc_index_t);
unsigned int vp8bc_cost_bit(c_bool_coder_spec *, vp8bc_index_t, int);
/* bool_writer interface */
/* Length = 0 disables checking for writes beyond buffer end. */
bool_writer *vp8bc_create_writer(
c_bool_coder_spec *, unsigned char *Destination, size_t Length
);
/* Flushes out any buffered data and returns total # of bytes written. */
size_t vp8bc_destroy_writer(bool_writer *);
void vp8bc_write_bool(bool_writer *, int boolean_val, vp8bc_index_t false_prob);
void vp8bc_write_bits(
bool_writer *, unsigned int integer_value, int number_of_bits
);
c_bool_coder_spec *vp8bc_writer_spec(c_bool_writer *);
/* bool_reader interface */
/* Length = 0 disables checking for reads beyond buffer end. */
bool_reader *vp8bc_create_reader(
c_bool_coder_spec *, const unsigned char *Source, size_t Length
);
void vp8bc_destroy_reader(bool_reader *);
int vp8bc_read_bool(bool_reader *, vp8bc_index_t false_prob);
unsigned int vp8bc_read_bits(bool_reader *, int number_of_bits);
c_bool_coder_spec *vp8bc_reader_spec(c_bool_reader *);
#if __cplusplus
}
#endif
#endif /* bool_coder_h */
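
Two small pieces of arithmetic in the header above can be sketched in plain C: the table-driven branch of bool_coder_spec::split() (the one taken when ebias is zero) and the MSB-first accumulation used by bool_reader::read_bits(). The function names demo_split and demo_assemble_bits are hypothetical, and the range limit in the assert is an assumption added here so the 32-bit product cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Split point for the current range, given a 16-bit table entry p_zero where
 * p_zero / 65536 is the probability of a zero. Mirrors
 *     1 + (((R - 1) * Ptbl[i]) >> 16)
 * Assumes 2 <= range <= 65536 so the product fits in 32 bits. */
uint32_t demo_split(uint32_t range, uint16_t p_zero)
{
    assert(range >= 2 && range <= 65536u);
    return 1 + (((range - 1) * (uint32_t)p_zero) >> 16);
}

/* MSB-first bit assembly as in read_bits(): v += v + bit doubles the value
 * accumulated so far and appends the new bit. */
uint32_t demo_assemble_bits(const int *bits, int num_bits)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < num_bits; i++)
        v += v + (bits[i] ? 1u : 0u);

    return v;
}
```

With range 256, the result stays within 1 to 255 for any table entry, matching the "1 <= split <= max(1, R-1)" note on split().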

@@ -0,0 +1,93 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef CODEC_COMMON_INTERFACE_H
#define CODEC_COMMON_INTERFACE_H
#define __export
#define _export
#define dll_export __declspec( dllexport )
#define dll_import __declspec( dllimport )
// Playback ERROR Codes.
#define NO_DECODER_ERROR 0
#define REMOTE_DECODER_ERROR -1
#define DFR_BAD_DCT_COEFF -100
#define DFR_ZERO_LENGTH_FRAME -101
#define DFR_FRAME_SIZE_INVALID -102
#define DFR_OUTPUT_BUFFER_OVERFLOW -103
#define DFR_INVALID_FRAME_HEADER -104
#define FR_INVALID_MODE_TOKEN -110
#define ETR_ALLOCATION_ERROR -200
#define ETR_INVALID_ROOT_PTR -201
#define SYNCH_ERROR -400
#define BUFFER_UNDERFLOW_ERROR -500
#define PB_IB_OVERFLOW_ERROR -501
// External error triggers
#define PB_HEADER_CHECKSUM_ERROR -601
#define PB_DATA_CHECKSUM_ERROR -602
// DCT Error Codes
#define DDCT_EXPANSION_ERROR -700
#define DDCT_INVALID_TOKEN_ERROR -701
// exception_errors
#define GEN_EXCEPTIONS -800
#define EX_UNQUAL_ERROR -801
// Unrecoverable error codes
#define FATAL_PLAYBACK_ERROR -1000
#define GEN_ERROR_CREATING_CDC -1001
#define GEN_THREAD_CREATION_ERROR -1002
#define DFR_CREATE_BMP_FAILED -1003
// YUV buffer configuration structure
typedef struct
{
int y_width;
int y_height;
int y_stride;
int uv_width;
int uv_height;
int uv_stride;
unsigned char *y_buffer;
unsigned char *u_buffer;
unsigned char *v_buffer;
} YUV_BUFFER_CONFIG;
typedef enum
{
C_SET_KEY_FRAME,
C_SET_FIXED_Q,
C_SET_FIRSTPASS_FILE,
C_SET_EXPERIMENTAL_MIN,
C_SET_EXPERIMENTAL_MAX = C_SET_EXPERIMENTAL_MIN + 255,
C_SET_CHECKPROTECT,
C_SET_TESTMODE,
C_SET_INTERNAL_SIZE,
C_SET_RECOVERY_FRAME,
C_SET_REFERENCEFRAME,
C_SET_GOLDENFRAME
#ifndef VP50_COMP_INTERFACE
// Specialist test facilities.
// C_VCAP_PARAMS, // DO NOT USE FOR NOW WITH VFW CODEC
#endif
} C_SETTING;
typedef unsigned long C_SET_VALUE;
#endif

@@ -18,8 +18,6 @@ enum
{
mv_max = 1023, /* max absolute value of a MV component */
MVvals = (2 * mv_max) + 1, /* # possible values "" */
mvfp_max = 255, /* max absolute value of a full pixel MV component */
MVfpvals = (2 * mvfp_max) +1, /* # possible full pixel MV values */
mvlong_width = 10, /* Large MVs have 9 bit magnitudes */
mvnum_short = 8, /* magnitudes 0 through 7 */

@@ -13,12 +13,10 @@
#include "vpx_mem/vpx_mem.h"
static void copy_and_extend_plane
static void extend_plane_borders
(
unsigned char *s, /* source */
int sp, /* source pitch */
unsigned char *d, /* destination */
int dp, /* destination pitch */
int sp, /* pitch */
int h, /* height */
int w, /* width */
int et, /* extend top border */
@@ -27,6 +25,7 @@ static void copy_and_extend_plane
int er /* extend right border */
)
{
int i;
unsigned char *src_ptr1, *src_ptr2;
unsigned char *dest_ptr1, *dest_ptr2;
@@ -35,73 +34,68 @@ static void copy_and_extend_plane
/* copy the left and right most columns out */
src_ptr1 = s;
src_ptr2 = s + w - 1;
dest_ptr1 = d - el;
dest_ptr2 = d + w;
dest_ptr1 = s - el;
dest_ptr2 = s + w;
for (i = 0; i < h; i++)
for (i = 0; i < h - 0 + 1; i++)
{
vpx_memset(dest_ptr1, src_ptr1[0], el);
vpx_memcpy(dest_ptr1 + el, src_ptr1, w);
/* Some linkers will complain if we call vpx_memset with el set to a
* constant 0.
*/
if (el)
vpx_memset(dest_ptr1, src_ptr1[0], el);
vpx_memset(dest_ptr2, src_ptr2[0], er);
src_ptr1 += sp;
src_ptr2 += sp;
dest_ptr1 += dp;
dest_ptr2 += dp;
dest_ptr1 += sp;
dest_ptr2 += sp;
}
/* Now copy the top and bottom lines into each line of the respective
* borders
*/
src_ptr1 = d - el;
src_ptr2 = d + dp * (h - 1) - el;
dest_ptr1 = d + dp * (-et) - el;
dest_ptr2 = d + dp * (h) - el;
linesize = el + er + w;
/* Now copy the top and bottom source lines into each line of the respective borders */
src_ptr1 = s - el;
src_ptr2 = s + sp * (h - 1) - el;
dest_ptr1 = s + sp * (-et) - el;
dest_ptr2 = s + sp * (h) - el;
linesize = el + er + w + 1;
for (i = 0; i < et; i++)
for (i = 0; i < (int)et; i++)
{
vpx_memcpy(dest_ptr1, src_ptr1, linesize);
dest_ptr1 += dp;
dest_ptr1 += sp;
}
for (i = 0; i < eb; i++)
for (i = 0; i < (int)eb; i++)
{
vpx_memcpy(dest_ptr2, src_ptr2, linesize);
dest_ptr2 += dp;
dest_ptr2 += sp;
}
}
void vp8_copy_and_extend_frame(YV12_BUFFER_CONFIG *src,
YV12_BUFFER_CONFIG *dst)
void vp8_extend_to_multiple_of16(YV12_BUFFER_CONFIG *ybf, int width, int height)
{
int et = dst->border;
int el = dst->border;
int eb = dst->border + dst->y_height - src->y_height;
int er = dst->border + dst->y_width - src->y_width;
int er = 0xf & (16 - (width & 0xf));
int eb = 0xf & (16 - (height & 0xf));
copy_and_extend_plane(src->y_buffer, src->y_stride,
dst->y_buffer, dst->y_stride,
src->y_height, src->y_width,
et, el, eb, er);
/* check for non multiples of 16 */
if (er != 0 || eb != 0)
{
extend_plane_borders(ybf->y_buffer, ybf->y_stride, height, width, 0, 0, eb, er);
et = (et + 1) >> 1;
el = (el + 1) >> 1;
eb = (eb + 1) >> 1;
er = (er + 1) >> 1;
/* adjust for uv */
height = (height + 1) >> 1;
width = (width + 1) >> 1;
er = 0x7 & (8 - (width & 0x7));
eb = 0x7 & (8 - (height & 0x7));
copy_and_extend_plane(src->u_buffer, src->uv_stride,
dst->u_buffer, dst->uv_stride,
src->uv_height, src->uv_width,
et, el, eb, er);
copy_and_extend_plane(src->v_buffer, src->uv_stride,
dst->v_buffer, dst->uv_stride,
src->uv_height, src->uv_width,
et, el, eb, er);
if (er || eb)
{
extend_plane_borders(ybf->u_buffer, ybf->uv_stride, height, width, 0, 0, eb, er);
extend_plane_borders(ybf->v_buffer, ybf->uv_stride, height, width, 0, 0, eb, er);
}
}
}
/* note the extension is only for the last row, for intra prediction purpose */
void vp8_extend_mb_row(YV12_BUFFER_CONFIG *ybf, unsigned char *YPtr, unsigned char *UPtr, unsigned char *VPtr)
{

@@ -14,8 +14,8 @@
#include "vpx_scale/yv12config.h"
void Extend(YV12_BUFFER_CONFIG *ybf);
void vp8_extend_mb_row(YV12_BUFFER_CONFIG *ybf, unsigned char *YPtr, unsigned char *UPtr, unsigned char *VPtr);
void vp8_copy_and_extend_frame(YV12_BUFFER_CONFIG *src,
YV12_BUFFER_CONFIG *dst);
void vp8_extend_to_multiple_of16(YV12_BUFFER_CONFIG *ybf, int width, int height);
#endif
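
The border sizes that vp8_extend_to_multiple_of16() passes to extend_plane_borders() come from a masking expression that yields the number of pixels missing up to the next multiple of 16 (or of 8 for the half-resolution chroma planes). A minimal sketch of that arithmetic, using hypothetical helper names that do not appear in the source:

```c
/* Pixels needed to reach the next multiple of 16; 0 if already aligned.
 * Same expression as: er = 0xf & (16 - (width & 0xf)) */
int demo_pad_to_16(int size)
{
    return 0xf & (16 - (size & 0xf));
}

/* Chroma equivalent: pixels needed to reach the next multiple of 8. */
int demo_pad_to_8(int size)
{
    return 0x7 & (8 - (size & 0x7));
}

/* Chroma plane dimension for a given luma dimension, rounding up. */
int demo_chroma_size(int luma_size)
{
    return (luma_size + 1) >> 1;
}
```

The outer mask is what turns the aligned case (16 - 0 = 16) back into 0, so no border is added when the dimension is already a multiple of 16.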

@@ -1,536 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
#include <stdio.h>
#define REGISTER_FILTER 1
#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
#if REGISTER_FILTER
#define FILTER0 filter0
#define FILTER1 filter1
#define FILTER2 filter2
#define FILTER3 filter3
#define FILTER4 filter4
#define FILTER5 filter5
#else
#define FILTER0 vp8_filter[0]
#define FILTER1 vp8_filter[1]
#define FILTER2 vp8_filter[2]
#define FILTER3 vp8_filter[3]
#define FILTER4 vp8_filter[4]
#define FILTER5 vp8_filter[5]
#endif
#define SRC_INCREMENT src_increment
#include "filter.h"
#include "vpx_ports/mem.h"
DECLARE_ALIGNED(16, const short, vp8_bilinear_filters[8][2]) =
{
{ 128, 0 },
{ 112, 16 },
{ 96, 32 },
{ 80, 48 },
{ 64, 64 },
{ 48, 80 },
{ 32, 96 },
{ 16, 112 }
};
DECLARE_ALIGNED(16, const short, vp8_sub_pel_filters[8][6]) =
{
{ 0, 0, 128, 0, 0, 0 }, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
{ 0, -6, 123, 12, -1, 0 },
{ 2, -11, 108, 36, -8, 1 }, /* New 1/4 pel 6 tap filter */
{ 0, -9, 93, 50, -6, 0 },
{ 3, -16, 77, 77, -16, 3 }, /* New 1/2 pel 6 tap filter */
{ 0, -6, 50, 93, -9, 0 },
{ 1, -8, 36, 108, -11, 2 }, /* New 1/4 pel 6 tap filter */
{ 0, -1, 12, 123, -6, 0 },
};
static void filter_block2d_first_pass
(
unsigned char *src_ptr,
int *output_ptr,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
)
{
unsigned int i, j;
int Temp;
#if REGISTER_FILTER
short filter0 = vp8_filter[0];
short filter1 = vp8_filter[1];
short filter2 = vp8_filter[2];
short filter3 = vp8_filter[3];
short filter4 = vp8_filter[4];
short filter5 = vp8_filter[5];
#endif
int ps2 = 2*(int)pixel_step;
int ps3 = 3*(int)pixel_step;
unsigned int src_increment = src_pixels_per_line - output_width;
for (i = 0; i < output_height; i++)
{
for (j = 0; j < output_width; j++)
{
Temp = ((int)src_ptr[-1*ps2] * FILTER0);
Temp += ((int)src_ptr[-1*(int)pixel_step] * FILTER1) +
((int)src_ptr[0] * FILTER2) +
((int)src_ptr[pixel_step] * FILTER3) +
((int)src_ptr[ps2] * FILTER4) +
((int)src_ptr[ps3] * FILTER5) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
CLAMP(Temp, 0, 255);
output_ptr[j] = Temp;
src_ptr++;
}
/* Next row... */
src_ptr += SRC_INCREMENT;
output_ptr += output_width;
}
}
static void filter_block2d_second_pass
(
int *src_ptr,
unsigned char *output_ptr,
int output_pitch,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
)
{
unsigned int i, j;
int Temp;
#if REGISTER_FILTER
short filter0 = vp8_filter[0];
short filter1 = vp8_filter[1];
short filter2 = vp8_filter[2];
short filter3 = vp8_filter[3];
short filter4 = vp8_filter[4];
short filter5 = vp8_filter[5];
#endif
int ps2 = ((int)pixel_step) << 1;
int ps3 = ps2 + (int)pixel_step;
unsigned int src_increment = src_pixels_per_line - output_width;
for (i = 0; i < output_height; i++)
{
for (j = 0; j < output_width; j++)
{
/* Apply filter */
Temp = ((int)src_ptr[-1*ps2] * FILTER0) +
((int)src_ptr[-1*(int)pixel_step] * FILTER1) +
((int)src_ptr[0] * FILTER2) +
((int)src_ptr[pixel_step] * FILTER3) +
((int)src_ptr[ps2] * FILTER4) +
((int)src_ptr[ps3] * FILTER5) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
CLAMP(Temp, 0, 255);
output_ptr[j] = (unsigned char)Temp;
src_ptr++;
}
/* Start next row */
src_ptr += src_increment;
output_ptr += output_pitch;
}
}
static void filter_block2d
(
unsigned char *src_ptr,
unsigned char *output_ptr,
unsigned int src_pixels_per_line,
int output_pitch,
const short *HFilter,
const short *VFilter
)
{
int FData[9*4]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 4, HFilter);
/* then filter vertically... */
filter_block2d_second_pass(FData + 8, output_ptr, output_pitch, 4, 4, 4, 4, VFilter);
}
void vp8_sixtap_predict_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
filter_block2d(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter);
}
void vp8_sixtap_predict8x8_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
int FData[13*16]; /* Temp data buffer used in filtering */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 13, 8, HFilter);
/* then filter vertically... */
filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 8, 8, VFilter);
}
void vp8_sixtap_predict8x4_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
int FData[13*16]; /* Temp data buffer used in filtering */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 8, HFilter);
/* then filter vertically... */
filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 4, 8, VFilter);
}
void vp8_sixtap_predict16x16_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
int FData[21*24]; /* Temp data buffer used in filtering */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 21, 16, HFilter);
/* then filter vertically... */
filter_block2d_second_pass(FData + 32, dst_ptr, dst_pitch, 16, 16, 16, 16, VFilter);
}
/****************************************************************************
*
* ROUTINE : filter_block2d_bil_first_pass
*
* INPUTS : UINT8 *src_ptr : Pointer to source block.
* UINT32 src_stride : Stride of source block.
* UINT32 height : Block height.
* UINT32 width : Block width.
* INT32 *vp8_filter : Array of 2 bi-linear filter taps.
*
* OUTPUTS : INT32 *dst_ptr : Pointer to filtered block.
*
* RETURNS : void
*
* FUNCTION : Applies a 1-D 2-tap bi-linear filter to the source block
* in the horizontal direction to produce the filtered output
* block. Used to implement first-pass of 2-D separable filter.
*
* SPECIAL NOTES : Produces INT32 output to retain precision for next pass.
* Two filter taps should sum to VP8_FILTER_WEIGHT.
*
****************************************************************************/
static void filter_block2d_bil_first_pass
(
unsigned char *src_ptr,
unsigned short *dst_ptr,
unsigned int src_stride,
unsigned int height,
unsigned int width,
const short *vp8_filter
)
{
unsigned int i, j;
for (i = 0; i < height; i++)
{
for (j = 0; j < width; j++)
{
/* Apply bilinear filter */
dst_ptr[j] = (((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[1] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;
src_ptr++;
}
/* Next row... */
src_ptr += src_stride - width;
dst_ptr += width;
}
}
/****************************************************************************
*
* ROUTINE : filter_block2d_bil_second_pass
*
* INPUTS : INT32 *src_ptr : Pointer to source block.
* UINT32 dst_pitch : Destination block pitch.
* UINT32 height : Block height.
* UINT32 width : Block width.
* INT32 *vp8_filter : Array of 2 bi-linear filter taps.
*
* OUTPUTS : UINT16 *dst_ptr : Pointer to filtered block.
*
* RETURNS : void
*
* FUNCTION : Applies a 1-D 2-tap bi-linear filter to the source block
* in the vertical direction to produce the filtered output
* block. Used to implement second-pass of 2-D separable filter.
*
* SPECIAL NOTES : Requires 32-bit input as produced by filter_block2d_bil_first_pass.
* Two filter taps should sum to VP8_FILTER_WEIGHT.
*
****************************************************************************/
static void filter_block2d_bil_second_pass
(
unsigned short *src_ptr,
unsigned char *dst_ptr,
int dst_pitch,
unsigned int height,
unsigned int width,
const short *vp8_filter
)
{
unsigned int i, j;
int Temp;
for (i = 0; i < height; i++)
{
for (j = 0; j < width; j++)
{
/* Apply filter */
Temp = ((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[width] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2);
dst_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
src_ptr++;
}
/* Next row... */
dst_ptr += dst_pitch;
}
}
/****************************************************************************
*
* ROUTINE : filter_block2d_bil
*
* INPUTS : UINT8 *src_ptr : Pointer to source block.
* UINT32 src_pitch : Stride of source block.
* UINT32 dst_pitch : Stride of destination block.
* INT32 *HFilter : Array of 2 horizontal filter taps.
* INT32 *VFilter : Array of 2 vertical filter taps.
* INT32 Width : Block width
* INT32 Height : Block height
*
* OUTPUTS : UINT16 *dst_ptr : Pointer to filtered block.
*
* RETURNS : void
*
* FUNCTION : 2-D filters an input block by applying a 2-tap
* bi-linear filter horizontally followed by a 2-tap
* bi-linear filter vertically on the result.
*
* SPECIAL NOTES : The largest block size that can be handled here is 16x16
*
****************************************************************************/
static void filter_block2d_bil
(
unsigned char *src_ptr,
unsigned char *dst_ptr,
unsigned int src_pitch,
unsigned int dst_pitch,
const short *HFilter,
const short *VFilter,
int Width,
int Height
)
{
unsigned short FData[17*16]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
filter_block2d_bil_first_pass(src_ptr, FData, src_pitch, Height + 1, Width, HFilter);
/* then 1-D vertically... */
filter_block2d_bil_second_pass(FData, dst_ptr, dst_pitch, Height, Width, VFilter);
}
void vp8_bilinear_predict4x4_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
#if 0
{
int i;
unsigned char temp1[16];
unsigned char temp2[16];
bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
for (i = 0; i < 16; i++)
{
if (temp1[i] != temp2[i])
{
bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
}
}
}
#endif
filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
}
void vp8_bilinear_predict8x8_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
}
void vp8_bilinear_predict8x4_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
}
void vp8_bilinear_predict16x16_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
}

vp8/common/filter_c.c

@@ -0,0 +1,540 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
#define BLOCK_HEIGHT_WIDTH 4
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
static const int bilinear_filters[8][2] =
{
{ 128, 0 },
{ 112, 16 },
{ 96, 32 },
{ 80, 48 },
{ 64, 64 },
{ 48, 80 },
{ 32, 96 },
{ 16, 112 }
};
static const short sub_pel_filters[8][6] =
{
{ 0, 0, 128, 0, 0, 0 }, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
{ 0, -6, 123, 12, -1, 0 },
{ 2, -11, 108, 36, -8, 1 }, /* New 1/4 pel 6 tap filter */
{ 0, -9, 93, 50, -6, 0 },
{ 3, -16, 77, 77, -16, 3 }, /* New 1/2 pel 6 tap filter */
{ 0, -6, 50, 93, -9, 0 },
{ 1, -8, 36, 108, -11, 2 }, /* New 1/4 pel 6 tap filter */
{ 0, -1, 12, 123, -6, 0 },
};
void vp8_filter_block2d_first_pass
(
unsigned char *src_ptr,
int *output_ptr,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
)
{
unsigned int i, j;
int Temp;
for (i = 0; i < output_height; i++)
{
for (j = 0; j < output_width; j++)
{
Temp = ((int)src_ptr[-2 * (int)pixel_step] * vp8_filter[0]) +
((int)src_ptr[-1 * (int)pixel_step] * vp8_filter[1]) +
((int)src_ptr[0] * vp8_filter[2]) +
((int)src_ptr[pixel_step] * vp8_filter[3]) +
((int)src_ptr[2*pixel_step] * vp8_filter[4]) +
((int)src_ptr[3*pixel_step] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if (Temp > 255)
Temp = 255;
output_ptr[j] = Temp;
src_ptr++;
}
/* Next row... */
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_width;
}
}
void vp8_filter_block2d_second_pass
(
int *src_ptr,
unsigned char *output_ptr,
int output_pitch,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
)
{
unsigned int i, j;
int Temp;
for (i = 0; i < output_height; i++)
{
for (j = 0; j < output_width; j++)
{
/* Apply filter */
Temp = ((int)src_ptr[-2 * (int)pixel_step] * vp8_filter[0]) +
((int)src_ptr[-1 * (int)pixel_step] * vp8_filter[1]) +
((int)src_ptr[0] * vp8_filter[2]) +
((int)src_ptr[pixel_step] * vp8_filter[3]) +
((int)src_ptr[2*pixel_step] * vp8_filter[4]) +
((int)src_ptr[3*pixel_step] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if (Temp > 255)
Temp = 255;
output_ptr[j] = (unsigned char)Temp;
src_ptr++;
}
/* Start next row */
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_pitch;
}
}
void vp8_filter_block2d
(
unsigned char *src_ptr,
unsigned char *output_ptr,
unsigned int src_pixels_per_line,
int output_pitch,
const short *HFilter,
const short *VFilter
)
{
int FData[9*4]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 4, HFilter);
/* then filter vertically... */
vp8_filter_block2d_second_pass(FData + 8, output_ptr, output_pitch, 4, 4, 4, 4, VFilter);
}
void vp8_block_variation_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int *HVar,
int *VVar
)
{
int i, j;
unsigned char *Ptr = src_ptr;
for (i = 0; i < 4; i++)
{
for (j = 0; j < 4; j++)
{
*HVar += abs((int)Ptr[j] - (int)Ptr[j+1]);
*VVar += abs((int)Ptr[j] - (int)Ptr[j+src_pixels_per_line]);
}
Ptr += src_pixels_per_line;
}
}
void vp8_sixtap_predict_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
vp8_filter_block2d(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter);
}
void vp8_sixtap_predict8x8_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
int FData[13*16]; /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 13, 8, HFilter);
/* then filter vertically... */
vp8_filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 8, 8, VFilter);
}
void vp8_sixtap_predict8x4_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
int FData[13*16]; /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 8, HFilter);
/* then filter vertically... */
vp8_filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 4, 8, VFilter);
}
void vp8_sixtap_predict16x16_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
int FData[21*24]; /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 21, 16, HFilter);
/* then filter vertically... */
vp8_filter_block2d_second_pass(FData + 32, dst_ptr, dst_pitch, 16, 16, 16, 16, VFilter);
}
/****************************************************************************
*
* ROUTINE : filter_block2d_bil_first_pass
*
* INPUTS : UINT8 *src_ptr : Pointer to source block.
* UINT32 src_pixels_per_line : Stride of input block.
* UINT32 pixel_step : Offset between filter input samples (see notes).
* UINT32 output_height : Input block height.
* UINT32 output_width : Input block width.
* INT32 *vp8_filter : Array of 2 bi-linear filter taps.
*
* OUTPUTS : INT32 *output_ptr : Pointer to filtered block.
*
* RETURNS : void
*
* FUNCTION : Applies a 1-D 2-tap bi-linear filter to the source block in
* either horizontal or vertical direction to produce the
* filtered output block. Used to implement first-pass
* of 2-D separable filter.
*
* SPECIAL NOTES : Produces unsigned short output for the next pass.
* Two filter taps should sum to VP8_FILTER_WEIGHT.
* pixel_step defines whether the filter is applied
* horizontally (pixel_step=1) or vertically (pixel_step=stride).
* It defines the offset required to move from one input
* to the next.
*
****************************************************************************/
void vp8_filter_block2d_bil_first_pass
(
unsigned char *src_ptr,
unsigned short *output_ptr,
unsigned int src_pixels_per_line,
int pixel_step,
unsigned int output_height,
unsigned int output_width,
const int *vp8_filter
)
{
unsigned int i, j;
for (i = 0; i < output_height; i++)
{
for (j = 0; j < output_width; j++)
{
/* Apply bilinear filter */
output_ptr[j] = (((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[pixel_step] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;
src_ptr++;
}
/* Next row... */
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_width;
}
}
/****************************************************************************
*
* ROUTINE : filter_block2d_bil_second_pass
*
* INPUTS : INT32 *src_ptr : Pointer to source block.
* UINT32 src_pixels_per_line : Stride of input block.
* UINT32 pixel_step : Offset between filter input samples (see notes).
* UINT32 output_height : Input block height.
* UINT32 output_width : Input block width.
* INT32 *vp8_filter : Array of 2 bi-linear filter taps.
*
* OUTPUTS : UINT16 *output_ptr : Pointer to filtered block.
*
* RETURNS : void
*
* FUNCTION : Applies a 1-D 2-tap bi-linear filter to the source block in
* either horizontal or vertical direction to produce the
* filtered output block. Used to implement second-pass
* of 2-D separable filter.
*
* SPECIAL NOTES : Requires unsigned short input as produced by filter_block2d_bil_first_pass.
* Two filter taps should sum to VP8_FILTER_WEIGHT.
* pixel_step defines whether the filter is applied
* horizontally (pixel_step=1) or vertically (pixel_step=stride).
* It defines the offset required to move from one input
* to the next.
*
****************************************************************************/
void vp8_filter_block2d_bil_second_pass
(
unsigned short *src_ptr,
unsigned char *output_ptr,
int output_pitch,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
const int *vp8_filter
)
{
unsigned int i, j;
int Temp;
for (i = 0; i < output_height; i++)
{
for (j = 0; j < output_width; j++)
{
/* Apply filter */
Temp = ((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[pixel_step] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2);
output_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
src_ptr++;
}
/* Next row... */
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_pitch;
}
}
/****************************************************************************
*
* ROUTINE : filter_block2d_bil
*
* INPUTS : UINT8 *src_ptr : Pointer to source block.
* UINT32 src_pixels_per_line : Stride of input block.
* INT32 *HFilter : Array of 2 horizontal filter taps.
* INT32 *VFilter : Array of 2 vertical filter taps.
*
* OUTPUTS : UINT16 *output_ptr : Pointer to filtered block.
*
* RETURNS : void
*
* FUNCTION : 2-D filters an input block by applying a 2-tap
* bi-linear filter horizontally followed by a 2-tap
* bi-linear filter vertically on the result.
*
* SPECIAL NOTES : The largest block size that can be handled here is 16x16
*
****************************************************************************/
void vp8_filter_block2d_bil
(
unsigned char *src_ptr,
unsigned char *output_ptr,
unsigned int src_pixels_per_line,
unsigned int dst_pitch,
const int *HFilter,
const int *VFilter,
int Width,
int Height
)
{
unsigned short FData[17*16]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass(src_ptr, FData, src_pixels_per_line, 1, Height + 1, Width, HFilter);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass(FData, output_ptr, dst_pitch, Width, Width, Height, Width, VFilter);
}
void vp8_bilinear_predict4x4_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const int *HFilter;
const int *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
#if 0
{
int i;
unsigned char temp1[16];
unsigned char temp2[16];
bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
vp8_filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
for (i = 0; i < 16; i++)
{
if (temp1[i] != temp2[i])
{
bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
vp8_filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
}
}
}
#endif
vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
}
void vp8_bilinear_predict8x8_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const int *HFilter;
const int *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
}
void vp8_bilinear_predict8x4_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const int *HFilter;
const int *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
}
void vp8_bilinear_predict16x16_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const int *HFilter;
const int *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
}

@@ -11,16 +11,47 @@
#include "findnearmv.h"
const unsigned char vp8_mbsplit_offset[4][16] = {
{ 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{ 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{ 0, 2, 8, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
};
#define FINDNEAR_SEARCH_SITES 3
/* Predict motion vectors using those from already-decoded nearby blocks.
Note that we only consider one 4x4 subblock from each candidate 16x16
macroblock. */
typedef union
{
unsigned int as_int;
MV as_mv;
} int_mv; /* facilitates rapid equality tests */
static void mv_bias(const MODE_INFO *x, int refframe, int_mv *mvp, const int *ref_frame_sign_bias)
{
MV xmv;
xmv = x->mbmi.mv.as_mv;
if (ref_frame_sign_bias[x->mbmi.ref_frame] != ref_frame_sign_bias[refframe])
{
xmv.row *= -1;
xmv.col *= -1;
}
mvp->as_mv = xmv;
}
void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd)
{
if (mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN))
mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
else if (mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN)
mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;
if (mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN))
mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
else if (mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN)
mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
}
void vp8_find_near_mvs
(
MACROBLOCKD *xd,
@@ -51,7 +82,7 @@ void vp8_find_near_mvs
if (above->mbmi.mv.as_int)
{
(++mv)->as_int = above->mbmi.mv.as_int;
mv_bias(ref_frame_sign_bias[above->mbmi.ref_frame], refframe, mv, ref_frame_sign_bias);
mv_bias(above, refframe, mv, ref_frame_sign_bias);
++cntx;
}
@@ -66,7 +97,7 @@ void vp8_find_near_mvs
int_mv this_mv;
this_mv.as_int = left->mbmi.mv.as_int;
mv_bias(ref_frame_sign_bias[left->mbmi.ref_frame], refframe, &this_mv, ref_frame_sign_bias);
mv_bias(left, refframe, &this_mv, ref_frame_sign_bias);
if (this_mv.as_int != mv->as_int)
{
@@ -88,7 +119,7 @@ void vp8_find_near_mvs
int_mv this_mv;
this_mv.as_int = aboveleft->mbmi.mv.as_int;
mv_bias(ref_frame_sign_bias[aboveleft->mbmi.ref_frame], refframe, &this_mv, ref_frame_sign_bias);
mv_bias(aboveleft, refframe, &this_mv, ref_frame_sign_bias);
if (this_mv.as_int != mv->as_int)
{

@@ -17,41 +17,6 @@
#include "modecont.h"
#include "treecoder.h"
typedef union
{
unsigned int as_int;
MV as_mv;
} int_mv; /* facilitates rapid equality tests */
static void mv_bias(int refmb_ref_frame_sign_bias, int refframe, int_mv *mvp, const int *ref_frame_sign_bias)
{
MV xmv;
xmv = mvp->as_mv;
if (refmb_ref_frame_sign_bias != ref_frame_sign_bias[refframe])
{
xmv.row *= -1;
xmv.col *= -1;
}
mvp->as_mv = xmv;
}
#define LEFT_TOP_MARGIN (16 << 3)
#define RIGHT_BOTTOM_MARGIN (16 << 3)
static void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd)
{
if (mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN))
mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
else if (mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN)
mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;
if (mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN))
mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
else if (mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN)
mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
}
void vp8_find_near_mvs
(
MACROBLOCKD *xd,
@@ -70,6 +35,8 @@ const B_MODE_INFO *vp8_left_bmi(const MODE_INFO *cur_mb, int b);
const B_MODE_INFO *vp8_above_bmi(const MODE_INFO *cur_mb, int b, int mi_stride);
extern const unsigned char vp8_mbsplit_offset[4][16];
#define LEFT_TOP_MARGIN (16 << 3)
#define RIGHT_BOTTOM_MARGIN (16 << 3)
#endif

vp8/common/fourcc.hpp

@@ -0,0 +1,121 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef FOURCC_HPP
#define FOURCC_HPP
#include <iosfwd>
#include <cstring>
#if defined(__POWERPC__) || defined(__APPLE__) || defined(__MERKS__)
using namespace std;
#endif
class four_cc
{
public:
four_cc();
four_cc(const char*);
explicit four_cc(unsigned long);
bool operator==(const four_cc&) const;
bool operator!=(const four_cc&) const;
bool operator==(const char*) const;
bool operator!=(const char*) const;
operator unsigned long() const;
unsigned long as_long() const;
four_cc& operator=(unsigned long);
char operator[](int) const;
std::ostream& put(std::ostream&) const;
bool printable() const;
private:
union
{
char code[4];
unsigned long code_as_long;
};
};
inline four_cc::four_cc()
{
}
inline four_cc::four_cc(unsigned long x)
: code_as_long(x)
{
}
inline four_cc::four_cc(const char* str)
{
memcpy(code, str, 4);
}
inline bool four_cc::operator==(const four_cc& rhs) const
{
return code_as_long == rhs.code_as_long;
}
inline bool four_cc::operator!=(const four_cc& rhs) const
{
return !operator==(rhs);
}
inline bool four_cc::operator==(const char* rhs) const
{
return (memcmp(code, rhs, 4) == 0);
}
inline bool four_cc::operator!=(const char* rhs) const
{
return !operator==(rhs);
}
inline four_cc::operator unsigned long() const
{
return code_as_long;
}
inline unsigned long four_cc::as_long() const
{
return code_as_long;
}
inline char four_cc::operator[](int i) const
{
return code[i];
}
inline four_cc& four_cc::operator=(unsigned long val)
{
code_as_long = val;
return *this;
}
inline std::ostream& operator<<(std::ostream& os, const four_cc& rhs)
{
return rhs.put(os);
}
#endif

@@ -10,16 +10,21 @@
#include "vpx_ports/config.h"
#include "vp8/common/g_common.h"
#include "vp8/common/subpixel.h"
#include "vp8/common/loopfilter.h"
#include "vp8/common/recon.h"
#include "vp8/common/idct.h"
#include "vp8/common/onyxc_int.h"
#include "g_common.h"
#include "subpixel.h"
#include "loopfilter.h"
#include "recon.h"
#include "idct.h"
#include "onyxc_int.h"
extern void vp8_arch_x86_common_init(VP8_COMMON *ctx);
extern void vp8_arch_arm_common_init(VP8_COMMON *ctx);
extern void vp8_arch_opencl_common_init(VP8_COMMON *ctx);
void (*vp8_build_intra_predictors_mby_ptr)(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby(MACROBLOCKD *x);
void (*vp8_build_intra_predictors_mby_s_ptr)(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x);
void vp8_machine_specific_config(VP8_COMMON *ctx)
{
@@ -40,10 +45,6 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
rtcd->recon.recon4 = vp8_recon4b_c;
rtcd->recon.recon_mb = vp8_recon_mb_c;
rtcd->recon.recon_mby = vp8_recon_mby_c;
rtcd->recon.build_intra_predictors_mby =
vp8_build_intra_predictors_mby;
rtcd->recon.build_intra_predictors_mby_s =
vp8_build_intra_predictors_mby_s;
rtcd->subpix.sixtap16x16 = vp8_sixtap_predict16x16_c;
rtcd->subpix.sixtap8x8 = vp8_sixtap_predict8x8_c;
@@ -74,6 +75,9 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
#endif
#endif
/* Pure C: */
vp8_build_intra_predictors_mby_ptr = vp8_build_intra_predictors_mby;
vp8_build_intra_predictors_mby_s_ptr = vp8_build_intra_predictors_mby_s;
#if ARCH_X86 || ARCH_X86_64
vp8_arch_x86_common_init(ctx);
@@ -83,8 +87,4 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
vp8_arch_arm_common_init(ctx);
#endif
#if CONFIG_OPENCL && (ENABLE_CL_IDCT_DEQUANT || ENABLE_CL_SUBPIXEL || ENABLE_CL_LOOPFILTER)
vp8_arch_opencl_common_init(ctx);
#endif
}

@@ -31,10 +31,6 @@
#include "arm/idct_arm.h"
#endif
#if CONFIG_OPENCL
#include "opencl/idct_cl.h"
#endif
#ifndef vp8_idct_idct1
#define vp8_idct_idct1 vp8_short_idct4x4llm_1_c
#endif

@@ -11,6 +11,8 @@
#include "invtrans.h"
static void recon_dcblock(MACROBLOCKD *x)
{
BLOCKD *b = &x->block[24];
@@ -18,7 +20,7 @@ static void recon_dcblock(MACROBLOCKD *x)
for (i = 0; i < 16; i++)
{
*(x->block[i].dqcoeff_base+x->block[i].dqcoeff_offset) = b->diff_base[b->diff_offset+i];
x->block[i].dqcoeff[0] = b->diff[i];
}
}
@@ -26,18 +28,18 @@ static void recon_dcblock(MACROBLOCKD *x)
void vp8_inverse_transform_b(const vp8_idct_rtcd_vtable_t *rtcd, BLOCKD *b, int pitch)
{
if (b->eob > 1)
IDCT_INVOKE(rtcd, idct16)(b->dqcoeff_base + b->dqcoeff_offset, &b->diff_base[b->diff_offset], pitch);
IDCT_INVOKE(rtcd, idct16)(b->dqcoeff, b->diff, pitch);
else
IDCT_INVOKE(rtcd, idct1)(b->dqcoeff_base + b->dqcoeff_offset, &b->diff_base[b->diff_offset], pitch);
IDCT_INVOKE(rtcd, idct1)(b->dqcoeff, b->diff, pitch);
}
/* Only used in the encoder */
void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
{
int i;
/* do 2nd order transform on the dc block */
IDCT_INVOKE(rtcd, iwalsh16)(x->block[24].dqcoeff_base + x->block[23].dqcoeff_offset, &x->block[24].diff_base[x->block[24].diff_offset]);
IDCT_INVOKE(rtcd, iwalsh16)(x->block[24].dqcoeff, x->block[24].diff);
recon_dcblock(x);
@@ -47,8 +49,6 @@ void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *
}
}
/* Only used in encoder */
void vp8_inverse_transform_mbuv(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
{
int i;
@@ -57,6 +57,7 @@ void vp8_inverse_transform_mbuv(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD
{
vp8_inverse_transform_b(rtcd, &x->block[i], 16);
}
}
@@ -68,10 +69,8 @@ void vp8_inverse_transform_mb(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x
x->mode_info_context->mbmi.mode != SPLITMV)
{
/* do 2nd order transform on the dc block */
BLOCKD b = x->block[24];
IDCT_INVOKE(rtcd, iwalsh16)(b.dqcoeff_base+b.dqcoeff_offset, &b.diff_base[b.diff_offset]);
IDCT_INVOKE(rtcd, iwalsh16)(&x->block[24].dqcoeff[0], x->block[24].diff);
recon_dcblock(x);
}

@@ -13,8 +13,8 @@
#define __INC_INVTRANS_H
#include "vpx_ports/config.h"
#include "vp8/common/idct.h"
#include "vp8/common/blockd.h"
#include "idct.h"
#include "blockd.h"
extern void vp8_inverse_transform_b(const vp8_idct_rtcd_vtable_t *rtcd, BLOCKD *b, int pitch);
extern void vp8_inverse_transform_mb(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x);
extern void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x);

@@ -13,10 +13,6 @@
#include "loopfilter.h"
#include "onyxc_int.h"
#if CONFIG_OPENCL
#include "opencl/loopfilter_cl.h"
#endif
typedef unsigned char uc;
@@ -32,13 +28,13 @@ void vp8_loop_filter_mbh_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
if (v_ptr)
vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
}
void vp8_loop_filter_mbhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -48,7 +44,7 @@ void vp8_loop_filter_mbhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
}
/* Vertical MB Filtering */
@@ -56,13 +52,13 @@ void vp8_loop_filter_mbv_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
if (v_ptr)
vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
}
void vp8_loop_filter_mbvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -72,7 +68,7 @@ void vp8_loop_filter_mbvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
}
/* Horizontal B Filtering */
@@ -85,10 +81,10 @@ void vp8_loop_filter_bh_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned c
vp8_loop_filter_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
if (u_ptr)
vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
if (v_ptr)
vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
}
void vp8_loop_filter_bhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -113,10 +109,10 @@ void vp8_loop_filter_bv_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned c
vp8_loop_filter_vertical_edge_c(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
if (u_ptr)
vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
if (v_ptr)
vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
}
void vp8_loop_filter_bvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -141,6 +137,8 @@ void vp8_init_loop_filter(VP8_COMMON *cm)
int block_inside_limit = 0;
int HEVThresh;
const int yhedge_boost = 2;
const int uvhedge_boost = 2;
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
@@ -184,9 +182,15 @@ void vp8_init_loop_filter(VP8_COMMON *cm)
for (j = 0; j < 16; j++)
{
lfi[i].lim[j] = block_inside_limit;
lfi[i].mbflim[j] = filt_lvl + 2;
lfi[i].mbflim[j] = filt_lvl + yhedge_boost;
lfi[i].mbthr[j] = HEVThresh;
lfi[i].flim[j] = filt_lvl;
lfi[i].thr[j] = HEVThresh;
lfi[i].uvlim[j] = block_inside_limit;
lfi[i].uvmbflim[j] = filt_lvl + uvhedge_boost;
lfi[i].uvmbthr[j] = HEVThresh;
lfi[i].uvflim[j] = filt_lvl;
lfi[i].uvthr[j] = HEVThresh;
}
}
@@ -245,52 +249,57 @@ void vp8_frame_init_loop_filter(loop_filter_info *lfi, int frame_type)
for (j = 0; j < 16; j++)
{
/*lfi[i].lim[j] = block_inside_limit;
lfi[i].mbflim[j] = filt_lvl+2;*/
lfi[i].mbflim[j] = filt_lvl+yhedge_boost;*/
lfi[i].mbthr[j] = HEVThresh;
/*lfi[i].flim[j] = filt_lvl;*/
lfi[i].thr[j] = HEVThresh;
/*lfi[i].uvlim[j] = block_inside_limit;
lfi[i].uvmbflim[j] = filt_lvl+uvhedge_boost;*/
lfi[i].uvmbthr[j] = HEVThresh;
/*lfi[i].uvflim[j] = filt_lvl;*/
lfi[i].uvthr[j] = HEVThresh;
}
}
}
int vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int filter_level)
void vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int *filter_level)
{
MB_MODE_INFO *mbmi = &mbd->mode_info_context->mbmi;
if (mbd->mode_ref_lf_delta_enabled)
{
/* Apply delta for reference frame */
filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];
*filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];
/* Apply delta for mode */
if (mbmi->ref_frame == INTRA_FRAME)
{
/* Only the split mode BPRED has a further special case */
if (mbmi->mode == B_PRED)
filter_level += mbd->mode_lf_deltas[0];
*filter_level += mbd->mode_lf_deltas[0];
}
else
{
/* Zero motion mode */
if (mbmi->mode == ZEROMV)
filter_level += mbd->mode_lf_deltas[1];
*filter_level += mbd->mode_lf_deltas[1];
/* Split MB motion mode */
else if (mbmi->mode == SPLITMV)
filter_level += mbd->mode_lf_deltas[3];
*filter_level += mbd->mode_lf_deltas[3];
/* All other inter motion modes (Nearest, Near, New) */
else
filter_level += mbd->mode_lf_deltas[2];
*filter_level += mbd->mode_lf_deltas[2];
}
/* Range check */
if (filter_level > MAX_LOOP_FILTER)
filter_level = MAX_LOOP_FILTER;
else if (filter_level < 0)
filter_level = 0;
if (*filter_level > MAX_LOOP_FILTER)
*filter_level = MAX_LOOP_FILTER;
else if (*filter_level < 0)
*filter_level = 0;
}
return filter_level;
}
@@ -316,13 +325,6 @@ void vp8_loop_filter_frame
int i;
unsigned char *y_ptr, *u_ptr, *v_ptr;
#if CONFIG_OPENCL && ENABLE_CL_LOOPFILTER
if ( cl_initialized == CL_SUCCESS ){
vp8_loop_filter_frame_cl(cm,mbd,default_filt_lvl);
return;
}
#endif
mbd->mode_info_context = cm->mi; /* Point at base of Mb MODE_INFO list */
/* Note the baseline filter values for each segment */
@@ -371,7 +373,7 @@ void vp8_loop_filter_frame
* These specified to 8th pel as they are always compared to values that are in 1/8th pel units
* Apply any context driven MB level adjustment
*/
filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
vp8_adjust_mb_lf_value(mbd, &filter_level);
if (filter_level)
{
@@ -405,7 +407,6 @@ void vp8_loop_filter_frame
}
/* Encoder only... */
void vp8_loop_filter_frame_yonly
(
VP8_COMMON *cm,
@@ -472,7 +473,7 @@ void vp8_loop_filter_frame_yonly
filter_level = baseline_filter_level[Segment];
/* Apply any context driven MB level adjustment */
filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
vp8_adjust_mb_lf_value(mbd, &filter_level);
if (filter_level)
{
@@ -501,7 +502,7 @@ void vp8_loop_filter_frame_yonly
}
/* Encoder only... */
void vp8_loop_filter_partial_frame
(
VP8_COMMON *cm,
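
The `vp8_adjust_mb_lf_value` hunks above show the same per-macroblock adjustment in two calling styles: one returns the adjusted level, the other updates it through a pointer. Below is a minimal sketch of the pointer-taking form so the delta-then-clamp order can be checked in isolation. It is not the libvpx code: `mb_ctx`, `adjust_lf_level`, the enum constants, and the `MAX_LF` value of 63 are stand-in assumptions chosen so the block compiles on its own.

```c
#include <assert.h>

/* Hypothetical stand-ins for the libvpx types; all names and values here
 * are assumptions made for illustration. */
enum { REF_INTRA = 0, REF_LAST = 1 };
enum { MODE_B_PRED, MODE_ZEROMV, MODE_SPLITMV, MODE_OTHER };
#define MAX_LF 63 /* assumed upper bound of the filter level */

typedef struct {
    int delta_enabled;      /* plays the role of mode_ref_lf_delta_enabled */
    int ref_frame;
    int mode;
    int ref_lf_deltas[4];
    int mode_lf_deltas[4];
} mb_ctx;

/* Adjusts *level in place: reference-frame delta, then mode delta,
 * then a clamp to [0, MAX_LF]. Nothing changes when deltas are disabled. */
static void adjust_lf_level(const mb_ctx *mb, int *level)
{
    assert(mb != 0 && level != 0);

    if (!mb->delta_enabled)
        return;

    *level += mb->ref_lf_deltas[mb->ref_frame];

    if (mb->ref_frame == REF_INTRA) {
        if (mb->mode == MODE_B_PRED)
            *level += mb->mode_lf_deltas[0];
    } else if (mb->mode == MODE_ZEROMV) {
        *level += mb->mode_lf_deltas[1];
    } else if (mb->mode == MODE_SPLITMV) {
        *level += mb->mode_lf_deltas[3];
    } else {
        *level += mb->mode_lf_deltas[2];
    }

    if (*level > MAX_LF)
        *level = MAX_LF;
    else if (*level < 0)
        *level = 0;
}
```

With the pointer form the caller writes `adjust_lf_level(&mb, &level);` and then tests `level`, which matches the call sites changed in the hunks above.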

@@ -32,6 +32,12 @@ typedef struct
DECLARE_ALIGNED(16, signed char, flim[16]);
DECLARE_ALIGNED(16, signed char, thr[16]);
DECLARE_ALIGNED(16, signed char, mbflim[16]);
DECLARE_ALIGNED(16, signed char, mbthr[16]);
DECLARE_ALIGNED(16, signed char, uvlim[16]);
DECLARE_ALIGNED(16, signed char, uvflim[16]);
DECLARE_ALIGNED(16, signed char, uvthr[16]);
DECLARE_ALIGNED(16, signed char, uvmbflim[16]);
DECLARE_ALIGNED(16, signed char, uvmbthr[16]);
} loop_filter_info;

@@ -49,6 +49,7 @@ static __inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0,
}
static __inline void vp8_filter(signed char mask, signed char hev, uc *op1, uc *op0, uc *oq0, uc *oq1)
{
signed char ps0, qs0;
signed char ps1, qs1;
@@ -93,7 +94,6 @@ static __inline void vp8_filter(signed char mask, signed char hev, uc *op1, uc *
*op1 = u ^ 0x80;
}
void vp8_loop_filter_horizontal_edge_c
(
unsigned char *s,

@@ -9,17 +9,23 @@
*/
#ifndef __INC_RECONINTER_CL_H
#define __INC_RECONINTER_CL_H
#if !defined(_mac_specs_h)
#define _mac_specs_h
#include "blockd_cl.h"
#include "subpixel_cl.h"
#include "filter_cl.h"
extern void vp8_build_inter_predictors_mb_cl(MACROBLOCKD *x);
extern void vp8_build_inter_predictors_mbuv_cl(MACROBLOCKD *x);
#if defined(__cplusplus)
extern "C" {
#endif
extern unsigned int vp8_read_tsc();
extern unsigned int vp8_get_processor_freq();
extern unsigned int vpx_has_altivec();
#if defined(__cplusplus)
}
#endif
extern void vp8_build_inter_predictors_mb_s_cl(MACROBLOCKD *x);
//extern void vp8_build_inter_predictors_b_cl(BLOCKD *d, int pitch);
#endif

@@ -11,21 +11,16 @@
#include "blockd.h"
#include "stdio.h"
#include "vpx_config.h"
#if CONFIG_OPENCL
#include "opencl/vp8_opencl.h"
#endif
typedef enum
{
PRED = 0,
DEST = 1
} BLOCKSET;
static void setup_block
void vp8_setup_block
(
BLOCKD *b,
int mv_stride,
unsigned char **base,
int Stride,
int offset,
@@ -48,183 +43,87 @@ static void setup_block
}
static void setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
void vp8_setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
{
int block;
unsigned char **y, **u, **v;
unsigned char **buf_base;
int y_off, u_off, v_off;
if (bs == DEST)
{
buf_base = &x->dst.buffer_alloc;
y_off = x->dst.y_buffer - x->dst.buffer_alloc;
u_off = x->dst.u_buffer - x->dst.buffer_alloc;
v_off = x->dst.v_buffer - x->dst.buffer_alloc;
y = &x->dst.y_buffer;
u = &x->dst.u_buffer;
v = &x->dst.v_buffer;
y_off = 0;
//y = buf_base;
//y_off = x->dst.y_buffer - x->dst.buffer_alloc;
u = buf_base;
v = buf_base;
u_off = x->dst.u_buffer - x->dst.buffer_alloc;
v_off = x->dst.v_buffer - x->dst.buffer_alloc;
}
else
{
buf_base = &x->pre.buffer_alloc;
y = &x->pre.y_buffer;
u = &x->pre.u_buffer;
v = &x->pre.v_buffer;
y_off = u_off = v_off = 0;
//y = buf_base;
//y_off = x->pre.y_buffer - x->pre.buffer_alloc;
//u = buf_base;
//u_off = x->pre.u_buffer - x->pre.buffer_alloc;
//v = buf_base;
//v_off = x->pre.v_buffer - x->pre.buffer_alloc;
}
for (block = 0; block < 16; block++) /* y blocks */
{
setup_block(&x->block[block], y, x->dst.y_stride,
y_off + ((block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4), bs);
vp8_setup_block(&x->block[block], x->dst.y_stride, y, x->dst.y_stride,
(block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4, bs);
}
for (block = 16; block < 20; block++) /* U and V blocks */
{
int block_off = ((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4;
vp8_setup_block(&x->block[block], x->dst.uv_stride, u, x->dst.uv_stride,
((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
setup_block(&x->block[block], u, x->dst.uv_stride,
u_off + block_off, bs);
setup_block(&x->block[block+4], v, x->dst.uv_stride,
v_off + block_off, bs);
vp8_setup_block(&x->block[block+4], x->dst.uv_stride, v, x->dst.uv_stride,
((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
}
}
void vp8_setup_block_dptrs(MACROBLOCKD *x)
{
int r, c;
unsigned int offset;
#if CONFIG_OPENCL && !ONE_CQ_PER_MB
cl_command_queue y_cq, u_cq, v_cq;
int err;
if (cl_initialized == CL_SUCCESS){
//Create command queue for Y/U/V Planes
y_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
if (!y_cq || err != CL_SUCCESS) {
printf("Error: Failed to create a command queue!\n");
cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
}
u_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
if (!u_cq || err != CL_SUCCESS) {
printf("Error: Failed to create a command queue!\n");
cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
}
v_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
if (!v_cq || err != CL_SUCCESS) {
printf("Error: Failed to create a command queue!\n");
cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
}
}
#endif
/* 16 Y blocks */
for (r = 0; r < 4; r++)
{
for (c = 0; c < 4; c++)
{
offset = r * 4 * 16 + c * 4;
x->block[r*4+c].diff_offset = offset;
x->block[r*4+c].predictor_offset = offset;
#if CONFIG_OPENCL && !ONE_CQ_PER_MB
if (cl_initialized == CL_SUCCESS)
x->block[r*4+c].cl_commands = y_cq;
#endif
x->block[r*4+c].diff = &x->diff[r * 4 * 16 + c * 4];
x->block[r*4+c].predictor = x->predictor + r * 4 * 16 + c * 4;
}
}
/* 4 U Blocks */
for (r = 0; r < 2; r++)
{
for (c = 0; c < 2; c++)
{
offset = 256 + r * 4 * 8 + c * 4;
x->block[16+r*2+c].diff_offset = offset;
x->block[16+r*2+c].predictor_offset = offset;
x->block[16+r*2+c].diff = &x->diff[256 + r * 4 * 8 + c * 4];
x->block[16+r*2+c].predictor = x->predictor + 256 + r * 4 * 8 + c * 4;
#if CONFIG_OPENCL && !ONE_CQ_PER_MB
if (cl_initialized == CL_SUCCESS)
x->block[16+r*2+c].cl_commands = u_cq;
#endif
}
}
/* 4 V Blocks */
for (r = 0; r < 2; r++)
{
for (c = 0; c < 2; c++)
{
offset = 320+ r * 4 * 8 + c * 4;
x->block[20+r*2+c].diff_offset = offset;
x->block[20+r*2+c].predictor_offset = offset;
x->block[20+r*2+c].diff = &x->diff[320+ r * 4 * 8 + c * 4];
x->block[20+r*2+c].predictor = x->predictor + 320 + r * 4 * 8 + c * 4;
#if CONFIG_OPENCL && !ONE_CQ_PER_MB
if (cl_initialized == CL_SUCCESS)
x->block[20+r*2+c].cl_commands = v_cq;
#endif
}
}
x->block[24].diff_offset = 384;
x->block[24].diff = &x->diff[384];
for (r = 0; r < 25; r++)
{
x->block[r].qcoeff_base = x->qcoeff;
x->block[r].qcoeff_offset = r * 16;
x->block[r].dqcoeff_base = x->dqcoeff;
x->block[r].dqcoeff_offset = r * 16;
x->block[r].predictor_base = x->predictor;
x->block[r].diff_base = x->diff;
x->block[r].eobs_base = x->eobs;
#if CONFIG_OPENCL
if (cl_initialized == CL_SUCCESS){
/* Copy command queue reference from macroblock */
#if ONE_CQ_PER_MB
x->block[r].cl_commands = x->cl_commands;
#endif
/* Set up CL memory buffers as appropriate */
x->block[r].cl_diff_mem = x->cl_diff_mem;
x->block[r].cl_dqcoeff_mem = x->cl_dqcoeff_mem;
x->block[r].cl_eobs_mem = x->cl_eobs_mem;
x->block[r].cl_predictor_mem = x->cl_predictor_mem;
x->block[r].cl_qcoeff_mem = x->cl_qcoeff_mem;
}
//Copy filter type to block.
x->block[r].sixtap_filter = x->sixtap_filter;
#endif
x->block[r].qcoeff = x->qcoeff + r * 16;
x->block[r].dqcoeff = x->dqcoeff + r * 16;
}
}
void vp8_build_block_doffsets(MACROBLOCKD *x)
{
/* handle the destination pitch features */
setup_macroblock(x, DEST);
setup_macroblock(x, PRED);
vp8_setup_macroblock(x, DEST);
vp8_setup_macroblock(x, PRED);
}
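
Both variants of `vp8_setup_block_dptrs` in the hunks above place the 25 blocks of a macroblock at the same positions, whether they store an offset plus a base pointer or a direct pointer. The sketch below restates that layout as one function so the arithmetic can be checked by hand. `block_offset` is a hypothetical helper, not a libvpx function.

```c
#include <assert.h>

/* Hypothetical helper: element offset of 4x4 block `idx` (0..24) inside a
 * macroblock's diff array, as computed in the diff above. The Y plane is
 * 16 elements wide; the U and V planes are 8 wide and start at 256 and 320. */
static int block_offset(int idx)
{
    assert(idx >= 0 && idx <= 24);

    if (idx < 16)   /* 16 Y blocks, four per row */
        return (idx >> 2) * 4 * 16 + (idx & 3) * 4;

    if (idx < 20)   /* 4 U blocks, two per row */
        return 256 + ((idx - 16) >> 1) * 4 * 8 + ((idx - 16) & 1) * 4;

    if (idx < 24)   /* 4 V blocks, two per row */
        return 320 + ((idx - 20) >> 1) * 4 * 8 + ((idx - 20) & 1) * 4;

    return 384;     /* block 24, the second-order DC block (diff only) */
}
```

The predictor pointers use the same offsets for blocks 0 to 23. In the code above only the diff pointer is set for block 24.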

@@ -120,6 +120,7 @@ typedef struct VP8Common
int mb_no_coeff_skip;
int no_lpf;
int simpler_lpf;
int use_bilinear_mc_filter;
int full_pixel;
int base_qindex;
@@ -139,6 +140,8 @@ typedef struct VP8Common
MODE_INFO *mip; /* Base of allocated array */
MODE_INFO *mi; /* Corresponds to upper left visible macroblock */
MODE_INFO *prev_mip; /* MODE_INFO array 'mip' from last decoded frame */
MODE_INFO *prev_mi; /* 'mi' from last frame (points into prev_mip) */
INTERPOLATIONFILTERTYPE mcomp_filter_type;
@@ -199,7 +202,7 @@ typedef struct VP8Common
} VP8_COMMON;
int vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int filter_level);
void vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int *filter_level);
void vp8_init_loop_filter(VP8_COMMON *cm);
void vp8_frame_init_loop_filter(loop_filter_info *lfi, int frame_type);
extern void vp8_loop_filter_frame(VP8_COMMON *cm, MACROBLOCKD *mbd, int filt_val);

@@ -1,233 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "../../decoder/onyxd_int.h"
#include "../../../vpx_ports/config.h"
#include "../../common/idct.h"
#include "blockd_cl.h"
#include "../../decoder/opencl/dequantize_cl.h"
int vp8_cl_mb_prep(MACROBLOCKD *x, int flags){
int err;
if (cl_initialized != CL_SUCCESS){
return cl_initialized;
}
//Copy all blockd.cl_*_mem objects
if (flags & DIFF)
VP8_CL_SET_BUF(x->cl_commands, x->cl_diff_mem, sizeof(cl_short)*400, x->diff,
,err
);
if (flags & PREDICTOR)
VP8_CL_SET_BUF(x->cl_commands, x->cl_predictor_mem, sizeof(cl_uchar)*384, x->predictor,
,err
);
if (flags & QCOEFF)
VP8_CL_SET_BUF(x->cl_commands, x->cl_qcoeff_mem, sizeof(cl_short)*400, x->qcoeff,
,err
);
if (flags & DQCOEFF)
VP8_CL_SET_BUF(x->cl_commands, x->cl_dqcoeff_mem, sizeof(cl_short)*400, x->dqcoeff,
,err
);
if (flags & EOBS)
VP8_CL_SET_BUF(x->cl_commands, x->cl_eobs_mem, sizeof(cl_char)*25, x->eobs,
,err
);
if (flags & PRE_BUF){
VP8_CL_SET_BUF(x->cl_commands, x->pre.buffer_mem, x->pre.buffer_size, x->pre.buffer_alloc,
,err
);
}
if (flags & DST_BUF){
VP8_CL_SET_BUF(x->cl_commands, x->dst.buffer_mem, x->dst.buffer_size, x->dst.buffer_alloc,
,err
);
}
return CL_SUCCESS;
}
int vp8_cl_mb_finish(MACROBLOCKD *x, int flags){
int err;
if (cl_initialized != CL_SUCCESS){
return cl_initialized;
}
if (flags & DIFF){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_diff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->diff, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & PREDICTOR){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_predictor_mem, CL_FALSE, 0, sizeof(cl_uchar)*384, x->predictor, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & QCOEFF){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_qcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->qcoeff, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & DQCOEFF){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_dqcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->dqcoeff, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & EOBS){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_eobs_mem, CL_FALSE, 0, sizeof(cl_char)*25, x->eobs, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & PRE_BUF){
err = clEnqueueReadBuffer(x->cl_commands, x->pre.buffer_mem, CL_FALSE,
0, x->pre.buffer_size, x->pre.buffer_alloc, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & DST_BUF){
err = clEnqueueReadBuffer(x->cl_commands, x->dst.buffer_mem, CL_FALSE,
0, x->dst.buffer_size, x->dst.buffer_alloc, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
return CL_SUCCESS;
}
int vp8_cl_block_prep(BLOCKD *b, int flags){
int err;
if (cl_initialized != CL_SUCCESS){
return cl_initialized;
}
//Copy all blockd.cl_*_mem objects
if (flags & DIFF)
VP8_CL_SET_BUF(b->cl_commands, b->cl_diff_mem, sizeof(cl_short)*400, b->diff_base,
,err
);
if (flags & PREDICTOR)
VP8_CL_SET_BUF(b->cl_commands, b->cl_predictor_mem, sizeof(cl_uchar)*384, b->predictor_base,
,err
);
if (flags & QCOEFF)
VP8_CL_SET_BUF(b->cl_commands, b->cl_qcoeff_mem, sizeof(cl_short)*400, b->qcoeff_base,
,err
);
if (flags & DQCOEFF)
VP8_CL_SET_BUF(b->cl_commands, b->cl_dqcoeff_mem, sizeof(cl_short)*400, b->dqcoeff_base,
,err
);
if (flags & EOBS)
VP8_CL_SET_BUF(b->cl_commands, b->cl_eobs_mem, sizeof(cl_char)*25, b->eobs_base,
,err
);
if (flags & DEQUANT)
VP8_CL_SET_BUF(b->cl_commands, b->cl_dequant_mem, sizeof(cl_short)*16 ,b->dequant,
,err
);
return CL_SUCCESS;
}
int vp8_cl_block_finish(BLOCKD *b, int flags){
int err;
if (cl_initialized != CL_SUCCESS){
return cl_initialized;
}
if (flags & DIFF){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_diff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->diff_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & PREDICTOR){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_predictor_mem, CL_FALSE, 0, sizeof(cl_uchar)*384, b->predictor_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & QCOEFF){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_qcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->qcoeff_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & DQCOEFF){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_dqcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->dqcoeff_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & EOBS){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_eobs_mem, CL_FALSE, 0, sizeof(cl_char)*25, b->eobs_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & DEQUANT){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_dequant_mem, CL_FALSE, 0, sizeof(cl_short)*16 ,b->dequant, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
return CL_SUCCESS;
}

@@ -1,64 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef BLOCKD_OPENCL_H
#define BLOCKD_OPENCL_H
#ifdef __cplusplus
extern "C" {
#endif
#include "vp8_opencl.h"
#include "../blockd.h"
#define DIFF 0x0001
#define PREDICTOR 0x0002
#define QCOEFF 0x0004
#define DQCOEFF 0x0008
#define EOBS 0x0010
#define DEQUANT 0x0020
#define PRE_BUF 0x0040
#define DST_BUF 0x0080
#define BLOCK_COPY_ALL 0xffff
/*
#define BLOCK_MEM_SIZE 6
enum {
DIFF_MEM = 0,
PRED_MEM = 1,
QCOEFF_MEM = 2,
DQCOEFF_MEM = 3,
EOBS_MEM = 4,
DEQUANT_MEM = 5
} BLOCK_MEM_TYPES;
struct cl_block_mem{
cl_mem gpu_mem;
size_t size;
void *host_mem;
};
typedef struct cl_block_mem block_mem;
*/
extern int vp8_cl_block_finish(BLOCKD *b, int flags);
extern int vp8_cl_block_prep(BLOCKD *b, int flags);
extern int vp8_cl_mb_prep(MACROBLOCKD *x, int flags);
extern int vp8_cl_mb_finish(MACROBLOCKD *x, int flags);
#ifdef __cplusplus
}
#endif
#endif
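
The flag constants in this header are single bits, so callers of the `vp8_cl_*_prep` and `vp8_cl_*_finish` functions can OR them together to select which buffers are copied. The sketch below shows that selection pattern without any OpenCL dependency. `bytes_selected` and the `F_*` names are hypothetical; the element counts are the ones passed in the removed source file, with `cl_short` assumed to be 16 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copies of the first five flag bits from the header above. */
enum {
    F_DIFF      = 0x0001,
    F_PREDICTOR = 0x0002,
    F_QCOEFF    = 0x0004,
    F_DQCOEFF   = 0x0008,
    F_EOBS      = 0x0010
};

/* Returns how many bytes a prep/finish call would transfer for `flags`,
 * testing each bit independently the way the removed functions do. */
static size_t bytes_selected(int flags)
{
    size_t total = 0;

    if (flags & F_DIFF)
        total += sizeof(int16_t) * 400;
    if (flags & F_PREDICTOR)
        total += sizeof(uint8_t) * 384;
    if (flags & F_QCOEFF)
        total += sizeof(int16_t) * 400;
    if (flags & F_DQCOEFF)
        total += sizeof(int16_t) * 400;
    if (flags & F_EOBS)
        total += sizeof(int8_t) * 25;

    return total;
}
```

Bits that the function does not test are ignored, which is why an all-ones mask such as `BLOCK_COPY_ALL` can be passed to mean every supported buffer.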

@@ -1,106 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vp8_opencl.h"
#include <stdio.h>
CL_FUNCTIONS cl;
void *dll = NULL;
int cl_loaded = VP8_CL_NOT_INITIALIZED;
int close_cl(){
int ret = dlclose(dll);
if (ret != 0)
fprintf(stderr, "Error closing OpenCL library: %s", dlerror());
return ret;
}
int load_cl(char *lib_name){
//printf("Loading OpenCL library\n");
dll = dlopen(lib_name, RTLD_NOW|RTLD_LOCAL);
if (dll != NULL){
//printf("Found CL library\n");
} else {
//printf("Didn't find CL library\n");
return VP8_CL_TRIED_BUT_FAILED;
}
CL_LOAD_FN("clGetPlatformIDs", cl.getPlatformIDs);
CL_LOAD_FN("clGetPlatformInfo", cl.getPlatformInfo);
CL_LOAD_FN("clGetDeviceIDs", cl.getDeviceIDs);
CL_LOAD_FN("clGetDeviceInfo", cl.getDeviceInfo);
CL_LOAD_FN("clCreateContext", cl.createContext);
// CL_LOAD_FN("clCreateContextFromType", cl.createContextFromType);
// CL_LOAD_FN("clRetainContext", cl.retainContext);
CL_LOAD_FN("clReleaseContext", cl.releaseContext);
// CL_LOAD_FN("clGetContextInfo", cl.getContextInfo);
CL_LOAD_FN("clCreateCommandQueue", cl.createCommandQueue);
// CL_LOAD_FN("clRetainCommandQueue", cl.retainCommandQueue);
CL_LOAD_FN("clReleaseCommandQueue", cl.releaseCommandQueue);
// CL_LOAD_FN("clGetCommandQueueInfo", cl.getCommandQueue);
CL_LOAD_FN("clCreateBuffer", cl.createBuffer);
// CL_LOAD_FN("clCreateImage2D", cl.createImage2D);
// CL_LOAD_FN("clCreateImage3D", cl.createImage3D);
// CL_LOAD_FN("clRetainMemObject", cl.retainMemObject);
CL_LOAD_FN("clReleaseMemObject", cl.releaseMemObject);
// CL_LOAD_FN("clGetSupportedImageFormats", cl.getSupportedImageFormats);
// CL_LOAD_FN("clGetMemObjectInfo", cl.getMemObjectInfo);
// CL_LOAD_FN("clGetImageInfo", cl.getImageInfo);
// CL_LOAD_FN("clCreateSampler", cl.createSampler);
// CL_LOAD_FN("clRetainSampler", cl.retainSampler);
// CL_LOAD_FN("clReleaseSampler", cl.releaseSampler);
// CL_LOAD_FN("clGetSamplerInfo", cl.getSamplerInfo);
CL_LOAD_FN("clCreateProgramWithSource", cl.createProgramWithSource);
// CL_LOAD_FN("clCreateProgramWithBinary", cl.createProgramWithBinary);
// CL_LOAD_FN("clRetainProgram", cl.retainProgram);
CL_LOAD_FN("clReleaseProgram", cl.releaseProgram);
CL_LOAD_FN("clBuildProgram", cl.buildProgram);
// CL_LOAD_FN("clUnloadCompiler", cl.unloadCompiler);
CL_LOAD_FN("clGetProgramInfo", cl.getProgramInfo);
CL_LOAD_FN("clGetProgramBuildInfo", cl.getProgramBuildInfo);
CL_LOAD_FN("clCreateKernel", cl.createKernel);
// CL_LOAD_FN("clCreateKernelsInProgram", cl.createKernelsInProgram);
// CL_LOAD_FN("clRetainKernel", cl.retainKernel);
CL_LOAD_FN("clReleaseKernel", cl.releaseKernel);
CL_LOAD_FN("clSetKernelArg", cl.setKernelArg);
// CL_LOAD_FN("clGetKernelInfo", cl.getKernelInfo);
CL_LOAD_FN("clGetKernelWorkGroupInfo", cl.getKernelWorkGroupInfo);
// CL_LOAD_FN("clWaitForEvents", cl.waitForEvents);
// CL_LOAD_FN("clGetEventInfo", cl.getEventInfo);
// CL_LOAD_FN("clRetainEvent", cl.retainEvent);
// CL_LOAD_FN("clReleaseEvent", cl.releaseEvent);
// CL_LOAD_FN("clGetEventProfilingInfo", cl.getEventProfilingInfo);
CL_LOAD_FN("clFlush", cl.flush);
CL_LOAD_FN("clFinish", cl.finish);
CL_LOAD_FN("clEnqueueReadBuffer", cl.enqueueReadBuffer);
CL_LOAD_FN("clEnqueueWriteBuffer", cl.enqueueWriteBuffer);
CL_LOAD_FN("clEnqueueCopyBuffer", cl.enqueueCopyBuffer);
// CL_LOAD_FN("clEnqueueReadImage", cl.enqueueReadImage);
// CL_LOAD_FN("clEnqueueWriteImage", cl.enqueueWriteImage);
// CL_LOAD_FN("clEnqueueCopyImage", cl.enqueueCopyImage);
// CL_LOAD_FN("clEnqueueCopyImageToBuffer", cl.enqueueCopyImageToBuffer);
// CL_LOAD_FN("clEnqueueCopyBufferToImage", cl.enqueueCopyBufferToImage);
// CL_LOAD_FN("clEnqueueMapBuffer", cl.enqueueMapBuffer);
// CL_LOAD_FN("clEnqueueMapImage", cl.enqueueMapImage);
// CL_LOAD_FN("clEnqueueUnmapMemObject", cl.enqueueUnmapMemObject);
CL_LOAD_FN("clEnqueueNDRangeKernel", cl.enqueueNDRAngeKernel);
// CL_LOAD_FN("clEnqueueTask", cl.enqueueTask);
// CL_LOAD_FN("clEnqueueNativeKernel", cl.enqueueNativeKernel);
// CL_LOAD_FN("clEnqueueMarker", cl.enqueueMarker);
// CL_LOAD_FN("clEnqueueWaitForEvents", cl.enqueueWaitForEvents);
CL_LOAD_FN("clEnqueueBarrier", cl.enqueueBarrier);
// CL_LOAD_FN("clGetExtensionFunctionAddress", cl.getExtensionFunctionAddress);
return CL_SUCCESS;
}

@@ -1,253 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef DYNAMIC_CL_H
#define DYNAMIC_CL_H
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif
#include <dlfcn.h>
int load_cl(char *lib_name);
int close_cl();
extern int cl_loaded;
typedef cl_int(*fn_clGetPlatformIDs_t)(cl_uint, cl_platform_id *, cl_uint *);
typedef cl_int(*fn_clGetPlatformInfo_t)(cl_platform_id, cl_platform_info, size_t, void *, size_t *);
typedef cl_int(*fn_clGetDeviceIDs_t)(cl_platform_id, cl_device_type, cl_uint, cl_device_id *, cl_uint *);
typedef cl_int(*fn_clGetDeviceInfo_t)(cl_device_id, cl_device_info, size_t, void *, size_t *);
typedef cl_context(*fn_clCreateContext_t)(const cl_context_properties *, cl_uint, const cl_device_id *, void (*pfn_notify)(const char *, const void *, size_t, void *), void *, cl_int *);
typedef cl_context(*fn_clCreateContextFromType_t)(const cl_context_properties *, cl_device_type, void (*pfn_notify)(const char *, const void *, size_t, void *), void *, cl_int *);
typedef cl_int(*fn_clRetainContext_t)(cl_context);
typedef cl_int(*fn_clReleaseContext_t)(cl_context);
typedef cl_int(*fn_clGetContextInfo_t)(cl_context, cl_context_info, size_t, void *, size_t *);
typedef cl_command_queue(*fn_clCreateCommandQueue_t)(cl_context, cl_device_id, cl_command_queue_properties, cl_int *);
typedef cl_int(*fn_clRetainCommandQueue_t)(cl_command_queue);
typedef cl_int(*fn_clReleaseCommandQueue_t)(cl_command_queue);
typedef cl_int(*fn_clGetCommandQueueInfo_t)(cl_command_queue, cl_command_queue_info, size_t, void *, size_t *);
typedef cl_mem(*fn_clCreateBuffer_t)(cl_context, cl_mem_flags, size_t, void *, cl_int *);
typedef cl_mem(*fn_clCreateImage2D_t)(cl_context, cl_mem_flags, const cl_image_format *, size_t, size_t, size_t, void *, cl_int *);
typedef cl_mem(*fn_clCreateImage3D_t)(cl_context, cl_mem_flags, const cl_image_format *, size_t, size_t, size_t, size_t, size_t, void *, cl_int *);
typedef cl_int(*fn_clRetainMemObject_t)(cl_mem);
typedef cl_int(*fn_clReleaseMemObject_t)(cl_mem);
typedef cl_int(*fn_clGetSupportedImageFormats_t)(cl_context, cl_mem_flags, cl_mem_object_type, cl_uint, cl_image_format *, cl_uint *);
typedef cl_int(*fn_clGetMemObjectInfo_t)(cl_mem, cl_mem_info, size_t, void *, size_t *);
typedef cl_int(*fn_clGetImageInfo_t)(cl_mem, cl_image_info, size_t, void *, size_t *);
typedef cl_sampler(*fn_clCreateSampler_t)(cl_context, cl_bool, cl_addressing_mode, cl_filter_mode, cl_int *);
typedef cl_int(*fn_clRetainSampler_t)(cl_sampler);
typedef cl_int(*fn_clReleaseSampler_t)(cl_sampler);
typedef cl_int(*fn_clGetSamplerInfo_t)(cl_sampler, cl_sampler_info, size_t, void *, size_t *);
typedef cl_program(*fn_clCreateProgramWithSource_t)(cl_context, cl_uint, const char **, const size_t *, cl_int *);
typedef cl_program(*fn_clCreateProgramWithBinary_t)(cl_context, cl_uint, const cl_device_id *, const size_t *, const unsigned char **, cl_int *, cl_int *);
typedef cl_int(*fn_clRetainProgram_t)(cl_program);
typedef cl_int(*fn_clReleaseProgram_t)(cl_program);
typedef cl_int(*fn_clBuildProgram_t)(cl_program, cl_uint, const cl_device_id *, const char *, void (*pfn_notify)(cl_program,void*), void *);
typedef cl_int(*fn_clUnloadCompiler_t)(void);
typedef cl_int(*fn_clGetProgramInfo_t)(cl_program, cl_program_info, size_t, void *, size_t *);
typedef cl_int(*fn_clGetProgramBuildInfo_t)(cl_program, cl_device_id, cl_program_build_info, size_t, void *, size_t *);
typedef cl_kernel(*fn_clCreateKernel_t)(cl_program, const char *, cl_int *);
typedef cl_int(*fn_clCreateKernelsInProgram_t)(cl_program, cl_uint, cl_kernel *, cl_uint *);
typedef cl_int(*fn_clRetainKernel_t)(cl_kernel);
typedef cl_int(*fn_clReleaseKernel_t)(cl_kernel);
typedef cl_int(*fn_clSetKernelArg_t)(cl_kernel, cl_uint, size_t, const void *);
typedef cl_int(*fn_clGetKernelInfo_t)(cl_kernel, cl_kernel_info, size_t, void *, size_t *);
typedef cl_int(*fn_clGetKernelWorkGroupInfo_t)(cl_kernel, cl_device_id, cl_kernel_work_group_info, size_t, void *, size_t *);
typedef cl_int(*fn_clWaitForEvents_t)(cl_uint, const cl_event *);
typedef cl_int(*fn_clGetEventInfo_t)(cl_event, cl_event_info, size_t, void *, size_t *);
typedef cl_int(*fn_clRetainEvent_t)(cl_event);
typedef cl_int(*fn_clReleaseEvent_t)(cl_event);
typedef cl_int(*fn_clGetEventProfilingInfo_t)(cl_event, cl_profiling_info, size_t, void *, size_t *);
typedef cl_int(*fn_clFlush_t)(cl_command_queue);
typedef cl_int(*fn_clFinish_t)(cl_command_queue);
typedef cl_int(*fn_clEnqueueReadBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueWriteBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueCopyBuffer_t)(cl_command_queue, cl_mem, cl_mem, size_t, size_t, size_t, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueReadImage_t)(cl_command_queue, cl_mem, cl_bool, const size_t *, const size_t *, size_t, size_t, void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueWriteImage_t)(cl_command_queue, cl_mem, cl_bool, const size_t *, const size_t *, size_t, size_t, const void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueCopyImage_t)(cl_command_queue, cl_mem, cl_mem, const size_t *, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueCopyImageToBuffer_t)(cl_command_queue, cl_mem, cl_mem, const size_t *, const size_t *, size_t, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueCopyBufferToImage_t)(cl_command_queue, cl_mem, cl_mem, size_t, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
typedef void*(*fn_clEnqueueMapBuffer_t)(cl_command_queue, cl_mem, cl_bool, cl_map_flags, size_t, size_t, cl_uint, const cl_event *, cl_event *, cl_int *);
typedef void*(*fn_clEnqueueMapImage_t)(cl_command_queue, cl_mem, cl_bool, cl_map_flags, const size_t *, const size_t *, size_t *, size_t *, cl_uint, const cl_event *, cl_event *, cl_int *);
typedef cl_int(*fn_clEnqueueUnmapMemObject_t)(cl_command_queue, cl_mem, void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueNDRangeKernel_t)(cl_command_queue, cl_kernel, cl_uint, const size_t *, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueTask_t)(cl_command_queue, cl_kernel, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueNativeKernel_t)(cl_command_queue, void (*user_func)(void *), void *, size_t, cl_uint, const cl_mem *, const void **, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueMarker_t)(cl_command_queue, cl_event *);
typedef cl_int(*fn_clEnqueueWaitForEvents_t)(cl_command_queue, cl_uint, const cl_event *);
typedef cl_int(*fn_clEnqueueBarrier_t)(cl_command_queue);
typedef void*(*fn_clGetExtensionFunctionAddress_t)(const char *);
typedef struct CL_FUNCTIONS {
fn_clGetPlatformIDs_t getPlatformIDs;
fn_clGetPlatformInfo_t getPlatformInfo;
fn_clGetDeviceIDs_t getDeviceIDs;
fn_clGetDeviceInfo_t getDeviceInfo;
fn_clCreateContext_t createContext;
fn_clCreateContextFromType_t createContextFromType;
fn_clRetainContext_t retainContext;
fn_clReleaseContext_t releaseContext;
fn_clGetContextInfo_t getContextInfo;
fn_clCreateCommandQueue_t createCommandQueue;
fn_clRetainCommandQueue_t retainCommandQueue;
fn_clReleaseCommandQueue_t releaseCommandQueue;
fn_clGetCommandQueueInfo_t getCommandQueue;
fn_clCreateBuffer_t createBuffer;
fn_clCreateImage2D_t createImage2D;
fn_clCreateImage3D_t createImage3D;
fn_clRetainMemObject_t retainMemObject;
fn_clReleaseMemObject_t releaseMemObject;
fn_clGetSupportedImageFormats_t getSupportedImageFormats;
fn_clGetMemObjectInfo_t getMemObjectInfo;
fn_clGetImageInfo_t getImageInfo;
fn_clCreateSampler_t createSampler;
fn_clRetainSampler_t retainSampler;
fn_clReleaseSampler_t releaseSampler;
fn_clGetSamplerInfo_t getSamplerInfo;
fn_clCreateProgramWithSource_t createProgramWithSource;
fn_clCreateProgramWithBinary_t createProgramWithBinary;
fn_clRetainProgram_t retainProgram;
fn_clReleaseProgram_t releaseProgram;
fn_clBuildProgram_t buildProgram;
fn_clUnloadCompiler_t unloadCompiler;
fn_clGetProgramInfo_t getProgramInfo;
fn_clGetProgramBuildInfo_t getProgramBuildInfo;
fn_clCreateKernel_t createKernel;
fn_clCreateKernelsInProgram_t createKernelsInProgram;
fn_clRetainKernel_t retainKernel;
fn_clReleaseKernel_t releaseKernel;
fn_clSetKernelArg_t setKernelArg;
fn_clGetKernelInfo_t getKernelInfo;
fn_clGetKernelWorkGroupInfo_t getKernelWorkGroupInfo;
fn_clWaitForEvents_t waitForEvents;
fn_clGetEventInfo_t getEventInfo;
fn_clRetainEvent_t retainEvent;
fn_clReleaseEvent_t releaseEvent;
fn_clGetEventProfilingInfo_t getEventProfilingInfo;
fn_clFlush_t flush;
fn_clFinish_t finish;
fn_clEnqueueReadBuffer_t enqueueReadBuffer;
fn_clEnqueueWriteBuffer_t enqueueWriteBuffer;
fn_clEnqueueCopyBuffer_t enqueueCopyBuffer;
fn_clEnqueueReadImage_t enqueueReadImage;
fn_clEnqueueWriteImage_t enqueueWriteImage;
fn_clEnqueueCopyImage_t enqueueCopyImage;
fn_clEnqueueCopyImageToBuffer_t enqueueCopyImageToBuffer;
fn_clEnqueueCopyBufferToImage_t enqueueCopyBufferToImage;
fn_clEnqueueMapBuffer_t enqueueMapBuffer;
fn_clEnqueueMapImage_t enqueueMapImage;
fn_clEnqueueUnmapMemObject_t enqueueUnmapMemObject;
fn_clEnqueueNDRangeKernel_t enqueueNDRAngeKernel;
fn_clEnqueueTask_t enqueueTask;
fn_clEnqueueNativeKernel_t enqueueNativeKernel;
fn_clEnqueueMarker_t enqueueMarker;
fn_clEnqueueWaitForEvents_t enqueueWaitForEvents;
fn_clEnqueueBarrier_t enqueueBarrier;
fn_clGetExtensionFunctionAddress_t getExtensionFunctionAddress;
} CL_FUNCTIONS;
extern CL_FUNCTIONS cl;
#define clGetPlatformIDs cl.getPlatformIDs
#define clGetPlatformInfo cl.getPlatformInfo
#define clGetDeviceIDs cl.getDeviceIDs
#define clGetDeviceInfo cl.getDeviceInfo
#define clCreateContext cl.createContext
#define clCreateContextFromType cl.createContextFromType
#define clRetainContext cl.retainContext
#define clReleaseContext cl.releaseContext
#define clGetContextInfo cl.getContextInfo
#define clCreateCommandQueue cl.createCommandQueue
#define clRetainCommandQueue cl.retainCommandQueue
#define clReleaseCommandQueue cl.releaseCommandQueue
#define clGetCommandQueueInfo cl.getCommandQueue
#define clCreateBuffer cl.createBuffer
#define clCreateSubBuffer cl.createSubBuffer
#define clCreateImage2D cl.createImage2D
#define clCreateImage3D cl.createImage3D
#define clRetainMemObject cl.retainMemObject
#define clReleaseMemObject cl.releaseMemObject
#define clGetSupportedImageFormats cl.getSupportedImageFormats
#define clGetMemObjectInfo cl.getMemObjectInfo
#define clGetImageInfo cl.getImageInfo
#define clSetMemObjectDestructorCallback cl.setMemObjectDestructorCallback
#define clCreateSampler cl.createSampler
#define clRetainSampler cl.retainSampler
#define clReleaseSampler cl.releaseSampler
#define clGetSamplerInfo cl.getSamplerInfo
#define clCreateProgramWithSource cl.createProgramWithSource
#define clCreateProgramWithBinary cl.createProgramWithBinary
#define clRetainProgram cl.retainProgram
#define clReleaseProgram cl.releaseProgram
#define clBuildProgram cl.buildProgram
#define clUnloadCompiler cl.unloadCompiler
#define clGetProgramInfo cl.getProgramInfo
#define clGetProgramBuildInfo cl.getProgramBuildInfo
#define clCreateKernel cl.createKernel
#define clCreateKernelsInProgram cl.createKernelsInProgram
#define clRetainKernel cl.retainKernel
#define clReleaseKernel cl.releaseKernel
#define clSetKernelArg cl.setKernelArg
#define clGetKernelInfo cl.getKernelInfo
#define clGetKernelWorkGroupInfo cl.getKernelWorkGroupInfo
#define clWaitForEvents cl.waitForEvents
#define clGetEventInfo cl.getEventInfo
#define clCreateUserEvent cl.createUserEvent
#define clRetainEvent cl.retainEvent
#define clReleaseEvent cl.releaseEvent
#define clSetUserEventStatus cl.setUserEventStatus
#define clSetEventCallback cl.setEventCallback
#define clGetEventProfilingInfo cl.getEventProfilingInfo
#define clFlush cl.flush
#define clFinish cl.finish
#define clEnqueueReadBuffer cl.enqueueReadBuffer
#define clEnqueueReadBufferRect cl.enqueueReadBufferRect
#define clEnqueueWriteBuffer cl.enqueueWriteBuffer
#define clEnqueueWriteBufferRect cl.enqueueWriteBufferRect
#define clEnqueueCopyBuffer cl.enqueueCopyBuffer
#define clEnqueueCopyBufferRect cl.enqueueCopyBufferRect
#define clEnqueueReadImage cl.enqueueReadImage
#define clEnqueueWriteImage cl.enqueueWriteImage
#define clEnqueueCopyImage cl.enqueueCopyImage
#define clEnqueueCopyImageToBuffer cl.enqueueCopyImageToBuffer
#define clEnqueueCopyBufferToImage cl.enqueueCopyBufferToImage
#define clEnqueueMapBuffer cl.enqueueMapBuffer
#define clEnqueueMapImage cl.enqueueMapImage
#define clEnqueueUnmapMemObject cl.enqueueUnmapMemObject
#define clEnqueueNDRangeKernel cl.enqueueNDRAngeKernel
#define clEnqueueTask cl.enqueueTask
#define clEnqueueNativeKernel cl.enqueueNativeKernel
#define clEnqueueMarker cl.enqueueMarker
#define clEnqueueWaitForEvents cl.enqueueWaitForEvents
#define clEnqueueBarrier cl.enqueueBarrier
#define clGetExtensionFunctionAddress cl.getExtensionFunctionAddress
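/* Resolve one symbol from the library handle `dll`, which must be a variable
 * in scope at the call site. On failure, close the library and return
 * CL_INVALID_PLATFORM from the calling function. Wrapped in do/while(0) so
 * the macro behaves as a single statement. */
#define CL_LOAD_FN(name, ref) \
do { \
ref = dlsym(dll,name); \
if (ref == NULL){ \
dlclose(dll); \
return CL_INVALID_PLATFORM; \
} \
} while (0)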
#ifdef __cplusplus
}
#endif
#endif /* DYNAMIC_CL_H */
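
CL_LOAD_FN performs an assignment followed by a check. The sketch below shows why such multi-statement macros are conventionally wrapped in `do { ... } while (0)`: without the wrapper, an unbraced `if` guards only the first statement. The macro names `LOAD_UNWRAPPED` and `LOAD_WRAPPED` and the helper functions are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Unwrapped: expands to two independent statements. */
#define LOAD_UNWRAPPED(ref, value, failures) \
    ref = (value);                           \
    if ((ref) == NULL) { (failures)++; }

/* Wrapped: expands to exactly one statement. */
#define LOAD_WRAPPED(ref, value, failures)   \
    do {                                     \
        ref = (value);                       \
        if ((ref) == NULL) { (failures)++; } \
    } while (0)

/* With enabled == 0 the assignment is skipped, but the NULL check still
 * runs because it is not governed by the if. */
static int count_failures_unwrapped(int enabled)
{
    const char *ref = NULL;
    int failures = 0;
    if (enabled)
        LOAD_UNWRAPPED(ref, "symbol", failures);
    return failures;
}

/* With enabled == 0 the whole macro body is skipped. */
static int count_failures_wrapped(int enabled)
{
    const char *ref = NULL;
    int failures = 0;
    if (enabled)
        LOAD_WRAPPED(ref, "symbol", failures);
    return failures;
}

/* Small self-check that exercises both variants. */
static void demo_macro_hygiene(void)
{
    assert(count_failures_unwrapped(0) == 1); /* spurious failure */
    assert(count_failures_wrapped(0) == 0);   /* nothing executed */
    assert(count_failures_unwrapped(1) == 0);
    assert(count_failures_wrapped(1) == 0);
}
```

The two variants agree when the guard is true and differ only when it is false, which is what makes this class of bug easy to miss.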

@@ -1,824 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
//ACW: Remove me after debugging.
#include <stdio.h>
#include <string.h>
#include "vp8_opencl.h"
#include "filter_cl.h"
#include "../blockd.h"
#define SIXTAP_FILTER_LEN 6
const char *filterCompileOptions = "-Ivp8/common/opencl -DVP8_FILTER_WEIGHT=128 -DVP8_FILTER_SHIFT=7 -DFILTER_OFFSET";
const char *filter_cl_file_name = "vp8/common/opencl/filter_cl.cl";
#define STATIC_MEM 1
#if STATIC_MEM
static cl_mem int_mem = NULL;
#endif
void cl_destroy_filter(){
if (cl_data.filter_program)
clReleaseProgram(cl_data.filter_program);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_block_variation_kernel);
#if !TWO_PASS_SIXTAP
VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict8x8_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict8x4_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict16x16_kernel);
#else
VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_first_pass_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_second_pass_kernel);
#endif
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict4x4_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict8x4_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict8x8_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict16x16_kernel);
#if MEM_COPY_KERNEL
VP8_CL_RELEASE_KERNEL(cl_data.vp8_memcpy_kernel);
#endif
VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_bil_first_pass_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_bil_second_pass_kernel);
#if STATIC_MEM
if (int_mem != NULL)
clReleaseMemObject(int_mem);
int_mem = NULL;
#endif
cl_data.filter_program = NULL;
}
int cl_init_filter() {
int err;
// Create the filter compute program from the file-defined source code
if ( cl_load_program(&cl_data.filter_program, filter_cl_file_name,
filterCompileOptions) != CL_SUCCESS )
return VP8_CL_TRIED_BUT_FAILED;
// Create the compute kernel in the program we wish to run
#if TWO_PASS_SIXTAP
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_first_pass_kernel,"vp8_filter_block2d_first_pass_kernel");
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_second_pass_kernel,"vp8_filter_block2d_second_pass_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_first_pass_kernel,vp8_filter_block2d_first_pass_kernel_size);
VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_second_pass_kernel,vp8_filter_block2d_second_pass_kernel_size);
#else
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict_kernel,"vp8_sixtap_predict_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict_kernel,vp8_sixtap_predict_kernel_size);
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict8x8_kernel,"vp8_sixtap_predict8x8_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict8x8_kernel,vp8_sixtap_predict8x8_kernel_size);
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict8x4_kernel,"vp8_sixtap_predict8x4_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict8x4_kernel,vp8_sixtap_predict8x4_kernel_size);
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict16x16_kernel,"vp8_sixtap_predict16x16_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict16x16_kernel,vp8_sixtap_predict16x16_kernel_size);
#endif
//VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_bil_first_pass_kernel,vp8_filter_block2d_bil_first_pass_kernel_size);
//VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_bil_second_pass_kernel,vp8_filter_block2d_bil_second_pass_kernel_size);
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_bil_first_pass_kernel,"vp8_filter_block2d_bil_first_pass_kernel");
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_bil_second_pass_kernel,"vp8_filter_block2d_bil_second_pass_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict4x4_kernel,"vp8_bilinear_predict4x4_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict8x4_kernel,"vp8_bilinear_predict8x4_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict8x8_kernel,"vp8_bilinear_predict8x8_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict16x16_kernel,"vp8_bilinear_predict16x16_kernel");
#if MEM_COPY_KERNEL
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_memcpy_kernel,"vp8_memcpy_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_memcpy_kernel,vp8_memcpy_kernel_size);
#endif
#if STATIC_MEM
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,err);
#endif
return CL_SUCCESS;
}
void vp8_filter_block2d_first_pass_cl(
cl_command_queue cq,
cl_mem src_mem,
int src_offset,
cl_mem int_mem,
unsigned int src_pixels_per_line,
unsigned int int_height,
unsigned int int_width,
int xoffset
){
int err;
size_t global = int_width*int_height;
size_t local = cl_data.vp8_filter_block2d_first_pass_kernel_size;
if (local > global)
local = global;
err = clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 1, sizeof (int), &src_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 2, sizeof (cl_mem), &int_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 3, sizeof (cl_uint), &src_pixels_per_line);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 4, sizeof (cl_uint), &int_height);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 5, sizeof (cl_int), &int_width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 6, sizeof (int), &xoffset);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_first_pass_kernel, 1, NULL, &global, &local , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
}
void vp8_filter_block2d_second_pass_cl(
cl_command_queue cq,
cl_mem int_mem,
int int_offset,
cl_mem dst_mem,
int dst_offset,
int dst_pitch,
unsigned int output_height,
unsigned int output_width,
int yoffset
){
int err;
size_t global = output_width*output_height;
size_t local = cl_data.vp8_filter_block2d_second_pass_kernel_size;
if (local > global){
//printf("Local is now %ld\n",global);
local = global;
}
/* Set kernel arguments */
err = clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 0, sizeof (cl_mem), &int_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 1, sizeof (int), &int_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 2, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 3, sizeof (int), &dst_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 4, sizeof (int), &dst_pitch);
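/* Args 5 and 6 both receive output_width. They appear to correspond to the
 * pixels_per_line and pixel_step parameters of the C reference second-pass
 * filter, so the repetition looks intentional. */
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 5, sizeof (int), &output_width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 6, sizeof (int), &output_width);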
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 7, sizeof (int), &output_height);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 8, sizeof (int), &output_width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 9, sizeof (int), &yoffset);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_second_pass_kernel, 1, NULL, &global, &local , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
}
void vp8_sixtap_single_pass(
cl_command_queue cq,
cl_kernel kernel,
size_t local,
size_t global,
cl_mem src_mem,
cl_mem dst_mem,
unsigned char *src_base,
int src_offset,
size_t src_len,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
int dst_offset,
int dst_pitch,
size_t dst_len
){
int err;
#if !STATIC_MEM
cl_mem int_mem;
#endif
int free_src = 0, free_dst = 0;
if (local > global){
local = global;
}
/* Make space for kernel input/output data.
* Initialize the buffer as well if needed.
*/
if (src_mem == NULL){
VP8_CL_CREATE_BUF( cq, src_mem,, sizeof (unsigned char) * src_len, src_base-2,,);
src_offset = 2;
free_src = 1;
} else {
src_offset -= 2*src_pixels_per_line;
}
if (dst_mem == NULL){
VP8_CL_CREATE_BUF( cq, dst_mem,, sizeof (unsigned char) * dst_len + dst_offset, dst_base,, );
free_dst = 1;
}
/* No intermediate buffer is created here: the single-pass kernel only takes
 * src_mem and dst_mem as buffer arguments. */
err = clSetKernelArg(kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(kernel, 1, sizeof (int), &src_offset);
err |= clSetKernelArg(kernel, 2, sizeof (cl_int), &src_pixels_per_line);
err |= clSetKernelArg(kernel, 3, sizeof (cl_int), &xoffset);
err |= clSetKernelArg(kernel, 4, sizeof (cl_int), &yoffset);
err |= clSetKernelArg(kernel, 5, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(kernel, 6, sizeof (cl_int), &dst_offset);
err |= clSetKernelArg(kernel, 7, sizeof (int), &dst_pitch);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, kernel, 1, NULL, &global, &local , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
if (free_src == 1)
clReleaseMemObject(src_mem);
if (free_dst == 1){
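/* Read back the result data from the device. The read is non-blocking
 * (CL_FALSE), so dst_base is only valid after the queue has been
 * synchronized, e.g. by a later clFinish(). */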
err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
,
);
clReleaseMemObject(dst_mem);
}
}
void vp8_sixtap_run_cl(
cl_command_queue cq,
cl_mem src_mem,
cl_mem dst_mem,
unsigned char *src_base,
int src_offset,
size_t src_len,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
int dst_offset,
int dst_pitch,
size_t dst_len,
unsigned int FData_height,
unsigned int FData_width,
unsigned int output_height,
unsigned int output_width,
int int_offset
)
{
int err;
#if !STATIC_MEM
cl_mem int_mem;
#endif
int free_src = 0, free_dst = 0;
/* Make space for kernel input/output data.
* Initialize the buffer as well if needed.
*/
if (src_mem == NULL){
VP8_CL_CREATE_BUF( cq, src_mem,, sizeof (unsigned char) * src_len, src_base-2,,);
src_offset = 2;
free_src = 1;
} else {
src_offset -= 2*src_pixels_per_line;
}
if (dst_mem == NULL){
VP8_CL_CREATE_BUF( cq, dst_mem,, sizeof (unsigned char) * dst_len + dst_offset, dst_base,, );
free_dst = 1;
}
#if !STATIC_MEM
VP8_CL_CREATE_BUF( cq, int_mem,, sizeof(cl_int)*FData_height*FData_width, NULL,, );
#endif
vp8_filter_block2d_first_pass_cl(
cq, src_mem, src_offset, int_mem, src_pixels_per_line,
FData_height, FData_width, xoffset
);
vp8_filter_block2d_second_pass_cl(cq,int_mem,int_offset,dst_mem,dst_offset,dst_pitch,
output_height,output_width,yoffset);
if (free_src == 1)
clReleaseMemObject(src_mem);
if (free_dst == 1){
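/* Read back the result data from the device. The read is non-blocking
 * (CL_FALSE), so dst_base is only valid after the queue has been
 * synchronized, e.g. by a later clFinish(). */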
err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
,
);
clReleaseMemObject(dst_mem);
}
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
void vp8_sixtap_predict4x4_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
int output_width=4, output_height=4, FData_height=9, FData_width=4;
//Size of output to transfer
int dst_len = DST_LEN(dst_pitch,output_height,output_width);
int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
#if TWO_PASS_SIXTAP
int int_offset = 8;
unsigned char *src_ptr = src_base + src_offset;
vp8_sixtap_run_cl(cq, src_mem, dst_mem,
(src_ptr-2*src_pixels_per_line),src_offset, src_len,
src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
dst_pitch,dst_len,FData_height,FData_width,output_height,
output_width,int_offset
);
#else
vp8_sixtap_single_pass(
cq,
cl_data.vp8_sixtap_predict_kernel,
cl_data.vp8_sixtap_predict_kernel_size,
FData_height*FData_width,
src_mem,
dst_mem,
src_base,
src_offset,
src_len,
src_pixels_per_line,
xoffset,
yoffset,
dst_base,
dst_offset,
dst_pitch,
dst_len
);
#endif
return;
}
void vp8_sixtap_predict8x8_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
int output_width=8, output_height=8, FData_height=13, FData_width=8;
//Size of output to transfer
int dst_len = DST_LEN(dst_pitch,output_height,output_width);
int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
#if TWO_PASS_SIXTAP
int int_offset = 16;
unsigned char *src_ptr = src_base + src_offset;
vp8_sixtap_run_cl(cq, src_mem, dst_mem,
(src_ptr-2*src_pixels_per_line),src_offset, src_len,
src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
dst_pitch,dst_len,FData_height,FData_width,output_height,
output_width,int_offset
);
#else
vp8_sixtap_single_pass(
cq,
cl_data.vp8_sixtap_predict8x8_kernel,
cl_data.vp8_sixtap_predict8x8_kernel_size,
FData_height*FData_width,
src_mem,
dst_mem,
src_base,
src_offset,
src_len,
src_pixels_per_line,
xoffset,
yoffset,
dst_base,
dst_offset,
dst_pitch,
dst_len
);
#endif
return;
}
void vp8_sixtap_predict8x4_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
int output_width=8, output_height=4, FData_height=9, FData_width=8;
//Size of output to transfer
int dst_len = DST_LEN(dst_pitch,output_height,output_width);
int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
#if TWO_PASS_SIXTAP
int int_offset = 16;
unsigned char *src_ptr = src_base + src_offset;
vp8_sixtap_run_cl(cq, src_mem, dst_mem,
(src_ptr-2*src_pixels_per_line),src_offset, src_len,
src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
dst_pitch,dst_len,FData_height,FData_width,output_height,
output_width,int_offset
);
#else
vp8_sixtap_single_pass(
cq,
cl_data.vp8_sixtap_predict8x4_kernel,
cl_data.vp8_sixtap_predict8x4_kernel_size,
FData_height*FData_width,
src_mem,
dst_mem,
src_base,
src_offset,
src_len,
src_pixels_per_line,
xoffset,
yoffset,
dst_base,
dst_offset,
dst_pitch,
dst_len
);
#endif
return;
}
void vp8_sixtap_predict16x16_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
int output_width=16, output_height=16, FData_height=21, FData_width=16;
//Size of output to transfer
int dst_len = DST_LEN(dst_pitch,output_height,output_width);
int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
#if TWO_PASS_SIXTAP
int int_offset = 32;
unsigned char *src_ptr = src_base + src_offset;
vp8_sixtap_run_cl(cq, src_mem, dst_mem,
(src_ptr-2*src_pixels_per_line),src_offset, src_len,
src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
dst_pitch,dst_len,FData_height,FData_width,output_height,
output_width,int_offset
);
#else
vp8_sixtap_single_pass(
cq,
cl_data.vp8_sixtap_predict16x16_kernel,
cl_data.vp8_sixtap_predict16x16_kernel_size,
FData_height*FData_width,
src_mem,
dst_mem,
src_base,
src_offset,
src_len,
src_pixels_per_line,
xoffset,
yoffset,
dst_base,
dst_offset,
dst_pitch,
dst_len
);
#endif
return;
}
void vp8_filter_block2d_bil_first_pass_cl(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
cl_mem int_mem,
int src_pixels_per_line,
int height,
int width,
int xoffset
)
{
int err;
size_t global = width*height;
int free_src = 0;
if (src_mem == NULL){
int src_len = BIL_SRC_LEN(width,height,src_pixels_per_line);
/*Make space for kernel input/output data. Initialize the buffer as well if needed. */
VP8_CL_CREATE_BUF(cq, src_mem, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
sizeof (unsigned char) * src_len, src_base+src_offset,,
);
src_offset = 0; //Set to zero as long as src_mem starts at base+offset
free_src = 1;
}
err = clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 1, sizeof (int), &src_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 2, sizeof (cl_mem), &int_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 3, sizeof (int), &src_pixels_per_line);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 4, sizeof (int), &height);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 5, sizeof (int), &width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 6, sizeof (int), &xoffset);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_bil_first_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
if (free_src == 1)
clReleaseMemObject(src_mem);
}
void vp8_filter_block2d_bil_second_pass_cl(
cl_command_queue cq,
cl_mem int_mem,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch,
int height,
int width,
int yoffset
)
{
int err;
size_t global = width*height;
//Size of output data
int dst_len = DST_LEN(dst_pitch,height,width);
int free_dst = 0;
if (dst_mem == NULL){
VP8_CL_CREATE_BUF(cq, dst_mem, CL_MEM_WRITE_ONLY|CL_MEM_COPY_HOST_PTR,
sizeof (unsigned char) * dst_len + dst_offset, dst_base,,
);
free_dst = 1;
}
err = clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 0, sizeof (cl_mem), &int_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 1, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 2, sizeof (int), &dst_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 3, sizeof (int), &dst_pitch);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 4, sizeof (int), &height);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 5, sizeof (int), &width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 6, sizeof (int), &yoffset);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_bil_second_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
if (free_dst == 1){
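/* Read back the result data from the device. The read is non-blocking
 * (CL_FALSE), so dst_base is only valid after the queue has been
 * synchronized, e.g. by a later clFinish(). */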
err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
,
);
clReleaseMemObject(dst_mem);
}
}
void vp8_bilinear_predict4x4_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
const int height = 4, width = 4;
#if !STATIC_MEM
int err;
cl_mem int_mem = NULL;
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
#endif
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
void vp8_bilinear_predict8x8_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
const int height = 8, width = 8;
#if !STATIC_MEM
int err;
cl_mem int_mem = NULL;
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
#endif
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
void vp8_bilinear_predict8x4_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
const int height = 4, width = 8;
#if !STATIC_MEM
int err;
cl_mem int_mem = NULL;
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
#endif
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
void vp8_bilinear_predict16x16_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
const int height = 16, width = 16;
#if !STATIC_MEM
int err;
cl_mem int_mem = NULL;
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
#endif
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
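The four `vp8_bilinear_predict*_cl` wrappers above share one shape: a horizontal pass that produces `height + 1` intermediate rows, then a vertical pass that blends each row with the one below it. A minimal CPU sketch of that two-pass scheme follows, assuming the tap table, weight (128) and shift (7) shown elsewhere in this diff. The name `bilinear_predict_ref` and its signature are hypothetical and not part of the removed sources.

```c
#include <assert.h>

#define REF_FILTER_WEIGHT 128
#define REF_FILTER_SHIFT 7

/* Same values as the bilinear_filters table in the kernel source. */
static const int ref_bil_taps[8][2] = {
    {128, 0}, {112, 16}, {96, 32}, {80, 48},
    {64, 64}, {48, 80}, {32, 96}, {16, 112}
};

/* Hypothetical CPU reference for a width x height block (at most 16x16).
 * The source must provide height + 1 rows and width + 1 columns, because
 * each pass reads one sample beyond the block. */
void bilinear_predict_ref(const unsigned char *src, int src_stride,
                          int xoffset, int yoffset,
                          unsigned char *dst, int dst_pitch,
                          int width, int height)
{
    int tmp[17 * 16]; /* intermediate rows: (height + 1) * width */
    int r, c;

    assert(width > 0 && width <= 16 && height > 0 && height <= 16);
    assert(xoffset >= 0 && xoffset < 8 && yoffset >= 0 && yoffset < 8);

    /* First pass: filter horizontally, producing height + 1 rows. */
    for (r = 0; r < height + 1; r++) {
        for (c = 0; c < width; c++) {
            const unsigned char *p = src + r * src_stride + c;
            tmp[r * width + c] = (p[0] * ref_bil_taps[xoffset][0] +
                                  p[1] * ref_bil_taps[xoffset][1] +
                                  REF_FILTER_WEIGHT / 2) >> REF_FILTER_SHIFT;
        }
    }

    /* Second pass: blend each intermediate row with the row below it. */
    for (r = 0; r < height; r++) {
        for (c = 0; c < width; c++) {
            int a = tmp[r * width + c];
            int b = tmp[(r + 1) * width + c];
            dst[r * dst_pitch + c] =
                (unsigned char)((a * ref_bil_taps[yoffset][0] +
                                 b * ref_bil_taps[yoffset][1] +
                                 REF_FILTER_WEIGHT / 2) >> REF_FILTER_SHIFT);
        }
    }
}
```

With both offsets at 0 the taps are `{128, 0}`, so the sketch reduces to a plain block copy, which makes it a convenient sanity check against a device result.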

@@ -1,562 +0,0 @@
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#pragma OPENCL EXTENSION cl_amd_printf : enable
__constant int bilinear_filters[8][2] = {
{ 128, 0},
{ 112, 16},
{ 96, 32},
{ 80, 48},
{ 64, 64},
{ 48, 80},
{ 32, 96},
{ 16, 112}
};
__constant short sub_pel_filters[8][8] = {
//These were originally 8x6, but are padded for vector ops
{ 0, 0, 128, 0, 0, 0, 0, 0}, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
{ 0, -6, 123, 12, -1, 0, 0, 0},
{ 2, -11, 108, 36, -8, 1, 0, 0}, /* New 1/4 pel 6 tap filter */
{ 0, -9, 93, 50, -6, 0, 0, 0},
{ 3, -16, 77, 77, -16, 3, 0, 0}, /* New 1/2 pel 6 tap filter */
{ 0, -6, 50, 93, -9, 0, 0, 0},
{ 1, -8, 36, 108, -11, 2, 0, 0}, /* New 1/4 pel 6 tap filter */
{ 0, -1, 12, 123, -6, 0, 0, 0},
};
kernel void vp8_filter_block2d_first_pass_kernel(
__global unsigned char *src_base,
int src_offset,
__global int *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_height,
unsigned int output_width,
int filter_offset
){
uint tid = get_global_id(0);
global unsigned char *src_ptr = &src_base[src_offset];
//Note that src_offset will be reset later, which is why we use it now
int Temp;
__constant short *vp8_filter = sub_pel_filters[filter_offset];
if (tid < (output_width*output_height)){
src_offset = tid + (tid/output_width * (src_pixels_per_line - output_width));
Temp = (int)(src_ptr[src_offset - 2] * vp8_filter[0]) +
(int)(src_ptr[src_offset - 1] * vp8_filter[1]) +
(int)(src_ptr[src_offset] * vp8_filter[2]) +
(int)(src_ptr[src_offset + 1] * vp8_filter[3]) +
(int)(src_ptr[src_offset + 2] * vp8_filter[4]) +
(int)(src_ptr[src_offset + 3] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if ( Temp > 255 )
Temp = 255;
output_ptr[tid] = Temp;
}
}
kernel void vp8_filter_block2d_second_pass_kernel
(
__global int *src_base,
int src_offset,
__global unsigned char *output_base,
int output_offset,
int output_pitch,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
int filter_offset
) {
uint i = get_global_id(0);
global int *src_ptr = &src_base[src_offset];
global unsigned char *output_ptr = &output_base[output_offset];
int out_offset; //Not same as output_offset...
int Temp;
int PS2 = 2*(int)pixel_step;
int PS3 = 3*(int)pixel_step;
unsigned int src_increment = src_pixels_per_line - output_width;
__constant short *vp8_filter = sub_pel_filters[filter_offset];
if (i < (output_width * output_height)){
out_offset = i/output_width;
src_offset = out_offset;
src_offset = i + (src_offset * src_increment);
out_offset = i%output_width + (out_offset * output_pitch);
/* Apply filter */
Temp = ((int)src_ptr[src_offset - PS2] * vp8_filter[0]) +
((int)src_ptr[src_offset -(int)pixel_step] * vp8_filter[1]) +
((int)src_ptr[src_offset] * vp8_filter[2]) +
((int)src_ptr[src_offset + pixel_step] * vp8_filter[3]) +
((int)src_ptr[src_offset + PS2] * vp8_filter[4]) +
((int)src_ptr[src_offset + PS3] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if (Temp > 255)
Temp = 255;
output_ptr[out_offset] = (unsigned char)Temp;
}
}
kernel void vp8_filter_block2d_bil_first_pass_kernel(
__global unsigned char *src_base,
int src_offset,
__global int *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_height,
unsigned int output_width,
int filter_offset
)
{
uint tid = get_global_id(0);
if (tid < output_width * output_height){
global unsigned char *src_ptr = &src_base[src_offset];
unsigned int i, j;
__constant int *vp8_filter = bilinear_filters[filter_offset];
unsigned int out_row,out_offset;
int src_increment = src_pixels_per_line - output_width;
i = tid / output_width;
j = tid % output_width;
src_offset = i*(output_width+src_increment) + j;
out_row = output_width * i;
out_offset = out_row + j;
/* Apply bilinear filter */
output_ptr[out_offset] = (((int)src_ptr[src_offset] * vp8_filter[0]) +
((int)src_ptr[src_offset+1] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;
}
}
kernel void vp8_filter_block2d_bil_second_pass_kernel
(
__global int *src_ptr,
__global unsigned char *output_base,
int output_offset,
int output_pitch,
unsigned int output_height,
unsigned int output_width,
int filter_offset
)
{
uint tid = get_global_id(0);
if (tid < output_width * output_height){
global unsigned char *output_ptr = &output_base[output_offset];
unsigned int i, j;
int Temp;
__constant int *vp8_filter = bilinear_filters[filter_offset];
int out_offset;
int src_offset;
i = tid / output_width;
j = tid % output_width;
src_offset = i*(output_width) + j;
out_offset = i*output_pitch + j;
/* Apply filter */
Temp = ((int)src_ptr[src_offset] * vp8_filter[0]) +
((int)src_ptr[src_offset+output_width] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2);
output_ptr[out_offset] = (unsigned char)(Temp >> VP8_FILTER_SHIFT);
}
}
//Called from reconinter_cl.c
kernel void vp8_memcpy_kernel(
global unsigned char *src_base,
int src_offset,
int src_stride,
global unsigned char *dst_base,
int dst_offset,
int dst_stride,
int num_bytes,
int num_iter
){
int i,r;
global unsigned char *src = &src_base[src_offset];
global unsigned char *dst = &dst_base[dst_offset];
src_offset = dst_offset = 0;
r = get_global_id(1);
if (r < get_global_size(1)){
i = get_global_id(0);
if (i < get_global_size(0)){
src_offset = r*src_stride + i;
dst_offset = r*dst_stride + i;
dst[dst_offset] = src[src_offset];
}
}
}
//Not used currently.
void vp8_memset_short(
global short *mem,
int offset,
short newval,
unsigned int size
)
{
int tid = get_global_id(0);
if (tid < (size/2)){
mem[offset+tid/2] = newval;
}
}
__kernel void vp8_bilinear_predict4x4_kernel
(
__global unsigned char *src_base,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_base,
int dst_offset,
int dst_pitch,
__global int *int_mem
)
{
int Height = 4, Width = 4;
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
}
__kernel void vp8_bilinear_predict8x8_kernel
(
__global unsigned char *src_base,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_base,
int dst_offset,
int dst_pitch,
__global int *int_mem
)
{
int Height = 8, Width = 8;
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
}
__kernel void vp8_bilinear_predict8x4_kernel
(
__global unsigned char *src_base,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_base,
int dst_offset,
int dst_pitch,
__global int *int_mem
)
{
int Height = 4, Width = 8;
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
}
__kernel void vp8_bilinear_predict16x16_kernel
(
__global unsigned char *src_base,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_base,
int dst_offset,
int dst_pitch,
__global int *int_mem
)
{
int Height = 16, Width = 16;
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
}
void vp8_filter_block2d_first_pass(
global unsigned char *src_base,
int src_offset,
local int *output_ptr,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
int filter_offset
){
uint tid = get_global_id(0);
uint i = tid;
int nthreads = get_global_size(0);
int ngroups = nthreads / get_local_size(0);
global unsigned char *src_ptr = &src_base[src_offset];
//Note that src_offset will be reset later, which is why we capture it now
int Temp;
__constant short *vp8_filter = sub_pel_filters[filter_offset];
if (tid < (output_width*output_height)){
short filter0 = vp8_filter[0];
short filter1 = vp8_filter[1];
short filter2 = vp8_filter[2];
short filter3 = vp8_filter[3];
short filter4 = vp8_filter[4];
short filter5 = vp8_filter[5];
if (ngroups > 1){
//This is generally only true on Apple CPU-CL, which gives a group
//size of 1, regardless of the CPU core count.
for (i=0; i < output_width*output_height; i++){
src_offset = i + (i/output_width * (src_pixels_per_line - output_width));
Temp = (int)(src_ptr[src_offset - 2] * filter0) +
(int)(src_ptr[src_offset - 1] * filter1) +
(int)(src_ptr[src_offset] * filter2) +
(int)(src_ptr[src_offset + 1] * filter3) +
(int)(src_ptr[src_offset + 2] * filter4) +
(int)(src_ptr[src_offset + 3] * filter5) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp >>= VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if ( Temp > 255 )
Temp = 255;
output_ptr[i] = Temp;
}
} else {
src_offset = i + (i/output_width * (src_pixels_per_line - output_width));
Temp = (int)(src_ptr[src_offset - 2] * filter0) +
(int)(src_ptr[src_offset - 1] * filter1) +
(int)(src_ptr[src_offset] * filter2) +
(int)(src_ptr[src_offset + 1] * filter3) +
(int)(src_ptr[src_offset + 2] * filter4) +
(int)(src_ptr[src_offset + 3] * filter5) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp >>= VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if ( Temp > 255 )
Temp = 255;
output_ptr[i] = Temp;
}
}
//Add a fence so that no 2nd pass stuff starts before 1st pass writes are done.
barrier(CLK_LOCAL_MEM_FENCE);
}
void vp8_filter_block2d_second_pass
(
local int *src_ptr,
global unsigned char *output_base,
int output_offset,
int output_pitch,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
int filter_offset
) {
global unsigned char *output_ptr = &output_base[output_offset];
int out_offset; //Not same as output_offset...
int src_offset;
int Temp;
int PS2 = 2*(int)pixel_step;
int PS3 = 3*(int)pixel_step;
unsigned int src_increment = src_pixels_per_line - output_width;
uint i = get_global_id(0);
__constant short *vp8_filter = sub_pel_filters[filter_offset];
if (i < (output_width * output_height)){
out_offset = i/output_width;
src_offset = out_offset;
src_offset = i + (src_offset * src_increment);
out_offset = i%output_width + (out_offset * output_pitch);
/* Apply filter */
Temp = ((int)src_ptr[src_offset - PS2] * vp8_filter[0]) +
((int)src_ptr[src_offset -(int)pixel_step] * vp8_filter[1]) +
((int)src_ptr[src_offset] * vp8_filter[2]) +
((int)src_ptr[src_offset + pixel_step] * vp8_filter[3]) +
((int)src_ptr[src_offset + PS2] * vp8_filter[4]) +
((int)src_ptr[src_offset + PS3] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if (Temp > 255)
Temp = 255;
output_ptr[out_offset] = (unsigned char)Temp;
}
}
__kernel void vp8_sixtap_predict_kernel
(
__global unsigned char *src_ptr,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_ptr,
int dst_offset,
int dst_pitch
)
{
local int FData[9*4];
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 9, 4, xoffset);
/* then filter vertically... */
vp8_filter_block2d_second_pass(&FData[8], dst_ptr, dst_offset, dst_pitch, 4, 4, 4, 4, yoffset);
}
__kernel void vp8_sixtap_predict8x8_kernel
(
__global unsigned char *src_ptr,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_ptr,
int dst_offset,
int dst_pitch
)
{
local int FData[13*16]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 13, 8, xoffset);
/* then filter vertically... */
vp8_filter_block2d_second_pass(&FData[16], dst_ptr, dst_offset, dst_pitch, 8, 8, 8, 8, yoffset);
}
__kernel void vp8_sixtap_predict8x4_kernel
(
__global unsigned char *src_ptr,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_ptr,
int dst_offset,
int dst_pitch
)
{
local int FData[13*16]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 9, 8, xoffset);
/* then filter vertically... */
vp8_filter_block2d_second_pass(&FData[16], dst_ptr, dst_offset, dst_pitch, 8, 8, 4, 8, yoffset);
}
__kernel void vp8_sixtap_predict16x16_kernel
(
__global unsigned char *src_ptr,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_ptr,
int dst_offset,
int dst_pitch
)
{
local int FData[21*24]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 21, 16, xoffset);
/* then filter vertically... */
vp8_filter_block2d_second_pass(&FData[32], dst_ptr, dst_offset, dst_pitch, 16, 16, 16, 16, yoffset);
return;
}
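Two pieces of arithmetic recur in the six-tap kernels above: mapping a flat work-item id to a strided source index, and applying six taps with rounding, a shift by 7, and a clamp to 0..255. A small C sketch of both follows, assuming the weight (128) and shift (7) from the header in this diff. The function names are hypothetical illustrations, not part of the removed sources.

```c
#include <assert.h>

#define REF_FILTER_WEIGHT 128
#define REF_FILTER_SHIFT 7

/* Map a flat work-item id to an index into a strided source.
 * Equivalent to row * stride + col with row = tid / out_w, col = tid % out_w. */
int src_index_for_item(unsigned tid, unsigned out_w, unsigned stride)
{
    assert(out_w > 0 && stride >= out_w);
    return (int)(tid + (tid / out_w) * (stride - out_w));
}

/* Apply one row of a 6-tap table around the pixel at p.
 * step is 1 for a horizontal pass or the row stride for a vertical pass.
 * Like the kernels, this relies on arithmetic right shift for negative
 * sums, which is implementation-defined in ISO C. */
int sixtap_pixel(const unsigned char *p, int step, const short taps[6])
{
    int sum = p[-2 * step] * taps[0] +
              p[-1 * step] * taps[1] +
              p[0]         * taps[2] +
              p[ 1 * step] * taps[3] +
              p[ 2 * step] * taps[4] +
              p[ 3 * step] * taps[5] +
              (REF_FILTER_WEIGHT >> 1); /* rounding */

    sum >>= REF_FILTER_SHIFT;           /* normalize back to pixel range */
    if (sum < 0)
        sum = 0;
    else if (sum > 255)
        sum = 255;
    return sum;
}
```

Because every tap row sums to 128, a flat input region passes through unchanged, which is an easy property to check on any implementation.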

@@ -1,74 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef FILTER_CL_H_
#define FILTER_CL_H_
#ifdef __cplusplus
extern "C" {
#endif
#include "vp8_opencl.h"
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
#define REGISTER_FILTER 1
#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
#define PRE_CALC_PIXEL_STEPS 1
#define PRE_CALC_SRC_INCREMENT 1
#if PRE_CALC_PIXEL_STEPS
#define PS2 two_pixel_steps
#define PS3 three_pixel_steps
#else
#define PS2 2*(int)pixel_step
#define PS3 3*(int)pixel_step
#endif
#if REGISTER_FILTER
#define FILTER0 filter0
#define FILTER1 filter1
#define FILTER2 filter2
#define FILTER3 filter3
#define FILTER4 filter4
#define FILTER5 filter5
#else
#define FILTER0 vp8_filter[0]
#define FILTER1 vp8_filter[1]
#define FILTER2 vp8_filter[2]
#define FILTER3 vp8_filter[3]
#define FILTER4 vp8_filter[4]
#define FILTER5 vp8_filter[5]
#endif
#if PRE_CALC_SRC_INCREMENT
#define SRC_INCREMENT src_increment
#else
#define SRC_INCREMENT (src_pixels_per_line - output_width)
#endif
#define FILTER_OFFSET //Filter data stored as CL constant memory
#define FILTER_REF sub_pel_filters[filter_offset]
extern const char *filterCompileOptions;
extern const char *filter_cl_file_name;
//Copy the -2*pixel_step (and ps*3) bytes because the filter algorithm
//accesses negative indexes
#define SIXTAP_SRC_LEN(out_width,out_height,src_px) ((out_width)*(out_height) + (((out_width)*(out_height)-1)/(out_width))*((src_px) - (out_width)) + 5)
#define BIL_SRC_LEN(out_width,out_height,src_px) ((out_height) * (src_px) + (out_width))
#define DST_LEN(dst_pitch,dst_height,dst_width) ((dst_pitch) * (dst_height) + (dst_width))
#ifdef __cplusplus
}
#endif
#endif /* FILTER_CL_H_ */
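The three length macros at the end of this header size the host-to-device copies. As a hedged illustration, the same arithmetic can be written as ordinary functions, which avoids macro-argument pitfalls and makes the numbers easy to check by hand. The function names below are hypothetical and not part of the removed header.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of source needed by the six-tap pass: the strided block itself
 * plus 5 extra samples for the taps that reach outside it. */
size_t sixtap_src_len(size_t out_w, size_t out_h, size_t src_px)
{
    size_t n;
    assert(out_w > 0 && out_h > 0 && src_px >= out_w);
    n = out_w * out_h;
    return n + ((n - 1) / out_w) * (src_px - out_w) + 5;
}

/* Bytes of source needed by the bilinear pass. */
size_t bil_src_len(size_t out_w, size_t out_h, size_t src_px)
{
    return out_h * src_px + out_w;
}

/* Bytes of destination covered by a block written with the given pitch. */
size_t dst_len(size_t dst_pitch, size_t dst_h, size_t dst_w)
{
    return dst_pitch * dst_h + dst_w;
}
```

For example, a 4x4 six-tap block needs 9 intermediate rows, so with a 32-byte source stride the first pass covers 36 + 8 * 28 + 5 = 265 bytes.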

@@ -1,45 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef IDCT_OPENCL_H
#define IDCT_OPENCL_H
#ifdef __cplusplus
extern "C" {
#endif
#include "vp8_opencl.h"
#include "vp8/common/blockd.h"
#define prototype_second_order_cl(sym) \
void sym(BLOCKD *b)
#define prototype_idct_cl(sym) \
void sym(BLOCKD *b, int pitch)
#define prototype_idct_scalar_add_cl(sym) \
void sym(BLOCKD *b, cl_int use_diff, int diff_offset, int qcoeff_offset, \
int pred_offset, unsigned char *output, cl_mem out_mem, int out_offset, size_t out_size, \
int pitch, int stride)
extern prototype_idct_cl(vp8_short_idct4x4llm_1_cl);
extern prototype_idct_cl(vp8_short_idct4x4llm_cl);
extern prototype_idct_scalar_add_cl(vp8_dc_only_idct_add_cl);
extern prototype_second_order_cl(vp8_short_inv_walsh4x4_1_cl);
extern prototype_second_order_cl(vp8_short_inv_walsh4x4_cl);
#ifdef __cplusplus
}
#endif
#endif
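The `prototype_*_cl` macros in this header exist so that a declaration and its definition are guaranteed to share one signature. A minimal sketch of the pattern follows, assuming a stand-in block type. `DemoBlock`, `prototype_second_order_demo` and `demo_inv_walsh_dc` are hypothetical names, and the DC-only rounding in the body is chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BLOCKD; the real structure is much larger. */
typedef struct {
    short dqcoeff[16];
    short diff[16];
} DemoBlock;

/* One macro yields the signature for both declaration and definition. */
#define prototype_second_order_demo(sym) void sym(DemoBlock *b)

/* Declaration, as it would appear in a header. */
extern prototype_second_order_demo(demo_inv_walsh_dc);

/* Definition, as it would appear in the matching source file. */
prototype_second_order_demo(demo_inv_walsh_dc)
{
    int i;
    int a1;

    assert(b != NULL);
    a1 = (b->dqcoeff[0] + 3) >> 3;  /* illustrative DC-only rounding */
    for (i = 0; i < 16; i++)
        b->diff[i] = (short)a1;     /* flat 4x4 output */
}
```

If the macro changes, every declaration and definition built from it changes with it, so the compiler catches any mismatch between the two.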

@@ -1,325 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
//ACW: Remove me after debugging.
#include <stdio.h>
#include <string.h>
#include "idct_cl.h"
#include "idctllm_cl.h"
#include "blockd_cl.h"
void cl_destroy_idct(){
if (cl_data.idct_program)
clReleaseProgram(cl_data.idct_program);
cl_data.idct_program = NULL;
VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_1_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_dc_only_idct_add_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_idct4x4llm_1_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_idct4x4llm_kernel);
}
int cl_init_idct() {
int err;
// Create the IDCT compute program from the file-defined source code
if (cl_load_program(&cl_data.idct_program, idctllm_cl_file_name,
idctCompileOptions) != CL_SUCCESS)
return VP8_CL_TRIED_BUT_FAILED;
// Create the compute kernel in the program we wish to run
VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_1_kernel,"vp8_short_inv_walsh4x4_1_kernel");
VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_1st_pass_kernel,"vp8_short_inv_walsh4x4_1st_pass_kernel");
VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_2nd_pass_kernel,"vp8_short_inv_walsh4x4_2nd_pass_kernel");
VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_dc_only_idct_add_kernel,"vp8_dc_only_idct_add_kernel");
////idct4x4llm kernels are only useful for the encoder
//VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_idct4x4llm_1_kernel,"vp8_short_idct4x4llm_1_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_idct4x4llm_kernel,"vp8_short_idct4x4llm_kernel");
return CL_SUCCESS;
}
#define max(x,y) ((x) > (y) ? (x) : (y))
//#define NO_CL
/* Only useful for encoder... Untested... */
void vp8_short_idct4x4llm_cl(BLOCKD *b, int pitch)
{
int err;
short *input = b->dqcoeff_base + b->dqcoeff_offset;
short *output = &b->diff_base[b->diff_offset];
cl_mem src_mem, dst_mem;
//1 instance for now. This should be split into 2-pass * 4 thread.
size_t global = 1;
if (cl_initialized != CL_SUCCESS){
vp8_short_idct4x4llm_c(input,output,pitch);
return;
}
VP8_CL_CREATE_BUF(b->cl_commands, src_mem,,
sizeof(short)*16, input,
vp8_short_idct4x4llm_c(input,output,pitch),
);
VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
sizeof(short)*(4+(pitch/2)*3), output,
vp8_short_idct4x4llm_c(input,output,pitch),
);
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 1, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 2, sizeof (int), &pitch);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_idct4x4llm_c(input,output,pitch),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_idct4x4llm_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_idct4x4llm_c(input,output,pitch),
);
/* Read back the result data from the device */
err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0, sizeof(short)*(4+pitch/2*3), output, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
vp8_short_idct4x4llm_c(input,output,pitch),
);
clReleaseMemObject(src_mem);
clReleaseMemObject(dst_mem);
return;
}
/* Only useful for encoder... Untested... */
void vp8_short_idct4x4llm_1_cl(BLOCKD *b, int pitch)
{
int err;
size_t global = 4;
short *input = b->dqcoeff_base + b->dqcoeff_offset;
short *output = &b->diff_base[b->diff_offset];
cl_mem src_mem, dst_mem;
if (cl_initialized != CL_SUCCESS){
vp8_short_idct4x4llm_1_c(input,output,pitch);
return;
}
printf("vp8_short_idct4x4llm_1_cl\n");
VP8_CL_CREATE_BUF(b->cl_commands, src_mem,,
sizeof(short), input,
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
sizeof(short)*(4+(pitch/2)*3), output,
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 1, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 2, sizeof (int), &pitch);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_idct4x4llm_1_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
/* Read back the result data from the device */
err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0, sizeof(short)*(4+pitch/2*3), output, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
clReleaseMemObject(src_mem);
clReleaseMemObject(dst_mem);
return;
}
void vp8_dc_only_idct_add_cl(BLOCKD *b, cl_int use_diff, int diff_offset,
int qcoeff_offset, int pred_offset,
unsigned char *dst_base, cl_mem dst_mem, int dst_offset, size_t dest_size,
int pitch, int stride
)
{
int err;
size_t global = 16;
int free_mem = 0;
//cl_mem dest_mem = NULL;
if (dst_mem == NULL){
VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
dest_size, dst_base,,
);
free_mem = 1;
}
//Set arguments and run kernel
err = clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 0, sizeof (cl_mem), &b->cl_predictor_mem);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 1, sizeof (int), &pred_offset);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 2, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 3, sizeof (int), &dst_offset);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 4, sizeof (int), &pitch);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 5, sizeof (int), &stride);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 6, sizeof (cl_int), &use_diff);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 7, sizeof (cl_mem), &b->cl_diff_mem);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 8, sizeof (int), &diff_offset);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 9, sizeof (cl_mem), &b->cl_qcoeff_mem);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 10, sizeof (int), &qcoeff_offset);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 11, sizeof (cl_mem), &b->cl_dequant_mem);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_dc_only_idct_add_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
if (free_mem == 1){
/* Read back the result data from the device */
err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0,
dest_size, dst_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read output array!\n",,
);
clReleaseMemObject(dst_mem);
}
return;
}
void vp8_short_inv_walsh4x4_cl(BLOCKD *b)
{
int err;
size_t global = 4;
if (cl_initialized != CL_SUCCESS){
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset,&b->diff_base[b->diff_offset]);
return;
}
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 0, sizeof (cl_mem), &b->cl_dqcoeff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 1, sizeof(int), &b->dqcoeff_offset);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 2, sizeof (cl_mem), &b->cl_diff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 3, sizeof(int), &b->diff_offset);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
);
//Second pass
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 0, sizeof (cl_mem), &b->cl_diff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 1, sizeof(int), &b->diff_offset);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
);
return;
}
void vp8_short_inv_walsh4x4_1_cl(BLOCKD *b)
{
int err;
size_t global = 4;
if (cl_initialized != CL_SUCCESS){
vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
&b->diff_base[b->diff_offset]);
return;
}
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 0, sizeof (cl_mem), &b->cl_dqcoeff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 1, sizeof (int), &b->dqcoeff_offset);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 2, sizeof (cl_mem), &b->cl_diff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 3, sizeof (int), &b->diff_offset);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
&b->diff_base[b->diff_offset]),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_1_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
&b->diff_base[b->diff_offset]),
);
return;
}
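Most host functions in this file follow one control pattern: use the accelerated path when it is initialized and succeeds, otherwise run the plain C routine. A compact sketch of that pattern is shown here with function pointers instead of the `VP8_CL_*` macros. All names below are hypothetical, and the "accelerated" functions are stubs that only mimic success or failure.

```c
#include <assert.h>
#include <stddef.h>

typedef int  (*accel_fn)(const short *in, short *out); /* returns 0 on success */
typedef void (*ref_fn)(const short *in, short *out);

/* Try the accelerated path first; fall back to the reference routine if
 * the accelerator is not ready or reports an error.
 * Returns 1 if the accelerated result was used, 0 if the fallback ran. */
int run_with_fallback(int accel_ready, accel_fn accel, ref_fn ref,
                      const short *in, short *out)
{
    assert(ref != NULL);
    if (accel_ready && accel != NULL && accel(in, out) == 0)
        return 1;
    ref(in, out);
    return 0;
}

/* Demo stubs standing in for a device kernel and its C equivalent. */
int demo_accel_ok(const short *in, short *out)
{
    out[0] = (short)(in[0] * 2);
    return 0;
}

int demo_accel_fail(const short *in, short *out)
{
    (void)in;
    (void)out;
    return -1; /* simulate a failed enqueue */
}

void demo_ref(const short *in, short *out)
{
    out[0] = (short)(in[0] * 2);
}
```

Keeping the fallback in one helper means every error path yields the same output as the reference routine, rather than repeating the fallback call at each check.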

@@ -1,309 +0,0 @@
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#pragma OPENCL EXTENSION cl_amd_printf : enable
__constant int cospi8sqrt2minus1 = 20091;
__constant int sinpi8sqrt2 = 35468;
__constant int rounding = 0;
kernel void vp8_short_idct4x4llm_1st_pass_kernel(global short*,global short *,int);
kernel void vp8_short_idct4x4llm_2nd_pass_kernel(global short*,int);
__kernel void vp8_short_idct4x4llm_kernel(
__global short *input,
__global short *output,
int pitch
){
vp8_short_idct4x4llm_1st_pass_kernel(input,output,pitch);
vp8_short_idct4x4llm_2nd_pass_kernel(output,pitch);
}
__kernel void vp8_short_idct4x4llm_1st_pass_kernel(
__global short *ip,
__global short *op,
int pitch
)
{
int i;
int a1, b1, c1, d1;
int temp1, temp2;
int shortpitch = pitch >> 1;
for (i = 0; i < 4; i++)
{
a1 = ip[0] + ip[8];
b1 = ip[0] - ip[8];
temp1 = (ip[4] * sinpi8sqrt2 + rounding) >> 16;
temp2 = ip[12] + ((ip[12] * cospi8sqrt2minus1 + rounding) >> 16);
c1 = temp1 - temp2;
temp1 = ip[4] + ((ip[4] * cospi8sqrt2minus1 + rounding) >> 16);
temp2 = (ip[12] * sinpi8sqrt2 + rounding) >> 16;
d1 = temp1 + temp2;
op[shortpitch*0] = a1 + d1;
op[shortpitch*3] = a1 - d1;
op[shortpitch*1] = b1 + c1;
op[shortpitch*2] = b1 - c1;
ip++;
op++;
}
return;
}
__kernel void vp8_short_idct4x4llm_2nd_pass_kernel(
__global short *output,
int pitch
)
{
int i;
int a1, b1, c1, d1;
int temp1, temp2;
int shortpitch = pitch >> 1;
__global short *ip = output;
__global short *op = output;
for (i = 0; i < 4; i++)
{
a1 = ip[0] + ip[2];
b1 = ip[0] - ip[2];
temp1 = (ip[1] * sinpi8sqrt2 + rounding) >> 16;
temp2 = ip[3] + ((ip[3] * cospi8sqrt2minus1 + rounding) >> 16);
c1 = temp1 - temp2;
temp1 = ip[1] + ((ip[1] * cospi8sqrt2minus1 + rounding) >> 16);
temp2 = (ip[3] * sinpi8sqrt2 + rounding) >> 16;
d1 = temp1 + temp2;
op[0] = (a1 + d1 + 4) >> 3;
op[3] = (a1 - d1 + 4) >> 3;
op[1] = (b1 + c1 + 4) >> 3;
op[2] = (b1 - c1 + 4) >> 3;
ip += shortpitch;
op += shortpitch;
}
return;
}
__kernel void vp8_short_idct4x4llm_1_kernel(
__global short *input,
__global short *output,
int pitch
)
{
int a1;
int out_offset;
int shortpitch = pitch >> 1;
//short4 a;
a1 = ((input[0] + 4) >> 3);
//a = a1;
int tid = get_global_id(0);
if (tid < 4){
out_offset = shortpitch * tid;
//vstore4(a,0,&output[out_offset]);
output[out_offset] = a1;
output[out_offset+1] = a1;
output[out_offset+2] = a1;
output[out_offset+3] = a1;
}
}
__kernel void vp8_dc_only_idct_add_kernel(
__global unsigned char *pred_base,
int pred_offset,
__global unsigned char *dst_base,
int dst_offset,
int pitch,
int stride,
int use_diff,
global short *diff_base,
int diff_offset,
global short *qcoeff_base,
int qcoeff_offset,
global short *dequant
)
{
int r, c;
//int pred_offset;
global unsigned char *pred_ptr = &pred_base[pred_offset];
global unsigned char *dst_ptr = &dst_base[dst_offset];
int tid = get_global_id(0);
int a1;
if (tid < 16){
if (use_diff == 1){
a1 = diff_base[diff_offset];
} else {
a1 = qcoeff_base[qcoeff_offset] * dequant[0];
}
a1 = (a1 + 4)>>3;
r = tid / 4;
c = tid % 4;
pred_offset = r * pitch;
dst_offset += r * stride;
int a = a1 + pred_ptr[pred_offset + c] ;
if (a < 0)
a = 0;
else if (a > 255)
a = 255;
dst_base[dst_offset + c] = (unsigned char) a ;
}
}
__kernel void vp8_short_inv_walsh4x4_1st_pass_kernel(
__global short *src_base,
int src_offset,
__global short *output_base,
int out_offset
)
{
__global short *input = src_base + src_offset;
__global short *output = output_base + out_offset;
int tid = get_global_id(0);
#define VEC_WALSH 0
#if VEC_WALSH
//4-short vectors to calculate things in
short4 a,b,c,d, a2v, b2v, c2v, d2v, a1t, b1t, c1t, d1t;
short16 out;
if (tid == 0){
//first pass loop in vector form
a = vload4(0,input) + vload4(3,input);
b = vload4(1,input) + vload4(2,input);
c = vload4(1,input) - vload4(2,input);
d = vload4(0,input) - vload4(3,input);
vstore4(a + b, 0, output);
vstore4(c + d, 1, output);
vstore4(a - b, 2, output);
vstore4(d - c, 3, output);
return;
//2nd pass
a = (short4)(output[0], output[4], output[8], output[12]);
b = (short4)(output[1], output[5], output[9], output[13]);
c = (short4)(output[1], output[5], output[9], output[13]);
d = (short4)(output[0], output[4], output[8], output[12]);
a1t = (short4)(output[3], output[7], output[11], output[15]);
b1t = (short4)(output[2], output[6], output[10], output[14]);
c1t = (short4)(output[2], output[6], output[10], output[14]);
d1t = (short4)(output[3], output[7], output[11], output[15]);
a = a + a1t + (short)3;
b = b + b1t;
c = c - c1t;
d = d - d1t + (short)3;
a2v = (a + b) >> (short)3;
b2v = (c + d) >> (short)3;
c2v = (a - b) >> (short)3;
d2v = (d - c) >> (short)3;
out.s048c = a2v;
out.s159d = b2v;
out.s26ae = c2v;
out.s37bf = d2v;
vstore16(out,0,output);
}
#else
int i;
int a1, b1, c1, d1;
int a2, b2, c2, d2;
global short *ip = input;
global short *op = output;
int offset;
if (tid < 4){
offset = tid;
a1 = ip[offset] + ip[offset + 12];
b1 = ip[offset + 4] + ip[offset + 8];
c1 = ip[offset + 4] - ip[offset + 8];
d1 = ip[offset] - ip[offset + 12];
op[offset] = a1 + b1;
op[offset + 4] = c1 + d1;
op[offset + 8] = a1 - b1;
op[offset + 12] = d1 - c1;
}
#endif
}
__kernel void vp8_short_inv_walsh4x4_2nd_pass_kernel(
__global short *output_base,
int out_offset
)
{
int i;
int a1, b1, c1, d1;
int a2, b2, c2, d2;
__global short *output = output_base + out_offset;
int tid = get_global_id(0);
int offset = 0;
if (tid < 4){
offset = 4*tid;
a1 = output[offset] + output[offset + 3];
b1 = output[offset + 1] + output[offset + 2];
c1 = output[offset + 1] - output[offset + 2];
d1 = output[offset + 0] - output[offset + 3];
a2 = a1 + b1;
b2 = c1 + d1;
c2 = a1 - b1;
d2 = d1 - c1;
output[offset + 0] = (a2 + 3) >> 3;
output[offset + 1] = (b2 + 3) >> 3;
output[offset + 2] = (c2 + 3) >> 3;
output[offset + 3] = (d2 + 3) >> 3;
}
}
__kernel void vp8_short_inv_walsh4x4_1_kernel(
__global short *src_data,
int src_offset,
__global short *dst_data,
int dst_offset
){
int a1;
int tid = get_global_id(0);
//short16 a;
int i;
short4 a;
__global short *input = src_data + src_offset;
__global short *output = dst_data + dst_offset;
if (tid < 4)
{
a1 = ((input[0] + 3) >> 3);
a = (short)a1; //Set all elements of vector to a1
vstore4(a, tid, output);
}
}
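
When checking the Walsh kernels above against a CPU result, a scalar reference can help. The following is a minimal sketch in plain C that mirrors the arithmetic of vp8_short_inv_walsh4x4_1st_pass_kernel, vp8_short_inv_walsh4x4_2nd_pass_kernel and vp8_short_inv_walsh4x4_1_kernel. The names ref_inv_walsh4x4 and ref_inv_walsh4x4_dc are hypothetical and are not part of the original sources.

```c
/* Hypothetical host-side reference, not part of the original sources.
 * Like the kernels, it assumes an arithmetic right shift for negative values. */
static void ref_inv_walsh4x4(const short *input, short *output)
{
    int tmp[16];
    int i;

    /* First pass: one column per iteration (one work-item in the kernel). */
    for (i = 0; i < 4; i++) {
        int a1 = input[i] + input[i + 12];
        int b1 = input[i + 4] + input[i + 8];
        int c1 = input[i + 4] - input[i + 8];
        int d1 = input[i] - input[i + 12];
        tmp[i] = a1 + b1;
        tmp[i + 4] = c1 + d1;
        tmp[i + 8] = a1 - b1;
        tmp[i + 12] = d1 - c1;
    }

    /* Second pass: one row per iteration, rounded with +3 and shifted by 3. */
    for (i = 0; i < 4; i++) {
        const int *r = &tmp[4 * i];
        int a1 = r[0] + r[3];
        int b1 = r[1] + r[2];
        int c1 = r[1] - r[2];
        int d1 = r[0] - r[3];
        output[4 * i + 0] = (short)((a1 + b1 + 3) >> 3);
        output[4 * i + 1] = (short)((c1 + d1 + 3) >> 3);
        output[4 * i + 2] = (short)((a1 - b1 + 3) >> 3);
        output[4 * i + 3] = (short)((d1 - c1 + 3) >> 3);
    }
}

/* DC-only shortcut: every output equals (input[0] + 3) >> 3. */
static void ref_inv_walsh4x4_dc(const short *input, short *output)
{
    short a1 = (short)((input[0] + 3) >> 3);
    int i;
    for (i = 0; i < 16; i++)
        output[i] = a1;
}
```

For a block whose only nonzero coefficient is input[0], both functions fill all 16 outputs with the same value, so the cheaper DC-only path gives the same result as the full transform in that case.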

@@ -1,26 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_config.h"
#include "vp8_opencl.h"
#include "vp8/common/blockd.h"
#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
//External functions that are fallbacks if CL is unavailable
extern void vp8_short_idct4x4llm_c(short *input, short *output, int pitch);
extern void vp8_short_idct4x4llm_1_c(short *input, short *output, int pitch);
extern void vp8_dc_only_idct_add_c(short input_dc, unsigned char *pred_ptr, unsigned char *dst_ptr, int pitch, int stride);
extern void vp8_short_inv_walsh4x4_c(short *input, short *output);
extern void vp8_short_inv_walsh4x4_1_c(short *input, short *output);
const char *idctCompileOptions = "-Ivp8/common/opencl";
const char *idctllm_cl_file_name = "vp8/common/opencl/idctllm_cl.cl";

@@ -1,427 +0,0 @@
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#pragma OPENCL EXTENSION cl_amd_printf : enable
typedef unsigned char uc;
typedef signed char sc;
__inline signed char vp8_filter_mask(sc, sc, uc, uc, uc, uc, uc, uc, uc, uc);
__inline signed char vp8_simple_filter_mask(signed char, signed char, uc, uc, uc, uc);
__inline signed char vp8_hevmask(signed char, uc, uc, uc, uc);
__inline signed char vp8_signed_char_clamp(int);
__inline void vp8_mbfilter(signed char mask,signed char hev,global uc *op2,
global uc *op1,global uc *op0,global uc *oq0,global uc *oq1,global uc *oq2);
void vp8_simple_filter(signed char mask,global uc *base, int op1_off,int op0_off,int oq0_off,int oq1_off);
typedef struct
{
signed char lim[16];
signed char flim[16];
signed char thr[16];
signed char mbflim[16];
signed char mbthr[16];
signed char uvlim[16];
signed char uvflim[16];
signed char uvthr[16];
signed char uvmbflim[16];
signed char uvmbthr[16];
} loop_filter_info;
void vp8_filter(
signed char mask,
signed char hev,
global uc *base,
int op1_off,
int op0_off,
int oq0_off,
int oq1_off
)
{
global uc *op1 = &base[op1_off];
global uc *op0 = &base[op0_off];
global uc *oq0 = &base[oq0_off];
global uc *oq1 = &base[oq1_off];
signed char ps0, qs0;
signed char ps1, qs1;
signed char vp8_filter, Filter1, Filter2;
signed char u;
ps1 = (signed char) * op1 ^ 0x80;
ps0 = (signed char) * op0 ^ 0x80;
qs0 = (signed char) * oq0 ^ 0x80;
qs1 = (signed char) * oq1 ^ 0x80;
/* add outer taps if we have high edge variance */
vp8_filter = vp8_signed_char_clamp(ps1 - qs1);
vp8_filter &= hev;
/* inner taps */
vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (qs0 - ps0));
vp8_filter &= mask;
/* save bottom 3 bits so that we round one side +4 and the other +3
* if it equals 4 we'll set to adjust by -1 to account for the fact
* we'd round 3 the other way
*/
Filter1 = vp8_signed_char_clamp(vp8_filter + 4);
Filter2 = vp8_signed_char_clamp(vp8_filter + 3);
Filter1 >>= 3;
Filter2 >>= 3;
u = vp8_signed_char_clamp(qs0 - Filter1);
*oq0 = u ^ 0x80;
u = vp8_signed_char_clamp(ps0 + Filter2);
*op0 = u ^ 0x80;
vp8_filter = Filter1;
/* outer tap adjustments */
vp8_filter += 1;
vp8_filter >>= 1;
vp8_filter &= ~hev;
u = vp8_signed_char_clamp(qs1 - vp8_filter);
*oq1 = u ^ 0x80;
u = vp8_signed_char_clamp(ps1 + vp8_filter);
*op1 = u ^ 0x80;
}
kernel void vp8_loop_filter_horizontal_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p, /* pitch */
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
int hev = 0; /* high edge variance */
signed char mask = 0;
int i = get_global_id(0);
if (i < get_global_size(0)){
s_off += i;
mask = vp8_filter_mask(limit[i], flimit[i], s_base[s_off - 4*p],
s_base[s_off - 3*p], s_base[s_off - 2*p], s_base[s_off - p],
s_base[s_off], s_base[s_off + p], s_base[s_off + 2*p],
s_base[s_off + 3*p]);
hev = vp8_hevmask(thresh[i], s_base[s_off - 2*p], s_base[s_off - p],
s_base[s_off], s_base[s_off+p]);
vp8_filter(mask, hev, s_base, s_off - 2 * p, s_off - p, s_off,
s_off + p);
}
}
kernel void vp8_loop_filter_vertical_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
int hev = 0; /* high edge variance */
signed char mask = 0;
int i = get_global_id(0);
if ( i < get_global_size(0) ){
s_off += p * i;
mask = vp8_filter_mask(limit[i], flimit[i],
s_base[s_off-4], s_base[s_off-3], s_base[s_off-2],
s_base[s_off-1], s_base[s_off], s_base[s_off+1],
s_base[s_off+2], s_base[s_off+3]);
hev = vp8_hevmask(thresh[i], s_base[s_off-2], s_base[s_off-1],
s_base[s_off], s_base[s_off+1]);
vp8_filter(mask, hev, s_base, s_off - 2, s_off - 1, s_off, s_off + 1);
}
}
kernel void vp8_mbloop_filter_horizontal_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
global uc *s = s_base+s_off;
signed char hev = 0; /* high edge variance */
signed char mask = 0;
int i = get_global_id(0);
if (i < get_global_size(0)){
s += i;
mask = vp8_filter_mask(limit[i], flimit[i],
s[-4*p], s[-3*p], s[-2*p], s[-1*p],
s[0*p], s[1*p], s[2*p], s[3*p]);
hev = vp8_hevmask(thresh[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
vp8_mbfilter(mask, hev, s - 3 * p, s - 2 * p, s - 1 * p, s, s + 1 * p, s + 2 * p);
}
}
kernel void vp8_mbloop_filter_vertical_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
global uc *s = s_base + s_off;
signed char hev = 0; /* high edge variance */
signed char mask = 0;
int i = get_global_id(0);
if (i < get_global_size(0)){
s += p * i;
mask = vp8_filter_mask(limit[i], flimit[i],
s[-4], s[-3], s[-2], s[-1], s[0], s[1], s[2], s[3]);
hev = vp8_hevmask(thresh[i], s[-2], s[-1], s[0], s[1]);
vp8_mbfilter(mask, hev, s - 3, s - 2, s - 1, s, s + 1, s + 2);
}
}
kernel void vp8_loop_filter_simple_horizontal_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global const signed char *flimit,
global const signed char *limit,
global const signed char *thresh,
int off_stride
)
{
signed char mask = 0;
int i = get_global_id(0);
(void) thresh;
if (i < get_global_size(0))
{
s_off += i;
mask = vp8_simple_filter_mask(limit[i], flimit[i], s_base[s_off-2*p], s_base[s_off-p], s_base[s_off], s_base[s_off+p]);
vp8_simple_filter(mask, s_base, s_off - 2 * p, s_off - 1 * p, s_off, s_off + 1 * p);
}
}
kernel void vp8_loop_filter_simple_vertical_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
signed char mask = 0;
int i = get_global_id(0);
(void) thresh;
if (i < get_global_size(0)){
s_off += p * i;
mask = vp8_simple_filter_mask(limit[i], flimit[i], s_base[s_off-2], s_base[s_off-1], s_base[s_off], s_base[s_off+1]);
vp8_simple_filter(mask, s_base, s_off - 2, s_off - 1, s_off, s_off + 1);
}
}
//Inline and non-kernel functions follow.
__inline void vp8_mbfilter(
signed char mask,
signed char hev,
global uc *op2,
global uc *op1,
global uc *op0,
global uc *oq0,
global uc *oq1,
global uc *oq2
)
{
signed char s, u;
signed char vp8_filter, Filter1, Filter2;
signed char ps2 = (signed char) * op2 ^ 0x80;
signed char ps1 = (signed char) * op1 ^ 0x80;
signed char ps0 = (signed char) * op0 ^ 0x80;
signed char qs0 = (signed char) * oq0 ^ 0x80;
signed char qs1 = (signed char) * oq1 ^ 0x80;
signed char qs2 = (signed char) * oq2 ^ 0x80;
/* add outer taps if we have high edge variance */
vp8_filter = vp8_signed_char_clamp(ps1 - qs1);
vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (qs0 - ps0));
vp8_filter &= mask;
Filter2 = vp8_filter;
Filter2 &= hev;
/* save bottom 3 bits so that we round one side +4 and the other +3 */
Filter1 = vp8_signed_char_clamp(Filter2 + 4);
Filter2 = vp8_signed_char_clamp(Filter2 + 3);
Filter1 >>= 3;
Filter2 >>= 3;
qs0 = vp8_signed_char_clamp(qs0 - Filter1);
ps0 = vp8_signed_char_clamp(ps0 + Filter2);
/* only apply wider filter if not high edge variance */
vp8_filter &= ~hev;
Filter2 = vp8_filter;
/* roughly 3/7th difference across boundary */
u = vp8_signed_char_clamp((63 + Filter2 * 27) >> 7);
s = vp8_signed_char_clamp(qs0 - u);
*oq0 = s ^ 0x80;
s = vp8_signed_char_clamp(ps0 + u);
*op0 = s ^ 0x80;
/* roughly 2/7th difference across boundary */
u = vp8_signed_char_clamp((63 + Filter2 * 18) >> 7);
s = vp8_signed_char_clamp(qs1 - u);
*oq1 = s ^ 0x80;
s = vp8_signed_char_clamp(ps1 + u);
*op1 = s ^ 0x80;
/* roughly 1/7th difference across boundary */
u = vp8_signed_char_clamp((63 + Filter2 * 9) >> 7);
s = vp8_signed_char_clamp(qs2 - u);
*oq2 = s ^ 0x80;
s = vp8_signed_char_clamp(ps2 + u);
*op2 = s ^ 0x80;
}
__inline signed char vp8_signed_char_clamp(int t)
{
t = (t < -128 ? -128 : t);
t = (t > 127 ? 127 : t);
return (signed char) t;
}
/* is there a high-variance internal edge ( 11111111 yes, 00000000 no) */
__inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0, uc q1)
{
signed char hev = 0;
hev |= (abs(p1 - p0) > thresh) * -1;
hev |= (abs(q1 - q0) > thresh) * -1;
return hev;
}
/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
__inline signed char vp8_filter_mask(
signed char limit,
signed char flimit,
uc p3, uc p2, uc p1, uc p0, uc q0, uc q1, uc q2, uc q3)
{
signed char mask = 0;
mask |= (abs(p3 - p2) > limit) * -1;
mask |= (abs(p2 - p1) > limit) * -1;
mask |= (abs(p1 - p0) > limit) * -1;
mask |= (abs(q1 - q0) > limit) * -1;
mask |= (abs(q2 - q1) > limit) * -1;
mask |= (abs(q3 - q2) > limit) * -1;
mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > flimit * 2 + limit) * -1;
mask = ~mask;
return mask;
}
/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
__inline signed char vp8_simple_filter_mask(
signed char limit,
signed char flimit,
uc p1,
uc p0,
uc q0,
uc q1
)
{
signed char mask = (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= flimit * 2 + limit) * -1;
return mask;
}
void vp8_simple_filter(
signed char mask,
global uc *base,
int op1_off,
int op0_off,
int oq0_off,
int oq1_off
)
{
global uc *op1 = base + op1_off;
global uc *op0 = base + op0_off;
global uc *oq0 = base + oq0_off;
global uc *oq1 = base + oq1_off;
signed char vp8_filter, Filter1, Filter2;
signed char p1 = (signed char) * op1 ^ 0x80;
signed char p0 = (signed char) * op0 ^ 0x80;
signed char q0 = (signed char) * oq0 ^ 0x80;
signed char q1 = (signed char) * oq1 ^ 0x80;
signed char u;
vp8_filter = vp8_signed_char_clamp(p1 - q1);
vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (q0 - p0));
vp8_filter &= mask;
/* save bottom 3 bits so that we round one side +4 and the other +3 */
Filter1 = vp8_signed_char_clamp(vp8_filter + 4);
Filter1 >>= 3;
u = vp8_signed_char_clamp(q0 - Filter1);
*oq0 = u ^ 0x80;
Filter2 = vp8_signed_char_clamp(vp8_filter + 3);
Filter2 >>= 3;
u = vp8_signed_char_clamp(p0 + Filter2);
*op0 = u ^ 0x80;
}
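
The mask helpers in this file return all-ones (-1 as a signed char) or zero so that callers can combine them with bitwise AND. A minimal sketch of the same logic in plain C, usable for spot-checking kernel output on the host, might look as follows. The ref_ names are hypothetical and are not part of the original sources.

```c
#include <stdlib.h> /* abs */

/* Hypothetical host-side helpers mirroring the OpenCL inline functions. */

/* Saturate an int to the signed 8-bit range. */
static signed char ref_signed_char_clamp(int t)
{
    if (t < -128) t = -128;
    if (t > 127) t = 127;
    return (signed char)t;
}

/* -1 if either inner pixel pair differs by more than thresh, else 0. */
static signed char ref_hevmask(signed char thresh, unsigned char p1,
                               unsigned char p0, unsigned char q0,
                               unsigned char q1)
{
    int hev = (abs(p1 - p0) > thresh) || (abs(q1 - q0) > thresh);
    return (signed char)(hev ? -1 : 0);
}

/* -1 if the edge is smooth enough to be filtered, else 0. */
static signed char ref_simple_filter_mask(signed char limit, signed char flimit,
                                          unsigned char p1, unsigned char p0,
                                          unsigned char q0, unsigned char q1)
{
    int ok = abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= flimit * 2 + limit;
    return (signed char)(ok ? -1 : 0);
}
```

Because the masks are either all-ones or zero, an expression such as `vp8_filter &= mask` leaves the filter value untouched or clears it entirely, with no branch.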

@@ -1,457 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "../../../vpx_ports/config.h"
#include "loopfilter_cl.h"
#include "../onyxc_int.h"
#include "vpx_config.h"
#include "vp8_opencl.h"
#include "blockd_cl.h"
const char *loopFilterCompileOptions = "-Ivp8/common/opencl";
const char *loop_filter_cl_file_name = "vp8/common/opencl/loopfilter.cl";
typedef unsigned char uc;
extern void vp8_loop_filter_frame
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl
);
prototype_loopfilter_cl(vp8_loop_filter_horizontal_edge_cl);
prototype_loopfilter_cl(vp8_loop_filter_vertical_edge_cl);
prototype_loopfilter_cl(vp8_mbloop_filter_horizontal_edge_cl);
prototype_loopfilter_cl(vp8_mbloop_filter_vertical_edge_cl);
prototype_loopfilter_cl(vp8_loop_filter_simple_horizontal_edge_cl);
prototype_loopfilter_cl(vp8_loop_filter_simple_vertical_edge_cl);
/* Horizontal MB filtering */
void vp8_loop_filter_mbh_cl(
MACROBLOCKD *x,
cl_mem buf_base,
int y_off,
int u_off,
int v_off,
int y_stride,
int uv_stride,
loop_filter_info *lfi,
int simpler_lpf
)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, u_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, v_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
}
void vp8_loop_filter_mbhs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
}
/* Vertical MB Filtering */
void vp8_loop_filter_mbv_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
vp8_mbloop_filter_vertical_edge_cl(x, buf_base, u_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
vp8_mbloop_filter_vertical_edge_cl(x, buf_base, v_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
}
void vp8_loop_filter_mbvs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
}
/* Horizontal B Filtering */
void vp8_loop_filter_bh_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_horizontal_edge_cl(x, buf_base, u_off + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
vp8_loop_filter_horizontal_edge_cl(x, buf_base, v_off + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
}
void vp8_loop_filter_bhs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
}
/* Vertical B Filtering */
void vp8_loop_filter_bv_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_vertical_edge_cl(x, buf_base, u_off + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
vp8_loop_filter_vertical_edge_cl(x, buf_base, v_off + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
}
void vp8_loop_filter_bvs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
}
void vp8_init_loop_filter_cl(VP8_COMMON *cm)
{
loop_filter_info *lfi = cm->lf_info;
int sharpness_lvl = cm->sharpness_level;
int frame_type = cm->frame_type;
int i, j;
int block_inside_limit = 0;
int HEVThresh;
const int yhedge_boost = 2;
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
{
int filt_lvl = i;
if (frame_type == KEY_FRAME)
{
if (filt_lvl >= 40)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
else
{
if (filt_lvl >= 40)
HEVThresh = 3;
else if (filt_lvl >= 20)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
/* Set loop filter parameters that control sharpness. */
block_inside_limit = filt_lvl >> (sharpness_lvl > 0);
block_inside_limit = block_inside_limit >> (sharpness_lvl > 4);
if (sharpness_lvl > 0)
{
if (block_inside_limit > (9 - sharpness_lvl))
block_inside_limit = (9 - sharpness_lvl);
}
if (block_inside_limit < 1)
block_inside_limit = 1;
for (j = 0; j < 16; j++)
{
lfi[i].lim[j] = block_inside_limit;
lfi[i].mbflim[j] = filt_lvl + yhedge_boost;
lfi[i].flim[j] = filt_lvl;
lfi[i].thr[j] = HEVThresh;
}
}
}
/* Put vp8_init_loop_filter() in vp8dx_create_decompressor(). Only call vp8_frame_init_loop_filter() while decoding
* each frame. Check last_frame_type to skip the function most of the time.
*/
void vp8_frame_init_loop_filter_cl(loop_filter_info *lfi, int frame_type)
{
int HEVThresh;
int i, j;
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
{
int filt_lvl = i;
if (frame_type == KEY_FRAME)
{
if (filt_lvl >= 40)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
else
{
if (filt_lvl >= 40)
HEVThresh = 3;
else if (filt_lvl >= 20)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
for (j = 0; j < 16; j++)
{
lfi[i].thr[j] = HEVThresh;
}
}
}
//This might not need to be copied from loopfilter.c
void vp8_adjust_mb_lf_value_cl(MACROBLOCKD *mbd, int *filter_level)
{
MB_MODE_INFO *mbmi = &mbd->mode_info_context->mbmi;
if (mbd->mode_ref_lf_delta_enabled)
{
/* Apply delta for reference frame */
*filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];
/* Apply delta for mode */
if (mbmi->ref_frame == INTRA_FRAME)
{
/* Only the split mode BPRED has a further special case */
if (mbmi->mode == B_PRED)
*filter_level += mbd->mode_lf_deltas[0];
}
else
{
/* Zero motion mode */
if (mbmi->mode == ZEROMV)
*filter_level += mbd->mode_lf_deltas[1];
/* Split MB motion mode */
else if (mbmi->mode == SPLITMV)
*filter_level += mbd->mode_lf_deltas[3];
/* All other inter motion modes (Nearest, Near, New) */
else
*filter_level += mbd->mode_lf_deltas[2];
}
/* Range check */
if (*filter_level > MAX_LOOP_FILTER)
*filter_level = MAX_LOOP_FILTER;
else if (*filter_level < 0)
*filter_level = 0;
}
}
//Start of externally callable functions.
int cl_init_loop_filter() {
int err;
// Create the filter compute program from the file-defined source code
if ( cl_load_program(&cl_data.loop_filter_program, loop_filter_cl_file_name,
loopFilterCompileOptions) != CL_SUCCESS )
return VP8_CL_TRIED_BUT_FAILED;
// Create the compute kernels in the program we wish to run
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_horizontal_edge_kernel,"vp8_loop_filter_horizontal_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_vertical_edge_kernel,"vp8_loop_filter_vertical_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_mbloop_filter_horizontal_edge_kernel,"vp8_mbloop_filter_horizontal_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_mbloop_filter_vertical_edge_kernel,"vp8_mbloop_filter_vertical_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_simple_horizontal_edge_kernel,"vp8_loop_filter_simple_horizontal_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_simple_vertical_edge_kernel,"vp8_loop_filter_simple_vertical_edge_kernel");
return CL_SUCCESS;
}
void cl_destroy_loop_filter(){
if (cl_data.loop_filter_program)
clReleaseProgram(cl_data.loop_filter_program);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_horizontal_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_vertical_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_mbloop_filter_horizontal_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_mbloop_filter_vertical_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_simple_horizontal_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_simple_vertical_edge_kernel);
cl_data.loop_filter_program = NULL;
}
void vp8_loop_filter_set_baselines_cl(MACROBLOCKD *mbd, int default_filt_lvl, int *baseline_filter_level){
int alt_flt_enabled = mbd->segmentation_enabled;
int i;
if (alt_flt_enabled)
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
{
/* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
/* Delta Value */
else
{
baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0; /* Clamp to valid range */
}
}
}
else
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
baseline_filter_level[i] = default_filt_lvl;
}
}
void vp8_loop_filter_frame_cl
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl
)
{
YV12_BUFFER_CONFIG *post = cm->frame_to_show;
loop_filter_info *lfi = cm->lf_info;
FRAME_TYPE frame_type = cm->frame_type;
LOOPFILTERTYPE filter_type = cm->filter_type;
int mb_row;
int mb_col;
int baseline_filter_level[MAX_MB_SEGMENTS];
int filter_level;
int alt_flt_enabled = mbd->segmentation_enabled;
int err;
unsigned char *buf_base;
int y_off, u_off, v_off;
//unsigned char *y_ptr, *u_ptr, *v_ptr;
mbd->mode_info_context = cm->mi; /* Point at base of Mb MODE_INFO list */
/* Note the baseline filter values for each segment */
vp8_loop_filter_set_baselines_cl(mbd, default_filt_lvl, baseline_filter_level);
/* Initialize the loop filter for this frame. */
if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
vp8_init_loop_filter_cl(cm);
else if (frame_type != cm->last_frame_type)
vp8_frame_init_loop_filter_cl(lfi, frame_type);
/* Set up the buffer pointers */
buf_base = post->buffer_alloc;
y_off = post->y_buffer - buf_base;
u_off = post->u_buffer - buf_base;
v_off = post->v_buffer - buf_base;
VP8_CL_SET_BUF(mbd->cl_commands, post->buffer_mem, post->buffer_size, post->buffer_alloc,
vp8_loop_filter_frame(cm,mbd,default_filt_lvl),);
/* vp8_filter each macro block */
for (mb_row = 0; mb_row < cm->mb_rows; mb_row++)
{
for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
{
int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
filter_level = baseline_filter_level[Segment];
/* Distance of Mb to the various image edges.
* These are specified to 8th pel as they are always compared to values
* that are in 1/8th pel units. Apply any context driven MB level
* adjustment
*/
filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
if (filter_level)
{
if (mb_col > 0){
if (filter_type == NORMAL_LOOPFILTER)
vp8_loop_filter_mbv_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
else
vp8_loop_filter_mbvs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
}
if (mbd->mode_info_context->mbmi.dc_diff > 0){
if (filter_type == NORMAL_LOOPFILTER)
vp8_loop_filter_bv_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
else
vp8_loop_filter_bvs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
}
/* don't apply across umv border */
if (mb_row > 0){
if (filter_type == NORMAL_LOOPFILTER)
vp8_loop_filter_mbh_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
else
vp8_loop_filter_mbhs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
}
if (mbd->mode_info_context->mbmi.dc_diff > 0){
if (filter_type == NORMAL_LOOPFILTER)
vp8_loop_filter_bh_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
else
vp8_loop_filter_bhs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
}
}
y_off += 16;
u_off += 8;
v_off += 8;
mbd->mode_info_context++; /* step to next MB */
}
y_off += post->y_stride * 16 - post->y_width;
u_off += post->uv_stride * 8 - post->uv_width;
v_off += post->uv_stride * 8 - post->uv_width;
mbd->mode_info_context++; /* Skip border mb */
}
//Retrieve buffer contents
err = clEnqueueReadBuffer(mbd->cl_commands, post->buffer_mem, CL_FALSE, 0, post->buffer_size, post->buffer_alloc, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS(mbd->cl_commands, err != CL_SUCCESS,
"Error: Failed to read loop filter output!\n",
,
);
VP8_CL_FINISH(mbd->cl_commands);
}
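
The per-level table built by vp8_init_loop_filter_cl() reduces to two small pure functions: the high-edge-variance threshold, which depends on frame type and filter level, and the inside-block limit, which depends on filter level and sharpness. The sketch below restates them in isolation, assuming the same constants as the function above. ref_hev_threshold and ref_block_inside_limit are hypothetical names, not part of the original sources.

```c
/* Hypothetical standalone versions of the table computations. */

/* HEV threshold stored in lfi[level].thr[]. */
static int ref_hev_threshold(int is_key_frame, int filt_lvl)
{
    if (is_key_frame) {
        if (filt_lvl >= 40) return 2;
        if (filt_lvl >= 15) return 1;
        return 0;
    }
    if (filt_lvl >= 40) return 3;
    if (filt_lvl >= 20) return 2;
    if (filt_lvl >= 15) return 1;
    return 0;
}

/* Inside-block limit stored in lfi[level].lim[]. */
static int ref_block_inside_limit(int filt_lvl, int sharpness_lvl)
{
    /* Halve once for any sharpness, and once more above sharpness 4. */
    int limit = filt_lvl >> (sharpness_lvl > 0);
    limit >>= (sharpness_lvl > 4);

    /* Cap by sharpness, then enforce a floor of 1. */
    if (sharpness_lvl > 0 && limit > 9 - sharpness_lvl)
        limit = 9 - sharpness_lvl;
    if (limit < 1)
        limit = 1;
    return limit;
}
```

Only the threshold depends on frame type, which is consistent with vp8_frame_init_loop_filter_cl() refreshing only the thr entries when the frame type changes.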

@@ -1,48 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef loopfilter_cl_h
#define loopfilter_cl_h
#include "../../../vpx_ports/mem.h"
#include "../onyxc_int.h"
#include "blockd_cl.h"
#include "../loopfilter.h"
#define prototype_loopfilter_cl(sym) \
void sym(MACROBLOCKD*, cl_mem src_base, int src_offset, \
int pitch, const signed char *flimit, \
const signed char *limit, const signed char *thresh, int count, int block_cnt)
#define prototype_loopfilter_block_cl(sym) \
void sym(MACROBLOCKD*, unsigned char *y, unsigned char *u, unsigned char *v,\
int ystride, int uv_stride, loop_filter_info *lfi, int simpler)
extern void vp8_loop_filter_frame_cl
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl
);
extern prototype_loopfilter_block_cl(vp8_lf_normal_mb_v_cl);
extern prototype_loopfilter_block_cl(vp8_lf_normal_b_v_cl);
extern prototype_loopfilter_block_cl(vp8_lf_normal_mb_h_cl);
extern prototype_loopfilter_block_cl(vp8_lf_normal_b_h_cl);
extern prototype_loopfilter_block_cl(vp8_lf_simple_mb_v_cl);
extern prototype_loopfilter_block_cl(vp8_lf_simple_b_v_cl);
extern prototype_loopfilter_block_cl(vp8_lf_simple_mb_h_cl);
extern prototype_loopfilter_block_cl(vp8_lf_simple_b_h_cl);
typedef prototype_loopfilter_block_cl((*vp8_lf_block_cl_fn_t));
#endif

@@ -1,187 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
#include <stdio.h>
#include "vpx_ports/config.h"
#include "vp8_opencl.h"
#include "blockd_cl.h"
//#include "loopfilter_cl.h"
//#include "../onyxc_int.h"
typedef unsigned char uc;
static void vp8_loop_filter_cl_run(
cl_command_queue cq,
cl_kernel kernel,
cl_mem buf_mem,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
){
size_t global[] = {count,block_cnt};
int err;
cl_mem flimit_mem;
cl_mem limit_mem;
cl_mem thresh_mem;
VP8_CL_CREATE_BUF(cq, flimit_mem, , sizeof(uc)*16, flimit,, );
VP8_CL_CREATE_BUF(cq, limit_mem, , sizeof(uc)*16, limit,, );
VP8_CL_CREATE_BUF(cq, thresh_mem, , sizeof(uc)*16, thresh,, );
err = 0;
err = clSetKernelArg(kernel, 0, sizeof (cl_mem), &buf_mem);
err |= clSetKernelArg(kernel, 1, sizeof (cl_int), &s_off);
err |= clSetKernelArg(kernel, 2, sizeof (cl_int), &p);
err |= clSetKernelArg(kernel, 3, sizeof (cl_mem), &flimit_mem);
err |= clSetKernelArg(kernel, 4, sizeof (cl_mem), &limit_mem);
err |= clSetKernelArg(kernel, 5, sizeof (cl_mem), &thresh_mem);
err |= clSetKernelArg(kernel, 6, sizeof (cl_int), &block_cnt);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(cq, kernel, 2, NULL, global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
clReleaseMemObject(flimit_mem);
clReleaseMemObject(limit_mem);
clReleaseMemObject(thresh_mem);
VP8_CL_FINISH(cq);
}
void vp8_loop_filter_horizontal_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p, /* pitch */
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_loop_filter_horizontal_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_loop_filter_vertical_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_loop_filter_vertical_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_mbloop_filter_horizontal_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_mbloop_filter_horizontal_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_mbloop_filter_vertical_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_mbloop_filter_vertical_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_loop_filter_simple_horizontal_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_loop_filter_simple_horizontal_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_loop_filter_simple_vertical_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_loop_filter_simple_vertical_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}

@@ -1,41 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_ports/config.h"
#include "../subpixel.h"
#include "subpixel_cl.h"
#include "../onyxc_int.h"
#include "vp8_opencl.h"
#if HAVE_DLOPEN
#include "dynamic_cl.h"
#endif
void vp8_arch_opencl_common_init(VP8_COMMON *ctx)
{
#if HAVE_DLOPEN
#if WIN32 //Windows .dll has no lib prefix and no extension
cl_loaded = load_cl("OpenCL");
#else //But *nix needs full name
cl_loaded = load_cl("libOpenCL.so");
#endif
if (cl_loaded == CL_SUCCESS)
cl_initialized = cl_common_init();
else
cl_initialized = VP8_CL_TRIED_BUT_FAILED;
#else //!HAVE_DLOPEN (e.g. Apple)
cl_initialized = cl_common_init();
#endif
}

@@ -1,641 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
//for the decoder, all subpixel prediction is done in this file.
//
//Need to determine some sort of mechanism for easily determining SIXTAP/BILINEAR
//and what arguments to feed into the kernels. These kernels SHOULD be 2-pass,
//and ideally there'd be a data structure that determined what static arguments
//to pass in.
//
//Also, the only external functions being called here are the subpixel prediction
//functions. Hopefully this means no worrying about when to copy data back/forth.
#include "../../../vpx_ports/config.h"
//#include "../recon.h"
#include "../subpixel.h"
//#include "../blockd.h"
//#include "../reconinter.h"
#if CONFIG_RUNTIME_CPU_DETECT
//#include "../onyxc_int.h"
#endif
#include "vp8_opencl.h"
#include "filter_cl.h"
#include "reconinter_cl.h"
#include "blockd_cl.h"
#include <stdio.h>
/* use this define on systems where unaligned int reads and writes are
* not allowed, i.e. ARM architectures
*/
/*#define MUST_BE_ALIGNED*/
static const int bbb[4] = {0, 2, 8, 10};
static void vp8_memcpy(
unsigned char *src_base,
int src_offset,
int src_stride,
unsigned char *dst_base,
int dst_offset,
int dst_stride,
int num_bytes,
int num_iter
){
int i,r;
unsigned char *src = &src_base[src_offset];
unsigned char *dst = &dst_base[dst_offset];
src_offset = dst_offset = 0;
for (r = 0; r < num_iter; r++){
for (i = 0; i < num_bytes; i++){
src_offset = r*src_stride + i;
dst_offset = r*dst_stride + i;
dst[dst_offset] = src[src_offset];
}
}
}
static void vp8_copy_mem_cl(
cl_command_queue cq,
cl_mem src_mem,
int *src_offsets,
int src_stride,
cl_mem dst_mem,
int *dst_offsets,
int dst_stride,
int num_bytes,
int num_iter,
int num_blocks
){
int err,block;
#if MEM_COPY_KERNEL
size_t global[3] = {num_bytes, num_iter, num_blocks};
size_t local[3];
local[0] = global[0];
local[1] = global[1];
local[2] = global[2];
err = clSetKernelArg(cl_data.vp8_memcpy_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 2, sizeof (int), &src_stride);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 3, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 5, sizeof (int), &dst_stride);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 6, sizeof (int), &num_bytes);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 7, sizeof (int), &num_iter);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
return,
);
for (block = 0; block < num_blocks; block++){
/* Set kernel arguments */
err = clSetKernelArg(cl_data.vp8_memcpy_kernel, 1, sizeof (int), &src_offsets[block]);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 4, sizeof (int), &dst_offsets[block]);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
return,
);
/* Execute the kernel */
if (num_bytes * num_iter > cl_data.vp8_memcpy_kernel_size){
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_memcpy_kernel, 2, NULL, global, NULL , 0, NULL, NULL);
} else {
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_memcpy_kernel, 2, NULL, global, local , 0, NULL, NULL);
}
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
return,
);
}
#else
int iter;
for (block=0; block < num_blocks; block++){
for (iter = 0; iter < num_iter; iter++){
err = clEnqueueCopyBuffer(cq, src_mem, dst_mem,
src_offsets[block]+iter*src_stride,
dst_offsets[block]+iter*dst_stride,
num_bytes, 0, NULL, NULL
);
VP8_CL_CHECK_SUCCESS(cq, err != CL_SUCCESS, "Error copying between buffers\n",
,
);
}
}
#endif
}
static void vp8_build_inter_predictors_b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
{
unsigned char *ptr_base = *(d->base_pre);
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
vp8_subpix_cl_fn_t sppf;
int pre_dist = *d->base_pre - x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int pre_off = pre_dist+ptr_offset;
if (d->sixtap_filter == CL_TRUE)
sppf = vp8_sixtap_predict4x4_cl;
else
sppf = vp8_bilinear_predict4x4_cl;
//ptr_base a.k.a. d->base_pre is the start of the
//Macroblock's y_buffer, u_buffer, or v_buffer
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
sppf(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
}
else
{
vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride,d->cl_predictor_mem, &d->predictor_offset,pitch,4,4,1);
}
}
static void vp8_build_inter_predictors4b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
{
unsigned char *ptr_base = *(d->base_pre);
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
int pre_dist = *d->base_pre - x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int pre_off = pre_dist + ptr_offset;
//If either MV component has a fractional part (low 3 bits nonzero), need to do subpixel prediction
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
if (d->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
else
vp8_bilinear_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
}
//Otherwise copy memory directly from src to dest
else
{
vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride, d->cl_predictor_mem, &d->predictor_offset, pitch, 8, 8, 1);
}
}
static void vp8_build_inter_predictors2b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
{
unsigned char *ptr_base = *(d->base_pre);
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
int pre_dist = *d->base_pre - x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int pre_off = pre_dist+ptr_offset;
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
if (d->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x4_cl(d->cl_commands,ptr_base,pre_mem,pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
else
vp8_bilinear_predict8x4_cl(d->cl_commands,ptr_base,pre_mem,pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
}
else
{
vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride, d->cl_predictor_mem, &d->predictor_offset, pitch, 8, 4, 1);
}
}
void vp8_build_inter_predictors_mbuv_cl(MACROBLOCKD *x)
{
int i;
vp8_cl_mb_prep(x, PREDICTOR|PRE_BUF);
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->cl_commands);
#endif
if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
x->mode_info_context->mbmi.mode != SPLITMV)
{
unsigned char *pred_base = x->predictor;
int upred_offset = 256;
int vpred_offset = 320;
int mv_row = x->block[16].bmi.mv.as_mv.row;
int mv_col = x->block[16].bmi.mv.as_mv.col;
int offset;
unsigned char *pre_base = x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int upre_off = x->pre.u_buffer - pre_base;
int vpre_off = x->pre.v_buffer - pre_base;
int pre_stride = x->block[16].pre_stride;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
if (cl_initialized == CL_SUCCESS && x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict8x8_cl(x->block[16].cl_commands,pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
vp8_sixtap_predict8x8_cl(x->block[20].cl_commands,pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
}
else{
vp8_bilinear_predict8x8_cl(x->block[16].cl_commands,pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
vp8_bilinear_predict8x8_cl(x->block[20].cl_commands,pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
}
}
else
{
int pre_offsets[2] = {upre_off+offset, vpre_off+offset};
int pred_offsets[2] = {upred_offset,vpred_offset};
vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, pre_offsets, pre_stride, x->cl_predictor_mem, pred_offsets, 8, 8, 8, 2);
}
}
else
{
// Can probably batch these operations as well, but this path is not tested in the decoder
// (or at least not by the test videos I've been using).
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
vp8_build_inter_predictors2b_cl(x, d0, 8);
else
{
vp8_build_inter_predictors_b_cl(x, d0, 8);
vp8_build_inter_predictors_b_cl(x, d1, 8);
}
}
}
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->block[0].cl_commands);
VP8_CL_FINISH(x->block[16].cl_commands);
VP8_CL_FINISH(x->block[20].cl_commands);
#endif
vp8_cl_mb_finish(x, PREDICTOR);
}
void vp8_build_inter_predictors_mb_cl(MACROBLOCKD *x)
{
//If CL is running in encoder, need to call following before proceeding.
//vp8_cl_mb_prep(x, PRE_BUF);
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->cl_commands);
#endif
if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
x->mode_info_context->mbmi.mode != SPLITMV)
{
int offset;
unsigned char *pred_base = x->predictor;
int upred_offset = 256;
int vpred_offset = 320;
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
int pre_stride = x->block[0].pre_stride;
unsigned char *pre_base = x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int ypre_off = x->pre.y_buffer - pre_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
int upre_off = x->pre.u_buffer - pre_base;
int vpre_off = x->pre.v_buffer - pre_base;
if ((mv_row | mv_col) & 7)
{
if (cl_initialized == CL_SUCCESS && x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, 0, 16);
}
else
vp8_bilinear_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, 0, 16);
}
else
{
//16x16 copy
int pred_off = 0;
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &ypre_off, pre_stride, x->cl_predictor_mem, &pred_off, 16, 16, 16, 1);
}
mv_row = x->block[16].bmi.mv.as_mv.row;
mv_col = x->block[16].bmi.mv.as_mv.col;
pre_stride >>= 1;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
if (x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
vp8_sixtap_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
}
else {
vp8_bilinear_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
vp8_bilinear_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
}
}
else
{
int pre_off = upre_off + offset;
vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, &pre_off, pre_stride, x->cl_predictor_mem, &upred_offset, 8, 8, 8, 1);
pre_off = vpre_off + offset;
vp8_copy_mem_cl(x->block[20].cl_commands, pre_mem, &pre_off, pre_stride, x->cl_predictor_mem, &vpred_offset, 8, 8, 8, 1);
}
}
else
{
int i;
if (x->mode_info_context->mbmi.partitioning < 3)
{
for (i = 0; i < 4; i++)
{
BLOCKD *d = &x->block[bbb[i]];
vp8_build_inter_predictors4b_cl(x, d, 16);
}
}
else
{
/* This loop can be done in any order... No dependencies.*/
/* Also, d0/d1 can be decoded simultaneously */
for (i = 0; i < 16; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
vp8_build_inter_predictors2b_cl(x, d0, 16);
else
{
vp8_build_inter_predictors_b_cl(x, d0, 16);
vp8_build_inter_predictors_b_cl(x, d1, 16);
}
}
}
/* Another case of re-orderable/batchable loop */
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
vp8_build_inter_predictors2b_cl(x, d0, 8);
else
{
vp8_build_inter_predictors_b_cl(x, d0, 8);
vp8_build_inter_predictors_b_cl(x, d1, 8);
}
}
}
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->block[0].cl_commands);
VP8_CL_FINISH(x->block[16].cl_commands);
VP8_CL_FINISH(x->block[20].cl_commands);
#endif
vp8_cl_mb_finish(x, PREDICTOR);
}
/* The following functions are written for skip_recon_mb() to call. Since there is no recon in this
* situation, we can write the result directly to dst buffer instead of writing it to predictor
* buffer and then copying it to dst buffer.
*/
static void vp8_build_inter_predictors_b_s_cl(MACROBLOCKD *x, BLOCKD *d, int dst_offset)
{
unsigned char *ptr_base = *(d->base_pre);
int dst_stride = d->dst_stride;
int pre_stride = d->pre_stride;
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
vp8_subpix_cl_fn_t sppf;
int pre_dist = *d->base_pre - x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
cl_mem dst_mem = x->dst.buffer_mem;
if (d->sixtap_filter == CL_TRUE){
sppf = vp8_sixtap_predict4x4_cl;
} else
sppf = vp8_bilinear_predict4x4_cl;
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
sppf(d->cl_commands, ptr_base, pre_mem, pre_dist+ptr_offset, pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, NULL, dst_mem, dst_offset, dst_stride);
}
else
{
int pre_off = pre_dist+ptr_offset;
vp8_copy_mem_cl(d->cl_commands, pre_mem,&pre_off,pre_stride, dst_mem, &dst_offset,dst_stride,4,4,1);
}
}
void vp8_build_inter_predictors_mb_s_cl(MACROBLOCKD *x)
{
cl_mem dst_mem = NULL;
cl_mem pre_mem = x->pre.buffer_mem;
unsigned char *dst_base = x->dst.buffer_alloc;
int ydst_off = x->dst.y_buffer - dst_base;
int udst_off = x->dst.u_buffer - dst_base;
int vdst_off = x->dst.v_buffer - dst_base;
dst_mem = x->dst.buffer_mem;
vp8_cl_mb_prep(x, DST_BUF);
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->cl_commands);
#endif
if (x->mode_info_context->mbmi.mode != SPLITMV)
{
int offset;
unsigned char *pre_base = x->pre.buffer_alloc;
int ypre_off = x->pre.y_buffer - pre_base;
int upre_off = x->pre.u_buffer - pre_base;
int vpre_off = x->pre.v_buffer - pre_base;
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
int pre_stride = x->dst.y_stride;
int ptr_offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
if (x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off+ptr_offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
}
else
vp8_bilinear_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off+ptr_offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
}
else
{
int pre_off = ypre_off+ptr_offset;
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 16, 16, 1);
}
mv_row = x->block[16].bmi.mv.as_mv.row;
mv_col = x->block[16].bmi.mv.as_mv.col;
pre_stride >>= 1;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
if (x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, udst_off, x->dst.uv_stride);
vp8_sixtap_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, vdst_off, x->dst.uv_stride);
} else {
vp8_bilinear_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, udst_off, x->dst.uv_stride);
vp8_bilinear_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, vdst_off, x->dst.uv_stride);
}
}
else
{
int pre_offsets[2] = {upre_off+offset, vpre_off+offset};
int dst_offsets[2] = {udst_off,vdst_off};
vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, pre_offsets, pre_stride, dst_mem, dst_offsets, x->dst.uv_stride, 8, 8, 2);
}
}
else
{
/* note: this whole ELSE part is never executed, so there is no way to test the correctness of this modification.
* If something turns out to be wrong later, go back to the logic in build_inter_predictors_mb.
*
* ACW: Not sure who the above comment belongs to, but it is
* accurate for the decoder. Verified by a reverse trace of the source.
*/
int i;
if (x->mode_info_context->mbmi.partitioning < 3)
{
for (i = 0; i < 4; i++)
{
BLOCKD *d = &x->block[bbb[i]];
{
unsigned char *ptr_base = *(d->base_pre);
int pre_off = ptr_base - x->pre.buffer_alloc;
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
pre_off += ptr_offset;
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
if (x->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
else
vp8_bilinear_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
}
else
{
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, d->pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 8, 8, 1);
}
}
}
}
else
{
for (i = 0; i < 16; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
{
/*vp8_build_inter_predictors2b(x, d0, 16);*/
unsigned char *ptr_base = *(d0->base_pre);
int pre_off = ptr_base - x->pre.buffer_alloc;
int ptr_offset = d0->pre + (d0->bmi.mv.as_mv.row >> 3) * d0->pre_stride + (d0->bmi.mv.as_mv.col >> 3);
pre_off += ptr_offset;
if ( (d0->bmi.mv.as_mv.row | d0->bmi.mv.as_mv.col) & 7)
{
if (d0->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride, d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
else
vp8_bilinear_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem,pre_off, d0->pre_stride, d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
}
else
{
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, d0->pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 8, 4, 1);
}
}
else
{
vp8_build_inter_predictors_b_s_cl(x,d0, ydst_off);
vp8_build_inter_predictors_b_s_cl(x,d1, ydst_off);
}
}
}
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
{
/*vp8_build_inter_predictors2b(x, d0, 8);*/
unsigned char *ptr_base = *(d0->base_pre);
int ptr_offset = d0->pre + (d0->bmi.mv.as_mv.row >> 3) * d0->pre_stride + (d0->bmi.mv.as_mv.col >> 3);
int pre_off = ptr_base - x->pre.buffer_alloc + ptr_offset;
if ( (d0->bmi.mv.as_mv.row | d0->bmi.mv.as_mv.col) & 7)
{
if (d0->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride,
d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7,
dst_base, dst_mem, ydst_off, x->dst.uv_stride);
else
vp8_bilinear_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride,
d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7,
dst_base, dst_mem, ydst_off, x->dst.uv_stride);
}
else
{
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off,
d0->pre_stride, dst_mem, &ydst_off, x->dst.uv_stride, 8, 4, 1);
}
}
else
{
vp8_build_inter_predictors_b_s_cl(x,d0, ydst_off);
vp8_build_inter_predictors_b_s_cl(x,d1, ydst_off);
}
} //end for
}
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->block[0].cl_commands);
VP8_CL_FINISH(x->block[16].cl_commands);
VP8_CL_FINISH(x->block[20].cl_commands);
#endif
vp8_cl_mb_finish(x, DST_BUF);
}

@@ -1,46 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef SUBPIXEL_CL_H
#define SUBPIXEL_CL_H
#include "../blockd.h"
/* Note:
*
* This platform is commonly built for runtime CPU detection. If you modify
* any of the function mappings present in this file, be sure to also update
* them in the function pointer initialization code
*/
#define prototype_subpixel_predict_cl(sym) \
void sym(cl_command_queue cq, unsigned char *src_base, cl_mem src_mem, int src_offset, \
int src_pitch, int xofst, int yofst, \
unsigned char *dst_base, cl_mem dst_mem, int dst_offset, int dst_pitch)
extern prototype_subpixel_predict_cl(vp8_sixtap_predict16x16_cl);
extern prototype_subpixel_predict_cl(vp8_sixtap_predict8x8_cl);
extern prototype_subpixel_predict_cl(vp8_sixtap_predict8x4_cl);
extern prototype_subpixel_predict_cl(vp8_sixtap_predict4x4_cl);
extern prototype_subpixel_predict_cl(vp8_bilinear_predict16x16_cl);
extern prototype_subpixel_predict_cl(vp8_bilinear_predict8x8_cl);
extern prototype_subpixel_predict_cl(vp8_bilinear_predict8x4_cl);
extern prototype_subpixel_predict_cl(vp8_bilinear_predict4x4_cl);
typedef prototype_subpixel_predict_cl((*vp8_subpix_cl_fn_t));
//typedef enum
//{
// SIXTAP = 0,
// BILINEAR = 1
//} SUBPIX_TYPE;
#endif

@@ -1,342 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "vp8_opencl.h"
int cl_initialized = VP8_CL_NOT_INITIALIZED;
VP8_COMMON_CL cl_data;
//Initialization functions for various CL programs.
extern int cl_init_filter();
extern int cl_init_idct();
extern int cl_init_loop_filter();
//Common CL destructors
extern void cl_destroy_loop_filter();
extern void cl_destroy_filter();
extern void cl_destroy_idct();
//Destructors for encoder/decoder-specific bits
extern void cl_decode_destroy();
extern void cl_encode_destroy();
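/**
* Release all OpenCL resources and record a new initialization status.
* Does nothing unless cl_initialized is CL_SUCCESS.
*
* @param cq Command queue to drain and release (may be NULL)
* @param new_status Value assigned to cl_initialized once teardown completes
*/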
void cl_destroy(cl_command_queue cq, int new_status) {
if (cl_initialized != CL_SUCCESS)
return;
//Wait on any pending operations to complete... frees up all of our pointers
if (cq != NULL)
clFinish(cq);
#if ENABLE_CL_SUBPIXEL
//Release the objects that we've allocated on the GPU
cl_destroy_filter();
#endif
#if ENABLE_CL_IDCT_DEQUANT
cl_destroy_idct();
#if CONFIG_VP8_DECODER
if (cl_data.cl_decode_initialized == CL_SUCCESS)
cl_decode_destroy();
#endif
#endif
#if ENABLE_CL_LOOPFILTER
cl_destroy_loop_filter();
#endif
#if CONFIG_VP8_ENCODER
//placeholder for if/when encoder CL gets implemented
#endif
if (cq){
clReleaseCommandQueue(cq);
}
if (cl_data.context){
clReleaseContext(cl_data.context);
cl_data.context = NULL;
}
cl_initialized = new_status;
return;
}
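/**
* Query the type of an OpenCL device.
*
* @param dev Device to query
* @return The device's cl_device_type bitfield, or 0 if the query fails
*/
cl_device_type device_type(cl_device_id dev){
cl_device_type type;
int err;
err = clGetDeviceInfo(dev, CL_DEVICE_TYPE, sizeof(type),&type,NULL);
if (err != CL_SUCCESS)
return 0; //CL_INVALID_DEVICE is an error code, not a valid cl_device_type
return type;
}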
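/**
* Select an OpenCL device, create the compute context, and initialize the
* enabled CL programs.
*
* @return CL_SUCCESS on success; the current status if initialization was
* already attempted; VP8_CL_TRIED_BUT_FAILED if no usable platform, device,
* or context is available; otherwise the error from a failing cl_init_*() call
*/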
int cl_common_init() {
int err,i,dev;
cl_platform_id platform_ids[MAX_NUM_PLATFORMS];
cl_uint num_found, num_devices;
cl_device_id devices[MAX_NUM_DEVICES];
//Don't allow multiple CL contexts..
if (cl_initialized != VP8_CL_NOT_INITIALIZED)
return cl_initialized;
// Connect to a compute device
err = clGetPlatformIDs(MAX_NUM_PLATFORMS, platform_ids, &num_found);
if (err != CL_SUCCESS) {
fprintf(stderr, "Couldn't query platform IDs\n");
return VP8_CL_TRIED_BUT_FAILED;
}
if (num_found == 0) {
fprintf(stderr, "No platforms found\n");
return VP8_CL_TRIED_BUT_FAILED;
}
//printf("Enumerating %d platform(s)\n", num_found);
//Enumerate the platforms found
for (i = 0; i < num_found; i++){
char buf[2048];
size_t len;
err = clGetPlatformInfo( platform_ids[i], CL_PLATFORM_VENDOR, sizeof(buf), buf, &len);
if (err != CL_SUCCESS){
fprintf(stderr, "Error retrieving platform vendor for platform %d",i);
continue;
}
//printf("Platform %d: %s\n",i,buf);
//If you need to force a platform (e.g. CPU-only testing), uncomment this
//if (strstr(buf,"NVIDIA"))
// continue;
//Try to find a valid compute device
//Favor the GPU, but fall back to any other available device if necessary
#ifdef __APPLE__
printf("Apple system. Running CL as CPU-only for now...\n");
err = clGetDeviceIDs(platform_ids[i], CL_DEVICE_TYPE_CPU, MAX_NUM_DEVICES, devices, &num_devices);
#else
err = clGetDeviceIDs(platform_ids[i], CL_DEVICE_TYPE_ALL, MAX_NUM_DEVICES, devices, &num_devices);
#endif //__APPLE__
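if (err != CL_SUCCESS)
continue; //No matching devices on this platform; num_devices is not valid
//clGetDeviceIDs reports every matching device, which can exceed the array size
if (num_devices > MAX_NUM_DEVICES)
num_devices = MAX_NUM_DEVICES;
//printf("found %d devices\n", num_devices);
cl_data.device_id = NULL;
for( dev = 0; dev < (int)num_devices; dev++ ){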
char ext[2048];
//Get info for this device.
err = clGetDeviceInfo(devices[dev], CL_DEVICE_EXTENSIONS,
sizeof(ext),ext,NULL);
VP8_CL_CHECK_SUCCESS(NULL,err != CL_SUCCESS,
"Error retrieving device extension list",continue, 0);
//printf("Device %d supports: %s\n",dev,ext);
//The kernels in VP8 require byte-addressable stores, which is an
//extension. It's required in OpenCL 1.1, but not all devices
//support it.
if (strstr(ext,"cl_khr_byte_addressable_store")){
//We found a valid device, so use it. But if we find a GPU
//(maybe this is one), prefer that.
cl_data.device_id = devices[dev];
if ( device_type(devices[dev]) == CL_DEVICE_TYPE_GPU ){
//printf("Device %d is a GPU\n",dev);
break;
}
}
}
//If we've found a usable GPU, stop looking.
if (cl_data.device_id != NULL && device_type(cl_data.device_id) == CL_DEVICE_TYPE_GPU )
break;
}
if (cl_data.device_id == NULL){
printf("Error: Failed to find a valid OpenCL device. Using CPU paths\n");
return VP8_CL_TRIED_BUT_FAILED;
}
// Create the compute context
cl_data.context = clCreateContext(0, 1, &cl_data.device_id, NULL, NULL, &err);
if (!cl_data.context) {
printf("Error: Failed to create a compute context!\n");
return VP8_CL_TRIED_BUT_FAILED;
}
//Initialize programs to NULL.
//This also lets later code detect whether they have been initialized.
cl_data.filter_program = NULL;
cl_data.idct_program = NULL;
cl_data.loop_filter_program = NULL;
#if ENABLE_CL_SUBPIXEL
err = cl_init_filter();
if (err != CL_SUCCESS)
return err;
#endif
#if ENABLE_CL_IDCT_DEQUANT
err = cl_init_idct();
if (err != CL_SUCCESS)
return err;
#endif
#if ENABLE_CL_LOOPFILTER
err = cl_init_loop_filter();
if (err != CL_SUCCESS)
return err;
#endif
return CL_SUCCESS;
}
char *cl_read_file(const char* file_name) {
long pos;
char *bytes;
size_t amt_read;
FILE *f;
f = fopen(file_name, "rb");
if (f == NULL) {
char *fullpath;
//printf("Couldn't find %s\n", file_name);
//Generate a file path for the CL sources using the library install dir
fullpath = malloc(strlen(vpx_codec_lib_dir()) + strlen(file_name) + 2);
if (fullpath == NULL) {
return NULL;
}
strcpy(fullpath, vpx_codec_lib_dir());
strcat(fullpath, "/"); //Will need to be changed for MSVS
strcat(fullpath, file_name);
//printf("Looking in %s\n", fullpath);
f = fopen(fullpath, "rb");
if (f == NULL) {
fprintf(stderr,"Couldn't find CL source at %s or %s\n", file_name, fullpath);
free(fullpath);
return NULL;
}
//printf("Found cl source at %s\n", fullpath);
free(fullpath);
} else {
//printf("Found cl source at %s\n", file_name);
}
fseek(f, 0, SEEK_END);
pos = ftell(f);
fseek(f, 0, SEEK_SET);
if (pos <= 0) { //ftell failed or the file is empty
fclose(f);
return NULL;
}
bytes = malloc(pos+1);
if (bytes == NULL) {
fclose(f);
return NULL;
}
amt_read = fread(bytes, pos, 1, f);
if (amt_read != 1) {
free(bytes);
fclose(f);
return NULL;
}
bytes[pos] = '\0'; //null terminate the source string
fclose(f);
return bytes;
}
void show_build_log(cl_program *prog_ref){
size_t len;
char *buffer;
int err = clGetProgramBuildInfo(*prog_ref, cl_data.device_id, CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
if (err != CL_SUCCESS){
printf("Error: Could not get length of CL build log\n");
return; //len is not valid, so there is nothing to allocate
}
buffer = (char*) malloc(len);
if (buffer == NULL) {
printf("Error: Couldn't allocate compile output buffer memory\n");
return;
}
err = clGetProgramBuildInfo(*prog_ref, cl_data.device_id, CL_PROGRAM_BUILD_LOG, len, buffer, NULL);
if (err != CL_SUCCESS) {
printf("Error: Could not get CL build log\n");
} else {
printf("Compile output: %s\n", buffer);
}
free(buffer);
}
int cl_load_program(cl_program *prog_ref, const char *file_name, const char *opts) {
int err;
char *kernel_src = cl_read_file(file_name);
*prog_ref = NULL;
if (kernel_src != NULL) {
*prog_ref = clCreateProgramWithSource(cl_data.context, 1, (const char**)&kernel_src, NULL, &err);
free(kernel_src);
} else {
cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
printf("Couldn't find OpenCL source files. \nUsing software path.\n");
return VP8_CL_TRIED_BUT_FAILED;
}
if (*prog_ref == NULL) {
printf("Error: Couldn't create program\n");
return VP8_CL_TRIED_BUT_FAILED;
}
if (err != CL_SUCCESS) {
printf("Error creating program: %d\n", err);
}
/* Build the program executable */
err = clBuildProgram(*prog_ref, 0, NULL, opts, NULL, NULL);
if (err != CL_SUCCESS) {
printf("Error: Failed to build program executable for %s!\n", file_name);
show_build_log(prog_ref);
return VP8_CL_TRIED_BUT_FAILED;
}
return CL_SUCCESS;
}
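
The device-selection policy in cl_common_init() above can be sketched without the OpenCL runtime. This is a minimal model under stated assumptions, not code from this diff: `device_info` and `pick_device` are hypothetical names, and the struct stands in for what clGetDeviceInfo() reports. It mirrors the loop as written: every device advertising cl_khr_byte_addressable_store becomes the current choice, and the scan stops at the first such GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-device data cl_common_init() queries. */
typedef struct {
    const char *extensions; /* space-separated extension list */
    int is_gpu;             /* nonzero if the device reports a GPU type */
} device_info;

/* Returns the index of the chosen device, or -1 if none is usable. */
static int pick_device(const device_info *devs, size_t count)
{
    int chosen = -1;
    size_t i;

    assert(devs != NULL || count == 0);
    for (i = 0; i < count; i++) {
        /* The kernels need byte-addressable stores; skip devices without them. */
        if (strstr(devs[i].extensions, "cl_khr_byte_addressable_store") == NULL)
            continue;
        chosen = (int)i;   /* usable device: remember it */
        if (devs[i].is_gpu)
            break;         /* a usable GPU ends the search */
    }
    return chosen;
}
```

One consequence of this shape is that, when no GPU qualifies, the last usable device wins rather than the first.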


@@ -1,192 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef VP8_OPENCL_H
#define VP8_OPENCL_H
#ifdef __cplusplus
extern "C" {
#endif
#include "../../../vpx_config.h"
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif
#if HAVE_DLOPEN
#include "dynamic_cl.h"
#endif
#define ENABLE_CL_IDCT_DEQUANT 0
#define ENABLE_CL_SUBPIXEL 1
#define TWO_PASS_SIXTAP 0
#define MEM_COPY_KERNEL 1
#define ONE_CQ_PER_MB 1 //Value of 0 is racy... still experimental.
#define ENABLE_CL_LOOPFILTER 0
extern char *cl_read_file(const char* file_name);
extern int cl_common_init(void);
extern void cl_destroy(cl_command_queue cq, int new_status);
extern int cl_load_program(cl_program *prog_ref, const char *file_name, const char *opts);
#define MAX_NUM_PLATFORMS 4
#define MAX_NUM_DEVICES 10
#define VP8_CL_TRIED_BUT_FAILED 1
#define VP8_CL_NOT_INITIALIZED (-1)
extern int cl_initialized;
extern const char *vpx_codec_lib_dir(void);
#define VP8_CL_FINISH(cq) \
if (cl_initialized == CL_SUCCESS){ \
/* Wait for kernels to finish. */ \
clFinish(cq); \
}
#define VP8_CL_BARRIER(cq) \
if (cl_initialized == CL_SUCCESS){ \
/* Insert a barrier into the command queue. */ \
clEnqueueBarrier(cq); \
}
#define VP8_CL_CHECK_SUCCESS(cq,cond,msg,alt,retCode) \
if ( cond ){ \
fprintf(stderr, "%s", msg); \
cl_destroy(cq, VP8_CL_TRIED_BUT_FAILED); \
alt; \
return retCode; \
}
#define VP8_CL_CALC_LOCAL_SIZE(kernel, kernel_size) \
err = clGetKernelWorkGroupInfo( cl_data.kernel, \
cl_data.device_id, \
CL_KERNEL_WORK_GROUP_SIZE, \
sizeof(size_t), \
&cl_data.kernel_size, \
NULL);\
VP8_CL_CHECK_SUCCESS(NULL, err != CL_SUCCESS, \
"Error: Failed to calculate local size of kernel!\n", \
,\
VP8_CL_TRIED_BUT_FAILED \
);
#define VP8_CL_CREATE_KERNEL(data,program,name,str_name) \
data.name = clCreateKernel(data.program, str_name , &err); \
VP8_CL_CHECK_SUCCESS(NULL, err != CL_SUCCESS || !data.name, \
"Error: Failed to create compute kernel "#str_name"!\n", \
,\
VP8_CL_TRIED_BUT_FAILED \
);
#define VP8_CL_READ_BUF(cq, bufRef, bufSize, dstPtr) \
err = clEnqueueReadBuffer(cq, bufRef, CL_FALSE, 0, bufSize , dstPtr, 0, NULL, NULL); \
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS, \
"Error: Failed to read from GPU!\n",, err \
);
#define VP8_CL_SET_BUF(cq, bufRef, bufSize, dataPtr, altPath, retCode) \
{ \
err = clEnqueueWriteBuffer(cq, bufRef, CL_FALSE, 0, \
bufSize, dataPtr, 0, NULL, NULL); \
\
VP8_CL_CHECK_SUCCESS(cq, err != CL_SUCCESS, \
"Error: Failed to write to buffer!\n", \
altPath, retCode\
); \
}
#define VP8_CL_CREATE_BUF(cq, bufRef, bufType, bufSize, dataPtr, altPath, retCode) \
bufRef = clCreateBuffer(cl_data.context, CL_MEM_READ_WRITE, bufSize, NULL, NULL); \
if (dataPtr != NULL && bufRef != NULL){ \
VP8_CL_SET_BUF(cq, bufRef, bufSize, dataPtr, altPath, retCode)\
} \
VP8_CL_CHECK_SUCCESS(cq, !bufRef, \
"Error: Failed to allocate buffer. Using CPU path!\n", \
altPath, retCode\
);
#define VP8_CL_RELEASE_KERNEL(kernel) \
if (kernel) \
clReleaseKernel(kernel); \
kernel = NULL;
typedef struct VP8_COMMON_CL {
cl_device_id device_id; // compute device id
cl_context context; // compute context
//cl_command_queue commands; // compute command queue
cl_program filter_program; // compute program for subpixel/bilinear filters
cl_kernel vp8_sixtap_predict_kernel;
size_t vp8_sixtap_predict_kernel_size;
cl_kernel vp8_sixtap_predict8x4_kernel;
size_t vp8_sixtap_predict8x4_kernel_size;
cl_kernel vp8_sixtap_predict8x8_kernel;
size_t vp8_sixtap_predict8x8_kernel_size;
cl_kernel vp8_sixtap_predict16x16_kernel;
size_t vp8_sixtap_predict16x16_kernel_size;
cl_kernel vp8_bilinear_predict4x4_kernel;
cl_kernel vp8_bilinear_predict8x4_kernel;
cl_kernel vp8_bilinear_predict8x8_kernel;
cl_kernel vp8_bilinear_predict16x16_kernel;
cl_kernel vp8_filter_block2d_first_pass_kernel;
size_t vp8_filter_block2d_first_pass_kernel_size;
cl_kernel vp8_filter_block2d_second_pass_kernel;
size_t vp8_filter_block2d_second_pass_kernel_size;
cl_kernel vp8_filter_block2d_bil_first_pass_kernel;
size_t vp8_filter_block2d_bil_first_pass_kernel_size;
cl_kernel vp8_filter_block2d_bil_second_pass_kernel;
size_t vp8_filter_block2d_bil_second_pass_kernel_size;
cl_kernel vp8_memcpy_kernel;
size_t vp8_memcpy_kernel_size;
cl_kernel vp8_memset_short_kernel;
cl_program idct_program;
cl_kernel vp8_short_inv_walsh4x4_1_kernel;
cl_kernel vp8_short_inv_walsh4x4_1st_pass_kernel;
cl_kernel vp8_short_inv_walsh4x4_2nd_pass_kernel;
cl_kernel vp8_dc_only_idct_add_kernel;
//Note that the following 2 kernels are encoder-only. Not used in decoder.
cl_kernel vp8_short_idct4x4llm_1_kernel;
cl_kernel vp8_short_idct4x4llm_kernel;
cl_program loop_filter_program;
cl_kernel vp8_loop_filter_horizontal_edge_kernel;
cl_kernel vp8_loop_filter_vertical_edge_kernel;
cl_kernel vp8_mbloop_filter_horizontal_edge_kernel;
cl_kernel vp8_mbloop_filter_vertical_edge_kernel;
cl_kernel vp8_loop_filter_simple_horizontal_edge_kernel;
cl_kernel vp8_loop_filter_simple_vertical_edge_kernel;
cl_program dequant_program;
cl_kernel vp8_dequant_dc_idct_add_kernel;
cl_kernel vp8_dequant_idct_add_kernel;
cl_kernel vp8_dequantize_b_kernel;
cl_int cl_decode_initialized;
cl_int cl_encode_initialized;
} VP8_COMMON_CL;
extern VP8_COMMON_CL cl_data;
#ifdef __cplusplus
}
#endif
#endif /* VP8_OPENCL_H */
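
The check macros in this header expand to a bare `if (...) { ... }`, which is fragile when used as the body of an `if`/`else`: depending on whether the call site adds a semicolon, the `else` either fails to compile or binds to the macro's hidden `if`. A common alternative is to wrap the body in `do { ... } while (0)`. The sketch below illustrates that idiom only; `CHECK_OR_RETURN`, `classify` and `classify_self_check` are hypothetical names, not part of vp8_opencl.h, and adopting the idiom here would require every call site to end with a semicolon.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical check macro: behaves as a single statement at the call site. */
#define CHECK_OR_RETURN(cond, msg, ret_code)  \
    do {                                      \
        if (cond) {                           \
            fprintf(stderr, "%s", (msg));     \
            return (ret_code);                \
        }                                     \
    } while (0)

/* Uses the macro in both branches of an unbraced if/else. */
static int classify(int err, int verbose)
{
    if (verbose)
        CHECK_OR_RETURN(err != 0, "verbose: call failed\n", 1);
    else
        CHECK_OR_RETURN(err != 0, "call failed\n", 2);
    return 0;
}

/* Usage example: each branch returns its own code only when err is set. */
static void classify_self_check(void)
{
    assert(classify(0, 1) == 0);
    assert(classify(7, 1) == 1);
    assert(classify(7, 0) == 2);
}
```

Passing the message through a "%s" format also keeps a stray `%` in the message from being interpreted as a conversion.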


@@ -9,17 +9,11 @@
*/
#include "vpx_ports/config.h"
#include "vp8/decoder/onyxd_int.h"
#ifndef __INC_PARTIALGFUPDATE_H
#define __INC_PARTIALGFUPDATE_H
#include "vp8/common/opencl/vp8_opencl.h"
#include "vp8_decode_cl.h"
#include "onyxc_int.h"
void vp8_arch_opencl_decode_init(VP8D_COMP *pbi)
{
extern void update_gf_selective(ONYX_COMMON *cm, MACROBLOCKD *x);
if (cl_initialized == CL_SUCCESS){
cl_decode_init();
}
}
#endif

Some files were not shown because too many files have changed in this diff