Added configure option to enable error-concealment. Disabled by default.

Concealed MBs are always SPLITMV with partition=3. This can be optimized.

Change-Id: I94580a5ecb13520195ea2b8a10ca11bb5a01d2a6
2011-04-29 14:08:47 +02:00 · 2011-04-29 13:50:15 +02:00 · 2011-04-29 11:22:09 +02:00 · 2011-04-28 16:28:07 +02:00 · 2011-04-20 12:08:27 +02:00 · 2011-04-19 16:23:05 +02:00
314 changed files with 24426 additions and 16711 deletions
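The `--enable-?*|--disable-?*` branch of `process_common_cmdline()` in build/make/configure.sh (visible as context in the diff below) turns a toggle flag into an `action` and an `option` with a single sed expression. The sketch below replays that expression on its own. The `parse_toggle` wrapper is hypothetical, and the flag spelling `--enable-error-concealment` is only assumed from the commit message; neither is confirmed by the hunks shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tree): split a --enable-X or
# --disable-X flag into $action and $option, using the same sed
# expression that process_common_cmdline() applies.
parse_toggle() {
    # "--enable-foo-bar" -> "action=enable option=foo_bar", then eval it
    eval `echo "$1" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
}

# Flag name assumed from the commit message; the real spelling may differ.
parse_toggle --enable-error-concealment
echo "action=$action option=$option"
```

The first substitution rewrites the leading `--`, the second splits at the first remaining dash, and the third converts any further dashes to underscores so the result is a valid shell variable name.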
--- a/.gitignore
+++ b/.gitignore
@@ -60,9 +60,3 @@
 /vpx_config.h
 /vpx_version.h
 TAGS
-vpxdec
-vpxenc
-.project
-.cproject
-*.csv
-*.oclpj
--- a/.mailmap
+++ b/.mailmap
@@ -1,4 +1,2 @@
 Adrian Grange <agrange@google.com>
 Johann Koenig <johannkoenig@google.com>
-Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
-Tom Finegan <tomfinegan@google.com>
--- a/9
+++ b/9
@@ -4,18 +4,13 @@
 Aaron Watry <awatry@gmail.com>
 Adrian Grange <agrange@google.com>
 Alex Converse <alex.converse@gmail.com>
-Andoni Morales Alastruey <ylatuya@gmail.com>
 Andres Mejia <mcitadel@gmail.com>
-Attila Nagy <attilanagy@google.com>
 Fabio Pedretti <fabio.ped@libero.it>
 Frank Galligan <fgalligan@google.com>
 Fredrik Söderquist <fs@opera.com>
 Fritz Koenig <frkoenig@google.com>
-Gaute Strokkenes <gaute.strokkenes@broadcom.com>
 Giuseppe Scrivano <gscrivano@gnu.org>
 Guillermo Ballester Valor <gbvalor@gmail.com>
-Henrik Lundin <hlundin@google.com>
-James Berry <jamesberry@google.com>
 James Zern <jzern@google.com>
 Jan Kratochvil <jan.kratochvil@redhat.com>
 Jeff Muizelaar <jmuizelaar@mozilla.com>
@@ -28,14 +23,10 @@ Luca Barbato <lu_zero@gentoo.org>
 Makoto Kato <makoto.kt@gmail.com>
 Martin Ettl <ettl.martin78@googlemail.com>
 Michael Kohler <michaelkohler@live.com>
-Mikhal Shemer <mikhal@google.com>
-Pascal Massimino <pascal.massimino@gmail.com>
-Patrik Westin <patrik.westin@gmail.com>
 Paul Wilkins <paulwilkins@google.com>
 Pavol Rusnak <stick@gk2.sk>
 Philip Jägenstedt <philipj@opera.com>
 Scott LaVarnway <slavarnway@google.com>
-Tero Rintaluoma <teror@google.com>
 Timothy B. Terriberry <tterribe@xiph.org>
 Tom Finegan <tomfinegan@google.com>
 Yaowu Xu <yaowu@google.com>
--- a/77
+++ b/77
@@ -1,80 +1,3 @@
-2011-03-07 v0.9.6 "Bali"
-  Our second named release, focused on a faster, higher quality, encoder.
-
-  - Upgrading:
-    This release is backwards compatible with Aylesbury (v0.9.5). Users
-    of older releases should refer to the Upgrading notes in this
-    document for that release.
-
-  - Enhancements:
-      vpxenc --psnr shows a summary when encode completes
-      --tune=ssim option to enable activity masking
-      improved postproc visualizations for development
-      updated support for Apple iOS to SDK 4.2
-      query decoder to determine which reference frames were updated
-      implemented error tracking in the decoder
-      fix pipe support on windows
-
-  - Speed:
-      Primary focus was on good quality mode, speed 0. Average improvement
-      on x86 about 40%, up to 100% on user-generated content at that speed.
-      Best quality mode speed improved 35%, and realtime speed 10-20%. This
-      release also saw significant improvement in realtime encoding speed
-      on ARM platforms.
-
-        Improved encoder threading
-        Dont pick encoder filter level when loopfilter is disabled.
-        Avoid double copying of key frames into alt and golden buffer
-        FDCT optimizations.
-        x86 sse2 temporal filter
-        SSSE3 version of fast quantizer
-        vp8_rd_pick_best_mbsegmentation code restructure
-        Adjusted breakout RD for SPLITMV
-        Changed segmentation check order
-        Improved rd_pick_intra4x4block
-        Adds armv6 optimized variance calculation
-        ARMv6 optimized sad16x16
-        ARMv6 optimized half pixel variance calculations
-        Full search SAD function optimization in SSE4.1
-        Improve MV prediction accuracy to achieve performance gain
-        Improve MV prediction in vp8_pick_inter_mode() for speed>3
-
-  - Quality:
-      Best quality mode improved PSNR 6.3%, and SSIM 6.1%. This release
-      also includes support for "activity masking," which greatly improves
-      SSIM at the expense of PSNR. For now, this feature is available with
-      the --tune=ssim option. Further experimentation in this area
-      is ongoing. This release also introduces a new rate control mode
-      called "CQ," which changes the allocation of bits within a clip to
-      the sections where they will have the most visual impact.
-
-        Tuning for the more exact quantizer.
-        Relax rate control for last few frames
-        CQ Mode
-        Limit key frame quantizer for forced key frames.
-        KF/GF Pulsing
-        Add simple version of activity masking.
-        make rdmult adaptive for intra in quantizer RDO
-        cap the best quantizer for 2nd order DC
-        change the threshold of DC check for encode breakout
-
-  - Bug Fixes:
-      Fix crash on Sparc Solaris.
-      Fix counter of fixed keyframe distance
-      ARNR filter pointer update bug fix
-      Fixed use of motion percentage in KF/GF group calc
-      Changed condition for using RD in Intra Mode
-      Fix encoder real-time only configuration.
-      Fix ARM encoder crash with multiple token partitions
-      Fixed bug first cluster timecode of webm file is wrong.
-      Fixed various encoder bugs with odd-sized images
-      vp8e_get_preview fixed when spatial resampling enabled
-      quantizer: fix assertion in fast quantizer path
-      Allocate source buffers to be multiples of 16
-      Fix for manual Golden frame frequency
-      Fix drastic undershoot in long form content
-
-
 2010-10-28 v0.9.5 "Aylesbury"
  Our first named release, focused on a faster decoder, and a better encoder.

--- a/4
+++ b/4
@@ -45,14 +45,18 @@ COMPILING THE APPLICATIONS/LIBRARIES:
    armv5te-linux-rvct
    armv5te-linux-gcc
    armv5te-symbian-gcc
+    armv5te-wince-vs8
    armv6-darwin-gcc
    armv6-linux-rvct
    armv6-linux-gcc
    armv6-symbian-gcc
+    armv6-wince-vs8
    iwmmxt-linux-rvct
    iwmmxt-linux-gcc
+    iwmmxt-wince-vs8
    iwmmxt2-linux-rvct
    iwmmxt2-linux-gcc
+    iwmmxt2-wince-vs8
    armv7-linux-rvct
    armv7-linux-gcc
    mips32-linux-gcc
--- a/build/arm-wince-vs8/armasmv5.rules
+++ b/build/arm-wince-vs8/armasmv5.rules
@@ -0,0 +1,20 @@
+<?xml version="1.0" encoding="utf-8"?>
+<VisualStudioToolFile
+	Name="armasm"
+	Version="8.00"
+	>
+	<Rules>
+		<CustomBuildRule
+			Name="ARMASM"
+			DisplayName="Armasm Assembler"
+			CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -ARCH 5&#x0D;&#x0A;"
+			Outputs="$(IntDir)\$(InputName).obj"
+			FileExtensions="*.asm"
+			ExecutionDescription="Assembling $(InputName).asm"
+			ShowOnlyRuleProperties="false"
+			>
+			<Properties>
+			</Properties>
+		</CustomBuildRule>
+	</Rules>
+</VisualStudioToolFile>
--- a/build/arm-wince-vs8/armasmv6.rules
+++ b/build/arm-wince-vs8/armasmv6.rules
@@ -0,0 +1,20 @@
+<?xml version="1.0" encoding="utf-8"?>
+<VisualStudioToolFile
+	Name="armasm"
+	Version="8.00"
+	>
+	<Rules>
+		<CustomBuildRule
+			Name="ARMASM"
+			DisplayName="Armasm Assembler"
+			CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -ARCH 6&#x0D;&#x0A;"
+			Outputs="$(IntDir)\$(InputName).obj"
+			FileExtensions="*.asm"
+			ExecutionDescription="Assembling $(InputName).asm"
+			ShowOnlyRuleProperties="false"
+			>
+			<Properties>
+			</Properties>
+		</CustomBuildRule>
+	</Rules>
+</VisualStudioToolFile>
--- a/build/arm-wince-vs8/armasmxscale.rules
+++ b/build/arm-wince-vs8/armasmxscale.rules
@@ -0,0 +1,20 @@
+<?xml version="1.0" encoding="utf-8"?>
+<VisualStudioToolFile
+	Name="armasm"
+	Version="8.00"
+	>
+	<Rules>
+		<CustomBuildRule
+			Name="ARMASM"
+			DisplayName="Armasm Assembler"
+			CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -cpu XSCALE&#x0D;&#x0A;"
+			Outputs="$(IntDir)\$(InputName).obj"
+			FileExtensions="*.asm"
+			ExecutionDescription="Assembling $(InputName).asm"
+			ShowOnlyRuleProperties="false"
+			>
+			<Properties>
+			</Properties>
+		</CustomBuildRule>
+	</Rules>
+</VisualStudioToolFile>
--- a/build/arm-wince-vs8/obj_int_extract.bat
+++ b/build/arm-wince-vs8/obj_int_extract.bat
@@ -0,0 +1,13 @@
+@echo off
+REM   Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+REM
+REM   Use of this source code is governed by a BSD-style license
+REM   that can be found in the LICENSE file in the root of the source
+REM   tree. An additional intellectual property rights grant can be found
+REM   in the file PATENTS.  All contributing project authors may
+REM   be found in the AUTHORS file in the root of the source tree.
+echo on
+
+
+cl /I ".\\" /I "..\vp6_decoder_sdk" /I "..\vp6_decoder_sdk\vpx_ports" /D "NDEBUG" /D "_WIN32_WCE=0x420" /D "UNDER_CE" /D "WIN32_PLATFORM_PSPC" /D "WINCE" /D "_LIB" /D "ARM" /D "_ARM_" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /GS- /fp:fast /GR- /Fo"Pocket_PC_2003__ARMV4_\%1/" /Fd"Pocket_PC_2003__ARMV4_\%1/vc80.pdb" /W3 /nologo /c /TC ..\vp6_decoder_sdk\vp6_decoder\algo\common\arm\dec_asm_offsets_arm.c
+obj_int_extract.exe rvds "Pocket_PC_2003__ARMV4_\%1/dec_asm_offsets_arm.obj"
--- a/build/arm-wince-vs8/vpx.sln
+++ b/build/arm-wince-vs8/vpx.sln
@@ -0,0 +1,88 @@
+Microsoft Visual Studio Solution File, Format Version 9.00
+# Visual Studio 2005
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "example", "example.vcproj", "{BA5FE66F-38DD-E034-F542-B1578C5FB950}"
+	ProjectSection(ProjectDependencies) = postProject
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74} = {DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
+	EndProjectSection
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "obj_int_extract", "obj_int_extract.vcproj", "{E1360C65-D375-4335-8057-7ED99CC3F9B2}"
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "vpx", "vpx.vcproj", "{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}"
+	ProjectSection(ProjectDependencies) = postProject
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
+	EndProjectSection
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "xma", "xma.vcproj", "{A955FC4A-73F1-44F7-135E-30D84D32F022}"
+	ProjectSection(ProjectDependencies) = postProject
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74} = {DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}
+	EndProjectSection
+EndProject
+Global
+	GlobalSection(SolutionConfigurationPlatforms) = preSolution
+		Debug|Mixed Platforms = Debug|Mixed Platforms
+		Debug|Pocket PC 2003 (ARMV4) = Debug|Pocket PC 2003 (ARMV4)
+		Debug|Win32 = Debug|Win32
+		Release|Mixed Platforms = Release|Mixed Platforms
+		Release|Pocket PC 2003 (ARMV4) = Release|Pocket PC 2003 (ARMV4)
+		Release|Win32 = Release|Win32
+	EndGlobalSection
+	GlobalSection(ProjectConfigurationPlatforms) = postSolution
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
+		{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Mixed Platforms.ActiveCfg = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Mixed Platforms.Build.0 = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Win32.ActiveCfg = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Win32.Build.0 = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Mixed Platforms.ActiveCfg = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Mixed Platforms.Build.0 = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Win32.ActiveCfg = Release|Win32
+		{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Win32.Build.0 = Release|Win32
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
+		{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
+		{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
+	EndGlobalSection
+	GlobalSection(SolutionProperties) = preSolution
+		HideSolutionNode = FALSE
+	EndGlobalSection
+EndGlobal
--- a/build/make/Makefile
+++ b/build/make/Makefile
@@ -152,8 +152,8 @@ endif
 # Rule to extract assembly constants from C sources
 #
 obj_int_extract: build/make/obj_int_extract.c
-	$(if $(quiet),@echo "    [HOSTCC] $@")
-	$(qexec)$(HOSTCC) -I. -I$(SRC_PATH_BARE) -o $@ $<
+	$(if $(quiet),echo "    [HOSTCC] $@")
+	$(qexec)$(HOSTCC) -I. -o $@ $<
 CLEAN-OBJS += obj_int_extract

 #
@@ -255,7 +255,7 @@ ifeq ($(filter clean,$(MAKECMDGOALS)),)
 endif

 #
-# Configuration dependent rules
+# Configuration dependant rules
 #
 $(call pairmap,install_map_templates,$(INSTALL_MAPS))

@@ -331,8 +331,11 @@ ifneq ($(call enabled,DIST-SRCS),)
    DIST-SRCS-$(CONFIG_MSVS)  += build/make/gen_msvs_sln.sh
    DIST-SRCS-$(CONFIG_MSVS)  += build/x86-msvs/yasm.rules
    DIST-SRCS-$(CONFIG_RVCT) += build/make/armlink_adapter.sh
-    # Include obj_int_extract if we use offsets from asm_*_offsets
-    DIST-SRCS-$(ARCH_ARM)$(ARCH_X86)$(ARCH_X86_64)    += build/make/obj_int_extract.c
+    #
+    # This isn't really ARCH_ARM dependent, it's dependant on whether we're
+    # using assembly code or not (CONFIG_OPTIMIZATIONS maybe). Just use
+    # this for now.
+    DIST-SRCS-$(ARCH_ARM)    += build/make/obj_int_extract.c
    DIST-SRCS-$(ARCH_ARM)    += build/make/ads2gas.pl
    DIST-SRCS-yes            += $(target:-$(TOOLCHAIN)=).mk
 endif
--- a/build/make/armlink_adapter.sh
+++ b/build/make/armlink_adapter.sh
@@ -17,17 +17,15 @@ for i; do
        on_of=1
    elif [ "$i" == "-v" ]; then
        verbose=1
-    elif [ "$i" == "-g" ]; then
-        args="${args} --debug"
    elif [ "$on_of" == "1" ]; then
        outfile=$i
-        on_of=0
+    on_of=0
    elif [ -f "$i" ]; then
        infiles="$infiles $i"
    elif [ "${i:0:2}" == "-l" ]; then
        libs="$libs ${i#-l}"
    elif [ "${i:0:2}" == "-L" ]; then
-        libpaths="${libpaths} ${i#-L}"
+    libpaths="${libpaths} ${i#-L}"
    else
        args="${args} ${i}"
    fi
--- a/build/make/configure.sh
+++ b/build/make/configure.sh
@@ -78,12 +78,11 @@ Build options:
  --log=yes|no|FILE           file configure log is written to [config.err]
  --target=TARGET             target platform tuple [generic-gnu]
  --cpu=CPU                   optimize for a specific cpu rather than a family
-  --extra-cflags=ECFLAGS      add ECFLAGS to CFLAGS [$CFLAGS]
  ${toggle_extra_warnings}    emit harmless warnings (always non-fatal)
  ${toggle_werror}            treat warnings as errors, if possible
                              (not available with all compilers)
  ${toggle_optimizations}     turn on/off compiler optimization flags
-  ${toggle_pic}               turn on/off Position Independent Code
+  ${toggle_pic}               turn on/off Position Independant Code
  ${toggle_ccache}            turn on/off compiler cache
  ${toggle_debug}             enable/disable debug mode
  ${toggle_gprof}             enable/disable gprof profiling instrumentation
@@ -443,9 +442,6 @@ process_common_cmdline() {
        ;;
        --cpu=*) tune_cpu="$optval"
        ;;
-        --extra-cflags=*)
-        extra_cflags="${optval}"
-        ;;
        --enable-?*|--disable-?*)
        eval `echo "$opt" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
        echo "${CMDLINE_SELECT} ${ARCH_EXT_LIST}" | grep "^ *$option\$" >/dev/null || die_unknown $opt
@@ -624,10 +620,6 @@ process_common_toolchain() {

    # Handle Solaris variants. Solaris 10 needs -lposix4
    case ${toolchain} in
-        sparc-solaris-*)
-            add_extralibs -lposix4
-            add_cflags "-DMUST_BE_ALIGNED"
-            ;;
        *-solaris-*)
            add_extralibs -lposix4
            ;;
@@ -668,12 +660,12 @@ process_common_toolchain() {
            elif enabled armv7
            then
                check_add_cflags -march=armv7-a -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp  #-ftree-vectorize
-                check_add_asflags -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp  #-march=armv7-a
+        check_add_asflags -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp  #-march=armv7-a
            else
                check_add_cflags -march=${tgt_isa}
                check_add_asflags -march=${tgt_isa}
            fi
-            enabled debug && add_asflags -g
+
            asm_conversion_cmd="${source_path}/build/make/ads2gas.pl"
            ;;
        rvct)
@@ -698,24 +690,16 @@ process_common_toolchain() {
            arch_int=${tgt_isa##armv}
            arch_int=${arch_int%%te}
            check_add_asflags --pd "\"ARCHITECTURE SETA ${arch_int}\""
-            enabled debug && add_asflags -g
-            add_cflags --gnu
-            add_cflags --enum_is_int
-            add_cflags --wchar32
        ;;
        esac

        case ${tgt_os} in
-        none*)
-            disable multithread
-            disable os_support
-            ;;
        darwin*)
            SDK_PATH=/Developer/Platforms/iPhoneOS.platform/Developer
            TOOLCHAIN_PATH=${SDK_PATH}/usr/bin
            CC=${TOOLCHAIN_PATH}/gcc
            AR=${TOOLCHAIN_PATH}/ar
-            LD=${TOOLCHAIN_PATH}/arm-apple-darwin10-gcc-4.2.1
+            LD=${TOOLCHAIN_PATH}/arm-apple-darwin9-gcc-4.2.1
            AS=${TOOLCHAIN_PATH}/as
            STRIP=${TOOLCHAIN_PATH}/strip
            NM=${TOOLCHAIN_PATH}/nm
@@ -729,18 +713,19 @@ process_common_toolchain() {
            add_cflags -arch ${tgt_isa}
            add_ldflags -arch_only ${tgt_isa}

-            add_cflags  "-isysroot ${SDK_PATH}/SDKs/iPhoneOS4.3.sdk"
+            add_cflags  "-isysroot /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS3.1.sdk"

            # This should be overridable
-            alt_libc=${SDK_PATH}/SDKs/iPhoneOS4.3.sdk
+            alt_libc=${SDK_PATH}/SDKs/iPhoneOS3.1.sdk

            # Add the paths for the alternate libc
-            for d in usr/include usr/include/gcc/darwin/4.2/ usr/lib/gcc/arm-apple-darwin10/4.2.1/include/; do
+#            for d in usr/include usr/include/gcc/darwin/4.0/; do
+            for d in usr/include usr/include/gcc/darwin/4.0/ usr/lib/gcc/arm-apple-darwin9/4.0.1/include/; do
                try_dir="${alt_libc}/${d}"
                [ -d "${try_dir}" ] && add_cflags -I"${try_dir}"
            done

-            for d in lib usr/lib usr/lib/system; do
+            for d in lib usr/lib; do
                try_dir="${alt_libc}/${d}"
                [ -d "${try_dir}" ] && add_ldflags -L"${try_dir}"
            done
@@ -757,9 +742,13 @@ process_common_toolchain() {
                    || die "Must supply --libc when targetting *-linux-rvct"

                # Set up compiler
+                add_cflags --gnu
+                add_cflags --enum_is_int
                add_cflags --library_interface=aeabi_glibc
                add_cflags --no_hide_all
+                add_cflags --wchar32
                add_cflags --dwarf2
+                add_cflags --gnu

                # Set up linker
                add_ldflags --sysv --no_startup --no_ref_cpp_init
@@ -866,7 +855,7 @@ process_common_toolchain() {
                setup_gnu_toolchain
                add_cflags -use-msasm -use-asm
                add_ldflags -i-static
-                enabled x86_64 && add_cflags -ipo -no-prec-div -static -xSSE2 -axSSE2
+                enabled x86_64 && add_cflags -ipo -no-prec-div -static -xSSE3 -axSSE3
                enabled x86_64 && AR=xiar
                case ${tune_cpu} in
                    atom*)
@@ -884,8 +873,6 @@ process_common_toolchain() {
                link_with_cc=gcc
                tune_cflags="-march="
            setup_gnu_toolchain
-                #for 32 bit x86 builds, -O3 did not turn on this flag
-                enabled optimizations && check_add_cflags -fomit-frame-pointer
                ;;
        esac

@@ -957,40 +944,8 @@ process_common_toolchain() {
        enabled rvct && check_add_cflags -Otime
        enabled small && check_add_cflags -O2 || check_add_cflags -O3
    fi
-    
-    if enabled opencl; then
-        disable multithread
-        echo "  disabling multithread"
-        soft_enable opencl #Provide output to make user comfortable
-        enable runtime_cpu_detect
-	
-        #Use dlopen() to load OpenCL when possible.
-        case ${toolchain} in
-            *darwin10*)
-                check_add_cflags -D__APPLE__
-                add_extralibs -framework OpenCL
-                ;;
-            *-win32-gcc)
-                if check_header dlfcn.h; then
-                    add_extralibs -ldl 
-                    enable dlopen
-                else
-                    #This shouldn't be a hard-coded path in the long term
-                    add_extralibs -L/cygdrive/c/Windows/System32 -lOpenCL
-                fi
-                ;;
-            *)
-                if check_header dlfcn.h; then
-                    add_extralibs -ldl 
-                    enable dlopen
-                else
-                    add_extralibs -lOpenCL
-                fi
-                ;;
-        esac
-    fi

-    # Position Independent Code (PIC) support, for building relocatable
+    # Position Independant Code (PIC) support, for building relocatable
    # shared objects
    enabled gcc && enabled pic && check_add_cflags -fPIC

@@ -1017,12 +972,6 @@ EOF
        add_cflags -D_LARGEFILE_SOURCE
        add_cflags -D_FILE_OFFSET_BITS=64
    fi
-
-    # append any user defined extra cflags
-    if [ -n "${extra_cflags}" ] ; then
-        check_add_cflags ${extra_cflags} || \
-        die "Requested extra CFLAGS '${extra_cflags}' not supported by compiler"
-    fi
 }

 process_toolchain() {
--- a/build/make/gen_msvs_proj.sh
+++ b/build/make/gen_msvs_proj.sh
@@ -32,8 +32,7 @@ Options:
    --name=project_name         Name of the project (required)
    --proj-guid=GUID            GUID to use for the project
    --module-def=filename       File containing export definitions (for DLLs)
-    --ver=version               Version (7,8,9) of visual studio to generate for
-    --src-path-bare=dir         Path to root of source tree
+    --ver=version               Version (7,8) of visual studio to generate for
    -Ipath/to/include           Additional include directories
    -DFLAG[=value]              Preprocessor macros to define
    -Lpath/to/lib               Additional library search paths
@@ -133,7 +132,7 @@ generate_filter() {
    open_tag Filter \
        Name=$name \
        Filter=$pats \
-        UniqueIdentifier=`generate_uuid` \
+        UniqueIdentifier=`generate_uuid`

    file_list_sz=${#file_list[@]}
    for i in ${!file_list[@]}; do
@@ -146,21 +145,31 @@ generate_filter() {
                if [ "$pat" == "asm" ] && $asm_use_custom_step; then
                    for plat in "${platforms[@]}"; do
                        for cfg in Debug Release; do
-                            open_tag FileConfiguration \
-                                Name="${cfg}|${plat}" \
-
+                            open_tag  FileConfiguration \
+                            Name="${cfg}|${plat}"
                            tag Tool \
                                Name="VCCustomBuildTool" \
                                Description="Assembling \$(InputFileName)" \
-                                CommandLine="$(eval echo \$asm_${cfg}_cmdline)" \
-                                Outputs="\$(InputName).obj" \
-
+                                CommandLine="$(eval echo \$asm_${cfg}_cmdline)"\
+                                Outputs="\$(InputName).obj"
                            close_tag FileConfiguration
                        done
                    done
                fi

-                close_tag File
+                if [ "${f##*.}" == "cpp" ]; then
+                    for plat in "${platforms[@]}"; do
+                        for cfg in Debug Release; do
+                        open_tag FileConfiguration \
+                            Name="${cfg}|${plat}"
+                        tag Tool \
+                            Name="VCCLCompilerTool" \
+                            CompileAs="2"
+                        close_tag FileConfiguration
+                        done
+                    done
+                fi
+                close_tag  File

                break
            fi
@@ -176,63 +185,57 @@ unset target
 for opt in "$@"; do
    optval="${opt#*=}"
    case "$opt" in
-        --help|-h) show_help
-        ;;
-        --target=*) target="${optval}"
-        ;;
-        --out=*) outfile="$optval"
-        ;;
-        --name=*) name="${optval}"
-        ;;
-        --proj-guid=*) guid="${optval}"
-        ;;
-        --module-def=*) link_opts="${link_opts} ModuleDefinitionFile=${optval}"
-        ;;
-        --exe) proj_kind="exe"
-        ;;
-        --lib) proj_kind="lib"
-        ;;
-        --src-path-bare=*) src_path_bare="$optval"
-        ;;
-        --static-crt) use_static_runtime=true
-        ;;
-        --ver=*)
-            vs_ver="$optval"
-            case "$optval" in
-                [789])
-                ;;
-                *) die Unrecognized Visual Studio Version in $opt
-                ;;
-            esac
-        ;;
-        -I*)
-            opt="${opt%/}"
-            incs="${incs}${incs:+;}&quot;${opt##-I}&quot;"
-            yasmincs="${yasmincs} ${opt}"
-        ;;
-        -D*) defines="${defines}${defines:+;}${opt##-D}"
-        ;;
-        -L*) # fudge . to $(OutDir)
-            if [ "${opt##-L}" == "." ]; then
-                libdirs="${libdirs}${libdirs:+;}&quot;\$(OutDir)&quot;"
-            else
-                 # Also try directories for this platform/configuration
-                 libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}&quot;"
-                 libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}/\$(PlatformName)/\$(ConfigurationName)&quot;"
-                 libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}/\$(PlatformName)&quot;"
-            fi
-        ;;
-        -l*) libs="${libs}${libs:+ }${opt##-l}.lib"
-        ;;
-        -*) die_unknown $opt
-        ;;
-        *)
-            file_list[${#file_list[@]}]="$opt"
-            case "$opt" in
-                 *.asm) uses_asm=true
-                 ;;
-            esac
-        ;;
+    --help|-h) show_help
+    ;;
+    --target=*) target="${optval}"
+    ;;
+    --out=*) outfile="$optval"
+    ;;
+    --name=*) name="${optval}"
+    ;;
+    --proj-guid=*) guid="${optval}"
+    ;;
+    --module-def=*)
+        link_opts="${link_opts} ModuleDefinitionFile=${optval}"
+    ;;
+    --exe) proj_kind="exe"
+    ;;
+    --lib) proj_kind="lib"
+    ;;
+    --static-crt) use_static_runtime=true
+    ;;
+    --ver=*) vs_ver="$optval"
+             case $optval in
+             [789])
+             ;;
+             *) die Unrecognized Visual Studio Version in $opt
+             ;;
+             esac
+    ;;
+    -I*) opt="${opt%/}"
+         incs="${incs}${incs:+;}&quot;${opt##-I}&quot;"
+         yasmincs="${yasmincs} ${opt}"
+    ;;
+    -D*) defines="${defines}${defines:+;}${opt##-D}"
+    ;;
+    -L*) # fudge . to $(OutDir)
+         if [ "${opt##-L}" == "." ]; then
+             libdirs="${libdirs}${libdirs:+;}&quot;\$(OutDir)&quot;"
+         else
+             # Also try directories for this platform/configuration
+             libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}&quot;"
+             libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}/\$(PlatformName)/\$(ConfigurationName)&quot;"
+             libdirs="${libdirs}${libdirs:+;}&quot;${opt##-L}/\$(PlatformName)&quot;"
+         fi
+    ;;
+    -l*) libs="${libs}${libs:+ }${opt##-l}.lib"
+    ;;
+    -*) die_unknown $opt
+    ;;
+    *) file_list[${#file_list[@]}]="$opt"
+       case "$opt" in
+       *.asm) uses_asm=true;;
+       esac
    esac
 done
 outfile=${outfile:-/dev/stdout}
@@ -275,7 +278,11 @@ done

 # List Keyword for this target
 case "$target" in
-    x86*) keyword="ManagedCProj"
+    x86*)
+        keyword="ManagedCProj"
+    ;;
+    arm*|iwmmx*)
+        keyword="Win32Proj"
    ;;
    *) die "Unsupported target $target!"
 esac
@@ -291,255 +298,402 @@ case "$target" in
        asm_Debug_cmdline="yasm -Xvc -g cv8 -f \$(PlatformName) ${yasmincs} &quot;\$(InputPath)&quot;"
        asm_Release_cmdline="yasm -Xvc -f \$(PlatformName) ${yasmincs} &quot;\$(InputPath)&quot;"
    ;;
+    arm*|iwmmx*)
+        case "${name}" in
+        obj_int_extract) platforms[0]="Win32"
+        ;;
+        *) platforms[0]="Pocket PC 2003 (ARMV4)"
+        ;;
+        esac
+    ;;
    *) die "Unsupported target $target!"
+esac
+
+# List Command-line Arguments for this target
+case "$target" in
+    arm*|iwmmx*)
+        if [ "$name" == "example" ];then
+            ARGU="--codec vp6 --flipuv --progress _bnd.vp6"
+        fi
+        if [ "$name" == "xma" ];then
+            ARGU="--codec vp6 -h 240 -w 320 -v"
+        fi
    ;;
 esac

 generate_vcproj() {
    case "$proj_kind" in
-        exe) vs_ConfigurationType=1
-        ;;
-        *)   vs_ConfigurationType=4
-        ;;
+    exe) vs_ConfigurationType=1
+    ;;
+    *)   vs_ConfigurationType=4
+    ;;
    esac

    echo "<?xml version=\"1.0\" encoding=\"Windows-1252\"?>"
-    open_tag VisualStudioProject \
-        ProjectType="Visual C++" \
-        Version="${vs_ver_id}" \
-        Name="${name}" \
-        ProjectGUID="{${guid}}" \
-        RootNamespace="${name}" \
-        Keyword="${keyword}" \
+    open_tag  VisualStudioProject \
+                  ProjectType="Visual C++" \
+                  Version="${vs_ver_id}" \
+                  Name="${name}" \
+                  ProjectGUID="{${guid}}" \
+                  RootNamespace="${name}" \
+                  Keyword="${keyword}"

-    open_tag Platforms
+    open_tag  Platforms
    for plat in "${platforms[@]}"; do
-        tag Platform Name="$plat"
+        tag   Platform Name="$plat"
    done
    close_tag Platforms

-    open_tag ToolFiles
+    open_tag  ToolFiles
    case "$target" in
        x86*) $uses_asm && tag ToolFile RelativePath="$self_dirname/../x86-msvs/yasm.rules"
        ;;
+        arm*|iwmmx*)
+            if [ "$name" == "vpx" ];then
+            case "$target" in
+                armv5*)
+                    tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmv5.rules"
+                ;;
+                armv6*)
+                    tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmv6.rules"
+                ;;
+                iwmmxt*)
+                    tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmxscale.rules"
+                ;;
+            esac
+            fi
+        ;;
    esac
    close_tag ToolFiles

-    open_tag Configurations
+    open_tag  Configurations
    for plat in "${platforms[@]}"; do
        plat_no_ws=`echo $plat | sed 's/[^A-Za-z0-9_]/_/g'`
-        open_tag Configuration \
-            Name="Debug|$plat" \
-            OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
-            IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
-            ConfigurationType="$vs_ConfigurationType" \
-            CharacterSet="1" \
+        open_tag  Configuration \
+                      Name="Debug|$plat" \
+                      OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
+                      IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
+                      ConfigurationType="$vs_ConfigurationType" \
+                      CharacterSet="1"
+
+        if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
+            case "$name" in
+                vpx)         tag Tool \
+                             Name="VCPreBuildEventTool" \
+                             CommandLine="call obj_int_extract.bat \$(ConfigurationName)"
+                             tag Tool \
+                             Name="VCMIDLTool" \
+                             TargetEnvironment="1"
+                             tag Tool \
+                             Name="VCCLCompilerTool" \
+                             ExecutionBucket="7" \
+                             Optimization="0" \
+                             AdditionalIncludeDirectories="$incs" \
+                             PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;DEBUG;_LIB;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
+                             MinimalRebuild="true" \
+                             RuntimeLibrary="1" \
+                             BufferSecurityCheck="false" \
+                             UsePrecompiledHeader="0" \
+                             WarningLevel="3" \
+                             DebugInformationFormat="1" \
+                             CompileAs="1"
+                             tag Tool \
+                             Name="VCResourceCompilerTool" \
+                             PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
+                             Culture="1033" \
+                             AdditionalIncludeDirectories="\$(IntDir)" \
+                ;;
+                example|xma) tag Tool \
+                             Name="VCCLCompilerTool" \
+                             ExecutionBucket="7" \
+                             Optimization="0" \
+                             AdditionalIncludeDirectories="$incs" \
+                             PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;DEBUG;_CONSOLE;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
+                             MinimalRebuild="true" \
+                             RuntimeLibrary="1" \
+                             BufferSecurityCheck="false" \
+                             UsePrecompiledHeader="0" \
+                             WarningLevel="3" \
+                             DebugInformationFormat="1" \
+                             CompileAs="1"
+                             tag Tool \
+                             Name="VCResourceCompilerTool" \
+                             PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
+                             Culture="1033" \
+                             AdditionalIncludeDirectories="\$(IntDir)" \
+                ;;
+                obj_int_extract) tag Tool \
+                             Name="VCCLCompilerTool" \
+                             Optimization="0" \
+                             AdditionalIncludeDirectories="$incs" \
+                             PreprocessorDefinitions="WIN32;DEBUG;_CONSOLE" \
+                             RuntimeLibrary="1" \
+                             WarningLevel="3" \
+                             DebugInformationFormat="1" \
+                ;;
+            esac
+        fi

        case "$target" in
-            x86*)
-                case "$name" in
-                    obj_int_extract)
-                        tag Tool \
-                            Name="VCCLCompilerTool" \
-                            Optimization="0" \
-                            AdditionalIncludeDirectories="$incs" \
-                            PreprocessorDefinitions="WIN32;DEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE" \
-                            RuntimeLibrary="$debug_runtime" \
-                            WarningLevel="3" \
-                            Detect64BitPortabilityProblems="true" \
-                            DebugInformationFormat="1" \
-                    ;;
-                    vpx)
-                        tag Tool \
-                            Name="VCPreBuildEventTool" \
-                            CommandLine="call obj_int_extract.bat $src_path_bare" \
+            x86*) tag Tool \
+                Name="VCCLCompilerTool" \
+                Optimization="0" \
+                AdditionalIncludeDirectories="$incs" \
+                PreprocessorDefinitions="WIN32;_DEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
+                RuntimeLibrary="$debug_runtime" \
+                UsePrecompiledHeader="0" \
+                WarningLevel="3" \
+                DebugInformationFormat="1" \
+                Detect64BitPortabilityProblems="true" \

-                        tag Tool \
-                            Name="VCCLCompilerTool" \
-                            Optimization="0" \
-                            AdditionalIncludeDirectories="$incs" \
-                            PreprocessorDefinitions="WIN32;_DEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
-                            RuntimeLibrary="$debug_runtime" \
-                            UsePrecompiledHeader="0" \
-                            WarningLevel="3" \
-                            DebugInformationFormat="1" \
-                            Detect64BitPortabilityProblems="true" \
-
-                        $uses_asm && tag Tool Name="YASM"  IncludePaths="$incs" Debug="1"
-                    ;;
-                    *)
-                        tag Tool \
-                            Name="VCCLCompilerTool" \
-                            Optimization="0" \
-                            AdditionalIncludeDirectories="$incs" \
-                            PreprocessorDefinitions="WIN32;_DEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
-                            RuntimeLibrary="$debug_runtime" \
-                            UsePrecompiledHeader="0" \
-                            WarningLevel="3" \
-                            DebugInformationFormat="1" \
-                            Detect64BitPortabilityProblems="true" \
-
-                        $uses_asm && tag Tool Name="YASM"  IncludePaths="$incs" Debug="1"
-                    ;;
-                esac
+                $uses_asm && tag Tool Name="YASM"  IncludePaths="$incs" Debug="1"
            ;;
        esac

        case "$proj_kind" in
            exe)
                case "$target" in
-                    x86*)
+                    x86*) tag Tool \
+                          Name="VCLinkerTool" \
+                          AdditionalDependencies="$debug_libs \$(NoInherit)" \
+                          AdditionalLibraryDirectories="$libdirs" \
+                          GenerateDebugInformation="true" \
+                          ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
+
+                    ;;
+                    arm*|iwmmx*)
                        case "$name" in
-                            obj_int_extract)
-                                tag Tool \
-                                    Name="VCLinkerTool" \
-                                    OutputFile="${name}.exe" \
-                                    GenerateDebugInformation="true" \
+                            obj_int_extract) tag Tool \
+                                Name="VCLinkerTool" \
+                                OutputFile="${name}.exe" \
+                                GenerateDebugInformation="true"
                            ;;
-                            *)
-                                tag Tool \
-                                    Name="VCLinkerTool" \
-                                    AdditionalDependencies="$debug_libs \$(NoInherit)" \
-                                    AdditionalLibraryDirectories="$libdirs" \
-                                    GenerateDebugInformation="true" \
-                                    ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
+                            *) tag Tool \
+                                Name="VCLinkerTool" \
+                                AdditionalDependencies="$debug_libs" \
+                                OutputFile="\$(OutDir)/${name}.exe" \
+                                LinkIncremental="2" \
+                                AdditionalLibraryDirectories="${libdirs};&quot;..\lib/$plat_no_ws&quot;" \
+                                DelayLoadDLLs="\$(NOINHERIT)" \
+                                GenerateDebugInformation="true" \
+                                ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
+                                SubSystem="9" \
+                                StackReserveSize="65536" \
+                                StackCommitSize="4096" \
+                                EntryPointSymbol="mainWCRTStartup" \
+                                TargetMachine="3"
                            ;;
                        esac
-                    ;;
+                     ;;
                 esac
            ;;
            lib)
                case "$target" in
-                    x86*)
-                        tag Tool \
-                            Name="VCLibrarianTool" \
-                            OutputFile="\$(OutDir)/${name}${lib_sfx}d.lib" \
-
-                    ;;
+                      arm*|iwmmx*) tag Tool \
+                                    Name="VCLibrarianTool" \
+                                    AdditionalOptions=" /subsystem:windowsce,4.20 /machine:ARM" \
+                                    OutputFile="\$(OutDir)/${name}.lib" \
+                                ;;
+                                *) tag Tool \
+                                    Name="VCLibrarianTool" \
+                                    OutputFile="\$(OutDir)/${name}${lib_sfx}d.lib" \
+                                ;;
                esac
            ;;
-            dll)
-                tag Tool \
-                    Name="VCLinkerTool" \
-                    AdditionalDependencies="\$(NoInherit)" \
-                    LinkIncremental="2" \
-                    GenerateDebugInformation="true" \
-                    AssemblyDebug="1" \
-                    TargetMachine="1" \
-                    $link_opts \
-
-            ;;
+            dll) tag Tool \
+                 Name="VCLinkerTool" \
+                AdditionalDependencies="\$(NoInherit)" \
+                LinkIncremental="2" \
+                GenerateDebugInformation="true" \
+                AssemblyDebug="1" \
+                TargetMachine="1" \
+                      $link_opts
        esac

+        if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
+            case "$name" in
+                vpx)         tag DeploymentTool \
+                             ForceDirty="-1" \
+                             RegisterOutput="0"
+                                ;;
+                example|xma) tag DeploymentTool \
+                             ForceDirty="-1" \
+                             RegisterOutput="0"
+                             tag DebuggerTool \
+                             Arguments="${ARGU}"
+                                ;;
+            esac
+        fi
        close_tag Configuration

-        open_tag Configuration \
-            Name="Release|$plat" \
-            OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
-            IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
-            ConfigurationType="$vs_ConfigurationType" \
-            CharacterSet="1" \
-            WholeProgramOptimization="0" \
+        open_tag  Configuration \
+                      Name="Release|$plat" \
+                      OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
+                      IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
+                      ConfigurationType="$vs_ConfigurationType" \
+                      CharacterSet="1" \
+                      WholeProgramOptimization="0"

-        case "$target" in
-            x86*)
-                case "$name" in
-                    obj_int_extract)
-                        tag Tool \
-                            Name="VCCLCompilerTool" \
-                            AdditionalIncludeDirectories="$incs" \
-                            PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE" \
-                            RuntimeLibrary="$release_runtime" \
-                            UsePrecompiledHeader="0" \
-                            WarningLevel="3" \
-                            Detect64BitPortabilityProblems="true" \
-                            DebugInformationFormat="0" \
-                    ;;
-                    vpx)
-                        tag Tool \
-                            Name="VCPreBuildEventTool" \
-                            CommandLine="call obj_int_extract.bat $src_path_bare" \
+        if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
+            case "$name" in
+                vpx)         tag Tool \
+                                     Name="VCPreBuildEventTool" \
+                                     CommandLine="call obj_int_extract.bat \$(ConfigurationName)"
+                             tag Tool \
+                                     Name="VCMIDLTool" \
+                                     TargetEnvironment="1"
+                             tag Tool \
+                                             Name="VCCLCompilerTool" \
+                                             ExecutionBucket="7" \
+                                             Optimization="2" \
+                                             FavorSizeOrSpeed="1" \
+                                             AdditionalIncludeDirectories="$incs" \
+                                             PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;_LIB;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
+                                             RuntimeLibrary="0" \
+                                             BufferSecurityCheck="false" \
+                                             UsePrecompiledHeader="0" \
+                                             WarningLevel="3" \
+                                             DebugInformationFormat="0" \
+                                             CompileAs="1"
+                             tag Tool \
+                                             Name="VCResourceCompilerTool" \
+                                             PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
+                                             Culture="1033" \
+                                             AdditionalIncludeDirectories="\$(IntDir)" \
+                ;;
+                example|xma) tag Tool \
+                             Name="VCCLCompilerTool" \
+                             ExecutionBucket="7" \
+                             Optimization="2" \
+                             FavorSizeOrSpeed="1" \
+                             AdditionalIncludeDirectories="$incs" \
+                             PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;_CONSOLE;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
+                             RuntimeLibrary="0" \
+                             BufferSecurityCheck="false" \
+                             UsePrecompiledHeader="0" \
+                             WarningLevel="3" \
+                             DebugInformationFormat="0" \
+                             CompileAs="1"
+                             tag Tool \
+                             Name="VCResourceCompilerTool" \
+                             PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
+                             Culture="1033" \
+                             AdditionalIncludeDirectories="\$(IntDir)" \
+                ;;
+                obj_int_extract) tag Tool \
+                             Name="VCCLCompilerTool" \
+                             AdditionalIncludeDirectories="$incs" \
+                             PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE" \
+                             RuntimeLibrary="0" \
+                             UsePrecompiledHeader="0" \
+                             WarningLevel="3" \
+                             Detect64BitPortabilityProblems="true" \
+                             DebugInformationFormat="0" \
+                ;;
+            esac
+        fi

-                        tag Tool \
-                            Name="VCCLCompilerTool" \
-                            AdditionalIncludeDirectories="$incs" \
-                            PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
-                            RuntimeLibrary="$release_runtime" \
-                            UsePrecompiledHeader="0" \
-                            WarningLevel="3" \
-                            DebugInformationFormat="0" \
-                            Detect64BitPortabilityProblems="true" \
+    case "$target" in
+        x86*) tag       Tool \
+                      Name="VCCLCompilerTool" \
+                      AdditionalIncludeDirectories="$incs" \
+                      PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
+                      RuntimeLibrary="$release_runtime" \
+                      UsePrecompiledHeader="0" \
+                      WarningLevel="3" \
+                      DebugInformationFormat="0" \
+                      Detect64BitPortabilityProblems="true"

-                        $uses_asm && tag Tool Name="YASM"  IncludePaths="$incs"
-                    ;;
-                    *)
-                        tag Tool \
-                            Name="VCCLCompilerTool" \
-                            AdditionalIncludeDirectories="$incs" \
-                            PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
-                            RuntimeLibrary="$release_runtime" \
-                            UsePrecompiledHeader="0" \
-                            WarningLevel="3" \
-                            DebugInformationFormat="0" \
-                            Detect64BitPortabilityProblems="true" \
-
-                        $uses_asm && tag Tool Name="YASM"  IncludePaths="$incs"
-                    ;;
+                $uses_asm && tag Tool Name="YASM"  IncludePaths="$incs"
+                ;;
                esac
-            ;;
-        esac

        case "$proj_kind" in
            exe)
                case "$target" in
-                    x86*)
+                    x86*) tag Tool \
+                                  Name="VCLinkerTool" \
+                                  AdditionalDependencies="$libs \$(NoInherit)" \
+                                  AdditionalLibraryDirectories="$libdirs" \
+                    ;;
+                    arm*|iwmmx*)
                        case "$name" in
-                            obj_int_extract)
-                                tag Tool \
-                                    Name="VCLinkerTool" \
-                                    OutputFile="${name}.exe" \
-                                    GenerateDebugInformation="true" \
+                            obj_int_extract) tag Tool \
+                                Name="VCLinkerTool" \
+                                OutputFile="${name}.exe" \
+                                LinkIncremental="1" \
+                                GenerateDebugInformation="false" \
+                                SubSystem="0" \
+                                OptimizeReferences="0" \
+                                EnableCOMDATFolding="0" \
+                                TargetMachine="0"
                            ;;
-                            *)
-                                tag Tool \
-                                    Name="VCLinkerTool" \
-                                    AdditionalDependencies="$libs \$(NoInherit)" \
-                                    AdditionalLibraryDirectories="$libdirs" \
-
+                            *) tag Tool \
+                                Name="VCLinkerTool" \
+                                AdditionalDependencies="$libs" \
+                                OutputFile="\$(OutDir)/${name}.exe" \
+                                LinkIncremental="1" \
+                                AdditionalLibraryDirectories="${libdirs};&quot;..\lib/$plat_no_ws&quot;" \
+                                DelayLoadDLLs="\$(NOINHERIT)" \
+                                GenerateDebugInformation="true" \
+                                ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
+                                SubSystem="9" \
+                                StackReserveSize="65536" \
+                                StackCommitSize="4096" \
+                                OptimizeReferences="2" \
+                                EnableCOMDATFolding="2" \
+                                EntryPointSymbol="mainWCRTStartup" \
+                                TargetMachine="3"
                            ;;
                        esac
-                    ;;
+                     ;;
                 esac
            ;;
-            lib)
+        lib)
                case "$target" in
-                    x86*)
-                        tag Tool \
-                            Name="VCLibrarianTool" \
-                            OutputFile="\$(OutDir)/${name}${lib_sfx}.lib" \
-
-                    ;;
+                      arm*|iwmmx*) tag Tool \
+                                    Name="VCLibrarianTool" \
+                                    AdditionalOptions=" /subsystem:windowsce,4.20 /machine:ARM" \
+                                    OutputFile="\$(OutDir)/${name}.lib" \
+                                ;;
+                                *) tag Tool \
+                                    Name="VCLibrarianTool" \
+                                    OutputFile="\$(OutDir)/${name}${lib_sfx}.lib" \
+                                ;;
                esac
-            ;;
-            dll) # note differences to debug version: LinkIncremental, AssemblyDebug
-                tag Tool \
-                    Name="VCLinkerTool" \
-                    AdditionalDependencies="\$(NoInherit)" \
-                    LinkIncremental="1" \
-                    GenerateDebugInformation="true" \
-                    TargetMachine="1" \
-                    $link_opts \
-
-            ;;
+        ;;
+        dll) # note differences to debug version: LinkIncremental, AssemblyDebug
+             tag Tool \
+                      Name="VCLinkerTool" \
+                      AdditionalDependencies="\$(NoInherit)" \
+                      LinkIncremental="1" \
+                      GenerateDebugInformation="true" \
+                      TargetMachine="1" \
+                      $link_opts
        esac

+        if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
+            case "$name" in
+                vpx)         tag DeploymentTool \
+                             ForceDirty="-1" \
+                             RegisterOutput="0"
+                ;;
+                example|xma) tag DeploymentTool \
+                             ForceDirty="-1" \
+                             RegisterOutput="0"
+                             tag DebuggerTool \
+                             Arguments="${ARGU}"
+                                ;;
+            esac
+        fi
+
        close_tag Configuration
    done
    close_tag Configurations

-    open_tag Files
-    generate_filter srcs   "Source Files"   "c;def;odl;idl;hpj;bat;asm;asmx"
-    generate_filter hdrs   "Header Files"   "h;hm;inl;inc;xsd"
+    open_tag  Files
+    generate_filter srcs   "Source Files"   "cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
+    generate_filter hdrs   "Header Files"   "h;hpp;hxx;hm;inl;inc;xsd"
    generate_filter resrcs "Resource Files" "rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
    generate_filter resrcs "Build Files"    "mk"
    close_tag Files
--- a/build/make/gen_msvs_sln.sh
+++ b/build/make/gen_msvs_sln.sh
@@ -139,6 +139,9 @@ process_global() {
            echo "${indent}${proj_guid}.${config}.ActiveCfg = ${config}"
            echo "${indent}${proj_guid}.${config}.Build.0 = ${config}"

+            if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
+                echo "${indent}${proj_guid}.${config}.Deploy.0 = ${config}"
+            fi
        done
        IFS=${IFS_bak}
    done
--- a/build/make/obj_int_extract.c
+++ b/build/make/obj_int_extract.c
--- a/build/x86-msvs/obj_int_extract.bat
+++ b/build/x86-msvs/obj_int_extract.bat
@@ -1,15 +0,0 @@
-REM   Copyright (c) 2011 The WebM project authors. All Rights Reserved.
-REM
-REM   Use of this source code is governed by a BSD-style license
-REM   that can be found in the LICENSE file in the root of the source
-REM   tree. An additional intellectual property rights grant can be found
-REM   in the file PATENTS.  All contributing project authors may
-REM   be found in the AUTHORS file in the root of the source tree.
-echo on
-
-cl /I "./" /I "%1" /nologo /c "%1/vp8/common/asm_com_offsets.c"
-cl /I "./" /I "%1" /nologo /c "%1/vp8/decoder/asm_dec_offsets.c"
-cl /I "./" /I "%1" /nologo /c "%1/vp8/encoder/asm_enc_offsets.c"
-obj_int_extract.exe rvds "asm_com_offsets.obj" > "asm_com_offsets.asm"
-obj_int_extract.exe rvds "asm_dec_offsets.obj" > "asm_dec_offsets.asm"
-obj_int_extract.exe rvds "asm_enc_offsets.obj" > "asm_enc_offsets.asm"
--- a/configure
+++ b/configure
@@ -37,10 +37,11 @@ Advanced options:
  ${toggle_multithread}           multithreaded encoding and decoding.
  ${toggle_spatial_resampling}    spatial sampling (scaling) support
  ${toggle_realtime_only}         enable this option while building for real-time encoding
+  ${toggle_error_concealment}     enable this option to get a decoder which is able to conceal losses
  ${toggle_runtime_cpu_detect}    runtime cpu detection
  ${toggle_shared}                shared library support
  ${toggle_small}                 favor smaller size over speed
-  ${toggle_opencl}                support for OpenCL-assisted VP8 decoding (experimental)
+  ${toggle_arm_asm_detok}         assembly version of the detokenizer (ARM platforms only)
  ${toggle_postproc_visualizer}   macro block / block level visualizers

 Codecs:
@@ -79,21 +80,22 @@ EOF
 # alphabetically by architecture, generic-gnu last.
 all_platforms="${all_platforms} armv5te-linux-rvct"
 all_platforms="${all_platforms} armv5te-linux-gcc"
-all_platforms="${all_platforms} armv5te-none-rvct"
 all_platforms="${all_platforms} armv5te-symbian-gcc"
+all_platforms="${all_platforms} armv5te-wince-vs8"
 all_platforms="${all_platforms} armv6-darwin-gcc"
 all_platforms="${all_platforms} armv6-linux-rvct"
 all_platforms="${all_platforms} armv6-linux-gcc"
-all_platforms="${all_platforms} armv6-none-rvct"
 all_platforms="${all_platforms} armv6-symbian-gcc"
+all_platforms="${all_platforms} armv6-wince-vs8"
 all_platforms="${all_platforms} iwmmxt-linux-rvct"
 all_platforms="${all_platforms} iwmmxt-linux-gcc"
+all_platforms="${all_platforms} iwmmxt-wince-vs8"
 all_platforms="${all_platforms} iwmmxt2-linux-rvct"
 all_platforms="${all_platforms} iwmmxt2-linux-gcc"
+all_platforms="${all_platforms} iwmmxt2-wince-vs8"
 all_platforms="${all_platforms} armv7-darwin-gcc"    #neon Cortex-A8
 all_platforms="${all_platforms} armv7-linux-rvct"    #neon Cortex-A8
 all_platforms="${all_platforms} armv7-linux-gcc"     #neon Cortex-A8
-all_platforms="${all_platforms} armv7-none-rvct"     #neon Cortex-A8
 all_platforms="${all_platforms} mips32-linux-gcc"
 all_platforms="${all_platforms} ppc32-darwin8-gcc"
 all_platforms="${all_platforms} ppc32-darwin9-gcc"
@@ -106,7 +108,6 @@ all_platforms="${all_platforms} x86-darwin8-gcc"
 all_platforms="${all_platforms} x86-darwin8-icc"
 all_platforms="${all_platforms} x86-darwin9-gcc"
 all_platforms="${all_platforms} x86-darwin9-icc"
-all_platforms="${all_platforms} x86-darwin10-gcc"
 all_platforms="${all_platforms} x86-linux-gcc"
 all_platforms="${all_platforms} x86-linux-icc"
 all_platforms="${all_platforms} x86-solaris-gcc"
@@ -159,7 +160,6 @@ enable fast_unaligned #allow unaligned accesses, if supported by hw
 enable md5
 enable spatial_resampling
 enable multithread
-enable os_support

 [ -d ${source_path}/../include ] && enable alt_tree_layout
 for d in vp8; do
@@ -213,7 +213,6 @@ HAVE_LIST="
    alt_tree_layout
    pthread_h
    sys_mman_h
-    dlopen
 "
 CONFIG_LIST="
    external_build
@@ -251,11 +250,11 @@ CONFIG_LIST="
    static_msvcrt
    spatial_resampling
    realtime_only
+    error_concealment
    shared
    small
-    opencl
+    arm_asm_detok
    postproc_visualizer
-    os_support
 "
 CMDLINE_SELECT="
    extra_warnings
@@ -292,9 +291,10 @@ CMDLINE_SELECT="
    mem_tracker
    spatial_resampling
    realtime_only
+    error_concealment
    shared
    small
-    opencl
+    arm_asm_detok
    postproc_visualizer
 "

@@ -303,7 +303,7 @@ process_cmdline() {
        optval="${opt#*=}"
        case "$opt" in
        --disable-codecs) for c in ${CODECS}; do disable $c; done ;;
-        *) process_common_cmdline "$opt"
+        *) process_common_cmdline $opt
        ;;
        esac
    done
@@ -382,7 +382,6 @@ process_targets() {
    if [ -f "${source_path}/build/make/version.sh" ]; then
        local ver=`"$source_path/build/make/version.sh" --bare $source_path`
        DIST_DIR="${DIST_DIR}-${ver}"
-        VERSION_STRING=${ver}
        ver=${ver%%-*}
        VERSION_PATCH=${ver##*.}
        ver=${ver%.*}
@@ -391,8 +390,6 @@ process_targets() {
        VERSION_MAJOR=${ver%.*}
    fi
    enabled child || cat <<EOF >> config.mk
-
-PREFIX=${prefix}
 ifeq (\$(MAKECMDGOALS),dist)
 DIST_DIR?=${DIST_DIR}
 else
@@ -400,8 +397,6 @@ DIST_DIR?=\$(DESTDIR)${prefix}
 endif
 LIBSUBDIR=${libdir##${prefix}/}

-VERSION_STRING=${VERSION_STRING}
-
 VERSION_MAJOR=${VERSION_MAJOR}
 VERSION_MINOR=${VERSION_MINOR}
 VERSION_PATCH=${VERSION_PATCH}
@@ -496,7 +491,7 @@ process_toolchain() {
        check_add_cflags -Wpointer-arith
        check_add_cflags -Wtype-limits
        check_add_cflags -Wcast-qual
-        enabled extra_warnings || check_add_cflags -Wno-unused-function
+        enabled extra_warnings || check_add_cflags -Wno-unused
    fi

    if enabled icc; then
@@ -561,6 +556,4 @@ process "$@"
 cat <<EOF > ${BUILD_PFX}vpx_config.c
 static const char* const cfg = "$CONFIGURE_ARGS";
 const char *vpx_codec_build_config(void) {return cfg;}
-static const char* const libdir = "$libdir";
-const char *vpx_codec_lib_dir(void) {return libdir;}
 EOF
--- a/docs.mk
+++ b/docs.mk
@@ -34,8 +34,7 @@ TXT_DOX = $(call enabled,TXT_DOX)

 EXAMPLE_PATH += $(SRC_PATH_BARE) #for CHANGELOG, README, etc

-doxyfile: $(if $(findstring examples, $(ALL_TARGETS)),examples.doxy)
-doxyfile: libs.doxy_template libs.doxy
+doxyfile: libs.doxy_template libs.doxy examples.doxy
 	@echo "    [CREATE] $@"
 	@cat $^ > $@
 	@echo "STRIP_FROM_PATH += $(SRC_PATH_BARE) $(BUILD_ROOT)" >> $@
--- a/examples.mk
+++ b/examples.mk
@@ -77,6 +77,11 @@ GEN_EXAMPLES-$(CONFIG_ENCODERS) += decode_with_drops.c
 endif
 decode_with_drops.GUID           = CE5C53C4-8DDA-438A-86ED-0DDD3CDB8D26
 decode_with_drops.DESCRIPTION    = Drops frames while decoding
+ifeq ($(CONFIG_DECODERS),yes)
+GEN_EXAMPLES-$(CONFIG_ENCODERS) += decode_with_partial_drops.c
+endif
+decode_with_partial_drops.GUID           = CE5C53C4-8DDA-438A-86ED-0DDD3CDB8D27
+decode_with_partial_drops.DESCRIPTION    = Drops parts of frames while decoding
 GEN_EXAMPLES-$(CONFIG_ENCODERS) += error_resilient.c
 error_resilient.GUID             = DF5837B9-4145-4F92-A031-44E4F832E00C
 error_resilient.DESCRIPTION      = Error Resiliency Feature
@@ -93,16 +98,8 @@ vp8cx_set_ref.DESCRIPTION           = VP8 set encoder reference frame


 # Handle extra library flags depending on codec configuration
-
-# We should not link to math library (libm) on RVCT
-# when building for bare-metal targets
-ifeq ($(CONFIG_OS_SUPPORT), yes)
 CODEC_EXTRA_LIBS-$(CONFIG_VP8)         += m
-else
-    ifeq ($(CONFIG_GCC), yes)
-    CODEC_EXTRA_LIBS-$(CONFIG_VP8)         += m
-    endif
-endif
+
 #
 # End of specified files. The rest of the build rules should happen
 # automagically from here.
--- a/examples/decode_with_partial_drops.txt
+++ b/examples/decode_with_partial_drops.txt
@@ -0,0 +1,213 @@
+@TEMPLATE decoder_tmpl.c
+Decode With Partial Drops Example
+=================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INTRODUCTION
+This is an example utility which drops a series of frames, or parts of
+frames (simulated packets), as specified on the command line. This is
+useful for observing the error recovery features of the codec.
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INTRODUCTION
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_INCLUDES
+#include <time.h>
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_INCLUDES
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HELPERS
+struct parsed_header
+{
+    char key_frame;
+    int version;
+    char show_frame;
+    int first_part_size;
+};
+
+int next_packet(struct parsed_header* hdr, int pos, int length, int mtu)
+{
+    int size = 0;
+    int remaining = length - pos;
+    /* Uncompressed part is 3 bytes for P frames and 10 bytes for I frames */
+    int uncomp_part_size = (hdr->key_frame ? 10 : 3);
+    /* number of bytes yet to send from header and the first partition */
+    int remainFirst = uncomp_part_size + hdr->first_part_size - pos;
+    if (remainFirst > 0)
+    {
+        if (remainFirst <= mtu)
+        {
+            size = remainFirst;
+        }
+        else
+        {
+            size = mtu;
+        }
+
+        return size;
+    }
+
+    /* second partition; just split it up according to MTU */
+    if (remaining <= mtu)
+    {
+        size = remaining;
+        return size;
+    }
+    return mtu;
+}
+
+void throw_packets(unsigned char* frame, int* size, int loss_rate, int* thrown, int* kept)
+{
+    unsigned char loss_frame[256*1024];
+    int pkg_size = 1;
+    int count = 0;
+    int pos = 0;
+    int loss_pos = 0;
+    struct parsed_header hdr;
+    unsigned int tmp;
+    int mtu = 100;
+
+    if (*size < 3)
+    {
+        return;
+    }
+    putc('|', stdout);
+    /* parse uncompressed 3 bytes */
+    tmp = (frame[2] << 16) | (frame[1] << 8) | frame[0];
+    hdr.key_frame = !(tmp & 0x1); /* inverse logic */
+    hdr.version = (tmp >> 1) & 0x7;
+    hdr.show_frame = (tmp >> 4) & 0x1;
+    hdr.first_part_size = (tmp >> 5) & 0x7FFFF;
+
+    /* don't drop key frames */
+    if (hdr.key_frame)
+    {
+        int i;
+        *kept = *size/mtu + ((*size % mtu > 0) ? 1 : 0); /* approximate */
+        for (i=0; i < *kept; i++)
+            putc('.', stdout);
+        return;
+    }
+
+    while ((pkg_size = next_packet(&hdr, pos, *size, mtu)) > 0)
+    {
+        int loss_event = ((rand() + 1.0)/(RAND_MAX + 1.0) < loss_rate/100.0);
+        if (*thrown == 0 && !loss_event)
+        {
+            memcpy(loss_frame + loss_pos, frame + pos, pkg_size);
+            loss_pos += pkg_size;
+            (*kept)++;
+            putc('.', stdout);
+        }
+        else
+        {
+            (*thrown)++;
+            putc('X', stdout);
+        }
+        pos += pkg_size;
+    }
+    memcpy(frame, loss_frame, loss_pos);
+    memset(frame + loss_pos, 0, *size - loss_pos);
+    *size = loss_pos;
+}
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HELPERS
+
+Usage
+-----
+This example adds a single argument to the `simple_decoder` example,
+which specifies the range or pattern of frames to drop, or a packet loss
+rate in percent and a random seed (`L,S`). It is parsed as follows:
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ USAGE
+if(argc!=4 && argc != 5)
+    die("Usage: %s <infile> <outfile> <N-M|N/M|L,S>\n", argv[0]);
+{
+    char *nptr;
+    n = strtol(argv[3], &nptr, 0);
+    mode = (*nptr == '\0' || *nptr == ',') ? 2 : (*nptr == '-') ? 1 : 0;
+
+    m = (*nptr != '\0') ? strtol(nptr+1, NULL, 0) : 0; /* don't read past the end */
+    if((!n && !m) || (*nptr != '-' && *nptr != '/' &&
+        *nptr != '\0' && *nptr != ','))
+        die("Couldn't parse pattern %s\n", argv[3]);
+}
+seed = (m > 0) ? m : (unsigned int)time(NULL);
+srand(seed);thrown_frame = 0;
+printf("Seed: %u\n", seed);
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ USAGE
+
+
+Dropping A Range Of Frames
+--------------------------
+To drop a range of frames, specify the starting frame and the ending
+frame to drop, separated by a dash. The following command will drop
+frames 5 through 10 (base 1).
+
+  $ ./decode_with_partial_drops in.ivf out.i420 5-10
+
+
+Dropping A Pattern Of Frames
+----------------------------
+To drop a pattern of frames, specify the number of frames to drop and
+the number of frames after which to repeat the pattern, separated by
+a forward-slash. The following command will drop 3 of 7 frames.
+Specifically, it will decode 4 frames, then drop 3 frames, and then
+repeat.
+
+  $ ./decode_with_partial_drops in.ivf out.i420 3/7
+
+
+Extra Variables
+---------------
+This example maintains the pattern passed on the command line in the
+`n`, `m`, and `mode` variables:
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_VARS
+int              n, m, mode;
+unsigned int     seed;
+int              thrown=0, kept=0;
+int              thrown_frame=0, kept_frame=0;
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_VARS
+
+
+Making The Drop Decision
+------------------------
+The example decides whether to drop the whole frame or only some of its
+packets, depending on the drop mode, immediately before decoding the frame.
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PRE_DECODE
+/* Decide whether to throw parts of the frame or the whole frame
+   depending on the drop mode */
+thrown_frame = 0;
+kept_frame = 0;
+switch (mode)
+{
+case 0:
+    if (m - (frame_cnt-1)%m <= n)
+    {
+        frame_sz = 0;
+    }
+    break;
+case 1:
+    if (frame_cnt >= n && frame_cnt <= m)
+    {
+        frame_sz = 0;
+    }
+    break;
+case 2:
+    throw_packets(frame, &frame_sz, n, &thrown_frame, &kept_frame);
+    break;
+default: break;
+}
+if (mode < 2)
+{
+    if (frame_sz == 0)
+    {
+        putc('X', stdout);
+        thrown_frame++;
+    }
+    else
+    {
+        putc('.', stdout);
+        kept_frame++;
+    }
+}
+thrown += thrown_frame;
+kept += kept_frame;
+fflush(stdout);
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PRE_DECODE
--- a/examples/decoder_tmpl.c
+++ b/examples/decoder_tmpl.c
@@ -19,7 +19,7 @@
 #define VPX_CODEC_DISABLE_COMPAT 1
 #include "vpx/vpx_decoder.h"
 #include "vpx/vp8dx.h"
-#define interface (vpx_codec_vp8_dx())
+#define interface (&vpx_codec_vp8_dx_algo)
@EXTRA_INCLUDES


@@ -42,6 +42,8 @@ static void die(const char *fmt, ...) {

@DIE_CODEC

+@HELPERS
+
 int main(int argc, char **argv) {
    FILE            *infile, *outfile;
    vpx_codec_ctx_t  codec;
--- a/examples/decoder_tmpl.txt
+++ b/examples/decoder_tmpl.txt
@@ -2,7 +2,7 @@
 #define VPX_CODEC_DISABLE_COMPAT 1
 #include "vpx/vpx_decoder.h"
 #include "vpx/vp8dx.h"
-#define interface (vpx_codec_vp8_dx())
+#define interface (&vpx_codec_vp8_dx_algo)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DEC_INCLUDES


--- a/examples/encoder_tmpl.c
+++ b/examples/encoder_tmpl.c
@@ -19,7 +19,7 @@
 #define VPX_CODEC_DISABLE_COMPAT 1
 #include "vpx/vpx_encoder.h"
 #include "vpx/vp8cx.h"
-#define interface (vpx_codec_vp8_cx())
+#define interface (&vpx_codec_vp8_cx_algo)
 #define fourcc    0x30385056
@EXTRA_INCLUDES

--- a/examples/encoder_tmpl.txt
+++ b/examples/encoder_tmpl.txt
@@ -2,7 +2,7 @@
 #define VPX_CODEC_DISABLE_COMPAT 1
 #include "vpx/vpx_encoder.h"
 #include "vpx/vp8cx.h"
-#define interface (vpx_codec_vp8_cx())
+#define interface (&vpx_codec_vp8_cx_algo)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ENC_INCLUDES


--- a/examples/simple_decoder.txt
+++ b/examples/simple_decoder.txt
@@ -33,7 +33,7 @@ Initializing The Codec
 ----------------------
 The decoder is initialized by the following code. This is an example for
 the VP8 decoder, but the code is analogous for all algorithms. Replace
-`vpx_codec_vp8_dx()` with a pointer to the interface exposed by the
+`&vpx_codec_vp8_dx_algo` with a pointer to the interface exposed by the
 algorithm you want to use. The `cfg` argument is left as NULL in this
 example, because we want the algorithm to determine the stream
 configuration (width/height) and allocate memory automatically. This
--- a/libs.mk
+++ b/libs.mk
@@ -9,13 +9,7 @@
 ##


-# ARM assembly files are written in RVCT-style. We use some make magic to
-# filter those files to allow GCC compilation
-ifeq ($(ARCH_ARM),yes)
-  ASM:=$(if $(filter yes,$(CONFIG_GCC)),.asm.s,.asm)
-else
-  ASM:=.asm
-endif
+ASM:=$(if $(filter yes,$(CONFIG_GCC)),.asm.s,.asm)

 CODEC_SRCS-yes += libs.mk

@@ -123,18 +117,6 @@ endif
 else
 INSTALL-LIBS-yes += $(LIBSUBDIR)/libvpx.a
 INSTALL-LIBS-$(CONFIG_DEBUG_LIBS) += $(LIBSUBDIR)/libvpx_g.a
-
-#Install the OpenCL kernels if CL enabled.
-ifeq ($(CONFIG_OPENCL),yes)
-INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/filter_cl.cl
-INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/idctllm_cl.cl
-INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/loopfilter.cl
-#only install decoder CL files if VP8 decoder enabled
-ifeq ($(CONFIG_VP8_DECODER),yes)
-INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/decoder/opencl/dequantize_cl.cl
-endif
-endif #CONFIG_OPENCL=yes
-
 endif

 CODEC_SRCS=$(call enabled,CODEC_SRCS)
@@ -144,22 +126,28 @@ INSTALL-SRCS-$(CONFIG_CODEC_SRCS) += $(call enabled,CODEC_EXPORTS)
 ifeq ($(CONFIG_EXTERNAL_BUILD),yes)
 ifeq ($(CONFIG_MSVS),yes)

+ifeq ($(ARCH_ARM),yes)
+ifeq ($(HAVE_ARMV5TE),yes)
+ARM_ARCH=v5
+endif
+ifeq ($(HAVE_ARMV6),yes)
+ARM_ARCH=v6
+endif
 obj_int_extract.vcproj: $(SRC_PATH_BARE)/build/make/obj_int_extract.c
-	@cp $(SRC_PATH_BARE)/build/x86-msvs/obj_int_extract.bat .
+	@cp $(SRC_PATH_BARE)/build/arm-wince-vs8/obj_int_extract.bat .
 	@echo "    [CREATE] $@"
-	$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh \
-    --exe \
-    --target=$(TOOLCHAIN) \
-    --name=obj_int_extract \
-    --ver=$(CONFIG_VS_VERSION) \
-    --proj-guid=E1360C65-D375-4335-8057-7ED99CC3F9B2 \
-    $(if $(CONFIG_STATIC_MSVCRT),--static-crt) \
-    --out=$@ $^ \
-    -I. \
-    -I"$(SRC_PATH_BARE)" \
+	$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh\
+			--exe\
+			--target=$(TOOLCHAIN)\
+            $(if $(CONFIG_STATIC_MSVCRT),--static-crt) \
+            --name=obj_int_extract\
+            --proj-guid=E1360C65-D375-4335-8057-7ED99CC3F9B2\
+            --out=$@ $^\
+            -I".&quot;;&quot;$(SRC_PATH_BARE)"

 PROJECTS-$(BUILD_LIBVPX) += obj_int_extract.vcproj
 PROJECTS-$(BUILD_LIBVPX) += obj_int_extract.bat
+endif

 vpx.def: $(call enabled,CODEC_EXPORTS)
 	@echo "    [CREATE] $@"
@@ -170,16 +158,15 @@ CLEAN-OBJS += vpx.def

 vpx.vcproj: $(CODEC_SRCS) vpx.def
 	@echo "    [CREATE] $@"
-	$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh \
-			--lib \
-			--target=$(TOOLCHAIN) \
+	$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh\
+			--lib\
+			--target=$(TOOLCHAIN)\
            $(if $(CONFIG_STATIC_MSVCRT),--static-crt) \
-            --name=vpx \
-            --proj-guid=DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74 \
-            --module-def=vpx.def \
-            --ver=$(CONFIG_VS_VERSION) \
-            --out=$@ $(CFLAGS) $^ \
-            --src-path-bare="$(SRC_PATH_BARE)" \
+            --name=vpx\
+            --proj-guid=DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74\
+            --module-def=vpx.def\
+            --ver=$(CONFIG_VS_VERSION)\
+            --out=$@ $(CFLAGS) $^\

 PROJECTS-$(BUILD_LIBVPX) += vpx.vcproj

@@ -216,26 +203,6 @@ $(addprefix $(DIST_DIR)/,$(LIBVPX_SO_SYMLINKS)):

 INSTALL-LIBS-$(CONFIG_SHARED) += $(LIBVPX_SO_SYMLINKS)
 INSTALL-LIBS-$(CONFIG_SHARED) += $(LIBSUBDIR)/$(LIBVPX_SO)
-
-LIBS-$(BUILD_LIBVPX) += vpx.pc
-vpx.pc: config.mk libs.mk
-	@echo "    [CREATE] $@"
-	$(qexec)echo '# pkg-config file from libvpx $(VERSION_STRING)' > $@
-	$(qexec)echo 'prefix=$(PREFIX)' >> $@
-	$(qexec)echo 'exec_prefix=$${prefix}' >> $@
-	$(qexec)echo 'libdir=$${prefix}/lib' >> $@
-	$(qexec)echo 'includedir=$${prefix}/include' >> $@
-	$(qexec)echo '' >> $@
-	$(qexec)echo 'Name: vpx' >> $@
-	$(qexec)echo 'Description: WebM Project VPx codec implementation' >> $@
-	$(qexec)echo 'Version: $(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH)' >> $@
-	$(qexec)echo 'Requires:' >> $@
-	$(qexec)echo 'Conflicts:' >> $@
-	$(qexec)echo 'Libs: -L$${libdir} -lvpx' >> $@
-	$(qexec)echo 'Cflags: -I$${includedir}' >> $@
-INSTALL-LIBS-yes += $(LIBSUBDIR)/pkgconfig/vpx.pc
-INSTALL_MAPS += $(LIBSUBDIR)/pkgconfig/%.pc %.pc
-CLEAN-OBJS += vpx.pc
 endif

 LIBS-$(LIPO_LIBVPX) += libvpx.a
@@ -263,44 +230,9 @@ endif
 #
 # Add assembler dependencies for configuration and offsets
 #
-$(filter %.s.o,$(OBJS-yes)):     $(BUILD_PFX)vpx_config.asm
-$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
-
-#
-# Calculate platform- and compiler-specific offsets for hand coded assembly
-#
-ifeq ($(CONFIG_EXTERNAL_BUILD),) # Visual Studio uses obj_int_extract.bat
-  ifeq ($(ARCH_ARM), yes)
-    asm_com_offsets.asm: obj_int_extract
-    asm_com_offsets.asm: $(VP8_PREFIX)common/asm_com_offsets.c.o
-	./obj_int_extract rvds $< $(ADS2GAS) > $@
-    OBJS-yes += $(VP8_PREFIX)common/asm_com_offsets.c.o
-    CLEAN-OBJS += asm_com_offsets.asm
-    $(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_com_offsets.asm
-  endif
-
-  ifeq ($(ARCH_ARM)$(ARCH_X86)$(ARCH_X86_64), yes)
-    ifeq ($(CONFIG_VP8_ENCODER), yes)
-      asm_enc_offsets.asm: obj_int_extract
-      asm_enc_offsets.asm: $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
-	./obj_int_extract rvds $< $(ADS2GAS) > $@
-      OBJS-yes += $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
-      CLEAN-OBJS += asm_enc_offsets.asm
-      $(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_enc_offsets.asm
-    endif
-  endif
-
-  ifeq ($(ARCH_ARM), yes)
-    ifeq ($(CONFIG_VP8_DECODER), yes)
-      asm_dec_offsets.asm: obj_int_extract
-      asm_dec_offsets.asm: $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
-	./obj_int_extract rvds $< $(ADS2GAS) > $@
-      OBJS-yes += $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
-      CLEAN-OBJS += asm_dec_offsets.asm
-      $(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_dec_offsets.asm
-    endif
-  endif
-endif
+#$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm $(BUILD_PFX)vpx_asm_offsets.asm
+$(filter %.s.o,$(OBJS-yes)):   $(BUILD_PFX)vpx_config.asm
+$(filter %.asm.o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm

 $(shell $(SRC_PATH_BARE)/build/make/version.sh "$(SRC_PATH_BARE)" $(BUILD_PFX)vpx_version.h)
 CLEAN-OBJS += $(BUILD_PFX)vpx_version.h
--- a/mainpage.dox
+++ b/mainpage.dox
@@ -31,7 +31,7 @@
  The WebM project is an open source project supported by its community. For
  questions about this SDK, please mail the apps-devel@webmproject.org list.
  To contribute, see http://www.webmproject.org/code/contribute and mail
-  codec-devel@webmproject.org.
+  vpx-devel@webmproject.org.
 */

 /*!\page changelog CHANGELOG
--- a/md5_utils.c
+++ b/md5_utils.c
@@ -20,6 +20,8 @@
 * Still in the public domain.
 */

+#include <sys/types.h>    /* for stupid systems */
+
 #include <string.h>   /* for memcpy() */

 #include "md5_utils.h"
--- a/solution.mk
+++ b/solution.mk
@@ -9,13 +9,38 @@
 ##


+ifeq ($(ARCH_ARM),yes)
+ARM_DEVELOP=no
+ARM_DEVELOP:=$(if $(filter %vpx.vcproj,$(wildcard *.vcproj)),yes)
+
+ifeq ($(ARM_DEVELOP),yes)
+vpx.sln:
+	@echo "    [COPY] $@"
+	@cp $(SRC_PATH_BARE)/build/arm-wince-vs8/vpx.sln .
+PROJECTS-yes += vpx.sln
+else
+vpx.sln: $(wildcard *.vcproj)
+	@echo "    [CREATE] $@"
+	$(SRC_PATH_BARE)/build/make/gen_msvs_sln.sh \
+            $(if $(filter %vpx.vcproj,$^),--dep=vpxdec:vpx) \
+            $(if $(filter %vpx.vcproj,$^),--dep=xma:vpx) \
+            --ver=$(CONFIG_VS_VERSION)\
+            --target=$(TOOLCHAIN)\
+            --out=$@ $^
+vpx.sln.mk: vpx.sln
+	@true
+
+PROJECTS-yes += vpx.sln vpx.sln.mk
+-include vpx.sln.mk
+endif
+
+else
 vpx.sln: $(wildcard *.vcproj)
 	@echo "    [CREATE] $@"
 	$(SRC_PATH_BARE)/build/make/gen_msvs_sln.sh \
            $(if $(filter %vpx.vcproj,$^),\
-                $(foreach vcp,$(filter-out %vpx.vcproj %obj_int_extract.vcproj,$^),\
+                $(foreach vcp,$(filter-out %vpx.vcproj,$^),\
                  --dep=$(vcp:.vcproj=):vpx)) \
-            --dep=vpx:obj_int_extract \
            --ver=$(CONFIG_VS_VERSION)\
            --out=$@ $^
 vpx.sln.mk: vpx.sln
@@ -23,6 +48,7 @@ vpx.sln.mk: vpx.sln

 PROJECTS-yes += vpx.sln vpx.sln.mk
 -include vpx.sln.mk
+endif

 # Always install this file, as it is an unconditional post-build rule.
 INSTALL_MAPS += src/%     $(SRC_PATH_BARE)/%
--- a/usage.dox
+++ b/usage.dox
@@ -25,7 +25,7 @@
    codec may write into to store details about a single instance of that codec.
    Most of the context is implementation specific, and thus opaque to the
    application. The context structure as seen by the application is of fixed
-    size, and thus can be allocated with automatic storage or dynamically
+    size, and thus can be allocated eith with automatic storage or dynamically
    on the heap.

    Most operations require an initialized codec context. Codec context
@@ -74,7 +74,7 @@
    the ABI is versioned. The ABI version number must be passed at
    initialization time to ensure the application is using a header file that
    matches the library. The current ABI version number is stored in the
-    preprocessor macros #VPX_CODEC_ABI_VERSION, #VPX_ENCODER_ABI_VERSION, and
+    prepropcessor macros #VPX_CODEC_ABI_VERSION, #VPX_ENCODER_ABI_VERSION, and
    #VPX_DECODER_ABI_VERSION. For convenience, each initialization function has
    a wrapper macro that inserts the correct version number. These macros are
    named like the initialization methods, but without the _ver suffix.
@@ -125,7 +125,7 @@

    The special value <code>0</code> is reserved to represent an infinite
    deadline. In this case, the codec will perform as much processing as
-    possible to yield the highest quality frame.
+    possible to yeild the highest quality frame.

    By convention, the value <code>1</code> is used to mean "return as fast as
    possible."
@@ -135,7 +135,7 @@

 /*! \page usage_xma External Memory Allocation
    Applications that wish to have fine grained control over how and where
-    decoders allocate memory \ref MAY make use of the eXternal Memory Allocation
+    decoders allocate memory \ref MAY make use of the e_xternal Memory Allocation
    (XMA) interface. Not all codecs support the XMA \ref usage_features.

    To use a decoder in XMA mode, the decoder \ref MUST be initialized with the
@@ -143,7 +143,7 @@
    allocate is heavily dependent on the size of the encoded video frames. The
    size of the video must be known before requesting the decoder's memory map.
    This stream information can be obtained with the vpx_codec_peek_stream_info()
-    function, which does not require a constructed decoder context. If the exact
+    function, which does not require a contructed decoder context. If the exact
    stream is not known, a stream info structure can be created that reflects
    the maximum size that the decoder instance is required to support.

@@ -175,7 +175,7 @@
    \section usage_xma_seg_szalign Segment Size and Alignment
    The sz (size) and align (alignment) parameters describe the required size
    and alignment of the requested segment. Alignment will always be a power of
-    two. Applications \ref MUST honor the alignment requested. Failure to do so
+    two. Applications \ref MUST honor the aligment requested. Failure to do so
    could result in program crashes or may incur a speed penalty.

    \section usage_xma_seg_flags Segment Flags
--- a/vp8/common/alloccommon.c
+++ b/vp8/common/alloccommon.c
@@ -12,21 +12,26 @@
 #include "vpx_ports/config.h"
 #include "blockd.h"
 #include "vpx_mem/vpx_mem.h"
+#include "error_concealment.h"
 #include "onyxc_int.h"
 #include "findnearmv.h"
 #include "entropymode.h"
 #include "systemdependent.h"
+#include "vpxerrors.h"


 extern  void vp8_init_scan_order_mask();

-static void update_mode_info_border(MODE_INFO *mi, int rows, int cols)
+void vp8_update_mode_info_border(MODE_INFO *mi, int rows, int cols)
 {
    int i;
    vpx_memset(mi - cols - 2, 0, sizeof(MODE_INFO) * (cols + 1));

    for (i = 0; i < rows; i++)
    {
+        /* TODO(holmer): Bug? This updates the last element of each row
+         * rather than the border element!
+         */
        vpx_memset(&mi[i*cols-1], 0, sizeof(MODE_INFO));
    }
 }
@@ -43,9 +48,11 @@ void vp8_de_alloc_frame_buffers(VP8_COMMON *oci)

    vpx_free(oci->above_context);
    vpx_free(oci->mip);
+    vpx_free(oci->prev_mip);

    oci->above_context = 0;
    oci->mip = 0;
+    oci->prev_mip = 0;

 }

@@ -70,7 +77,7 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
      if (vp8_yv12_alloc_frame_buffer(&oci->yv12_fb[i],  width, height, VP8BORDERINPIXELS) < 0)
        {
            vp8_de_alloc_frame_buffers(oci);
-            return 1;
+            return ALLOC_FAILURE;
        }
    }

@@ -87,13 +94,13 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
    if (vp8_yv12_alloc_frame_buffer(&oci->temp_scale_frame,   width, 16, VP8BORDERINPIXELS) < 0)
    {
        vp8_de_alloc_frame_buffers(oci);
-        return 1;
+        return ALLOC_FAILURE;
    }

    if (vp8_yv12_alloc_frame_buffer(&oci->post_proc_buffer, width, height, VP8BORDERINPIXELS) < 0)
    {
        vp8_de_alloc_frame_buffers(oci);
-        return 1;
+        return ALLOC_FAILURE;
    }

    oci->mb_rows = height >> 4;
@@ -105,21 +112,31 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
    if (!oci->mip)
    {
        vp8_de_alloc_frame_buffers(oci);
-        return 1;
+        return ALLOC_FAILURE;
    }

    oci->mi = oci->mip + oci->mode_info_stride + 1;

+    /* allocate memory for last frame MODE_INFO array */
+    oci->prev_mip = vpx_calloc((oci->mb_cols + 1) * (oci->mb_rows + 1), sizeof(MODE_INFO));
+
+    if (!oci->prev_mip)
+    {
+        vp8_de_alloc_frame_buffers(oci);
+        return ALLOC_FAILURE;
+    }
+
+    oci->prev_mi = oci->prev_mip + oci->mode_info_stride + 1;

    oci->above_context = vpx_calloc(sizeof(ENTROPY_CONTEXT_PLANES) * oci->mb_cols, 1);

    if (!oci->above_context)
    {
        vp8_de_alloc_frame_buffers(oci);
-        return 1;
+        return ALLOC_FAILURE;
    }

-    update_mode_info_border(oci->mi, oci->mb_rows, oci->mb_cols);
+    vp8_update_mode_info_border(oci->mi, oci->mb_rows, oci->mb_cols);

    return 0;
 }
@@ -130,32 +147,32 @@ void vp8_setup_version(VP8_COMMON *cm)
    case 0:
        cm->no_lpf = 0;
        cm->simpler_lpf = 0;
-        cm->mcomp_filter_type = SIXTAP;
+        cm->use_bilinear_mc_filter = 0;
        cm->full_pixel = 0;
        break;
    case 1:
        cm->no_lpf = 0;
        cm->simpler_lpf = 1;
-        cm->mcomp_filter_type = BILINEAR;
+        cm->use_bilinear_mc_filter = 1;
        cm->full_pixel = 0;
        break;
    case 2:
        cm->no_lpf = 1;
        cm->simpler_lpf = 0;
-        cm->mcomp_filter_type = BILINEAR;
+        cm->use_bilinear_mc_filter = 1;
        cm->full_pixel = 0;
        break;
    case 3:
        cm->no_lpf = 1;
        cm->simpler_lpf = 1;
-        cm->mcomp_filter_type = BILINEAR;
+        cm->use_bilinear_mc_filter = 1;
        cm->full_pixel = 1;
        break;
    default:
        /*4,5,6,7 are reserved for future use*/
        cm->no_lpf = 0;
        cm->simpler_lpf = 0;
-        cm->mcomp_filter_type = SIXTAP;
+        cm->use_bilinear_mc_filter = 0;
        cm->full_pixel = 0;
        break;
    }
@@ -170,7 +187,7 @@ void vp8_create_common(VP8_COMMON *oci)
    oci->mb_no_coeff_skip = 1;
    oci->no_lpf = 0;
    oci->simpler_lpf = 0;
-    oci->mcomp_filter_type = SIXTAP;
+    oci->use_bilinear_mc_filter = 0;
    oci->full_pixel = 0;
    oci->multi_token_partition = ONE_PARTITION;
    oci->clr_type = REG_YUV;
--- a/vp8/common/arm/arm_systemdependent.c
+++ b/vp8/common/arm/arm_systemdependent.c
@@ -11,13 +11,21 @@

 #include "vpx_ports/config.h"
 #include "vpx_ports/arm.h"
-#include "vp8/common/g_common.h"
-#include "vp8/common/pragmas.h"
-#include "vp8/common/subpixel.h"
-#include "vp8/common/loopfilter.h"
-#include "vp8/common/recon.h"
-#include "vp8/common/idct.h"
-#include "vp8/common/onyxc_int.h"
+#include "g_common.h"
+#include "pragmas.h"
+#include "subpixel.h"
+#include "loopfilter.h"
+#include "recon.h"
+#include "idct.h"
+#include "onyxc_int.h"
+
+extern void (*vp8_build_intra_predictors_mby_ptr)(MACROBLOCKD *x);
+extern void vp8_build_intra_predictors_mby(MACROBLOCKD *x);
+extern void vp8_build_intra_predictors_mby_neon(MACROBLOCKD *x);
+
+extern void (*vp8_build_intra_predictors_mby_s_ptr)(MACROBLOCKD *x);
+extern void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x);
+extern void vp8_build_intra_predictors_mby_s_neon(MACROBLOCKD *x);

 void vp8_arch_arm_common_init(VP8_COMMON *ctx)
 {
@@ -98,12 +106,31 @@ void vp8_arch_arm_common_init(VP8_COMMON *ctx)
        rtcd->recon.recon2      = vp8_recon2b_neon;
        rtcd->recon.recon4      = vp8_recon4b_neon;
        rtcd->recon.recon_mb    = vp8_recon_mb_neon;
-        rtcd->recon.build_intra_predictors_mby =
-            vp8_build_intra_predictors_mby_neon;
-        rtcd->recon.build_intra_predictors_mby_s =
-            vp8_build_intra_predictors_mby_s_neon;
+
    }
 #endif

 #endif
+
+#if HAVE_ARMV6
+#if CONFIG_RUNTIME_CPU_DETECT
+    if (has_media)
+#endif
+    {
+        vp8_build_intra_predictors_mby_ptr = vp8_build_intra_predictors_mby;
+        vp8_build_intra_predictors_mby_s_ptr = vp8_build_intra_predictors_mby_s;
+    }
+#endif
+
+#if HAVE_ARMV7
+#if CONFIG_RUNTIME_CPU_DETECT
+    if (has_neon)
+#endif
+    {
+        vp8_build_intra_predictors_mby_ptr =
+         vp8_build_intra_predictors_mby_neon;
+        vp8_build_intra_predictors_mby_s_ptr =
+         vp8_build_intra_predictors_mby_s_neon;
+    }
+#endif
 }
--- a/vp8/common/arm/armv6/bilinearfilter_v6.asm
+++ b/vp8/common/arm/armv6/bilinearfilter_v6.asm
@@ -15,19 +15,19 @@
    AREA    |.text|, CODE, READONLY  ; name this block of code

 ;-------------------------------------
-; r0    unsigned char  *src_ptr,
-; r1    unsigned short *dst_ptr,
-; r2    unsigned int    src_pitch,
-; r3    unsigned int    height,
-; stack unsigned int    width,
-; stack const short    *vp8_filter
+; r0    unsigned char *src_ptr,
+; r1    unsigned short *output_ptr,
+; r2    unsigned int src_pixels_per_line,
+; r3    unsigned int output_height,
+; stack    unsigned int output_width,
+; stack    const short *vp8_filter
 ;-------------------------------------
 ; The output is transposed stroed in output array to make it easy for second pass filtering.
 |vp8_filter_block2d_bil_first_pass_armv6| PROC
    stmdb   sp!, {r4 - r11, lr}

    ldr     r11, [sp, #40]                  ; vp8_filter address
-    ldr     r4, [sp, #36]                   ; width
+    ldr     r4, [sp, #36]                   ; output width

    mov     r12, r3                         ; outer-loop counter
    sub     r2, r2, r4                      ; src increment for height loop
@@ -38,10 +38,10 @@

    ldr     r5, [r11]                       ; load up filter coefficients

-    mov     r3, r3, lsl #1                  ; height*2
+    mov     r3, r3, lsl #1                  ; output_height*2
    add     r3, r3, #2                      ; plus 2 to make output buffer 4-bit aligned since height is actually (height+1)

-    mov     r11, r1                         ; save dst_ptr for each row
+    mov     r11, r1                         ; save output_ptr for each row

    cmp     r5, #128                        ; if filter coef = 128, then skip the filter
    beq     bil_null_1st_filter
@@ -140,17 +140,17 @@

 ;---------------------------------
 ; r0    unsigned short *src_ptr,
-; r1    unsigned char  *dst_ptr,
-; r2    int             dst_pitch,
-; r3    unsigned int    height,
-; stack unsigned int    width,
-; stack const short    *vp8_filter
+; r1    unsigned char *output_ptr,
+; r2    int output_pitch,
+; r3    unsigned int  output_height,
+; stack unsigned int  output_width,
+; stack const short *vp8_filter
 ;---------------------------------
 |vp8_filter_block2d_bil_second_pass_armv6| PROC
    stmdb   sp!, {r4 - r11, lr}

    ldr     r11, [sp, #40]                  ; vp8_filter address
-    ldr     r4, [sp, #36]                   ; width
+    ldr     r4, [sp, #36]                   ; output width

    ldr     r5, [r11]                       ; load up filter coefficients
    mov     r12, r4                         ; outer-loop counter = width, since we work on transposed data matrix
--- a/vp8/common/arm/armv6/sixtappredict8x4_v6.asm
+++ b/vp8/common/arm/armv6/sixtappredict8x4_v6.asm
@@ -243,6 +243,8 @@ skip_secondpass_hloop
    ENDP

 ;-----------------
+    AREA    subpelfilters8_dat, DATA, READWRITE         ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
 ;One word each is reserved. Label filter_coeff can be used to access the data.
 ;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _filter8_coeff_
--- a/vp8/common/arm/bilinearfilter_arm.c
+++ b/vp8/common/arm/bilinearfilter_arm.c
@@ -10,29 +10,128 @@


 #include <math.h>
-#include "vp8/common/filter.h"
-#include "vp8/common/subpixel.h"
-#include "bilinearfilter_arm.h"
+#include "subpixel.h"
+
+#define BLOCK_HEIGHT_WIDTH 4
+#define VP8_FILTER_WEIGHT 128
+#define VP8_FILTER_SHIFT  7
+
+static const short bilinear_filters[8][2] =
+{
+    { 128,   0 },
+    { 112,  16 },
+    {  96,  32 },
+    {  80,  48 },
+    {  64,  64 },
+    {  48,  80 },
+    {  32,  96 },
+    {  16, 112 }
+};
+
+
+extern void vp8_filter_block2d_bil_first_pass_armv6
+(
+    unsigned char *src_ptr,
+    unsigned short *output_ptr,
+    unsigned int src_pixels_per_line,
+    unsigned int output_height,
+    unsigned int output_width,
+    const short *vp8_filter
+);
+
+extern void vp8_filter_block2d_bil_second_pass_armv6
+(
+    unsigned short *src_ptr,
+    unsigned char  *output_ptr,
+    int output_pitch,
+    unsigned int  output_height,
+    unsigned int  output_width,
+    const short *vp8_filter
+);
+
+#if 0
+void vp8_filter_block2d_bil_first_pass_6
+(
+    unsigned char *src_ptr,
+    unsigned short *output_ptr,
+    unsigned int src_pixels_per_line,
+    unsigned int output_height,
+    unsigned int output_width,
+    const short *vp8_filter
+)
+{
+    unsigned int i, j;
+
+    for ( i=0; i<output_height; i++ )
+    {
+        for ( j=0; j<output_width; j++ )
+        {
+            /* Apply bilinear filter */
+            output_ptr[j] = ( ( (int)src_ptr[0]          * vp8_filter[0]) +
+                               ((int)src_ptr[1] * vp8_filter[1]) +
+                                (VP8_FILTER_WEIGHT/2) ) >> VP8_FILTER_SHIFT;
+            src_ptr++;
+        }
+
+        /* Next row... */
+        src_ptr    += src_pixels_per_line - output_width;
+        output_ptr += output_width;
+    }
+}
+
+void vp8_filter_block2d_bil_second_pass_6
+(
+    unsigned short *src_ptr,
+    unsigned char  *output_ptr,
+    int output_pitch,
+    unsigned int  output_height,
+    unsigned int  output_width,
+    const short *vp8_filter
+)
+{
+    unsigned int  i,j;
+    int  Temp;
+
+    for ( i=0; i<output_height; i++ )
+    {
+        for ( j=0; j<output_width; j++ )
+        {
+            /* Apply filter */
+            Temp =  ((int)src_ptr[0]         * vp8_filter[0]) +
+                    ((int)src_ptr[output_width] * vp8_filter[1]) +
+                    (VP8_FILTER_WEIGHT/2);
+            output_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
+            src_ptr++;
+        }
+
+        /* Next row... */
+        /*src_ptr    += src_pixels_per_line - output_width;*/
+        output_ptr += output_pitch;
+    }
+}
+#endif

 void vp8_filter_block2d_bil_armv6
 (
    unsigned char *src_ptr,
-    unsigned char *dst_ptr,
-    unsigned int   src_pitch,
+    unsigned char *output_ptr,
+    unsigned int   src_pixels_per_line,
    unsigned int   dst_pitch,
-    const short   *HFilter,
-    const short   *VFilter,
+    const short      *HFilter,
+    const short      *VFilter,
    int            Width,
    int            Height
 )
 {
-    unsigned short FData[36*16]; /* Temp data buffer used in filtering */
+
+    unsigned short FData[36*16]; /* Temp data buffer used in filtering */

    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_armv6(src_ptr, FData, src_pitch, Height + 1, Width, HFilter);
+    /* pixel_step = 1; */
+    vp8_filter_block2d_bil_first_pass_armv6(src_ptr, FData, src_pixels_per_line, Height + 1, Width, HFilter);

    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_armv6(FData, dst_ptr, dst_pitch, Height, Width, VFilter);
+    vp8_filter_block2d_bil_second_pass_armv6(FData, output_ptr, dst_pitch, Height, Width, VFilter);
 }


@@ -49,8 +148,8 @@ void vp8_bilinear_predict4x4_armv6
    const short  *HFilter;
    const short  *VFilter;

-    HFilter = vp8_bilinear_filters[xoffset];
-    VFilter = vp8_bilinear_filters[yoffset];
+    HFilter = bilinear_filters[xoffset];
+    VFilter = bilinear_filters[yoffset];

    vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
 }
@@ -68,8 +167,8 @@ void vp8_bilinear_predict8x8_armv6
    const short  *HFilter;
    const short  *VFilter;

-    HFilter = vp8_bilinear_filters[xoffset];
-    VFilter = vp8_bilinear_filters[yoffset];
+    HFilter = bilinear_filters[xoffset];
+    VFilter = bilinear_filters[yoffset];

    vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
 }
@@ -87,8 +186,8 @@ void vp8_bilinear_predict8x4_armv6
    const short  *HFilter;
    const short  *VFilter;

-    HFilter = vp8_bilinear_filters[xoffset];
-    VFilter = vp8_bilinear_filters[yoffset];
+    HFilter = bilinear_filters[xoffset];
+    VFilter = bilinear_filters[yoffset];

    vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
 }
@@ -106,8 +205,8 @@ void vp8_bilinear_predict16x16_armv6
    const short  *HFilter;
    const short  *VFilter;

-    HFilter = vp8_bilinear_filters[xoffset];
-    VFilter = vp8_bilinear_filters[yoffset];
+    HFilter = bilinear_filters[xoffset];
+    VFilter = bilinear_filters[yoffset];

    vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
 }
--- a/vp8/common/arm/bilinearfilter_arm.h
+++ b/vp8/common/arm/bilinearfilter_arm.h
@@ -1,35 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef BILINEARFILTER_ARM_H
-#define BILINEARFILTER_ARM_H
-
-extern void vp8_filter_block2d_bil_first_pass_armv6
-(
-    const unsigned char  *src_ptr,
-    unsigned short       *dst_ptr,
-    unsigned int          src_pitch,
-    unsigned int          height,
-    unsigned int          width,
-    const short          *vp8_filter
-);
-
-extern void vp8_filter_block2d_bil_second_pass_armv6
-(
-    const unsigned short *src_ptr,
-    unsigned char        *dst_ptr,
-    int                   dst_pitch,
-    unsigned int          height,
-    unsigned int          width,
-    const short         *vp8_filter
-);
-
-#endif /* BILINEARFILTER_ARM_H */
--- a/vp8/common/arm/filter_arm.c
+++ b/vp8/common/arm/filter_arm.c
@@ -11,10 +11,26 @@

 #include "vpx_ports/config.h"
 #include <math.h>
-#include "vp8/common/filter.h"
-#include "vp8/common/subpixel.h"
+#include "subpixel.h"
 #include "vpx_ports/mem.h"

+#define BLOCK_HEIGHT_WIDTH 4
+#define VP8_FILTER_WEIGHT 128
+#define VP8_FILTER_SHIFT  7
+
+DECLARE_ALIGNED(16, static const short, sub_pel_filters[8][6]) =
+{
+    { 0,  0,  128,    0,   0,  0 },         /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
+    { 0, -6,  123,   12,  -1,  0 },
+    { 2, -11, 108,   36,  -8,  1 },         /* New 1/4 pel 6 tap filter */
+    { 0, -9,   93,   50,  -6,  0 },
+    { 3, -16,  77,   77, -16,  3 },         /* New 1/2 pel 6 tap filter */
+    { 0, -6,   50,   93,  -9,  0 },
+    { 1, -8,   36,  108, -11,  2 },         /* New 1/4 pel 6 tap filter */
+    { 0, -1,   12,  123,  -6,  0 },
+};
+
+
 extern void vp8_filter_block2d_first_pass_armv6
 (
    unsigned char *src_ptr,
@@ -77,11 +93,11 @@ void vp8_sixtap_predict_armv6
 {
    const short  *HFilter;
    const short  *VFilter;
-    DECLARE_ALIGNED_ARRAY(4, short, FData, 12*4); /* Temp data buffer used in filtering */
+    DECLARE_ALIGNED_ARRAY(4, short, FData, 12*4); /* Temp data buffer used in filtering */


-    HFilter = vp8_sub_pel_filters[xoffset];   /* 6 tap */
-    VFilter = vp8_sub_pel_filters[yoffset];   /* 6 tap */
+    HFilter = sub_pel_filters[xoffset];   /* 6 tap */
+    VFilter = sub_pel_filters[yoffset];       /* 6 tap */

    /* Vfilter is null. First pass only */
    if (xoffset && !yoffset)
@@ -113,6 +129,47 @@ void vp8_sixtap_predict_armv6
    }
 }

+#if 0
+void vp8_sixtap_predict8x4_armv6
+(
+    unsigned char  *src_ptr,
+    int  src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int  dst_pitch
+)
+{
+    const short  *HFilter;
+    const short  *VFilter;
+    DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data buffer used in filtering */
+
+    HFilter = sub_pel_filters[xoffset];   /* 6 tap */
+    VFilter = sub_pel_filters[yoffset];       /* 6 tap */
+
+
+    /*if (xoffset && !yoffset)
+    {
+        vp8_filter_block2d_first_pass_only_armv6 (  src_ptr, dst_ptr, src_pixels_per_line, 8, dst_pitch, HFilter );
+    }*/
+    /* Hfilter is null. Second pass only */
+    /*else if (!xoffset && yoffset)
+    {
+        vp8_filter_block2d_second_pass_only_armv6 ( src_ptr, dst_ptr, src_pixels_per_line, 8, dst_pitch, VFilter );
+    }
+    else
+    {
+        if (yoffset & 0x1)
+            vp8_filter_block2d_first_pass_armv6 ( src_ptr-src_pixels_per_line, FData+1, src_pixels_per_line, 8, 7, HFilter );
+        else*/
+
+        vp8_filter_block2d_first_pass_armv6 ( src_ptr-(2*src_pixels_per_line), FData, src_pixels_per_line, 8, 9, HFilter );
+
+        vp8_filter_block2d_second_pass_armv6 ( FData+2, dst_ptr, dst_pitch, 4, 8, VFilter );
+    /*}*/
+}
+#endif
+
 void vp8_sixtap_predict8x8_armv6
 (
    unsigned char  *src_ptr,
@@ -125,10 +182,10 @@ void vp8_sixtap_predict8x8_armv6
 {
    const short  *HFilter;
    const short  *VFilter;
-    DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data buffer used in filtering */
+    DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data buffer used in filtering */

-    HFilter = vp8_sub_pel_filters[xoffset];   /* 6 tap */
-    VFilter = vp8_sub_pel_filters[yoffset];   /* 6 tap */
+    HFilter = sub_pel_filters[xoffset];   /* 6 tap */
+    VFilter = sub_pel_filters[yoffset];       /* 6 tap */

    if (xoffset && !yoffset)
    {
@@ -167,10 +224,10 @@ void vp8_sixtap_predict16x16_armv6
 {
    const short  *HFilter;
    const short  *VFilter;
-    DECLARE_ALIGNED_ARRAY(4, short, FData, 24*16);    /* Temp data buffer used in filtering */
+    DECLARE_ALIGNED_ARRAY(4, short, FData, 24*16);    /* Temp data buffer used in filtering */

-    HFilter = vp8_sub_pel_filters[xoffset];   /* 6 tap */
-    VFilter = vp8_sub_pel_filters[yoffset];   /* 6 tap */
+    HFilter = sub_pel_filters[xoffset];   /* 6 tap */
+    VFilter = sub_pel_filters[yoffset];       /* 6 tap */

    if (xoffset && !yoffset)
    {
--- a/vp8/common/arm/loopfilter_arm.c
+++ b/vp8/common/arm/loopfilter_arm.c
@@ -11,8 +11,8 @@

 #include "vpx_ports/config.h"
 #include <math.h>
-#include "vp8/common/loopfilter.h"
-#include "vp8/common/onyxc_int.h"
+#include "loopfilter.h"
+#include "onyxc_int.h"

 extern prototype_loopfilter(vp8_loop_filter_horizontal_edge_armv6);
 extern prototype_loopfilter(vp8_loop_filter_vertical_edge_armv6);
@@ -41,13 +41,13 @@ void vp8_loop_filter_mbh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsig
                               int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
 {
    (void) simpler_lpf;
-    vp8_mbloop_filter_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_mbloop_filter_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);

    if (u_ptr)
-        vp8_mbloop_filter_horizontal_edge_armv6(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
+        vp8_mbloop_filter_horizontal_edge_armv6(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);

    if (v_ptr)
-        vp8_mbloop_filter_horizontal_edge_armv6(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
+        vp8_mbloop_filter_horizontal_edge_armv6(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
 }

 void vp8_loop_filter_mbhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -57,7 +57,7 @@ void vp8_loop_filter_mbhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsi
    (void) v_ptr;
    (void) uv_stride;
    (void) simpler_lpf;
-    vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
 }

 /* Vertical MB Filtering */
@@ -65,13 +65,13 @@ void vp8_loop_filter_mbv_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsig
                               int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
 {
    (void) simpler_lpf;
-    vp8_mbloop_filter_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_mbloop_filter_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);

    if (u_ptr)
-        vp8_mbloop_filter_vertical_edge_armv6(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
+        vp8_mbloop_filter_vertical_edge_armv6(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);

    if (v_ptr)
-        vp8_mbloop_filter_vertical_edge_armv6(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
+        vp8_mbloop_filter_vertical_edge_armv6(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
 }

 void vp8_loop_filter_mbvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -81,7 +81,7 @@ void vp8_loop_filter_mbvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsi
    (void) v_ptr;
    (void) uv_stride;
    (void) simpler_lpf;
-    vp8_loop_filter_simple_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_loop_filter_simple_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
 }

 /* Horizontal B Filtering */
@@ -94,10 +94,10 @@ void vp8_loop_filter_bh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsign
    vp8_loop_filter_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);

    if (u_ptr)
-        vp8_loop_filter_horizontal_edge_armv6(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
+        vp8_loop_filter_horizontal_edge_armv6(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);

    if (v_ptr)
-        vp8_loop_filter_horizontal_edge_armv6(v_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
+        vp8_loop_filter_horizontal_edge_armv6(v_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
 }

 void vp8_loop_filter_bhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -122,10 +122,10 @@ void vp8_loop_filter_bv_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsign
    vp8_loop_filter_vertical_edge_armv6(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);

    if (u_ptr)
-        vp8_loop_filter_vertical_edge_armv6(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
+        vp8_loop_filter_vertical_edge_armv6(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);

    if (v_ptr)
-        vp8_loop_filter_vertical_edge_armv6(v_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
+        vp8_loop_filter_vertical_edge_armv6(v_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
 }

 void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -148,10 +148,10 @@ void vp8_loop_filter_mbh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsign
                              int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
 {
    (void) simpler_lpf;
-    vp8_mbloop_filter_horizontal_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_mbloop_filter_horizontal_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);

    if (u_ptr)
-        vp8_mbloop_filter_horizontal_edge_uv_neon(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, v_ptr);
+        vp8_mbloop_filter_horizontal_edge_uv_neon(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, v_ptr);
 }

 void vp8_loop_filter_mbhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -161,7 +161,7 @@ void vp8_loop_filter_mbhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsig
    (void) v_ptr;
    (void) uv_stride;
    (void) simpler_lpf;
-    vp8_loop_filter_simple_horizontal_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_loop_filter_simple_horizontal_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
 }

 /* Vertical MB Filtering */
@@ -169,10 +169,10 @@ void vp8_loop_filter_mbv_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsign
                              int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
 {
    (void) simpler_lpf;
-    vp8_mbloop_filter_vertical_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_mbloop_filter_vertical_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);

    if (u_ptr)
-        vp8_mbloop_filter_vertical_edge_uv_neon(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, v_ptr);
+        vp8_mbloop_filter_vertical_edge_uv_neon(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, v_ptr);
 }

 void vp8_loop_filter_mbvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -182,7 +182,7 @@ void vp8_loop_filter_mbvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsig
    (void) v_ptr;
    (void) uv_stride;
    (void) simpler_lpf;
-    vp8_loop_filter_simple_vertical_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_loop_filter_simple_vertical_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
 }

 /* Horizontal B Filtering */
@@ -195,7 +195,7 @@ void vp8_loop_filter_bh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigne
    vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);

    if (u_ptr)
-        vp8_loop_filter_horizontal_edge_uv_neon(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, v_ptr + 4 * uv_stride);
+        vp8_loop_filter_horizontal_edge_uv_neon(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, v_ptr + 4 * uv_stride);
 }

 void vp8_loop_filter_bhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -220,7 +220,7 @@ void vp8_loop_filter_bv_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigne
    vp8_loop_filter_vertical_edge_y_neon(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);

    if (u_ptr)
-        vp8_loop_filter_vertical_edge_uv_neon(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, v_ptr + 4);
+        vp8_loop_filter_vertical_edge_uv_neon(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, v_ptr + 4);
 }

 void vp8_loop_filter_bvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
--- a/vp8/common/arm/neon/bilinearpredict16x16_neon.asm
+++ b/vp8/common/arm/neon/bilinearpredict16x16_neon.asm
@@ -350,7 +350,10 @@ filt_blk2d_spo16x16_loop_neon
    ENDP

 ;-----------------
-
+    AREA    bifilters16_dat, DATA, READWRITE            ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _bifilter16_coeff_
    DCD     bifilter16_coeff
 bifilter16_coeff
--- a/vp8/common/arm/neon/bilinearpredict4x4_neon.asm
+++ b/vp8/common/arm/neon/bilinearpredict4x4_neon.asm
@@ -123,7 +123,10 @@ skip_secondpass_filter
    ENDP

 ;-----------------
-
+    AREA    bilinearfilters4_dat, DATA, READWRITE           ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _bifilter4_coeff_
    DCD     bifilter4_coeff
 bifilter4_coeff
--- a/vp8/common/arm/neon/bilinearpredict8x4_neon.asm
+++ b/vp8/common/arm/neon/bilinearpredict8x4_neon.asm
@@ -128,7 +128,10 @@ skip_secondpass_filter
    ENDP

 ;-----------------
-
+    AREA    bifilters8x4_dat, DATA, READWRITE           ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _bifilter8x4_coeff_
    DCD     bifilter8x4_coeff
 bifilter8x4_coeff
--- a/vp8/common/arm/neon/bilinearpredict8x8_neon.asm
+++ b/vp8/common/arm/neon/bilinearpredict8x8_neon.asm
@@ -176,7 +176,10 @@ skip_secondpass_filter
    ENDP

 ;-----------------
-
+    AREA    bifilters8_dat, DATA, READWRITE         ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _bifilter8_coeff_
    DCD     bifilter8_coeff
 bifilter8_coeff
--- a/vp8/common/arm/neon/loopfilter_neon.asm
+++ b/vp8/common/arm/neon/loopfilter_neon.asm
@@ -397,8 +397,7 @@
    bx          lr
    ENDP        ; |vp8_loop_filter_horizontal_edge_y_neon|

-;-----------------
-
+    AREA    loopfilter_dat, DATA, READONLY
 _lf_coeff_
    DCD     lf_coeff
 lf_coeff
--- a/vp8/common/arm/neon/loopfiltersimplehorizontaledge_neon.asm
+++ b/vp8/common/arm/neon/loopfiltersimplehorizontaledge_neon.asm
@@ -104,7 +104,10 @@
    ENDP        ; |vp8_loop_filter_simple_horizontal_edge_neon|

 ;-----------------
-
+    AREA    hloopfiltery_dat, DATA, READWRITE           ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 16 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _lfhy_coeff_
    DCD     lfhy_coeff
 lfhy_coeff
--- a/vp8/common/arm/neon/loopfiltersimpleverticaledge_neon.asm
+++ b/vp8/common/arm/neon/loopfiltersimpleverticaledge_neon.asm
@@ -145,7 +145,10 @@
    ENDP        ; |vp8_loop_filter_simple_vertical_edge_neon|

 ;-----------------
-
+    AREA    vloopfiltery_dat, DATA, READWRITE           ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 16 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _vlfy_coeff_
    DCD     vlfy_coeff
 vlfy_coeff
--- a/vp8/common/arm/neon/mbloopfilter_neon.asm
+++ b/vp8/common/arm/neon/mbloopfilter_neon.asm
@@ -505,8 +505,7 @@
    bx          lr
    ENDP        ; |vp8_mbloop_filter_neon|

-;-----------------
-
+    AREA    mbloopfilter_dat, DATA, READONLY
 _mblf_coeff_
    DCD     mblf_coeff
 mblf_coeff
--- a/vp8/common/arm/neon/recon_neon.c
+++ b/vp8/common/arm/neon/recon_neon.c
@@ -10,8 +10,8 @@


 #include "vpx_ports/config.h"
-#include "vp8/common/recon.h"
-#include "vp8/common/blockd.h"
+#include "recon.h"
+#include "blockd.h"

 extern void vp8_recon16x16mb_neon(unsigned char *pred_ptr, short *diff_ptr, unsigned char *dst_ptr, int ystride, unsigned char *udst_ptr, unsigned char *vdst_ptr);

--- a/vp8/common/arm/neon/shortidct4x4llm_neon.asm
+++ b/vp8/common/arm/neon/shortidct4x4llm_neon.asm
@@ -113,7 +113,10 @@
    ENDP

 ;-----------------
-
+    AREA    idct4x4_dat, DATA, READWRITE            ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _idct_coeff_
    DCD     idct_coeff
 idct_coeff
--- a/vp8/common/arm/neon/sixtappredict16x16_neon.asm
+++ b/vp8/common/arm/neon/sixtappredict16x16_neon.asm
@@ -476,7 +476,10 @@ secondpass_only_inner_loop_neon
    ENDP

 ;-----------------
-
+    AREA    subpelfilters16_dat, DATA, READWRITE            ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _filter16_coeff_
    DCD     filter16_coeff
 filter16_coeff
--- a/vp8/common/arm/neon/sixtappredict4x4_neon.asm
+++ b/vp8/common/arm/neon/sixtappredict4x4_neon.asm
@@ -407,7 +407,10 @@ secondpass_filter4x4_only
    ENDP

 ;-----------------
-
+    AREA    subpelfilters4_dat, DATA, READWRITE         ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _filter4_coeff_
    DCD     filter4_coeff
 filter4_coeff
--- a/vp8/common/arm/neon/sixtappredict8x4_neon.asm
+++ b/vp8/common/arm/neon/sixtappredict8x4_neon.asm
@@ -458,7 +458,10 @@ secondpass_filter8x4_only
    ENDP

 ;-----------------
-
+    AREA    subpelfilters8_dat, DATA, READWRITE         ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _filter8_coeff_
    DCD     filter8_coeff
 filter8_coeff
--- a/vp8/common/arm/neon/sixtappredict8x8_neon.asm
+++ b/vp8/common/arm/neon/sixtappredict8x8_neon.asm
@@ -509,7 +509,10 @@ filt_blk2d_spo8x8_loop_neon
    ENDP

 ;-----------------
-
+    AREA    subpelfilters8_dat, DATA, READWRITE         ;read/write by default
+;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
+;One word each is reserved. Label filter_coeff can be used to access the data.
+;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
 _filter8_coeff_
    DCD     filter8_coeff
 filter8_coeff
--- a/vp8/common/arm/recon_arm.h
+++ b/vp8/common/arm/recon_arm.h
@@ -53,9 +53,6 @@ extern prototype_copy_block(vp8_copy_mem16x16_neon);

 extern prototype_recon_macroblock(vp8_recon_mb_neon);

-extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_neon);
-extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_s_neon);
-
 #if !CONFIG_RUNTIME_CPU_DETECT
 #undef  vp8_recon_recon
 #define vp8_recon_recon vp8_recon_b_neon
@@ -77,13 +74,6 @@ extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_s_neon);

 #undef  vp8_recon_recon_mb
 #define vp8_recon_recon_mb vp8_recon_mb_neon
-
-#undef  vp8_recon_build_intra_predictors_mby
-#define vp8_recon_build_intra_predictors_mby vp8_build_intra_predictors_mby_neon
-
-#undef  vp8_recon_build_intra_predictors_mby_s
-#define vp8_recon_build_intra_predictors_mby_s vp8_build_intra_predictors_mby_s_neon
-
 #endif
 #endif

--- a/vp8/common/arm/reconintra_arm.c
+++ b/vp8/common/arm/reconintra_arm.c
@@ -10,10 +10,10 @@


 #include "vpx_ports/config.h"
-#include "vp8/common/blockd.h"
-#include "vp8/common/reconintra.h"
+#include "blockd.h"
+#include "reconintra.h"
 #include "vpx_mem/vpx_mem.h"
-#include "vp8/common/recon.h"
+#include "recon.h"

 #if HAVE_ARMV7
 extern void vp8_build_intra_predictors_mby_neon_func(
--- a/vp8/common/arm/vpx_asm_offsets.c
+++ b/vp8/common/arm/vpx_asm_offsets.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
 *
 *  Use of this source code is governed by a BSD-style license
 *  that can be found in the LICENSE file in the root of the source
@@ -12,7 +12,13 @@
 #include "vpx_ports/config.h"
 #include <stddef.h>

+#if CONFIG_VP8_ENCODER
+#include "vpx_scale/yv12config.h"
+#endif
+
+#if CONFIG_VP8_DECODER
 #include "onyxd_int.h"
+#endif

 #define DEFINE(sym, val) int sym = val;

@@ -25,6 +31,29 @@
 * {
 */

+#if CONFIG_VP8_DECODER || CONFIG_VP8_ENCODER
+DEFINE(yv12_buffer_config_y_width,              offsetof(YV12_BUFFER_CONFIG, y_width));
+DEFINE(yv12_buffer_config_y_height,             offsetof(YV12_BUFFER_CONFIG, y_height));
+DEFINE(yv12_buffer_config_y_stride,             offsetof(YV12_BUFFER_CONFIG, y_stride));
+DEFINE(yv12_buffer_config_uv_width,             offsetof(YV12_BUFFER_CONFIG, uv_width));
+DEFINE(yv12_buffer_config_uv_height,            offsetof(YV12_BUFFER_CONFIG, uv_height));
+DEFINE(yv12_buffer_config_uv_stride,            offsetof(YV12_BUFFER_CONFIG, uv_stride));
+DEFINE(yv12_buffer_config_y_buffer,             offsetof(YV12_BUFFER_CONFIG, y_buffer));
+DEFINE(yv12_buffer_config_u_buffer,             offsetof(YV12_BUFFER_CONFIG, u_buffer));
+DEFINE(yv12_buffer_config_v_buffer,             offsetof(YV12_BUFFER_CONFIG, v_buffer));
+DEFINE(yv12_buffer_config_border,               offsetof(YV12_BUFFER_CONFIG, border));
+#endif
+
+#if CONFIG_VP8_DECODER
+DEFINE(mb_diff,                                 offsetof(MACROBLOCKD, diff));
+DEFINE(mb_predictor,                            offsetof(MACROBLOCKD, predictor));
+DEFINE(mb_dst_y_stride,                         offsetof(MACROBLOCKD, dst.y_stride));
+DEFINE(mb_dst_y_buffer,                         offsetof(MACROBLOCKD, dst.y_buffer));
+DEFINE(mb_dst_u_buffer,                         offsetof(MACROBLOCKD, dst.u_buffer));
+DEFINE(mb_dst_v_buffer,                         offsetof(MACROBLOCKD, dst.v_buffer));
+DEFINE(mb_up_available,                         offsetof(MACROBLOCKD, up_available));
+DEFINE(mb_left_available,                       offsetof(MACROBLOCKD, left_available));
+
 DEFINE(detok_scan,                              offsetof(DETOK, scan));
 DEFINE(detok_ptr_block2leftabove,               offsetof(DETOK, ptr_block2leftabove));
 DEFINE(detok_coef_tree_ptr,                     offsetof(DETOK, vp8_coef_tree_ptr));
@@ -48,6 +77,7 @@ DEFINE(bool_decoder_range,                      offsetof(BOOL_DECODER, range));

 DEFINE(tokenextrabits_min_val,                  offsetof(TOKENEXTRABITS, min_val));
 DEFINE(tokenextrabits_length,                   offsetof(TOKENEXTRABITS, Length));
+#endif

 //add asserts for any offset that is not supported by assembly code
 //add asserts for any size that is not supported by assembly code
--- a/vp8/common/asm_com_offsets.c
+++ b/vp8/common/asm_com_offsets.c
@@ -1,49 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#include "vpx_ports/config.h"
-#include <stddef.h>
-
-#include "vpx_scale/yv12config.h"
-
-#define ct_assert(name,cond) \
-    static void assert_##name(void) UNUSED;\
-    static void assert_##name(void) {switch(0){case 0:case !!(cond):;}}
-
-#define DEFINE(sym, val) int sym = val;
-
-/*
-#define BLANK() asm volatile("\n->" : : )
-*/
-
-/*
- * int main(void)
- * {
- */
-
-//vpx_scale
-DEFINE(yv12_buffer_config_y_width,              offsetof(YV12_BUFFER_CONFIG, y_width));
-DEFINE(yv12_buffer_config_y_height,             offsetof(YV12_BUFFER_CONFIG, y_height));
-DEFINE(yv12_buffer_config_y_stride,             offsetof(YV12_BUFFER_CONFIG, y_stride));
-DEFINE(yv12_buffer_config_uv_width,             offsetof(YV12_BUFFER_CONFIG, uv_width));
-DEFINE(yv12_buffer_config_uv_height,            offsetof(YV12_BUFFER_CONFIG, uv_height));
-DEFINE(yv12_buffer_config_uv_stride,            offsetof(YV12_BUFFER_CONFIG, uv_stride));
-DEFINE(yv12_buffer_config_y_buffer,             offsetof(YV12_BUFFER_CONFIG, y_buffer));
-DEFINE(yv12_buffer_config_u_buffer,             offsetof(YV12_BUFFER_CONFIG, u_buffer));
-DEFINE(yv12_buffer_config_v_buffer,             offsetof(YV12_BUFFER_CONFIG, v_buffer));
-DEFINE(yv12_buffer_config_border,               offsetof(YV12_BUFFER_CONFIG, border));
-
-//add asserts for any offset that is not supported by assembly code
-//add asserts for any size that is not supported by assembly code
-/*
- * return 0;
- * }
- */
--- a/vp8/common/blockd.c
+++ b/vp8/common/blockd.c
@@ -12,6 +12,8 @@
 #include "blockd.h"
 #include "vpx_mem/vpx_mem.h"

+const int vp8_block2type[25] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 1};
+
 const unsigned char vp8_block2left[25] =
 {
    0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
--- a/vp8/common/blockd.h
+++ b/vp8/common/blockd.h
@@ -14,17 +14,12 @@

 void vpx_log(const char *format, ...);

-#include "../../vpx_ports/config.h"
-#include "../../vpx_scale/yv12config.h"
+#include "vpx_ports/config.h"
+#include "vpx_scale/yv12config.h"
 #include "mv.h"
 #include "treecoder.h"
 #include "subpixel.h"
-#include "../../vpx_ports/mem.h"
-
-#include "../../vpx_config.h"
-#if CONFIG_OPENCL
-#include "opencl/vp8_opencl.h"
-#endif
+#include "vpx_ports/mem.h"

 #define TRUE    1
 #define FALSE   0
@@ -33,6 +28,11 @@ void vpx_log(const char *format, ...);
 #define DCPREDSIMTHRESH 0
 #define DCPREDCNTTHRESH 3

+#define Y1CONTEXT 0
+#define UCONTEXT 1
+#define VCONTEXT 2
+#define Y2CONTEXT 3
+
 #define MB_FEATURE_TREE_PROBS   3
 #define MAX_MB_SEGMENTS         4

@@ -48,11 +48,6 @@ typedef struct
    int r, c;
 } POS;

-#define PLANE_TYPE_Y_NO_DC    0
-#define PLANE_TYPE_Y2         1
-#define PLANE_TYPE_UV         2
-#define PLANE_TYPE_Y_WITH_DC  3
-

 typedef char ENTROPY_CONTEXT;
 typedef struct
@@ -63,6 +58,8 @@ typedef struct
    ENTROPY_CONTEXT y2;
 } ENTROPY_CONTEXT_PLANES;

+extern const int vp8_block2type[25];
+
 extern const unsigned char vp8_block2left[25];
 extern const unsigned char vp8_block2above[25];

@@ -78,19 +75,19 @@ typedef enum

 typedef enum
 {
-    DC_PRED = 0,            /* average of above and left pixels */
-    V_PRED = 1,             /* vertical prediction */
-    H_PRED = 2,             /* horizontal prediction */
-    TM_PRED = 3,            /* Truemotion prediction */
-    B_PRED = 4,             /* block based prediction, each block has its own prediction mode */
+    DC_PRED,            /* average of above and left pixels */
+    V_PRED,             /* vertical prediction */
+    H_PRED,             /* horizontal prediction */
+    TM_PRED,            /* Truemotion prediction */
+    B_PRED,             /* block based prediction, each block has its own prediction mode */

-    NEARESTMV = 5,
-    NEARMV = 6,
-    ZEROMV = 7,
-    NEWMV = 8,
-    SPLITMV = 9,
+    NEARESTMV,
+    NEARMV,
+    ZEROMV,
+    NEWMV,
+    SPLITMV,

-    MB_MODE_COUNT = 10
+    MB_MODE_COUNT
 } MB_PREDICTION_MODE;

 /* Macroblock level features */
@@ -192,47 +189,24 @@ typedef struct

 typedef struct
 {
-    short *qcoeff_base;
-    int qcoeff_offset;
-
-    short *dqcoeff_base;
-    int dqcoeff_offset;
-
-    unsigned char *predictor_base;
-    int predictor_offset;
-
-    short *diff_base;
-    int diff_offset;
+    short *qcoeff;
+    short *dqcoeff;
+    unsigned char  *predictor;
+    short *diff;
+    short *reference;

    short *dequant;

-#if CONFIG_OPENCL
-    cl_command_queue cl_commands; //pointer to macroblock CL command queue
-
-    cl_mem cl_diff_mem;
-    cl_mem cl_predictor_mem;
-    cl_mem cl_qcoeff_mem;
-    cl_mem cl_dqcoeff_mem;
-    cl_mem cl_eobs_mem;
-
-    cl_mem cl_dequant_mem; //Block-specific, not shared
-
-    cl_bool sixtap_filter; //Subpixel Prediction type (true=sixtap, false=bilinear)
-
-#endif
-
    /* 16 Y blocks, 4 U blocks, 4 V blocks each with 16 entries */
-    unsigned char **base_pre; //previous frame, same Macroblock, base pointer
+    unsigned char **base_pre;
    int pre;
    int pre_stride;

-    unsigned char **base_dst; //destination base pointer
+    unsigned char **base_dst;
    int dst;
    int dst_stride;

-    int eob; //only used in encoder? Decoder uses MBD.eobs
-
-    char *eobs_base; //beginning of MB.eobs
+    int eob;

    B_MODE_INFO bmi;

@@ -242,26 +216,16 @@ typedef struct
 {
    DECLARE_ALIGNED(16, short, diff[400]);      /* from idct diff */
    DECLARE_ALIGNED(16, unsigned char,  predictor[384]);
+/* not used    DECLARE_ALIGNED(16, short, reference[384]); */
    DECLARE_ALIGNED(16, short, qcoeff[400]);
    DECLARE_ALIGNED(16, short, dqcoeff[400]);
    DECLARE_ALIGNED(16, char,  eobs[25]);

-#if CONFIG_OPENCL
-    cl_command_queue cl_commands; //Each macroblock gets its own command queue.
-    cl_mem cl_diff_mem;
-    cl_mem cl_predictor_mem;
-    cl_mem cl_qcoeff_mem;
-    cl_mem cl_dqcoeff_mem;
-    cl_mem cl_eobs_mem;
-
-    cl_bool sixtap_filter;
-#endif
-
    /* 16 Y blocks, 4 U, 4 V, 1 DC 2nd order block, each with 16 entries. */
    BLOCKD block[25];

    YV12_BUFFER_CONFIG pre; /* Filtered copy of previous frame reconstruction */
-    YV12_BUFFER_CONFIG dst; /* Destination buffer for current frame */
+    YV12_BUFFER_CONFIG dst;

    MODE_INFO *mode_info_context;
    int mode_info_stride;
@@ -311,7 +275,6 @@ typedef struct

    unsigned int frames_since_golden;
    unsigned int frames_till_alt_ref_frame;
-
    vp8_subpix_fn_t  subpixel_predict;
    vp8_subpix_fn_t  subpixel_predict8x4;
    vp8_subpix_fn_t  subpixel_predict8x8;
--- a/vp8/common/boolcoder.h
+++ b/vp8/common/boolcoder.h
@@ -0,0 +1,570 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+
+#ifndef bool_coder_h
+#define bool_coder_h 1
+
+/* Arithmetic bool coder with largish probability range.
+   Timothy S Murphy  6 August 2004 */
+
+/* So as not to force users to drag in too much of my idiosyncratic C++ world,
+   I avoid fancy storage management. */
+
+#include <assert.h>
+
+#include <stddef.h>
+#include <stdio.h>
+
+typedef unsigned char vp8bc_index_t; // probability index
+
+/* There are a couple of slight variants in the details of finite-precision
+   arithmetic coding.  May be safely ignored by most users. */
+
+enum vp8bc_rounding
+{
+    vp8bc_down = 0,     // just like VP8
+    vp8bc_down_full = 1, // handles minimum probability correctly
+    vp8bc_up = 2
+};
+
+#if _MSC_VER
+
+/* Note that msvc by default does not inline _anything_ (regardless of the
+   setting of inline_depth) and that a command-line option (-Ob1 or -Ob2)
+   is required to inline even the smallest functions. */
+
+#   pragma inline_depth( 255)           // I mean it when I inline something
+#   pragma warning( disable : 4099)     // No class vs. struct harassment
+#   pragma warning( disable : 4250)     // dominance complaints
+#   pragma warning( disable : 4284)     // operator-> in templates
+#   pragma warning( disable : 4800)     // bool conversion
+
+// don't let prefix ++,-- stand in for postfix, disaster would ensue
+
+#   pragma warning( error : 4620 4621)
+
+#endif  // _MSC_VER
+
+
+#if __cplusplus
+
+// Sometimes one wishes to be definite about integer lengths.
+
+struct int_types
+{
+    typedef const bool cbool;
+    typedef const signed char cchar;
+    typedef const short cshort;
+    typedef const int cint;
+    typedef const int clong;
+
+    typedef const double cdouble;
+    typedef const size_t csize_t;
+
+    typedef unsigned char uchar;    // 8 bits
+    typedef const uchar cuchar;
+
+    typedef short int16;
+    typedef unsigned short uint16;
+    typedef const int16 cint16;
+    typedef const uint16 cuint16;
+
+    typedef int int32;
+    typedef unsigned int uint32;
+    typedef const int32 cint32;
+    typedef const uint32 cuint32;
+
+    typedef unsigned int uint;
+    typedef unsigned int ulong;
+    typedef const uint cuint;
+    typedef const ulong culong;
+
+
+    // All structs consume space, may as well have a vptr.
+
+    virtual ~int_types();
+};
+
+
+struct bool_coder_spec;
+struct bool_coder;
+struct bool_writer;
+struct bool_reader;
+
+
+struct bool_coder_namespace : int_types
+{
+    typedef vp8bc_index_t Index;
+    typedef bool_coder_spec Spec;
+    typedef const Spec c_spec;
+
+    enum Rounding
+    {
+        Down = vp8bc_down,
+        down_full = vp8bc_down_full,
+        Up = vp8bc_up
+    };
+};
+
+
+// Archivable specification of a bool coder includes rounding spec
+// and probability mapping table.  The latter replaces a uchar j
+// (0 <= j < 256) with an arbitrary uint16 tbl[j] = p.
+// p/65536 is then the probability of a zero.
+
+struct bool_coder_spec : bool_coder_namespace
+{
+    friend struct bool_coder;
+    friend struct bool_writer;
+    friend struct bool_reader;
+    friend struct bool_coder_spec_float;
+    friend struct bool_coder_spec_explicit_table;
+    friend struct bool_coder_spec_exponential_table;
+    friend struct BPsrc;
+private:
+    uint w;                 // precision
+    Rounding r;
+
+    uint ebits, mbits, ebias;
+    uint32 mmask;
+
+    Index max_index, half_index;
+
+    uint32 mantissa(Index i) const
+    {
+        assert(i < half_index);
+        return (1 << mbits) + (i & mmask);
+    }
+    uint exponent(Index i) const
+    {
+        assert(i < half_index);
+        return ebias - (i >> mbits);
+    }
+
+    uint16 Ptbl[256];       // kinda clunky, but so is storage management.
+
+    /* Cost in bits of encoding a zero at every probability, scaled by 2^20.
+       Assumes that index is at most 8 bits wide. */
+
+    uint32 Ctbl[256];
+
+    uint32 split(Index i, uint32 R) const    // 1 <= split <= max( 1, R-1)
+    {
+        if (!ebias)
+            return 1 + (((R - 1) * Ptbl[i]) >> 16);
+
+        if (i >= half_index)
+            return R - split(max_index - i, R);
+
+        return 1 + (((R - 1) * mantissa(i)) >> exponent(i));
+    }
+
+    uint32 max_range() const
+    {
+        return (1 << w) - (r == down_full ? 0 : 1);
+    }
+    uint32 min_range() const
+    {
+        return (1 << (w - 1)) + (r == down_full ? 1 : 0);
+    }
+    uint32 Rinc() const
+    {
+        return r == Up ? 1 : 0;
+    }
+
+    void check_prec() const;
+
+    bool float_init(uint Ebits, uint Mbits);
+
+    void cost_init();
+
+    bool_coder_spec(
+        uint prec, Rounding rr, uint Ebits = 0, uint Mbits = 0
+    )
+        : w(prec), r(rr)
+    {
+        float_init(Ebits, Mbits);
+    }
+public:
+    // Read complete spec from file.
+    bool_coder_spec(FILE *);
+
+    // Write spec to file.
+    void dump(FILE *) const;
+
+    // return probability index best approximating prob.
+    Index operator()(double prob) const;
+
+    // probability corresponding to index
+    double operator()(Index i) const;
+
+    Index complement(Index i) const
+    {
+        return max_index - i;
+    }
+
+    Index max_index() const
+    {
+        return max_index;
+    }
+    Index half_index() const
+    {
+        return half_index;
+    }
+
+    uint32 cost_zero(Index i) const
+    {
+        return Ctbl[i];
+    }
+    uint32 cost_one(Index i) const
+    {
+        return Ctbl[ max_index - i];
+    }
+    uint32 cost_bit(Index i, bool b) const
+    {
+        return Ctbl[b? max_index-i:i];
+    }
+};
+
+
+/* Pseudo floating-point probability specification.
+
+   At least one of Ebits and Mbits must be nonzero.
+
+   Since all arithmetic is done at 32 bits, Ebits is at most 5.
+
+   Total significant bits in index is Ebits + Mbits + 1.
+
+   Below the halfway point (i.e. when the top significant bit is 0),
+   the index is (e << Mbits) + m.
+
+   The exponent e is between 0 and (2**Ebits) - 1,
+   the mantissa m is between 0 and (2**Mbits) - 1.
+
+   Prepending an implicit 1 to the mantissa, the probability is then
+
+        (2**Mbits + m) >> (2**Ebits + 1 + Mbits - e),
+
+   which has (1/2)**(2**Ebits + 1) as a minimum
+   and (1/2) * [1 - 2**-(Mbits + 1)] as a maximum.
+
+   When the index is above the halfway point, the probability is the
+   complement of the probability associated to the complement of the index.
+
+   Note that the probability increases with the index and that, because of
+   the symmetry, we cannot encode probability exactly 1/2; though we
+   can get as close to 1/2 as we like, provided we have enough Mbits.
+
+   The latter is of course not a problem in practice, one never has
+   exact probabilities and entropy errors are second order, that is, the
+   "overcoding" of a zero will be largely compensated for by the
+   "undercoding" of a one (or vice-versa).
+
+   Compared to arithmetic probability specs (a la VP8), this will do better
+   at very high and low probabilities and worse at probabilities near 1/2,
+   as well as facilitating the usage of wider or narrower probability indices.
+*/
+
+struct bool_coder_spec_float : bool_coder_spec
+{
+    bool_coder_spec_float(
+        uint Ebits = 3, uint Mbits = 4, Rounding rr = down_full, uint prec = 12
+    )
+        : bool_coder_spec(prec, rr, Ebits, Mbits)
+    {
+        cost_init();
+    }
+};
+
+
+struct bool_coder_spec_explicit_table : bool_coder_spec
+{
+    bool_coder_spec_explicit_table(
+        cuint16 probability_table[256] = 0,  // default is tbl[i] = i << 8.
+        Rounding = down_full,
+        uint precision = 16
+    );
+};
+
+// Construct table via multiplicative interpolation between
+// p[128] = 1/2  and p[0] = (1/2)^x.
+// Since we are working with 16-bit precision, x is at most 16.
+// For probabilities to increase with i, we must have x > 1.
+// For 0 <= i <= 128, p[i] = (1/2)^{ 1 + [1 - (i/128)]*[x-1] }.
+// Finally, p[128+i] = 1 - p[128 - i].
+
+struct bool_coder_spec_exponential_table : bool_coder_spec
+{
+    bool_coder_spec_exponential_table(uint x, Rounding = down_full, uint prec = 16);
+};
+
+
+// Commonalities between writer and reader.
+
+struct bool_coder : bool_coder_namespace
+{
+    friend struct bool_writer;
+    friend struct bool_reader;
+    friend struct BPsrc;
+private:
+    uint32 Low, Range;
+    cuint32 min_range;
+    cuint32 rinc;
+    c_spec spec;
+
+    void _reset()
+    {
+        Low = 0;
+        Range = spec.max_range();
+    }
+
+    bool_coder(c_spec &s)
+        :  min_range(s.min_range()),
+           rinc(s.Rinc()),
+           spec(s)
+    {
+        _reset();
+    }
+
+    uint32 half() const
+    {
+        return 1 + ((Range - 1) >> 1);
+    }
+public:
+    c_spec &Spec() const
+    {
+        return spec;
+    }
+};
+
+
+struct bool_writer : bool_coder
+{
+    friend struct BPsrc;
+private:
+    uchar *Bstart, *Bend, *B;
+    int bit_lag;
+    bool is_toast;
+    void carry();
+    void reset()
+    {
+        _reset();
+        bit_lag = 32 - spec.w;
+        is_toast = 0;
+    }
+    void raw(bool value, uint32 split);
+public:
+    bool_writer(c_spec &, uchar *Dest, size_t Len);
+    virtual ~bool_writer();
+
+    void operator()(Index p, bool v)
+    {
+        raw(v, spec.split(p, Range));
+    }
+
+    uchar *buf() const
+    {
+        return Bstart;
+    }
+    size_t bytes_written() const
+    {
+        return B - Bstart;
+    }
+
+    // Call when done with input, flushes internal state.
+    // DO NOT write any more data after calling this.
+
+    bool_writer &flush();
+
+    void write_bits(int n, uint val)
+    {
+        if (n)
+        {
+            uint m = 1 << (n - 1);
+
+            do
+            {
+                raw((bool)(val & m), half());
+            }
+            while (m >>= 1);
+        }
+    }
+
+#   if 0
+    // We are agnostic about storage management.
+    // By default, overflows throw an assert but user can
+    // override to provide an expanding buffer using ...
+
+    virtual void overflow(uint Len) const;
+
+    // ... this function copies already-written data into new buffer
+    // and retains new buffer location.
+
+    void new_buffer(uchar *dest, uint Len);
+
+    // Note that storage management is the user's responsibility.
+#   endif
+};
+
+
+// This could be adjusted to use a little less lookahead.
+
+struct bool_reader : bool_coder
+{
+    friend struct BPsrc;
+private:
+    cuchar *const Bstart;   // for debugging
+    cuchar *B;
+    cuchar *const Bend;
+    cuint shf;
+    uint bct;
+    bool raw(uint32 split);
+public:
+    bool_reader(c_spec &s, cuchar *src, size_t Len);
+
+    bool operator()(Index p)
+    {
+        return raw(spec.split(p, Range));
+    }
+
+    uint read_bits(int num_bits)
+    {
+        uint v = 0;
+
+        while (--num_bits >= 0)
+            v += v + (raw(half()) ? 1 : 0);
+
+        return v;
+    }
+};
+
+extern "C" {
+
+#endif /*  __cplusplus */
+
+
+    /* C interface */
+
+    typedef struct bool_coder_spec bool_coder_spec;
+    typedef struct bool_writer bool_writer;
+    typedef struct bool_reader bool_reader;
+
+    typedef const bool_coder_spec c_bool_coder_spec;
+    typedef const bool_writer c_bool_writer;
+    typedef const bool_reader c_bool_reader;
+
+
+    /* Optionally override default precision when constructing coder_specs.
+       Just pass a zero pointer if you don't care.
+       Precision is at most 16 bits for table specs, at most 23 otherwise. */
+
+    struct vp8bc_prec
+    {
+        enum vp8bc_rounding r;      /* see top header file for def */
+        unsigned int prec;          /* range precision in bits */
+    };
+
+    typedef const struct vp8bc_prec vp8bc_c_prec;
+
+    /* bool_coder_spec contains mapping of uchars to actual probabilities
+       (16 bit uints) as well as (usually immaterial) selection of
+       exact finite-precision algorithm used (for now, the latter can only
+       be overridden using the C++ interface).
+       See comments above the corresponding C++ constructors for discussion,
+       especially of exponential probability table generation. */
+
+    bool_coder_spec *vp8bc_vp8spec(); // just like vp8
+
+    bool_coder_spec *vp8bc_literal_spec(
+        const unsigned short prob_map[256],  // 0 is like vp8 w/more precision
+        vp8bc_c_prec*
+    );
+
+    bool_coder_spec *vp8bc_float_spec(
+        unsigned int exponent_bits, unsigned int mantissa_bits, vp8bc_c_prec*
+    );
+
+    bool_coder_spec *vp8bc_exponential_spec(unsigned int min_exp, vp8bc_c_prec *);
+
+    bool_coder_spec *vp8bc_spec_from_file(FILE *);
+
+
+    void vp8bc_destroy_spec(c_bool_coder_spec *);
+
+    void vp8bc_spec_to_file(c_bool_coder_spec *, FILE *);
+
+
+    /* Nearest index to supplied probability of zero, 0 <= prob <= 1. */
+
+    vp8bc_index_t vp8bc_index(c_bool_coder_spec *, double prob);
+
+    vp8bc_index_t vp8bc_index_from_counts(
+        c_bool_coder_spec *p, unsigned int zero_ct, unsigned int one_ct
+    );
+
+    /* In case you want to look */
+
+    double vp8bc_probability(c_bool_coder_spec *, vp8bc_index_t);
+
+    /* Opposite index */
+
+    vp8bc_index_t vp8bc_complement(c_bool_coder_spec *, vp8bc_index_t);
+
+    /* Cost in bits of encoding a zero at given probability, scaled by 2^20.
+       (assumes that an int holds at least 32 bits). */
+
+    unsigned int vp8bc_cost_zero(c_bool_coder_spec *, vp8bc_index_t);
+
+    unsigned int vp8bc_cost_one(c_bool_coder_spec *, vp8bc_index_t);
+    unsigned int vp8bc_cost_bit(c_bool_coder_spec *, vp8bc_index_t, int);
+
+
+    /* bool_writer interface */
+
+    /* Length = 0 disables checking for writes beyond buffer end. */
+
+    bool_writer *vp8bc_create_writer(
+        c_bool_coder_spec *, unsigned char *Destination, size_t Length
+    );
+
+    /* Flushes out any buffered data and returns total # of bytes written. */
+
+    size_t vp8bc_destroy_writer(bool_writer *);
+
+    void vp8bc_write_bool(bool_writer *, int boolean_val, vp8bc_index_t false_prob);
+
+    void vp8bc_write_bits(
+        bool_writer *, unsigned int integer_value, int number_of_bits
+    );
+
+    c_bool_coder_spec *vp8bc_writer_spec(c_bool_writer *);
+
+
+    /* bool_reader interface */
+
+    /* Length = 0 disables checking for reads beyond buffer end. */
+
+    bool_reader *vp8bc_create_reader(
+        c_bool_coder_spec *, const unsigned char *Source, size_t Length
+    );
+    void vp8bc_destroy_reader(bool_reader *);
+
+    int vp8bc_read_bool(bool_reader *, vp8bc_index_t false_prob);
+
+    unsigned int vp8bc_read_bits(bool_reader *, int number_of_bits);
+
+    c_bool_coder_spec *vp8bc_reader_spec(c_bool_reader *);
+
+#if __cplusplus
+}
+#endif
+
+#endif  /* bool_coder_h */
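Note on the table-based branch of bool_coder_spec::split() in boolcoder.h: the comment there promises 1 <= split <= max(1, R-1). The sketch below is a standalone C rendering of that one branch, written only to make the bound easy to check by hand. The name split_from_prob and the 16-bit cap on the range are assumptions for illustration (the cap follows the header's remark that table specs use at most 16 bits of precision); neither is part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch.
 * range     : current coder range R, assumed 1 <= R <= 2^16
 * prob_zero : 16-bit table entry p, where p/65536 is the probability of a zero
 * Returns the split point, mirroring 1 + (((R - 1) * Ptbl[i]) >> 16). */
static uint32_t split_from_prob(uint32_t range, uint16_t prob_zero)
{
    assert(range >= 1 && range <= 65536u);
    /* (range - 1) <= 65535 and prob_zero <= 65535, so the product fits in 32 bits. */
    return 1 + (((range - 1) * (uint32_t)prob_zero) >> 16);
}
```

With R = 256, a table entry of 32768 (probability 1/2) gives a split of 128, an entry of 0 gives 1, and an entry of 65535 gives 255, which is R - 1.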
--- a/vp8/common/codec_common_interface.h
+++ b/vp8/common/codec_common_interface.h
@@ -0,0 +1,93 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+#ifndef CODEC_COMMON_INTERFACE_H
+#define CODEC_COMMON_INTERFACE_H
+
+#define __export
+#define _export
+#define dll_export   __declspec( dllexport )
+#define dll_import   __declspec( dllimport )
+
+// Playback ERROR Codes.
+#define NO_DECODER_ERROR            0
+#define REMOTE_DECODER_ERROR        -1
+
+#define DFR_BAD_DCT_COEFF           -100
+#define DFR_ZERO_LENGTH_FRAME       -101
+#define DFR_FRAME_SIZE_INVALID      -102
+#define DFR_OUTPUT_BUFFER_OVERFLOW  -103
+#define DFR_INVALID_FRAME_HEADER    -104
+#define FR_INVALID_MODE_TOKEN       -110
+#define ETR_ALLOCATION_ERROR        -200
+#define ETR_INVALID_ROOT_PTR        -201
+#define SYNCH_ERROR                 -400
+#define BUFFER_UNDERFLOW_ERROR      -500
+#define PB_IB_OVERFLOW_ERROR        -501
+
+// External error triggers
+#define PB_HEADER_CHECKSUM_ERROR    -601
+#define PB_DATA_CHECKSUM_ERROR      -602
+
+// DCT Error Codes
+#define DDCT_EXPANSION_ERROR        -700
+#define DDCT_INVALID_TOKEN_ERROR    -701
+
+// exception_errors
+#define GEN_EXCEPTIONS              -800
+#define EX_UNQUAL_ERROR             -801
+
+// Unrecoverable error codes
+#define FATAL_PLAYBACK_ERROR        -1000
+#define GEN_ERROR_CREATING_CDC      -1001
+#define GEN_THREAD_CREATION_ERROR   -1002
+#define DFR_CREATE_BMP_FAILED       -1003
+
+// YUV buffer configuration structure
+typedef struct
+{
+    int     y_width;
+    int     y_height;
+    int     y_stride;
+
+    int     uv_width;
+    int     uv_height;
+    int     uv_stride;
+
+    unsigned char   *y_buffer;
+    unsigned char   *u_buffer;
+    unsigned char   *v_buffer;
+
+} YUV_BUFFER_CONFIG;
+typedef enum
+{
+    C_SET_KEY_FRAME,
+    C_SET_FIXED_Q,
+    C_SET_FIRSTPASS_FILE,
+    C_SET_EXPERIMENTAL_MIN,
+    C_SET_EXPERIMENTAL_MAX = C_SET_EXPERIMENTAL_MIN + 255,
+    C_SET_CHECKPROTECT,
+    C_SET_TESTMODE,
+    C_SET_INTERNAL_SIZE,
+    C_SET_RECOVERY_FRAME,
+    C_SET_REFERENCEFRAME,
+    C_SET_GOLDENFRAME
+
+#ifndef VP50_COMP_INTERFACE
+    // Specialist test facilities.
+//    C_VCAP_PARAMS,              // DO NOT USE FOR NOW WITH VFW CODEC
+#endif
+
+} C_SETTING;
+
+typedef unsigned long C_SET_VALUE;
+
+
+#endif
--- a/vp8/common/entropymv.h
+++ b/vp8/common/entropymv.h
@@ -18,8 +18,6 @@ enum
 {
    mv_max  = 1023,              /* max absolute value of a MV component */
    MVvals = (2 * mv_max) + 1,   /* # possible values "" */
-    mvfp_max  = 255,              /* max absolute value of a full pixel MV component */
-    MVfpvals = (2 * mvfp_max) +1, /* # possible full pixel MV values */

    mvlong_width = 10,       /* Large MVs have 9 bit magnitudes */
    mvnum_short = 8,         /* magnitudes 0 through 7 */
--- a/vp8/common/extend.c
+++ b/vp8/common/extend.c
@@ -13,12 +13,10 @@
 #include "vpx_mem/vpx_mem.h"


-static void copy_and_extend_plane
+static void extend_plane_borders
 (
    unsigned char *s, /* source */
-    int sp,           /* source pitch */
-    unsigned char *d, /* destination */
-    int dp,           /* destination pitch */
+    int sp,           /* pitch */
    int h,            /* height */
    int w,            /* width */
    int et,           /* extend top border */
@@ -27,6 +25,7 @@ static void copy_and_extend_plane
    int er            /* extend right border */
 )
 {
+
    int i;
    unsigned char *src_ptr1, *src_ptr2;
    unsigned char *dest_ptr1, *dest_ptr2;
@@ -35,73 +34,68 @@ static void copy_and_extend_plane
    /* copy the left and right most columns out */
    src_ptr1 = s;
    src_ptr2 = s + w - 1;
-    dest_ptr1 = d - el;
-    dest_ptr2 = d + w;
+    dest_ptr1 = s - el;
+    dest_ptr2 = s + w;

-    for (i = 0; i < h; i++)
+    for (i = 0; i < h - 0 + 1; i++)
    {
-        vpx_memset(dest_ptr1, src_ptr1[0], el);
-        vpx_memcpy(dest_ptr1 + el, src_ptr1, w);
+        /* Some linkers will complain if we call vpx_memset with el set to a
+         * constant 0.
+         */
+        if (el)
+            vpx_memset(dest_ptr1, src_ptr1[0], el);
        vpx_memset(dest_ptr2, src_ptr2[0], er);
        src_ptr1  += sp;
        src_ptr2  += sp;
-        dest_ptr1 += dp;
-        dest_ptr2 += dp;
+        dest_ptr1 += sp;
+        dest_ptr2 += sp;
    }

-    /* Now copy the top and bottom lines into each line of the respective
-     * borders
-     */
-    src_ptr1 = d - el;
-    src_ptr2 = d + dp * (h - 1) - el;
-    dest_ptr1 = d + dp * (-et) - el;
-    dest_ptr2 = d + dp * (h) - el;
-    linesize = el + er + w;
+    /* Now copy the top and bottom source lines into each line of the respective borders */
+    src_ptr1 = s - el;
+    src_ptr2 = s + sp * (h - 1) - el;
+    dest_ptr1 = s + sp * (-et) - el;
+    dest_ptr2 = s + sp * (h) - el;
+    linesize = el + er + w + 1;

-    for (i = 0; i < et; i++)
+    for (i = 0; i < (int)et; i++)
    {
        vpx_memcpy(dest_ptr1, src_ptr1, linesize);
-        dest_ptr1 += dp;
+        dest_ptr1 += sp;
    }

-    for (i = 0; i < eb; i++)
+    for (i = 0; i < (int)eb; i++)
    {
        vpx_memcpy(dest_ptr2, src_ptr2, linesize);
-        dest_ptr2 += dp;
+        dest_ptr2 += sp;
    }
 }


-void vp8_copy_and_extend_frame(YV12_BUFFER_CONFIG *src,
-                               YV12_BUFFER_CONFIG *dst)
+void vp8_extend_to_multiple_of16(YV12_BUFFER_CONFIG *ybf, int width, int height)
 {
-    int et = dst->border;
-    int el = dst->border;
-    int eb = dst->border + dst->y_height - src->y_height;
-    int er = dst->border + dst->y_width - src->y_width;
+    int er = 0xf & (16 - (width & 0xf));
+    int eb = 0xf & (16 - (height & 0xf));

-    copy_and_extend_plane(src->y_buffer, src->y_stride,
-                          dst->y_buffer, dst->y_stride,
-                          src->y_height, src->y_width,
-                          et, el, eb, er);
+    /* check for non multiples of 16 */
+    if (er != 0 || eb != 0)
+    {
+        extend_plane_borders(ybf->y_buffer, ybf->y_stride, height, width, 0, 0, eb, er);

-    et = (et + 1) >> 1;
-    el = (el + 1) >> 1;
-    eb = (eb + 1) >> 1;
-    er = (er + 1) >> 1;
+        /* adjust for uv */
+        height = (height + 1) >> 1;
+        width  = (width  + 1) >> 1;
+        er = 0x7 & (8 - (width  & 0x7));
+        eb = 0x7 & (8 - (height & 0x7));

-    copy_and_extend_plane(src->u_buffer, src->uv_stride,
-                          dst->u_buffer, dst->uv_stride,
-                          src->uv_height, src->uv_width,
-                          et, el, eb, er);
-
-    copy_and_extend_plane(src->v_buffer, src->uv_stride,
-                          dst->v_buffer, dst->uv_stride,
-                          src->uv_height, src->uv_width,
-                          et, el, eb, er);
+        if (er || eb)
+        {
+            extend_plane_borders(ybf->u_buffer, ybf->uv_stride, height, width, 0, 0, eb, er);
+            extend_plane_borders(ybf->v_buffer, ybf->uv_stride, height, width, 0, 0, eb, er);
+        }
+    }
 }

-
 /* note the extension is only for the last row, for intra prediction purpose */
 void vp8_extend_mb_row(YV12_BUFFER_CONFIG *ybf, unsigned char *YPtr, unsigned char *UPtr, unsigned char *VPtr)
 {
--- a/vp8/common/extend.h
+++ b/vp8/common/extend.h
@@ -14,8 +14,8 @@

 #include "vpx_scale/yv12config.h"

+void Extend(YV12_BUFFER_CONFIG *ybf);
 void vp8_extend_mb_row(YV12_BUFFER_CONFIG *ybf, unsigned char *YPtr, unsigned char *UPtr, unsigned char *VPtr);
-void vp8_copy_and_extend_frame(YV12_BUFFER_CONFIG *src,
-                               YV12_BUFFER_CONFIG *dst);
+void vp8_extend_to_multiple_of16(YV12_BUFFER_CONFIG *ybf, int width, int height);

 #endif
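Note on vp8_extend_to_multiple_of16(): the expressions 0xf & (16 - (width & 0xf)) and 0x7 & (8 - (width & 0x7)) compute how many pixels are missing to reach the next multiple of 16 (luma) or 8 (chroma). The sketch below restates that arithmetic as a small C helper so the values can be checked in isolation. The names pad_to_multiple and chroma_dim are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: number of extra pixels needed
 * to round dim up to the next multiple of align (align must be a power of two).
 * For align = 16 this is the same as 0xf & (16 - (dim & 0xf)). */
static int pad_to_multiple(int dim, int align)
{
    assert(dim >= 0);
    assert(align > 0 && (align & (align - 1)) == 0);
    return (align - 1) & (align - (dim & (align - 1)));
}

/* Hypothetical helper: chroma plane dimension for a 4:2:0 frame,
 * matching the (dim + 1) >> 1 adjustment in the function above. */
static int chroma_dim(int luma_dim)
{
    assert(luma_dim >= 0);
    return (luma_dim + 1) >> 1;
}
```

For example, a luma width of 17 needs 15 columns of padding, a width of 176 needs none, and the corresponding chroma width of 9 needs 7 columns to reach 16.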
--- a/vp8/common/filter.c
+++ b/vp8/common/filter.c
@@ -1,536 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#include <stdlib.h>
-#include <stdio.h>
-
-#define REGISTER_FILTER 1
-#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
-
-#if REGISTER_FILTER
-#define FILTER0 filter0
-#define FILTER1 filter1
-#define FILTER2 filter2
-#define FILTER3 filter3
-#define FILTER4 filter4
-#define FILTER5 filter5
-#else
-#define FILTER0 vp8_filter[0]
-#define FILTER1 vp8_filter[1]
-#define FILTER2 vp8_filter[2]
-#define FILTER3 vp8_filter[3]
-#define FILTER4 vp8_filter[4]
-#define FILTER5 vp8_filter[5]
-#endif
-
-#define SRC_INCREMENT src_increment
-
-#include "filter.h"
-#include "vpx_ports/mem.h"
-
-DECLARE_ALIGNED(16, const short, vp8_bilinear_filters[8][2]) =
-{
-    { 128,   0 },
-    { 112,  16 },
-    {  96,  32 },
-    {  80,  48 },
-    {  64,  64 },
-    {  48,  80 },
-    {  32,  96 },
-    {  16, 112 }
-};
-
-DECLARE_ALIGNED(16, const short, vp8_sub_pel_filters[8][6]) =
-{
-    { 0,  0,  128,    0,   0,  0 },         /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
-    { 0, -6,  123,   12,  -1,  0 },
-    { 2, -11, 108,   36,  -8,  1 },         /* New 1/4 pel 6 tap filter */
-    { 0, -9,   93,   50,  -6,  0 },
-    { 3, -16,  77,   77, -16,  3 },         /* New 1/2 pel 6 tap filter */
-    { 0, -6,   50,   93,  -9,  0 },
-    { 1, -8,   36,  108, -11,  2 },         /* New 1/4 pel 6 tap filter */
-    { 0, -1,   12,  123,  -6,  0 },
-};
-
-static void filter_block2d_first_pass
-(
-    unsigned char *src_ptr,
-    int *output_ptr,
-    unsigned int src_pixels_per_line,
-    unsigned int pixel_step,
-    unsigned int output_height,
-    unsigned int output_width,
-    const short *vp8_filter
-)
-{
-
-    unsigned int i, j;
-    int Temp;
-
-#if REGISTER_FILTER
-    short filter0 = vp8_filter[0];
-    short filter1 = vp8_filter[1];
-    short filter2 = vp8_filter[2];
-    short filter3 = vp8_filter[3];
-    short filter4 = vp8_filter[4];
-    short filter5 = vp8_filter[5];
-#endif
-
-    int ps2 = 2*(int)pixel_step;
-    int ps3 = 3*(int)pixel_step;
-
-    unsigned int src_increment = src_pixels_per_line - output_width;
-    for (i = 0; i < output_height; i++)
-    {
-        for (j = 0; j < output_width; j++)
-        {
-            Temp = ((int)src_ptr[-1*ps2]         * FILTER0);
-            Temp += ((int)src_ptr[-1*(int)pixel_step] * FILTER1) +
-               ((int)src_ptr[0]                * FILTER2) +
-               ((int)src_ptr[pixel_step]       * FILTER3) +
-               ((int)src_ptr[ps2]              * FILTER4) +
-               ((int)src_ptr[ps3]              * FILTER5) +
-               (VP8_FILTER_WEIGHT >> 1);      /* Rounding */
-
-            /* Normalize back to 0-255 */
-            Temp = Temp >> VP8_FILTER_SHIFT;
-            CLAMP(Temp, 0, 255);
-
-            output_ptr[j] = Temp;
-            src_ptr++;
-        }
-
-        /* Next row... */
-        src_ptr    += SRC_INCREMENT;
-        output_ptr += output_width;
-    }
-}
-
-static void filter_block2d_second_pass
-(
-    int *src_ptr,
-    unsigned char *output_ptr,
-    int output_pitch,
-    unsigned int src_pixels_per_line,
-    unsigned int pixel_step,
-    unsigned int output_height,
-    unsigned int output_width,
-    const short *vp8_filter
-)
-{
-	unsigned int i, j;
-	int  Temp;
-
-#if REGISTER_FILTER
-    short filter0 = vp8_filter[0];
-    short filter1 = vp8_filter[1];
-    short filter2 = vp8_filter[2];
-    short filter3 = vp8_filter[3];
-    short filter4 = vp8_filter[4];
-    short filter5 = vp8_filter[5];
-#endif
-
-    int ps2 = ((int)pixel_step) << 1;
-    int ps3 = ps2 + (int)pixel_step;
-    unsigned int src_increment = src_pixels_per_line - output_width;
-
-    for (i = 0; i < output_height; i++)
-    {
-        for (j = 0; j < output_width; j++)
-        {
-            /* Apply filter */
-            Temp = ((int)src_ptr[-1*ps2] * FILTER0) +
-                   ((int)src_ptr[-1*(int)pixel_step] * FILTER1) +
-                   ((int)src_ptr[0]                  * FILTER2) +
-                   ((int)src_ptr[pixel_step]         * FILTER3) +
-                   ((int)src_ptr[ps2]       * FILTER4) +
-                   ((int)src_ptr[ps3]       * FILTER5) +
-                   (VP8_FILTER_WEIGHT >> 1);   /* Rounding */
-
-            /* Normalize back to 0-255 */
-            Temp = Temp >> VP8_FILTER_SHIFT;
-            CLAMP(Temp, 0, 255);
-
-            output_ptr[j] = (unsigned char)Temp;
-            src_ptr++;
-        }
-
-        /* Start next row */
-        src_ptr    += src_increment;
-        output_ptr += output_pitch;
-    }
-}
-
-
-static void filter_block2d
-(
-    unsigned char  *src_ptr,
-    unsigned char  *output_ptr,
-    unsigned int src_pixels_per_line,
-    int output_pitch,
-    const short  *HFilter,
-    const short  *VFilter
-)
-{
-    int FData[9*4]; /* Temp data buffer used in filtering */
-
-    /* First filter 1-D horizontally... */
-    filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 4, HFilter);
-
-    /* then filter verticaly... */
-    filter_block2d_second_pass(FData + 8, output_ptr, output_pitch, 4, 4, 4, 4, VFilter);
-}
-
-
-void vp8_sixtap_predict_c
-(
-    unsigned char  *src_ptr,
-    int   src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    unsigned char *dst_ptr,
-    int dst_pitch
-)
-{
-    const short  *HFilter;
-    const short  *VFilter;
-
-    HFilter = vp8_sub_pel_filters[xoffset];   /* 6 tap */
-    VFilter = vp8_sub_pel_filters[yoffset];   /* 6 tap */
-
-    filter_block2d(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter);
-}
-
-void vp8_sixtap_predict8x8_c
-(
-    unsigned char  *src_ptr,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    unsigned char *dst_ptr,
-    int  dst_pitch
-)
-{
-    const short  *HFilter;
-    const short  *VFilter;
-    int FData[13*16];   /* Temp data buffer used in filtering */
-
-    HFilter = vp8_sub_pel_filters[xoffset];   /* 6 tap */
-    VFilter = vp8_sub_pel_filters[yoffset];   /* 6 tap */
-
-    /* First filter 1-D horizontally... */
-    filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 13, 8, HFilter);
-
-
-    /* then filter verticaly... */
-    filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 8, 8, VFilter);
-
-}
-
-void vp8_sixtap_predict8x4_c
-(
-    unsigned char  *src_ptr,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    unsigned char *dst_ptr,
-    int  dst_pitch
-)
-{
-    const short  *HFilter;
-    const short  *VFilter;
-    int FData[13*16];   /* Temp data buffer used in filtering */
-
-    HFilter = vp8_sub_pel_filters[xoffset];   /* 6 tap */
-    VFilter = vp8_sub_pel_filters[yoffset];   /* 6 tap */
-
-    /* First filter 1-D horizontally... */
-    filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 8, HFilter);
-
-
-    /* then filter verticaly... */
-    filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 4, 8, VFilter);
-
-}
-
-void vp8_sixtap_predict16x16_c
-(
-    unsigned char  *src_ptr,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    unsigned char *dst_ptr,
-    int  dst_pitch
-)
-{
-    const short  *HFilter;
-    const short  *VFilter;
-    int FData[21*24];   /* Temp data buffer used in filtering */
-
-
-    HFilter = vp8_sub_pel_filters[xoffset];   /* 6 tap */
-    VFilter = vp8_sub_pel_filters[yoffset];   /* 6 tap */
-
-    /* First filter 1-D horizontally... */
-    filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 21, 16, HFilter);
-
-    /* then filter verticaly... */
-    filter_block2d_second_pass(FData + 32, dst_ptr, dst_pitch, 16, 16, 16, 16, VFilter);
-
-}
-
-
-/****************************************************************************
- *
- *  ROUTINE       : filter_block2d_bil_first_pass
- *
- *  INPUTS        : UINT8  *src_ptr    : Pointer to source block.
- *                  UINT32  src_stride : Stride of source block.
- *                  UINT32  height     : Block height.
- *                  UINT32  width      : Block width.
- *                  INT32  *vp8_filter : Array of 2 bi-linear filter taps.
- *
- *  OUTPUTS       : INT32  *dst_ptr    : Pointer to filtered block.
- *
- *  RETURNS       : void
- *
- *  FUNCTION      : Applies a 1-D 2-tap bi-linear filter to the source block
- *                  in the horizontal direction to produce the filtered output
- *                  block. Used to implement first-pass of 2-D separable filter.
- *
- *  SPECIAL NOTES : Produces INT32 output to retain precision for next pass.
- *                  Two filter taps should sum to VP8_FILTER_WEIGHT.
- *
- ****************************************************************************/
-static void filter_block2d_bil_first_pass
-(
-    unsigned char  *src_ptr,
-    unsigned short *dst_ptr,
-    unsigned int    src_stride,
-    unsigned int    height,
-    unsigned int    width,
-    const short    *vp8_filter
-)
-{
-    unsigned int i, j;
-
-    for (i = 0; i < height; i++)
-    {
-        for (j = 0; j < width; j++)
-        {
-            /* Apply bilinear filter */
-            dst_ptr[j] = (((int)src_ptr[0] * vp8_filter[0]) +
-                          ((int)src_ptr[1] * vp8_filter[1]) +
-                          (VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;
-            src_ptr++;
-        }
-
-        /* Next row... */
-        src_ptr += src_stride - width;
-        dst_ptr += width;
-    }
-}
-
-/****************************************************************************
- *
- *  ROUTINE       : filter_block2d_bil_second_pass
- *
- *  INPUTS        : INT32  *src_ptr    : Pointer to source block.
- *                  UINT32  dst_pitch  : Destination block pitch.
- *                  UINT32  height     : Block height.
- *                  UINT32  width      : Block width.
- *                  INT32  *vp8_filter : Array of 2 bi-linear filter taps.
- *
- *  OUTPUTS       : UINT16 *dst_ptr    : Pointer to filtered block.
- *
- *  RETURNS       : void
- *
- *  FUNCTION      : Applies a 1-D 2-tap bi-linear filter to the source block
- *                  in the vertical direction to produce the filtered output
- *                  block. Used to implement second-pass of 2-D separable filter.
- *
- *  SPECIAL NOTES : Requires 32-bit input as produced by filter_block2d_bil_first_pass.
- *                  Two filter taps should sum to VP8_FILTER_WEIGHT.
- *
- ****************************************************************************/
-static void filter_block2d_bil_second_pass
-(
-    unsigned short *src_ptr,
-    unsigned char  *dst_ptr,
-    int             dst_pitch,
-    unsigned int    height,
-    unsigned int    width,
-    const short    *vp8_filter
-)
-{
-    unsigned int  i, j;
-    int  Temp;
-
-    for (i = 0; i < height; i++)
-    {
-        for (j = 0; j < width; j++)
-        {
-            /* Apply filter */
-            Temp = ((int)src_ptr[0]     * vp8_filter[0]) +
-                   ((int)src_ptr[width] * vp8_filter[1]) +
-                   (VP8_FILTER_WEIGHT / 2);
-            dst_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
-            src_ptr++;
-        }
-
-        /* Next row... */
-        dst_ptr += dst_pitch;
-    }
-}
-
-
-/****************************************************************************
- *
- *  ROUTINE       : filter_block2d_bil
- *
- *  INPUTS        : UINT8  *src_ptr          : Pointer to source block.
- *                  UINT32  src_pitch        : Stride of source block.
- *                  UINT32  dst_pitch        : Stride of destination block.
- *                  INT32  *HFilter          : Array of 2 horizontal filter taps.
- *                  INT32  *VFilter          : Array of 2 vertical filter taps.
- *                  INT32  Width             : Block width
- *                  INT32  Height            : Block height
- *
- *  OUTPUTS       : UINT16 *dst_ptr       : Pointer to filtered block.
- *
- *  RETURNS       : void
- *
- *  FUNCTION      : 2-D filters an input block by applying a 2-tap
- *                  bi-linear filter horizontally followed by a 2-tap
- *                  bi-linear filter vertically on the result.
- *
- *  SPECIAL NOTES : The largest block size can be handled here is 16x16
- *
- ****************************************************************************/
-static void filter_block2d_bil
-(
-    unsigned char *src_ptr,
-    unsigned char *dst_ptr,
-    unsigned int   src_pitch,
-    unsigned int   dst_pitch,
-    const short   *HFilter,
-    const short   *VFilter,
-    int            Width,
-    int            Height
-)
-{
-
-    unsigned short FData[17*16];    /* Temp data buffer used in filtering */
-
-    /* First filter 1-D horizontally... */
-    filter_block2d_bil_first_pass(src_ptr, FData, src_pitch, Height + 1, Width, HFilter);
-
-    /* then 1-D vertically... */
-    filter_block2d_bil_second_pass(FData, dst_ptr, dst_pitch, Height, Width, VFilter);
-}
-
-
-void vp8_bilinear_predict4x4_c
-(
-    unsigned char  *src_ptr,
-    int   src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    unsigned char *dst_ptr,
-    int dst_pitch
-)
-{
-    const short *HFilter;
-    const short *VFilter;
-
-    HFilter = vp8_bilinear_filters[xoffset];
-    VFilter = vp8_bilinear_filters[yoffset];
-#if 0
-    {
-        int i;
-        unsigned char temp1[16];
-        unsigned char temp2[16];
-
-        bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
-        filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
-
-        for (i = 0; i < 16; i++)
-        {
-            if (temp1[i] != temp2[i])
-            {
-                bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
-                filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
-            }
-        }
-    }
-#endif
-    filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
-
-}
-
-void vp8_bilinear_predict8x8_c
-(
-    unsigned char  *src_ptr,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    unsigned char *dst_ptr,
-    int  dst_pitch
-)
-{
-    const short *HFilter;
-    const short *VFilter;
-
-    HFilter = vp8_bilinear_filters[xoffset];
-    VFilter = vp8_bilinear_filters[yoffset];
-
-    filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
-
-}
-
-void vp8_bilinear_predict8x4_c
-(
-    unsigned char  *src_ptr,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    unsigned char *dst_ptr,
-    int  dst_pitch
-)
-{
-    const short *HFilter;
-    const short *VFilter;
-
-    HFilter = vp8_bilinear_filters[xoffset];
-    VFilter = vp8_bilinear_filters[yoffset];
-
-    filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
-
-}
-
-void vp8_bilinear_predict16x16_c
-(
-    unsigned char  *src_ptr,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    unsigned char *dst_ptr,
-    int  dst_pitch
-)
-{
-    const short *HFilter;
-    const short *VFilter;
-
-    HFilter = vp8_bilinear_filters[xoffset];
-    VFilter = vp8_bilinear_filters[yoffset];
-
-    filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
-}
--- a/vp8/common/filter_c.c
+++ b/vp8/common/filter_c.c
@@ -0,0 +1,540 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+
+#include <stdlib.h>
+
+#define BLOCK_HEIGHT_WIDTH 4
+#define VP8_FILTER_WEIGHT 128
+#define VP8_FILTER_SHIFT  7
+
+
+static const int bilinear_filters[8][2] =
+{
+    { 128,   0 },
+    { 112,  16 },
+    {  96,  32 },
+    {  80,  48 },
+    {  64,  64 },
+    {  48,  80 },
+    {  32,  96 },
+    {  16, 112 }
+};
+
+
+static const short sub_pel_filters[8][6] =
+{
+
+    { 0,  0,  128,    0,   0,  0 },         /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
+    { 0, -6,  123,   12,  -1,  0 },
+    { 2, -11, 108,   36,  -8,  1 },         /* New 1/4 pel 6 tap filter */
+    { 0, -9,   93,   50,  -6,  0 },
+    { 3, -16,  77,   77, -16,  3 },         /* New 1/2 pel 6 tap filter */
+    { 0, -6,   50,   93,  -9,  0 },
+    { 1, -8,   36,  108, -11,  2 },         /* New 1/4 pel 6 tap filter */
+    { 0, -1,   12,  123,  -6,  0 },
+
+
+
+};
+
+void vp8_filter_block2d_first_pass
+(
+    unsigned char *src_ptr,
+    int *output_ptr,
+    unsigned int src_pixels_per_line,
+    unsigned int pixel_step,
+    unsigned int output_height,
+    unsigned int output_width,
+    const short *vp8_filter
+)
+{
+    unsigned int i, j;
+    int  Temp;
+
+    for (i = 0; i < output_height; i++)
+    {
+        for (j = 0; j < output_width; j++)
+        {
+            Temp = ((int)src_ptr[-2 * (int)pixel_step] * vp8_filter[0]) +
+                   ((int)src_ptr[-1 * (int)pixel_step] * vp8_filter[1]) +
+                   ((int)src_ptr[0]                 * vp8_filter[2]) +
+                   ((int)src_ptr[pixel_step]         * vp8_filter[3]) +
+                   ((int)src_ptr[2*pixel_step]       * vp8_filter[4]) +
+                   ((int)src_ptr[3*pixel_step]       * vp8_filter[5]) +
+                   (VP8_FILTER_WEIGHT >> 1);      /* Rounding */
+
+            /* Normalize back to 0-255 */
+            Temp = Temp >> VP8_FILTER_SHIFT;
+
+            if (Temp < 0)
+                Temp = 0;
+            else if (Temp > 255)
+                Temp = 255;
+
+            output_ptr[j] = Temp;
+            src_ptr++;
+        }
+
+        /* Next row... */
+        src_ptr    += src_pixels_per_line - output_width;
+        output_ptr += output_width;
+    }
+}
+
+void vp8_filter_block2d_second_pass
+(
+    int *src_ptr,
+    unsigned char *output_ptr,
+    int output_pitch,
+    unsigned int src_pixels_per_line,
+    unsigned int pixel_step,
+    unsigned int output_height,
+    unsigned int output_width,
+    const short *vp8_filter
+)
+{
+    unsigned int i, j;
+    int  Temp;
+
+    for (i = 0; i < output_height; i++)
+    {
+        for (j = 0; j < output_width; j++)
+        {
+            /* Apply filter */
+            Temp = ((int)src_ptr[-2 * (int)pixel_step] * vp8_filter[0]) +
+                   ((int)src_ptr[-1 * (int)pixel_step] * vp8_filter[1]) +
+                   ((int)src_ptr[0]                 * vp8_filter[2]) +
+                   ((int)src_ptr[pixel_step]         * vp8_filter[3]) +
+                   ((int)src_ptr[2*pixel_step]       * vp8_filter[4]) +
+                   ((int)src_ptr[3*pixel_step]       * vp8_filter[5]) +
+                   (VP8_FILTER_WEIGHT >> 1);   /* Rounding */
+
+            /* Normalize back to 0-255 */
+            Temp = Temp >> VP8_FILTER_SHIFT;
+
+            if (Temp < 0)
+                Temp = 0;
+            else if (Temp > 255)
+                Temp = 255;
+
+            output_ptr[j] = (unsigned char)Temp;
+            src_ptr++;
+        }
+
+        /* Start next row */
+        src_ptr    += src_pixels_per_line - output_width;
+        output_ptr += output_pitch;
+    }
+}
+
+
+void vp8_filter_block2d
+(
+    unsigned char  *src_ptr,
+    unsigned char  *output_ptr,
+    unsigned int src_pixels_per_line,
+    int output_pitch,
+    const short  *HFilter,
+    const short  *VFilter
+)
+{
+    int FData[9*4]; /* Temp data buffer used in filtering */
+
+    /* First filter 1-D horizontally... */
+    vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 4, HFilter);
+
+    /* then filter vertically... */
+    vp8_filter_block2d_second_pass(FData + 8, output_ptr, output_pitch, 4, 4, 4, 4, VFilter);
+}
+
+
+void vp8_block_variation_c
+(
+    unsigned char  *src_ptr,
+    int   src_pixels_per_line,
+    int *HVar,
+    int *VVar
+)
+{
+    int i, j;
+    unsigned char *Ptr = src_ptr;
+
+    for (i = 0; i < 4; i++)
+    {
+        for (j = 0; j < 4; j++)
+        {
+            *HVar += abs((int)Ptr[j] - (int)Ptr[j+1]);
+            *VVar += abs((int)Ptr[j] - (int)Ptr[j+src_pixels_per_line]);
+        }
+
+        Ptr += src_pixels_per_line;
+    }
+}
+
+
+
+
+void vp8_sixtap_predict_c
+(
+    unsigned char  *src_ptr,
+    int   src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int dst_pitch
+)
+{
+    const short  *HFilter;
+    const short  *VFilter;
+
+    HFilter = sub_pel_filters[xoffset];   /* 6 tap */
+    VFilter = sub_pel_filters[yoffset];   /* 6 tap */
+
+    vp8_filter_block2d(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter);
+}
+void vp8_sixtap_predict8x8_c
+(
+    unsigned char  *src_ptr,
+    int  src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int  dst_pitch
+)
+{
+    const short  *HFilter;
+    const short  *VFilter;
+    int FData[13*16];   /* Temp data buffer used in filtering */
+
+    HFilter = sub_pel_filters[xoffset];   /* 6 tap */
+    VFilter = sub_pel_filters[yoffset];   /* 6 tap */
+
+    /* First filter 1-D horizontally... */
+    vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 13, 8, HFilter);
+
+
+    /* then filter vertically... */
+    vp8_filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 8, 8, VFilter);
+
+}
+
+void vp8_sixtap_predict8x4_c
+(
+    unsigned char  *src_ptr,
+    int  src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int  dst_pitch
+)
+{
+    const short  *HFilter;
+    const short  *VFilter;
+    int FData[13*16];   /* Temp data buffer used in filtering */
+
+    HFilter = sub_pel_filters[xoffset];   /* 6 tap */
+    VFilter = sub_pel_filters[yoffset];   /* 6 tap */
+
+    /* First filter 1-D horizontally... */
+    vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 8, HFilter);
+
+
+    /* then filter vertically... */
+    vp8_filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 4, 8, VFilter);
+
+}
+
+void vp8_sixtap_predict16x16_c
+(
+    unsigned char  *src_ptr,
+    int  src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int  dst_pitch
+)
+{
+    const short  *HFilter;
+    const short  *VFilter;
+    int FData[21*24];   /* Temp data buffer used in filtering */
+
+
+    HFilter = sub_pel_filters[xoffset];   /* 6 tap */
+    VFilter = sub_pel_filters[yoffset];   /* 6 tap */
+
+    /* First filter 1-D horizontally... */
+    vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 21, 16, HFilter);
+
+    /* then filter vertically... */
+    vp8_filter_block2d_second_pass(FData + 32, dst_ptr, dst_pitch, 16, 16, 16, 16, VFilter);
+
+}
+
+
+/****************************************************************************
+ *
+ *  ROUTINE       : vp8_filter_block2d_bil_first_pass
+ *
+ *  INPUTS        : UINT8  *src_ptr          : Pointer to source block.
+ *                  UINT32 src_pixels_per_line : Stride of input block.
+ *                  UINT32 pixel_step        : Offset between filter input samples (see notes).
+ *                  UINT32 output_height     : Input block height.
+ *                  UINT32 output_width      : Input block width.
+ *                  INT32  *vp8_filter          : Array of 2 bi-linear filter taps.
+ *
+ *  OUTPUTS       : UINT16 *output_ptr       : Pointer to filtered block.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Applies a 1-D 2-tap bi-linear filter to the source block in
+ *                  either horizontal or vertical direction to produce the
+ *                  filtered output block. Used to implement first-pass
+ *                  of 2-D separable filter.
+ *
+ *  SPECIAL NOTES : Produces UINT16 output for the next pass.
+ *                  Two filter taps should sum to VP8_FILTER_WEIGHT.
+ *                  pixel_step defines whether the filter is applied
+ *                  horizontally (pixel_step=1) or vertically (pixel_step=stride).
+ *                  It defines the offset required to move from one input
+ *                  to the next.
+ *
+ ****************************************************************************/
+void vp8_filter_block2d_bil_first_pass
+(
+    unsigned char *src_ptr,
+    unsigned short *output_ptr,
+    unsigned int src_pixels_per_line,
+    int pixel_step,
+    unsigned int output_height,
+    unsigned int output_width,
+    const int *vp8_filter
+)
+{
+    unsigned int i, j;
+
+    for (i = 0; i < output_height; i++)
+    {
+        for (j = 0; j < output_width; j++)
+        {
+            /* Apply bilinear filter */
+            output_ptr[j] = (((int)src_ptr[0]          * vp8_filter[0]) +
+                             ((int)src_ptr[pixel_step] * vp8_filter[1]) +
+                             (VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;
+            src_ptr++;
+        }
+
+        /* Next row... */
+        src_ptr    += src_pixels_per_line - output_width;
+        output_ptr += output_width;
+    }
+}
+
+/****************************************************************************
+ *
+ *  ROUTINE       : vp8_filter_block2d_bil_second_pass
+ *
+ *  INPUTS        : UINT16 *src_ptr          : Pointer to source block.
+ *                  UINT32 src_pixels_per_line : Stride of input block.
+ *                  UINT32 pixel_step        : Offset between filter input samples (see notes).
+ *                  UINT32 output_height     : Input block height.
+ *                  UINT32 output_width      : Input block width.
+ *                  INT32  *vp8_filter          : Array of 2 bi-linear filter taps.
+ *
+ *  OUTPUTS       : UINT8  *output_ptr       : Pointer to filtered block.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Applies a 1-D 2-tap bi-linear filter to the source block in
+ *                  either horizontal or vertical direction to produce the
+ *                  filtered output block. Used to implement second-pass
+ *                  of 2-D separable filter.
+ *
+ *  SPECIAL NOTES : Requires 16-bit input as produced by vp8_filter_block2d_bil_first_pass.
+ *                  Two filter taps should sum to VP8_FILTER_WEIGHT.
+ *                  pixel_step defines whether the filter is applied
+ *                  horizontally (pixel_step=1) or vertically (pixel_step=stride).
+ *                  It defines the offset required to move from one input
+ *                  to the next.
+ *
+ ****************************************************************************/
+void vp8_filter_block2d_bil_second_pass
+(
+    unsigned short *src_ptr,
+    unsigned char  *output_ptr,
+    int output_pitch,
+    unsigned int  src_pixels_per_line,
+    unsigned int  pixel_step,
+    unsigned int  output_height,
+    unsigned int  output_width,
+    const int *vp8_filter
+)
+{
+    unsigned int  i, j;
+    int  Temp;
+
+    for (i = 0; i < output_height; i++)
+    {
+        for (j = 0; j < output_width; j++)
+        {
+            /* Apply filter */
+            Temp = ((int)src_ptr[0]         * vp8_filter[0]) +
+                   ((int)src_ptr[pixel_step] * vp8_filter[1]) +
+                   (VP8_FILTER_WEIGHT / 2);
+            output_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
+            src_ptr++;
+        }
+
+        /* Next row... */
+        src_ptr    += src_pixels_per_line - output_width;
+        output_ptr += output_pitch;
+    }
+}
+
+
+/****************************************************************************
+ *
+ *  ROUTINE       : vp8_filter_block2d_bil
+ *
+ *  INPUTS        : UINT8  *src_ptr          : Pointer to source block.
+ *                  UINT32 src_pixels_per_line : Stride of input block.
+ *                  INT32  *HFilter         : Array of 2 horizontal filter taps.
+ *                  INT32  *VFilter         : Array of 2 vertical filter taps.
+ *
+ *  OUTPUTS       : UINT8  *output_ptr       : Pointer to filtered block.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : 2-D filters an input block by applying a 2-tap
+ *                  bi-linear filter horizontally followed by a 2-tap
+ *                  bi-linear filter vertically on the result.
+ *
+ *  SPECIAL NOTES : The largest block size that can be handled here is 16x16.
+ *
+ ****************************************************************************/
+void vp8_filter_block2d_bil
+(
+    unsigned char *src_ptr,
+    unsigned char *output_ptr,
+    unsigned int   src_pixels_per_line,
+    unsigned int   dst_pitch,
+    const int      *HFilter,
+    const int      *VFilter,
+    int            Width,
+    int            Height
+)
+{
+
+    unsigned short FData[17*16];    /* Temp data buffer used in filtering */
+
+    /* First filter 1-D horizontally... */
+    vp8_filter_block2d_bil_first_pass(src_ptr, FData, src_pixels_per_line, 1, Height + 1, Width, HFilter);
+
+    /* then 1-D vertically... */
+    vp8_filter_block2d_bil_second_pass(FData, output_ptr, dst_pitch, Width, Width, Height, Width, VFilter);
+}
+
+
+void vp8_bilinear_predict4x4_c
+(
+    unsigned char  *src_ptr,
+    int   src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int dst_pitch
+)
+{
+    const int  *HFilter;
+    const int  *VFilter;
+
+    HFilter = bilinear_filters[xoffset];
+    VFilter = bilinear_filters[yoffset];
+#if 0
+    {
+        int i;
+        unsigned char temp1[16];
+        unsigned char temp2[16];
+
+        bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
+        vp8_filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
+
+        for (i = 0; i < 16; i++)
+        {
+            if (temp1[i] != temp2[i])
+            {
+                bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
+                vp8_filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
+            }
+        }
+    }
+#endif
+    vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
+
+}
+
+void vp8_bilinear_predict8x8_c
+(
+    unsigned char  *src_ptr,
+    int  src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int  dst_pitch
+)
+{
+    const int  *HFilter;
+    const int  *VFilter;
+
+    HFilter = bilinear_filters[xoffset];
+    VFilter = bilinear_filters[yoffset];
+
+    vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
+
+}
+
+void vp8_bilinear_predict8x4_c
+(
+    unsigned char  *src_ptr,
+    int  src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int  dst_pitch
+)
+{
+    const int  *HFilter;
+    const int  *VFilter;
+
+    HFilter = bilinear_filters[xoffset];
+    VFilter = bilinear_filters[yoffset];
+
+    vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
+
+}
+
+void vp8_bilinear_predict16x16_c
+(
+    unsigned char  *src_ptr,
+    int  src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    unsigned char *dst_ptr,
+    int  dst_pitch
+)
+{
+    const int  *HFilter;
+    const int  *VFilter;
+
+    HFilter = bilinear_filters[xoffset];
+    VFilter = bilinear_filters[yoffset];
+
+    vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
+}
--- a/vp8/common/findnearmv.c
+++ b/vp8/common/findnearmv.c
@@ -11,16 +11,47 @@

 #include "findnearmv.h"

-const unsigned char vp8_mbsplit_offset[4][16] = {
-    { 0,  8,  0,  0,  0,  0,  0,  0,  0,  0,   0,  0,  0,  0,  0,  0},
-    { 0,  2,  0,  0,  0,  0,  0,  0,  0,  0,   0,  0,  0,  0,  0,  0},
-    { 0,  2,  8, 10,  0,  0,  0,  0,  0,  0,   0,  0,  0,  0,  0,  0},
-    { 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15}
-};
+#define FINDNEAR_SEARCH_SITES   3

 /* Predict motion vectors using those from already-decoded nearby blocks.
   Note that we only consider one 4x4 subblock from each candidate 16x16
   macroblock.   */
+
+typedef union
+{
+    unsigned int as_int;
+    MV           as_mv;
+} int_mv;        /* facilitates rapid equality tests */
+
+static void mv_bias(const MODE_INFO *x, int refframe, int_mv *mvp, const int *ref_frame_sign_bias)
+{
+    MV xmv;
+    xmv = x->mbmi.mv.as_mv;
+
+    if (ref_frame_sign_bias[x->mbmi.ref_frame] != ref_frame_sign_bias[refframe])
+    {
+        xmv.row *= -1;
+        xmv.col *= -1;
+    }
+
+    mvp->as_mv = xmv;
+}
+
+
+void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd)
+{
+    if (mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN))
+        mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
+    else if (mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN)
+        mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;
+
+    if (mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN))
+        mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
+    else if (mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN)
+        mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
+}
+
+
 void vp8_find_near_mvs
 (
    MACROBLOCKD *xd,
@@ -51,7 +82,7 @@ void vp8_find_near_mvs
        if (above->mbmi.mv.as_int)
        {
            (++mv)->as_int = above->mbmi.mv.as_int;
-            mv_bias(ref_frame_sign_bias[above->mbmi.ref_frame], refframe, mv, ref_frame_sign_bias);
+            mv_bias(above, refframe, mv, ref_frame_sign_bias);
            ++cntx;
        }

@@ -66,7 +97,7 @@ void vp8_find_near_mvs
            int_mv this_mv;

            this_mv.as_int = left->mbmi.mv.as_int;
-            mv_bias(ref_frame_sign_bias[left->mbmi.ref_frame], refframe, &this_mv, ref_frame_sign_bias);
+            mv_bias(left, refframe, &this_mv, ref_frame_sign_bias);

            if (this_mv.as_int != mv->as_int)
            {
@@ -88,7 +119,7 @@ void vp8_find_near_mvs
            int_mv this_mv;

            this_mv.as_int = aboveleft->mbmi.mv.as_int;
-            mv_bias(ref_frame_sign_bias[aboveleft->mbmi.ref_frame], refframe, &this_mv, ref_frame_sign_bias);
+            mv_bias(aboveleft, refframe, &this_mv, ref_frame_sign_bias);

            if (this_mv.as_int != mv->as_int)
            {
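Note on the findnearmv.c hunks above: mv_bias negates a neighbour's motion vector when that neighbour's reference frame has the opposite sign bias to the current one, and vp8_clamp_mv limits a vector to a margin of (16 << 3) units beyond each macroblock edge. The sketch below restates both steps with stand-alone types. The sketch_* names and the two structs are hypothetical stand-ins for MV and MACROBLOCKD, and the neighbour's vector is passed in directly instead of being read from MODE_INFO.

```c
/* Hypothetical stand-in for MV. */
typedef struct
{
    short row;
    short col;
} sketch_mv;

/* Hypothetical stand-in for the four edge distances kept in MACROBLOCKD. */
typedef struct
{
    int to_left;
    int to_right;
    int to_top;
    int to_bottom;
} sketch_edges;

/* Same value as LEFT_TOP_MARGIN and RIGHT_BOTTOM_MARGIN in the patch. */
#define SKETCH_MARGIN (16 << 3)

/* Negate the vector when the two sign biases differ. */
void sketch_mv_bias(int neighbour_sign_bias, int current_sign_bias,
                    sketch_mv *mv)
{
    if (neighbour_sign_bias != current_sign_bias)
    {
        mv->row = (short)(-mv->row);
        mv->col = (short)(-mv->col);
    }
}

/* Clamp each component to [edge - margin, edge + margin]. */
void sketch_clamp_mv(sketch_mv *mv, const sketch_edges *e)
{
    if (mv->col < e->to_left - SKETCH_MARGIN)
        mv->col = (short)(e->to_left - SKETCH_MARGIN);
    else if (mv->col > e->to_right + SKETCH_MARGIN)
        mv->col = (short)(e->to_right + SKETCH_MARGIN);

    if (mv->row < e->to_top - SKETCH_MARGIN)
        mv->row = (short)(e->to_top - SKETCH_MARGIN);
    else if (mv->row > e->to_bottom + SKETCH_MARGIN)
        mv->row = (short)(e->to_bottom + SKETCH_MARGIN);
}
```

If the edge distances are stored in 1/8-pel units, as the << 3 suggests, the margin corresponds to 16 pixels, that is, one macroblock outside the frame.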
--- a/vp8/common/findnearmv.h
+++ b/vp8/common/findnearmv.h
@@ -17,41 +17,6 @@
 #include "modecont.h"
 #include "treecoder.h"

-typedef union
-{
-    unsigned int as_int;
-    MV           as_mv;
-} int_mv;        /* facilitates rapid equality tests */
-
-static void mv_bias(int refmb_ref_frame_sign_bias, int refframe, int_mv *mvp, const int *ref_frame_sign_bias)
-{
-    MV xmv;
-    xmv = mvp->as_mv;
-
-    if (refmb_ref_frame_sign_bias != ref_frame_sign_bias[refframe])
-    {
-        xmv.row *= -1;
-        xmv.col *= -1;
-    }
-
-    mvp->as_mv = xmv;
-}
-
-#define LEFT_TOP_MARGIN (16 << 3)
-#define RIGHT_BOTTOM_MARGIN (16 << 3)
-static void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd)
-{
-    if (mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN))
-        mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
-    else if (mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN)
-        mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;
-
-    if (mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN))
-        mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
-    else if (mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN)
-        mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
-}
-
 void vp8_find_near_mvs
 (
    MACROBLOCKD *xd,
@@ -70,6 +35,8 @@ const B_MODE_INFO *vp8_left_bmi(const MODE_INFO *cur_mb, int b);

 const B_MODE_INFO *vp8_above_bmi(const MODE_INFO *cur_mb, int b, int mi_stride);

-extern const unsigned char vp8_mbsplit_offset[4][16];
+#define LEFT_TOP_MARGIN (16 << 3)
+#define RIGHT_BOTTOM_MARGIN (16 << 3)
+

 #endif
--- a/vp8/common/fourcc.hpp
+++ b/vp8/common/fourcc.hpp
@@ -0,0 +1,121 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+
+#ifndef FOURCC_HPP
+#define FOURCC_HPP
+
+#include <iosfwd>
+#include <cstring>
+
+
+#if defined(__POWERPC__) || defined(__APPLE__) || defined(__MWERKS__)
+using namespace std;
+#endif
+
+class four_cc
+{
+public:
+
+    four_cc();
+    four_cc(const char*);
+    explicit four_cc(unsigned long);
+
+    bool operator==(const four_cc&) const;
+    bool operator!=(const four_cc&) const;
+
+    bool operator==(const char*) const;
+    bool operator!=(const char*) const;
+
+    operator unsigned long() const;
+    unsigned long as_long() const;
+
+    four_cc& operator=(unsigned long);
+
+    char operator[](int) const;
+
+    std::ostream& put(std::ostream&) const;
+
+    bool printable() const;
+
+private:
+
+    union
+    {
+        char code[4];
+        unsigned long code_as_long;
+    };
+
+};
+
+
+inline four_cc::four_cc()
+{
+}
+
+inline four_cc::four_cc(unsigned long x)
+    : code_as_long(x)
+{
+}
+
+inline four_cc::four_cc(const char* str)
+{
+    std::memcpy(code, str, 4);
+}
+
+
+inline bool four_cc::operator==(const four_cc& rhs) const
+{
+    return code_as_long == rhs.code_as_long;
+}
+
+inline bool four_cc::operator!=(const four_cc& rhs) const
+{
+    return !operator==(rhs);
+}
+
+inline bool four_cc::operator==(const char* rhs) const
+{
+    return (std::memcmp(code, rhs, 4) == 0);
+}
+
+inline bool four_cc::operator!=(const char* rhs) const
+{
+    return !operator==(rhs);
+}
+
+
+inline four_cc::operator unsigned long() const
+{
+    return code_as_long;
+}
+
+inline unsigned long four_cc::as_long() const
+{
+    return code_as_long;
+}
+
+inline char four_cc::operator[](int i) const
+{
+    return code[i];
+}
+
+inline four_cc& four_cc::operator=(unsigned long val)
+{
+    code_as_long = val;
+    return *this;
+}
+
+inline std::ostream& operator<<(std::ostream& os, const four_cc& rhs)
+{
+    return rhs.put(os);
+}
+
+#endif
--- a/vp8/common/generic/systemdependent.c
+++ b/vp8/common/generic/systemdependent.c
@@ -10,16 +10,21 @@


 #include "vpx_ports/config.h"
-#include "vp8/common/g_common.h"
-#include "vp8/common/subpixel.h"
-#include "vp8/common/loopfilter.h"
-#include "vp8/common/recon.h"
-#include "vp8/common/idct.h"
-#include "vp8/common/onyxc_int.h"
+#include "g_common.h"
+#include "subpixel.h"
+#include "loopfilter.h"
+#include "recon.h"
+#include "idct.h"
+#include "onyxc_int.h"

 extern void vp8_arch_x86_common_init(VP8_COMMON *ctx);
 extern void vp8_arch_arm_common_init(VP8_COMMON *ctx);
-extern void vp8_arch_opencl_common_init(VP8_COMMON *ctx);
+
+void (*vp8_build_intra_predictors_mby_ptr)(MACROBLOCKD *x);
+extern void vp8_build_intra_predictors_mby(MACROBLOCKD *x);
+
+void (*vp8_build_intra_predictors_mby_s_ptr)(MACROBLOCKD *x);
+extern void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x);

 void vp8_machine_specific_config(VP8_COMMON *ctx)
 {
@@ -40,10 +45,6 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
    rtcd->recon.recon4      = vp8_recon4b_c;
    rtcd->recon.recon_mb    = vp8_recon_mb_c;
    rtcd->recon.recon_mby   = vp8_recon_mby_c;
-    rtcd->recon.build_intra_predictors_mby =
-        vp8_build_intra_predictors_mby;
-    rtcd->recon.build_intra_predictors_mby_s =
-        vp8_build_intra_predictors_mby_s;

    rtcd->subpix.sixtap16x16   = vp8_sixtap_predict16x16_c;
    rtcd->subpix.sixtap8x8     = vp8_sixtap_predict8x8_c;
@@ -74,6 +75,9 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
 #endif

 #endif
+    /* Pure C: */
+    vp8_build_intra_predictors_mby_ptr = vp8_build_intra_predictors_mby;
+    vp8_build_intra_predictors_mby_s_ptr = vp8_build_intra_predictors_mby_s;

 #if ARCH_X86 || ARCH_X86_64
    vp8_arch_x86_common_init(ctx);
@@ -83,8 +87,4 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
    vp8_arch_arm_common_init(ctx);
 #endif

-#if CONFIG_OPENCL && (ENABLE_CL_IDCT_DEQUANT || ENABLE_CL_SUBPIXEL || ENABLE_CL_LOOPFILTER)
-    vp8_arch_opencl_common_init(ctx);
-#endif
-
 }
--- a/vp8/common/idct.h
+++ b/vp8/common/idct.h
@@ -31,10 +31,6 @@
 #include "arm/idct_arm.h"
 #endif

-#if CONFIG_OPENCL
-#include "opencl/idct_cl.h"
-#endif
-
 #ifndef vp8_idct_idct1
 #define vp8_idct_idct1 vp8_short_idct4x4llm_1_c
 #endif
--- a/vp8/encoder/invtrans.c
+++ b/vp8/encoder/invtrans.c
@@ -11,6 +11,8 @@

 #include "invtrans.h"

+
+
 static void recon_dcblock(MACROBLOCKD *x)
 {
    BLOCKD *b = &x->block[24];
@@ -18,7 +20,7 @@ static void recon_dcblock(MACROBLOCKD *x)

    for (i = 0; i < 16; i++)
    {
-        *(x->block[i].dqcoeff_base+x->block[i].dqcoeff_offset) = b->diff_base[b->diff_offset+i];
+        x->block[i].dqcoeff[0] = b->diff[i];
    }

 }
@@ -26,18 +28,18 @@ static void recon_dcblock(MACROBLOCKD *x)
 void vp8_inverse_transform_b(const vp8_idct_rtcd_vtable_t *rtcd, BLOCKD *b, int pitch)
 {
    if (b->eob > 1)
-        IDCT_INVOKE(rtcd, idct16)(b->dqcoeff_base + b->dqcoeff_offset, &b->diff_base[b->diff_offset], pitch);
+        IDCT_INVOKE(rtcd, idct16)(b->dqcoeff, b->diff, pitch);
    else
-        IDCT_INVOKE(rtcd, idct1)(b->dqcoeff_base + b->dqcoeff_offset, &b->diff_base[b->diff_offset], pitch);
+        IDCT_INVOKE(rtcd, idct1)(b->dqcoeff, b->diff, pitch);
 }

-/* Only used in the encoder */
+
 void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
 {
    int i;

    /* do 2nd order transform on the dc block */
-    IDCT_INVOKE(rtcd, iwalsh16)(x->block[24].dqcoeff_base + x->block[23].dqcoeff_offset, &x->block[24].diff_base[x->block[24].diff_offset]);
+    IDCT_INVOKE(rtcd, iwalsh16)(x->block[24].dqcoeff, x->block[24].diff);

    recon_dcblock(x);

@@ -47,8 +49,6 @@ void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *
    }

 }
-
-/* Only used in encoder */
 void vp8_inverse_transform_mbuv(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
 {
    int i;
@@ -57,6 +57,7 @@ void vp8_inverse_transform_mbuv(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD
    {
        vp8_inverse_transform_b(rtcd, &x->block[i], 16);
    }
+
 }


@@ -68,10 +69,8 @@ void vp8_inverse_transform_mb(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x
        x->mode_info_context->mbmi.mode != SPLITMV)
    {
        /* do 2nd order transform on the dc block */
-        BLOCKD b = x->block[24];
-
-        IDCT_INVOKE(rtcd, iwalsh16)(b.dqcoeff_base+b.dqcoeff_offset, &b.diff_base[b.diff_offset]);

+        IDCT_INVOKE(rtcd, iwalsh16)(&x->block[24].dqcoeff[0], x->block[24].diff);
        recon_dcblock(x);
    }

--- a/vp8/encoder/invtrans.h
+++ b/vp8/encoder/invtrans.h
@@ -13,8 +13,8 @@
 #define __INC_INVTRANS_H

 #include "vpx_ports/config.h"
-#include "vp8/common/idct.h"
-#include "vp8/common/blockd.h"
+#include "idct.h"
+#include "blockd.h"
 extern void vp8_inverse_transform_b(const vp8_idct_rtcd_vtable_t *rtcd, BLOCKD *b, int pitch);
 extern void vp8_inverse_transform_mb(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x);
 extern void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x);
--- a/vp8/common/loopfilter.c
+++ b/vp8/common/loopfilter.c
@@ -13,10 +13,6 @@
 #include "loopfilter.h"
 #include "onyxc_int.h"

-#if CONFIG_OPENCL
-#include "opencl/loopfilter_cl.h"
-#endif
-
 typedef unsigned char uc;


@@ -32,13 +28,13 @@ void vp8_loop_filter_mbh_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned
                           int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
 {
    (void) simpler_lpf;
-    vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);

    if (u_ptr)
-        vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
+        vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);

    if (v_ptr)
-        vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
+        vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
 }

 void vp8_loop_filter_mbhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -48,7 +44,7 @@ void vp8_loop_filter_mbhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned
    (void) v_ptr;
    (void) uv_stride;
    (void) simpler_lpf;
-    vp8_loop_filter_simple_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_loop_filter_simple_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
 }

 /* Vertical MB Filtering */
@@ -56,13 +52,13 @@ void vp8_loop_filter_mbv_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned
                           int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
 {
    (void) simpler_lpf;
-    vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);

    if (u_ptr)
-        vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
+        vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);

    if (v_ptr)
-        vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
+        vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
 }

 void vp8_loop_filter_mbvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -72,7 +68,7 @@ void vp8_loop_filter_mbvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned
    (void) v_ptr;
    (void) uv_stride;
    (void) simpler_lpf;
-    vp8_loop_filter_simple_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
+    vp8_loop_filter_simple_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
 }

 /* Horizontal B Filtering */
@@ -85,10 +81,10 @@ void vp8_loop_filter_bh_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned c
    vp8_loop_filter_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);

    if (u_ptr)
-        vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
+        vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);

    if (v_ptr)
-        vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
+        vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
 }

 void vp8_loop_filter_bhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -113,10 +109,10 @@ void vp8_loop_filter_bv_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned c
    vp8_loop_filter_vertical_edge_c(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);

    if (u_ptr)
-        vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
+        vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);

    if (v_ptr)
-        vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
+        vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
 }

 void vp8_loop_filter_bvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
@@ -141,6 +137,8 @@ void vp8_init_loop_filter(VP8_COMMON *cm)

    int block_inside_limit = 0;
    int HEVThresh;
+    const int yhedge_boost  = 2;
+    const int uvhedge_boost = 2;

    /* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
    for (i = 0; i <= MAX_LOOP_FILTER; i++)
@@ -184,9 +182,15 @@ void vp8_init_loop_filter(VP8_COMMON *cm)
        for (j = 0; j < 16; j++)
        {
            lfi[i].lim[j] = block_inside_limit;
-            lfi[i].mbflim[j] = filt_lvl + 2;
+            lfi[i].mbflim[j] = filt_lvl + yhedge_boost;
+            lfi[i].mbthr[j] = HEVThresh;
            lfi[i].flim[j] = filt_lvl;
            lfi[i].thr[j] = HEVThresh;
+            lfi[i].uvlim[j] = block_inside_limit;
+            lfi[i].uvmbflim[j] = filt_lvl + uvhedge_boost;
+            lfi[i].uvmbthr[j] = HEVThresh;
+            lfi[i].uvflim[j] = filt_lvl;
+            lfi[i].uvthr[j] = HEVThresh;
        }

    }
@@ -245,52 +249,57 @@ void vp8_frame_init_loop_filter(loop_filter_info *lfi, int frame_type)
        for (j = 0; j < 16; j++)
        {
            /*lfi[i].lim[j] = block_inside_limit;
-            lfi[i].mbflim[j] = filt_lvl+2;*/
+            lfi[i].mbflim[j] = filt_lvl+yhedge_boost;*/
+            lfi[i].mbthr[j] = HEVThresh;
            /*lfi[i].flim[j] = filt_lvl;*/
            lfi[i].thr[j] = HEVThresh;
+            /*lfi[i].uvlim[j] = block_inside_limit;
+            lfi[i].uvmbflim[j] = filt_lvl+uvhedge_boost;*/
+            lfi[i].uvmbthr[j] = HEVThresh;
+            /*lfi[i].uvflim[j] = filt_lvl;*/
+            lfi[i].uvthr[j] = HEVThresh;
        }
    }
 }


-int vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int filter_level)
+void vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int *filter_level)
 {
    MB_MODE_INFO *mbmi = &mbd->mode_info_context->mbmi;

    if (mbd->mode_ref_lf_delta_enabled)
    {
        /* Apply delta for reference frame */
-        filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];
+        *filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];

        /* Apply delta for mode */
        if (mbmi->ref_frame == INTRA_FRAME)
        {
            /* Only the split mode BPRED has a further special case */
            if (mbmi->mode == B_PRED)
-                filter_level +=  mbd->mode_lf_deltas[0];
+                *filter_level +=  mbd->mode_lf_deltas[0];
        }
        else
        {
            /* Zero motion mode */
            if (mbmi->mode == ZEROMV)
-                filter_level +=  mbd->mode_lf_deltas[1];
+                *filter_level +=  mbd->mode_lf_deltas[1];

            /* Split MB motion mode */
            else if (mbmi->mode == SPLITMV)
-                filter_level +=  mbd->mode_lf_deltas[3];
+                *filter_level +=  mbd->mode_lf_deltas[3];

            /* All other inter motion modes (Nearest, Near, New) */
            else
-                filter_level +=  mbd->mode_lf_deltas[2];
+                *filter_level +=  mbd->mode_lf_deltas[2];
        }

        /* Range check */
-        if (filter_level > MAX_LOOP_FILTER)
-            filter_level = MAX_LOOP_FILTER;
-        else if (filter_level < 0)
-            filter_level = 0;
+        if (*filter_level > MAX_LOOP_FILTER)
+            *filter_level = MAX_LOOP_FILTER;
+        else if (*filter_level < 0)
+            *filter_level = 0;
    }
-    return filter_level;
 }


@@ -316,13 +325,6 @@ void vp8_loop_filter_frame
    int i;
    unsigned char *y_ptr, *u_ptr, *v_ptr;

-#if CONFIG_OPENCL && ENABLE_CL_LOOPFILTER
-    if ( cl_initialized == CL_SUCCESS ){
-        vp8_loop_filter_frame_cl(cm,mbd,default_filt_lvl);
-        return;
-    }
-#endif
-
    mbd->mode_info_context = cm->mi;          /* Point at base of Mb MODE_INFO list */

    /* Note the baseline filter values for each segment */
@@ -371,7 +373,7 @@ void vp8_loop_filter_frame
             * These specified to 8th pel as they are always compared to values that are in 1/8th pel units
             * Apply any context driven MB level adjustment
             */
-            filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
+            vp8_adjust_mb_lf_value(mbd, &filter_level);

            if (filter_level)
            {
@@ -405,7 +407,6 @@ void vp8_loop_filter_frame
 }


-/* Encoder only... */
 void vp8_loop_filter_frame_yonly
 (
    VP8_COMMON *cm,
@@ -472,7 +473,7 @@ void vp8_loop_filter_frame_yonly
            filter_level = baseline_filter_level[Segment];

            /* Apply any context driven MB level adjustment */
-            filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
+            vp8_adjust_mb_lf_value(mbd, &filter_level);

            if (filter_level)
            {
@@ -501,7 +502,7 @@ void vp8_loop_filter_frame_yonly

 }

-/* Encoder only... */
+
 void vp8_loop_filter_partial_frame
 (
    VP8_COMMON *cm,
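Note on the vp8_adjust_mb_lf_value hunk above: the function now updates the filter level through a pointer instead of returning it, so callers pass &filter_level and read the adjusted value back from the same variable. A minimal sketch of that in-place form follows. The helper name, the flattened delta arguments and the value 63 for the upper bound are assumptions for illustration; the real function picks the deltas from MACROBLOCKD and uses MAX_LOOP_FILTER, whose value is not shown in this diff.

```c
/* Assumed upper bound; the patch only refers to MAX_LOOP_FILTER by name. */
#define SKETCH_MAX_LOOP_FILTER 63

/* Apply the reference-frame and mode deltas in place, then range check.
   The caller is assumed to have already selected the two deltas. */
void sketch_adjust_lf_value(int *filter_level, int ref_delta, int mode_delta)
{
    *filter_level += ref_delta;
    *filter_level += mode_delta;

    if (*filter_level > SKETCH_MAX_LOOP_FILTER)
        *filter_level = SKETCH_MAX_LOOP_FILTER;
    else if (*filter_level < 0)
        *filter_level = 0;
}
```

The call sites in vp8_loop_filter_frame and vp8_loop_filter_frame_yonly follow the same pattern: adjust first, then skip filtering when the resulting level is zero.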
--- a/vp8/common/loopfilter.h
+++ b/vp8/common/loopfilter.h
@@ -32,6 +32,12 @@ typedef struct
    DECLARE_ALIGNED(16, signed char, flim[16]);
    DECLARE_ALIGNED(16, signed char, thr[16]);
    DECLARE_ALIGNED(16, signed char, mbflim[16]);
+    DECLARE_ALIGNED(16, signed char, mbthr[16]);
+    DECLARE_ALIGNED(16, signed char, uvlim[16]);
+    DECLARE_ALIGNED(16, signed char, uvflim[16]);
+    DECLARE_ALIGNED(16, signed char, uvthr[16]);
+    DECLARE_ALIGNED(16, signed char, uvmbflim[16]);
+    DECLARE_ALIGNED(16, signed char, uvmbthr[16]);
 } loop_filter_info;


--- a/vp8/common/loopfilter_filters.c
+++ b/vp8/common/loopfilter_filters.c
@@ -49,6 +49,7 @@ static __inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0,
 }

 static __inline void vp8_filter(signed char mask, signed char hev, uc *op1, uc *op0, uc *oq0, uc *oq1)
+
 {
    signed char ps0, qs0;
    signed char ps1, qs1;
@@ -93,7 +94,6 @@ static __inline void vp8_filter(signed char mask, signed char hev, uc *op1, uc *
    *op1 = u ^ 0x80;

 }
-
 void vp8_loop_filter_horizontal_edge_c
 (
    unsigned char *s,
--- a/vp8/common/opencl/reconinter_cl.h
+++ b/vp8/common/opencl/reconinter_cl.h
@@ -9,17 +9,23 @@
 */


-#ifndef __INC_RECONINTER_CL_H
-#define __INC_RECONINTER_CL_H
+#if !defined(_mac_specs_h)
+#define _mac_specs_h

-#include "blockd_cl.h"
-#include "subpixel_cl.h"
-#include "filter_cl.h"

-extern void vp8_build_inter_predictors_mb_cl(MACROBLOCKD *x);
-extern void vp8_build_inter_predictors_mbuv_cl(MACROBLOCKD *x);
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+    extern unsigned int vp8_read_tsc(void);
+
+    extern unsigned int vp8_get_processor_freq(void);
+
+    extern unsigned int vpx_has_altivec(void);
+
+#if defined(__cplusplus)
+}
+#endif

-extern void vp8_build_inter_predictors_mb_s_cl(MACROBLOCKD *x);
-//extern void vp8_build_inter_predictors_b_cl(BLOCKD *d, int pitch);

 #endif
--- a/vp8/common/mbpitch.c
+++ b/vp8/common/mbpitch.c
@@ -11,21 +11,16 @@

 #include "blockd.h"

-#include "stdio.h"
-#include "vpx_config.h"
-#if CONFIG_OPENCL
-#include "opencl/vp8_opencl.h"
-#endif
-
 typedef enum
 {
    PRED = 0,
    DEST = 1
 } BLOCKSET;

-static void setup_block
+void vp8_setup_block
 (
    BLOCKD *b,
+    int mv_stride,
    unsigned char **base,
    int Stride,
    int offset,
@@ -48,183 +43,87 @@ static void setup_block

 }

-
-static void setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
+void vp8_setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
 {
    int block;

    unsigned char **y, **u, **v;
-    unsigned char **buf_base;
-    int y_off, u_off, v_off;

    if (bs == DEST)
    {
-        buf_base = &x->dst.buffer_alloc;
-        y_off = x->dst.y_buffer - x->dst.buffer_alloc;
-        u_off = x->dst.u_buffer - x->dst.buffer_alloc;
-        v_off = x->dst.v_buffer - x->dst.buffer_alloc;
        y = &x->dst.y_buffer;
        u = &x->dst.u_buffer;
        v = &x->dst.v_buffer;
-        y_off = 0;
-
-        //y = buf_base;
-        //y_off = x->dst.y_buffer - x->dst.buffer_alloc;
-        
-        u = buf_base;
-        v = buf_base;
-
-        u_off = x->dst.u_buffer - x->dst.buffer_alloc;
-        v_off = x->dst.v_buffer - x->dst.buffer_alloc;
-
    }
    else
    {
-        buf_base = &x->pre.buffer_alloc;
        y = &x->pre.y_buffer;
        u = &x->pre.u_buffer;
        v = &x->pre.v_buffer;
-        y_off = u_off = v_off = 0;
-
-        //y = buf_base;
-        //y_off = x->pre.y_buffer - x->pre.buffer_alloc;
-        //u = buf_base;
-        //u_off = x->pre.u_buffer - x->pre.buffer_alloc;
-        //v = buf_base;
-        //v_off = x->pre.v_buffer - x->pre.buffer_alloc;
    }

    for (block = 0; block < 16; block++) /* y blocks */
    {
-        setup_block(&x->block[block], y, x->dst.y_stride,
-                        y_off + ((block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4), bs);
+        vp8_setup_block(&x->block[block], x->dst.y_stride, y, x->dst.y_stride,
+                        (block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4, bs);
    }

    for (block = 16; block < 20; block++) /* U and V blocks */
    {
-        int block_off = ((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4;
+        vp8_setup_block(&x->block[block], x->dst.uv_stride, u, x->dst.uv_stride,
+                        ((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);

-        setup_block(&x->block[block], u, x->dst.uv_stride,
-                        u_off + block_off, bs);
-
-        setup_block(&x->block[block+4], v, x->dst.uv_stride,
-                        v_off + block_off, bs);
+        vp8_setup_block(&x->block[block+4], x->dst.uv_stride, v, x->dst.uv_stride,
+                        ((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
    }
 }

 void vp8_setup_block_dptrs(MACROBLOCKD *x)
 {
    int r, c;
-    unsigned int offset;

-#if CONFIG_OPENCL && !ONE_CQ_PER_MB
-    cl_command_queue y_cq, u_cq, v_cq;
-    int err;
-    if (cl_initialized == CL_SUCCESS){
-        //Create command queue for Y/U/V Planes
-        y_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
-        if (!y_cq || err != CL_SUCCESS) {
-            printf("Error: Failed to create a command queue!\n");
-            cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
-        }
-        u_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
-        if (!u_cq || err != CL_SUCCESS) {
-            printf("Error: Failed to create a command queue!\n");
-            cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
-        }
-        v_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
-        if (!v_cq || err != CL_SUCCESS) {
-            printf("Error: Failed to create a command queue!\n");
-            cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
-        }
-    }
-#endif
-
-    /* 16 Y blocks */
    for (r = 0; r < 4; r++)
    {
        for (c = 0; c < 4; c++)
        {
-            offset = r * 4 * 16 + c * 4;
-            x->block[r*4+c].diff_offset      = offset;
-            x->block[r*4+c].predictor_offset = offset;
-#if CONFIG_OPENCL && !ONE_CQ_PER_MB
-            if (cl_initialized == CL_SUCCESS)
-                x->block[r*4+c].cl_commands = y_cq;
-#endif
+            x->block[r*4+c].diff      = &x->diff[r * 4 * 16 + c * 4];
+            x->block[r*4+c].predictor = x->predictor + r * 4 * 16 + c * 4;
        }
    }

-    /* 4 U Blocks */
    for (r = 0; r < 2; r++)
    {
        for (c = 0; c < 2; c++)
        {
-            offset = 256 + r * 4 * 8 + c * 4;
-            x->block[16+r*2+c].diff_offset      = offset;
-            x->block[16+r*2+c].predictor_offset = offset;
+            x->block[16+r*2+c].diff      = &x->diff[256 + r * 4 * 8 + c * 4];
+            x->block[16+r*2+c].predictor = x->predictor + 256 + r * 4 * 8 + c * 4;

-#if CONFIG_OPENCL && !ONE_CQ_PER_MB
-            if (cl_initialized == CL_SUCCESS)
-                x->block[16+r*2+c].cl_commands = u_cq;
-#endif
        }
    }

-    /* 4 V Blocks */
    for (r = 0; r < 2; r++)
    {
        for (c = 0; c < 2; c++)
        {
-            offset = 320+ r * 4 * 8 + c * 4;
-            x->block[20+r*2+c].diff_offset      = offset;
-            x->block[20+r*2+c].predictor_offset = offset;
+            x->block[20+r*2+c].diff      = &x->diff[320+ r * 4 * 8 + c * 4];
+            x->block[20+r*2+c].predictor = x->predictor + 320 + r * 4 * 8 + c * 4;

-#if CONFIG_OPENCL && !ONE_CQ_PER_MB
-            if (cl_initialized == CL_SUCCESS)
-                x->block[20+r*2+c].cl_commands = v_cq;
-#endif
        }
    }

-    x->block[24].diff_offset = 384;
+    x->block[24].diff = &x->diff[384];

    for (r = 0; r < 25; r++)
    {
-    	x->block[r].qcoeff_base = x->qcoeff;
-    	x->block[r].qcoeff_offset = r * 16;
-        x->block[r].dqcoeff_base = x->dqcoeff;
-        x->block[r].dqcoeff_offset = r * 16;
-        
-        x->block[r].predictor_base = x->predictor;
-        x->block[r].diff_base = x->diff;
-        x->block[r].eobs_base = x->eobs;
-
-#if CONFIG_OPENCL
-        if (cl_initialized == CL_SUCCESS){
-            /* Copy command queue reference from macroblock */
-#if ONE_CQ_PER_MB
-            x->block[r].cl_commands = x->cl_commands;
-#endif
-
-            /* Set up CL memory buffers as appropriate */
-            x->block[r].cl_diff_mem = x->cl_diff_mem;
-            x->block[r].cl_dqcoeff_mem = x->cl_dqcoeff_mem;
-            x->block[r].cl_eobs_mem = x->cl_eobs_mem;
-            x->block[r].cl_predictor_mem = x->cl_predictor_mem;
-            x->block[r].cl_qcoeff_mem = x->cl_qcoeff_mem;
-        }
-
-        //Copy filter type to block.
-        x->block[r].sixtap_filter = x->sixtap_filter;
-#endif
+        x->block[r].qcoeff  = x->qcoeff  + r * 16;
+        x->block[r].dqcoeff = x->dqcoeff + r * 16;
    }
-
 }

 void vp8_build_block_doffsets(MACROBLOCKD *x)
 {
+
    /* handle the destination pitch features */
-    setup_macroblock(x, DEST);
-    setup_macroblock(x, PRED);
+    vp8_setup_macroblock(x, DEST);
+    vp8_setup_macroblock(x, PRED);
 }
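Note on the mbpitch.c hunks above: the 25 blocks of a macroblock are addressed by fixed offsets. The 16 Y blocks sit in a 16-wide area, the 4 U and 4 V blocks in 8-wide areas starting at 256 and 320, and block 24 starts at 384. Frame-buffer offsets use the same row and column split but scale rows by the plane stride. The sketch below collects that arithmetic into three helpers whose names are hypothetical; the expressions are taken from the added lines.

```c
/* Offset of block 0..24 inside the per-macroblock diff/predictor arrays. */
int sketch_diff_offset(int block)
{
    if (block < 16)   /* Y: 4x4 grid of blocks, 16 samples per row */
        return (block >> 2) * 4 * 16 + (block & 3) * 4;

    if (block < 20)   /* U: 2x2 grid of blocks, 8 samples per row */
        return 256 + ((block - 16) >> 1) * 4 * 8 + ((block - 16) & 1) * 4;

    if (block < 24)   /* V: 2x2 grid of blocks, 8 samples per row */
        return 320 + ((block - 20) >> 1) * 4 * 8 + ((block - 20) & 1) * 4;

    return 384;       /* block 24: second-order (DC) block */
}

/* Offset of Y block 0..15 inside a luma plane with the given stride. */
int sketch_y_block_offset(int block, int y_stride)
{
    return (block >> 2) * 4 * y_stride + (block & 3) * 4;
}

/* Offset of chroma block 16..19 inside a U or V plane; in the patch the
   V block at index block + 4 uses the same offset. */
int sketch_uv_block_offset(int block, int uv_stride)
{
    return ((block - 16) >> 1) * 4 * uv_stride + (block & 1) * 4;
}
```

For example, Y block 5 is one block row down and one block column across, which gives 64 + 4 = 68 in the diff array.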
--- a/vp8/common/onyxc_int.h
+++ b/vp8/common/onyxc_int.h
@@ -120,6 +120,7 @@ typedef struct VP8Common
    int mb_no_coeff_skip;
    int no_lpf;
    int simpler_lpf;
+    int use_bilinear_mc_filter;
    int full_pixel;

    int base_qindex;
@@ -139,6 +140,8 @@ typedef struct VP8Common

    MODE_INFO *mip; /* Base of allocated array */
    MODE_INFO *mi;  /* Corresponds to upper left visible macroblock */
+    MODE_INFO *prev_mip; /* MODE_INFO array 'mip' from last decoded frame */
+    MODE_INFO *prev_mi;  /* 'mi' from last frame (points into prev_mip) */


    INTERPOLATIONFILTERTYPE mcomp_filter_type;
@@ -199,7 +202,7 @@ typedef struct VP8Common
 } VP8_COMMON;


-int vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int filter_level);
+void vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int *filter_level);
 void vp8_init_loop_filter(VP8_COMMON *cm);
 void vp8_frame_init_loop_filter(loop_filter_info *lfi, int frame_type);
 extern void vp8_loop_filter_frame(VP8_COMMON *cm,    MACROBLOCKD *mbd,  int filt_val);
--- a/vp8/common/opencl/blockd_cl.c
+++ b/vp8/common/opencl/blockd_cl.c
@@ -1,233 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-#include "../../decoder/onyxd_int.h"
-#include "../../../vpx_ports/config.h"
-#include "../../common/idct.h"
-#include "blockd_cl.h"
-#include "../../decoder/opencl/dequantize_cl.h"
-
-
-int vp8_cl_mb_prep(MACROBLOCKD *x, int flags){
-        int err;
-
-    if (cl_initialized != CL_SUCCESS){
-        return cl_initialized;
-    }
-
-    //Copy all blockd.cl_*_mem objects
-    if (flags & DIFF)
-        VP8_CL_SET_BUF(x->cl_commands, x->cl_diff_mem, sizeof(cl_short)*400, x->diff,
-            ,err
-        );
-
-    if (flags & PREDICTOR)
-        VP8_CL_SET_BUF(x->cl_commands, x->cl_predictor_mem, sizeof(cl_uchar)*384, x->predictor,
-            ,err
-        );
-
-    if (flags & QCOEFF)
-        VP8_CL_SET_BUF(x->cl_commands, x->cl_qcoeff_mem, sizeof(cl_short)*400, x->qcoeff,
-            ,err
-        );
-
-    if (flags & DQCOEFF)
-        VP8_CL_SET_BUF(x->cl_commands, x->cl_dqcoeff_mem, sizeof(cl_short)*400, x->dqcoeff,
-            ,err
-        );
-
-    if (flags & EOBS)
-        VP8_CL_SET_BUF(x->cl_commands, x->cl_eobs_mem, sizeof(cl_char)*25, x->eobs,
-            ,err
-        );
-
-    if (flags & PRE_BUF){
-        VP8_CL_SET_BUF(x->cl_commands, x->pre.buffer_mem, x->pre.buffer_size, x->pre.buffer_alloc,
-            ,err
-        );
-    }
-
-    if (flags & DST_BUF){
-        VP8_CL_SET_BUF(x->cl_commands, x->dst.buffer_mem, x->dst.buffer_size, x->dst.buffer_alloc,
-            ,err
-        );
-    }
-
-
-    return CL_SUCCESS;
-}
-
-int vp8_cl_mb_finish(MACROBLOCKD *x, int flags){
-    int err;
-
-    if (cl_initialized != CL_SUCCESS){
-        return cl_initialized;
-    }
-
-    if (flags & DIFF){
-        err = clEnqueueReadBuffer(x->cl_commands, x->cl_diff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->diff, 0, NULL, NULL);
-        VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-        );
-    }
-
-    if (flags & PREDICTOR){
-    err = clEnqueueReadBuffer(x->cl_commands, x->cl_predictor_mem, CL_FALSE, 0, sizeof(cl_uchar)*384, x->predictor, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-    );
-    }
-
-    if (flags & QCOEFF){
-    err = clEnqueueReadBuffer(x->cl_commands, x->cl_qcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->qcoeff, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-    );
-    }
-
-    if (flags & DQCOEFF){
-    err = clEnqueueReadBuffer(x->cl_commands, x->cl_dqcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->dqcoeff, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-    );
-    }
-
-    if (flags & EOBS){
-        err = clEnqueueReadBuffer(x->cl_commands, x->cl_eobs_mem, CL_FALSE, 0, sizeof(cl_char)*25, x->eobs, 0, NULL, NULL);
-        VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
-          "Error: Failed to read from GPU!\n",
-            , err
-        );
-    }
-
-    if (flags & PRE_BUF){
-        err = clEnqueueReadBuffer(x->cl_commands, x->pre.buffer_mem, CL_FALSE, 
-                0, x->pre.buffer_size, x->pre.buffer_alloc, 0, NULL, NULL);
-        VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
-          "Error: Failed to read from GPU!\n",
-            , err
-        );
-    }
-
-    if (flags & DST_BUF){
-        err = clEnqueueReadBuffer(x->cl_commands, x->dst.buffer_mem, CL_FALSE,
-                0, x->dst.buffer_size, x->dst.buffer_alloc, 0, NULL, NULL);
-        VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
-          "Error: Failed to read from GPU!\n",
-            , err
-        );
-    }
-
-
-    return CL_SUCCESS;
-}
-
-int vp8_cl_block_prep(BLOCKD *b, int flags){
-    int err;
-
-    if (cl_initialized != CL_SUCCESS){
-        return cl_initialized;
-    }
-
-    //Copy all blockd.cl_*_mem objects
-    if (flags & DIFF)
-        VP8_CL_SET_BUF(b->cl_commands, b->cl_diff_mem, sizeof(cl_short)*400, b->diff_base,
-            ,err
-        );
-
-    if (flags & PREDICTOR)
-        VP8_CL_SET_BUF(b->cl_commands, b->cl_predictor_mem, sizeof(cl_uchar)*384, b->predictor_base,
-            ,err
-        );
-
-    if (flags & QCOEFF)
-        VP8_CL_SET_BUF(b->cl_commands, b->cl_qcoeff_mem, sizeof(cl_short)*400, b->qcoeff_base,
-            ,err
-        );
-
-    if (flags & DQCOEFF)
-        VP8_CL_SET_BUF(b->cl_commands, b->cl_dqcoeff_mem, sizeof(cl_short)*400, b->dqcoeff_base,
-            ,err
-        );
-
-    if (flags & EOBS)
-        VP8_CL_SET_BUF(b->cl_commands, b->cl_eobs_mem, sizeof(cl_char)*25, b->eobs_base,
-            ,err
-        );
-
-    if (flags & DEQUANT)
-        VP8_CL_SET_BUF(b->cl_commands, b->cl_dequant_mem, sizeof(cl_short)*16 ,b->dequant,
-            ,err
-        );
-
-    return CL_SUCCESS;
-}
-
-int vp8_cl_block_finish(BLOCKD *b, int flags){
-    int err;
-
-    if (cl_initialized != CL_SUCCESS){
-        return cl_initialized;
-    }
-
-    if (flags & DIFF){
-        err = clEnqueueReadBuffer(b->cl_commands, b->cl_diff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->diff_base, 0, NULL, NULL);
-        VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-        );
-    }
-
-    if (flags & PREDICTOR){
-    err = clEnqueueReadBuffer(b->cl_commands, b->cl_predictor_mem, CL_FALSE, 0, sizeof(cl_uchar)*384, b->predictor_base, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-    );
-    }
-
-    if (flags & QCOEFF){
-    err = clEnqueueReadBuffer(b->cl_commands, b->cl_qcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->qcoeff_base, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-    );
-    }
-
-    if (flags & DQCOEFF){
-    err = clEnqueueReadBuffer(b->cl_commands, b->cl_dqcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->dqcoeff_base, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-    );
-    }
-
-    if (flags & EOBS){
-    err = clEnqueueReadBuffer(b->cl_commands, b->cl_eobs_mem, CL_FALSE, 0, sizeof(cl_char)*25, b->eobs_base, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-    );
-    }
-
-    if (flags & DEQUANT){
-    err = clEnqueueReadBuffer(b->cl_commands, b->cl_dequant_mem, CL_FALSE, 0, sizeof(cl_short)*16 ,b->dequant, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read from GPU!\n",
-            , err
-    );
-    }
-
-    return CL_SUCCESS;
-}
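Review note on the removed blockd_cl.c: the four functions above repeat one block per flag bit (DIFF, PREDICTOR, QCOEFF, ...), and every read-back is enqueued with CL_FALSE, i.e. non-blocking, so the host buffers are only safe to read once the queue has finished. The per-flag repetition could be expressed as a table walk. The sketch below is only an illustration under stated assumptions: `demo_xfer` and `demo_finish` are hypothetical names that do not exist in the tree, and a plain `memcpy` stands in for `clEnqueueReadBuffer` so that the example runs without an OpenCL runtime.

```c
#include <stddef.h>
#include <string.h>

/* Flag values copied from blockd_cl.h in this diff. */
#define DIFF      0x0001
#define PREDICTOR 0x0002
#define QCOEFF    0x0004
#define DQCOEFF   0x0008
#define EOBS      0x0010

/* Hypothetical descriptor: one row per transferable buffer. */
typedef struct {
    int    flag;    /* bit that selects this buffer */
    void  *device;  /* stand-in for a cl_mem object (assumption) */
    void  *host;    /* destination on the host side */
    size_t size;    /* number of bytes to transfer */
} demo_xfer;

/* Copies device -> host for every row whose flag bit is set in `flags`.
 * Returns the number of rows copied. */
static int demo_finish(const demo_xfer *rows, size_t n, int flags)
{
    int copied = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (flags & rows[i].flag) {
            /* Real code would enqueue a read here and check its status. */
            memcpy(rows[i].host, rows[i].device, rows[i].size);
            copied++;
        }
    }
    return copied;
}
```

With a table like this, adding a buffer means adding one row rather than one more copy of the enqueue-and-check block in each of the four functions.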
--- a/vp8/common/opencl/blockd_cl.h
+++ b/vp8/common/opencl/blockd_cl.h
@@ -1,64 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef BLOCKD_OPENCL_H
-#define BLOCKD_OPENCL_H
-
-#ifdef	__cplusplus
-extern "C" {
-#endif
-
-#include "vp8_opencl.h"
-#include "../blockd.h"
-
-#define DIFF 0x0001
-#define PREDICTOR 0x0002
-#define QCOEFF 0x0004
-#define DQCOEFF 0x0008
-#define EOBS 0x0010
-#define DEQUANT 0x0020
-#define PRE_BUF 0x0040
-#define DST_BUF 0x0080
-    
-#define BLOCK_COPY_ALL 0xffff
-
-/*
-#define BLOCK_MEM_SIZE 6
-enum {
-    DIFF_MEM = 0,
-    PRED_MEM = 1,
-    QCOEFF_MEM = 2,
-    DQCOEFF_MEM = 3,
-    EOBS_MEM = 4,
-    DEQUANT_MEM = 5
-} BLOCK_MEM_TYPES;
-
-
-struct cl_block_mem{
-    cl_mem gpu_mem;
-    size_t size;
-    void *host_mem;
-};
-
-typedef struct cl_block_mem block_mem;
-*/
-    
-extern int vp8_cl_block_finish(BLOCKD *b, int flags);
-extern int vp8_cl_block_prep(BLOCKD *b, int flags);
-
-extern int vp8_cl_mb_prep(MACROBLOCKD *x, int flags);
-extern int vp8_cl_mb_finish(MACROBLOCKD *x, int flags);
-
-#ifdef	__cplusplus
-}
-#endif
-
-#endif
--- a/vp8/common/opencl/dynamic_cl.c
+++ b/vp8/common/opencl/dynamic_cl.c
@@ -1,106 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-#include "vp8_opencl.h"
-
-#include <stdio.h>
-
-CL_FUNCTIONS cl;
-void *dll = NULL;
-int cl_loaded = VP8_CL_NOT_INITIALIZED;
-
-int close_cl(){
-    int ret = dlclose(dll);
-
-    if (ret != 0)
-        fprintf(stderr, "Error closing OpenCL library: %s", dlerror());
-
-    return ret;
-}
-
-int load_cl(char *lib_name){
-
-    //printf("Loading OpenCL library\n");
-    dll = dlopen(lib_name, RTLD_NOW|RTLD_LOCAL);
-    if (dll != NULL){
-        //printf("Found CL library\n");
-    } else {
-        //printf("Didn't find CL library\n");
-        return VP8_CL_TRIED_BUT_FAILED;
-    }
-
-    CL_LOAD_FN("clGetPlatformIDs", cl.getPlatformIDs);
-    CL_LOAD_FN("clGetPlatformInfo", cl.getPlatformInfo);
-    CL_LOAD_FN("clGetDeviceIDs", cl.getDeviceIDs);
-    CL_LOAD_FN("clGetDeviceInfo", cl.getDeviceInfo);
-    CL_LOAD_FN("clCreateContext", cl.createContext);
-//    CL_LOAD_FN("clCreateContextFromType", cl.createContextFromType);
-//    CL_LOAD_FN("clRetainContext", cl.retainContext);
-    CL_LOAD_FN("clReleaseContext", cl.releaseContext);
-//    CL_LOAD_FN("clGetContextInfo", cl.getContextInfo);
-    CL_LOAD_FN("clCreateCommandQueue", cl.createCommandQueue);
-//    CL_LOAD_FN("clRetainCommandQueue", cl.retainCommandQueue);
-    CL_LOAD_FN("clReleaseCommandQueue", cl.releaseCommandQueue);
-//    CL_LOAD_FN("clGetCommandQueueInfo", cl.getCommandQueue);
-    CL_LOAD_FN("clCreateBuffer", cl.createBuffer);
-//    CL_LOAD_FN("clCreateImage2D", cl.createImage2D);
-//    CL_LOAD_FN("clCreateImage3D", cl.createImage3D);
-//    CL_LOAD_FN("clRetainMemObject", cl.retainMemObject);
-    CL_LOAD_FN("clReleaseMemObject", cl.releaseMemObject);
-//    CL_LOAD_FN("clGetSupportedImageFormats", cl.getSupportedImageFormats);
-//    CL_LOAD_FN("clGetMemObjectInfo", cl.getMemObjectInfo);
-//    CL_LOAD_FN("clGetImageInfo", cl.getImageInfo);
-//    CL_LOAD_FN("clCreateSampler", cl.createSampler);
-//    CL_LOAD_FN("clRetainSampler", cl.retainSampler);
-//    CL_LOAD_FN("clReleaseSampler", cl.releaseSampler);
-//    CL_LOAD_FN("clGetSamplerInfo", cl.getSamplerInfo);
-    CL_LOAD_FN("clCreateProgramWithSource", cl.createProgramWithSource);
-//    CL_LOAD_FN("clCreateProgramWithBinary", cl.createProgramWithBinary);
-//    CL_LOAD_FN("clRetainProgram", cl.retainProgram);
-    CL_LOAD_FN("clReleaseProgram", cl.releaseProgram);
-    CL_LOAD_FN("clBuildProgram", cl.buildProgram);
-//    CL_LOAD_FN("clUnloadCompiler", cl.unloadCompiler);
-    CL_LOAD_FN("clGetProgramInfo", cl.getProgramInfo);
-    CL_LOAD_FN("clGetProgramBuildInfo", cl.getProgramBuildInfo);
-    CL_LOAD_FN("clCreateKernel", cl.createKernel);
-//    CL_LOAD_FN("clCreateKernelsInProgram", cl.createKernelsInProgram);
-//    CL_LOAD_FN("clRetainKernel", cl.retainKernel);
-    CL_LOAD_FN("clReleaseKernel", cl.releaseKernel);
-    CL_LOAD_FN("clSetKernelArg", cl.setKernelArg);
-//    CL_LOAD_FN("clGetKernelInfo", cl.getKernelInfo);
-    CL_LOAD_FN("clGetKernelWorkGroupInfo", cl.getKernelWorkGroupInfo);
-//    CL_LOAD_FN("clWaitForEvents", cl.waitForEvents);
-//    CL_LOAD_FN("clGetEventInfo", cl.getEventInfo);
-//    CL_LOAD_FN("clRetainEvent", cl.retainEvent);
-//    CL_LOAD_FN("clReleaseEvent", cl.releaseEvent);
-//    CL_LOAD_FN("clGetEventProfilingInfo", cl.getEventProfilingInfo);
-    CL_LOAD_FN("clFlush", cl.flush);
-    CL_LOAD_FN("clFinish", cl.finish);
-    CL_LOAD_FN("clEnqueueReadBuffer", cl.enqueueReadBuffer);
-    CL_LOAD_FN("clEnqueueWriteBuffer", cl.enqueueWriteBuffer);
-    CL_LOAD_FN("clEnqueueCopyBuffer", cl.enqueueCopyBuffer);
-//    CL_LOAD_FN("clEnqueueReadImage", cl.enqueueReadImage);
-//    CL_LOAD_FN("clEnqueueWriteImage", cl.enqueueWriteImage);
-//    CL_LOAD_FN("clEnqueueCopyImage", cl.enqueueCopyImage);
-//    CL_LOAD_FN("clEnqueueCopyImageToBuffer", cl.enqueueCopyImageToBuffer);
-//    CL_LOAD_FN("clEnqueueCopyBufferToImage", cl.enqueueCopyBufferToImage);
-//    CL_LOAD_FN("clEnqueueMapBuffer", cl.enqueueMapBuffer);
-//    CL_LOAD_FN("clEnqueueMapImage", cl.enqueueMapImage);
-//    CL_LOAD_FN("clEnqueueUnmapMemObject", cl.enqueueUnmapMemObject);
-    CL_LOAD_FN("clEnqueueNDRangeKernel", cl.enqueueNDRAngeKernel);
-//    CL_LOAD_FN("clEnqueueTask", cl.enqueueTask);
-//    CL_LOAD_FN("clEnqueueNativeKernel", cl.enqueueNativeKernel);
-//    CL_LOAD_FN("clEnqueueMarker", cl.enqueueMarker);
-//    CL_LOAD_FN("clEnqueueWaitForEvents", cl.enqueueWaitForEvents);
-    CL_LOAD_FN("clEnqueueBarrier", cl.enqueueBarrier);
-//    CL_LOAD_FN("clGetExtensionFunctionAddress", cl.getExtensionFunctionAddress);
-
-    return CL_SUCCESS;
-}
--- a/vp8/common/opencl/dynamic_cl.h
+++ b/vp8/common/opencl/dynamic_cl.h
@@ -1,253 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-#ifndef DYNAMIC_CL_H
-#define	DYNAMIC_CL_H
-
-#ifdef	__cplusplus
-extern "C" {
-#endif
-
-#ifdef __APPLE__
-#include <OpenCL/cl.h>
-#else
-#include <CL/cl.h>
-#endif
-    
-#include <dlfcn.h>
-
-int load_cl(char *lib_name);
-int close_cl();
-
-extern int cl_loaded;
-
-typedef cl_int(*fn_clGetPlatformIDs_t)(cl_uint, cl_platform_id *, cl_uint *);
-typedef cl_int(*fn_clGetPlatformInfo_t)(cl_platform_id, cl_platform_info, size_t, void *, size_t *);
-typedef cl_int(*fn_clGetDeviceIDs_t)(cl_platform_id, cl_device_type, cl_uint, cl_device_id *, cl_uint *);
-typedef cl_int(*fn_clGetDeviceInfo_t)(cl_device_id, cl_device_info, size_t, void *, size_t *);
-typedef cl_context(*fn_clCreateContext_t)(const cl_context_properties *, cl_uint, const cl_device_id *, void (*pfn_notify)(const char *, const void *, size_t, void *), void *, cl_int *);
-typedef cl_context(*fn_clCreateContextFromType_t)(const cl_context_properties *, cl_device_type, void (*pfn_notify)(const char *, const void *, size_t, void *), void *, cl_int *);
-typedef cl_int(*fn_clRetainContext_t)(cl_context);
-typedef cl_int(*fn_clReleaseContext_t)(cl_context);
-typedef cl_int(*fn_clGetContextInfo_t)(cl_context, cl_context_info, size_t, void *, size_t *);
-typedef cl_command_queue(*fn_clCreateCommandQueue_t)(cl_context, cl_device_id, cl_command_queue_properties, cl_int *);
-typedef cl_int(*fn_clRetainCommandQueue_t)(cl_command_queue);
-typedef cl_int(*fn_clReleaseCommandQueue_t)(cl_command_queue);
-typedef cl_int(*fn_clGetCommandQueueInfo_t)(cl_command_queue, cl_command_queue_info, size_t, void *, size_t *);
-typedef cl_mem(*fn_clCreateBuffer_t)(cl_context, cl_mem_flags, size_t, void *, cl_int *);
-typedef cl_mem(*fn_clCreateImage2D_t)(cl_context, cl_mem_flags, const cl_image_format *, size_t, size_t, size_t, void *, cl_int *);
-typedef cl_mem(*fn_clCreateImage3D_t)(cl_context, cl_mem_flags, const cl_image_format *, size_t, size_t, size_t, size_t, size_t, void *, cl_int *);
-typedef cl_int(*fn_clRetainMemObject_t)(cl_mem);
-typedef cl_int(*fn_clReleaseMemObject_t)(cl_mem);
-typedef cl_int(*fn_clGetSupportedImageFormats_t)(cl_context, cl_mem_flags, cl_mem_object_type, cl_uint, cl_image_format *, cl_uint *);
-typedef cl_int(*fn_clGetMemObjectInfo_t)(cl_mem, cl_mem_info, size_t, void *, size_t *);
-typedef cl_int(*fn_clGetImageInfo_t)(cl_mem, cl_image_info, size_t, void *, size_t *);
-typedef cl_sampler(*fn_clCreateSampler_t)(cl_context, cl_bool, cl_addressing_mode, cl_filter_mode, cl_int *);
-typedef cl_int(*fn_clRetainSampler_t)(cl_sampler);
-typedef cl_int(*fn_clReleaseSampler_t)(cl_sampler);
-typedef cl_int(*fn_clGetSamplerInfo_t)(cl_sampler, cl_sampler_info, size_t, void *, size_t *);
-typedef cl_program(*fn_clCreateProgramWithSource_t)(cl_context, cl_uint, const char **, const size_t *, cl_int *);
-typedef cl_program(*fn_clCreateProgramWithBinary_t)(cl_context, cl_uint, const cl_device_id *, const size_t *, const unsigned char **, cl_int *, cl_int *);
-typedef cl_int(*fn_clRetainProgram_t)(cl_program);
-typedef cl_int(*fn_clReleaseProgram_t)(cl_program);
-typedef cl_int(*fn_clBuildProgram_t)(cl_program, cl_uint, const cl_device_id *, const char *,  void (*pfn_notify)(cl_program,void*), void *);
-typedef cl_int(*fn_clUnloadCompiler_t)(void);
-typedef cl_int(*fn_clGetProgramInfo_t)(cl_program, cl_program_info, size_t, void *, size_t *);
-typedef cl_int(*fn_clGetProgramBuildInfo_t)(cl_program, cl_device_id, cl_program_build_info, size_t, void *, size_t *);
-typedef cl_kernel(*fn_clCreateKernel_t)(cl_program, const char *, cl_int *);
-typedef cl_int(*fn_clCreateKernelsInProgram_t)(cl_program, cl_uint, cl_kernel *, cl_uint *);
-typedef cl_int(*fn_clRetainKernel_t)(cl_kernel);
-typedef cl_int(*fn_clReleaseKernel_t)(cl_kernel);
-typedef cl_int(*fn_clSetKernelArg_t)(cl_kernel, cl_uint, size_t, const void *);
-typedef cl_int(*fn_clGetKernelInfo_t)(cl_kernel, cl_kernel_info, size_t, void *, size_t *);
-typedef cl_int(*fn_clGetKernelWorkGroupInfo_t)(cl_kernel, cl_device_id, cl_kernel_work_group_info, size_t, void *, size_t *);
-typedef cl_int(*fn_clWaitForEvents_t)(cl_uint, const cl_event *);
-typedef cl_int(*fn_clGetEventInfo_t)(cl_event, cl_event_info, size_t, void *, size_t *);
-typedef cl_int(*fn_clRetainEvent_t)(cl_event);
-typedef cl_int(*fn_clReleaseEvent_t)(cl_event);
-typedef cl_int(*fn_clGetEventProfilingInfo_t)(cl_event, cl_profiling_info, size_t, void *, size_t *);
-typedef cl_int(*fn_clFlush_t)(cl_command_queue);
-typedef cl_int(*fn_clFinish_t)(cl_command_queue);
-typedef cl_int(*fn_clEnqueueReadBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void *, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueWriteBuffer_t)(cl_command_queue,  cl_mem,  cl_bool,  size_t,  size_t,  const void *,  cl_uint,  const cl_event *,  cl_event *);
-typedef cl_int(*fn_clEnqueueCopyBuffer_t)(cl_command_queue,  cl_mem, cl_mem, size_t, size_t, size_t, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueReadImage_t)(cl_command_queue, cl_mem, cl_bool, const size_t *, const size_t *, size_t, size_t, void *, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueWriteImage_t)(cl_command_queue, cl_mem, cl_bool, const size_t *, const size_t *, size_t, size_t, const void *, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueCopyImage_t)(cl_command_queue, cl_mem, cl_mem, const size_t *, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueCopyImageToBuffer_t)(cl_command_queue, cl_mem, cl_mem, const size_t *, const size_t *, size_t, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueCopyBufferToImage_t)(cl_command_queue, cl_mem, cl_mem, size_t, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
-typedef void*(*fn_clEnqueueMapBuffer_t)(cl_command_queue, cl_mem, cl_bool, cl_map_flags, size_t, size_t, cl_uint, const cl_event *, cl_event *, cl_int *);
-typedef void*(*fn_clEnqueueMapImage_t)(cl_command_queue, cl_mem, cl_bool, cl_map_flags, const size_t *, const size_t *, size_t *, size_t *, cl_uint, const cl_event *, cl_event *, cl_int *);
-typedef cl_int(*fn_clEnqueueUnmapMemObject_t)(cl_command_queue, cl_mem, void *, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueNDRangeKernel_t)(cl_command_queue, cl_kernel, cl_uint, const size_t *, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueTask_t)(cl_command_queue, cl_kernel, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueNativeKernel_t)(cl_command_queue,					 void (*user_func)(void *), void *, size_t, cl_uint, const cl_mem *, const void **, cl_uint, const cl_event *, cl_event *);
-typedef cl_int(*fn_clEnqueueMarker_t)(cl_command_queue, cl_event *);
-typedef cl_int(*fn_clEnqueueWaitForEvents_t)(cl_command_queue, cl_uint, const cl_event *);
-typedef cl_int(*fn_clEnqueueBarrier_t)(cl_command_queue);
-typedef void*(*fn_clGetExtensionFunctionAddress_t)(const char *);
-
-typedef struct CL_FUNCTIONS {
-    fn_clGetPlatformIDs_t getPlatformIDs;
-    fn_clGetPlatformInfo_t getPlatformInfo;
-    fn_clGetDeviceIDs_t getDeviceIDs;
-    fn_clGetDeviceInfo_t getDeviceInfo;
-    fn_clCreateContext_t createContext;
-    fn_clCreateContextFromType_t createContextFromType;
-    fn_clRetainContext_t retainContext;
-    fn_clReleaseContext_t releaseContext;
-    fn_clGetContextInfo_t getContextInfo;
-    fn_clCreateCommandQueue_t createCommandQueue;
-    fn_clRetainCommandQueue_t retainCommandQueue;
-    fn_clReleaseCommandQueue_t releaseCommandQueue;
-    fn_clGetCommandQueueInfo_t getCommandQueue;
-    fn_clCreateBuffer_t createBuffer;
-    fn_clCreateImage2D_t createImage2D;
-    fn_clCreateImage3D_t createImage3D;
-    fn_clRetainMemObject_t retainMemObject;
-    fn_clReleaseMemObject_t releaseMemObject;
-    fn_clGetSupportedImageFormats_t getSupportedImageFormats;
-    fn_clGetMemObjectInfo_t getMemObjectInfo;
-    fn_clGetImageInfo_t getImageInfo;
-    fn_clCreateSampler_t createSampler;
-    fn_clRetainSampler_t retainSampler;
-    fn_clReleaseSampler_t releaseSampler;
-    fn_clGetSamplerInfo_t getSamplerInfo;
-    fn_clCreateProgramWithSource_t createProgramWithSource;
-    fn_clCreateProgramWithBinary_t createProgramWithBinary;
-    fn_clRetainProgram_t retainProgram;
-    fn_clReleaseProgram_t releaseProgram;
-    fn_clBuildProgram_t buildProgram;
-    fn_clUnloadCompiler_t unloadCompiler;
-    fn_clGetProgramInfo_t getProgramInfo;
-    fn_clGetProgramBuildInfo_t getProgramBuildInfo;
-    fn_clCreateKernel_t createKernel;
-    fn_clCreateKernelsInProgram_t createKernelsInProgram;
-    fn_clRetainKernel_t retainKernel;
-    fn_clReleaseKernel_t releaseKernel;
-    fn_clSetKernelArg_t setKernelArg;
-    fn_clGetKernelInfo_t getKernelInfo;
-    fn_clGetKernelWorkGroupInfo_t getKernelWorkGroupInfo;
-    fn_clWaitForEvents_t waitForEvents;
-    fn_clGetEventInfo_t getEventInfo;
-    fn_clRetainEvent_t retainEvent;
-    fn_clReleaseEvent_t releaseEvent;
-    fn_clGetEventProfilingInfo_t getEventProfilingInfo;
-    fn_clFlush_t flush;
-    fn_clFinish_t finish;
-    fn_clEnqueueReadBuffer_t enqueueReadBuffer;
-    fn_clEnqueueWriteBuffer_t enqueueWriteBuffer;
-    fn_clEnqueueCopyBuffer_t enqueueCopyBuffer;
-    fn_clEnqueueReadImage_t enqueueReadImage;
-    fn_clEnqueueWriteImage_t enqueueWriteImage;
-    fn_clEnqueueCopyImage_t enqueueCopyImage;
-    fn_clEnqueueCopyImageToBuffer_t enqueueCopyImageToBuffer;
-    fn_clEnqueueCopyBufferToImage_t enqueueCopyBufferToImage;
-    fn_clEnqueueMapBuffer_t enqueueMapBuffer;
-    fn_clEnqueueMapImage_t enqueueMapImage;
-    fn_clEnqueueUnmapMemObject_t enqueueUnmapMemObject;
-    fn_clEnqueueNDRangeKernel_t enqueueNDRAngeKernel;
-    fn_clEnqueueTask_t enqueueTask;
-    fn_clEnqueueNativeKernel_t enqueueNativeKernel;
-    fn_clEnqueueMarker_t enqueueMarker;
-    fn_clEnqueueWaitForEvents_t enqueueWaitForEvents;
-    fn_clEnqueueBarrier_t enqueueBarrier;
-    fn_clGetExtensionFunctionAddress_t getExtensionFunctionAddress;
-} CL_FUNCTIONS;
-
-extern CL_FUNCTIONS cl;
-
-#define clGetPlatformIDs cl.getPlatformIDs
-#define clGetPlatformInfo cl.getPlatformInfo
-#define clGetDeviceIDs cl.getDeviceIDs
-#define clGetDeviceInfo cl.getDeviceInfo
-#define clCreateContext cl.createContext
-#define clCreateContextFromType cl.createContextFromType
-#define clRetainContext cl.retainContext
-#define clReleaseContext cl.releaseContext
-#define clGetContextInfo cl.getContextInfo
-#define clCreateCommandQueue cl.createCommandQueue
-#define clRetainCommandQueue cl.retainCommandQueue
-#define clReleaseCommandQueue cl.releaseCommandQueue
-#define clGetCommandQueueInfo cl.getCommandQueue
-#define clCreateBuffer cl.createBuffer
-#define clCreateSubBuffer cl.createSubBuffer
-#define clCreateImage2D cl.createImage2D
-#define clCreateImage3D cl.createImage3D
-#define clRetainMemObject cl.retainMemObject
-#define clReleaseMemObject cl.releaseMemObject
-#define clGetSupportedImageFormats cl.getSupportedImageFormats
-#define clGetMemObjectInfo cl.getMemObjectInfo
-#define clGetImageInfo cl.getImageInfo
-#define clSetMemObjectDestructorCallback cl.setMemObjectDestructorCallback
-#define clCreateSampler cl.createSampler
-#define clRetainSampler cl.retainSampler
-#define clReleaseSampler cl.releaseSampler
-#define clGetSamplerInfo cl.getSamplerInfo
-#define clCreateProgramWithSource cl.createProgramWithSource
-#define clCreateProgramWithBinary cl.createProgramWithBinary
-#define clRetainProgram cl.retainProgram
-#define clReleaseProgram cl.releaseProgram
-#define clBuildProgram cl.buildProgram
-#define clUnloadCompiler cl.unloadCompiler
-#define clGetProgramInfo cl.getProgramInfo
-#define clGetProgramBuildInfo cl.getProgramBuildInfo
-#define clCreateKernel cl.createKernel
-#define clCreateKernelsInProgram cl.createKernelsInProgram
-#define clRetainKernel cl.retainKernel
-#define clReleaseKernel cl.releaseKernel
-#define clSetKernelArg cl.setKernelArg
-#define clGetKernelInfo cl.getKernelInfo
-#define clGetKernelWorkGroupInfo cl.getKernelWorkGroupInfo
-#define clWaitForEvents cl.waitForEvents
-#define clGetEventInfo cl.getEventInfo
-#define clCreateUserEvent cl.createUserEvent
-#define clRetainEvent cl.retainEvent
-#define clReleaseEvent cl.releaseEvent
-#define clSetUserEventStatus cl.setUserEventStatus
-#define clSetEventCallback cl.setEventCallback
-#define clGetEventProfilingInfo cl.getEventProfilingInfo
-#define clFlush cl.flush
-#define clFinish cl.finish
-#define clEnqueueReadBuffer cl.enqueueReadBuffer
-#define clEnqueueReadBufferRect cl.enqueueReadBufferRect
-#define clEnqueueWriteBuffer cl.enqueueWriteBuffer
-#define clEnqueueWriteBufferRect cl.enqueueWriteBufferRect
-#define clEnqueueCopyBuffer cl.enqueueCopyBuffer
-#define clEnqueueCopyBufferRect cl.enqueueCopyBufferRect
-#define clEnqueueReadImage cl.enqueueReadImage
-#define clEnqueueWriteImage cl.enqueueWriteImage
-#define clEnqueueCopyImage cl.enqueueCopyImage
-#define clEnqueueCopyImageToBuffer cl.enqueueCopyImageToBuffer
-#define clEnqueueCopyBufferToImage cl.enqueueCopyBufferToImage
-#define clEnqueueMapBuffer cl.enqueueMapBuffer
-#define clEnqueueMapImage cl.enqueueMapImage
-#define clEnqueueUnmapMemObject cl.enqueueUnmapMemObject
-#define clEnqueueNDRangeKernel cl.enqueueNDRAngeKernel
-#define clEnqueueTask cl.enqueueTask
-#define clEnqueueNativeKernel cl.enqueueNativeKernel
-#define clEnqueueMarker cl.enqueueMarker
-#define clEnqueueWaitForEvents cl.enqueueWaitForEvents
-#define clEnqueueBarrier cl.enqueueBarrier
-#define clGetExtensionFunctionAddress cl.getExtensionFunctionAddress
-
-#define CL_LOAD_FN(name, ref) \
-    ref = dlsym(dll,name); \
-    if (ref == NULL){ \
-        dlclose(dll); \
-        return CL_INVALID_PLATFORM; \
-    }
-
-
-#ifdef	__cplusplus
-}
-#endif
-
-#endif	/* DYNAMIC_CL_H */
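Review note on the removed CL_LOAD_FN macro: it expands to an assignment followed by a separate `if` statement, so it is not a single statement, and under an unbraced `if` only the assignment would be guarded. On failure it also calls `dlclose(dll)` without resetting `dll`, so a later `close_cl()` may close the same handle again. A common alternative is to wrap the body in `do { ... } while (0)`. The sketch below shows that pattern under stated assumptions: every `demo_*` / `DEMO_*` name is hypothetical, and a small in-process resolver stands in for `dlsym()` so the example runs without a shared library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for dlsym(): maps a symbol name to an address. */
typedef void *(*demo_resolver)(void *handle, const char *name);

typedef int (*demo_add_t)(int, int);
typedef int (*demo_neg_t)(int);

/* Function table, in the spirit of CL_FUNCTIONS. */
typedef struct {
    demo_add_t add;
    demo_neg_t neg;
} demo_functions;

/* do/while(0) makes the macro behave as one statement. The memcpy avoids a
 * direct cast between object and function pointers; it assumes both have
 * the same size, as POSIX requires for dlsym() to be usable. */
#define DEMO_LOAD_FN(resolve, handle, name, ref)        \
    do {                                                \
        void *sym_ = (resolve)((handle), (name));       \
        if (sym_ == NULL)                               \
            return -1;                                  \
        memcpy(&(ref), &sym_, sizeof(ref));             \
    } while (0)

static int demo_add(int a, int b) { return a + b; }
static int demo_neg(int a) { return -a; }

/* Resolver that knows both demo symbols. */
static void *demo_resolve(void *handle, const char *name)
{
    void *out = NULL;

    (void)handle;
    if (strcmp(name, "demo_add") == 0) {
        demo_add_t f = demo_add;
        memcpy(&out, &f, sizeof(out));
    } else if (strcmp(name, "demo_neg") == 0) {
        demo_neg_t f = demo_neg;
        memcpy(&out, &f, sizeof(out));
    }
    return out;
}

/* Resolver that simulates a library missing one symbol. */
static void *demo_resolve_add_only(void *handle, const char *name)
{
    return strcmp(name, "demo_add") == 0 ? demo_resolve(handle, name) : NULL;
}

/* Fills the table; returns 0 on success, -1 at the first missing symbol. */
static int demo_load(demo_functions *t, demo_resolver resolve, void *handle)
{
    DEMO_LOAD_FN(resolve, handle, "demo_add", t->add);
    DEMO_LOAD_FN(resolve, handle, "demo_neg", t->neg);
    return 0;
}
```

A real loader would additionally close the library handle on the failure path and set it to NULL, so that a later close routine can tell the handle is already gone.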
--- a/vp8/common/opencl/filter_cl.c
+++ b/vp8/common/opencl/filter_cl.c
@@ -1,824 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#include <stdlib.h>
-
-//ACW: Remove me after debugging.
-#include <stdio.h>
-#include <string.h>
-
-#include "vp8_opencl.h"
-#include "filter_cl.h"
-#include "../blockd.h"
-
-#define SIXTAP_FILTER_LEN 6
-
-const char *filterCompileOptions = "-Ivp8/common/opencl -DVP8_FILTER_WEIGHT=128 -DVP8_FILTER_SHIFT=7 -DFILTER_OFFSET";
-const char *filter_cl_file_name = "vp8/common/opencl/filter_cl.cl";
-
-#define STATIC_MEM 1
-#if STATIC_MEM
-static cl_mem int_mem = NULL;
-#endif
-
-void cl_destroy_filter(){
-
-    if (cl_data.filter_program)
-        clReleaseProgram(cl_data.filter_program);
-
-    //VP8_CL_RELEASE_KERNEL(cl_data.vp8_block_variation_kernel);
-#if !TWO_PASS_SIXTAP
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict8x8_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict8x4_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict16x16_kernel);
-#else
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_first_pass_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_second_pass_kernel);
-#endif
-    //VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict4x4_kernel);
-    //VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict8x4_kernel);
-    //VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict8x8_kernel);
-    //VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict16x16_kernel);
-
-#if MEM_COPY_KERNEL
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_memcpy_kernel);
-#endif
-
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_bil_first_pass_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_bil_second_pass_kernel);
-
-#if STATIC_MEM
-    if (int_mem != NULL)
-        clReleaseMemObject(int_mem);
-    int_mem = NULL;
-#endif
-
-    cl_data.filter_program = NULL;
-}
-
-int cl_init_filter() {
-    int err;
-
-
-    // Create the filter compute program from the file-defined source code
-    if ( cl_load_program(&cl_data.filter_program, filter_cl_file_name,
-            filterCompileOptions) != CL_SUCCESS )
-        return VP8_CL_TRIED_BUT_FAILED;
-
-    // Create the compute kernel in the program we wish to run
-#if TWO_PASS_SIXTAP
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_first_pass_kernel,"vp8_filter_block2d_first_pass_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_second_pass_kernel,"vp8_filter_block2d_second_pass_kernel");
-    VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_first_pass_kernel,vp8_filter_block2d_first_pass_kernel_size);
-    VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_second_pass_kernel,vp8_filter_block2d_second_pass_kernel_size);
-#else
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict_kernel,"vp8_sixtap_predict_kernel");
-    VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict_kernel,vp8_sixtap_predict_kernel_size);
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict8x8_kernel,"vp8_sixtap_predict8x8_kernel");
-    VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict8x8_kernel,vp8_sixtap_predict8x8_kernel_size);
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict8x4_kernel,"vp8_sixtap_predict8x4_kernel");
-    VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict8x4_kernel,vp8_sixtap_predict8x4_kernel_size);
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict16x16_kernel,"vp8_sixtap_predict16x16_kernel");
-    VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict16x16_kernel,vp8_sixtap_predict16x16_kernel_size);
-#endif
-    
-    //VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_bil_first_pass_kernel,vp8_filter_block2d_bil_first_pass_kernel_size);
-    //VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_bil_second_pass_kernel,vp8_filter_block2d_bil_second_pass_kernel_size);
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_bil_first_pass_kernel,"vp8_filter_block2d_bil_first_pass_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_bil_second_pass_kernel,"vp8_filter_block2d_bil_second_pass_kernel");
-
-
-    //VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict4x4_kernel,"vp8_bilinear_predict4x4_kernel");
-    //VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict8x4_kernel,"vp8_bilinear_predict8x4_kernel");
-    //VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict8x8_kernel,"vp8_bilinear_predict8x8_kernel");
-    //VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict16x16_kernel,"vp8_bilinear_predict16x16_kernel");
-
-#if MEM_COPY_KERNEL
-    VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_memcpy_kernel,"vp8_memcpy_kernel");
-    VP8_CL_CALC_LOCAL_SIZE(vp8_memcpy_kernel,vp8_memcpy_kernel_size);
-#endif
-
-#if STATIC_MEM
-    VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,err);
-#endif
-
-    return CL_SUCCESS;
-}
-
-void vp8_filter_block2d_first_pass_cl(
-    cl_command_queue cq,
-    cl_mem src_mem,
-    int src_offset,
-    cl_mem int_mem,
-    unsigned int src_pixels_per_line,
-    unsigned int int_height,
-    unsigned int int_width,
-    int xoffset
-){
-    int err;
-    size_t global = int_width*int_height;
-    size_t local = cl_data.vp8_filter_block2d_first_pass_kernel_size;
-    if (local > global)
-        local = global;
-
-    err =  clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 0, sizeof (cl_mem), &src_mem);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 1, sizeof (int), &src_offset);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 2, sizeof (cl_mem), &int_mem);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 3, sizeof (cl_uint), &src_pixels_per_line);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 4, sizeof (cl_uint), &int_height);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 5, sizeof (cl_int), &int_width);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 6, sizeof (int), &xoffset);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        ,
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_first_pass_kernel, 1, NULL, &global, &local , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);,
-    );
-}
-
-void vp8_filter_block2d_second_pass_cl(
-    cl_command_queue cq,
-    cl_mem int_mem,
-    int int_offset,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch,
-    unsigned int output_height,
-    unsigned int output_width,
-    int yoffset
-){
-    int err;
-    size_t global = output_width*output_height;
-    size_t local = cl_data.vp8_filter_block2d_second_pass_kernel_size;
-    if (local > global){
-        //printf("Local is now %ld\n",global);
-        local = global;
-    }
-
-    /* Set kernel arguments */
-    err =  clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 0, sizeof (cl_mem), &int_mem);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 1, sizeof (int), &int_offset);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 2, sizeof (cl_mem), &dst_mem);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 3, sizeof (int), &dst_offset);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 4, sizeof (int), &dst_pitch);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 5, sizeof (int), &output_width);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 6, sizeof (int), &output_width);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 7, sizeof (int), &output_height);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 8, sizeof (int), &output_width);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 9, sizeof (int), &yoffset);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        ,
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_second_pass_kernel, 1, NULL, &global, &local , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);,
-    );
-}
-
-void vp8_sixtap_single_pass(
-    cl_command_queue cq,
-    cl_kernel kernel,
-    size_t local,
-    size_t global,
-    cl_mem src_mem,
-    cl_mem dst_mem,
-    unsigned char *src_base,
-    int src_offset,
-    size_t src_len,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    int dst_offset,
-    int dst_pitch,
-    size_t dst_len
-){
-    int err;
-
-#if !STATIC_MEM
-    cl_mem int_mem;
-#endif
-
-    int free_src = 0, free_dst = 0;
-
-    if (local > global){
-        local = global;
-    }
-
-    /* Make space for kernel input/output data.
-     * Initialize the buffer as well if needed.
-     */
-    if (src_mem == NULL){
-        VP8_CL_CREATE_BUF( cq, src_mem,, sizeof (unsigned char) * src_len, src_base-2,,);
-        src_offset = 2;
-        free_src = 1;
-    } else {
-        src_offset -= 2*src_pixels_per_line;
-    }
-
-    if (dst_mem == NULL){
-        VP8_CL_CREATE_BUF( cq, dst_mem,, sizeof (unsigned char) * dst_len + dst_offset, dst_base,, );
-        free_dst = 1;
-    }
-
-#if !STATIC_MEM
-    CL_CREATE_BUF( cq, int_mem,, sizeof(cl_int)*FData_height*FData_width, NULL,, );
-#endif
-
-    err =  clSetKernelArg(kernel, 0, sizeof (cl_mem), &src_mem);
-    err |= clSetKernelArg(kernel, 1, sizeof (int), &src_offset);
-    err |= clSetKernelArg(kernel, 2, sizeof (cl_int), &src_pixels_per_line);
-    err |= clSetKernelArg(kernel, 3, sizeof (cl_int), &xoffset);
-    err |= clSetKernelArg(kernel, 4, sizeof (cl_int), &yoffset);
-    err |= clSetKernelArg(kernel, 5, sizeof (cl_mem), &dst_mem);
-    err |= clSetKernelArg(kernel, 6, sizeof (cl_int), &dst_offset);
-    err |= clSetKernelArg(kernel, 7, sizeof (int), &dst_pitch);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        ,
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel( cq, kernel, 1, NULL, &global, &local , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);,
-    );
-
-    if (free_src == 1)
-        clReleaseMemObject(src_mem);
-
-    if (free_dst == 1){
-        /* Read back the result data from the device */
-        err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
-        VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-            "Error: Failed to read output array!\n",
-            ,
-        );
-        clReleaseMemObject(dst_mem);
-    }
-}
-
-void vp8_sixtap_run_cl(
-    cl_command_queue cq,
-    cl_mem src_mem,
-    cl_mem dst_mem,
-    unsigned char *src_base,
-    int src_offset,
-    size_t src_len,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    int dst_offset,
-    int dst_pitch,
-    size_t dst_len,
-    unsigned int FData_height,
-    unsigned int FData_width,
-    unsigned int output_height,
-    unsigned int output_width,
-    int int_offset
-)
-{
-    int err;
-
-#if !STATIC_MEM
-    cl_mem int_mem;
-#endif
-
-    int free_src = 0, free_dst = 0;
-
-    /* Make space for kernel input/output data.
-     * Initialize the buffer as well if needed.
-     */
-    if (src_mem == NULL){
-        VP8_CL_CREATE_BUF( cq, src_mem,, sizeof (unsigned char) * src_len, src_base-2,,);
-        src_offset = 2;
-        free_src = 1;
-    } else {
-        src_offset -= 2*src_pixels_per_line;
-    }
-
-    if (dst_mem == NULL){
-        VP8_CL_CREATE_BUF( cq, dst_mem,, sizeof (unsigned char) * dst_len + dst_offset, dst_base,, );
-        free_dst = 1;
-    }
-
-#if !STATIC_MEM
-    CL_CREATE_BUF( cq, int_mem,, sizeof(cl_int)*FData_height*FData_width, NULL,, );
-#endif
-
-    vp8_filter_block2d_first_pass_cl(
-        cq, src_mem, src_offset, int_mem, src_pixels_per_line,
-        FData_height, FData_width, xoffset
-    );
-
-    vp8_filter_block2d_second_pass_cl(cq,int_mem,int_offset,dst_mem,dst_offset,dst_pitch,
-            output_height,output_width,yoffset);
-
-    if (free_src == 1)
-        clReleaseMemObject(src_mem);
-
-    if (free_dst == 1){
-        /* Read back the result data from the device */
-        err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
-        VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-            "Error: Failed to read output array!\n",
-            ,
-        );
-        clReleaseMemObject(dst_mem);
-    }
-
-#if !STATIC_MEM
-    clReleaseMemObject(int_mem);
-#endif
-}
-
-void vp8_sixtap_predict4x4_cl
-(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch
-) {
-
-    int output_width=4, output_height=4, FData_height=9, FData_width=4;
-
-    //Size of output to transfer
-    int dst_len = DST_LEN(dst_pitch,output_height,output_width);
-    int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
-
-#if TWO_PASS_SIXTAP
-    int int_offset = 8;
-    unsigned char *src_ptr = src_base + src_offset;
-
-    vp8_sixtap_run_cl(cq, src_mem, dst_mem,
-            (src_ptr-2*src_pixels_per_line),src_offset, src_len,
-            src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
-            dst_pitch,dst_len,FData_height,FData_width,output_height,
-            output_width,int_offset
-    );
-#else
-    vp8_sixtap_single_pass(
-            cq,
-            cl_data.vp8_sixtap_predict_kernel,
-            cl_data.vp8_sixtap_predict_kernel_size,
-            FData_height*FData_width,
-            src_mem,
-            dst_mem,
-            src_base,
-            src_offset,
-            src_len,
-            src_pixels_per_line,
-            xoffset,
-            yoffset,
-            dst_base,
-            dst_offset,
-            dst_pitch,
-            dst_len
-    );
-#endif
-
-
-    return;
-}
-
-void vp8_sixtap_predict8x8_cl
-(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch
-) {
-    int output_width=8, output_height=8, FData_height=13, FData_width=8;
-
-    //Size of output to transfer
-    int dst_len = DST_LEN(dst_pitch,output_height,output_width);
-    int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
-
-#if TWO_PASS_SIXTAP
-    int int_offset = 16;
-    unsigned char *src_ptr = src_base + src_offset;
-
-    vp8_sixtap_run_cl(cq, src_mem, dst_mem,
-            (src_ptr-2*src_pixels_per_line),src_offset, src_len,
-            src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
-            dst_pitch,dst_len,FData_height,FData_width,output_height,
-            output_width,int_offset
-    );
-#else
-    vp8_sixtap_single_pass(
-            cq,
-            cl_data.vp8_sixtap_predict8x8_kernel,
-            cl_data.vp8_sixtap_predict8x8_kernel_size,
-            FData_height*FData_width,
-            src_mem,
-            dst_mem,
-            src_base,
-            src_offset,
-            src_len,
-            src_pixels_per_line,
-            xoffset,
-            yoffset,
-            dst_base,
-            dst_offset,
-            dst_pitch,
-            dst_len
-    );
-#endif
-
-    return;
-}
-
-void vp8_sixtap_predict8x4_cl
-(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch
-) {
-
-    int output_width=8, output_height=4, FData_height=9, FData_width=8;
-
-    //Size of output to transfer
-    int dst_len = DST_LEN(dst_pitch,output_height,output_width);
-    int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
-
-#if TWO_PASS_SIXTAP
-    int int_offset = 16;
-    unsigned char *src_ptr = src_base + src_offset;
-    
-    vp8_sixtap_run_cl(cq, src_mem, dst_mem,
-            (src_ptr-2*src_pixels_per_line),src_offset, src_len,
-            src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
-            dst_pitch,dst_len,FData_height,FData_width,output_height,
-            output_width,int_offset
-    );
-#else
-    vp8_sixtap_single_pass(
-            cq,
-            cl_data.vp8_sixtap_predict8x4_kernel,
-            cl_data.vp8_sixtap_predict8x4_kernel_size,
-            FData_height*FData_width,
-            src_mem,
-            dst_mem,
-            src_base,
-            src_offset,
-            src_len,
-            src_pixels_per_line,
-            xoffset,
-            yoffset,
-            dst_base,
-            dst_offset,
-            dst_pitch,
-            dst_len
-    );
-#endif
-
-    return;
-}
-
-void vp8_sixtap_predict16x16_cl
-(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch
-) {
-
-    int output_width=16, output_height=16, FData_height=21, FData_width=16;
-
-    //Size of output to transfer
-    int dst_len = DST_LEN(dst_pitch,output_height,output_width);
-    int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
-
-#if TWO_PASS_SIXTAP
-    int int_offset = 32;
-    unsigned char *src_ptr = src_base + src_offset;
-
-    vp8_sixtap_run_cl(cq, src_mem, dst_mem,
-            (src_ptr-2*src_pixels_per_line),src_offset, src_len,
-            src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
-            dst_pitch,dst_len,FData_height,FData_width,output_height,
-            output_width,int_offset
-    );
-#else
-    vp8_sixtap_single_pass(
-            cq,
-            cl_data.vp8_sixtap_predict16x16_kernel,
-            cl_data.vp8_sixtap_predict16x16_kernel_size,
-            FData_height*FData_width,
-            src_mem,
-            dst_mem,
-            src_base,
-            src_offset,
-            src_len,
-            src_pixels_per_line,
-            xoffset,
-            yoffset,
-            dst_base,
-            dst_offset,
-            dst_pitch,
-            dst_len
-    );
-#endif
-
-    return;
-
-}
-
-
-
-void vp8_filter_block2d_bil_first_pass_cl(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    cl_mem int_mem,
-    int src_pixels_per_line,
-    int height,
-    int width,
-    int xoffset
-)
-{
-    int err;
-    size_t global = width*height;
-    int free_src = 0;
-
-    if (src_mem == NULL){
-        int src_len = BIL_SRC_LEN(width,height,src_pixels_per_line);
-
-        /*Make space for kernel input/output data. Initialize the buffer as well if needed. */
-        VP8_CL_CREATE_BUF(cq, src_mem, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
-            sizeof (unsigned char) * src_len, src_base+src_offset,,
-        );
-        src_offset = 0; //Set to zero as long as src_mem starts at base+offset
-        free_src = 1;
-    }
-
-    err =  clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 0, sizeof (cl_mem), &src_mem);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 1, sizeof (int), &src_offset);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 2, sizeof (cl_mem), &int_mem);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 3, sizeof (int), &src_pixels_per_line);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 4, sizeof (int), &height);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 5, sizeof (int), &width);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 6, sizeof (int), &xoffset);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        ,
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_bil_first_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);,
-    );
-
-    if (free_src == 1)
-        clReleaseMemObject(src_mem);
-}
-
-
-void vp8_filter_block2d_bil_second_pass_cl(
-    cl_command_queue cq,
-    cl_mem int_mem,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch,
-    int height,
-    int width,
-    int yoffset
-)
-{
-    int err;
-    size_t global = width*height;
-
-    //Size of output data
-    int dst_len = DST_LEN(dst_pitch,height,width);
-
-    int free_dst = 0;
-    if (dst_mem == NULL){
-        VP8_CL_CREATE_BUF(cq, dst_mem, CL_MEM_WRITE_ONLY|CL_MEM_COPY_HOST_PTR,
-            sizeof (unsigned char) * dst_len + dst_offset, dst_base,,
-        );
-        free_dst = 1;
-    }
-
-    err =  clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 0, sizeof (cl_mem), &int_mem);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 1, sizeof (cl_mem), &dst_mem);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 2, sizeof (int), &dst_offset);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 3, sizeof (int), &dst_pitch);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 4, sizeof (int), &height);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 5, sizeof (int), &width);
-    err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 6, sizeof (int), &yoffset);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        ,
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_bil_second_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);,
-    );
-
-    if (free_dst == 1){
-        /* Read back the result data from the device */
-        err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
-        VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-            "Error: Failed to read output array!\n",
-            ,
-        );
-        clReleaseMemObject(dst_mem);
-    }
-
-}
-
-void vp8_bilinear_predict4x4_cl
-(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch
-) {
-
-    const int height = 4, width = 4;
-
-#if !STATIC_MEM
-    int err;
-    cl_mem int_mem = NULL;
-    VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
-#endif
-    
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
-
-    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
-
-#if !STATIC_MEM
-    clReleaseMemObject(int_mem);
-#endif
-
-}
-
-void vp8_bilinear_predict8x8_cl
-(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch
-) {
-
-    const int height = 8, width = 8;
-
-#if !STATIC_MEM
-    int err;
-    cl_mem int_mem = NULL;
-    VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
-#endif
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
-
-    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
-
-#if !STATIC_MEM
-    clReleaseMemObject(int_mem);
-#endif
-    
-}
-
-void vp8_bilinear_predict8x4_cl
-(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch
-) {
-
-    const int height = 4, width = 8;
-
-#if !STATIC_MEM
-    int err;
-    cl_mem int_mem = NULL;
-    VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
-#endif
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
-
-    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
-
-#if !STATIC_MEM
-    clReleaseMemObject(int_mem);
-#endif
-
-}
-
-void vp8_bilinear_predict16x16_cl
-(
-    cl_command_queue cq,
-    unsigned char *src_base,
-    cl_mem src_mem,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    unsigned char *dst_base,
-    cl_mem dst_mem,
-    int dst_offset,
-    int dst_pitch
-) {
-
-    const int height = 16, width = 16;
-
-#if !STATIC_MEM
-    int err;
-    cl_mem int_mem = NULL;
-    VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
-#endif
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
-
-    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
-
-#if !STATIC_MEM
-    clReleaseMemObject(int_mem);
-#endif
-
-}
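Reviewer note (not part of the patch): the host code removed above ran six-tap prediction as two passes. A horizontal pass fills an intermediate buffer of (output_height + 5) rows, and a vertical pass starts two rows into that buffer, which is the int_offset of 2 * width passed by the callers. The CPU sketch below illustrates that data flow under stated assumptions. It assumes VP8_FILTER_SHIFT is 7 and VP8_FILTER_WEIGHT is 128 (the taps in sub_pel_filters sum to 128, but neither macro is defined in the lines shown). All sketch_* names are hypothetical and do not come from the removed files.

```c
#include <assert.h>

#define SKETCH_FILTER_SHIFT  7                           /* assumed VP8_FILTER_SHIFT */
#define SKETCH_FILTER_WEIGHT (1 << SKETCH_FILTER_SHIFT)  /* assumed VP8_FILTER_WEIGHT */

/* Six-tap coefficients as listed in sub_pel_filters, without the vector padding. */
static const short sketch_sub_pel_filters[8][6] = {
    {0,   0, 128,   0,   0, 0},
    {0,  -6, 123,  12,  -1, 0},
    {2, -11, 108,  36,  -8, 1},
    {0,  -9,  93,  50,  -6, 0},
    {3, -16,  77,  77, -16, 3},
    {0,  -6,  50,  93,  -9, 0},
    {1,  -8,  36, 108, -11, 2},
    {0,  -1,  12, 123,  -6, 0},
};

static int sketch_clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Predict a width x height block. `src` points at the block's top-left pixel;
 * the caller must guarantee 2 readable rows/columns before it and 3 after it.
 * Blocks are limited to 16x16, matching the 21*16 intermediate buffer size. */
static void sketch_sixtap_predict(const unsigned char *src, int src_stride,
                                  int xoffset, int yoffset,
                                  unsigned char *dst, int dst_pitch,
                                  int width, int height)
{
    int tmp[21 * 16];
    const short *hf, *vf;
    int r, c, k;

    assert(width > 0 && width <= 16 && height > 0 && height <= 16);
    assert(xoffset >= 0 && xoffset < 8 && yoffset >= 0 && yoffset < 8);
    hf = sketch_sub_pel_filters[xoffset];
    vf = sketch_sub_pel_filters[yoffset];

    /* First pass: horizontal filter over height + 5 rows, starting 2 rows up. */
    for (r = 0; r < height + 5; r++) {
        const unsigned char *row = src + (r - 2) * src_stride;
        for (c = 0; c < width; c++) {
            int sum = SKETCH_FILTER_WEIGHT >> 1;  /* rounding */
            for (k = 0; k < 6; k++)
                sum += row[c + k - 2] * hf[k];
            tmp[r * width + c] = sketch_clamp_u8(sum >> SKETCH_FILTER_SHIFT);
        }
    }

    /* Second pass: vertical filter; output row r is centred on tmp row r + 2. */
    for (r = 0; r < height; r++) {
        for (c = 0; c < width; c++) {
            int sum = SKETCH_FILTER_WEIGHT >> 1;  /* rounding */
            for (k = 0; k < 6; k++)
                sum += tmp[(r + k) * width + c] * vf[k];
            dst[r * dst_pitch + c] =
                (unsigned char)sketch_clamp_u8(sum >> SKETCH_FILTER_SHIFT);
        }
    }
}
```

Like the removed kernels, the sketch right-shifts a sum that can be negative before clamping, which relies on the compiler using an arithmetic shift. With xoffset and yoffset both 0, the filter reduces to a plain copy of the source block, which makes a convenient sanity check.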
--- a/vp8/common/opencl/filter_cl.cl
+++ b/vp8/common/opencl/filter_cl.cl
@@ -1,562 +0,0 @@
-#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
-#pragma OPENCL EXTENSION cl_amd_printf : enable
-
-__constant int bilinear_filters[8][2] = {
-    { 128, 0},
-    { 112, 16},
-    { 96, 32},
-    { 80, 48},
-    { 64, 64},
-    { 48, 80},
-    { 32, 96},
-    { 16, 112}
-};
-
-__constant short sub_pel_filters[8][8] = {
-    //These were originally 8x6, but are padded for vector ops
-    { 0, 0, 128, 0, 0, 0, 0, 0}, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
-    { 0, -6, 123, 12, -1, 0, 0, 0},
-    { 2, -11, 108, 36, -8, 1, 0, 0}, /* New 1/4 pel 6 tap filter */
-    { 0, -9, 93, 50, -6, 0, 0, 0},
-    { 3, -16, 77, 77, -16, 3, 0, 0}, /* New 1/2 pel 6 tap filter */
-    { 0, -6, 50, 93, -9, 0, 0, 0},
-    { 1, -8, 36, 108, -11, 2, 0, 0}, /* New 1/4 pel 6 tap filter */
-    { 0, -1, 12, 123, -6, 0, 0, 0},
-};
-
-
-kernel void vp8_filter_block2d_first_pass_kernel(
-    __global unsigned char *src_base,
-    int src_offset,
-    __global int *output_ptr,
-    unsigned int src_pixels_per_line,
-    unsigned int output_height,
-    unsigned int output_width,
-    int filter_offset
-){
-    uint tid = get_global_id(0);
-
-    global unsigned char *src_ptr = &src_base[src_offset];
-    //Note that src_offset will be reset later, which is why we use it now
-
-    int Temp;
-
-    __constant short *vp8_filter = sub_pel_filters[filter_offset];
-
-    if (tid < (output_width*output_height)){
-        src_offset = tid + (tid/output_width * (src_pixels_per_line - output_width));
-
-        Temp = (int)(src_ptr[src_offset - 2] * vp8_filter[0]) +
-           (int)(src_ptr[src_offset - 1] * vp8_filter[1]) +
-           (int)(src_ptr[src_offset]     * vp8_filter[2]) +
-           (int)(src_ptr[src_offset + 1] * vp8_filter[3]) +
-           (int)(src_ptr[src_offset + 2] * vp8_filter[4]) +
-           (int)(src_ptr[src_offset + 3] * vp8_filter[5]) +
-           (VP8_FILTER_WEIGHT >> 1);      /* Rounding */
-
-        /* Normalize back to 0-255 */
-        Temp = Temp >> VP8_FILTER_SHIFT;
-
-        if (Temp < 0)
-            Temp = 0;
-        else if ( Temp > 255 )
-            Temp = 255;
-
-        output_ptr[tid] = Temp;
-    }
-
-}
-
-kernel void vp8_filter_block2d_second_pass_kernel
-(
-    __global int *src_base,
-    int src_offset,
-    __global unsigned char *output_base,
-    int output_offset,
-    int output_pitch,
-    unsigned int src_pixels_per_line,
-    unsigned int pixel_step,
-    unsigned int output_height,
-    unsigned int output_width,
-    int filter_offset
-) {
-
-    uint i = get_global_id(0);
-
-    global int *src_ptr = &src_base[src_offset];
-    global unsigned char *output_ptr = &output_base[output_offset];
-
-    int out_offset; //Not same as output_offset...
-    int Temp;
-    int PS2 = 2*(int)pixel_step;
-    int PS3 = 3*(int)pixel_step;
-
-    unsigned int src_increment = src_pixels_per_line - output_width;
-
-    __constant short *vp8_filter = sub_pel_filters[filter_offset];
-
-    if (i < (output_width * output_height)){
-        out_offset = i/output_width;
-        src_offset = out_offset;
-
-        src_offset = i + (src_offset * src_increment);
-        out_offset = i%output_width + (out_offset * output_pitch);
-
-        /* Apply filter */
-        Temp = ((int)src_ptr[src_offset - PS2] * vp8_filter[0]) +
-           ((int)src_ptr[src_offset -(int)pixel_step] * vp8_filter[1]) +
-           ((int)src_ptr[src_offset]                  * vp8_filter[2]) +
-           ((int)src_ptr[src_offset + pixel_step]     * vp8_filter[3]) +
-           ((int)src_ptr[src_offset + PS2]       * vp8_filter[4]) +
-           ((int)src_ptr[src_offset + PS3]       * vp8_filter[5]) +
-           (VP8_FILTER_WEIGHT >> 1);   /* Rounding */
-
-        /* Normalize back to 0-255 */
-        Temp = Temp >> VP8_FILTER_SHIFT;
-        if (Temp < 0)
-            Temp = 0;
-        else if (Temp > 255)
-            Temp = 255;
-
-        output_ptr[out_offset] = (unsigned char)Temp;
-    }
-}
-
-
-kernel void vp8_filter_block2d_bil_first_pass_kernel(
-    __global unsigned char *src_base,
-    int src_offset,
-    __global int *output_ptr,
-    unsigned int src_pixels_per_line,
-    unsigned int output_height,
-    unsigned int output_width,
-    int filter_offset
-)
-{
-    uint tid = get_global_id(0);
-
-    if (tid < output_width * output_height){
-        global unsigned char *src_ptr = &src_base[src_offset];
-
-        unsigned int i, j;
-        __constant int *vp8_filter = bilinear_filters[filter_offset];
-
-        unsigned int out_row,out_offset;
-        int src_increment = src_pixels_per_line - output_width;
-
-        i = tid / output_width;
-        j = tid % output_width;
-
-        src_offset = i*(output_width+src_increment) + j;
-        out_row = output_width * i;
-
-        out_offset = out_row + j;
-
-        /* Apply bilinear filter */
-        output_ptr[out_offset] = (((int)src_ptr[src_offset]   * vp8_filter[0]) +
-                 ((int)src_ptr[src_offset+1] * vp8_filter[1]) +
-                 (VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;
-    }
-}
-
-kernel void vp8_filter_block2d_bil_second_pass_kernel
-(
-    __global int *src_ptr,
-    __global unsigned char *output_base,
-    int output_offset,
-    int output_pitch,
-    unsigned int output_height,
-    unsigned int output_width,
-    int filter_offset
-)
-{
-
-    uint tid = get_global_id(0);
-
-    if (tid < output_width * output_height){
-        global unsigned char *output_ptr = &output_base[output_offset];
-
-        unsigned int i, j;
-        int Temp;
-        __constant int *vp8_filter = bilinear_filters[filter_offset];
-
-        int out_offset;
-        int src_offset;
-
-        i = tid / output_width;
-        j = tid % output_width;
-
-        src_offset = i*(output_width) + j;
-        out_offset = i*output_pitch + j;
-
-        /* Apply filter */
-        Temp = ((int)src_ptr[src_offset]         * vp8_filter[0]) +
-               ((int)src_ptr[src_offset+output_width] * vp8_filter[1]) +
-               (VP8_FILTER_WEIGHT / 2);
-
-        output_ptr[out_offset++] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
-    }
-}
-
-
-
-
-//Called from reconinter_cl.c
-kernel void vp8_memcpy_kernel(
-    global unsigned char *src_base,
-    int src_offset,
-    int src_stride,
-    global unsigned char *dst_base,
-    int dst_offset,
-    int dst_stride,
-    int num_bytes,
-    int num_iter
-){
-
-    int i,r;
-    global unsigned char *src = &src_base[src_offset];
-    global unsigned char *dst = &dst_base[dst_offset];
-    src_offset = dst_offset = 0;
-
-    r = get_global_id(1);
-    if (r < get_global_size(1)){
-        i = get_global_id(0);
-        if (i < get_global_size(0)){
-            src_offset = r*src_stride + i;
-            dst_offset = r*dst_stride + i;
-            dst[dst_offset] = src[src_offset];
-        }
-    }
-}
-
-//Not used currently.
-void vp8_memset_short(
-    global short *mem,
-    int offset,
-    short newval,
-    unsigned int size
-)
-{
-    int tid = get_global_id(0);
-
-    if (tid < (size/2)){
-        mem[offset+tid/2] = newval;
-    }
-}
-
-
-
-__kernel void vp8_bilinear_predict4x4_kernel
-(
-        __global unsigned char *src_base,
-        int src_offset,
-        int src_pixels_per_line,
-        int xoffset,
-        int yoffset,
-        __global unsigned char *dst_base,
-        int dst_offset,
-        int dst_pitch,
-        __global int *int_mem
-)
-{
-    int Height = 4, Width = 4;
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
-
-    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
-}
-
-__kernel void vp8_bilinear_predict8x8_kernel
-(
-    __global unsigned char *src_base,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    __global unsigned char *dst_base,
-    int dst_offset,
-    int dst_pitch,
-    __global int *int_mem
-)
-{
-    int Height = 8, Width = 8;
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
-
-    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
-
-}
-
-__kernel void vp8_bilinear_predict8x4_kernel
-(
-    __global unsigned char *src_base,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    __global unsigned char *dst_base,
-    int dst_offset,
-    int dst_pitch,
-    __global int *int_mem
-)
-{
-    int Height = 4, Width = 8;
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
-
-    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
-}
-
-__kernel void vp8_bilinear_predict16x16_kernel
-(
-    __global unsigned char *src_base,
-    int src_offset,
-    int src_pixels_per_line,
-    int xoffset,
-    int yoffset,
-    __global unsigned char *dst_base,
-    int dst_offset,
-    int dst_pitch,
-    __global int *int_mem
-)
-{
-
-    int Height = 16, Width = 16;
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
-
-    /* then 1-D vertically... */
-    vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
-
-}
-
-void vp8_filter_block2d_first_pass(
-    global unsigned char *src_base,
-    int src_offset,
-    local int *output_ptr,
-    unsigned int src_pixels_per_line,
-    unsigned int pixel_step,
-    unsigned int output_height,
-    unsigned int output_width,
-    int filter_offset
-){
-    uint tid = get_global_id(0);
-    uint i = tid;
-
-    int nthreads = get_global_size(0);
-    int ngroups = nthreads / get_local_size(0);
-
-    global unsigned char *src_ptr = &src_base[src_offset];
-    //Note that src_offset will be reset later, which is why we capture it now
-
-    int Temp;
-
-    __constant short *vp8_filter = sub_pel_filters[filter_offset];
-
-    if (tid < (output_width*output_height)){
-        short filter0 = vp8_filter[0];
-        short filter1 = vp8_filter[1];
-        short filter2 = vp8_filter[2];
-        short filter3 = vp8_filter[3];
-        short filter4 = vp8_filter[4];
-        short filter5 = vp8_filter[5];
-
-        if (ngroups > 1){
-            //This is generally only true on Apple CPU-CL, which gives a group
-            //size of 1, regardless of the CPU core count.
-            for (i=0; i < output_width*output_height; i++){
-                src_offset = i + (i/output_width * (src_pixels_per_line - output_width));
-
-                Temp = (int)(src_ptr[src_offset - 2] * filter0) +
-                       (int)(src_ptr[src_offset - 1] * filter1) +
-                       (int)(src_ptr[src_offset]     * filter2) +
-                       (int)(src_ptr[src_offset + 1] * filter3) +
-                       (int)(src_ptr[src_offset + 2] * filter4) +
-                       (int)(src_ptr[src_offset + 3] * filter5) +
-                       (VP8_FILTER_WEIGHT >> 1);      /* Rounding */
-
-                /* Normalize back to 0-255 */
-                Temp >>= VP8_FILTER_SHIFT;
-
-                if (Temp < 0)
-                    Temp = 0;
-                else if ( Temp > 255 )
-                    Temp = 255;
-
-                output_ptr[i] = Temp;
-            }
-        } else {
-            src_offset = i + (i/output_width * (src_pixels_per_line - output_width));
-
-            Temp = (int)(src_ptr[src_offset - 2] * filter0) +
-                   (int)(src_ptr[src_offset - 1] * filter1) +
-                   (int)(src_ptr[src_offset]     * filter2) +
-                   (int)(src_ptr[src_offset + 1] * filter3) +
-                   (int)(src_ptr[src_offset + 2] * filter4) +
-                   (int)(src_ptr[src_offset + 3] * filter5) +
-                   (VP8_FILTER_WEIGHT >> 1);      /* Rounding */
-
-            /* Normalize back to 0-255 */
-            Temp >>= VP8_FILTER_SHIFT;
-
-            if (Temp < 0)
-                Temp = 0;
-            else if ( Temp > 255 )
-                Temp = 255;
-
-            output_ptr[i] = Temp;
-        }
-    }
-
-    //Add a fence so that no 2nd pass stuff starts before 1st pass writes are done.
-    barrier(CLK_LOCAL_MEM_FENCE);
-}
-
-void vp8_filter_block2d_second_pass
-(
-    local int *src_ptr,
-    global unsigned char *output_base,
-    int output_offset,
-    int output_pitch,
-    unsigned int src_pixels_per_line,
-    unsigned int pixel_step,
-    unsigned int output_height,
-    unsigned int output_width,
-    int filter_offset
-) {
-
-    global unsigned char *output_ptr = &output_base[output_offset];
-
-    int out_offset; //Not same as output_offset...
-    int src_offset;
-    int Temp;
-    int PS2 = 2*(int)pixel_step;
-    int PS3 = 3*(int)pixel_step;
-
-    unsigned int src_increment = src_pixels_per_line - output_width;
-
-    uint i = get_global_id(0);
-
-    __constant short *vp8_filter = sub_pel_filters[filter_offset];
-
-    if (i < (output_width * output_height)){
-        out_offset = i/output_width;
-        src_offset = out_offset;
-
-        src_offset = i + (src_offset * src_increment);
-        out_offset = i%output_width + (out_offset * output_pitch);
-
-        /* Apply filter */
-        Temp = ((int)src_ptr[src_offset - PS2] * vp8_filter[0]) +
-           ((int)src_ptr[src_offset -(int)pixel_step] * vp8_filter[1]) +
-           ((int)src_ptr[src_offset]                  * vp8_filter[2]) +
-           ((int)src_ptr[src_offset + pixel_step]     * vp8_filter[3]) +
-           ((int)src_ptr[src_offset + PS2]            * vp8_filter[4]) +
-           ((int)src_ptr[src_offset + PS3]       * vp8_filter[5]) +
-           (VP8_FILTER_WEIGHT >> 1);   /* Rounding */
-
-        /* Normalize back to 0-255 */
-        Temp = Temp >> VP8_FILTER_SHIFT;
-        if (Temp < 0)
-            Temp = 0;
-        else if (Temp > 255)
-            Temp = 255;
-
-        output_ptr[out_offset] = (unsigned char)Temp;
-    }
-}
-
-__kernel void vp8_sixtap_predict_kernel
-(
-    __global unsigned char  *src_ptr,
-    int src_offset,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    __global unsigned char *dst_ptr,
-    int dst_offset,
-    int  dst_pitch
-)
-{
-
-    local int FData[9*4];
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 9, 4, xoffset);
-
-    /* then filter vertically... */
-    vp8_filter_block2d_second_pass(&FData[8], dst_ptr, dst_offset, dst_pitch, 4, 4, 4, 4, yoffset);
-}
-
-__kernel void vp8_sixtap_predict8x8_kernel
-(
-    __global unsigned char  *src_ptr,
-    int src_offset,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    __global unsigned char *dst_ptr,
-    int dst_offset,
-    int  dst_pitch
-)
-{
-    local int FData[13*16];   /* Temp data buffer used in filtering */
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 13, 8, xoffset);
-
-    /* then filter vertically... */
-    vp8_filter_block2d_second_pass(&FData[16], dst_ptr, dst_offset, dst_pitch, 8, 8, 8, 8, yoffset);
-
-}
-
-__kernel void vp8_sixtap_predict8x4_kernel
-(
-    __global unsigned char  *src_ptr,
-    int src_offset,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    __global unsigned char *dst_ptr,
-    int dst_offset,
-    int  dst_pitch
-)
-{
-    local int FData[13*16];   /* Temp data buffer used in filtering */
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 9, 8, xoffset);
-
-    /* then filter vertically... */
-    vp8_filter_block2d_second_pass(&FData[16], dst_ptr, dst_offset, dst_pitch, 8, 8, 4, 8, yoffset);
-}
-
-__kernel void vp8_sixtap_predict16x16_kernel
-(
-    __global unsigned char  *src_ptr,
-    int src_offset,
-    int  src_pixels_per_line,
-    int  xoffset,
-    int  yoffset,
-    __global unsigned char *dst_ptr,
-    int dst_offset,
-    int  dst_pitch
-)
-{
-    local int FData[21*24];   /* Temp data buffer used in filtering */
-
-    /* First filter 1-D horizontally... */
-    vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 21, 16, xoffset);
-
-    /* then filter vertically... */
-    vp8_filter_block2d_second_pass(&FData[32], dst_ptr, dst_offset, dst_pitch, 16, 16, 16, 16, yoffset);
-
-    return;
-}
--- a/vp8/common/opencl/filter_cl.h
+++ b/vp8/common/opencl/filter_cl.h
@@ -1,74 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-#ifndef FILTER_CL_H_
-#define FILTER_CL_H_
-
-#ifdef	__cplusplus
-extern "C" {
-#endif
-
-#include "vp8_opencl.h"
-
-#define VP8_FILTER_WEIGHT 128
-#define VP8_FILTER_SHIFT  7
-
-#define REGISTER_FILTER 1
-#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
-#define PRE_CALC_PIXEL_STEPS 1
-#define PRE_CALC_SRC_INCREMENT 1
-
-#if PRE_CALC_PIXEL_STEPS
-#define PS2 two_pixel_steps
-#define PS3 three_pixel_steps
-#else
-#define PS2 2*(int)pixel_step
-#define PS3 3*(int)pixel_step
-#endif
-
-#if REGISTER_FILTER
-#define FILTER0 filter0
-#define FILTER1 filter1
-#define FILTER2 filter2
-#define FILTER3 filter3
-#define FILTER4 filter4
-#define FILTER5 filter5
-#else
-#define FILTER0 vp8_filter[0]
-#define FILTER1 vp8_filter[1]
-#define FILTER2 vp8_filter[2]
-#define FILTER3 vp8_filter[3]
-#define FILTER4 vp8_filter[4]
-#define FILTER5 vp8_filter[5]
-#endif
-
-#if PRE_CALC_SRC_INCREMENT
-#define SRC_INCREMENT src_increment
-#else
-#define SRC_INCREMENT (src_pixels_per_line - output_width)
-#endif
-
-#define FILTER_OFFSET //Filter data stored as CL constant memory
-#define FILTER_REF sub_pel_filters[filter_offset]
-
-extern const char *filterCompileOptions;
-extern const char *filter_cl_file_name;
-
-//Copy the -2*pixel_step (and ps*3) bytes because the filter algorithm
-//accesses negative indexes
-#define SIXTAP_SRC_LEN(out_width,out_height,src_px) ((out_width)*(out_height) + (((out_width)*(out_height)-1)/(out_width))*(src_px - out_width) + 5)
-#define BIL_SRC_LEN(out_width,out_height,src_px) ((out_height) * src_px + out_width)
-#define DST_LEN(dst_pitch,dst_height,dst_width) (dst_pitch * (dst_height) + (dst_width))
-
-#ifdef	__cplusplus
-}
-#endif
-
-#endif /* FILTER_CL_H_ */
--- a/vp8/common/opencl/idct_cl.h
+++ b/vp8/common/opencl/idct_cl.h
@@ -1,45 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef IDCT_OPENCL_H
-#define IDCT_OPENCL_H
-
-#ifdef	__cplusplus
-extern "C" {
-#endif
-
-#include "vp8_opencl.h"
-#include "vp8/common/blockd.h"
-
-#define prototype_second_order_cl(sym) \
-    void sym(BLOCKD *b)
-
-#define prototype_idct_cl(sym) \
-    void sym(BLOCKD *b, int pitch)
-
-#define prototype_idct_scalar_add_cl(sym) \
-    void sym(BLOCKD *b, cl_int use_diff, int diff_offset, int qcoeff_offset, \
-             int pred_offset, unsigned char *output, cl_mem out_mem, int out_offset, size_t out_size, \
-             int pitch, int stride)\
-
-
-extern prototype_idct_cl(vp8_short_idct4x4llm_1_cl);
-extern prototype_idct_cl(vp8_short_idct4x4llm_cl);
-extern prototype_idct_scalar_add_cl(vp8_dc_only_idct_add_cl);
-
-extern prototype_second_order_cl(vp8_short_inv_walsh4x4_1_cl);
-extern prototype_second_order_cl(vp8_short_inv_walsh4x4_cl);
-
-#ifdef	__cplusplus
-}
-#endif
-
-#endif
--- a/vp8/common/opencl/idctllm_cl.c
+++ b/vp8/common/opencl/idctllm_cl.c
@@ -1,325 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#include <stdlib.h>
-
-//ACW: Remove me after debugging.
-#include <stdio.h>
-#include <string.h>
-
-#include "idct_cl.h"
-#include "idctllm_cl.h"
-#include "blockd_cl.h"
-
-void cl_destroy_idct(){
-
-    if (cl_data.idct_program)
-        clReleaseProgram(cl_data.idct_program);
-
-    cl_data.idct_program = NULL;
-    
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_1_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_dc_only_idct_add_kernel);
-    //VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_idct4x4llm_1_kernel);
-    //VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_idct4x4llm_kernel);
-
-}
-
-int cl_init_idct() {
-    int err;
-
-    // Create the IDCT compute program from the file-defined source code
-    if (cl_load_program(&cl_data.idct_program, idctllm_cl_file_name,
-            idctCompileOptions) != CL_SUCCESS)
-        return VP8_CL_TRIED_BUT_FAILED;
-
-    // Create the compute kernel in the program we wish to run
-    VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_1_kernel,"vp8_short_inv_walsh4x4_1_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_1st_pass_kernel,"vp8_short_inv_walsh4x4_1st_pass_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_2nd_pass_kernel,"vp8_short_inv_walsh4x4_2nd_pass_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_dc_only_idct_add_kernel,"vp8_dc_only_idct_add_kernel");
-
-    ////idct4x4llm kernels are only useful for the encoder
-    //VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_idct4x4llm_1_kernel,"vp8_short_idct4x4llm_1_kernel");
-    //VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_idct4x4llm_kernel,"vp8_short_idct4x4llm_kernel");
-
-    return CL_SUCCESS;
-}
-
-#define max(x,y) (x > y ? x: y)
-//#define NO_CL
-
-/* Only useful for encoder... Untested... */
-void vp8_short_idct4x4llm_cl(BLOCKD *b, int pitch)
-{
-    int err;
-
-    short *input = b->dqcoeff_base + b->dqcoeff_offset;
-    short *output = &b->diff_base[b->diff_offset];
-
-    cl_mem src_mem, dst_mem;
-
-    //1 instance for now. This should be split into 2-pass * 4 thread.
-    size_t global = 1;
-
-    if (cl_initialized != CL_SUCCESS){
-        vp8_short_idct4x4llm_c(input,output,pitch);
-        return;
-    }
-
-    VP8_CL_CREATE_BUF(b->cl_commands, src_mem,,
-            sizeof(short)*16, input,
-            vp8_short_idct4x4llm_c(input,output,pitch),
-    );
-
-    VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
-            sizeof(short)*(4+(pitch/2)*3), output,
-            vp8_short_idct4x4llm_c(input,output,pitch),
-    );
-
-    //Set arguments and run kernel
-    err = 0;
-    err = clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 0, sizeof (cl_mem), &src_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 1, sizeof (cl_mem), &dst_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 2, sizeof (int), &pitch);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        vp8_short_idct4x4llm_c(input,output,pitch),
-    );
-    
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_idct4x4llm_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);
-        vp8_short_idct4x4llm_c(input,output,pitch),
-    );
-
-    /* Read back the result data from the device */
-    err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0, sizeof(short)*(4+pitch/2*3), output, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read output array!\n",
-        vp8_short_idct4x4llm_c(input,output,pitch),
-    );
-
-    clReleaseMemObject(src_mem);
-    clReleaseMemObject(dst_mem);
-
-    return;
-}
-
-/* Only useful for encoder... Untested... */
-void vp8_short_idct4x4llm_1_cl(BLOCKD *b, int pitch)
-{
-    int err;
-    size_t global = 4;
-
-    short *input = b->dqcoeff_base + b->dqcoeff_offset;
-    short *output = &b->diff_base[b->diff_offset];
-
-    cl_mem src_mem, dst_mem;
-
-    if (cl_initialized != CL_SUCCESS){
-        vp8_short_idct4x4llm_1_c(input,output,pitch);
-        return;
-    }
-
-    printf("vp8_short_idct4x4llm_1_cl\n");
-
-    VP8_CL_CREATE_BUF(b->cl_commands, src_mem,,
-            sizeof(short), input,
-            vp8_short_idct4x4llm_1_c(input,output,pitch),
-    );
-
-    VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
-            sizeof(short)*(4+(pitch/2)*3), output,
-            vp8_short_idct4x4llm_1_c(input,output,pitch),
-    );
-
-    //Set arguments and run kernel
-    err = 0;
-    err = clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 0, sizeof (cl_mem), &src_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 1, sizeof (cl_mem), &dst_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 2, sizeof (int), &pitch);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        vp8_short_idct4x4llm_1_c(input,output,pitch),
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_idct4x4llm_1_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);
-        vp8_short_idct4x4llm_1_c(input,output,pitch),
-    );
-
-    /* Read back the result data from the device */
-    err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0, sizeof(short)*(4+pitch/2*3), output, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read output array!\n",
-        vp8_short_idct4x4llm_1_c(input,output,pitch),
-    );
-
-    clReleaseMemObject(src_mem);
-    clReleaseMemObject(dst_mem);
-
-    return;
-
-}
-
-void vp8_dc_only_idct_add_cl(BLOCKD *b, cl_int use_diff, int diff_offset, 
-        int qcoeff_offset, int pred_offset,
-        unsigned char *dst_base, cl_mem dst_mem, int dst_offset, size_t dest_size,
-        int pitch, int stride
-)
-{
-    
-    int err;
-    size_t global = 16;
-
-    int free_mem = 0;
-    //cl_mem dest_mem = NULL;
-
-    if (dst_mem == NULL){
-        VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
-                dest_size, dst_base,,
-        );
-        free_mem = 1;
-    }
-
-    //Set arguments and run kernel
-    err =  clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 0, sizeof (cl_mem), &b->cl_predictor_mem);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 1, sizeof (int), &pred_offset);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 2, sizeof (cl_mem), &dst_mem);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 3, sizeof (int), &dst_offset);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 4, sizeof (int), &pitch);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 5, sizeof (int), &stride);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 6, sizeof (cl_int), &use_diff);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 7, sizeof (cl_mem), &b->cl_diff_mem);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 8, sizeof (int), &diff_offset);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 9, sizeof (cl_mem), &b->cl_qcoeff_mem);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 10, sizeof (int), &qcoeff_offset);
-    err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 11, sizeof (cl_mem), &b->cl_dequant_mem);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",,
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_dc_only_idct_add_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);,
-    );
-
-
-    if (free_mem == 1){
-    /* Read back the result data from the device */
-        err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0,
-                dest_size, dst_base, 0, NULL, NULL);
-
-        VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
-            "Error: Failed to read output array!\n",,
-        );
-
-        clReleaseMemObject(dst_mem);
-    }
-
-    return;
-}
-
-void vp8_short_inv_walsh4x4_cl(BLOCKD *b)
-{
-    int err;
-    size_t global = 4;
-
-    if (cl_initialized != CL_SUCCESS){
-        vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset,&b->diff_base[b->diff_offset]);
-        return;
-    }
-
-    //Set arguments and run kernel
-    err = 0;
-    err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 0, sizeof (cl_mem), &b->cl_dqcoeff_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 1, sizeof(int), &b->dqcoeff_offset);
-    err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 2, sizeof (cl_mem), &b->cl_diff_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 3, sizeof(int), &b->diff_offset);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);
-        vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
-    );
-
-    //Second pass
-    //Set arguments and run kernel
-    err = 0;
-    err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 0, sizeof (cl_mem), &b->cl_diff_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 1, sizeof(int), &b->diff_offset);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);
-        vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
-    );
-
-    return;
-}
-
-void vp8_short_inv_walsh4x4_1_cl(BLOCKD *b)
-{
-    
-    int err;
-    size_t global = 4;
-
-    if (cl_initialized != CL_SUCCESS){
-        vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
-            &b->diff_base[b->diff_offset]);
-        return;
-    }
-
-    //Set arguments and run kernel
-    err = 0;
-    err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 0, sizeof (cl_mem), &b->cl_dqcoeff_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 1, sizeof (int), &b->dqcoeff_offset);
-    err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 2, sizeof (cl_mem), &b->cl_diff_mem);
-    err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 3, sizeof (int), &b->diff_offset);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
-            &b->diff_base[b->diff_offset]),
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_1_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);
-        vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
-                &b->diff_base[b->diff_offset]),
-    );
-
-    return;
-}
--- a/vp8/common/opencl/idctllm_cl.cl
+++ b/vp8/common/opencl/idctllm_cl.cl
@@ -1,309 +0,0 @@
-#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
-#pragma OPENCL EXTENSION cl_amd_printf : enable
-
-__constant int cospi8sqrt2minus1 = 20091;
-__constant int sinpi8sqrt2      = 35468;
-__constant int rounding = 0;
-
-
-kernel void vp8_short_idct4x4llm_1st_pass_kernel(global short*,global short *,int);
-kernel void vp8_short_idct4x4llm_2nd_pass_kernel(global short*,int);
-
-
-__kernel void vp8_short_idct4x4llm_kernel(
-    __global short *input,
-    __global short *output,
-    int pitch
-){
-    vp8_short_idct4x4llm_1st_pass_kernel(input,output,pitch);
-    vp8_short_idct4x4llm_2nd_pass_kernel(output,pitch);
-}
-
-__kernel void vp8_short_idct4x4llm_1st_pass_kernel(
-    __global short *ip,
-    __global short *op,
-    int pitch
-)
-{
-    int i;
-    int a1, b1, c1, d1;
-
-    int temp1, temp2;
-    int shortpitch = pitch >> 1;
-
-    for (i = 0; i < 4; i++)
-    {
-        a1 = ip[0] + ip[8];
-        b1 = ip[0] - ip[8];
-
-        temp1 = (ip[4] * sinpi8sqrt2 + rounding) >> 16;
-        temp2 = ip[12] + ((ip[12] * cospi8sqrt2minus1 + rounding) >> 16);
-        c1 = temp1 - temp2;
-
-        temp1 = ip[4] + ((ip[4] * cospi8sqrt2minus1 + rounding) >> 16);
-        temp2 = (ip[12] * sinpi8sqrt2 + rounding) >> 16;
-        d1 = temp1 + temp2;
-
-        op[shortpitch*0] = a1 + d1;
-        op[shortpitch*3] = a1 - d1;
-
-        op[shortpitch*1] = b1 + c1;
-        op[shortpitch*2] = b1 - c1;
-
-        ip++;
-        op++;
-    }
-
-    return;
-}
-
-__kernel void vp8_short_idct4x4llm_2nd_pass_kernel(
-    __global short *output,
-    int pitch
-)
-{
-    int i;
-    int a1, b1, c1, d1;
-
-    int temp1, temp2;
-    int shortpitch = pitch >> 1;
-    __global short *ip = output;
-    __global short *op = output;
-
-    for (i = 0; i < 4; i++)
-    {
-        a1 = ip[0] + ip[2];
-        b1 = ip[0] - ip[2];
-
-        temp1 = (ip[1] * sinpi8sqrt2 + rounding) >> 16;
-        temp2 = ip[3] + ((ip[3] * cospi8sqrt2minus1 + rounding) >> 16);
-        c1 = temp1 - temp2;
-
-        temp1 = ip[1] + ((ip[1] * cospi8sqrt2minus1 + rounding) >> 16);
-        temp2 = (ip[3] * sinpi8sqrt2 + rounding) >> 16;
-        d1 = temp1 + temp2;
-
-        op[0] = (a1 + d1 + 4) >> 3;
-        op[3] = (a1 - d1 + 4) >> 3;
-
-        op[1] = (b1 + c1 + 4) >> 3;
-        op[2] = (b1 - c1 + 4) >> 3;
-
-        ip += shortpitch;
-        op += shortpitch;
-    }
-
-    return;
-}
-
-__kernel void vp8_short_idct4x4llm_1_kernel(
-    __global short *input,
-    __global short *output,
-    int pitch
-)
-{
-    int a1;
-    int out_offset;
-    int shortpitch = pitch >> 1;
-
-    //short4 a;
-    a1 = ((input[0] + 4) >> 3);
-    //a = a1;
-
-    int tid = get_global_id(0);
-    if (tid < 4){
-        out_offset = shortpitch * tid;
-
-        //vstore4(a,0,&output[out_offset]);
-        output[out_offset] = a1;
-        output[out_offset+1] = a1;
-        output[out_offset+2] = a1;
-        output[out_offset+3] = a1;
-    }
-}
-
-__kernel void vp8_dc_only_idct_add_kernel(
-    __global unsigned char *pred_base,
-    int pred_offset,
-    __global unsigned char *dst_base,
-    int dst_offset,
-    int pitch,
-    int stride,
-    int use_diff,
-    global short *diff_base,
-    int diff_offset,
-    global short *qcoeff_base,
-    int qcoeff_offset,
-    global short *dequant
-)
-{
-    int r, c;
-    //int pred_offset;
-    global unsigned char *pred_ptr = &pred_base[pred_offset];
-    global unsigned char *dst_ptr = &dst_base[dst_offset];
-
-    int tid = get_global_id(0);
-
-    int a1;
-
-    if (tid < 16){
-
-        if (use_diff == 1){
-            a1 = diff_base[diff_offset];
-        } else {
-            a1 = qcoeff_base[qcoeff_offset] * dequant[0];
-        }
-        a1 = (a1 + 4)>>3;
-
-        r = tid / 4;
-        c = tid % 4;
-
-        pred_offset = r * pitch;
-        dst_offset += r * stride;
-        int a = a1 + pred_ptr[pred_offset + c] ;
-
-        if (a < 0)
-            a = 0;
-        else if (a > 255)
-            a = 255;
-
-        dst_base[dst_offset + c] = (unsigned char) a ;
-    }
-}
-
-
-__kernel void vp8_short_inv_walsh4x4_1st_pass_kernel(
-    __global short *src_base,
-    int src_offset,
-    __global short *output_base,
-    int out_offset
-)
-{
-
-    __global short *input = src_base + src_offset;
-    __global short *output = output_base + src_offset;
-    int tid = get_global_id(0);
-
-#define VEC_WALSH 0
-#if VEC_WALSH
-    //4-short vectors to calculate things in
-    short4 a,b,c,d, a2v, b2v, c2v, d2v, a1t, b1t, c1t, d1t;
-    short16 out;
-
-    if (tid == 0){
-        //first pass loop in vector form
-        a = vload4(0,input) + vload4(3,input);
-        b = vload4(1,input) + vload4(2,input);
-        c = vload4(1,input) - vload4(2,input);
-        d = vload4(0,input) - vload4(3,input);
-        vstore4(a + b, 0, output);
-        vstore4(c + d, 1, output);
-        vstore4(a - b, 2, output);
-        vstore4(d - c, 3, output);
-
-        return;
-
-        //2nd pass
-        a = (short4)(output[0], output[4], output[8], output[12]);
-        b = (short4)(output[1], output[5], output[9], output[13]);
-        c = (short4)(output[1], output[5], output[9], output[13]);
-        d = (short4)(output[0], output[4], output[8], output[12]);
-        a1t = (short4)(output[3], output[7], output[11], output[15]);
-        b1t = (short4)(output[2], output[6], output[10], output[14]);
-        c1t = (short4)(output[2], output[6], output[10], output[14]);
-        d1t = (short4)(output[3], output[7], output[11], output[15]);
-
-        a = a + a1t + (short)3;
-        b = b + b1t;
-        c = c - c1t;
-        d = d - d1t + (short)3;
-
-        a2v = (a + b) >> (short)3;
-        b2v = (c + d) >> (short)3;
-        c2v = (a - b) >> (short)3;
-        d2v = (d - c) >> (short)3;
-
-        out.s048c = a2v;
-        out.s159d = b2v;
-        out.s26ae = c2v;
-        out.s37bf = d2v;
-        vstore16(out,0,output);
-    }
-#else
-
-    int i;
-    int a1, b1, c1, d1;
-    int a2, b2, c2, d2;
-    global short *ip = input;
-    global short *op = output;
-
-    int offset;
-
-    if (tid < 4){
-        offset = tid;
-        a1 = ip[offset] + ip[offset + 12];
-        b1 = ip[offset + 4] + ip[offset + 8];
-        c1 = ip[offset + 4] - ip[offset + 8];
-        d1 = ip[offset] - ip[offset + 12];
-
-        op[offset] = a1 + b1;
-        op[offset + 4] = c1 + d1;
-        op[offset + 8] = a1 - b1;
-        op[offset + 12] = d1 - c1;
-    }
-#endif
-}
-
-__kernel void vp8_short_inv_walsh4x4_2nd_pass_kernel(
-    __global short *output_base,
-    int out_offset
-)
-{
-    int i;
-    int a1, b1, c1, d1;
-    int a2, b2, c2, d2;
-
-    __global short *output = output_base + out_offset;
-    int tid = get_global_id(0);
-    int offset = 0;
-
-    if (tid < 4){
-        offset = 4*tid;
-        a1 = output[offset] + output[offset + 3];
-        b1 = output[offset + 1] + output[offset + 2];
-        c1 = output[offset + 1] - output[offset + 2];
-        d1 = output[offset + 0] - output[offset + 3];
-
-        a2 = a1 + b1;
-        b2 = c1 + d1;
-        c2 = a1 - b1;
-        d2 = d1 - c1;
-
-        output[offset + 0] = (a2 + 3) >> 3;
-        output[offset + 1] = (b2 + 3) >> 3;
-        output[offset + 2] = (c2 + 3) >> 3;
-        output[offset + 3] = (d2 + 3) >> 3;
-    }
-}
-
-__kernel void vp8_short_inv_walsh4x4_1_kernel(
-    __global short *src_data,
-    int src_offset,
-    __global short *dst_data,
-    int dst_offset
-){
-    int a1;
-    int tid = get_global_id(0);
-    //short16 a;
-    int i;
-    short4 a;
-    __global short *input = src_data + src_offset;
-    __global short *output = dst_data + dst_offset;
-
-    if (tid < 4)
-    {
-        a1 = ((input[0] + 3) >> 3);
-        a = (short)a1; //Set all elements of vector to a1
-        vstore4(a, tid, output);
-    }
-}
--- a/vp8/common/opencl/idctllm_cl.h
+++ b/vp8/common/opencl/idctllm_cl.h
@@ -1,26 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-#include "vpx_config.h"
-#include "vp8_opencl.h"
-#include "vp8/common/blockd.h"
-
-#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
-
-//External functions that are fallbacks if CL is unavailable
-extern void vp8_short_idct4x4llm_c(short *input, short *output, int pitch);
-extern void vp8_short_idct4x4llm_1_c(short *input, short *output, int pitch);
-extern void vp8_dc_only_idct_add_c(short input_dc, unsigned char *pred_ptr, unsigned char *dst_ptr, int pitch, int stride);
-extern void vp8_short_inv_walsh4x4_c(short *input, short *output);
-extern void vp8_short_inv_walsh4x4_1_c(short *input, short *output);
-
-const char *idctCompileOptions = "-Ivp8/common/opencl";
-const char *idctllm_cl_file_name = "vp8/common/opencl/idctllm_cl.cl";
-
--- a/vp8/common/opencl/loopfilter.cl
+++ b/vp8/common/opencl/loopfilter.cl
@@ -1,427 +0,0 @@
-#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
-#pragma OPENCL EXTENSION cl_amd_printf : enable
-
-typedef unsigned char uc;
-typedef signed char sc;
-
-__inline signed char vp8_filter_mask(sc, sc, uc, uc, uc, uc, uc, uc, uc, uc);
-__inline signed char vp8_simple_filter_mask(signed char, signed char, uc, uc, uc, uc);
-__inline signed char vp8_hevmask(signed char, uc, uc, uc, uc);
-__inline signed char vp8_signed_char_clamp(int);
-
-__inline void vp8_mbfilter(signed char mask,signed char hev,global uc *op2,
-    global uc *op1,global uc *op0,global uc *oq0,global uc *oq1,global uc *oq2);
-
-void vp8_simple_filter(signed char mask,global uc *base, int op1_off,int op0_off,int oq0_off,int oq1_off);
-
-
-typedef struct
-{
-    signed char lim[16];
-    signed char flim[16];
-    signed char thr[16];
-    signed char mbflim[16];
-    signed char mbthr[16];
-    signed char uvlim[16];
-    signed char uvflim[16];
-    signed char uvthr[16];
-    signed char uvmbflim[16];
-    signed char uvmbthr[16];
-} loop_filter_info;
-
-
-
-
-void vp8_filter(
-    signed char mask,
-    signed char hev,
-    global uc *base,
-    int op1_off,
-    int op0_off,
-    int oq0_off,
-    int oq1_off
-)
-{
-
-    global uc *op1 = &base[op1_off];
-    global uc *op0 = &base[op0_off];
-    global uc *oq0 = &base[oq0_off];
-    global uc *oq1 = &base[oq1_off];
-
-    signed char ps0, qs0;
-    signed char ps1, qs1;
-    signed char vp8_filter, Filter1, Filter2;
-    signed char u;
-
-    ps1 = (signed char) * op1 ^ 0x80;
-    ps0 = (signed char) * op0 ^ 0x80;
-    qs0 = (signed char) * oq0 ^ 0x80;
-    qs1 = (signed char) * oq1 ^ 0x80;
-
-    /* add outer taps if we have high edge variance */
-    vp8_filter = vp8_signed_char_clamp(ps1 - qs1);
-    vp8_filter &= hev;
-
-    /* inner taps */
-    vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (qs0 - ps0));
-    vp8_filter &= mask;
-
-    /* save bottom 3 bits so that we round one side +4 and the other +3
-     * if it equals 4 we'll set to adjust by -1 to account for the fact
-     * we'd round 3 the other way
-     */
-    Filter1 = vp8_signed_char_clamp(vp8_filter + 4);
-    Filter2 = vp8_signed_char_clamp(vp8_filter + 3);
-    Filter1 >>= 3;
-    Filter2 >>= 3;
-    u = vp8_signed_char_clamp(qs0 - Filter1);
-    *oq0 = u ^ 0x80;
-    u = vp8_signed_char_clamp(ps0 + Filter2);
-    *op0 = u ^ 0x80;
-    vp8_filter = Filter1;
-
-    /* outer tap adjustments */
-    vp8_filter += 1;
-    vp8_filter >>= 1;
-    vp8_filter &= ~hev;
-
-    u = vp8_signed_char_clamp(qs1 - vp8_filter);
-    *oq1 = u ^ 0x80;
-    u = vp8_signed_char_clamp(ps1 + vp8_filter);
-    *op1 = u ^ 0x80;
-}
-
-
-kernel void vp8_loop_filter_horizontal_edge_kernel
-(
-    global unsigned char *s_base,
-    int s_off,
-    int p, /* pitch */
-    global signed char *flimit,
-    global signed char *limit,
-    global signed char *thresh,
-    int off_stride
-)
-{
-    int  hev = 0; /* high edge variance */
-    signed char mask = 0;
-    int i = get_global_id(0);
-
-    if (i < get_global_size(0)){
-        s_off += i;
-
-        mask = vp8_filter_mask(limit[i], flimit[i], s_base[s_off - 4*p],
-                s_base[s_off - 3*p], s_base[s_off - 2*p], s_base[s_off - p],
-                s_base[s_off], s_base[s_off + p], s_base[s_off + 2*p],
-                s_base[s_off + 3*p]);
-
-        hev = vp8_hevmask(thresh[i], s_base[s_off - 2*p], s_base[s_off - p],
-                s_base[s_off], s_base[s_off+p]);
-
-        vp8_filter(mask, hev, s_base, s_off - 2 * p, s_off - p, s_off,
-                s_off + p);
-    }
-}
-
-
-kernel void vp8_loop_filter_vertical_edge_kernel
-(
-    global unsigned char *s_base,
-    int s_off,
-    int p,
-    global signed char *flimit,
-    global signed char *limit,
-    global signed char *thresh,
-    int off_stride
-)
-{
-
-    int  hev = 0; /* high edge variance */
-    signed char mask = 0;
-    int i = get_global_id(0);
-
-    if ( i < get_global_size(0) ){
-        s_off += p * i;
-        mask = vp8_filter_mask(limit[i], flimit[i],
-                s_base[s_off-4], s_base[s_off-3], s_base[s_off-2],
-                s_base[s_off-1], s_base[s_off], s_base[s_off+1],
-                s_base[s_off+2], s_base[s_off+3]);
-
-        hev = vp8_hevmask(thresh[i], s_base[s_off-2], s_base[s_off-1],
-                s_base[s_off], s_base[s_off+1]);
-
-        vp8_filter(mask, hev, s_base, s_off - 2, s_off - 1, s_off, s_off + 1);
-
-    }
-}
-
-
-kernel void vp8_mbloop_filter_horizontal_edge_kernel
-(
-    global unsigned char *s_base,
-    int s_off,
-    int p,
-    global signed char *flimit,
-    global signed char *limit,
-    global signed char *thresh,
-    int off_stride
-)
-{
-
-    global uc *s = s_base+s_off;
-
-    signed char hev = 0; /* high edge variance */
-    signed char mask = 0;
-    int i = get_global_id(0);
-
-    if (i < get_global_size(0)){
-        s += i;
-
-        mask = vp8_filter_mask(limit[i], flimit[i],
-                               s[-4*p], s[-3*p], s[-2*p], s[-1*p],
-                               s[0*p], s[1*p], s[2*p], s[3*p]);
-
-        hev = vp8_hevmask(thresh[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
-
-        vp8_mbfilter(mask, hev, s - 3 * p, s - 2 * p, s - 1 * p, s, s + 1 * p, s + 2 * p);
-
-    }
-}
-
-
-kernel void vp8_mbloop_filter_vertical_edge_kernel
-(
-    global unsigned char *s_base,
-    int s_off,
-    int p,
-    global signed char *flimit,
-    global signed char *limit,
-    global signed char *thresh,
-    int off_stride
-)
-{
-
-    global uc *s = s_base + s_off;
-
-    signed char hev = 0; /* high edge variance */
-    signed char mask = 0;
-    int i = get_global_id(0);
-
-    if (i < get_global_size(0)){
-        s += p * i;
-
-        mask = vp8_filter_mask(limit[i], flimit[i],
-                               s[-4], s[-3], s[-2], s[-1], s[0], s[1], s[2], s[3]);
-
-        hev = vp8_hevmask(thresh[i], s[-2], s[-1], s[0], s[1]);
-
-        vp8_mbfilter(mask, hev, s - 3, s - 2, s - 1, s, s + 1, s + 2);
-
-    }
-}
-
-
-kernel void vp8_loop_filter_simple_horizontal_edge_kernel
-(
-    global unsigned char *s_base,
-    int s_off,
-    int p,
-    global const signed char *flimit,
-    global const signed char *limit,
-    global const signed char *thresh,
-    int off_stride
-)
-{
-
-    signed char mask = 0;
-    int i = get_global_id(0);
-    (void) thresh;
-
-    if (i < get_global_size(0))
-    {
-        s_off += i;
-        mask = vp8_simple_filter_mask(limit[i], flimit[i], s_base[s_off-2*p], s_base[s_off-p], s_base[s_off], s_base[s_off+p]);
-        vp8_simple_filter(mask, s_base, s_off - 2 * p, s_off - 1 * p, s_off, s_off + 1 * p);
-    }
-}
-
-
-kernel void vp8_loop_filter_simple_vertical_edge_kernel
-(
-    global unsigned char *s_base,
-    int s_off,
-    int p,
-    global signed char *flimit,
-    global signed char *limit,
-    global signed char *thresh,
-    int off_stride
-)
-{
-
-    signed char mask = 0;
-    int i = get_global_id(0);
-    (void) thresh;
-
-    if (i < get_global_size(0)){
-        s_off += p * i;
-        mask = vp8_simple_filter_mask(limit[i], flimit[i], s_base[s_off-2], s_base[s_off-1], s_base[s_off], s_base[s_off+1]);
-        vp8_simple_filter(mask, s_base, s_off - 2, s_off - 1, s_off, s_off + 1);
-    }
-
-}
-
-
-
-//Inline and non-kernel functions follow.
-
-__inline void vp8_mbfilter(
-    signed char mask,
-    signed char hev,
-    global uc *op2,
-    global uc *op1,
-    global uc *op0,
-    global uc *oq0,
-    global uc *oq1,
-    global uc *oq2
-)
-{
-    signed char s, u;
-    signed char vp8_filter, Filter1, Filter2;
-    signed char ps2 = (signed char) * op2 ^ 0x80;
-    signed char ps1 = (signed char) * op1 ^ 0x80;
-    signed char ps0 = (signed char) * op0 ^ 0x80;
-    signed char qs0 = (signed char) * oq0 ^ 0x80;
-    signed char qs1 = (signed char) * oq1 ^ 0x80;
-    signed char qs2 = (signed char) * oq2 ^ 0x80;
-
-    /* add outer taps if we have high edge variance */
-    vp8_filter = vp8_signed_char_clamp(ps1 - qs1);
-    vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (qs0 - ps0));
-    vp8_filter &= mask;
-
-    Filter2 = vp8_filter;
-    Filter2 &= hev;
-
-    /* save bottom 3 bits so that we round one side +4 and the other +3 */
-    Filter1 = vp8_signed_char_clamp(Filter2 + 4);
-    Filter2 = vp8_signed_char_clamp(Filter2 + 3);
-    Filter1 >>= 3;
-    Filter2 >>= 3;
-    qs0 = vp8_signed_char_clamp(qs0 - Filter1);
-    ps0 = vp8_signed_char_clamp(ps0 + Filter2);
-
-
-    /* only apply wider filter if not high edge variance */
-    vp8_filter &= ~hev;
-    Filter2 = vp8_filter;
-
-    /* roughly 3/7th difference across boundary */
-    u = vp8_signed_char_clamp((63 + Filter2 * 27) >> 7);
-    s = vp8_signed_char_clamp(qs0 - u);
-    *oq0 = s ^ 0x80;
-    s = vp8_signed_char_clamp(ps0 + u);
-    *op0 = s ^ 0x80;
-
-    /* roughly 2/7th difference across boundary */
-    u = vp8_signed_char_clamp((63 + Filter2 * 18) >> 7);
-    s = vp8_signed_char_clamp(qs1 - u);
-    *oq1 = s ^ 0x80;
-    s = vp8_signed_char_clamp(ps1 + u);
-    *op1 = s ^ 0x80;
-
-    /* roughly 1/7th difference across boundary */
-    u = vp8_signed_char_clamp((63 + Filter2 * 9) >> 7);
-    s = vp8_signed_char_clamp(qs2 - u);
-    *oq2 = s ^ 0x80;
-    s = vp8_signed_char_clamp(ps2 + u);
-    *op2 = s ^ 0x80;
-}
-
-
-__inline signed char vp8_signed_char_clamp(int t)
-{
-    t = (t < -128 ? -128 : t);
-    t = (t > 127 ? 127 : t);
-    return (signed char) t;
-}
-
-
-/* is there high variance internal edge ( 11111111 yes, 00000000 no) */
-__inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0, uc q1)
-{
-    signed char hev = 0;
-    hev  |= (abs(p1 - p0) > thresh) * -1;
-    hev  |= (abs(q1 - q0) > thresh) * -1;
-    return hev;
-}
-
-
-/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
-__inline signed char vp8_filter_mask(
-    signed char limit,
-    signed char flimit,
-     uc p3, uc p2, uc p1, uc p0, uc q0, uc q1, uc q2, uc q3)
-{
-    signed char mask = 0;
-    mask |= (abs(p3 - p2) > limit) * -1;
-    mask |= (abs(p2 - p1) > limit) * -1;
-    mask |= (abs(p1 - p0) > limit) * -1;
-    mask |= (abs(q1 - q0) > limit) * -1;
-    mask |= (abs(q2 - q1) > limit) * -1;
-    mask |= (abs(q3 - q2) > limit) * -1;
-    mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2  > flimit * 2 + limit) * -1;
-    mask = ~mask;
-    return mask;
-}
-
-/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
-__inline signed char vp8_simple_filter_mask(
-    signed char limit,
-    signed char flimit,
-    uc p1,
-    uc p0,
-    uc q0,
-    uc q1
-)
-{
-    signed char mask = (abs(p0 - q0) * 2 + abs(p1 - q1) / 2  <= flimit * 2 + limit) * -1;
-    return mask;
-}
-
-void vp8_simple_filter(
-    signed char mask,
-    global uc *base,
-    int op1_off,
-    int op0_off,
-    int oq0_off,
-    int oq1_off
-)
-{
-
-    global uc *op1 = base + op1_off;
-    global uc *op0 = base + op0_off;
-    global uc *oq0 = base + oq0_off;
-    global uc *oq1 = base + oq1_off;
-
-    signed char vp8_filter, Filter1, Filter2;
-    signed char p1 = (signed char) * op1 ^ 0x80;
-    signed char p0 = (signed char) * op0 ^ 0x80;
-    signed char q0 = (signed char) * oq0 ^ 0x80;
-    signed char q1 = (signed char) * oq1 ^ 0x80;
-    signed char u;
-
-    vp8_filter = vp8_signed_char_clamp(p1 - q1);
-    vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (q0 - p0));
-    vp8_filter &= mask;
-
-    /* save bottom 3 bits so that we round one side +4 and the other +3 */
-    Filter1 = vp8_signed_char_clamp(vp8_filter + 4);
-    Filter1 >>= 3;
-    u = vp8_signed_char_clamp(q0 - Filter1);
-    *oq0  = u ^ 0x80;
-
-    Filter2 = vp8_signed_char_clamp(vp8_filter + 3);
-    Filter2 >>= 3;
-    u = vp8_signed_char_clamp(p0 + Filter2);
-    *op0 = u ^ 0x80;
-}
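Review note on the removed loopfilter.cl above: the simple filter converts four pixels across an edge to signed values, builds a clamped correction from them, and moves only the two pixels nearest the edge. Below is a small plain-C sketch of that path for one pixel position, written for illustration and not taken from the tree. The names `clamp_s8`, `simple_filter_mask` and `simple_filter` are hypothetical. The sketch uses `v - 128` and `u + 128` in place of the `^ 0x80` idiom, which gives the same 8-bit result without relying on implementation-defined narrowing, and it assumes an arithmetic right shift for negative values.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned char uc;

/* Saturate an int to the signed 8-bit range, as vp8_signed_char_clamp does. */
static signed char clamp_s8(int t)
{
    if (t < -128) t = -128;
    if (t > 127) t = 127;
    return (signed char)t;
}

/* -1 (all bits set) when the edge should be filtered, 0 otherwise.
 * Mirrors vp8_simple_filter_mask. */
static signed char simple_filter_mask(int limit, int flimit,
                                      uc p1, uc p0, uc q0, uc q1)
{
    int edge = abs(p0 - q0) * 2 + abs(p1 - q1) / 2;
    return (signed char)(edge <= flimit * 2 + limit ? -1 : 0);
}

/* Adjust p0 and q0 in place; p1 and q1 are only read. */
static void simple_filter(signed char mask,
                          const uc *op1, uc *op0, uc *oq0, const uc *oq1)
{
    int p1, p0, q0, q1, f, filter1, filter2;

    assert(op1 != NULL && op0 != NULL && oq0 != NULL && oq1 != NULL);

    /* unsigned 0..255 -> signed -128..127 (same as XOR with 0x80) */
    p1 = *op1 - 128;
    p0 = *op0 - 128;
    q0 = *oq0 - 128;
    q1 = *oq1 - 128;

    f = clamp_s8(p1 - q1);
    f = clamp_s8(f + 3 * (q0 - p0));
    f &= mask;

    /* round one side with +4 and the other with +3 */
    filter1 = clamp_s8(f + 4) >> 3;
    filter2 = clamp_s8(f + 3) >> 3;

    *oq0 = (uc)(clamp_s8(q0 - filter1) + 128);
    *op0 = (uc)(clamp_s8(p0 + filter2) + 128);
}
```

In the removed kernels, one work-item runs this per pixel along the edge, with the four pixel addresses derived from `s_off` and the pitch `p`.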
--- a/vp8/common/opencl/loopfilter_cl.c
+++ b/vp8/common/opencl/loopfilter_cl.c
@@ -1,457 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#include "../../../vpx_ports/config.h"
-#include "loopfilter_cl.h"
-#include "../onyxc_int.h"
-
-#include "vpx_config.h"
-#include "vp8_opencl.h"
-#include "blockd_cl.h"
-
-const char *loopFilterCompileOptions = "-Ivp8/common/opencl";
-const char *loop_filter_cl_file_name = "vp8/common/opencl/loopfilter.cl";
-
-typedef unsigned char uc;
-
-extern void vp8_loop_filter_frame
-(
-    VP8_COMMON *cm,
-    MACROBLOCKD *mbd,
-    int default_filt_lvl
-);
-
-prototype_loopfilter_cl(vp8_loop_filter_horizontal_edge_cl);
-prototype_loopfilter_cl(vp8_loop_filter_vertical_edge_cl);
-prototype_loopfilter_cl(vp8_mbloop_filter_horizontal_edge_cl);
-prototype_loopfilter_cl(vp8_mbloop_filter_vertical_edge_cl);
-prototype_loopfilter_cl(vp8_loop_filter_simple_horizontal_edge_cl);
-prototype_loopfilter_cl(vp8_loop_filter_simple_vertical_edge_cl);
-
-/* Horizontal MB filtering */
-void vp8_loop_filter_mbh_cl(
-    MACROBLOCKD *x,
-    cl_mem buf_base,
-    int y_off,
-    int u_off,
-    int v_off,
-    int y_stride,
-    int uv_stride,
-    loop_filter_info *lfi,
-    int simpler_lpf
-)
-{
-    (void) simpler_lpf;
-
-    vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
-    vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, u_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
-    vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, v_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
-}
-
-void vp8_loop_filter_mbhs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
-                            int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
-{
-    (void) uv_stride;
-    (void) simpler_lpf;
-    vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
-}
-
-/* Vertical MB Filtering */
-void vp8_loop_filter_mbv_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
-                           int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
-{
-    (void) simpler_lpf;
-
-    vp8_mbloop_filter_vertical_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
-    vp8_mbloop_filter_vertical_edge_cl(x, buf_base, u_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
-    vp8_mbloop_filter_vertical_edge_cl(x, buf_base, v_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
-}
-
-void vp8_loop_filter_mbvs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
-                            int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
-{
-    (void) uv_stride;
-    (void) simpler_lpf;
-    vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
-}
-
-/* Horizontal B Filtering */
-void vp8_loop_filter_bh_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
-                          int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
-{
-    (void) simpler_lpf;
-
-    vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_horizontal_edge_cl(x, buf_base, u_off + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
-    vp8_loop_filter_horizontal_edge_cl(x, buf_base, v_off + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
-
-}
-
-void vp8_loop_filter_bhs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
-                           int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
-{
-    (void) uv_stride;
-    (void) simpler_lpf;
-
-    vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-}
-
-/* Vertical B Filtering */
-void vp8_loop_filter_bv_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
-                          int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
-{
-    (void) simpler_lpf;
-
-    vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-
-    vp8_loop_filter_vertical_edge_cl(x, buf_base, u_off + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
-    vp8_loop_filter_vertical_edge_cl(x, buf_base, v_off + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
-}
-
-void vp8_loop_filter_bvs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
-                           int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
-{
-    (void) uv_stride;
-    (void) simpler_lpf;
-
-    vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-    vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
-}
-
-void vp8_init_loop_filter_cl(VP8_COMMON *cm)
-{
-    loop_filter_info *lfi = cm->lf_info;
-    int sharpness_lvl = cm->sharpness_level;
-    int frame_type = cm->frame_type;
-    int i, j;
-
-    int block_inside_limit = 0;
-    int HEVThresh;
-    const int yhedge_boost  = 2;
-
-    /* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
-    for (i = 0; i <= MAX_LOOP_FILTER; i++)
-    {
-        int filt_lvl = i;
-
-        if (frame_type == KEY_FRAME)
-        {
-            if (filt_lvl >= 40)
-                HEVThresh = 2;
-            else if (filt_lvl >= 15)
-                HEVThresh = 1;
-            else
-                HEVThresh = 0;
-        }
-        else
-        {
-            if (filt_lvl >= 40)
-                HEVThresh = 3;
-            else if (filt_lvl >= 20)
-                HEVThresh = 2;
-            else if (filt_lvl >= 15)
-                HEVThresh = 1;
-            else
-                HEVThresh = 0;
-        }
-
-        /* Set loop filter paramaeters that control sharpness. */
-        block_inside_limit = filt_lvl >> (sharpness_lvl > 0);
-        block_inside_limit = block_inside_limit >> (sharpness_lvl > 4);
-
-        if (sharpness_lvl > 0)
-        {
-            if (block_inside_limit > (9 - sharpness_lvl))
-                block_inside_limit = (9 - sharpness_lvl);
-        }
-
-        if (block_inside_limit < 1)
-            block_inside_limit = 1;
-
-        for (j = 0; j < 16; j++)
-        {
-            lfi[i].lim[j] = block_inside_limit;
-            lfi[i].mbflim[j] = filt_lvl + yhedge_boost;
-            lfi[i].flim[j] = filt_lvl;
-            lfi[i].thr[j] = HEVThresh;
-        }
-    }
-}
-
-/* Put vp8_init_loop_filter() in vp8dx_create_decompressor(). Only call vp8_frame_init_loop_filter() while decoding
- * each frame. Check last_frame_type to skip the function most of times.
- */
-void vp8_frame_init_loop_filter_cl(loop_filter_info *lfi, int frame_type)
-{
-    int HEVThresh;
-    int i, j;
-
-    /* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
-    for (i = 0; i <= MAX_LOOP_FILTER; i++)
-    {
-        int filt_lvl = i;
-
-        if (frame_type == KEY_FRAME)
-        {
-            if (filt_lvl >= 40)
-                HEVThresh = 2;
-            else if (filt_lvl >= 15)
-                HEVThresh = 1;
-            else
-                HEVThresh = 0;
-        }
-        else
-        {
-            if (filt_lvl >= 40)
-                HEVThresh = 3;
-            else if (filt_lvl >= 20)
-                HEVThresh = 2;
-            else if (filt_lvl >= 15)
-                HEVThresh = 1;
-            else
-                HEVThresh = 0;
-        }
-
-        for (j = 0; j < 16; j++)
-        {
-            lfi[i].thr[j] = HEVThresh;
-        }
-    }
-}
-
-
-//This might not need to be copied from loopfilter.c
-void vp8_adjust_mb_lf_value_cl(MACROBLOCKD *mbd, int *filter_level)
-{
-    MB_MODE_INFO *mbmi = &mbd->mode_info_context->mbmi;
-
-    if (mbd->mode_ref_lf_delta_enabled)
-    {
-        /* Apply delta for reference frame */
-        *filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];
-
-        /* Apply delta for mode */
-        if (mbmi->ref_frame == INTRA_FRAME)
-        {
-            /* Only the split mode BPRED has a further special case */
-            if (mbmi->mode == B_PRED)
-                *filter_level +=  mbd->mode_lf_deltas[0];
-        }
-        else
-        {
-            /* Zero motion mode */
-            if (mbmi->mode == ZEROMV)
-                *filter_level +=  mbd->mode_lf_deltas[1];
-
-            /* Split MB motion mode */
-            else if (mbmi->mode == SPLITMV)
-                *filter_level +=  mbd->mode_lf_deltas[3];
-
-            /* All other inter motion modes (Nearest, Near, New) */
-            else
-                *filter_level +=  mbd->mode_lf_deltas[2];
-        }
-
-        /* Range check */
-        if (*filter_level > MAX_LOOP_FILTER)
-            *filter_level = MAX_LOOP_FILTER;
-        else if (*filter_level < 0)
-            *filter_level = 0;
-    }
-}
-
-
-//Start of externally callable functions.
-
-int cl_init_loop_filter() {
-    int err;
-
-    // Create the filter compute program from the file-defined source code
-    if ( cl_load_program(&cl_data.loop_filter_program, loop_filter_cl_file_name,
-            loopFilterCompileOptions) != CL_SUCCESS )
-        return VP8_CL_TRIED_BUT_FAILED;
-
-    // Create the compute kernels in the program we wish to run
-    VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_horizontal_edge_kernel,"vp8_loop_filter_horizontal_edge_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_vertical_edge_kernel,"vp8_loop_filter_vertical_edge_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_mbloop_filter_horizontal_edge_kernel,"vp8_mbloop_filter_horizontal_edge_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_mbloop_filter_vertical_edge_kernel,"vp8_mbloop_filter_vertical_edge_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_simple_horizontal_edge_kernel,"vp8_loop_filter_simple_horizontal_edge_kernel");
-    VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_simple_vertical_edge_kernel,"vp8_loop_filter_simple_vertical_edge_kernel");
-
-    return CL_SUCCESS;
-}
-
-void cl_destroy_loop_filter(){
-
-    if (cl_data.loop_filter_program)
-        clReleaseProgram(cl_data.loop_filter_program);
-
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_horizontal_edge_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_vertical_edge_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_mbloop_filter_horizontal_edge_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_mbloop_filter_vertical_edge_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_simple_horizontal_edge_kernel);
-    VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_simple_vertical_edge_kernel);
-
-    cl_data.loop_filter_program = NULL;
-}
-
-
-void vp8_loop_filter_set_baselines_cl(MACROBLOCKD *mbd, int default_filt_lvl, int *baseline_filter_level){
-    int alt_flt_enabled = mbd->segmentation_enabled;
-    int i;
-
-    if (alt_flt_enabled)
-    {
-        for (i = 0; i < MAX_MB_SEGMENTS; i++)
-        {
-            /* Abs value */
-            if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
-                baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
-            /* Delta Value */
-            else
-            {
-                baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
-                baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0;  /* Clamp to valid range */
-            }
-        }
-    }
-    else
-    {
-        for (i = 0; i < MAX_MB_SEGMENTS; i++)
-            baseline_filter_level[i] = default_filt_lvl;
-    }
-}
-
-void vp8_loop_filter_frame_cl
-(
-    VP8_COMMON *cm,
-    MACROBLOCKD *mbd,
-    int default_filt_lvl
-)
-{
-    YV12_BUFFER_CONFIG *post = cm->frame_to_show;
-    loop_filter_info *lfi = cm->lf_info;
-    FRAME_TYPE frame_type = cm->frame_type;
-    LOOPFILTERTYPE filter_type = cm->filter_type;
-
-    int mb_row;
-    int mb_col;
-
-    int baseline_filter_level[MAX_MB_SEGMENTS];
-    int filter_level;
-    int alt_flt_enabled = mbd->segmentation_enabled;
-
-    int err;
-    unsigned char *buf_base;
-    int y_off, u_off, v_off;
-    //unsigned char *y_ptr, *u_ptr, *v_ptr;
-
-    mbd->mode_info_context = cm->mi;          /* Point at base of Mb MODE_INFO list */
-
-    /* Note the baseline filter values for each segment */
-    vp8_loop_filter_set_baselines_cl(mbd, default_filt_lvl, baseline_filter_level);
-
-    /* Initialize the loop filter for this frame. */
-    if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
-        vp8_init_loop_filter_cl(cm);
-    else if (frame_type != cm->last_frame_type)
-        vp8_frame_init_loop_filter_cl(lfi, frame_type);
-
-    /* Set up the buffer pointers */
-
-    buf_base = post->buffer_alloc;
-    y_off = post->y_buffer - buf_base;
-    u_off = post->u_buffer - buf_base;
-    v_off = post->v_buffer - buf_base;
-
-    VP8_CL_SET_BUF(mbd->cl_commands, post->buffer_mem, post->buffer_size, post->buffer_alloc,
-            vp8_loop_filter_frame(cm,mbd,default_filt_lvl),);
-
-    /* vp8_filter each macro block */
-    for (mb_row = 0; mb_row < cm->mb_rows; mb_row++)
-    {
-        for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
-        {
-            int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
-
-            filter_level = baseline_filter_level[Segment];
-
-            /* Distance of Mb to the various image edges.
-             * These specified to 8th pel as they are always compared to values 
-             * that are in 1/8th pel units. Apply any context driven MB level
-             * adjustment
-             */
-            filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
-
-            if (filter_level)
-            {
-                if (mb_col > 0){
-                    if (filter_type == NORMAL_LOOPFILTER)
-                        vp8_loop_filter_mbv_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
-                    else
-                        vp8_loop_filter_mbvs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
-                }
-
-                if (mbd->mode_info_context->mbmi.dc_diff > 0){
-                    if (filter_type == NORMAL_LOOPFILTER)
-                        vp8_loop_filter_bv_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
-                    else
-                        vp8_loop_filter_bvs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
-                }
-
-                /* don't apply across umv border */
-                if (mb_row > 0){
-                    if (filter_type == NORMAL_LOOPFILTER)
-                        vp8_loop_filter_mbh_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
-                    else
-                        vp8_loop_filter_mbhs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
-                }
-
-                if (mbd->mode_info_context->mbmi.dc_diff > 0){
-                    if (filter_type == NORMAL_LOOPFILTER)
-                        vp8_loop_filter_bh_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
-                    else
-                        vp8_loop_filter_bhs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
-                }
-            }
-
-            y_off += 16;
-            u_off += 8;
-            v_off += 8;
-
-            mbd->mode_info_context++;     /* step to next MB */
-        }
-
-        y_off += post->y_stride  * 16 - post->y_width;
-        u_off += post->uv_stride *  8 - post->uv_width;
-        v_off += post->uv_stride *  8 - post->uv_width;
-
-        mbd->mode_info_context++;         /* Skip border mb */
-    }
-
-    //Retrieve buffer contents
-    err = clEnqueueReadBuffer(mbd->cl_commands, post->buffer_mem, CL_FALSE, 0, post->buffer_size, post->buffer_alloc, 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS(mbd->cl_commands, err != CL_SUCCESS,
-        "Error: Failed to read loop filter output!\n",
-        ,
-    );
-
-    VP8_CL_FINISH(mbd->cl_commands);
-}
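Review note on the removed loopfilter_cl.c above: `vp8_init_loop_filter_cl` and `vp8_frame_init_loop_filter_cl` duplicate the same high-edge-variance threshold ladder, and the sharpness adjustment is written inline. The sketch below factors both into small pure functions so the rules are easier to read and check. It is illustrative only: `hev_threshold` and `block_inside_limit` are hypothetical names that do not exist in the tree, and `is_key_frame` stands in for the `frame_type == KEY_FRAME` test.

```c
#include <assert.h>

/* High-edge-variance threshold for a filter level.
 * Key frames:   >= 40 -> 2, >= 15 -> 1, else 0.
 * Inter frames: >= 40 -> 3, >= 20 -> 2, >= 15 -> 1, else 0. */
static int hev_threshold(int filt_lvl, int is_key_frame)
{
    assert(filt_lvl >= 0);

    if (filt_lvl >= 40)
        return is_key_frame ? 2 : 3;
    if (!is_key_frame && filt_lvl >= 20)
        return 2;
    if (filt_lvl >= 15)
        return 1;
    return 0;
}

/* Interior limit derived from the filter level and sharpness:
 * halve once if sharpness > 0, again if sharpness > 4, cap at
 * 9 - sharpness when sharpness > 0, and never go below 1. */
static int block_inside_limit(int filt_lvl, int sharpness_lvl)
{
    int limit;

    assert(filt_lvl >= 0 && sharpness_lvl >= 0);

    limit = filt_lvl >> (sharpness_lvl > 0);
    limit >>= (sharpness_lvl > 4);

    if (sharpness_lvl > 0 && limit > 9 - sharpness_lvl)
        limit = 9 - sharpness_lvl;
    if (limit < 1)
        limit = 1;

    return limit;
}
```

With helpers like these, the per-level setup loop would reduce to filling `lim`, `flim`, `mbflim` and `thr` from the two return values.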
--- a/vp8/common/opencl/loopfilter_cl.h
+++ b/vp8/common/opencl/loopfilter_cl.h
@@ -1,48 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef loopfilter_cl_h
-#define loopfilter_cl_h
-
-#include "../../../vpx_ports/mem.h"
-
-#include "../onyxc_int.h"
-#include "blockd_cl.h"
-#include "../loopfilter.h"
-
-#define prototype_loopfilter_cl(sym) \
-    void sym(MACROBLOCKD*, cl_mem src_base, int src_offset,  \
-             int pitch, const signed char *flimit, \
-             const signed char *limit, const signed char *thresh, int count, int block_cnt)
-
-#define prototype_loopfilter_block_cl(sym) \
-    void sym(MACROBLOCKD*, unsigned char *y, unsigned char *u, unsigned char *v,\
-             int ystride, int uv_stride, loop_filter_info *lfi, int simpler)
-
-extern void vp8_loop_filter_frame_cl
-(
-    VP8_COMMON *cm,
-    MACROBLOCKD *mbd,
-    int default_filt_lvl
-);
-
-extern prototype_loopfilter_block_cl(vp8_lf_normal_mb_v_cl);
-extern prototype_loopfilter_block_cl(vp8_lf_normal_b_v_cl);
-extern prototype_loopfilter_block_cl(vp8_lf_normal_mb_h_cl);
-extern prototype_loopfilter_block_cl(vp8_lf_normal_b_h_cl);
-extern prototype_loopfilter_block_cl(vp8_lf_simple_mb_v_cl);
-extern prototype_loopfilter_block_cl(vp8_lf_simple_b_v_cl);
-extern prototype_loopfilter_block_cl(vp8_lf_simple_mb_h_cl);
-extern prototype_loopfilter_block_cl(vp8_lf_simple_b_h_cl);
-
-typedef prototype_loopfilter_block_cl((*vp8_lf_block_cl_fn_t));
-
-#endif
--- a/vp8/common/opencl/loopfilter_filters_cl.c
+++ b/vp8/common/opencl/loopfilter_filters_cl.c
@@ -1,187 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#include <stdlib.h>
-
-#include <stdio.h>
-
-#include "vpx_ports/config.h"
-#include "vp8_opencl.h"
-#include "blockd_cl.h"
-
-//#include "loopfilter_cl.h"
-//#include "../onyxc_int.h"
-
-typedef unsigned char uc;
-
-static void vp8_loop_filter_cl_run(
-    cl_command_queue cq,
-    cl_kernel kernel,
-    cl_mem buf_mem,
-    int s_off,
-    int p,
-    const signed char *flimit,
-    const signed char *limit,
-    const signed char *thresh,
-    int count,
-    int block_cnt
-){
-    size_t global[] = {count,block_cnt};
-    int err;
-
-    cl_mem flimit_mem;
-    cl_mem limit_mem;
-    cl_mem thresh_mem;
-
-    VP8_CL_CREATE_BUF(cq, flimit_mem, , sizeof(uc)*16, flimit,, );
-    VP8_CL_CREATE_BUF(cq, limit_mem, , sizeof(uc)*16, limit,, );
-    VP8_CL_CREATE_BUF(cq, thresh_mem, , sizeof(uc)*16, thresh,, );
-
-    err = 0;
-    err = clSetKernelArg(kernel, 0, sizeof (cl_mem), &buf_mem);
-    err |= clSetKernelArg(kernel, 1, sizeof (cl_int), &s_off);
-    err |= clSetKernelArg(kernel, 2, sizeof (cl_int), &p);
-    err |= clSetKernelArg(kernel, 3, sizeof (cl_mem), &flimit_mem);
-    err |= clSetKernelArg(kernel, 4, sizeof (cl_mem), &limit_mem);
-    err |= clSetKernelArg(kernel, 5, sizeof (cl_mem), &thresh_mem);
-    err |= clSetKernelArg(kernel, 6, sizeof (cl_int), &block_cnt);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",,
-    );
-
-    /* Execute the kernel */
-    err = clEnqueueNDRangeKernel(cq, kernel, 2, NULL, global, NULL , 0, NULL, NULL);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to execute kernel!\n",
-        printf("err = %d\n",err);,
-    );
-
-    clReleaseMemObject(flimit_mem);
-    clReleaseMemObject(limit_mem);
-    clReleaseMemObject(thresh_mem);
-
-    VP8_CL_FINISH(cq);
-}
-
-void vp8_loop_filter_horizontal_edge_cl
-(
-    MACROBLOCKD *x,
-    cl_mem s_base,
-    int s_off,
-    int p, /* pitch */
-    const signed char *flimit,
-    const signed char *limit,
-    const signed char *thresh,
-    int count,
-    int block_cnt
-)
-{
-    vp8_loop_filter_cl_run(x->cl_commands,
-        cl_data.vp8_loop_filter_horizontal_edge_kernel, s_base, s_off,
-        p, flimit, limit, thresh, count*8, block_cnt
-    );
-}
-
-void vp8_loop_filter_vertical_edge_cl
-(
-    MACROBLOCKD *x,
-    cl_mem s_base,
-    int s_off,
-    int p,
-    const signed char *flimit,
-    const signed char *limit,
-    const signed char *thresh,
-    int count,
-    int block_cnt
-)
-{
-    vp8_loop_filter_cl_run(x->cl_commands,
-        cl_data.vp8_loop_filter_vertical_edge_kernel, s_base, s_off,
-        p, flimit, limit, thresh, count*8, block_cnt
-    );
-}
-
-void vp8_mbloop_filter_horizontal_edge_cl
-(
-    MACROBLOCKD *x,
-    cl_mem s_base,
-    int s_off,
-    int p,
-    const signed char *flimit,
-    const signed char *limit,
-    const signed char *thresh,
-    int count,
-    int block_cnt
-)
-{
-    vp8_loop_filter_cl_run(x->cl_commands,
-        cl_data.vp8_mbloop_filter_horizontal_edge_kernel, s_base, s_off,
-        p, flimit, limit, thresh, count*8, block_cnt
-    );
-}
-
-
-void vp8_mbloop_filter_vertical_edge_cl
-(
-    MACROBLOCKD *x,
-    cl_mem s_base,
-    int s_off,
-    int p,
-    const signed char *flimit,
-    const signed char *limit,
-    const signed char *thresh,
-    int count,
-    int block_cnt
-)
-{
-    vp8_loop_filter_cl_run(x->cl_commands,
-        cl_data.vp8_mbloop_filter_vertical_edge_kernel, s_base, s_off,
-        p, flimit, limit, thresh, count*8, block_cnt
-    );
-}
-
-void vp8_loop_filter_simple_horizontal_edge_cl
-(
-    MACROBLOCKD *x,
-    cl_mem s_base,
-    int s_off,
-    int p,
-    const signed char *flimit,
-    const signed char *limit,
-    const signed char *thresh,
-    int count,
-    int block_cnt
-)
-{
-    vp8_loop_filter_cl_run(x->cl_commands,
-        cl_data.vp8_loop_filter_simple_horizontal_edge_kernel, s_base, s_off,
-        p, flimit, limit, thresh, count*8, block_cnt
-    );
-}
-
-void vp8_loop_filter_simple_vertical_edge_cl
-(
-    MACROBLOCKD *x,
-    cl_mem s_base,
-    int s_off,
-    int p,
-    const signed char *flimit,
-    const signed char *limit,
-    const signed char *thresh,
-    int count,
-    int block_cnt
-)
-{
-    vp8_loop_filter_cl_run(x->cl_commands,
-        cl_data.vp8_loop_filter_simple_vertical_edge_kernel, s_base, s_off,
-        p, flimit, limit, thresh, count*8, block_cnt
-    );
-}
--- a/vp8/common/opencl/opencl_systemdependent.c
+++ b/vp8/common/opencl/opencl_systemdependent.c
@@ -1,41 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-#include "vpx_ports/config.h"
-#include "../subpixel.h"
-#include "subpixel_cl.h"
-#include "../onyxc_int.h"
-#include "vp8_opencl.h"
-
-#if HAVE_DLOPEN
-#include "dynamic_cl.h"
-#endif
-
-void vp8_arch_opencl_common_init(VP8_COMMON *ctx)
-{
-
-#if HAVE_DLOPEN
-
-#if WIN32 //Windows .dll has no lib prefix and no extension
-        cl_loaded = load_cl("OpenCL");
-#else   //But *nix needs full name
-        cl_loaded = load_cl("libOpenCL.so");
-#endif
-
-        if (cl_loaded == CL_SUCCESS)
-            cl_initialized = cl_common_init();
-        else
-            cl_initialized = VP8_CL_TRIED_BUT_FAILED;
-
-#else //!HAVE_DLOPEN (e.g. Apple)
-        cl_initialized = cl_common_init();
-#endif
-
-}
--- a/vp8/common/opencl/reconinter_cl.c
+++ b/vp8/common/opencl/reconinter_cl.c
@@ -1,641 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-//for the decoder, all subpixel prediction is done in this file.
-//
-//Need to determine some sort of mechanism for easily determining SIXTAP/BILINEAR
-//and what arguments to feed into the kernels. These kernels SHOULD be 2-pass,
-//and ideally there'd be a data structure that determined what static arguments
-//to pass in.
-//
-//Also, the only external functions being called here are the subpixel prediction
-//functions. Hopefully this means no worrying about when to copy data back/forth.
-
-#include "../../../vpx_ports/config.h"
-//#include "../recon.h"
-#include "../subpixel.h"
-//#include "../blockd.h"
-//#include "../reconinter.h"
-#if CONFIG_RUNTIME_CPU_DETECT
-//#include "../onyxc_int.h"
-#endif
-
-#include "vp8_opencl.h"
-#include "filter_cl.h"
-#include "reconinter_cl.h"
-#include "blockd_cl.h"
-
-#include <stdio.h>
-
-/* use this define on systems where unaligned int reads and writes are
- * not allowed, i.e. ARM architectures
- */
-/*#define MUST_BE_ALIGNED*/
-
-static const int bbb[4] = {0, 2, 8, 10};
-
-static void vp8_memcpy(
-    unsigned char *src_base,
-    int src_offset,
-    int src_stride,
-    unsigned char *dst_base,
-    int dst_offset,
-    int dst_stride,
-    int num_bytes,
-    int num_iter
-){
-
-    int i,r;
-    unsigned char *src = &src_base[src_offset];
-    unsigned char *dst = &dst_base[dst_offset];
-    src_offset = dst_offset = 0;
-
-    for (r = 0; r < num_iter; r++){
-        for (i = 0; i < num_bytes; i++){
-            src_offset = r*src_stride + i;
-            dst_offset = r*dst_stride + i;
-            dst[dst_offset] = src[src_offset];
-        }
-    }
-}
-
-static void vp8_copy_mem_cl(
-    cl_command_queue cq,
-    cl_mem src_mem,
-    int *src_offsets,
-    int src_stride,
-    cl_mem dst_mem,
-    int *dst_offsets,
-    int dst_stride,
-    int num_bytes,
-    int num_iter,
-    int num_blocks
-){
-
-    int err,block;
-
-#if MEM_COPY_KERNEL
-    size_t global[3] = {num_bytes, num_iter, num_blocks};
-
-    size_t local[3];
-    local[0] = global[0];
-    local[1] = global[1];
-    local[2] = global[2];
-
-    err  = clSetKernelArg(cl_data.vp8_memcpy_kernel, 0, sizeof (cl_mem), &src_mem);
-    err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 2, sizeof (int), &src_stride);
-    err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 3, sizeof (cl_mem), &dst_mem);
-    err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 5, sizeof (int), &dst_stride);
-    err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 6, sizeof (int), &num_bytes);
-    err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 7, sizeof (int), &num_iter);
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-        "Error: Failed to set kernel arguments!\n",
-        return,
-    );
-
-    for (block = 0; block < num_blocks; block++){
-
-        /* Set kernel arguments */
-        err = clSetKernelArg(cl_data.vp8_memcpy_kernel, 1, sizeof (int), &src_offsets[block]);
-        err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 4, sizeof (int), &dst_offsets[block]);
-        VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-            "Error: Failed to set kernel arguments!\n",
-            return,
-        );
-
-        /* Execute the kernel */
-        if (num_bytes * num_iter > cl_data.vp8_memcpy_kernel_size){
-            err = clEnqueueNDRangeKernel( cq, cl_data.vp8_memcpy_kernel, 2, NULL, global, NULL , 0, NULL, NULL);
-        } else {
-            err = clEnqueueNDRangeKernel( cq, cl_data.vp8_memcpy_kernel, 2, NULL, global, local , 0, NULL, NULL);
-        }
-
-        VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
-            "Error: Failed to execute kernel!\n",
-            return,
-        );
-    }
-#else
-    int iter;
-    for (block=0; block < num_blocks; block++){
-        for (iter = 0; iter < num_iter; iter++){
-            err = clEnqueueCopyBuffer(cq, src_mem, dst_mem,
-                    src_offsets[block]+iter*src_stride,
-                    dst_offsets[block]+iter*dst_stride,
-                    num_bytes, 0, NULL, NULL
-                  );
-            VP8_CL_CHECK_SUCCESS(cq, err != CL_SUCCESS, "Error copying between buffers\n",
-                    ,
-            );
-        }
-    }
-#endif
-}
-
-static void vp8_build_inter_predictors_b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
-{
-    unsigned char *ptr_base = *(d->base_pre);
-    int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-
-    vp8_subpix_cl_fn_t sppf;
-
-    int pre_dist = *d->base_pre - x->pre.buffer_alloc;
-    cl_mem pre_mem = x->pre.buffer_mem;
-    int pre_off = pre_dist+ptr_offset;
-
-    if (d->sixtap_filter == CL_TRUE)
-        sppf = vp8_sixtap_predict4x4_cl;
-    else
-        sppf = vp8_bilinear_predict4x4_cl;
-
-    //ptr_base a.k.a. d->base_pre is the start of the
-    //Macroblock's y_buffer, u_buffer, or v_buffer
-
-    if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
-    {
-        sppf(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
-    }
-    else
-    {
-        vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride,d->cl_predictor_mem, &d->predictor_offset,pitch,4,4,1);
-    }
-}
-
-
-static void vp8_build_inter_predictors4b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
-{
-    unsigned char *ptr_base = *(d->base_pre);
-    int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-
-    int pre_dist = *d->base_pre - x->pre.buffer_alloc;
-    cl_mem pre_mem = x->pre.buffer_mem;
-    int pre_off = pre_dist + ptr_offset;
-
-    //If there's motion in the bottom 8 subpixels, need to do subpixel prediction
-    if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
-    {
-            if (d->sixtap_filter == CL_TRUE)
-                vp8_sixtap_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
-            else
-                vp8_bilinear_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
-    }
-    //Otherwise copy memory directly from src to dest
-    else
-    {
-        vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride, d->cl_predictor_mem, &d->predictor_offset, pitch, 8, 8, 1);
-    }
-
-
-}
-
-static void vp8_build_inter_predictors2b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
-{
-    unsigned char *ptr_base = *(d->base_pre);
-
-    int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-
-    int pre_dist = *d->base_pre - x->pre.buffer_alloc;
-    cl_mem pre_mem = x->pre.buffer_mem;
-    int pre_off = pre_dist+ptr_offset;
-
-    if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
-    {
-        if (d->sixtap_filter == CL_TRUE)
-            vp8_sixtap_predict8x4_cl(d->cl_commands,ptr_base,pre_mem,pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
-        else
-            vp8_bilinear_predict8x4_cl(d->cl_commands,ptr_base,pre_mem,pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
-    }
-    else
-    {
-        vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride, d->cl_predictor_mem, &d->predictor_offset, pitch, 8, 4, 1);
-    }
-}
-
-
-void vp8_build_inter_predictors_mbuv_cl(MACROBLOCKD *x)
-{
-    int i;
-
-    vp8_cl_mb_prep(x, PREDICTOR|PRE_BUF);
-
-#if !ONE_CQ_PER_MB
-    VP8_CL_FINISH(x->cl_commands);
-#endif
-
-    if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
-        x->mode_info_context->mbmi.mode != SPLITMV)
-    {
-
-        unsigned char *pred_base = x->predictor;
-        int upred_offset = 256;
-        int vpred_offset = 320;
-
-        int mv_row = x->block[16].bmi.mv.as_mv.row;
-        int mv_col = x->block[16].bmi.mv.as_mv.col;
-        int offset;
-
-        unsigned char *pre_base = x->pre.buffer_alloc;
-        cl_mem pre_mem = x->pre.buffer_mem;
-        int upre_off = x->pre.u_buffer - pre_base;
-        int vpre_off = x->pre.v_buffer - pre_base;
-        int pre_stride = x->block[16].pre_stride;
-
-        offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
-
-        if ((mv_row | mv_col) & 7)
-        {
-            if (cl_initialized == CL_SUCCESS && x->sixtap_filter == CL_TRUE){
-                vp8_sixtap_predict8x8_cl(x->block[16].cl_commands,pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
-                vp8_sixtap_predict8x8_cl(x->block[20].cl_commands,pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
-            }
-            else{
-                vp8_bilinear_predict8x8_cl(x->block[16].cl_commands,pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
-                vp8_bilinear_predict8x8_cl(x->block[20].cl_commands,pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
-            }
-        }
-        else
-        {
-            int pre_offsets[2] = {upre_off+offset, vpre_off+offset};
-            int pred_offsets[2] = {upred_offset,vpred_offset};
-            vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, pre_offsets, pre_stride, x->cl_predictor_mem, pred_offsets, 8, 8, 8, 2);
-        }
-    }
-    else
-    {
-        // Can probably batch these operations as well, but not tested in decoder
-        // (or at least the test videos I've been using).
-        for (i = 16; i < 24; i += 2)
-        {
-            BLOCKD *d0 = &x->block[i];
-            BLOCKD *d1 = &x->block[i+1];
-            if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
-                vp8_build_inter_predictors2b_cl(x, d0, 8);
-            else
-            {
-                vp8_build_inter_predictors_b_cl(x, d0, 8);
-                vp8_build_inter_predictors_b_cl(x, d1, 8);
-            }
-        }
-    }
-
-#if !ONE_CQ_PER_MB
-    VP8_CL_FINISH(x->block[0].cl_commands);
-    VP8_CL_FINISH(x->block[16].cl_commands);
-    VP8_CL_FINISH(x->block[20].cl_commands);
-#endif
-
-    vp8_cl_mb_finish(x, PREDICTOR);
-}
-
-void vp8_build_inter_predictors_mb_cl(MACROBLOCKD *x)
-{
-    //If CL is running in encoder, need to call following before proceeding.
-    //vp8_cl_mb_prep(x, PRE_BUF);
-
-#if !ONE_CQ_PER_MB
-    VP8_CL_FINISH(x->cl_commands);
-#endif
-
-    if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
-        x->mode_info_context->mbmi.mode != SPLITMV)
-    {
-        int offset;
-        unsigned char *pred_base = x->predictor;
-        int upred_offset = 256;
-        int vpred_offset = 320;
-
-        int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
-        int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
-        int pre_stride = x->block[0].pre_stride;
-
-        unsigned char *pre_base = x->pre.buffer_alloc;
-        cl_mem pre_mem = x->pre.buffer_mem;
-        int ypre_off = x->pre.y_buffer - pre_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
-        int upre_off = x->pre.u_buffer - pre_base;
-        int vpre_off = x->pre.v_buffer - pre_base;
-
-        if ((mv_row | mv_col) & 7)
-        {
-            if (cl_initialized == CL_SUCCESS && x->sixtap_filter == CL_TRUE){
-                vp8_sixtap_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, 0, 16);
-            }
-            else
-                vp8_bilinear_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem,  ypre_off, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, 0, 16);
-        }
-        else
-        {
-            //16x16 copy
-            int pred_off = 0;
-            vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &ypre_off, pre_stride, x->cl_predictor_mem, &pred_off, 16, 16, 16, 1);
-        }
-
-
-        mv_row = x->block[16].bmi.mv.as_mv.row;
-        mv_col = x->block[16].bmi.mv.as_mv.col;
-        pre_stride >>= 1;
-        offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
-
-        if ((mv_row | mv_col) & 7)
-        {
-            if (x->sixtap_filter == CL_TRUE){
-                vp8_sixtap_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
-                vp8_sixtap_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
-            }
-            else {
-                vp8_bilinear_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
-                vp8_bilinear_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
-            }
-        }
-        else
-        {
-            int pre_off = upre_off + offset;
-            vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, &pre_off, pre_stride, x->cl_predictor_mem, &upred_offset, 8, 8, 8, 1);
-            pre_off = vpre_off + offset;
-            vp8_copy_mem_cl(x->block[20].cl_commands, pre_mem, &pre_off, pre_stride, x->cl_predictor_mem, &vpred_offset, 8, 8, 8, 1);
-        }
-    }
-    else
-    {
-        int i;
-
-        if (x->mode_info_context->mbmi.partitioning < 3)
-        {
-            for (i = 0; i < 4; i++)
-            {
-                BLOCKD *d = &x->block[bbb[i]];
-                vp8_build_inter_predictors4b_cl(x, d, 16);
-            }
-        }
-        else
-        {
-            /* This loop can be done in any order... No dependencies.*/
-            /* Also, d0/d1 can be decoded simultaneously */
-            for (i = 0; i < 16; i += 2)
-            {
-                BLOCKD *d0 = &x->block[i];
-                BLOCKD *d1 = &x->block[i+1];
-
-                if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
-                    vp8_build_inter_predictors2b_cl(x, d0, 16);
-                else
-                {
-                    vp8_build_inter_predictors_b_cl(x, d0, 16);
-                    vp8_build_inter_predictors_b_cl(x, d1, 16);
-                }
-            }
-        }
-
-        /* Another case of re-orderable/batchable loop */
-        for (i = 16; i < 24; i += 2)
-        {
-            BLOCKD *d0 = &x->block[i];
-            BLOCKD *d1 = &x->block[i+1];
-
-            if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
-                vp8_build_inter_predictors2b_cl(x, d0, 8);
-            else
-            {
-                vp8_build_inter_predictors_b_cl(x, d0, 8);
-                vp8_build_inter_predictors_b_cl(x, d1, 8);
-            }
-        }
-    }
-
-#if !ONE_CQ_PER_MB
-    VP8_CL_FINISH(x->block[0].cl_commands);
-    VP8_CL_FINISH(x->block[16].cl_commands);
-    VP8_CL_FINISH(x->block[20].cl_commands);
-#endif
-
-    vp8_cl_mb_finish(x, PREDICTOR);
-}
-
-
-/* The following functions are written for skip_recon_mb() to call. Since there is no recon in this
- * situation, we can write the result directly to dst buffer instead of writing it to predictor
- * buffer and then copying it to dst buffer.
- */
-static void vp8_build_inter_predictors_b_s_cl(MACROBLOCKD *x, BLOCKD *d, int dst_offset)
-{
-    unsigned char *ptr_base = *(d->base_pre);
-    int dst_stride = d->dst_stride;
-    int pre_stride = d->pre_stride;
-    int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-    vp8_subpix_cl_fn_t sppf;
-
-    int pre_dist = *d->base_pre - x->pre.buffer_alloc;
-    cl_mem pre_mem = x->pre.buffer_mem;
-    cl_mem dst_mem = x->dst.buffer_mem;
-
-    if (d->sixtap_filter == CL_TRUE){
-        sppf = vp8_sixtap_predict4x4_cl;
-    } else
-        sppf = vp8_bilinear_predict4x4_cl;
-        
-    if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
-    {
-        sppf(d->cl_commands, ptr_base, pre_mem, pre_dist+ptr_offset, pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, NULL, dst_mem, dst_offset, dst_stride);
-    }
-    else
-    {
-        int pre_off = pre_dist+ptr_offset;
-        vp8_copy_mem_cl(d->cl_commands, pre_mem,&pre_off,pre_stride, dst_mem, &dst_offset,dst_stride,4,4,1);
-    }
-}
-
-
-void vp8_build_inter_predictors_mb_s_cl(MACROBLOCKD *x)
-{
-    cl_mem dst_mem = NULL;
-    cl_mem pre_mem = x->pre.buffer_mem;
-
-    unsigned char *dst_base = x->dst.buffer_alloc;
-    int ydst_off = x->dst.y_buffer - dst_base;
-    int udst_off = x->dst.u_buffer - dst_base;
-    int vdst_off = x->dst.v_buffer - dst_base;
-
-    dst_mem = x->dst.buffer_mem;
-    vp8_cl_mb_prep(x, DST_BUF);
-
-#if !ONE_CQ_PER_MB
-    VP8_CL_FINISH(x->cl_commands);
-#endif
-
-    if (x->mode_info_context->mbmi.mode != SPLITMV)
-    {
-        int offset;
-        unsigned char *pre_base = x->pre.buffer_alloc;
-        int ypre_off = x->pre.y_buffer - pre_base;
-        int upre_off = x->pre.u_buffer - pre_base;
-        int vpre_off = x->pre.v_buffer - pre_base;
-
-        int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
-        int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
-        int pre_stride = x->dst.y_stride;
-
-        int ptr_offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
-
-        if ((mv_row | mv_col) & 7)
-        {
-            if (x->sixtap_filter == CL_TRUE){
-                vp8_sixtap_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off+ptr_offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
-            }
-            else
-                vp8_bilinear_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off+ptr_offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
-        }
-        else
-        {
-            int pre_off = ypre_off+ptr_offset;
-            vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 16, 16, 1);
-        }
-
-        mv_row = x->block[16].bmi.mv.as_mv.row;
-        mv_col = x->block[16].bmi.mv.as_mv.col;
-        pre_stride >>= 1;
-        offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
-
-        if ((mv_row | mv_col) & 7)
-        {
-            if (x->sixtap_filter == CL_TRUE){
-                vp8_sixtap_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, udst_off, x->dst.uv_stride);
-                vp8_sixtap_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, vdst_off, x->dst.uv_stride);
-            } else {
-                vp8_bilinear_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, udst_off, x->dst.uv_stride);
-                vp8_bilinear_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, vdst_off, x->dst.uv_stride);
-            }
-        }
-        else
-        {
-            int pre_offsets[2] = {upre_off+offset, vpre_off+offset};
-            int dst_offsets[2] = {udst_off,vdst_off};
-            vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, pre_offsets, pre_stride, dst_mem, dst_offsets, x->dst.uv_stride, 8, 8, 2);
-        }
-
-    }
-    else
-    {
-        /* Note: this whole ELSE branch is never executed, so there is no way to test the correctness of my modification.
-         * If something turns out to be wrong later, go back to what is in build_inter_predictors_mb.
-         *
-         * ACW: Not sure who the above comment belongs to, but it is
-         *      accurate for the decoder. Verified by reverse trace of source
-         */
-        int i;
-
-        if (x->mode_info_context->mbmi.partitioning < 3)
-        {
-            for (i = 0; i < 4; i++)
-            {
-                BLOCKD *d = &x->block[bbb[i]];
-
-                {
-                    unsigned char *ptr_base = *(d->base_pre);
-                    int pre_off = ptr_base - x->pre.buffer_alloc;
-                    
-                    int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-
-                    pre_off += ptr_offset;
-
-                    if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
-                    {
-                        if (x->sixtap_filter == CL_TRUE)
-                            vp8_sixtap_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
-                        else
-                            vp8_bilinear_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
-                    }
-                    else
-                    {
-                        vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, d->pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 8, 8, 1);
-                    }
-                }
-            }
-        }
-        else
-        {
-            for (i = 0; i < 16; i += 2)
-            {
-                BLOCKD *d0 = &x->block[i];
-                BLOCKD *d1 = &x->block[i+1];
-
-                if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
-                {
-                    /*vp8_build_inter_predictors2b(x, d0, 16);*/
-                    unsigned char *ptr_base = *(d0->base_pre);
-
-                    int pre_off = ptr_base - x->pre.buffer_alloc;
-
-                    int ptr_offset = d0->pre + (d0->bmi.mv.as_mv.row >> 3) * d0->pre_stride + (d0->bmi.mv.as_mv.col >> 3);
-                    pre_off += ptr_offset;
-
-                    if ( (d0->bmi.mv.as_mv.row | d0->bmi.mv.as_mv.col) & 7)
-                    {
-                        if (d0->sixtap_filter == CL_TRUE)
-                            vp8_sixtap_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride, d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
-                        else
-                            vp8_bilinear_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem,pre_off, d0->pre_stride, d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
-                    }
-                    else
-                    {
-                        vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, d0->pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 8, 4, 1);
-                    }
-                }
-                else
-                {
-                    vp8_build_inter_predictors_b_s_cl(x,d0, ydst_off);
-                    vp8_build_inter_predictors_b_s_cl(x,d1, ydst_off);
-                }
-            }
-        }
-
-        for (i = 16; i < 24; i += 2)
-        {
-            BLOCKD *d0 = &x->block[i];
-            BLOCKD *d1 = &x->block[i+1];
-
-            if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
-            {
-                /*vp8_build_inter_predictors2b(x, d0, 8);*/
-                unsigned char *ptr_base = *(d0->base_pre);
-                int ptr_offset = d0->pre + (d0->bmi.mv.as_mv.row >> 3) * d0->pre_stride + (d0->bmi.mv.as_mv.col >> 3);
-                int pre_off = ptr_base - x->pre.buffer_alloc + ptr_offset;
-
-                if ( (d0->bmi.mv.as_mv.row | d0->bmi.mv.as_mv.col) & 7)
-                {
-                    if (d0->sixtap_filter == CL_TRUE)
-                        vp8_sixtap_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride,
-                            d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7,
-                            dst_base, dst_mem, ydst_off, x->dst.uv_stride);
-                    else
-                        vp8_bilinear_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride,
-                            d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7,
-                            dst_base, dst_mem, ydst_off, x->dst.uv_stride);
-                }
-                else
-                {
-                    vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off,
-                        d0->pre_stride, dst_mem, &ydst_off, x->dst.uv_stride, 8, 4, 1);
-                }
-            }
-            else
-            {
-                vp8_build_inter_predictors_b_s_cl(x,d0, ydst_off);
-                vp8_build_inter_predictors_b_s_cl(x,d1, ydst_off);
-            }
-        } //end for
-    }
-
-#if !ONE_CQ_PER_MB
-    VP8_CL_FINISH(x->block[0].cl_commands);
-    VP8_CL_FINISH(x->block[16].cl_commands);
-    VP8_CL_FINISH(x->block[20].cl_commands);
-#endif
-
-    vp8_cl_mb_finish(x, DST_BUF);
-}
--- a/vp8/common/opencl/subpixel_cl.h
+++ b/vp8/common/opencl/subpixel_cl.h
@@ -1,46 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef SUBPIXEL_CL_H
-#define SUBPIXEL_CL_H
-
-#include "../blockd.h"
-
-/* Note:
- *
- * This platform is commonly built for runtime CPU detection. If you modify
- * any of the function mappings present in this file, be sure to also update
- * them in the function pointer initialization code
- */
-
-#define prototype_subpixel_predict_cl(sym) \
-    void sym(cl_command_queue cq, unsigned char *src_base, cl_mem src_mem, int src_offset, \
-            int src_pitch, int xofst, int yofst, \
-             unsigned char *dst_base, cl_mem dst_mem, int dst_offset, int dst_pitch)
-
-extern prototype_subpixel_predict_cl(vp8_sixtap_predict16x16_cl);
-extern prototype_subpixel_predict_cl(vp8_sixtap_predict8x8_cl);
-extern prototype_subpixel_predict_cl(vp8_sixtap_predict8x4_cl);
-extern prototype_subpixel_predict_cl(vp8_sixtap_predict4x4_cl);
-extern prototype_subpixel_predict_cl(vp8_bilinear_predict16x16_cl);
-extern prototype_subpixel_predict_cl(vp8_bilinear_predict8x8_cl);
-extern prototype_subpixel_predict_cl(vp8_bilinear_predict8x4_cl);
-extern prototype_subpixel_predict_cl(vp8_bilinear_predict4x4_cl);
-
-typedef prototype_subpixel_predict_cl((*vp8_subpix_cl_fn_t));
-
-//typedef enum
-//{
-//    SIXTAP = 0,
-//    BILINEAR = 1
-//} SUBPIX_TYPE;
-
-#endif
--- a/vp8/common/opencl/vp8_opencl.c
+++ b/vp8/common/opencl/vp8_opencl.c
@@ -1,342 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-#include <stdio.h>
-#include <string.h>
-#include <stdlib.h>
-#include "vp8_opencl.h"
-
-int cl_initialized = VP8_CL_NOT_INITIALIZED;
-VP8_COMMON_CL cl_data;
-
-//Initialization functions for various CL programs.
-extern int cl_init_filter();
-extern int cl_init_idct();
-extern int cl_init_loop_filter();
-
-//Common CL destructors
-extern void cl_destroy_loop_filter();
-extern void cl_destroy_filter();
-extern void cl_destroy_idct();
-
-//Destructors for encoder/decoder-specific bits
-extern void cl_decode_destroy();
-extern void cl_encode_destroy();
-
-/**
- * 
- * @param cq
- * @param new_status
- */
-void cl_destroy(cl_command_queue cq, int new_status) {
-
-    if (cl_initialized != CL_SUCCESS)
-        return;
-
-    //Wait on any pending operations to complete... frees up all of our pointers
-    if (cq != NULL)
-        clFinish(cq);
-
-#if ENABLE_CL_SUBPIXEL
-    //Release the objects that we've allocated on the GPU
-    cl_destroy_filter();
-#endif
-
-#if ENABLE_CL_IDCT_DEQUANT
-    cl_destroy_idct();
-
-#if CONFIG_VP8_DECODER
-    if (cl_data.cl_decode_initialized == CL_SUCCESS)
-        cl_decode_destroy();
-#endif
-
-#endif
-#if ENABLE_CL_LOOPFILTER
-    cl_destroy_loop_filter();
-#endif
-
-
-#if CONFIG_VP8_ENCODER
-    //placeholder for if/when encoder CL gets implemented
-#endif
-
-    if (cq){
-        clReleaseCommandQueue(cq);
-    }
-
-    if (cl_data.context){
-        clReleaseContext(cl_data.context);
-        cl_data.context = NULL;
-    }
-
-    cl_initialized = new_status;
-
-    return;
-}
-
-/**
- * 
- * @param dev
- * @return
- */
-cl_device_type device_type(cl_device_id dev){
-    cl_device_type type;
-    int err;
-
-    err = clGetDeviceInfo(dev, CL_DEVICE_TYPE, sizeof(type),&type,NULL);
-    if (err != CL_SUCCESS)
-        return CL_INVALID_DEVICE;
-    return type;
-}
-
-/**
- * 
- * @return
- */
-int cl_common_init() {
-    int err,i,dev;
-    cl_platform_id platform_ids[MAX_NUM_PLATFORMS];
-    cl_uint num_found, num_devices;
-    cl_device_id devices[MAX_NUM_DEVICES];
-
-    //Don't allow multiple CL contexts..
-    if (cl_initialized != VP8_CL_NOT_INITIALIZED)
-        return cl_initialized;
-
-    // Connect to a compute device
-    err = clGetPlatformIDs(MAX_NUM_PLATFORMS, platform_ids, &num_found);
-
-    if (err != CL_SUCCESS) {
-        fprintf(stderr, "Couldn't query platform IDs\n");
-        return VP8_CL_TRIED_BUT_FAILED;
-    }
-
-    if (num_found == 0) {
-        fprintf(stderr, "No platforms found\n");
-        return VP8_CL_TRIED_BUT_FAILED;
-    }
-
-    //printf("Enumerating %d platform(s)\n", num_found);
-    //Enumerate the platforms found
-    for (i = 0; i < num_found; i++){
-    	char buf[2048];
-        size_t len;
-        
-    	err = clGetPlatformInfo( platform_ids[i], CL_PLATFORM_VENDOR, sizeof(buf), buf, &len);
-    	if (err != CL_SUCCESS){
-            fprintf(stderr, "Error retrieving platform vendor for platform %d",i);
-            continue;
-    	}
-    	//printf("Platform %d: %s\n",i,buf);
-
-        //If you need to force a platform (e.g. CPU-only testing), uncomment this
-        //if (strstr(buf,"NVIDIA"))
-        //    continue;
-
-    	//Try to find a valid compute device
-    	//Favor the GPU, but fall back to any other available device if necessary
-#ifdef __APPLE__
-    	printf("Apple system. Running CL as CPU-only for now...\n");
-        err = clGetDeviceIDs(platform_ids[i], CL_DEVICE_TYPE_CPU, MAX_NUM_DEVICES, devices, &num_devices);
-#else
-        err = clGetDeviceIDs(platform_ids[i], CL_DEVICE_TYPE_ALL, MAX_NUM_DEVICES, devices, &num_devices);
-#endif //__APPLE__
-        //printf("found %d devices\n", num_devices);
-        cl_data.device_id = NULL;
-        for( dev = 0; dev < num_devices; dev++ ){
-            char ext[2048];
-            //Get info for this device.
-            err = clGetDeviceInfo(devices[dev], CL_DEVICE_EXTENSIONS,
-                    sizeof(ext),ext,NULL);
-            VP8_CL_CHECK_SUCCESS(NULL,err != CL_SUCCESS,
-                    "Error retrieving device extension list",continue, 0);
-            //printf("Device %d supports: %s\n",dev,ext);
-            
-            //The kernels in VP8 require byte-addressable stores, which is an
-            //extension. It's required in OpenCL 1.1, but not all devices
-            //support it.
-            if (strstr(ext,"cl_khr_byte_addressable_store")){
-                //We found a valid device, so use it. But if we find a GPU
-                //(maybe this is one), prefer that.
-                cl_data.device_id = devices[dev];
-
-                if ( device_type(devices[dev]) == CL_DEVICE_TYPE_GPU ){
-                    //printf("Device %d is a GPU\n",dev);
-                    break;
-                }
-            }
-        }
-
-        //If we've found a usable GPU, stop looking.
-        if (cl_data.device_id != NULL && device_type(cl_data.device_id) == CL_DEVICE_TYPE_GPU )
-            break;
-
-    }
-
-    if (cl_data.device_id == NULL){
-    	printf("Error: Failed to find a valid OpenCL device. Using CPU paths\n");
-    	return VP8_CL_TRIED_BUT_FAILED;
-    }
-
-    // Create the compute context
-    cl_data.context = clCreateContext(0, 1, &cl_data.device_id, NULL, NULL, &err);
-    if (!cl_data.context) {
-        printf("Error: Failed to create a compute context!\n");
-        return VP8_CL_TRIED_BUT_FAILED;
-    }
-
-    //Initialize programs to null value
-    //Enables detection of if they've been initialized as well.
-    cl_data.filter_program = NULL;
-    cl_data.idct_program = NULL;
-    cl_data.loop_filter_program = NULL;
-
-#if ENABLE_CL_SUBPIXEL
-    err = cl_init_filter();
-    if (err != CL_SUCCESS)
-        return err;
-#endif
-
-#if ENABLE_CL_IDCT_DEQUANT
-    err = cl_init_idct();
-    if (err != CL_SUCCESS)
-        return err;
-#endif
-
-#if ENABLE_CL_LOOPFILTER
-
-    err = cl_init_loop_filter();
-    if (err != CL_SUCCESS)
-        return err;
-#endif
-
-    return CL_SUCCESS;
-}
-
-char *cl_read_file(const char* file_name) {
-    long pos;
-    char *bytes;
-    size_t amt_read;
-    FILE *f;
-
-    f = fopen(file_name, "rb");
-    
-    if (f == NULL) {
-        char *fullpath;
-        //printf("Couldn't find %s\n", file_name);
-
-        //Generate a file path for the CL sources using the library install dir
-        fullpath = malloc(strlen(vpx_codec_lib_dir()) + strlen(file_name) + 2);
-        if (fullpath == NULL) {
-           return NULL;
-        }
-        strcpy(fullpath, vpx_codec_lib_dir());
-        strcat(fullpath, "/"); //Will need to be changed for MSVS
-        strcat(fullpath, file_name);
-
-        //printf("Looking in %s\n", fullpath);
-
-        f = fopen(fullpath, "rb");
-        if (f == NULL) {
-            fprintf(stderr,"Couldn't find CL source at %s or %s\n", file_name, fullpath);
-            free(fullpath);
-            return NULL;
-        }
-
-        //printf("Found cl source at %s\n", fullpath);
-        free(fullpath);
-    } else {
-        //printf("Found cl source at %s\n", file_name);
-    }
-
-    fseek(f, 0, SEEK_END);
-    pos = ftell(f);
-    fseek(f, 0, SEEK_SET);
-    bytes = malloc(pos+1);
-
-    if (bytes == NULL) {
-        fclose(f);
-        return NULL;
-    }
-
-    amt_read = fread(bytes, pos, 1, f);
-    if (amt_read != 1) {
-        free(bytes);
-        fclose(f);
-        return NULL;
-    }
-
-    bytes[pos] = '\0'; //null terminate the source string
-    fclose(f);
-
-
-    return bytes;
-}
-
-void show_build_log(cl_program *prog_ref){
-    size_t len;
-    char *buffer;
-    int err = clGetProgramBuildInfo(*prog_ref, cl_data.device_id, CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
-
-    if (err != CL_SUCCESS){
-        printf("Error: Could not get length of CL build log\n");
-    }
-
-    buffer = (char*) malloc(len);
-    if (buffer == NULL) {
-        printf("Error: Couldn't allocate compile output buffer memory\n");
-    }
-
-    err = clGetProgramBuildInfo(*prog_ref, cl_data.device_id, CL_PROGRAM_BUILD_LOG, len, buffer, NULL);
-    if (err != CL_SUCCESS) {
-        printf("Error: Could not get CL build log\n");
-
-    } else {
-        printf("Compile output: %s\n", buffer);
-    }
-    free(buffer);
-}
-
-int cl_load_program(cl_program *prog_ref, const char *file_name, const char *opts) {
-
-    int err;
-    char *kernel_src = cl_read_file(file_name);
-    
-    *prog_ref = NULL;
-    if (kernel_src != NULL) {
-        *prog_ref = clCreateProgramWithSource(cl_data.context, 1, (const char**)&kernel_src, NULL, &err);
-        free(kernel_src);
-    } else {
-        cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
-        printf("Couldn't find OpenCL source files. \nUsing software path.\n");
-        return VP8_CL_TRIED_BUT_FAILED;
-    }
-
-    if (*prog_ref == NULL) {
-        printf("Error: Couldn't create program\n");
-        return VP8_CL_TRIED_BUT_FAILED;
-    }
-
-    if (err != CL_SUCCESS) {
-        printf("Error creating program: %d\n", err);
-    }
-
-    /* Build the program executable */
-    err = clBuildProgram(*prog_ref, 0, NULL, opts, NULL, NULL);
-    if (err != CL_SUCCESS) {
-        printf("Error: Failed to build program executable for %s!\n", file_name);
-
-        show_build_log(prog_ref);
-
-        return VP8_CL_TRIED_BUT_FAILED;
-    }
-
-    return CL_SUCCESS;
-}
--- a/vp8/common/opencl/vp8_opencl.h
+++ b/vp8/common/opencl/vp8_opencl.h
@@ -1,192 +0,0 @@
-/*
- *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-#ifndef VP8_OPENCL_H
-#define	VP8_OPENCL_H
-
-#ifdef	__cplusplus
-extern "C" {
-#endif
-
-#include "../../../vpx_config.h"
-
-#ifdef __APPLE__
-#include <OpenCL/cl.h>
-#else
-#include <CL/cl.h>
-#endif
-
-#if HAVE_DLOPEN
-#include "dynamic_cl.h"
-#endif
-
-#define ENABLE_CL_IDCT_DEQUANT 0
-#define ENABLE_CL_SUBPIXEL 1
-#define TWO_PASS_SIXTAP 0
-#define MEM_COPY_KERNEL 1
-#define ONE_CQ_PER_MB 1 //Value of 0 is racey... still experimental.
-#define ENABLE_CL_LOOPFILTER 0
-
-extern char *cl_read_file(const char* file_name);
-extern int cl_common_init();
-extern void cl_destroy(cl_command_queue cq, int new_status);
-extern int cl_load_program(cl_program *prog_ref, const char *file_name, const char *opts);
-
-#define MAX_NUM_PLATFORMS 4
-#define MAX_NUM_DEVICES 10
-
-#define VP8_CL_TRIED_BUT_FAILED 1
-#define VP8_CL_NOT_INITIALIZED -1
-extern int cl_initialized;
-
-extern const char *vpx_codec_lib_dir(void);
-
-#define VP8_CL_FINISH(cq) \
-    if (cl_initialized == CL_SUCCESS){ \
-        /* Wait for kernels to finish. */ \
-        clFinish(cq); \
-    }
-
-#define VP8_CL_BARRIER(cq) \
-    if (cl_initialized == CL_SUCCESS){ \
-        /* Insert a barrier into the command queue. */ \
-        clEnqueueBarrier(cq); \
-    }
-
-#define VP8_CL_CHECK_SUCCESS(cq,cond,msg,alt,retCode) \
-    if ( cond ){ \
-        fprintf(stderr, msg);  \
-        cl_destroy(cq, VP8_CL_TRIED_BUT_FAILED); \
-        alt; \
-        return retCode; \
-    }
-
-#define VP8_CL_CALC_LOCAL_SIZE(kernel, kernel_size) \
-    err = clGetKernelWorkGroupInfo( cl_data.kernel, \
-  	cl_data.device_id, \
-  	CL_KERNEL_WORK_GROUP_SIZE, \
-  	sizeof(size_t), \
-  	&cl_data.kernel_size, \
-  	NULL);\
-    VP8_CL_CHECK_SUCCESS(NULL, err != CL_SUCCESS, \
-        "Error: Failed to calculate local size of kernel!\n", \
-        ,\
-        VP8_CL_TRIED_BUT_FAILED \
-    ); \
-
-#define VP8_CL_CREATE_KERNEL(data,program,name,str_name) \
-    data.name = clCreateKernel(data.program, str_name , &err); \
-    VP8_CL_CHECK_SUCCESS(NULL, err != CL_SUCCESS || !data.name, \
-        "Error: Failed to create compute kernel "#str_name"!\n", \
-        ,\
-        VP8_CL_TRIED_BUT_FAILED \
-    );
-
-#define VP8_CL_READ_BUF(cq, bufRef, bufSize, dstPtr) \
-    err = clEnqueueReadBuffer(cq, bufRef, CL_FALSE, 0, bufSize , dstPtr, 0, NULL, NULL); \
-    VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS, \
-        "Error: Failed to read from GPU!\n",, err \
-    ); \
-
-#define VP8_CL_SET_BUF(cq, bufRef, bufSize, dataPtr, altPath, retCode) \
-    { \
-        err = clEnqueueWriteBuffer(cq, bufRef, CL_FALSE, 0, \
-            bufSize, dataPtr, 0, NULL, NULL); \
-        \
-        VP8_CL_CHECK_SUCCESS(cq, err != CL_SUCCESS, \
-            "Error: Failed to write to buffer!\n", \
-            altPath, retCode\
-        ); \
-    } \
-
-#define VP8_CL_CREATE_BUF(cq, bufRef, bufType, bufSize, dataPtr, altPath, retCode) \
-    bufRef = clCreateBuffer(cl_data.context, CL_MEM_READ_WRITE, bufSize, NULL, NULL); \
-    if (dataPtr != NULL && bufRef != NULL){ \
-        VP8_CL_SET_BUF(cq, bufRef, bufSize, dataPtr, altPath, retCode)\
-    } \
-    VP8_CL_CHECK_SUCCESS(cq, !bufRef, \
-        "Error: Failed to allocate buffer. Using CPU path!\n", \
-        altPath, retCode\
-    ); \
-
-#define VP8_CL_RELEASE_KERNEL(kernel) \
-    if (kernel) \
-        clReleaseKernel(kernel); \
-    kernel = NULL;
-
-typedef struct VP8_COMMON_CL {
-    cl_device_id device_id; // compute device id
-    cl_context context; // compute context
-    //cl_command_queue commands; // compute command queue
-
-    cl_program filter_program; // compute program for subpixel/bilinear filters
-    cl_kernel vp8_sixtap_predict_kernel;
-    size_t    vp8_sixtap_predict_kernel_size;
-    cl_kernel vp8_sixtap_predict8x4_kernel;
-    size_t    vp8_sixtap_predict8x4_kernel_size;
-    cl_kernel vp8_sixtap_predict8x8_kernel;
-    size_t    vp8_sixtap_predict8x8_kernel_size;
-    cl_kernel vp8_sixtap_predict16x16_kernel;
-    size_t    vp8_sixtap_predict16x16_kernel_size;
-
-    cl_kernel vp8_bilinear_predict4x4_kernel;
-    cl_kernel vp8_bilinear_predict8x4_kernel;
-    cl_kernel vp8_bilinear_predict8x8_kernel;
-    cl_kernel vp8_bilinear_predict16x16_kernel;
-
-    cl_kernel vp8_filter_block2d_first_pass_kernel;
-    size_t    vp8_filter_block2d_first_pass_kernel_size;
-    cl_kernel vp8_filter_block2d_second_pass_kernel;
-    size_t    vp8_filter_block2d_second_pass_kernel_size;
-
-    cl_kernel vp8_filter_block2d_bil_first_pass_kernel;
-    size_t    vp8_filter_block2d_bil_first_pass_kernel_size;
-    cl_kernel vp8_filter_block2d_bil_second_pass_kernel;
-    size_t    vp8_filter_block2d_bil_second_pass_kernel_size;
-
-    cl_kernel vp8_memcpy_kernel;
-    size_t    vp8_memcpy_kernel_size;
-    cl_kernel vp8_memset_short_kernel;
-
-    cl_program idct_program;
-    cl_kernel vp8_short_inv_walsh4x4_1_kernel;
-    cl_kernel vp8_short_inv_walsh4x4_1st_pass_kernel;
-    cl_kernel vp8_short_inv_walsh4x4_2nd_pass_kernel;
-    cl_kernel vp8_dc_only_idct_add_kernel;
-    //Note that the following 2 kernels are encoder-only. Not used in decoder.
-    cl_kernel vp8_short_idct4x4llm_1_kernel;
-    cl_kernel vp8_short_idct4x4llm_kernel;
-
-    cl_program loop_filter_program;
-    cl_kernel vp8_loop_filter_horizontal_edge_kernel;
-    cl_kernel vp8_loop_filter_vertical_edge_kernel;
-    cl_kernel vp8_mbloop_filter_horizontal_edge_kernel;
-    cl_kernel vp8_mbloop_filter_vertical_edge_kernel;
-    cl_kernel vp8_loop_filter_simple_horizontal_edge_kernel;
-    cl_kernel vp8_loop_filter_simple_vertical_edge_kernel;
-
-    cl_program dequant_program;
-    cl_kernel vp8_dequant_dc_idct_add_kernel;
-    cl_kernel vp8_dequant_idct_add_kernel;
-    cl_kernel vp8_dequantize_b_kernel;
-
-    cl_int cl_decode_initialized;
-    cl_int cl_encode_initialized;
-    
-} VP8_COMMON_CL;
-
-extern VP8_COMMON_CL cl_data;
-
-#ifdef	__cplusplus
-}
-#endif
-
-#endif	/* VP8_OPENCL_H */
-
--- a/vp8/decoder/opencl/opencl_systemdependent.c
+++ b/vp8/decoder/opencl/opencl_systemdependent.c
@@ -9,17 +9,11 @@
 */


-#include "vpx_ports/config.h"
-#include "vp8/decoder/onyxd_int.h"
+#ifndef __INC_PARTIALGFUPDATE_H
+#define __INC_PARTIALGFUPDATE_H

-#include "vp8/common/opencl/vp8_opencl.h"
-#include "vp8_decode_cl.h"
+#include "onyxc_int.h"

-void vp8_arch_opencl_decode_init(VP8D_COMP *pbi)
-{
+extern void update_gf_selective(ONYX_COMMON *cm, MACROBLOCKD *x);

-    if (cl_initialized == CL_SUCCESS){
-        cl_decode_init();
-    }
-
-}
+#endif
Author	SHA1	Message	Date
Stefan Holmer	3cf0ef4593	Added configure option to enable error-concealment. Disabled by default. Change-Id: I94580a5ecb13520195ea2b8a10ca11bb5a01d2a6	2011-04-29 14:08:47 +02:00
Stefan Holmer	0909b83427	Concealed MBs are always SPLITMV with partition=3. This can be optimized. Also changed the criterion for when to skip decoding the residual, now only skipping for blocks which actually is missing residual. Now using mvs_corrupt_from_mb for this decision since asking the bool decoder doesn't work (it has already finished decoding). Change-Id: I3175f11c84ae701fc2935ebe22e1d75297072eae	2011-04-29 13:50:15 +02:00
John Koleszar	62da6700dc	Update VP8DX_BOOL_DECODER_FILL to better detect EOS Allow more reliable detection of truncated bitstreams by being more precise with the count of "virtual" bits in the value buffer. Specifically, the VP8_LOTS_OF_BITS value is accumulated into count, rather than being assigned, which was losing the prior value, increasing the required tolerance when testing for the error condition. Change-Id: Ib5172eaa57323b939c439fff8a8ab5fa38da9b69	2011-04-29 11:22:09 +02:00
Stefan Holmer	98ea0d71a4	Added more descriptive comments and did some smaller refactoring. Also changed to setting the mb_skip_coeff flag when a macroblock needs to be concealed. Change-Id: I0bbf6de899f5b27f4a8ca0454da7e928e8b23919	2011-04-28 16:28:07 +02:00
Stefan Holmer	8d49ea12c2	Added correct handling of motion vectors outside frame boundaries. Change-Id: Ibf81e1d188d8dd6de877e1c52761fa212e848865	2011-04-20 12:08:27 +02:00
Stefan Holmer	766ad7edb6	Reverting some of the changes done in a64b37..., moving back the bool dec error check to vp8_decode_mb_row. Change-Id: I717ee57efc29b8e0619d6f00d1c64d0d20114a8b	2011-04-19 16:23:05 +02:00
Stefan Holmer	20431c1354	Forgot to remove two lines in previous submit Change-Id: Idbc0bc328cf2f99071008fd4a54ea00bac7beb94	2011-04-19 15:38:39 +02:00
Stefan Holmer	1b913c1f78	Refactored find_neighboring_blocks() and moved the test for corrupt stream and intra concealment inside vp8_decode_macroblock to be able to capture and conceal errors in the residual before reconstruction. Change-Id: Id0f0bd87945a9bb1db0c20bb5467e2ff9aae5d28	2011-04-19 15:33:46 +02:00
Stefan Holmer	a64b37fdbc	Added spatial motion vector interpolation. Used for intra blocks with missing residual coefficients. Change-Id: I3e765b5dee251362d1330ebbcf9fa22d852377a1	2011-04-19 12:45:51 +02:00
Stefan Holmer	a2951d8deb	Implemented a first version of the motion vector extrapolation error concealment algorithm. Tested on foreman_cif.yuv only. Some special cases are still not handled in a good way, for instance when receiving intra blocks without coefficients. Change-Id: Ie7bb41855860923b313645dacb3cf70f1e350549	2011-04-01 11:55:30 +02:00
Stefan Holmer	83a2b4e114	Added a first simple version of error-concealment Added a first very simple version of error-concealment which simply repeats the last decoded motion vector for corrupt MBs. Change-Id: Ia83e111649afe11870c3c66065977bd0610c4fa1	2011-02-01 17:30:51 +01:00
Henrik Lundin	1422ce5cff	Error concealment in decoder Implementing an error concealment in the VP8 decoder. Change-Id: I63934df71191ad0b1e65c89725d9e021e1d8d93d	2011-01-20 11:22:50 +01:00
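
The message for 62da6700dc above says the "virtual" bit marker is now accumulated into the count rather than assigned. A minimal sketch of that difference follows. It is not the libvpx code: the helper names and the LOTS_OF_BITS value are hypothetical stand-ins chosen for illustration, and only the assign-versus-accumulate distinction comes from the commit message.

```c
/* Hypothetical stand-in for VP8_LOTS_OF_BITS; the real value is not
 * shown in this listing and is assumed here for illustration only. */
#define LOTS_OF_BITS 0x40000000

/* Assigning the marker at end of stream overwrites the previous count,
 * so the number of bits the decoder already held or owed is lost. */
static int fill_at_eos_assign(int count)
{
    (void)count; /* prior value is discarded */
    return LOTS_OF_BITS;
}

/* Accumulating the marker keeps the previous count inside the result. */
static int fill_at_eos_accumulate(int count)
{
    return count + LOTS_OF_BITS;
}

/* Subtracting the marker again recovers the count of real bits, but
 * only if the marker was accumulated rather than assigned. */
static int real_bit_count(int count_after_fill)
{
    return count_after_fill - LOTS_OF_BITS;
}
```

With a prior count of -3, real_bit_count(fill_at_eos_accumulate(-3)) gives back -3, while the assigning variant gives 0. Under these assumptions, a truncation check built on the accumulated value can be exact, which is the motivation the commit message gives for needing less tolerance.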