Update CHANGELOG for v1.1.0 (Eider) release

Change-Id: Ic429556e76bcc4f96a34e18835a153b07fe410a2
Update AUTHORS
2012-05-08 16:14:00 -07:00 · 2012-05-08 15:01:35 -07:00 · 2012-05-08 15:01:35 -07:00 · 2012-05-08 15:01:24 -07:00 · 2012-05-04 12:24:04 -07:00 · 2012-05-04 10:44:47 -07:00
223 changed files with 9840 additions and 12208 deletions
--- a/.mailmap
+++ b/.mailmap
@@ -5,3 +5,4 @@ Tom Finegan <tomfinegan@google.com>
 Ralph Giles <giles@xiph.org> <giles@entropywave.com>
 Ralph Giles <giles@xiph.org> <giles@mozilla.com>
 Alpha Lam <hclam@google.com> <hclam@chromium.org>
+Deb Mukherjee <debargha@google.com>
--- a/AUTHORS
+++ b/AUTHORS
@@ -31,9 +31,11 @@ John Koleszar <jkoleszar@google.com>
 Joshua Bleecher Snyder <josh@treelinelabs.com>
 Justin Clift <justin@salasaga.org>
 Justin Lebar <justin.lebar@gmail.com>
+KO Myung-Hun <komh@chollian.net>
 Lou Quillio <louquillio@google.com>
 Luca Barbato <lu_zero@gentoo.org>
 Makoto Kato <makoto.kt@gmail.com>
+Marco Paniconi <marpan@google.com>
 Martin Ettl <ettl.martin78@googlemail.com>
 Michael Kohler <michaelkohler@live.com>
 Mike Hommey <mhommey@mozilla.com>
@@ -43,6 +45,7 @@ Patrik Westin <patrik.westin@gmail.com>
 Paul Wilkins <paulwilkins@google.com>
 Pavol Rusnak <stick@gk2.sk>
 Philip Jägenstedt <philipj@opera.com>
+Priit Laes <plaes@plaes.org>
 Rafael Ávila de Espíndola <rafael.espindola@gmail.com>
 Rafaël Carré <funman@videolan.org>
 Ralph Giles <giles@xiph.org>
@@ -50,6 +53,7 @@ Ronald S. Bultje <rbultje@google.com>
 Scott LaVarnway <slavarnway@google.com>
 Stefan Holmer <holmer@google.com>
 Taekhyun Kim <takim@nvidia.com>
+Takanori MATSUURA <t.matsuu@gmail.com>
 Tero Rintaluoma <teror@google.com>
 Thijs Vermeir <thijsvermeir@gmail.com>
 Timothy B. Terriberry <tterribe@xiph.org>
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,95 @@
+2012-05-09 v1.1.0 "Eider"
+  This release introduces a number of enhancements, mostly focused on
+  real-time encoding. It also fixes a decoder bug first introduced in
+  Duclair, so all users of that release are encouraged to upgrade.
+
+  - Upgrading:
+    This release is ABI and API compatible with Duclair (v1.0.0). Users
+    of older releases should refer to the Upgrading notes in this
+    document for that release.
+
+    This release introduces a new temporal denoiser, controlled by the
+    VP8E_SET_NOISE_SENSITIVITY control. The temporal denoiser does not
+    currently take a strength parameter, so the control is effectively
+    a boolean - zero (off) or non-zero (on). For compatibility with
+    existing applications, the values accepted are the same as those
+    for the spatial denoiser (0-6). The temporal denoiser is enabled
+    by default, and the older spatial denoiser may be restored by
+    configuring with --disable-temporal-denoising. The temporal denoiser
+    is more computationally intensive than the spatial one.
+
+    This release removes support for a legacy, decode-only API that was
+    supported, but deprecated, at the initial release of libvpx
+    (v0.9.0). This is not expected to have any impact. If you are
+    affected, you can revert commit 2bf8fb58 locally, but you should
+    update to the latest libvpx API.
+
+  - Enhancements:
+      Adds a motion compensated temporal denoiser to the encoder, which
+      gives higher quality than the older spatial denoiser. (See above
+      for notes on upgrading).
+
+      In addition, support for new compilers and platforms was added,
+      including:
+        improved support for Xcode
+        Android x86 NDK build
+        OS/2 support
+        SunCC support
+
+      Changing resolution with vpx_codec_enc_config_set() is now
+      supported. Previously, reinitializing the codec was required to
+      change the input resolution.
+
+      The vpxenc application has initial support for producing multiple
+      encodes from the same input in one call. Resizing is not yet
+      supported, but varying other codec parameters is. Use -- to
+      delineate output streams. Options persist from one stream to the
+      next.
+
+      Also, the vpxenc application will now use a keyframe interval of
+      5 seconds by default. Use the --kf-max-dist option to override.
+
+  - Speed:
+      Decoder performance improved 2.5% versus Duclair. Encoder speed is
+      consistent with Duclair for most material. Two-pass encoding of
+      slideshow-like material will see significant improvements.
+
+      Large real-time encoding speed gains at a small quality expense are
+      possible by configuring the on-the-fly bitpacking experiment with
+      --enable-onthefly-bitpacking. The real-time encoder can be up to 13%
+      faster (on ARM), depending on the number of threads and bitrate
+      settings. This technique sees a constant gain over the 5-16 speed
+      range. For VC-style input, the quality loss is up to 0.2 dB. See
+      commit 52cf4dca for further details.
+
+  - Quality:
+      On the whole, quality is consistent with the Duclair release. Some
+      tweaks:
+
+        Reduced blockiness in easy sections by applying a penalty to
+        intra modes.
+
+        Improved quality of static sections (like slideshows) with
+        two-pass encoding.
+
+        Improved keyframe sizing with multiple temporal layers.
+
+  - Bug Fixes:
+      Corrected alt-ref contribution to frame rate for visible updates
+      to the alt-ref buffer. This affected applications making manual
+      usage of the frame reference flags, or temporal layers.
+
+      Additional constraints were added to disable multi-frame quality
+      enhancement (MFQE) in sections of the frame where there is motion.
+      (#392)
+
+      Fixed corruption issues when vpx_codec_enc_config_set() was called
+      with spatial resampling enabled.
+
+      Fixed a decoder error introduced in Duclair where the segmentation
+      map was not being reinitialized on keyframes (#378).
+
+
 2012-01-27 v1.0.0 "Duclair"
  Our fourth named release, focused on performance and features related to
  real-time encoding. It also fixes a decoder crash bug introduced in
--- a/build/make/Android.mk
+++ b/build/make/Android.mk
@@ -99,7 +99,7 @@ $$(eval $$(call ev-build-file))

 $(1) : $$(_OBJ) $(2)
 	@mkdir -p $$(dir $$@)
-	@grep -w EQU $$< | tr -d '\#' | $(CONFIG_DIR)/$(ASM_CONVERSION) > $$@
+	@grep $(OFFSET_PATTERN) $$< | tr -d '\#' | $(CONFIG_DIR)/$(ASM_CONVERSION) > $$@
 endef

 # Use ads2gas script to convert from RVCT format to GAS format.  This passes
@@ -118,6 +118,9 @@ $(ASM_CNV_PATH)/libvpx/%.asm.s: $(LIBVPX_PATH)/%.asm $(ASM_CNV_OFFSETS_DEPEND)
 	@mkdir -p $(dir $@)
 	@$(CONFIG_DIR)/$(ASM_CONVERSION) <$< > $@

+# For building vpx_rtcd.h, which has a rule in libs.mk
+TGT_ISA:=$(word 1, $(subst -, ,$(TOOLCHAIN)))
+target := libs
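`$(word 1, $(subst -, ,$(TOOLCHAIN)))` takes the first dash-separated field of the toolchain name, i.e. the target ISA, which the `vpx_rtcd.h` rule in libs.mk can then use (that rule is not shown here). For readers more at home in shell, the same extraction is a single parameter expansion; `tgt_isa_from_toolchain` is an illustrative helper, not part of the build:

```sh
# Shell equivalent of $(word 1, $(subst -, ,$(TOOLCHAIN))): keep
# everything before the first "-".
tgt_isa_from_toolchain() {
    toolchain="$1"
    echo "${toolchain%%-*}"
}
```

For example, `armv7-android-gcc` yields `armv7`, and `x86_64-linux-gcc` yields `x86_64`, because only `-` (not `_`) separates fields.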

 LOCAL_SRC_FILES += vpx_config.c

@@ -165,12 +168,15 @@ LOCAL_LDLIBS := -llog

 LOCAL_STATIC_LIBRARIES := cpufeatures

+$(foreach file, $(LOCAL_SRC_FILES), $(LOCAL_PATH)/$(file)): vpx_rtcd.h
+
 .PHONY: clean
 clean:
 	@echo "Clean: ads2gas files [$(TARGET_ARCH_ABI)]"
 	@$(RM) $(CODEC_SRCS_ASM_ADS2GAS) $(CODEC_SRCS_ASM_NEON_ADS2GAS)
 	@$(RM) $(patsubst %.asm, %.*, $(ASM_CNV_OFFSETS_DEPEND))
 	@$(RM) -r $(ASM_CNV_PATH)
+	@$(RM) $(CLEAN-OBJS)

 include $(BUILD_SHARED_LIBRARY)

--- a/build/make/configure.sh
+++ b/build/make/configure.sh
@@ -391,6 +391,8 @@ LDFLAGS = ${LDFLAGS}
 ASFLAGS = ${ASFLAGS}
 extralibs = ${extralibs}
 AS_SFX    = ${AS_SFX:-.asm}
+EXE_SFX   = ${EXE_SFX}
+RTCD_OPTIONS = ${RTCD_OPTIONS}
 EOF

    if enabled rvct; then cat >> $1 << EOF
@@ -454,9 +456,25 @@ process_common_cmdline() {
        ;;
        --enable-?*|--disable-?*)
        eval `echo "$opt" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
-        echo "${CMDLINE_SELECT} ${ARCH_EXT_LIST}" | grep "^ *$option\$" >/dev/null || die_unknown $opt
+        if echo "${ARCH_EXT_LIST}" | grep "^ *$option\$" >/dev/null; then
+            [ $action = "disable" ] && RTCD_OPTIONS="${RTCD_OPTIONS}${opt} "
+        elif [ $action = "disable" ] && ! disabled $option ; then
+          echo "${CMDLINE_SELECT}" | grep "^ *$option\$" >/dev/null ||
+            die_unknown $opt
+        elif [ $action = "enable" ] && ! enabled $option ; then
+          echo "${CMDLINE_SELECT}" | grep "^ *$option\$" >/dev/null ||
+            die_unknown $opt
+        fi
        $action $option
        ;;
+        --require-?*)
+        eval `echo "$opt" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
+        if echo "${ARCH_EXT_LIST}" none | grep "^ *$option\$" >/dev/null; then
+            RTCD_OPTIONS="${RTCD_OPTIONS}${opt} "
+        else
+            die_unknown $opt
+        fi
+        ;;
        --force-enable-?*|--force-disable-?*)
        eval `echo "$opt" | sed 's/--force-/action=/;s/-/ option=/;s/-/_/g'`
        $action $option
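The `--enable-*`/`--disable-*` branch above turns each option into two shell variables with a single sed expression, and the new `--require-*` branch reuses that expression. The sketch below isolates that step. `parse_opt` and `record_opt` are hypothetical helper names, and `ARCH_EXT_LIST` is trimmed to the three ARM extensions for illustration. Disabled architecture extensions are collected in `RTCD_OPTIONS` in the same `--disable-EXT` form that rtcd.sh accepts.

```sh
# Trimmed, illustrative copy of the extension list from configure.
ARCH_EXT_LIST="
    edsp
    media
    neon
"
RTCD_OPTIONS=""

# Hypothetical helper: "--enable-temporal-denoising" becomes
# action=enable option=temporal_denoising, using the sed script from
# configure.sh unchanged.
parse_opt() {
    opt="$1"
    eval `echo "$opt" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
    echo "$action $option"
}

# Hypothetical helper: only --disable-EXT for an architecture extension
# is forwarded, mirroring the first branch of the patch.
record_opt() {
    parse_opt "$1" >/dev/null
    if echo "${ARCH_EXT_LIST}" | grep "^ *$option\$" >/dev/null; then
        [ "$action" = "disable" ] && RTCD_OPTIONS="${RTCD_OPTIONS}$1 "
    fi
    return 0
}
```

For example, `record_opt --disable-neon` appends `--disable-neon ` to `RTCD_OPTIONS`, while `--disable-shared` is not an architecture extension and is left to the ordinary `CMDLINE_SELECT` check.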
@@ -526,6 +544,7 @@ setup_gnu_toolchain() {
    STRIP=${STRIP:-${CROSS}strip}
    NM=${NM:-${CROSS}nm}
        AS_SFX=.s
+        EXE_SFX=
 }

 process_common_toolchain() {
@@ -569,6 +588,10 @@ process_common_toolchain() {
                tgt_isa=x86_64
                tgt_os=darwin11
                ;;
+            *darwin12*)
+                tgt_isa=x86_64
+                tgt_os=darwin12
+                ;;
            *mingw32*|*cygwin*)
                [ -z "$tgt_isa" ] && tgt_isa=x86
                tgt_os=win32
@@ -579,6 +602,9 @@ process_common_toolchain() {
            *solaris2.10)
                tgt_os=solaris
                ;;
+            *os2*)
+                tgt_os=os2
+                ;;
        esac

        if [ -n "$tgt_isa" ] && [ -n "$tgt_os" ]; then
@@ -616,44 +642,51 @@ process_common_toolchain() {

    # Handle darwin variants. Newer SDKs allow targeting older
    # platforms, so find the newest SDK available.
-    if [ -d "/Developer/SDKs/MacOSX10.4u.sdk" ]; then
-        osx_sdk_dir="/Developer/SDKs/MacOSX10.4u.sdk"
-    fi
-    if [ -d "/Developer/SDKs/MacOSX10.5.sdk" ]; then
-        osx_sdk_dir="/Developer/SDKs/MacOSX10.5.sdk"
-    fi
-    if [ -d "/Developer/SDKs/MacOSX10.6.sdk" ]; then
-        osx_sdk_dir="/Developer/SDKs/MacOSX10.6.sdk"
-    fi
-    if [ -d "/Developer/SDKs/MacOSX10.7.sdk" ]; then
-        osx_sdk_dir="/Developer/SDKs/MacOSX10.7.sdk"
+    case ${toolchain} in
+        *-darwin*)
+            if [ -z "${DEVELOPER_DIR}" ]; then
+                DEVELOPER_DIR=`xcode-select -print-path 2> /dev/null`
+                [ $? -ne 0 ] && OSX_SKIP_DIR_CHECK=1
+            fi
+            if [ -z "${OSX_SKIP_DIR_CHECK}" ]; then
+                OSX_SDK_ROOTS="${DEVELOPER_DIR}/SDKs"
+                OSX_SDK_VERSIONS="MacOSX10.4u.sdk MacOSX10.5.sdk MacOSX10.6.sdk"
+                OSX_SDK_VERSIONS="${OSX_SDK_VERSIONS} MacOSX10.7.sdk"
+                for v in ${OSX_SDK_VERSIONS}; do
+                    if [ -d "${OSX_SDK_ROOTS}/${v}" ]; then
+                        osx_sdk_dir="${OSX_SDK_ROOTS}/${v}"
+                    fi
+                done
+            fi
+            ;;
+    esac
+
+    if [ -d "${osx_sdk_dir}" ]; then
+        add_cflags  "-isysroot ${osx_sdk_dir}"
+        add_ldflags "-isysroot ${osx_sdk_dir}"
    fi
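The SDK loop above does not stop at the first match. Each existing SDK directory overwrites `osx_sdk_dir`, so the newest SDK in `OSX_SDK_VERSIONS` wins. If none exists, the `-d` test fails and no `-isysroot` flag is added. Here is a minimal sketch of that selection, with the `xcode-select` lookup left out and the SDK root passed in as an argument (`pick_osx_sdk` is a hypothetical name):

```sh
# Sketch of the SDK selection loop: each SDK directory that exists
# overwrites osx_sdk_dir, so the newest one in the list wins. The
# xcode-select lookup is left out; the SDK root is an argument here.
pick_osx_sdk() {
    sdk_root="$1"
    osx_sdk_dir=""
    for v in MacOSX10.4u.sdk MacOSX10.5.sdk MacOSX10.6.sdk MacOSX10.7.sdk; do
        if [ -d "${sdk_root}/${v}" ]; then
            osx_sdk_dir="${sdk_root}/${v}"
        fi
    done
    echo "${osx_sdk_dir}"
}
```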

    case ${toolchain} in
        *-darwin8-*)
-            add_cflags  "-isysroot ${osx_sdk_dir}"
            add_cflags  "-mmacosx-version-min=10.4"
-            add_ldflags "-isysroot ${osx_sdk_dir}"
            add_ldflags "-mmacosx-version-min=10.4"
            ;;
        *-darwin9-*)
-            add_cflags  "-isysroot ${osx_sdk_dir}"
            add_cflags  "-mmacosx-version-min=10.5"
-            add_ldflags "-isysroot ${osx_sdk_dir}"
            add_ldflags "-mmacosx-version-min=10.5"
            ;;
        *-darwin10-*)
-            add_cflags  "-isysroot ${osx_sdk_dir}"
            add_cflags  "-mmacosx-version-min=10.6"
-            add_ldflags "-isysroot ${osx_sdk_dir}"
            add_ldflags "-mmacosx-version-min=10.6"
            ;;
        *-darwin11-*)
-            add_cflags  "-isysroot ${osx_sdk_dir}"
            add_cflags  "-mmacosx-version-min=10.7"
-            add_ldflags "-isysroot ${osx_sdk_dir}"
            add_ldflags "-mmacosx-version-min=10.7"
            ;;
+        *-darwin12-*)
+            add_cflags  "-mmacosx-version-min=10.8"
+            add_ldflags "-mmacosx-version-min=10.8"
+            ;;
    esac

    # Handle Solaris variants. Solaris 10 needs -lposix4
@@ -671,10 +704,22 @@ process_common_toolchain() {
    case ${toolchain} in
    arm*)
        # on arm, isa versions are supersets
-        enabled armv7a && soft_enable armv7 ### DEBUG
-        enabled armv7 && soft_enable armv6
-        enabled armv7 || enabled armv6 && soft_enable armv5te
-        enabled armv7 || enabled armv6 && soft_enable fast_unaligned
+        case ${tgt_isa} in
+        armv7)
+            soft_enable neon
+            soft_enable media
+            soft_enable edsp
+            soft_enable fast_unaligned
+            ;;
+        armv6)
+            soft_enable media
+            soft_enable edsp
+            soft_enable fast_unaligned
+            ;;
+        armv5te)
+            soft_enable edsp
+            ;;
+        esac
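The new `case` replaces the chained ISA checks with a direct table from target ISA to the extensions it implies. Below is a sketch of that table plus a hypothetical `arm_soft_enable`. It models `soft_enable` as leaving explicitly disabled extensions off, which is an assumption about that helper and is not shown in this diff. Neither function exists in configure.sh.

```sh
# Extensions implied by each ARM target ISA, per the new case statement.
arm_exts_for_isa() {
    case "$1" in
        armv7)   echo "neon media edsp fast_unaligned" ;;
        armv6)   echo "media edsp fast_unaligned" ;;
        armv5te) echo "edsp" ;;
        *)       echo "" ;;
    esac
}

# Hypothetical model of soft-enabling: anything listed in $2 (the
# extensions the user explicitly disabled) is left off.
arm_soft_enable() {
    disabled_list=" $2 "
    result=""
    for ext in $(arm_exts_for_isa "$1"); do
        case "$disabled_list" in
            *" $ext "*) ;;                  # explicitly disabled: skip
            *) result="$result $ext" ;;
        esac
    done
    echo $result
}
```

Under that assumption, `--disable-neon` on an armv7 target leaves `media` and `edsp` on. Because the gcc branch below only adds `-mfpu=neon` when `neon` is enabled, that flag is dropped too.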

        asm_conversion_cmd="cat"

@@ -687,10 +732,14 @@ process_common_toolchain() {
            arch_int=${arch_int%%te}
            check_add_asflags --defsym ARCHITECTURE=${arch_int}
            tune_cflags="-mtune="
-            if enabled armv7
-            then
-                check_add_cflags -march=armv7-a -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp  #-ftree-vectorize
-                check_add_asflags -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp  #-march=armv7-a
+            if [ "${tgt_isa}" = "armv7" ]; then
+                if enabled neon
+                then
+                    check_add_cflags -mfpu=neon #-ftree-vectorize
+                    check_add_asflags -mfpu=neon
+                fi
+                check_add_cflags -march=armv7-a -mcpu=cortex-a8 -mfloat-abi=softfp
+                check_add_asflags -mcpu=cortex-a8 -mfloat-abi=softfp  #-march=armv7-a
            else
                check_add_cflags -march=${tgt_isa}
                check_add_asflags -march=${tgt_isa}
@@ -708,10 +757,14 @@ process_common_toolchain() {
            tune_cflags="--cpu="
            tune_asflags="--cpu="
            if [ -z "${tune_cpu}" ]; then
-            if enabled armv7
-                then
-                    check_add_cflags --cpu=Cortex-A8 --fpu=softvfp+vfpv3
-                    check_add_asflags --cpu=Cortex-A8 --fpu=softvfp+vfpv3
+                if [ "${tgt_isa}" = "armv7" ]; then
+                    if enabled neon
+                    then
+                        check_add_cflags --fpu=softvfp+vfpv3
+                        check_add_asflags --fpu=softvfp+vfpv3
+                    fi
+                    check_add_cflags --cpu=Cortex-A8
+                    check_add_asflags --cpu=Cortex-A8
                else
                    check_add_cflags --cpu=${tgt_isa##armv}
                    check_add_asflags --cpu=${tgt_isa##armv}
@@ -757,17 +810,19 @@ process_common_toolchain() {
            add_cflags "--sysroot=${alt_libc}"
            add_ldflags "--sysroot=${alt_libc}"

+            add_cflags "-I${SDK_PATH}/sources/android/cpufeatures/"
+
            enable pic
            soft_enable realtime_only
-            if enabled armv7
-            then
+            if [ "${tgt_isa}" = "armv7" ]; then
                enable runtime_cpu_detect
            fi
          ;;

        darwin*)
            if [ -z "${sdk_path}" ]; then
-                SDK_PATH=/Developer/Platforms/iPhoneOS.platform/Developer
+                SDK_PATH=`xcode-select -print-path 2> /dev/null`
+                SDK_PATH=${SDK_PATH}/Platforms/iPhoneOS.platform/Developer
            else
                SDK_PATH=${sdk_path}
            fi
@@ -789,7 +844,7 @@ process_common_toolchain() {
            add_ldflags -arch_only ${tgt_isa}

            if [ -z "${alt_libc}" ]; then
-                alt_libc=${SDK_PATH}/SDKs/iPhoneOS5.0.sdk
+                alt_libc=${SDK_PATH}/SDKs/iPhoneOS5.1.sdk
            fi

            add_cflags  "-isysroot ${alt_libc}"
@@ -886,6 +941,9 @@ process_common_toolchain() {
                LD=${LD:-${CROSS}gcc}
                CROSS=${CROSS:-g}
                ;;
+            os2)
+                AS=${AS:-nasm}
+                ;;
        esac

        AS="${alt_as:-${AS:-auto}}"
@@ -956,6 +1014,11 @@ process_common_toolchain() {
                # enabled icc && ! enabled pic && add_cflags -fno-pic -mdynamic-no-pic
                enabled icc && ! enabled pic && add_cflags -fno-pic
            ;;
+            os2)
+                add_asflags -f aout
+                enabled debug && add_asflags -g
+                EXE_SFX=.exe
+            ;;
            *) log "Warning: Unknown os $tgt_os while setting up $AS flags"
            ;;
        esac
--- a/build/make/gen_asm_deps.sh
+++ b/build/make/gen_asm_deps.sh
@@ -42,7 +42,7 @@ done

 [ -n "$srcfile" ] || show_help
 sfx=${sfx:-asm}
-includes=$(egrep -i "include +\"?+[a-z0-9_/]+\.${sfx}" $srcfile |
+includes=$(LC_ALL=C egrep -i "include +\"?+[a-z0-9_/]+\.${sfx}" $srcfile |
           perl -p -e "s;.*?([a-z0-9_/]+.${sfx}).*;\1;")
 #" restore editor state
 for inc in ${includes}; do
--- a/build/make/rtcd.sh
+++ b/build/make/rtcd.sh
@@ -0,0 +1,330 @@
+#!/bin/sh
+self=$0
+
+usage() {
+  cat <<EOF >&2
+Usage: $self [options] FILE
+
+Reads the Run Time CPU Detection (RTCD) definitions from FILE and generates a
+C header file on stdout.
+
+Options:
+  --arch=ARCH   Architecture to generate defs for (required)
+  --disable-EXT Disable support for EXT extensions
+  --require-EXT Require support for EXT extensions
+  --sym=SYMBOL  Unique symbol to use for RTCD initialization function
+  --config=FILE File with CONFIG_FOO=yes lines to parse
+EOF
+  exit 1
+}
+
+die() {
+  echo "$@" >&2
+  exit 1
+}
+
+die_argument_required() {
+  die "Option $opt requires argument"
+}
+
+for opt; do
+  optval="${opt#*=}"
+  case "$opt" in
+    --arch) die_argument_required;;
+    --arch=*) arch=${optval};;
+    --disable-*) eval "disable_${opt#--disable-}=true";;
+    --require-*) REQUIRES="${REQUIRES}${opt#--require-} ";;
+    --sym) die_argument_required;;
+    --sym=*) symbol=${optval};;
+    --config=*) config_file=${optval};;
+    -h|--help)
+      usage
+      ;;
+    -*)
+      die "Unrecognized option: ${opt%%=*}"
+      ;;
+    *)
+      defs_file="$defs_file $opt"
+      ;;
+  esac
+  shift
+done
+for f in $defs_file; do [ -f "$f" ] || usage; done
+[ -n "$arch" ] || usage
+
+# Import the configuration
+[ -f "$config_file" ] && eval $(grep CONFIG_ "$config_file")
+
+#
+# Routines for the RTCD DSL to call
+#
+prototype() {
+  local rtyp
+  case "$1" in
+    unsigned) rtyp="$1 "; shift;;
+  esac
+  rtyp="${rtyp}$1"
+  local fn="$2"
+  local args="$3"
+
+  eval "${2}_rtyp='$rtyp'"
+  eval "${2}_args='$3'"
+  ALL_FUNCS="$ALL_FUNCS $fn"
+  specialize $fn c
+}
+
+specialize() {
+  local fn="$1"
+  shift
+  for opt in "$@"; do
+    eval "${fn}_${opt}=${fn}_${opt}"
+  done
+}
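The DSL helpers keep per-function data in dynamically named variables (`FN_rtyp`, `FN_args`, `FN_EXT`) because POSIX sh has no associative arrays. The trimmed sketch of `prototype` and `specialize` below drops the two-word `unsigned` return-type handling and the `local` declarations. `my_idct` in the example directives is a hypothetical function name.

```sh
# Trimmed versions of the RTCD DSL helpers. Per-function facts live in
# dynamically named variables (FN_rtyp, FN_args, FN_EXT) because POSIX
# sh has no associative arrays.
prototype() {
    rtyp="$1"; fn="$2"; args="$3"
    eval "${fn}_rtyp='$rtyp'"
    eval "${fn}_args='$args'"
    ALL_FUNCS="$ALL_FUNCS $fn"
    specialize "$fn" c        # every function gets a C version
}

specialize() {
    fn="$1"; shift
    for opt in "$@"; do
        # e.g. my_idct_sse2=my_idct_sse2 marks that implementation as present
        eval "${fn}_${opt}=${fn}_${opt}"
    done
}

# Example directives, as they might appear in a definitions file.
prototype void my_idct "short *input, short *output, int pitch"
specialize my_idct mmx sse2
```

After these directives, `$my_idct_sse2` expands to `my_idct_sse2` while `$my_idct_ssse3` is empty. That difference is what the later stages test (`[ -z "$ofn" ] && continue`) to decide whether an implementation exists.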
+
+require() {
+  for fn in $ALL_FUNCS; do
+    for opt in "$@"; do
+      local ofn=$(eval "echo \$${fn}_${opt}")
+      [ -z "$ofn" ] && continue
+
+      # if we already have a default, then we can disable it, as we know
+      # we can do better.
+      local best=$(eval "echo \$${fn}_default")
+      local best_ofn=$(eval "echo \$${best}")
+      [ -n "$best" ] && [ "$best_ofn" != "$ofn" ] && eval "${best}_link=false"
+      eval "${fn}_default=${fn}_${opt}"
+      eval "${fn}_${opt}_link=true"
+    done
+  done
+}
+
+forward_decls() {
+  ALL_FORWARD_DECLS="$ALL_FORWARD_DECLS $1"
+}
+
+#
+# Include the user's directives
+#
+for f in $defs_file; do
+  . $f
+done
+
+#
+# Process the directives according to the command line
+#
+process_forward_decls() {
+  for fn in $ALL_FORWARD_DECLS; do
+    eval $fn
+  done
+}
+
+determine_indirection() {
+  [ "$CONFIG_RUNTIME_CPU_DETECT" = "yes" ] || require $ALL_ARCHS
+  for fn in $ALL_FUNCS; do
+    local n=""
+    local rtyp="$(eval "echo \$${fn}_rtyp")"
+    local args="$(eval "echo \"\$${fn}_args\"")"
+    local dfn="$(eval "echo \$${fn}_default")"
+    dfn=$(eval "echo \$${dfn}")
+    for opt in "$@"; do
+      local ofn=$(eval "echo \$${fn}_${opt}")
+      [ -z "$ofn" ] && continue
+      local link=$(eval "echo \$${fn}_${opt}_link")
+      [ "$link" = "false" ] && continue
+      n="${n}x"
+    done
+    if [ "$n" = "x" ]; then
+      eval "${fn}_indirect=false"
+    else
+      eval "${fn}_indirect=true"
+    fi
+  done
+}
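For each function, `determine_indirection` chooses between a compile-time `#define` and a run-time function pointer. When `CONFIG_RUNTIME_CPU_DETECT` is not `yes`, `require $ALL_ARCHS` first marks every implementation except the last one it finds with `_link=false`, which collapses most functions to a direct call. The sketch below models only the counting rule for a single function. `indirection_for`, `foo` and `bar` are hypothetical names.

```sh
# Simplified model of determine_indirection for one function: count the
# implementations that were specialized and not marked "_link=false".
# Exactly one means the header can #define the name directly; otherwise
# a function pointer is set at run time.
indirection_for() {
    fn="$1"; shift
    n=0
    for opt in "$@"; do
        eval "impl=\${${fn}_${opt}}"
        [ -z "$impl" ] && continue
        eval "link=\${${fn}_${opt}_link}"
        [ "$link" = "false" ] && continue
        n=$((n + 1))
    done
    if [ "$n" -eq 1 ]; then
        echo direct
    else
        echo indirect
    fi
}
```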
+
+declare_function_pointers() {
+  for fn in $ALL_FUNCS; do
+    local rtyp="$(eval "echo \$${fn}_rtyp")"
+    local args="$(eval "echo \"\$${fn}_args\"")"
+    local dfn="$(eval "echo \$${fn}_default")"
+    dfn=$(eval "echo \$${dfn}")
+    for opt in "$@"; do
+      local ofn=$(eval "echo \$${fn}_${opt}")
+      [ -z "$ofn" ] && continue
+      echo "$rtyp ${ofn}($args);"
+    done
+    if [ "$(eval "echo \$${fn}_indirect")" = "false" ]; then
+      echo "#define ${fn} ${dfn}"
+    else
+      echo "RTCD_EXTERN $rtyp (*${fn})($args);"
+    fi
+    echo
+  done
+}
+
+set_function_pointers() {
+  for fn in $ALL_FUNCS; do
+    local n=""
+    local rtyp="$(eval "echo \$${fn}_rtyp")"
+    local args="$(eval "echo \"\$${fn}_args\"")"
+    local dfn="$(eval "echo \$${fn}_default")"
+    dfn=$(eval "echo \$${dfn}")
+    if $(eval "echo \$${fn}_indirect"); then
+      echo "    $fn = $dfn;"
+      for opt in "$@"; do
+        local ofn=$(eval "echo \$${fn}_${opt}")
+        [ -z "$ofn" ] && continue
+        [ "$ofn" = "$dfn" ] && continue;
+        local link=$(eval "echo \$${fn}_${opt}_link")
+        [ "$link" = "false" ] && continue
+        local cond="$(eval "echo \$have_${opt}")"
+        echo "    if (${cond}) $fn = $ofn;"
+      done
+    fi
+    echo
+  done
+}
+
+filter() {
+  local filtered
+  for opt in "$@"; do
+    [ -z $(eval "echo \$disable_${opt}") ] && filtered="$filtered $opt"
+  done
+  echo $filtered
+}
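`filter` drops any extension for which a `--disable-EXT` argument set `disable_EXT=true`. Below is a self-contained sketch of both steps. `parse_rtcd_args` is a hypothetical name. The sketch quotes the lookup, whereas the original relies on `[ -z ]` (an empty, unquoted expansion) evaluating as true.

```sh
# Record each --disable-EXT argument as disable_EXT=true, as rtcd.sh does.
parse_rtcd_args() {
    for opt in "$@"; do
        case "$opt" in
            --disable-*) eval "disable_${opt#--disable-}=true" ;;
        esac
    done
}

# Keep only the extensions that were not disabled.
filter() {
    filtered=""
    for opt in "$@"; do
        eval "flag=\${disable_${opt}}"
        [ -z "$flag" ] && filtered="$filtered $opt"
    done
    echo $filtered
}
```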
+
+#
+# Helper functions for generating the arch specific RTCD files
+#
+common_top() {
+  local outfile_basename=$(basename ${symbol:-rtcd.h})
+  local include_guard=$(echo $outfile_basename | tr '[a-z]' '[A-Z]' | tr -c '[A-Z]' _)
+  cat <<EOF
+#ifndef ${include_guard}
+#define ${include_guard}
+
+#ifdef RTCD_C
+#define RTCD_EXTERN
+#else
+#define RTCD_EXTERN extern
+#endif
+
+$(process_forward_decls)
+
+$(declare_function_pointers c $ALL_ARCHS)
+EOF
+}
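`common_top` builds the include guard from `--sym`, falling back to `rtcd.h`. Because `tr -c '[A-Z]' _` replaces everything outside A-Z, digits and the newline that `echo` appends are also turned into `_`. The sketch below reproduces the pipeline. `include_guard_for` is a hypothetical wrapper, and it feeds `basename` output (which also ends in a newline) instead of `echo`.

```sh
# Derive the include guard the same way common_top does: upper-case the
# basename, then map every character outside A-Z to "_". The trailing
# newline in the pipeline is mapped too, so the guard ends in "_".
include_guard_for() {
    basename "$1" | tr '[a-z]' '[A-Z]' | tr -c '[A-Z]' _
}
```

So `vpx_rtcd` becomes `VPX_RTCD_`, and a name containing a digit, such as `vp8_rtcd.h`, becomes `VP__RTCD_H_`.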
+
+common_bottom() {
+  cat <<EOF
+#endif
+EOF
+}
+
+x86() {
+  determine_indirection c $ALL_ARCHS
+
+  # Assign the helper variable for each enabled extension
+  for opt in $ALL_ARCHS; do
+    local uc=$(echo $opt | tr '[a-z]' '[A-Z]')
+    eval "have_${opt}=\"flags & HAS_${uc}\""
+  done
+
+  cat <<EOF
+$(common_top)
+void ${symbol:-rtcd}(void);
+
+#ifdef RTCD_C
+#include "vpx_ports/x86.h"
+void ${symbol:-rtcd}(void)
+{
+    int flags = x86_simd_caps();
+
+    (void)flags;
+
+$(set_function_pointers c $ALL_ARCHS)
+}
+#endif
+$(common_bottom)
+EOF
+}
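For every extension in `ALL_ARCHS`, `x86()` (and likewise `arm()`) stores a C expression in `have_EXT`. `set_function_pointers` then pastes that expression into `if (...)` lines inside the generated init function, where `flags` comes from `x86_simd_caps()` or `arm_cpu_caps()`. Below is a reduced sketch of that generation step. Unlike the real generator, it assumes every listed extension has an implementation and does not skip `_link=false` entries. `make_have_vars`, `emit_dispatch` and `my_sad` are hypothetical names.

```sh
# Give each extension a C condition string, as the x86()/arm()
# generators do (e.g. have_sse2="flags & HAS_SSE2").
make_have_vars() {
    for opt in "$@"; do
        uc=$(echo "$opt" | tr '[a-z]' '[A-Z]')
        eval "have_${opt}=\"flags & HAS_${uc}\""
    done
}

# Emit dispatch lines in the style of set_function_pointers: start from
# the C version, then let each later extension override the pointer.
emit_dispatch() {
    fn="$1"; shift
    echo "    $fn = ${fn}_c;"
    for opt in "$@"; do
        eval "cond=\${have_${opt}}"
        echo "    if (${cond}) $fn = ${fn}_${opt};"
    done
}
```

Because the checks are emitted in `ALL_ARCHS` order, the last supported extension in the list is the one left in the pointer.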
+
+arm() {
+  determine_indirection c $ALL_ARCHS
+
+  # Assign the helper variable for each enabled extension
+  for opt in $ALL_ARCHS; do
+    local uc=$(echo $opt | tr '[a-z]' '[A-Z]')
+    eval "have_${opt}=\"flags & HAS_${uc}\""
+  done
+
+  cat <<EOF
+$(common_top)
+#include "vpx_config.h"
+
+void ${symbol:-rtcd}(void);
+
+#ifdef RTCD_C
+#include "vpx_ports/arm.h"
+void ${symbol:-rtcd}(void)
+{
+    int flags = arm_cpu_caps();
+
+    (void)flags;
+
+$(set_function_pointers c $ALL_ARCHS)
+}
+#endif
+$(common_bottom)
+EOF
+}
+
+
+unoptimized() {
+  determine_indirection c
+  cat <<EOF
+$(common_top)
+#include "vpx_config.h"
+
+void ${symbol:-rtcd}(void);
+
+#ifdef RTCD_C
+void ${symbol:-rtcd}(void)
+{
+$(set_function_pointers c)
+}
+#endif
+$(common_bottom)
+EOF
+
+}
+#
+# Main Driver
+#
+require c
+case $arch in
+  x86)
+    ALL_ARCHS=$(filter mmx sse sse2 sse3 ssse3 sse4_1)
+    x86
+    ;;
+  x86_64)
+    ALL_ARCHS=$(filter mmx sse sse2 sse3 ssse3 sse4_1)
+    REQUIRES=${REQUIRES:-mmx sse sse2}
+    require $(filter $REQUIRES)
+    x86
+    ;;
+  armv5te)
+    ALL_ARCHS=$(filter edsp)
+    arm
+    ;;
+  armv6)
+    ALL_ARCHS=$(filter edsp media)
+    arm
+    ;;
+  armv7)
+    ALL_ARCHS=$(filter edsp media neon)
+    arm
+    ;;
+  *)
+    unoptimized
+    ;;
+esac
--- a/configure
+++ b/configure
@@ -39,6 +39,7 @@ Advanced options:
  ${toggle_multithread}           multithreaded encoding and decoding
  ${toggle_spatial_resampling}    spatial sampling (scaling) support
  ${toggle_realtime_only}         enable this option while building for real-time encoding
+  ${toggle_onthefly_bitpacking}   enable on-the-fly bitpacking in real-time encoding
  ${toggle_error_concealment}     enable this option to get a decoder which is able to conceal losses
  ${toggle_runtime_cpu_detect}    runtime cpu detection
  ${toggle_shared}                shared library support
@@ -46,6 +47,7 @@ Advanced options:
  ${toggle_small}                 favor smaller size over speed
  ${toggle_postproc_visualizer}   macro block / block level visualizers
  ${toggle_multi_res_encoding}    enable multiple-resolution encoding
+  ${toggle_temporal_denoising}    enable temporal denoising and disable the spatial denoiser

 Codecs:
  Codecs can be selectively enabled or disabled individually, or by family:
@@ -107,8 +109,11 @@ all_platforms="${all_platforms} x86-darwin8-icc"
 all_platforms="${all_platforms} x86-darwin9-gcc"
 all_platforms="${all_platforms} x86-darwin9-icc"
 all_platforms="${all_platforms} x86-darwin10-gcc"
+all_platforms="${all_platforms} x86-darwin11-gcc"
+all_platforms="${all_platforms} x86-darwin12-gcc"
 all_platforms="${all_platforms} x86-linux-gcc"
 all_platforms="${all_platforms} x86-linux-icc"
+all_platforms="${all_platforms} x86-os2-gcc"
 all_platforms="${all_platforms} x86-solaris-gcc"
 all_platforms="${all_platforms} x86-win32-gcc"
 all_platforms="${all_platforms} x86-win32-vs7"
@@ -117,6 +122,7 @@ all_platforms="${all_platforms} x86-win32-vs9"
 all_platforms="${all_platforms} x86_64-darwin9-gcc"
 all_platforms="${all_platforms} x86_64-darwin10-gcc"
 all_platforms="${all_platforms} x86_64-darwin11-gcc"
+all_platforms="${all_platforms} x86_64-darwin12-gcc"
 all_platforms="${all_platforms} x86_64-linux-gcc"
 all_platforms="${all_platforms} x86_64-linux-icc"
 all_platforms="${all_platforms} x86_64-solaris-gcc"
@@ -125,6 +131,9 @@ all_platforms="${all_platforms} x86_64-win64-vs8"
 all_platforms="${all_platforms} x86_64-win64-vs9"
 all_platforms="${all_platforms} universal-darwin8-gcc"
 all_platforms="${all_platforms} universal-darwin9-gcc"
+all_platforms="${all_platforms} universal-darwin10-gcc"
+all_platforms="${all_platforms} universal-darwin11-gcc"
+all_platforms="${all_platforms} universal-darwin12-gcc"
 all_platforms="${all_platforms} generic-gnu"

 # all_targets is a list of all targets that can be configured
@@ -163,6 +172,7 @@ enable md5
 enable spatial_resampling
 enable multithread
 enable os_support
+enable temporal_denoising

 [ -d ${source_path}/../include ] && enable alt_tree_layout
 for d in vp8; do
@@ -176,6 +186,8 @@ else
 # customer environment
 [ -f ${source_path}/../include/vpx/vp8cx.h ] && CODECS="${CODECS} vp8_encoder"
 [ -f ${source_path}/../include/vpx/vp8dx.h ] && CODECS="${CODECS} vp8_decoder"
+[ -f ${source_path}/../include/vpx/vp8cx.h ] || disable vp8_encoder
+[ -f ${source_path}/../include/vpx/vp8dx.h ] || disable vp8_decoder

 [ -f ${source_path}/../lib/*/*mt.lib ] && soft_enable static_msvcrt
 fi
@@ -192,9 +204,9 @@ ARCH_LIST="
    ppc64
 "
 ARCH_EXT_LIST="
-    armv5te
-    armv6
-    armv7
+    edsp
+    media
+    neon

    mips32

@@ -252,6 +264,7 @@ CONFIG_LIST="
    static_msvcrt
    spatial_resampling
    realtime_only
+    onthefly_bitpacking
    error_concealment
    shared
    static
@@ -260,6 +273,7 @@ CONFIG_LIST="
    os_support
    unit_tests
    multi_res_encoding
+    temporal_denoising
 "
 CMDLINE_SELECT="
    extra_warnings
@@ -296,6 +310,7 @@ CMDLINE_SELECT="
    mem_tracker
    spatial_resampling
    realtime_only
+    onthefly_bitpacking
    error_concealment
    shared
    static
@@ -303,6 +318,7 @@ CMDLINE_SELECT="
    postproc_visualizer
    unit_tests
    multi_res_encoding
+    temporal_denoising
 "

 process_cmdline() {
@@ -483,11 +499,20 @@ process_toolchain() {
    case $toolchain in
        universal-darwin*)
            local darwin_ver=${tgt_os##darwin}
-            fat_bin_archs="$fat_bin_archs ppc32-${tgt_os}-gcc"

-            # Intel
-            fat_bin_archs="$fat_bin_archs x86-${tgt_os}-${tgt_cc}"
-            if [ $darwin_ver -gt 8 ]; then
+            # Snow Leopard (10.6/darwin10) dropped support for PPC
+            # Include PPC support for all prior versions
+            if [ $darwin_ver -lt 10 ]; then
+                fat_bin_archs="$fat_bin_archs ppc32-${tgt_os}-gcc"
+            fi
+
+            # Tiger (10.4/darwin8) brought support for x86
+            if [ $darwin_ver -ge 8 ]; then
+                fat_bin_archs="$fat_bin_archs x86-${tgt_os}-${tgt_cc}"
+            fi
+
+            # Leopard (10.5/darwin9) brought 64-bit support
+            if [ $darwin_ver -ge 9 ]; then
                fat_bin_archs="$fat_bin_archs x86_64-${tgt_os}-${tgt_cc}"
            fi
            ;;
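The rewritten universal-darwin branch now picks fat-binary slices based on the Darwin version instead of always including PPC. Below is the same decision as a standalone function. `fat_archs_for` is a hypothetical name; in configure the inputs come from `tgt_os` and `tgt_cc`.

```sh
# Which slices go into a universal-darwinN fat binary, following the
# version checks in the patch: PPC before darwin10, x86 from darwin8,
# x86_64 from darwin9.
fat_archs_for() {
    tgt_os="$1"
    tgt_cc="$2"
    darwin_ver=${tgt_os##darwin}
    archs=""
    if [ "$darwin_ver" -lt 10 ]; then
        archs="$archs ppc32-${tgt_os}-gcc"
    fi
    if [ "$darwin_ver" -ge 8 ]; then
        archs="$archs x86-${tgt_os}-${tgt_cc}"
    fi
    if [ "$darwin_ver" -ge 9 ]; then
        archs="$archs x86_64-${tgt_os}-${tgt_cc}"
    fi
    echo $archs
}
```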
@@ -503,6 +528,10 @@ process_toolchain() {
        check_add_cflags -Wpointer-arith
        check_add_cflags -Wtype-limits
        check_add_cflags -Wcast-qual
+        check_add_cflags -Wimplicit-function-declaration
+        check_add_cflags -Wuninitialized
+        check_add_cflags -Wunused-variable
+        check_add_cflags -Wunused-but-set-variable
        enabled extra_warnings || check_add_cflags -Wno-unused-function
    fi

--- a/docs.mk
+++ b/docs.mk
@@ -21,9 +21,6 @@ CODEC_DOX :=    mainpage.dox \
 		usage_dx.dox \

 # Other doxy files sourced in Markdown
-TXT_DOX-$(CONFIG_VP8)          += vp8_api1_migration.dox
-vp8_api1_migration.dox.DESC     = VP8 API 1.x Migration
-
 TXT_DOX = $(call enabled,TXT_DOX)

 %.dox: %.txt
--- a/examples.mk
+++ b/examples.mk
@@ -32,6 +32,7 @@ vpxenc.SRCS                 += args.c args.h y4minput.c y4minput.h
 vpxenc.SRCS                 += tools_common.c tools_common.h
 vpxenc.SRCS                 += vpx_ports/mem_ops.h
 vpxenc.SRCS                 += vpx_ports/mem_ops_aligned.h
+vpxenc.SRCS                 += vpx_ports/vpx_timer.h
 vpxenc.SRCS                 += libmkv/EbmlIDs.h
 vpxenc.SRCS                 += libmkv/EbmlWriter.c
 vpxenc.SRCS                 += libmkv/EbmlWriter.h
@@ -168,12 +169,12 @@ $(eval $(if $(filter universal%,$(TOOLCHAIN)),LIPO_OBJS,BUILD_OBJS):=yes)
 # Create build/install dependencies for all examples. The common case
 # is handled here. The MSVS case is handled below.
 NOT_MSVS = $(if $(CONFIG_MSVS),,yes)
-DIST-BINS-$(NOT_MSVS)      += $(addprefix bin/,$(ALL_EXAMPLES:.c=))
-INSTALL-BINS-$(NOT_MSVS)   += $(addprefix bin/,$(UTILS:.c=))
+DIST-BINS-$(NOT_MSVS)      += $(addprefix bin/,$(ALL_EXAMPLES:.c=$(EXE_SFX)))
+INSTALL-BINS-$(NOT_MSVS)   += $(addprefix bin/,$(UTILS:.c=$(EXE_SFX)))
 DIST-SRCS-yes              += $(ALL_SRCS)
 INSTALL-SRCS-yes           += $(UTIL_SRCS)
 OBJS-$(NOT_MSVS)           += $(if $(BUILD_OBJS),$(call objs,$(ALL_SRCS)))
-BINS-$(NOT_MSVS)           += $(addprefix $(BUILD_PFX),$(ALL_EXAMPLES:.c=))
+BINS-$(NOT_MSVS)           += $(addprefix $(BUILD_PFX),$(ALL_EXAMPLES:.c=$(EXE_SFX)))
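`EXE_SFX` is now written to the generated makefile config: `.exe` for the OS/2 target, empty for GNU toolchains. The example binaries gain that suffix, and the linker template strips it again to look up `NAME.SRCS`. Below are shell equivalents of those two make substitutions, for illustration only (`bin_name` and `srcs_var_for` are hypothetical):

```sh
# $(ALL_EXAMPLES:.c=$(EXE_SFX)): replace a trailing ".c" with EXE_SFX.
bin_name() {
    src="$1"
    echo "${src%.c}${EXE_SFX}"
}

# $(notdir $(bin:$(EXE_SFX)=)) plus ".SRCS": strip EXE_SFX again to find
# the variable that lists the example's sources.
srcs_var_for() {
    name=$(basename "$1")
    name=${name%"$EXE_SFX"}
    echo "${name}.SRCS"
}
```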


 # Instantiate linker template for all examples.
@@ -183,7 +184,7 @@ $(foreach bin,$(BINS-yes),\
    $(if $(BUILD_OBJS),$(eval $(bin):\
        $(LIB_PATH)/lib$(CODEC_LIB)$(CODEC_LIB_SUF)))\
    $(if $(BUILD_OBJS),$(eval $(call linker_template,$(bin),\
-        $(call objs,$($(notdir $(bin)).SRCS)) \
+        $(call objs,$($(notdir $(bin:$(EXE_SFX)=)).SRCS)) \
        -l$(CODEC_LIB) $(addprefix -l,$(CODEC_EXTRA_LIBS))\
        )))\
    $(if $(LIPO_OBJS),$(eval $(call lipo_bin_template,$(bin))))\
--- a/libs.mk
+++ b/libs.mk
@@ -17,6 +17,7 @@ else
  ASM:=.asm
 endif

+CODEC_SRCS-yes += CHANGELOG
 CODEC_SRCS-yes += libs.mk

 include $(SRC_PATH_BARE)/vpx/vpx_codec.mk
@@ -34,9 +35,9 @@ ifeq ($(CONFIG_VP8_ENCODER),yes)
  include $(SRC_PATH_BARE)/$(VP8_PREFIX)vp8cx.mk
  CODEC_SRCS-yes += $(addprefix $(VP8_PREFIX),$(call enabled,VP8_CX_SRCS))
  CODEC_EXPORTS-yes += $(addprefix $(VP8_PREFIX),$(VP8_CX_EXPORTS))
-  CODEC_SRCS-yes += $(VP8_PREFIX)vp8cx.mk vpx/vp8.h vpx/vp8cx.h vpx/vp8e.h
+  CODEC_SRCS-yes += $(VP8_PREFIX)vp8cx.mk vpx/vp8.h vpx/vp8cx.h
  CODEC_SRCS-$(ARCH_ARM) += $(VP8_PREFIX)vp8cx_arm.mk
-  INSTALL-LIBS-yes += include/vpx/vp8.h include/vpx/vp8e.h include/vpx/vp8cx.h
+  INSTALL-LIBS-yes += include/vpx/vp8.h include/vpx/vp8cx.h
  INSTALL_MAPS += include/vpx/% $(SRC_PATH_BARE)/$(VP8_PREFIX)/%
  CODEC_DOC_SRCS += vpx/vp8.h vpx/vp8cx.h
  CODEC_DOC_SECTIONS += vp8 vp8_encoder
@@ -48,7 +49,6 @@ ifeq ($(CONFIG_VP8_DECODER),yes)
  CODEC_SRCS-yes += $(addprefix $(VP8_PREFIX),$(call enabled,VP8_DX_SRCS))
  CODEC_EXPORTS-yes += $(addprefix $(VP8_PREFIX),$(VP8_DX_EXPORTS))
  CODEC_SRCS-yes += $(VP8_PREFIX)vp8dx.mk vpx/vp8.h vpx/vp8dx.h
-  CODEC_SRCS-$(ARCH_ARM) += $(VP8_PREFIX)vp8dx_arm.mk
  INSTALL-LIBS-yes += include/vpx/vp8.h include/vpx/vp8dx.h
  INSTALL_MAPS += include/vpx/% $(SRC_PATH_BARE)/$(VP8_PREFIX)/%
  CODEC_DOC_SRCS += vpx/vp8.h vpx/vp8dx.h
@@ -90,6 +90,7 @@ endif
 $(eval $(if $(filter universal%,$(TOOLCHAIN)),LIPO_LIBVPX,BUILD_LIBVPX):=yes)

 CODEC_SRCS-$(BUILD_LIBVPX) += build/make/version.sh
+CODEC_SRCS-$(BUILD_LIBVPX) += build/make/rtcd.sh
 CODEC_SRCS-$(BUILD_LIBVPX) += vpx/vpx_integer.h
 CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/asm_offsets.h
 CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/vpx_timer.h
@@ -114,7 +115,6 @@ INSTALL-LIBS-yes += include/vpx/vpx_integer.h
 INSTALL-LIBS-yes += include/vpx/vpx_codec_impl_top.h
 INSTALL-LIBS-yes += include/vpx/vpx_codec_impl_bottom.h
 INSTALL-LIBS-$(CONFIG_DECODERS) += include/vpx/vpx_decoder.h
-INSTALL-LIBS-$(CONFIG_DECODERS) += include/vpx/vpx_decoder_compat.h
 INSTALL-LIBS-$(CONFIG_ENCODERS) += include/vpx/vpx_encoder.h
 ifeq ($(CONFIG_EXTERNAL_BUILD),yes)
 ifeq ($(CONFIG_MSVS),yes)
@@ -183,6 +183,7 @@ vpx.vcproj: $(CODEC_SRCS) vpx.def
 PROJECTS-$(BUILD_LIBVPX) += vpx.vcproj

 vpx.vcproj: vpx_config.asm
+vpx.vcproj: vpx_rtcd.h

 endif
 else
@@ -232,7 +233,7 @@ vpx.pc: config.mk libs.mk
 	$(qexec)echo '# pkg-config file from libvpx $(VERSION_STRING)' > $@
 	$(qexec)echo 'prefix=$(PREFIX)' >> $@
 	$(qexec)echo 'exec_prefix=$${prefix}' >> $@
-	$(qexec)echo 'libdir=$${prefix}/lib' >> $@
+	$(qexec)echo 'libdir=$${prefix}/$(LIBSUBDIR)' >> $@
 	$(qexec)echo 'includedir=$${prefix}/include' >> $@
 	$(qexec)echo '' >> $@
 	$(qexec)echo 'Name: vpx' >> $@
@@ -279,19 +280,21 @@ $(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
 # Calculate platform- and compiler-specific offsets for hand coded assembly
 #

+OFFSET_PATTERN:='^[a-zA-Z0-9_]* EQU'
+
 ifeq ($(filter icc gcc,$(TGT_CC)), $(TGT_CC))
    $(BUILD_PFX)asm_com_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S
-	grep -w EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
+	LC_ALL=C grep $(OFFSET_PATTERN) $< | tr -d '$$\#' $(ADS2GAS) > $@
    $(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S: $(VP8_PREFIX)common/asm_com_offsets.c
    CLEAN-OBJS += $(BUILD_PFX)asm_com_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S

    $(BUILD_PFX)asm_enc_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S
-	grep -w EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
+	LC_ALL=C grep $(OFFSET_PATTERN) $< | tr -d '$$\#' $(ADS2GAS) > $@
    $(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S: $(VP8_PREFIX)encoder/asm_enc_offsets.c
    CLEAN-OBJS += $(BUILD_PFX)asm_enc_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S

    $(BUILD_PFX)asm_dec_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S
-	grep -w EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
+	LC_ALL=C grep $(OFFSET_PATTERN) $< | tr -d '$$\#' $(ADS2GAS) > $@
    $(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S: $(VP8_PREFIX)decoder/asm_dec_offsets.c
    CLEAN-OBJS += $(BUILD_PFX)asm_dec_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S
 else
@@ -322,6 +325,18 @@ endif
 $(shell $(SRC_PATH_BARE)/build/make/version.sh "$(SRC_PATH_BARE)" $(BUILD_PFX)vpx_version.h)
 CLEAN-OBJS += $(BUILD_PFX)vpx_version.h

+#
+# Rule to generate runtime cpu detection files
+#
+$(OBJS-yes:.o=.d): $(BUILD_PFX)vpx_rtcd.h
+$(BUILD_PFX)vpx_rtcd.h: $(SRC_PATH_BARE)/$(sort $(filter %rtcd_defs.sh,$(CODEC_SRCS)))
+	@echo "    [CREATE] $@"
+	$(qexec)$(SRC_PATH_BARE)/build/make/rtcd.sh --arch=$(TGT_ISA) \
+          --sym=vpx_rtcd \
+          --config=$(target)$(if $(FAT_ARCHS),,-$(TOOLCHAIN)).mk \
+          $(RTCD_OPTIONS) $^ > $@
+CLEAN-OBJS += $(BUILD_PFX)vpx_rtcd.h
+
 CODEC_DOC_SRCS += vpx/vpx_codec.h \
                  vpx/vpx_decoder.h \
                  vpx/vpx_encoder.h \
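The offset rules now select lines with the anchored pattern `^[a-zA-Z0-9_]* EQU` instead of `grep -w EQU`. A line is kept only if it starts with an identifier (possibly empty) followed by a space and `EQU`. By contrast, `-w` accepts `EQU` as a whole word anywhere on the line. Running grep under `LC_ALL=C` pins the bracket ranges to plain ASCII. The Python sketch below contrasts the two filters, approximating `grep -w` with `\b` word boundaries. The sample lines are hypothetical, not real compiler output.

```python
import re

# Anchored filter used by OFFSET_PATTERN; re.ASCII mirrors the C locale.
ANCHORED = re.compile(r"^[a-zA-Z0-9_]* EQU", re.ASCII)
# Rough equivalent of `grep -w EQU`: EQU as a whole word anywhere.
WORD = re.compile(r"\bEQU\b", re.ASCII)

lines = [
    "vp8_block_coeff EQU #24",      # identifier, space, EQU at line start
    '.ascii "foo EQU %0"',          # EQU buried inside other text
    "    indented EQU #8",          # leading whitespace before the name
]

anchored_hits = [line for line in lines if ANCHORED.match(line)]
word_hits = [line for line in lines if WORD.search(line)]
```

Only the first line passes the anchored filter, while the word filter accepts all three.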
--- a/mainpage.dox
+++ b/mainpage.dox
@@ -12,8 +12,12 @@

  This distribution of the WebM VP8 Codec SDK includes the following support:

-  \if vp8_encoder    - \ref vp8_encoder   \endif
-  \if vp8_decoder    - \ref vp8_decoder   \endif
+  \if vp8_encoder
+  - \ref vp8_encoder
+  \endif
+  \if vp8_decoder
+  - \ref vp8_decoder
+  \endif


  \section main_startpoints Starting Points
@@ -24,8 +28,12 @@
  - Read the \ref samples "sample code" for examples of how to interact with the
    codec.
  - \ref codec reference
-    \if encoder - \ref encoder reference \endif
-    \if decoder - \ref decoder reference \endif
+    \if encoder
+    - \ref encoder reference
+    \endif
+    \if decoder
+    - \ref decoder reference
+    \endif

  \section main_support Support Options & FAQ
  The WebM project is an open source project supported by its community. For
--- a/tools/ftfy.sh
+++ b/tools/ftfy.sh
@@ -0,0 +1,160 @@
+#!/bin/sh
+self="$0"
+dirname_self=$(dirname "$self")
+
+usage() {
+  cat <<EOF >&2
+Usage: $self [option]
+
+This script applies a whitespace transformation to the commit at HEAD. If no
+options are given, then the modified files are left in the working tree.
+
+Options:
+  -h, --help     Shows this message
+  -n, --dry-run  Shows a diff of the changes to be made.
+  --amend        Squashes the changes into the commit at HEAD.
+                     This option will also reformat the commit message.
+  --commit       Creates a new commit containing only the whitespace changes
+  --msg-only     Reformat the commit message only, ignore the patch itself.
+
+EOF
+  rm -f ${CLEAN_FILES}
+  exit 1
+}
+
+
+log() {
+  echo "${self##*/}: $@" >&2
+}
+
+
+vpx_style() {
+  astyle --style=bsd --min-conditional-indent=0 --break-blocks \
+         --pad-oper --pad-header --unpad-paren \
+         --align-pointer=name \
+         --indent-preprocessor --convert-tabs --indent-labels \
+         --suffix=none --quiet "$@"
+  sed -i 's/[[:space:]]\{1,\},/,/g' "$@"
+}
+
+
+apply() {
+  [ $INTERSECT_RESULT -ne 0 ] && patch -p1 < "$1"
+}
+
+
+commit() {
+  LAST_CHANGEID=$(git show | awk '/Change-Id:/{print $2}')
+  if [ -z "$LAST_CHANGEID" ]; then
+    log "HEAD doesn't have a Change-Id, unable to generate a new commit"
+    exit 1
+  fi
+
+  # Build a deterministic Change-Id from the parent's
+  NEW_CHANGEID=${LAST_CHANGEID}-styled
+  NEW_CHANGEID=I$(echo $NEW_CHANGEID | git hash-object --stdin)
+
+  # Commit, preserving authorship from the parent commit.
+  git commit -a -C HEAD > /dev/null
+  git commit --amend -F- << EOF
+Cosmetic: Fix whitespace in change $(echo "$LAST_CHANGEID" | cut -c1-9)
+
+Change-Id: ${NEW_CHANGEID}
+EOF
+}
+
+
+show_commit_msg_diff() {
+  if [ $DIFF_MSG_RESULT -ne 0 ]; then
+    log "Modified commit message:"
+    diff -u "$ORIG_COMMIT_MSG" "$NEW_COMMIT_MSG" | tail -n +3
+  fi
+}
+
+
+amend() {
+  show_commit_msg_diff
+  if [ $DIFF_MSG_RESULT -ne 0 ] || [ $INTERSECT_RESULT -ne 0 ]; then
+    git commit -a --amend -F "$NEW_COMMIT_MSG"
+  fi
+}
+
+
+diff_msg() {
+  git log -1 --format=%B > "$ORIG_COMMIT_MSG"
+  "${dirname_self}"/wrap-commit-msg.py \
+      < "$ORIG_COMMIT_MSG" > "$NEW_COMMIT_MSG"
+  cmp -s "$ORIG_COMMIT_MSG" "$NEW_COMMIT_MSG"
+  DIFF_MSG_RESULT=$?
+}
+
+
+# Temporary files
+ORIG_DIFF=orig.diff.$$
+MODIFIED_DIFF=modified.diff.$$
+FINAL_DIFF=final.diff.$$
+ORIG_COMMIT_MSG=orig.commit-msg.$$
+NEW_COMMIT_MSG=new.commit-msg.$$
+CLEAN_FILES="${ORIG_DIFF} ${MODIFIED_DIFF} ${FINAL_DIFF}"
+CLEAN_FILES="${CLEAN_FILES} ${ORIG_COMMIT_MSG} ${NEW_COMMIT_MSG}"
+
+# Preconditions
+[ $# -lt 2 ] || usage
+
+# Check that astyle supports pad-header and align-pointer=name
+if ! astyle --pad-header --align-pointer=name < /dev/null; then
+  log "Install astyle v1.24 or newer"
+  exit 1
+fi
+
+if ! git diff --quiet HEAD; then
+  log "Working tree is dirty, commit your changes first"
+  exit 1
+fi
+
+# Need to be in the root
+cd "$(git rev-parse --show-toplevel)"
+
+# Collect the original diff
+git show > "${ORIG_DIFF}"
+
+# Apply the style guide on new and modified files and collect its diff
+for f in $(git diff HEAD^ --name-only -M90 --diff-filter=AM \
+           | grep '\.[ch]$'); do
+  case "$f" in
+    third_party/*) continue;;
+    nestegg/*) continue;;
+  esac
+  vpx_style "$f"
+done
+git diff --no-color --no-ext-diff > "${MODIFIED_DIFF}"
+
+# Intersect the two diffs
+"${dirname_self}"/intersect-diffs.py \
+    "${ORIG_DIFF}" "${MODIFIED_DIFF}" > "${FINAL_DIFF}"
+INTERSECT_RESULT=$?
+git reset --hard >/dev/null
+
+# Fixup the commit message
+diff_msg
+
+# Handle options
+if [ -n "$1" ]; then
+  case "$1" in
+    -h|--help) usage;;
+    -n|--dry-run) cat "${FINAL_DIFF}"; show_commit_msg_diff;;
+    --commit) apply "${FINAL_DIFF}"; commit;;
+    --amend) apply "${FINAL_DIFF}"; amend;;
+    --msg-only) amend;;
+    *) usage;;
+  esac
+else
+  apply "${FINAL_DIFF}"
+  if ! git diff --quiet; then
+    log "Formatting changes applied, verify and commit."
+    log "See also: http://www.webmproject.org/code/contribute/conventions/"
+    git diff --stat
+  fi
+fi
+
+rm -f ${CLEAN_FILES}
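In `--commit` mode, ftfy.sh derives the new Change-Id deterministically. It appends `-styled` to the parent's Change-Id and pipes the result through `echo`, which adds a trailing newline. That output goes into `git hash-object --stdin`, and the resulting hash is prefixed with `I`. `git hash-object` computes a blob's id as the SHA-1 of the header `blob <size>\0` followed by the content, so the same value can be computed in Python. The helper names below are hypothetical.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """SHA-1 object id that `git hash-object` computes for a blob."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def styled_change_id(last_change_id: str) -> str:
    """Hypothetical helper mirroring NEW_CHANGEID in ftfy.sh."""
    # `echo $NEW_CHANGEID` appends a newline before git hashes the text.
    payload = (last_change_id + "-styled\n").encode("ascii")
    return "I" + git_blob_hash(payload)


def short_id(change_id: str) -> str:
    """First nine characters, as used in the cosmetic commit's subject line."""
    return change_id[:9]
```

The input depends only on the parent's Change-Id, so rerunning the script on the same parent yields the same Change-Id.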
--- a/tools/intersect-diffs.py
+++ b/tools/intersect-diffs.py
@@ -0,0 +1,188 @@
+#!/usr/bin/env python
+##  Copyright (c) 2012 The WebM project authors. All Rights Reserved.
+##
+##  Use of this source code is governed by a BSD-style license
+##  that can be found in the LICENSE file in the root of the source
+##  tree. An additional intellectual property rights grant can be found
+##  in the file PATENTS.  All contributing project authors may
+##  be found in the AUTHORS file in the root of the source tree.
+##
+"""Calculates the "intersection" of two unified diffs.
+
+Given two diffs, A and B, it finds all hunks in B that had non-context lines
+in A and prints them to stdout. This is useful to determine the hunks in B that
+are relevant to A. The resulting file can be applied with patch(1) on top of A.
+"""
+
+__author__ = "jkoleszar@google.com"
+
+import re
+import sys
+
+
+class DiffLines(object):
+    """A container for one half of a diff."""
+
+    def __init__(self, filename, offset, length):
+        self.filename = filename
+        self.offset = offset
+        self.length = length
+        self.lines = []
+        self.delta_line_nums = []
+
+    def Append(self, line):
+        l = len(self.lines)
+        if line[0] != " ":
+            self.delta_line_nums.append(self.offset + l)
+        self.lines.append(line[1:])
+        assert l+1 <= self.length
+
+    def Complete(self):
+        return len(self.lines) == self.length
+
+    def __contains__(self, item):
+        return item >= self.offset and item <= self.offset + self.length - 1
+
+
+class DiffHunk(object):
+    """A container for one diff hunk, consisting of two DiffLines."""
+
+    def __init__(self, header, file_a, file_b, start_a, len_a, start_b, len_b):
+        self.header = header
+        self.left = DiffLines(file_a, start_a, len_a)
+        self.right = DiffLines(file_b, start_b, len_b)
+        self.lines = []
+
+    def Append(self, line):
+        """Adds a line to the DiffHunk and its DiffLines children."""
+        if line[0] == "-":
+            self.left.Append(line)
+        elif line[0] == "+":
+            self.right.Append(line)
+        elif line[0] == " ":
+            self.left.Append(line)
+            self.right.Append(line)
+        else:
+            assert False, ("Unrecognized character at start of diff line "
+                           "%r" % line[0])
+        self.lines.append(line)
+
+    def Complete(self):
+        return self.left.Complete() and self.right.Complete()
+
+    def __repr__(self):
+        return "DiffHunk(%s, %s, len %d)" % (
+            self.left.filename, self.right.filename,
+            max(self.left.length, self.right.length))
+
+
+def ParseDiffHunks(stream):
+    """Walk a file-like object, yielding DiffHunks as they're parsed."""
+
+    file_regex = re.compile(r"(\+\+\+|---) (\S+)")
+    range_regex = re.compile(r"@@ -(\d+)(,(\d+))? \+(\d+)(,(\d+))?")
+    hunk = None
+    while True:
+        line = stream.readline()
+        if not line:
+            break
+
+        if hunk is None:
+            # Parse file names
+            diff_file = file_regex.match(line)
+            if diff_file:
+                if line.startswith("---"):
+                    a_line = line
+                    a = diff_file.group(2)
+                    continue
+                if line.startswith("+++"):
+                    b_line = line
+                    b = diff_file.group(2)
+                    continue
+
+            # Parse offset/lengths
+            diffrange = range_regex.match(line)
+            if diffrange:
+                if diffrange.group(2):
+                    start_a = int(diffrange.group(1))
+                    len_a = int(diffrange.group(3))
+                else:
+                    start_a = int(diffrange.group(1))
+                    len_a = 1  # "@@ -N ..." is a one-line range starting at N
+
+                if diffrange.group(5):
+                    start_b = int(diffrange.group(4))
+                    len_b = int(diffrange.group(6))
+                else:
+                    start_b = int(diffrange.group(4))
+                    len_b = 1  # "... +N @@" is a one-line range starting at N
+
+                header = [a_line, b_line, line]
+                hunk = DiffHunk(header, a, b, start_a, len_a, start_b, len_b)
+        else:
+            # Add the current line to the hunk
+            hunk.Append(line)
+
+            # See if the whole hunk has been parsed. If so, yield it and prepare
+            # for the next hunk.
+            if hunk.Complete():
+                yield hunk
+                hunk = None
+
+    # Partial hunks are a parse error
+    assert hunk is None
+
+
+def FormatDiffHunks(hunks):
+    """Re-serialize a list of DiffHunks."""
+    r = []
+    last_header = None
+    for hunk in hunks:
+        this_header = hunk.header[0:2]
+        if last_header != this_header:
+            r.extend(hunk.header)
+            last_header = this_header
+        else:
+            r.append(hunk.header[2])
+        r.extend(hunk.lines)
+        r.append("\n")
+    return "".join(r)
+
+
+def ZipHunks(rhs_hunks, lhs_hunks):
+    """Join two hunk lists on filename."""
+    for rhs_hunk in rhs_hunks:
+        rhs_file = rhs_hunk.right.filename.split("/")[1:]
+
+        for lhs_hunk in lhs_hunks:
+            lhs_file = lhs_hunk.left.filename.split("/")[1:]
+            if lhs_file != rhs_file:
+                continue
+            yield (rhs_hunk, lhs_hunk)
+
+
+def main():
+    old_hunks = [x for x in ParseDiffHunks(open(sys.argv[1], "r"))]
+    new_hunks = [x for x in ParseDiffHunks(open(sys.argv[2], "r"))]
+    out_hunks = []
+
+    # Join the right hand side of the older diff with the left hand side of the
+    # newer diff.
+    for old_hunk, new_hunk in ZipHunks(old_hunks, new_hunks):
+        if new_hunk in out_hunks:
+            continue
+        old_lines = old_hunk.right
+        new_lines = new_hunk.left
+
+        # Determine if this hunk overlaps any non-context line from the other
+        for i in old_lines.delta_line_nums:
+            if i in new_lines:
+                out_hunks.append(new_hunk)
+                break
+
+    if out_hunks:
+        print(FormatDiffHunks(out_hunks))
+        sys.exit(1)
+
+if __name__ == "__main__":
+    main()
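The core test in intersect-diffs.py asks whether any changed (non-context) line on the right side of the original diff falls inside the line range covered by the left side of a style-fix hunk. Only such hunks are kept. Below is a standalone sketch of that test, with hypothetical helper names and made-up hunk contents.

```python
def delta_line_numbers(offset, side_lines):
    """Line numbers of the non-context lines on one side of a hunk.

    `side_lines` keeps each line's diff prefix (" ", "+" or "-"),
    mirroring how DiffLines.Append records delta_line_nums.
    """
    return [offset + i for i, line in enumerate(side_lines) if line[0] != " "]


def covers(offset, length, line_number):
    """True if line_number lies in [offset, offset + length - 1]."""
    return offset <= line_number <= offset + length - 1


def keep_hunk(old_deltas, new_offset, new_length):
    """Keep a new hunk if it touches any line the original change modified."""
    return any(covers(new_offset, new_length, n) for n in old_deltas)


# Right side of an original hunk starting at line 10: only line 11 was added.
old_deltas = delta_line_numbers(10, [" a", "+b", " c"])
```

A style-fix hunk covering lines 11 to 13 overlaps that change and is kept. One covering lines 20 to 22 does not and is dropped, so pre-existing style problems elsewhere in the file are left alone.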
--- a/tools/wrap-commit-msg.py
+++ b/tools/wrap-commit-msg.py
@@ -0,0 +1,70 @@
+#!/usr/bin/env python
+##  Copyright (c) 2012 The WebM project authors. All Rights Reserved.
+##
+##  Use of this source code is governed by a BSD-style license
+##  that can be found in the LICENSE file in the root of the source
+##  tree. An additional intellectual property rights grant can be found
+##  in the file PATENTS.  All contributing project authors may
+##  be found in the AUTHORS file in the root of the source tree.
+##
+"""Wraps paragraphs of text, preserving manual formatting
+
+This is like fold(1), but has the special convention of not modifying lines
+that start with whitespace. This allows you to intersperse blocks with
+special formatting, like code blocks, with written prose. The prose will
+be wordwrapped, and the manual formatting will be preserved.
+
+ * This won't handle the case of a bulleted (or ordered) list specially, so
+   manual wrapping must be done.
+
+Occasionally it's useful to put something with explicit formatting that
+doesn't look at all like a block of text inline.
+
+  indicator = has_leading_whitespace(line);
+  if (indicator)
+    preserve_formatting(line);
+
+The intent is that this docstring would make it through the transform
+and still be legible and presented as it is in the source. If additional
+cases are handled, update this doc to describe the effect.
+"""
+
+__author__ = "jkoleszar@google.com"
+import textwrap
+import sys
+
+def wrap(text):
+    if text:
+        return textwrap.fill(text, break_long_words=False) + '\n'
+    return ""
+
+
+def main(fileobj):
+    text = ""
+    output = ""
+    while True:
+        line = fileobj.readline()
+        if not line:
+            break
+
+        if line.lstrip() == line:
+            text += line
+        else:
+            output += wrap(text)
+            text = ""
+            output += line
+    output += wrap(text)
+
+    # Replace the file or write to stdout.
+    if fileobj == sys.stdin:
+        fileobj = sys.stdout
+    else:
+        fileobj.seek(0)
+        fileobj.truncate(0)
+    fileobj.write(output)
+
+if __name__ == "__main__":
+    if len(sys.argv) > 1:
+        main(open(sys.argv[1], "r+"))
+    else:
+        main(sys.stdin)
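The rule in wrap-commit-msg.py can be restated as a pure function on a string, which makes it easy to try without touching a file. Runs of unindented lines are wrapped. Indented lines and blank lines (both of which `lstrip()` changes) pass through unchanged and end the current paragraph. This sketch mirrors the script's loop and is not a replacement for it. Like the script, it relies on `textwrap.fill`, whose default width is 70 columns.

```python
import textwrap


def _fill(paragraph, width):
    """Wrap one accumulated paragraph, or return "" if there is none."""
    if paragraph:
        return textwrap.fill(paragraph, width=width, break_long_words=False) + "\n"
    return ""


def wrap_text(text, width=70):
    output = ""
    paragraph = ""
    for line in text.splitlines(keepends=True):
        if line.lstrip() == line:   # unindented, non-blank: part of the prose
            paragraph += line
        else:                       # indented or blank: flush prose, keep line verbatim
            output += _fill(paragraph, width)
            paragraph = ""
            output += line
    return output + _fill(paragraph, width)


msg = "word " * 30 + "\n" + "\n" + "    code(line);   keep\n"
out = wrap_text(msg)
```

The long prose line is re-wrapped, while the blank separator and the indented code line come through untouched, including their internal spacing.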
--- a/tools_common.c
+++ b/tools_common.c
@@ -9,15 +9,21 @@
 */
 #include <stdio.h>
 #include "tools_common.h"
-#ifdef _WIN32
+#if defined(_WIN32) || defined(__OS2__)
 #include <io.h>
 #include <fcntl.h>
+
+#ifdef __OS2__
+#define _setmode    setmode
+#define _fileno     fileno
+#define _O_BINARY   O_BINARY
+#endif
 #endif

 FILE* set_binary_mode(FILE *stream)
 {
    (void)stream;
-#ifdef _WIN32
+#if defined(_WIN32) || defined(__OS2__)
    _setmode(_fileno(stream), _O_BINARY);
 #endif
    return stream;
--- a/usage.dox
+++ b/usage.dox
@@ -1,6 +1,6 @@
 /*!\page usage Usage

-    The vpx Multi-Format codec SDK provides a unified interface amongst its
+    The vpx multi-format codec SDK provides a unified interface amongst its
    supported codecs. This abstraction allows applications using this SDK to
    easily support multiple video formats with minimal code duplication or
    "special casing." This section describes the interface common to all codecs.
@@ -14,8 +14,12 @@

    Fore more information on decoder and encoder specific usage, see the
    following pages:
-    \if decoder - \subpage usage_decode \endif
-    \if decoder - \subpage usage_encode \endif
+    \if decoder
+    - \subpage usage_decode
+    \endif
+    \if encoder
+    - \subpage usage_encode
+    \endif

    \section usage_types Important Data Types
    There are two important data structures to consider in this interface.
--- a/vp8/common/alloccommon.c
+++ b/vp8/common/alloccommon.c
@@ -37,14 +37,15 @@ static void update_mode_info_border(MODE_INFO *mi, int rows, int cols)
 void vp8_de_alloc_frame_buffers(VP8_COMMON *oci)
 {
    int i;
-
    for (i = 0; i < NUM_YV12_BUFFERS; i++)
        vp8_yv12_de_alloc_frame_buffer(&oci->yv12_fb[i]);

    vp8_yv12_de_alloc_frame_buffer(&oci->temp_scale_frame);
+#if CONFIG_POSTPROC
    vp8_yv12_de_alloc_frame_buffer(&oci->post_proc_buffer);
    if (oci->post_proc_buffer_int_used)
        vp8_yv12_de_alloc_frame_buffer(&oci->post_proc_buffer_int);
+#endif

    vpx_free(oci->above_context);
    vpx_free(oci->mip);
@@ -97,6 +98,7 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
        return 1;
    }

+#if CONFIG_POSTPROC
    if (vp8_yv12_alloc_frame_buffer(&oci->post_proc_buffer, width, height, VP8BORDERINPIXELS) < 0)
    {
        vp8_de_alloc_frame_buffers(oci);
@@ -104,6 +106,9 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
    }

    oci->post_proc_buffer_int_used = 0;
+    vpx_memset(&oci->postproc_state, 0, sizeof(oci->postproc_state));
+    vpx_memset(oci->post_proc_buffer.buffer_alloc, 128, oci->post_proc_buffer.frame_size);
+#endif

    oci->mb_rows = height >> 4;
    oci->mb_cols = width >> 4;
@@ -203,7 +208,7 @@ void vp8_create_common(VP8_COMMON *oci)
    oci->clr_type = REG_YUV;
    oci->clamp_type = RECON_CLAMP_REQUIRED;

-    /* Initialise reference frame sign bias structure to defaults */
+    /* Initialize reference frame sign bias structure to defaults */
    vpx_memset(oci->ref_frame_sign_bias, 0, sizeof(oci->ref_frame_sign_bias));

    /* Default disable buffer to buffer copying */
@@ -215,13 +220,3 @@ void vp8_remove_common(VP8_COMMON *oci)
 {
    vp8_de_alloc_frame_buffers(oci);
 }
-
-void vp8_initialize_common()
-{
-    vp8_coef_tree_initialize();
-
-    vp8_entropy_mode_init();
-
-    vp8_init_scan_order_mask();
-
-}
--- a/vp8/common/arm/arm_systemdependent.c
+++ b/vp8/common/arm/arm_systemdependent.c
@@ -1,115 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#include "vpx_config.h"
-#include "vpx_ports/arm.h"
-#include "vp8/common/pragmas.h"
-#include "vp8/common/subpixel.h"
-#include "vp8/common/loopfilter.h"
-#include "vp8/common/recon.h"
-#include "vp8/common/idct.h"
-#include "vp8/common/onyxc_int.h"
-
-void vp8_arch_arm_common_init(VP8_COMMON *ctx)
-{
-#if CONFIG_RUNTIME_CPU_DETECT
-    VP8_COMMON_RTCD *rtcd = &ctx->rtcd;
-    int flags = arm_cpu_caps();
-    rtcd->flags = flags;
-
-    /* Override default functions with fastest ones for this CPU. */
-#if HAVE_ARMV5TE
-    if (flags & HAS_EDSP)
-    {
-    }
-#endif
-
-#if HAVE_ARMV6
-    if (flags & HAS_MEDIA)
-    {
-        rtcd->subpix.sixtap16x16   = vp8_sixtap_predict16x16_armv6;
-        rtcd->subpix.sixtap8x8     = vp8_sixtap_predict8x8_armv6;
-        rtcd->subpix.sixtap8x4     = vp8_sixtap_predict8x4_armv6;
-        rtcd->subpix.sixtap4x4     = vp8_sixtap_predict_armv6;
-        rtcd->subpix.bilinear16x16 = vp8_bilinear_predict16x16_armv6;
-        rtcd->subpix.bilinear8x8   = vp8_bilinear_predict8x8_armv6;
-        rtcd->subpix.bilinear8x4   = vp8_bilinear_predict8x4_armv6;
-        rtcd->subpix.bilinear4x4   = vp8_bilinear_predict4x4_armv6;
-
-        rtcd->idct.idct16       = vp8_short_idct4x4llm_v6_dual;
-        rtcd->idct.iwalsh16     = vp8_short_inv_walsh4x4_v6;
-
-        rtcd->loopfilter.normal_mb_v = vp8_loop_filter_mbv_armv6;
-        rtcd->loopfilter.normal_b_v  = vp8_loop_filter_bv_armv6;
-        rtcd->loopfilter.normal_mb_h = vp8_loop_filter_mbh_armv6;
-        rtcd->loopfilter.normal_b_h  = vp8_loop_filter_bh_armv6;
-        rtcd->loopfilter.simple_mb_v =
-                vp8_loop_filter_simple_vertical_edge_armv6;
-        rtcd->loopfilter.simple_b_v  = vp8_loop_filter_bvs_armv6;
-        rtcd->loopfilter.simple_mb_h =
-                vp8_loop_filter_simple_horizontal_edge_armv6;
-        rtcd->loopfilter.simple_b_h  = vp8_loop_filter_bhs_armv6;
-
-        rtcd->recon.copy16x16   = vp8_copy_mem16x16_v6;
-        rtcd->recon.copy8x8     = vp8_copy_mem8x8_v6;
-        rtcd->recon.copy8x4     = vp8_copy_mem8x4_v6;
-        rtcd->recon.intra4x4_predict = vp8_intra4x4_predict_armv6;
-
-        rtcd->dequant.block               = vp8_dequantize_b_v6;
-        rtcd->dequant.idct_add            = vp8_dequant_idct_add_v6;
-        rtcd->dequant.idct_add_y_block    = vp8_dequant_idct_add_y_block_v6;
-        rtcd->dequant.idct_add_uv_block   = vp8_dequant_idct_add_uv_block_v6;
-
-    }
-#endif
-
-#if HAVE_ARMV7
-    if (flags & HAS_NEON)
-    {
-        rtcd->subpix.sixtap16x16   = vp8_sixtap_predict16x16_neon;
-        rtcd->subpix.sixtap8x8     = vp8_sixtap_predict8x8_neon;
-        rtcd->subpix.sixtap8x4     = vp8_sixtap_predict8x4_neon;
-        rtcd->subpix.sixtap4x4     = vp8_sixtap_predict_neon;
-        rtcd->subpix.bilinear16x16 = vp8_bilinear_predict16x16_neon;
-        rtcd->subpix.bilinear8x8   = vp8_bilinear_predict8x8_neon;
-        rtcd->subpix.bilinear8x4   = vp8_bilinear_predict8x4_neon;
-        rtcd->subpix.bilinear4x4   = vp8_bilinear_predict4x4_neon;
-
-        rtcd->idct.idct16       = vp8_short_idct4x4llm_neon;
-        rtcd->idct.iwalsh16     = vp8_short_inv_walsh4x4_neon;
-
-        rtcd->loopfilter.normal_mb_v = vp8_loop_filter_mbv_neon;
-        rtcd->loopfilter.normal_b_v  = vp8_loop_filter_bv_neon;
-        rtcd->loopfilter.normal_mb_h = vp8_loop_filter_mbh_neon;
-        rtcd->loopfilter.normal_b_h  = vp8_loop_filter_bh_neon;
-        rtcd->loopfilter.simple_mb_v = vp8_loop_filter_mbvs_neon;
-        rtcd->loopfilter.simple_b_v  = vp8_loop_filter_bvs_neon;
-        rtcd->loopfilter.simple_mb_h = vp8_loop_filter_mbhs_neon;
-        rtcd->loopfilter.simple_b_h  = vp8_loop_filter_bhs_neon;
-
-        rtcd->recon.copy16x16   = vp8_copy_mem16x16_neon;
-        rtcd->recon.copy8x8     = vp8_copy_mem8x8_neon;
-        rtcd->recon.copy8x4     = vp8_copy_mem8x4_neon;
-        rtcd->recon.build_intra_predictors_mby =
-            vp8_build_intra_predictors_mby_neon;
-        rtcd->recon.build_intra_predictors_mby_s =
-            vp8_build_intra_predictors_mby_s_neon;
-
-        rtcd->dequant.block               = vp8_dequantize_b_neon;
-        rtcd->dequant.idct_add            = vp8_dequant_idct_add_neon;
-        rtcd->dequant.idct_add_y_block    = vp8_dequant_idct_add_y_block_neon;
-        rtcd->dequant.idct_add_uv_block   = vp8_dequant_idct_add_uv_block_neon;
-
-    }
-#endif
-
-#endif
-}
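The removed vp8_arch_arm_common_init() filled a table of function pointers at startup. It started from the defaults and, based on the flags from arm_cpu_caps(), overwrote entries first with the ARMv6 (HAS_MEDIA) versions and then with the NEON (HAS_NEON) versions, so the last matching override wins. In this release the generated vpx_rtcd.h appears to take over that role. Below is a minimal Python model of the override order. The flag bit values and the `_c` default names are assumptions for illustration, and strings stand in for function pointers.

```python
# Hypothetical bit values; only the flag names come from the removed C code.
HAS_EDSP = 0x01
HAS_MEDIA = 0x02
HAS_NEON = 0x04

# (required flag, implementation) pairs in the order the C code applied them.
# A required flag of 0 marks the always-available default (names assumed).
VARIANTS = {
    "sixtap16x16": [
        (0, "vp8_sixtap_predict16x16_c"),
        (HAS_MEDIA, "vp8_sixtap_predict16x16_armv6"),
        (HAS_NEON, "vp8_sixtap_predict16x16_neon"),
    ],
    "copy16x16": [
        (0, "vp8_copy_mem16x16_c"),
        (HAS_MEDIA, "vp8_copy_mem16x16_v6"),
        (HAS_NEON, "vp8_copy_mem16x16_neon"),
    ],
}


def init_dispatch(flags):
    """Return {slot: implementation}; later matching entries override earlier ones."""
    table = {}
    for slot, options in VARIANTS.items():
        for required, impl in options:
            if required == 0 or flags & required:
                table[slot] = impl
    return table
```

Flags that have no overrides (HAS_EDSP in the removed code) leave the defaults in place.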
--- a/vp8/common/arm/armv6/idct_blk_v6.c
+++ b/vp8/common/arm/armv6/idct_blk_v6.c
@@ -9,8 +9,7 @@
 */

 #include "vpx_config.h"
-#include "vp8/common/idct.h"
-#include "vp8/common/dequantize.h"
+#include "vpx_rtcd.h"


 void vp8_dequant_idct_add_y_block_v6(short *q, short *dq,
--- a/vp8/encoder/arm/armv6/vp8_sad16x16_armv6.asm
+++ b/vp8/encoder/arm/armv6/vp8_sad16x16_armv6.asm
--- a/vp8/encoder/arm/armv6/vp8_variance16x16_armv6.asm
+++ b/vp8/encoder/arm/armv6/vp8_variance16x16_armv6.asm
@@ -144,7 +144,7 @@ loop
    ldr     r6, [sp, #40]       ; get address of sse
    mul     r0, r8, r8          ; sum * sum
    str     r11, [r6]           ; store sse
-    sub     r0, r11, r0, asr #8 ; return (sse - ((sum * sum) >> 8))
+    sub     r0, r11, r0, lsr #8 ; return (sse - ((sum * sum) >> 8))

    ldmfd   sp!, {r4-r12, pc}

--- a/vp8/encoder/arm/armv6/vp8_variance8x8_armv6.asm
+++ b/vp8/encoder/arm/armv6/vp8_variance8x8_armv6.asm
--- a/vp8/encoder/arm/armv6/vp8_variance_halfpixvar16x16_h_armv6.asm
+++ b/vp8/encoder/arm/armv6/vp8_variance_halfpixvar16x16_h_armv6.asm
@@ -169,7 +169,7 @@ loop
    ldr     r6, [sp, #40]       ; get address of sse
    mul     r0, r8, r8          ; sum * sum
    str     r11, [r6]           ; store sse
-    sub     r0, r11, r0, asr #8 ; return (sse - ((sum * sum) >> 8))
+    sub     r0, r11, r0, lsr #8 ; return (sse - ((sum * sum) >> 8))

    ldmfd   sp!, {r4-r12, pc}

--- a/vp8/encoder/arm/armv6/vp8_variance_halfpixvar16x16_hv_armv6.asm
+++ b/vp8/encoder/arm/armv6/vp8_variance_halfpixvar16x16_hv_armv6.asm
@@ -210,7 +210,7 @@ loop
    ldr     r6, [sp, #40]       ; get address of sse
    mul     r0, r8, r8          ; sum * sum
    str     r11, [r6]           ; store sse
-    sub     r0, r11, r0, asr #8 ; return (sse - ((sum * sum) >> 8))
+    sub     r0, r11, r0, lsr #8 ; return (sse - ((sum * sum) >> 8))

    ldmfd   sp!, {r4-r12, pc}

--- a/vp8/encoder/arm/armv6/vp8_variance_halfpixvar16x16_v_armv6.asm
+++ b/vp8/encoder/arm/armv6/vp8_variance_halfpixvar16x16_v_armv6.asm
@@ -171,7 +171,7 @@ loop
    ldr     r6, [sp, #40]       ; get address of sse
    mul     r0, r8, r8          ; sum * sum
    str     r11, [r6]           ; store sse
-    sub     r0, r11, r0, asr #8 ; return (sse - ((sum * sum) >> 8))
+    sub     r0, r11, r0, lsr #8 ; return (sse - ((sum * sum) >> 8))

    ldmfd   sp!, {r4-r12, pc}

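The ARMv6 variance fixes switch the final shift from `asr` to `lsr`. For a 16x16 block, the signed sum of the 256 pixel differences can reach ±256 × 255 = ±65280, so `sum * sum` can be as large as 4,261,478,400. That value fits in an unsigned 32-bit register but sets bit 31, so an arithmetic shift treats it as negative and shifts in ones. The Python sketch below emulates the 32-bit `mul`/shift/`sub` sequence. It is a model, not the assembly, and it uses an assumed worst case in which every difference is 255.

```python
MASK32 = 0xFFFFFFFF


def asr32(value, shift):
    """Arithmetic shift right of a 32-bit register value (sign bit replicated)."""
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> shift) & MASK32


def lsr32(value, shift):
    """Logical shift right of a 32-bit register value (zeros shifted in)."""
    return (value & MASK32) >> shift


def variance16x16(sse, total, shift_op):
    square = (total * total) & MASK32             # mul r0, r8, r8 keeps the low 32 bits
    return (sse - shift_op(square, 8)) & MASK32   # sub r0, r11, r0, <shift> #8


# Every difference is 255: the block is uniform, so the true variance is 0.
total = 256 * 255
sse = 256 * 255 * 255
```

With `lsr32` the result is 0. With `asr32` the same inputs yield 16777216, because the shifted square is treated as a negative number before the subtraction.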
--- a/vp8/common/arm/bilinearfilter_arm.c
+++ b/vp8/common/arm/bilinearfilter_arm.c
@@ -8,10 +8,10 @@
 *  be found in the AUTHORS file in the root of the source tree.
 */

-
+#include "vpx_config.h"
+#include "vpx_rtcd.h"
 #include <math.h>
 #include "vp8/common/filter.h"
-#include "vp8/common/subpixel.h"
 #include "bilinearfilter_arm.h"

 void vp8_filter_block2d_bil_armv6
--- a/vp8/common/arm/dequantize_arm.c
+++ b/vp8/common/arm/dequantize_arm.c
@@ -10,18 +10,17 @@


 #include "vpx_config.h"
-#include "vp8/common/dequantize.h"
-#include "vp8/common/idct.h"
+#include "vp8/common/blockd.h"

-#if HAVE_ARMV7
+#if HAVE_NEON
 extern void vp8_dequantize_b_loop_neon(short *Q, short *DQC, short *DQ);
 #endif

-#if HAVE_ARMV6
+#if HAVE_MEDIA
 extern void vp8_dequantize_b_loop_v6(short *Q, short *DQC, short *DQ);
 #endif

-#if HAVE_ARMV7
+#if HAVE_NEON

 void vp8_dequantize_b_neon(BLOCKD *d, short *DQC)
 {
@@ -32,7 +31,7 @@ void vp8_dequantize_b_neon(BLOCKD *d, short *DQC)
 }
 #endif

-#if HAVE_ARMV6
+#if HAVE_MEDIA
 void vp8_dequantize_b_v6(BLOCKD *d, short *DQC)
 {
    short *DQ  = d->dqcoeff;
--- a/vp8/common/arm/dequantize_arm.h
+++ b/vp8/common/arm/dequantize_arm.h
@@ -1,59 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef DEQUANTIZE_ARM_H
-#define DEQUANTIZE_ARM_H
-
-#if HAVE_ARMV6
-extern prototype_dequant_block(vp8_dequantize_b_v6);
-extern prototype_dequant_idct_add(vp8_dequant_idct_add_v6);
-extern prototype_dequant_idct_add_y_block(vp8_dequant_idct_add_y_block_v6);
-extern prototype_dequant_idct_add_uv_block(vp8_dequant_idct_add_uv_block_v6);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_dequant_block
-#define vp8_dequant_block vp8_dequantize_b_v6
-
-#undef  vp8_dequant_idct_add
-#define vp8_dequant_idct_add vp8_dequant_idct_add_v6
-
-#undef  vp8_dequant_idct_add_y_block
-#define vp8_dequant_idct_add_y_block vp8_dequant_idct_add_y_block_v6
-
-#undef  vp8_dequant_idct_add_uv_block
-#define vp8_dequant_idct_add_uv_block vp8_dequant_idct_add_uv_block_v6
-#endif
-#endif
-
-#if HAVE_ARMV7
-extern prototype_dequant_block(vp8_dequantize_b_neon);
-extern prototype_dequant_idct_add(vp8_dequant_idct_add_neon);
-extern prototype_dequant_idct_add_y_block(vp8_dequant_idct_add_y_block_neon);
-extern prototype_dequant_idct_add_uv_block(vp8_dequant_idct_add_uv_block_neon);
-
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_dequant_block
-#define vp8_dequant_block vp8_dequantize_b_neon
-
-#undef  vp8_dequant_idct_add
-#define vp8_dequant_idct_add vp8_dequant_idct_add_neon
-
-#undef  vp8_dequant_idct_add_y_block
-#define vp8_dequant_idct_add_y_block vp8_dequant_idct_add_y_block_neon
-
-#undef  vp8_dequant_idct_add_uv_block
-#define vp8_dequant_idct_add_uv_block vp8_dequant_idct_add_uv_block_neon
-#endif
-
-#endif
-
-#endif
--- a/vp8/common/arm/filter_arm.c
+++ b/vp8/common/arm/filter_arm.c
@@ -10,9 +10,9 @@


 #include "vpx_config.h"
+#include "vpx_rtcd.h"
 #include <math.h>
 #include "vp8/common/filter.h"
-#include "vp8/common/subpixel.h"
 #include "vpx_ports/mem.h"

 extern void vp8_filter_block2d_first_pass_armv6
@@ -86,8 +86,8 @@ extern void vp8_filter_block2d_second_pass_only_armv6
    const short *vp8_filter
 );

-#if HAVE_ARMV6
-void vp8_sixtap_predict_armv6
+#if HAVE_MEDIA
+void vp8_sixtap_predict4x4_armv6
 (
    unsigned char  *src_ptr,
    int  src_pixels_per_line,
--- a/vp8/common/arm/idct_arm.h
+++ b/vp8/common/arm/idct_arm.h
@@ -1,51 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef IDCT_ARM_H
-#define IDCT_ARM_H
-
-#if HAVE_ARMV6
-extern prototype_idct(vp8_short_idct4x4llm_v6_dual);
-extern prototype_idct_scalar_add(vp8_dc_only_idct_add_v6);
-extern prototype_second_order(vp8_short_inv_walsh4x4_1_v6);
-extern prototype_second_order(vp8_short_inv_walsh4x4_v6);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_idct_idct16
-#define vp8_idct_idct16 vp8_short_idct4x4llm_v6_dual
-
-#undef  vp8_idct_idct1_scalar_add
-#define vp8_idct_idct1_scalar_add vp8_dc_only_idct_add_v6
-
-#undef  vp8_idct_iwalsh16
-#define vp8_idct_iwalsh16 vp8_short_inv_walsh4x4_v6
-#endif
-#endif
-
-#if HAVE_ARMV7
-extern prototype_idct(vp8_short_idct4x4llm_neon);
-extern prototype_idct_scalar_add(vp8_dc_only_idct_add_neon);
-extern prototype_second_order(vp8_short_inv_walsh4x4_1_neon);
-extern prototype_second_order(vp8_short_inv_walsh4x4_neon);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_idct_idct16
-#define vp8_idct_idct16 vp8_short_idct4x4llm_neon
-
-#undef  vp8_idct_idct1_scalar_add
-#define vp8_idct_idct1_scalar_add vp8_dc_only_idct_add_neon
-
-#undef  vp8_idct_iwalsh16
-#define vp8_idct_iwalsh16 vp8_short_inv_walsh4x4_neon
-#endif
-#endif
-
-#endif
--- a/vp8/common/arm/loopfilter_arm.c
+++ b/vp8/common/arm/loopfilter_arm.c
@@ -10,17 +10,22 @@


 #include "vpx_config.h"
+#include "vpx_rtcd.h"
 #include "vp8/common/loopfilter.h"
 #include "vp8/common/onyxc_int.h"

-#if HAVE_ARMV6
+#define prototype_loopfilter(sym) \
+    void sym(unsigned char *src, int pitch, const unsigned char *blimit,\
+             const unsigned char *limit, const unsigned char *thresh, int count)
+
+#if HAVE_MEDIA
 extern prototype_loopfilter(vp8_loop_filter_horizontal_edge_armv6);
 extern prototype_loopfilter(vp8_loop_filter_vertical_edge_armv6);
 extern prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_armv6);
 extern prototype_loopfilter(vp8_mbloop_filter_vertical_edge_armv6);
 #endif

-#if HAVE_ARMV7
+#if HAVE_NEON
 typedef void loopfilter_y_neon(unsigned char *src, int pitch,
        unsigned char blimit, unsigned char limit, unsigned char thresh);
 typedef void loopfilter_uv_neon(unsigned char *u, int pitch,
@@ -38,8 +43,8 @@ extern loopfilter_uv_neon vp8_mbloop_filter_horizontal_edge_uv_neon;
 extern loopfilter_uv_neon vp8_mbloop_filter_vertical_edge_uv_neon;
 #endif

-#if HAVE_ARMV6
-/*ARMV6 loopfilter functions*/
+#if HAVE_MEDIA
+/* ARMV6/MEDIA loopfilter functions */
 /* Horizontal MB filtering */
 void vp8_loop_filter_mbh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
                               int y_stride, int uv_stride, loop_filter_info *lfi)
@@ -113,7 +118,7 @@ void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, int y_stride,
 }
 #endif

-#if HAVE_ARMV7
+#if HAVE_NEON
 /* NEON loopfilter functions */
 /* Horizontal MB filtering */
 void vp8_loop_filter_mbh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
--- a/vp8/common/arm/loopfilter_arm.h
+++ b/vp8/common/arm/loopfilter_arm.h
@@ -1,93 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef LOOPFILTER_ARM_H
-#define LOOPFILTER_ARM_H
-
-#include "vpx_config.h"
-
-#if HAVE_ARMV6
-extern prototype_loopfilter_block(vp8_loop_filter_mbv_armv6);
-extern prototype_loopfilter_block(vp8_loop_filter_bv_armv6);
-extern prototype_loopfilter_block(vp8_loop_filter_mbh_armv6);
-extern prototype_loopfilter_block(vp8_loop_filter_bh_armv6);
-extern prototype_simple_loopfilter(vp8_loop_filter_bvs_armv6);
-extern prototype_simple_loopfilter(vp8_loop_filter_bhs_armv6);
-extern prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_armv6);
-extern prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_armv6);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_lf_normal_mb_v
-#define vp8_lf_normal_mb_v vp8_loop_filter_mbv_armv6
-
-#undef  vp8_lf_normal_b_v
-#define vp8_lf_normal_b_v vp8_loop_filter_bv_armv6
-
-#undef  vp8_lf_normal_mb_h
-#define vp8_lf_normal_mb_h vp8_loop_filter_mbh_armv6
-
-#undef  vp8_lf_normal_b_h
-#define vp8_lf_normal_b_h vp8_loop_filter_bh_armv6
-
-#undef  vp8_lf_simple_mb_v
-#define vp8_lf_simple_mb_v vp8_loop_filter_simple_vertical_edge_armv6
-
-#undef  vp8_lf_simple_b_v
-#define vp8_lf_simple_b_v vp8_loop_filter_bvs_armv6
-
-#undef  vp8_lf_simple_mb_h
-#define vp8_lf_simple_mb_h vp8_loop_filter_simple_horizontal_edge_armv6
-
-#undef  vp8_lf_simple_b_h
-#define vp8_lf_simple_b_h vp8_loop_filter_bhs_armv6
-#endif /* !CONFIG_RUNTIME_CPU_DETECT */
-
-#endif /* HAVE_ARMV6 */
-
-#if HAVE_ARMV7
-extern prototype_loopfilter_block(vp8_loop_filter_mbv_neon);
-extern prototype_loopfilter_block(vp8_loop_filter_bv_neon);
-extern prototype_loopfilter_block(vp8_loop_filter_mbh_neon);
-extern prototype_loopfilter_block(vp8_loop_filter_bh_neon);
-extern prototype_simple_loopfilter(vp8_loop_filter_mbvs_neon);
-extern prototype_simple_loopfilter(vp8_loop_filter_bvs_neon);
-extern prototype_simple_loopfilter(vp8_loop_filter_mbhs_neon);
-extern prototype_simple_loopfilter(vp8_loop_filter_bhs_neon);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_lf_normal_mb_v
-#define vp8_lf_normal_mb_v vp8_loop_filter_mbv_neon
-
-#undef  vp8_lf_normal_b_v
-#define vp8_lf_normal_b_v vp8_loop_filter_bv_neon
-
-#undef  vp8_lf_normal_mb_h
-#define vp8_lf_normal_mb_h vp8_loop_filter_mbh_neon
-
-#undef  vp8_lf_normal_b_h
-#define vp8_lf_normal_b_h vp8_loop_filter_bh_neon
-
-#undef  vp8_lf_simple_mb_v
-#define vp8_lf_simple_mb_v vp8_loop_filter_mbvs_neon
-
-#undef  vp8_lf_simple_b_v
-#define vp8_lf_simple_b_v vp8_loop_filter_bvs_neon
-
-#undef  vp8_lf_simple_mb_h
-#define vp8_lf_simple_mb_h vp8_loop_filter_mbhs_neon
-
-#undef  vp8_lf_simple_b_h
-#define vp8_lf_simple_b_h vp8_loop_filter_bhs_neon
-#endif /* !CONFIG_RUNTIME_CPU_DETECT */
-
-#endif /* HAVE_ARMV7 */
-
-#endif /* LOOPFILTER_ARM_H */
--- a/vp8/common/arm/neon/idct_blk_neon.c
+++ b/vp8/common/arm/neon/idct_blk_neon.c
@@ -9,8 +9,7 @@
 */

 #include "vpx_config.h"
-#include "vp8/common/idct.h"
-#include "vp8/common/dequantize.h"
+#include "vpx_rtcd.h"

 /* place these declarations here because we don't want to maintain them
 * outside of this scope
--- a/vp8/encoder/arm/neon/sad16_neon.asm
+++ b/vp8/encoder/arm/neon/sad16_neon.asm
--- a/vp8/encoder/arm/neon/sad8_neon.asm
+++ b/vp8/encoder/arm/neon/sad8_neon.asm
--- a/vp8/common/arm/neon/save_reg_neon.asm
+++ b/vp8/common/arm/neon/save_reg_neon.asm
--- a/vp8/common/arm/neon/sixtappredict4x4_neon.asm
+++ b/vp8/common/arm/neon/sixtappredict4x4_neon.asm
@@ -9,7 +9,7 @@
 ;


-    EXPORT  |vp8_sixtap_predict_neon|
+    EXPORT  |vp8_sixtap_predict4x4_neon|
    ARM
    REQUIRE8
    PRESERVE8
@@ -33,7 +33,7 @@ filter4_coeff
 ; stack(r4) unsigned char *dst_ptr,
 ; stack(lr) int  dst_pitch

-|vp8_sixtap_predict_neon| PROC
+|vp8_sixtap_predict4x4_neon| PROC
    push            {r4, lr}

    adr             r12, filter4_coeff
--- a/vp8/encoder/arm/neon/variance_neon.asm
+++ b/vp8/encoder/arm/neon/variance_neon.asm
@@ -77,14 +77,14 @@ variance16x16_neon_loop
    ;vmov.32        r1, d1[0]
    ;mul            r0, r0, r0
    ;str            r1, [r12]
-    ;sub            r0, r1, r0, asr #8
+    ;sub            r0, r1, r0, lsr #8

-    ;sum is in [-255x256, 255x256]. sumxsum is 32-bit. Shift to right should
-    ;have sign-bit exension, which is vshr.s. Have to use s32 to make it right.
+    ; while sum is signed, sum * sum is never negative and must be treated as
+    ; unsigned to avoid propagating the sign bit.
    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [r12]              ;store sse
-    vshr.s32        d10, d10, #8
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #8
+    vsub.u32        d0, d1, d10

    vmov.32         r0, d0[0]                   ;return
    bx              lr
@@ -145,8 +145,8 @@ variance16x8_neon_loop

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [r12]              ;store sse
-    vshr.s32        d10, d10, #7
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #7
+    vsub.u32        d0, d1, d10

    vmov.32         r0, d0[0]                   ;return
    bx              lr
@@ -200,8 +200,8 @@ variance8x16_neon_loop

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [r12]              ;store sse
-    vshr.s32        d10, d10, #7
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #7
+    vsub.u32        d0, d1, d10

    vmov.32         r0, d0[0]                   ;return
    bx              lr
@@ -265,8 +265,8 @@ variance8x8_neon_loop

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [r12]              ;store sse
-    vshr.s32        d10, d10, #6
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #6
+    vsub.u32        d0, d1, d10

    vmov.32         r0, d0[0]                   ;return
    bx              lr
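The switch from signed to unsigned shifts in these epilogues can be checked in plain C. The sketch below models only the last steps of the kernels, variance = sse - ((sum * sum) >> shift) with shift 8 for 16x16, applied to the low 32 bits of the `vmull.s32` product. `low32_square`, `variance_old` and `variance_new` are hypothetical names, not libvpx functions. For 8-bit pixels |sum| is at most 255 x 256 = 65280, so sum * sum can reach 4261478400. That value fits in an unsigned 32-bit lane but not in a signed one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the NEON epilogue; not part of libvpx. */

/* Low 32 bits of sum * sum, as left in d10[0] after vmull.s32. */
static uint32_t low32_square(int32_t sum)
{
    return (uint32_t)((int64_t)sum * sum);
}

/* Old epilogue: vshr.s32 treats the lane as signed (arithmetic shift). */
static uint32_t variance_old(uint32_t sse, int32_t sum, unsigned shift)
{
    uint32_t low = low32_square(sum);
    /* Reinterpret the 32-bit pattern as a signed value. */
    int64_t as_signed = (low & 0x80000000u) ? (int64_t)low - 4294967296LL
                                            : (int64_t)low;
    /* Floor division by 2^shift, i.e. an arithmetic right shift, written portably. */
    int64_t divisor = (int64_t)1 << shift;
    int64_t shifted = as_signed >= 0 ? as_signed / divisor
                                     : -((-as_signed + divisor - 1) / divisor);
    return (uint32_t)((int64_t)sse - shifted);  /* vsub wraps modulo 2^32 */
}

/* New epilogue: vshr.u32 shifts in zeros, keeping the full unsigned range. */
static uint32_t variance_new(uint32_t sse, int32_t sum, unsigned shift)
{
    assert(shift < 32);
    return sse - (low32_square(sum) >> shift);
}
```

With all 256 differences equal to 255, sse is 16646400 and the true variance is 0. `variance_new(16646400, 65280, 8)` returns 0, while `variance_old` returns 16777216 because the arithmetic shift drags the wrapped sign bit into the result. For small sums such as 100 both versions agree.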
--- a/vp8/encoder/arm/neon/vp8_subpixelvariance16x16_neon.asm
+++ b/vp8/encoder/arm/neon/vp8_subpixelvariance16x16_neon.asm
@@ -9,6 +9,11 @@
 ;


+bilinear_taps_coeff
+    DCD     128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112
+
+;-----------------
+
    EXPORT  |vp8_sub_pixel_variance16x16_neon_func|
    ARM
    REQUIRE8
@@ -27,7 +32,7 @@
 |vp8_sub_pixel_variance16x16_neon_func| PROC
    push            {r4-r6, lr}

-    ldr             r12, _BilinearTaps_coeff_
+    adr             r12, bilinear_taps_coeff
    ldr             r4, [sp, #16]           ;load *dst_ptr from stack
    ldr             r5, [sp, #20]           ;load dst_pixels_per_line from stack
    ldr             r6, [sp, #24]           ;load *sse from stack
@@ -405,8 +410,8 @@ sub_pixel_variance16x16_neon_loop

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [r6]               ;store sse
-    vshr.s32        d10, d10, #8
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #8
+    vsub.u32        d0, d1, d10

    add             sp, sp, #528
    vmov.32         r0, d0[0]                   ;return
@@ -415,11 +420,4 @@ sub_pixel_variance16x16_neon_loop

    ENDP

-;-----------------
-
-_BilinearTaps_coeff_
-    DCD     bilinear_taps_coeff
-bilinear_taps_coeff
-    DCD     128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112
-
    END
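`bilinear_taps_coeff` holds eight (tap0, tap1) pairs, one per sub-pixel offset 0..7, and each pair sums to 128. The following C sketch runs one horizontal bilinear pass with the same table. It assumes a rounding constant of 64 and a shift of 7 to match taps that sum to 1 << 7; those constants and the helper name `bilinear_h` are assumptions for illustration and are not taken from this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as bilinear_taps_coeff: (tap0, tap1) for offsets 0..7. */
static const int bilinear_taps[8][2] = {
    {128, 0}, {112, 16}, {96, 32}, {80, 48},
    {64, 64}, {48, 80}, {32, 96}, {16, 112}
};

#define FILTER_SHIFT    7                          /* assumed: taps sum to 1 << 7 */
#define FILTER_ROUNDING (1 << (FILTER_SHIFT - 1))  /* assumed: round to nearest */

/* Hypothetical helper: filter `width` pixels horizontally.
 * src must provide width + 1 readable pixels. */
static void bilinear_h(const uint8_t *src, uint8_t *dst, int width, int xoffset)
{
    assert(xoffset >= 0 && xoffset < 8);
    const int t0 = bilinear_taps[xoffset][0];
    const int t1 = bilinear_taps[xoffset][1];

    for (int i = 0; i < width; i++) {
        int v = src[i] * t0 + src[i + 1] * t1;   /* at most 255 * 128 */
        dst[i] = (uint8_t)((v + FILTER_ROUNDING) >> FILTER_SHIFT);
    }
}
```

Offset 0 copies the input unchanged, and offset 4 averages neighbours. In the assembly, `adr` forms the table address relative to the PC, which is why the extra `DCD bilinear_taps_coeff` pointer word is no longer needed.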
--- a/vp8/encoder/arm/neon/vp8_subpixelvariance16x16s_neon.asm
+++ b/vp8/encoder/arm/neon/vp8_subpixelvariance16x16s_neon.asm
@@ -112,8 +112,8 @@ vp8_filt_fpo16x16s_4_0_loop_neon

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [lr]               ;store sse
-    vshr.s32        d10, d10, #8
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #8
+    vsub.u32        d0, d1, d10

    vmov.32         r0, d0[0]                   ;return
    pop             {pc}
@@ -208,8 +208,8 @@ vp8_filt_spo16x16s_0_4_loop_neon

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [lr]               ;store sse
-    vshr.s32        d10, d10, #8
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #8
+    vsub.u32        d0, d1, d10

    vmov.32         r0, d0[0]                   ;return
    pop             {pc}
@@ -327,8 +327,8 @@ vp8_filt16x16s_4_4_loop_neon

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [lr]               ;store sse
-    vshr.s32        d10, d10, #8
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #8
+    vsub.u32        d0, d1, d10

    vmov.32         r0, d0[0]                   ;return
    pop             {pc}
@@ -560,8 +560,8 @@ sub_pixel_variance16x16s_neon_loop

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [lr]               ;store sse
-    vshr.s32        d10, d10, #8
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #8
+    vsub.u32        d0, d1, d10

    add             sp, sp, #256
    vmov.32         r0, d0[0]                   ;return
--- a/vp8/encoder/arm/neon/vp8_subpixelvariance8x8_neon.asm
+++ b/vp8/encoder/arm/neon/vp8_subpixelvariance8x8_neon.asm
@@ -27,7 +27,7 @@
 |vp8_sub_pixel_variance8x8_neon| PROC
    push            {r4-r5, lr}

-    ldr             r12, _BilinearTaps_coeff_
+    adr             r12, bilinear_taps_coeff
    ldr             r4, [sp, #12]           ;load *dst_ptr from stack
    ldr             r5, [sp, #16]           ;load dst_pixels_per_line from stack
    ldr             lr, [sp, #20]           ;load *sse from stack
@@ -206,8 +206,8 @@ sub_pixel_variance8x8_neon_loop

    vmull.s32       q5, d0, d0
    vst1.32         {d1[0]}, [lr]               ;store sse
-    vshr.s32        d10, d10, #6
-    vsub.s32        d0, d1, d10
+    vshr.u32        d10, d10, #6
+    vsub.u32        d0, d1, d10

    vmov.32         r0, d0[0]                   ;return
    pop             {r4-r5, pc}
@@ -216,8 +216,6 @@ sub_pixel_variance8x8_neon_loop

 ;-----------------

-_BilinearTaps_coeff_
-    DCD     bilinear_taps_coeff
 bilinear_taps_coeff
    DCD     128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112

--- a/vp8/common/arm/recon_arm.h
+++ b/vp8/common/arm/recon_arm.h
@@ -1,65 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef RECON_ARM_H
-#define RECON_ARM_H
-
-#if HAVE_ARMV6
-
-extern prototype_copy_block(vp8_copy_mem8x8_v6);
-extern prototype_copy_block(vp8_copy_mem8x4_v6);
-extern prototype_copy_block(vp8_copy_mem16x16_v6);
-extern prototype_intra4x4_predict(vp8_intra4x4_predict_armv6);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_recon_copy8x8
-#define vp8_recon_copy8x8 vp8_copy_mem8x8_v6
-
-#undef  vp8_recon_copy8x4
-#define vp8_recon_copy8x4 vp8_copy_mem8x4_v6
-
-#undef  vp8_recon_copy16x16
-#define vp8_recon_copy16x16 vp8_copy_mem16x16_v6
-
-#undef  vp8_recon_intra4x4_predict
-#define vp8_recon_intra4x4_predict vp8_intra4x4_predict_armv6
-#endif
-#endif
-
-#if HAVE_ARMV7
-
-extern prototype_copy_block(vp8_copy_mem8x8_neon);
-extern prototype_copy_block(vp8_copy_mem8x4_neon);
-extern prototype_copy_block(vp8_copy_mem16x16_neon);
-
-extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_neon);
-extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_s_neon);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_recon_copy8x8
-#define vp8_recon_copy8x8 vp8_copy_mem8x8_neon
-
-#undef  vp8_recon_copy8x4
-#define vp8_recon_copy8x4 vp8_copy_mem8x4_neon
-
-#undef  vp8_recon_copy16x16
-#define vp8_recon_copy16x16 vp8_copy_mem16x16_neon
-
-#undef  vp8_recon_build_intra_predictors_mby
-#define vp8_recon_build_intra_predictors_mby vp8_build_intra_predictors_mby_neon
-
-#undef  vp8_recon_build_intra_predictors_mby_s
-#define vp8_recon_build_intra_predictors_mby_s vp8_build_intra_predictors_mby_s_neon
-
-#endif
-#endif
-
-#endif
--- a/vp8/common/arm/reconintra_arm.c
+++ b/vp8/common/arm/reconintra_arm.c
@@ -10,12 +10,11 @@


 #include "vpx_config.h"
+#include "vpx_rtcd.h"
 #include "vp8/common/blockd.h"
-#include "vp8/common/reconintra.h"
 #include "vpx_mem/vpx_mem.h"
-#include "vp8/common/recon.h"

-#if HAVE_ARMV7
+#if HAVE_NEON
 extern void vp8_build_intra_predictors_mby_neon_func(
    unsigned char *y_buffer,
    unsigned char *ypred_ptr,
@@ -35,10 +34,7 @@ void vp8_build_intra_predictors_mby_neon(MACROBLOCKD *x)

    vp8_build_intra_predictors_mby_neon_func(y_buffer, ypred_ptr, y_stride, mode, Up, Left);
 }
-#endif

-
-#if HAVE_ARMV7
 extern void vp8_build_intra_predictors_mby_s_neon_func(
    unsigned char *y_buffer,
    unsigned char *ypred_ptr,
--- a/vp8/common/arm/subpixel_arm.h
+++ b/vp8/common/arm/subpixel_arm.h
@@ -1,89 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef SUBPIXEL_ARM_H
-#define SUBPIXEL_ARM_H
-
-#if HAVE_ARMV6
-extern prototype_subpixel_predict(vp8_sixtap_predict16x16_armv6);
-extern prototype_subpixel_predict(vp8_sixtap_predict8x8_armv6);
-extern prototype_subpixel_predict(vp8_sixtap_predict8x4_armv6);
-extern prototype_subpixel_predict(vp8_sixtap_predict_armv6);
-extern prototype_subpixel_predict(vp8_bilinear_predict16x16_armv6);
-extern prototype_subpixel_predict(vp8_bilinear_predict8x8_armv6);
-extern prototype_subpixel_predict(vp8_bilinear_predict8x4_armv6);
-extern prototype_subpixel_predict(vp8_bilinear_predict4x4_armv6);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_subpix_sixtap16x16
-#define vp8_subpix_sixtap16x16 vp8_sixtap_predict16x16_armv6
-
-#undef  vp8_subpix_sixtap8x8
-#define vp8_subpix_sixtap8x8 vp8_sixtap_predict8x8_armv6
-
-#undef  vp8_subpix_sixtap8x4
-#define vp8_subpix_sixtap8x4 vp8_sixtap_predict8x4_armv6
-
-#undef  vp8_subpix_sixtap4x4
-#define vp8_subpix_sixtap4x4 vp8_sixtap_predict_armv6
-
-#undef  vp8_subpix_bilinear16x16
-#define vp8_subpix_bilinear16x16 vp8_bilinear_predict16x16_armv6
-
-#undef  vp8_subpix_bilinear8x8
-#define vp8_subpix_bilinear8x8 vp8_bilinear_predict8x8_armv6
-
-#undef  vp8_subpix_bilinear8x4
-#define vp8_subpix_bilinear8x4 vp8_bilinear_predict8x4_armv6
-
-#undef  vp8_subpix_bilinear4x4
-#define vp8_subpix_bilinear4x4 vp8_bilinear_predict4x4_armv6
-#endif
-#endif
-
-#if HAVE_ARMV7
-extern prototype_subpixel_predict(vp8_sixtap_predict16x16_neon);
-extern prototype_subpixel_predict(vp8_sixtap_predict8x8_neon);
-extern prototype_subpixel_predict(vp8_sixtap_predict8x4_neon);
-extern prototype_subpixel_predict(vp8_sixtap_predict_neon);
-extern prototype_subpixel_predict(vp8_bilinear_predict16x16_neon);
-extern prototype_subpixel_predict(vp8_bilinear_predict8x8_neon);
-extern prototype_subpixel_predict(vp8_bilinear_predict8x4_neon);
-extern prototype_subpixel_predict(vp8_bilinear_predict4x4_neon);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_subpix_sixtap16x16
-#define vp8_subpix_sixtap16x16 vp8_sixtap_predict16x16_neon
-
-#undef  vp8_subpix_sixtap8x8
-#define vp8_subpix_sixtap8x8 vp8_sixtap_predict8x8_neon
-
-#undef  vp8_subpix_sixtap8x4
-#define vp8_subpix_sixtap8x4 vp8_sixtap_predict8x4_neon
-
-#undef  vp8_subpix_sixtap4x4
-#define vp8_subpix_sixtap4x4 vp8_sixtap_predict_neon
-
-#undef  vp8_subpix_bilinear16x16
-#define vp8_subpix_bilinear16x16 vp8_bilinear_predict16x16_neon
-
-#undef  vp8_subpix_bilinear8x8
-#define vp8_subpix_bilinear8x8 vp8_bilinear_predict8x8_neon
-
-#undef  vp8_subpix_bilinear8x4
-#define vp8_subpix_bilinear8x4 vp8_bilinear_predict8x4_neon
-
-#undef  vp8_subpix_bilinear4x4
-#define vp8_subpix_bilinear4x4 vp8_bilinear_predict4x4_neon
-#endif
-#endif
-
-#endif
--- a/vp8/encoder/arm/variance_arm.c
+++ b/vp8/encoder/arm/variance_arm.c
@@ -9,10 +9,11 @@
 */

 #include "vpx_config.h"
-#include "vp8/encoder/variance.h"
+#include "vpx_rtcd.h"
+#include "vp8/common/variance.h"
 #include "vp8/common/filter.h"

-#if HAVE_ARMV6
+#if HAVE_MEDIA
 #include "vp8/common/arm/bilinearfilter_arm.h"

 unsigned int vp8_sub_pixel_variance8x8_armv6
@@ -91,10 +92,21 @@ unsigned int vp8_sub_pixel_variance16x16_armv6
    return var;
 }

-#endif /* HAVE_ARMV6 */
+#endif /* HAVE_MEDIA */


-#if HAVE_ARMV7
+#if HAVE_NEON
+
+extern unsigned int vp8_sub_pixel_variance16x16_neon_func
+(
+    const unsigned char  *src_ptr,
+    int  src_pixels_per_line,
+    int  xoffset,
+    int  yoffset,
+    const unsigned char *dst_ptr,
+    int dst_pixels_per_line,
+    unsigned int *sse
+);

 unsigned int vp8_sub_pixel_variance16x16_neon
 (
--- a/vp8/common/asm_com_offsets.c
+++ b/vp8/common/asm_com_offsets.c
@@ -15,6 +15,10 @@
 #include "vpx_scale/yv12config.h"
 #include "vp8/common/blockd.h"

+#if CONFIG_POSTPROC
+#include "postproc.h"
+#endif /* CONFIG_POSTPROC */
+
 BEGIN

 /* vpx_scale */
@@ -30,12 +34,17 @@ DEFINE(yv12_buffer_config_v_buffer,             offsetof(YV12_BUFFER_CONFIG, v_b
 DEFINE(yv12_buffer_config_border,               offsetof(YV12_BUFFER_CONFIG, border));
 DEFINE(VP8BORDERINPIXELS_VAL,                   VP8BORDERINPIXELS);

+#if CONFIG_POSTPROC
+/* mfqe.c / filter_by_weight */
+DEFINE(MFQE_PRECISION_VAL,                      MFQE_PRECISION);
+#endif /* CONFIG_POSTPROC */
+
 END

 /* add asserts for any offset that is not supported by assembly code */
 /* add asserts for any size that is not supported by assembly code */

-#if HAVE_ARMV6
+#if HAVE_MEDIA
 /* switch case in vp8_intra4x4_predict_armv6 is based on these enumerated values */
 ct_assert(B_DC_PRED, B_DC_PRED == 0);
 ct_assert(B_TM_PRED, B_TM_PRED == 1);
@@ -49,7 +58,14 @@ ct_assert(B_HD_PRED, B_HD_PRED == 8);
 ct_assert(B_HU_PRED, B_HU_PRED == 9);
 #endif

-#if HAVE_ARMV7
+#if HAVE_NEON
 /* vp8_yv12_extend_frame_borders_neon makes several assumptions based on this */
 ct_assert(VP8BORDERINPIXELS_VAL, VP8BORDERINPIXELS == 32)
 #endif
+
+#if HAVE_SSE2
+#if CONFIG_POSTPROC
+/* vp8_filter_by_weight16x16 and 8x8 */
+ct_assert(MFQE_PRECISION_VAL, MFQE_PRECISION == 4)
+#endif /* CONFIG_POSTPROC */
+#endif /* HAVE_SSE2 */
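The `ct_assert` lines above make the build fail if a value the assembly depends on changes. Examples include the `B_*_PRED` enumerators, `VP8BORDERINPIXELS == 32` and `MFQE_PRECISION == 4`. One common C89-compatible way to write such a check is a `switch` whose two case labels collide when the condition is false. The sketch below uses a hypothetical `CT_ASSERT` macro and stand-in constants. It is not a copy of libvpx's own definition.

```c
#include <assert.h>

/* Hypothetical compile-time assertion. If cond is false, !!(cond) is 0 and
 * the switch has two `case 0:` labels, which the compiler rejects. */
#define CT_ASSERT(name, cond)                       \
    static void ct_assert_##name(void)              \
    {                                               \
        switch (0) { case 0: case !!(cond): ; }     \
    }

/* Stand-ins for VP8BORDERINPIXELS and MFQE_PRECISION. */
enum { BORDER_PIXELS = 32, PRECISION = 4 };

CT_ASSERT(border_is_32, BORDER_PIXELS == 32)
CT_ASSERT(precision_is_4, PRECISION == 4)

/* Calling the generated functions silences unused-function warnings;
 * the real check already happened at compile time. */
int ct_asserts_ok(void)
{
    ct_assert_border_is_32();
    ct_assert_precision_is_4();
    /* Run-time assert() for contrast: it is only checked when executed. */
    assert(BORDER_PIXELS == 32);
    return 1;
}
```

Changing `BORDER_PIXELS` to 48 turns the first check into a duplicate-case compile error.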
--- a/vp8/common/blockd.h
+++ b/vp8/common/blockd.h
@@ -18,7 +18,6 @@ void vpx_log(const char *format, ...);
 #include "vpx_scale/yv12config.h"
 #include "mv.h"
 #include "treecoder.h"
-#include "subpixel.h"
 #include "vpx_ports/mem.h"

 /*#define DCPRED 1*/
@@ -151,14 +150,15 @@ typedef enum

 typedef struct
 {
-    MB_PREDICTION_MODE mode, uv_mode;
-    MV_REFERENCE_FRAME ref_frame;
+    uint8_t mode, uv_mode;
+    uint8_t ref_frame;
+    uint8_t is_4x4;
    int_mv mv;

-    unsigned char partitioning;
-    unsigned char mb_skip_coeff;                                /* does this mb has coefficients at all, 1=no coefficients, 0=need decode tokens */
-    unsigned char need_to_clamp_mvs;
-    unsigned char segment_id;                  /* Which set of segmentation parameters should be used for this MB */
+    uint8_t partitioning;
+    uint8_t mb_skip_coeff;                                /* does this mb have coefficients at all, 1=no coefficients, 0=need decode tokens */
+    uint8_t need_to_clamp_mvs;
+    uint8_t segment_id;                  /* Which set of segmentation parameters should be used for this MB */
 } MB_MODE_INFO;

 typedef struct
@@ -179,28 +179,22 @@ typedef struct
 } LOWER_RES_INFO;
 #endif

-typedef struct
+typedef struct blockd
 {
    short *qcoeff;
    short *dqcoeff;
    unsigned char  *predictor;
    short *dequant;

-    /* 16 Y blocks, 4 U blocks, 4 V blocks each with 16 entries */
-    unsigned char **base_pre;
-    int pre;
-    int pre_stride;
-
-    unsigned char **base_dst;
-    int dst;
-    int dst_stride;
-
+    int offset;
    char *eob;

    union b_mode_info bmi;
 } BLOCKD;

-typedef struct MacroBlockD
+typedef void (*vp8_subpix_fn_t)(unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch);
+
+typedef struct macroblockd
 {
    DECLARE_ALIGNED(16, unsigned char,  predictor[384]);
    DECLARE_ALIGNED(16, short, qcoeff[400]);
@@ -222,11 +216,21 @@ typedef struct MacroBlockD
    MODE_INFO *mode_info_context;
    int mode_info_stride;

+#if CONFIG_TEMPORAL_DENOISING
+    MB_PREDICTION_MODE best_sse_inter_mode;
+    int_mv best_sse_mv;
+    unsigned char need_to_clamp_best_mvs;
+#endif
+
    FRAME_TYPE frame_type;

    int up_available;
    int left_available;

+    unsigned char *recon_above[3];
+    unsigned char *recon_left[3];
+    int recon_left_stride[2];
+
    /* Y,U,V,Y2 */
    ENTROPY_CONTEXT_PLANES *above_context;
    ENTROPY_CONTEXT_PLANES *left_context;
@@ -265,11 +269,8 @@ typedef struct MacroBlockD
    int mb_to_top_edge;
    int mb_to_bottom_edge;

-    int ref_frame_cost[MAX_REF_FRAMES];


-    unsigned int frames_since_golden;
-    unsigned int frames_till_alt_ref_frame;
    vp8_subpix_fn_t  subpixel_predict;
    vp8_subpix_fn_t  subpixel_predict8x4;
    vp8_subpix_fn_t  subpixel_predict8x8;
@@ -286,10 +287,6 @@ typedef struct MacroBlockD
     */
    DECLARE_ALIGNED(32, unsigned char, y_buf[22*32]);
 #endif
-
-#if CONFIG_RUNTIME_CPU_DETECT
-    struct VP8_COMMON_RTCD  *rtcd;
-#endif
 } MACROBLOCKD;


--- a/vp8/common/dequantize.c
+++ b/vp8/common/dequantize.c
@@ -10,8 +10,8 @@


 #include "vpx_config.h"
-#include "dequantize.h"
-#include "vp8/common/idct.h"
+#include "vpx_rtcd.h"
+#include "vp8/common/blockd.h"
 #include "vpx_mem/vpx_mem.h"

 void vp8_dequantize_b_c(BLOCKD *d, short *DQC)
--- a/vp8/common/dequantize.h
+++ b/vp8/common/dequantize.h
@@ -1,85 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef DEQUANTIZE_H
-#define DEQUANTIZE_H
-#include "vp8/common/blockd.h"
-
-#define prototype_dequant_block(sym) \
-    void sym(BLOCKD *x, short *DQC)
-
-#define prototype_dequant_idct_add(sym) \
-    void sym(short *input, short *dq, \
-             unsigned char *output, \
-             int stride)
-
-#define prototype_dequant_idct_add_y_block(sym) \
-    void sym(short *q, short *dq, \
-             unsigned char *dst, \
-             int stride, char *eobs)
-
-#define prototype_dequant_idct_add_uv_block(sym) \
-    void sym(short *q, short *dq, \
-             unsigned char *dst_u, \
-             unsigned char *dst_v, int stride, char *eobs)
-
-#if ARCH_X86 || ARCH_X86_64
-#include "x86/dequantize_x86.h"
-#endif
-
-#if ARCH_ARM
-#include "arm/dequantize_arm.h"
-#endif
-
-#ifndef vp8_dequant_block
-#define vp8_dequant_block vp8_dequantize_b_c
-#endif
-extern prototype_dequant_block(vp8_dequant_block);
-
-#ifndef vp8_dequant_idct_add
-#define vp8_dequant_idct_add vp8_dequant_idct_add_c
-#endif
-extern prototype_dequant_idct_add(vp8_dequant_idct_add);
-
-#ifndef vp8_dequant_idct_add_y_block
-#define vp8_dequant_idct_add_y_block vp8_dequant_idct_add_y_block_c
-#endif
-extern prototype_dequant_idct_add_y_block(vp8_dequant_idct_add_y_block);
-
-#ifndef vp8_dequant_idct_add_uv_block
-#define vp8_dequant_idct_add_uv_block vp8_dequant_idct_add_uv_block_c
-#endif
-extern prototype_dequant_idct_add_uv_block(vp8_dequant_idct_add_uv_block);
-
-
-typedef prototype_dequant_block((*vp8_dequant_block_fn_t));
-
-typedef prototype_dequant_idct_add((*vp8_dequant_idct_add_fn_t));
-
-typedef prototype_dequant_idct_add_y_block((*vp8_dequant_idct_add_y_block_fn_t));
-
-typedef prototype_dequant_idct_add_uv_block((*vp8_dequant_idct_add_uv_block_fn_t));
-
-typedef struct
-{
-    vp8_dequant_block_fn_t               block;
-    vp8_dequant_idct_add_fn_t            idct_add;
-    vp8_dequant_idct_add_y_block_fn_t    idct_add_y_block;
-    vp8_dequant_idct_add_uv_block_fn_t   idct_add_uv_block;
-} vp8_dequant_rtcd_vtable_t;
-
-#if CONFIG_RUNTIME_CPU_DETECT
-#define DEQUANT_INVOKE(ctx,fn) (ctx)->fn
-#else
-#define DEQUANT_INVOKE(ctx,fn) vp8_dequant_##fn
-#endif
-
-#endif
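The deleted headers chose an implementation in one of two ways. With `CONFIG_RUNTIME_CPU_DETECT` off, a macro such as `#define vp8_dequant_block vp8_dequantize_b_v6` bound the generic name to one variant at compile time. With it on, `DEQUANT_INVOKE(ctx, fn)` called through a per-context vtable. The rewritten files include `vpx_rtcd.h` instead. The run-time half of that idea can be sketched as one global function pointer per operation, filled in once from CPU feature flags. Every name below (`dequant_block_fn`, `setup_dequant_dispatch`, `CPU_FLAG_*`, the `_media`/`_neon` stubs) is hypothetical and does not describe the generated header.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*dequant_block_fn)(short *coeffs, const short *dq);

/* Hypothetical CPU feature bits. */
enum { CPU_FLAG_MEDIA = 1 << 0, CPU_FLAG_NEON = 1 << 1 };

/* Reference C version: scale each of the 16 coefficients in place. */
static void dequant_block_c(short *coeffs, const short *dq)
{
    assert(coeffs != NULL && dq != NULL);
    for (int i = 0; i < 16; i++)
        coeffs[i] = (short)(coeffs[i] * dq[i]);
}

/* Stand-ins for optimized variants; here they simply reuse the C code. */
static void dequant_block_media(short *coeffs, const short *dq) { dequant_block_c(coeffs, dq); }
static void dequant_block_neon(short *coeffs, const short *dq) { dequant_block_c(coeffs, dq); }

/* One global pointer per operation, defaulting to the C version. */
static dequant_block_fn dequant_block = dequant_block_c;

/* Select implementations once, e.g. at codec initialisation. */
static void setup_dequant_dispatch(int cpu_flags)
{
    dequant_block = dequant_block_c;
    if (cpu_flags & CPU_FLAG_MEDIA)
        dequant_block = dequant_block_media;
    if (cpu_flags & CPU_FLAG_NEON)   /* checked last so it is preferred */
        dequant_block = dequant_block_neon;
}
```

Callers then write `dequant_block(coeffs, dq)` without passing an `rtcd` context around, which matches the removal of the `rtcd` member from `MACROBLOCKD` in this change.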
--- a/vp8/common/entropy.c
+++ b/vp8/common/entropy.c
@@ -8,23 +8,11 @@
 *  be found in the AUTHORS file in the root of the source tree.
 */

-
-#include <stdio.h>
-
 #include "entropy.h"
-#include "string.h"
 #include "blockd.h"
 #include "onyxc_int.h"
 #include "vpx_mem/vpx_mem.h"

-#define uchar unsigned char     /* typedefs can clash */
-#define uint  unsigned int
-
-typedef const uchar cuchar;
-typedef const uint cuint;
-
-typedef vp8_prob Prob;
-
 #include "coefupdateprobs.h"

 DECLARE_ALIGNED(16, const unsigned char, vp8_norm[256]) =
@@ -47,10 +35,11 @@ DECLARE_ALIGNED(16, const unsigned char, vp8_norm[256]) =
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
 };

-DECLARE_ALIGNED(16, cuchar, vp8_coef_bands[16]) =
+DECLARE_ALIGNED(16, const unsigned char, vp8_coef_bands[16]) =
 { 0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7};

-DECLARE_ALIGNED(16, cuchar, vp8_prev_token_class[MAX_ENTROPY_TOKENS]) =
+DECLARE_ALIGNED(16, const unsigned char,
+                vp8_prev_token_class[MAX_ENTROPY_TOKENS]) =
 { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0};

 DECLARE_ALIGNED(16, const int, vp8_default_zig_zag1d[16]) =
@@ -69,7 +58,26 @@ DECLARE_ALIGNED(16, const short, vp8_default_inv_zig_zag[16]) =
   10, 11, 15, 16
 };

-DECLARE_ALIGNED(16, short, vp8_default_zig_zag_mask[16]);
+/* vp8_default_zig_zag_mask generated with:
+
+    void vp8_init_scan_order_mask()
+    {
+        int i;
+
+        for (i = 0; i < 16; i++)
+        {
+            vp8_default_zig_zag_mask[vp8_default_zig_zag1d[i]] = 1 << i;
+        }
+
+    }
+*/
+DECLARE_ALIGNED(16, const short, vp8_default_zig_zag_mask[16]) =
+{
+     1,    2,    32,     64,
+     4,   16,   128,   4096,
+     8,  256,  2048,   8192,
+   512, 1024, 16384, -32768
+};

 const int vp8_mb_feature_data_bits[MB_LVL_MAX] = {7, 6};

@@ -90,56 +98,72 @@ const vp8_tree_index vp8_coef_tree[ 22] =     /* corresponding _CONTEXT_NODEs */
    -DCT_VAL_CATEGORY5, -DCT_VAL_CATEGORY6   /* 10 = CAT_FIVE */
 };

-struct vp8_token_struct vp8_coef_encodings[MAX_ENTROPY_TOKENS];
+/* vp8_coef_encodings generated with:
+    vp8_tokens_from_tree(vp8_coef_encodings, vp8_coef_tree);
+*/
+const vp8_token vp8_coef_encodings[MAX_ENTROPY_TOKENS] =
+{
+    {2, 2},
+    {6, 3},
+    {28, 5},
+    {58, 6},
+    {59, 6},
+    {60, 6},
+    {61, 6},
+    {124, 7},
+    {125, 7},
+    {126, 7},
+    {127, 7},
+    {0, 1}
+};

 /* Trees for extra bits.  Probabilities are constant and
   do not depend on previously encoded bits */

-static const Prob Pcat1[] = { 159};
-static const Prob Pcat2[] = { 165, 145};
-static const Prob Pcat3[] = { 173, 148, 140};
-static const Prob Pcat4[] = { 176, 155, 140, 135};
-static const Prob Pcat5[] = { 180, 157, 141, 134, 130};
-static const Prob Pcat6[] =
+static const vp8_prob Pcat1[] = { 159};
+static const vp8_prob Pcat2[] = { 165, 145};
+static const vp8_prob Pcat3[] = { 173, 148, 140};
+static const vp8_prob Pcat4[] = { 176, 155, 140, 135};
+static const vp8_prob Pcat5[] = { 180, 157, 141, 134, 130};
+static const vp8_prob Pcat6[] =
 { 254, 254, 243, 230, 196, 177, 153, 140, 133, 130, 129};

-static vp8_tree_index cat1[2], cat2[4], cat3[6], cat4[8], cat5[10], cat6[22];

-void vp8_init_scan_order_mask()
-{
-    int i;
+/* tree index tables generated with:

-    for (i = 0; i < 16; i++)
+    void init_bit_tree(vp8_tree_index *p, int n)
    {
-        vp8_default_zig_zag_mask[vp8_default_zig_zag1d[i]] = 1 << i;
+        int i = 0;
+
+        while (++i < n)
+        {
+            p[0] = p[1] = i << 1;
+            p += 2;
+        }
+
+        p[0] = p[1] = 0;
    }

-}
-
-static void init_bit_tree(vp8_tree_index *p, int n)
-{
-    int i = 0;
-
-    while (++i < n)
+    void init_bit_trees()
    {
-        p[0] = p[1] = i << 1;
-        p += 2;
+        init_bit_tree(cat1, 1);
+        init_bit_tree(cat2, 2);
+        init_bit_tree(cat3, 3);
+        init_bit_tree(cat4, 4);
+        init_bit_tree(cat5, 5);
+        init_bit_tree(cat6, 11);
    }
+*/

-    p[0] = p[1] = 0;
-}
+static const vp8_tree_index cat1[2] = { 0, 0 };
+static const vp8_tree_index cat2[4] = { 2, 2, 0, 0 };
+static const vp8_tree_index cat3[6] = { 2, 2, 4, 4, 0, 0 };
+static const vp8_tree_index cat4[8] = { 2, 2, 4, 4, 6, 6, 0, 0 };
+static const vp8_tree_index cat5[10] = { 2, 2, 4, 4, 6, 6, 8, 8, 0, 0 };
+static const vp8_tree_index cat6[22] = { 2, 2, 4, 4, 6, 6, 8, 8, 10, 10, 12, 12,
+                                        14, 14, 16, 16, 18, 18, 20, 20, 0, 0 };

-static void init_bit_trees()
-{
-    init_bit_tree(cat1, 1);
-    init_bit_tree(cat2, 2);
-    init_bit_tree(cat3, 3);
-    init_bit_tree(cat4, 4);
-    init_bit_tree(cat5, 5);
-    init_bit_tree(cat6, 11);
-}
-
-vp8_extra_bit_struct vp8_extra_bits[12] =
+const vp8_extra_bit_struct vp8_extra_bits[12] =
 {
    { 0, 0, 0, 0},
    { 0, 0, 0, 1},
@@ -163,8 +187,3 @@ void vp8_default_coef_probs(VP8_COMMON *pc)
                   sizeof(default_coef_probs));
 }

-void vp8_coef_tree_initialize()
-{
-    init_bit_trees();
-    vp8_tokens_from_tree(vp8_coef_encodings, vp8_coef_tree);
-}
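The new constant tables in entropy.c should be exactly what the generator code kept in the comments would produce. A small self-check could confirm that. This is a sketch only. It assumes the standard VP8 zig-zag order from RFC 6386, because `vp8_default_zig_zag1d` is not shown in this diff. `check_entropy_tables()` is a hypothetical helper and is not part of libvpx.

```c
/* Default VP8 zig-zag scan order (assumed from RFC 6386; not in this diff). */
static const int zig_zag1d[16] = {0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15};

/* Copies of tables hard-coded by the patch. */
static const short zig_zag_mask[16] = {
      1,    2,    32,     64,
      4,   16,   128,   4096,
      8,  256,  2048,   8192,
    512, 1024, 16384, -32768
};
static const signed char cat3[6] = {2, 2, 4, 4, 0, 0};
static const signed char cat6[22] = {2, 2, 4, 4, 6, 6, 8, 8, 10, 10, 12, 12,
                                     14, 14, 16, 16, 18, 18, 20, 20, 0, 0};

/* Same loop as init_bit_tree() in the comment: node pair k points at
 * node pair k+1, and the final pair is zeroed. */
static void init_bit_tree(signed char *p, int n)
{
    int i = 0;

    while (++i < n) {
        p[0] = p[1] = (signed char)(i << 1);
        p += 2;
    }
    p[0] = p[1] = 0;
}

/* Returns 1 if regenerating the tables reproduces the constants, else 0. */
int check_entropy_tables(void)
{
    unsigned short mask[16];
    signed char tree[22];
    int i;

    /* Build in unsigned arithmetic: 1 << 15 does not fit in a signed short,
     * so compare bit patterns rather than rely on an implementation-defined
     * conversion. */
    for (i = 0; i < 16; i++)
        mask[zig_zag1d[i]] = (unsigned short)(1u << i);
    for (i = 0; i < 16; i++)
        if (mask[i] != (unsigned short)zig_zag_mask[i])
            return 0;

    init_bit_tree(tree, 3);
    for (i = 0; i < 6; i++)
        if (tree[i] != cat3[i])
            return 0;

    init_bit_tree(tree, 11);
    for (i = 0; i < 22; i++)
        if (tree[i] != cat6[i])
            return 0;

    return 1;
}
```

The unsigned build matters for the last mask entry. The generator in the comment assigns `1 << 15` into a `short`, and the table spells that value as `-32768`.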
--- a/vp8/common/entropy.h
+++ b/vp8/common/entropy.h
@@ -35,7 +35,7 @@

 extern const vp8_tree_index vp8_coef_tree[];

-extern struct vp8_token_struct vp8_coef_encodings[MAX_ENTROPY_TOKENS];
+extern const struct vp8_token_struct vp8_coef_encodings[MAX_ENTROPY_TOKENS];

 typedef struct
 {
@@ -45,7 +45,7 @@ typedef struct
    int base_val;
 } vp8_extra_bit_struct;

-extern vp8_extra_bit_struct vp8_extra_bits[12];    /* indexed by token value */
+extern const vp8_extra_bit_struct vp8_extra_bits[12];    /* indexed by token value */

 #define PROB_UPDATE_BASELINE_COST   7

@@ -94,7 +94,7 @@ void vp8_default_coef_probs(struct VP8Common *);

 extern DECLARE_ALIGNED(16, const int, vp8_default_zig_zag1d[16]);
 extern DECLARE_ALIGNED(16, const short, vp8_default_inv_zig_zag[16]);
-extern short vp8_default_zig_zag_mask[16];
+extern DECLARE_ALIGNED(16, const short, vp8_default_zig_zag_mask[16]);
 extern const int vp8_mb_feature_data_bits[MB_LVL_MAX];

 void vp8_coef_tree_initialize(void);
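The header now declares `vp8_coef_encodings` as a `const` table. According to the entropy.c comment, that table was generated with `vp8_tokens_from_tree(vp8_coef_encodings, vp8_coef_tree)`. The sketch below illustrates the idea. It walks a copy of the coefficient token tree and gives each leaf the bit string of its path, with the first decision in the most significant bit. The tree layout is assumed from RFC 6386. All names (`tree_index`, `token_code`, `tokens_from_tree`, the token enum) are hypothetical, and this is not the libvpx implementation.

```c
typedef signed char tree_index;                 /* stand-in for vp8_tree_index */
typedef struct { int value; int len; } token_code;

/* Token order matches the rows of the patch's vp8_coef_encodings table. */
enum { ZERO_TOK, ONE_TOK, TWO_TOK, THREE_TOK, FOUR_TOK,
       CAT1, CAT2, CAT3, CAT4, CAT5, CAT6, EOB_TOK, NUM_TOKENS };

/* Positive entries index the next node pair; non-positive entries are
 * negated token values (leaves). Layout assumed from RFC 6386. */
static const tree_index coef_tree[22] = {
    -EOB_TOK, 2, -ZERO_TOK, 4, -ONE_TOK, 6, 8, 12,
    -TWO_TOK, 10, -THREE_TOK, -FOUR_TOK, 14, 16,
    -CAT1, -CAT2, 18, 20, -CAT3, -CAT4, -CAT5, -CAT6
};

/* Depth-first walk: append 0 for the left branch and 1 for the right. */
static void walk(const tree_index *t, int node, int value, int len, token_code *out)
{
    int bit;

    for (bit = 0; bit < 2; bit++) {
        const int j = t[node + bit];
        const int v = (value << 1) | bit;

        if (j <= 0) {               /* leaf: -ZERO_TOK is 0, hence <= */
            out[-j].value = v;
            out[-j].len = len + 1;
        } else {
            walk(t, j, v, len + 1, out);
        }
    }
}

void tokens_from_tree(token_code *out, const tree_index *t)
{
    walk(t, 0, 0, 0, out);
}
```

Applied to `coef_tree`, this yields `{2, 2}` for ZERO, `{28, 5}` for TWO, `{127, 7}` for CAT6, and `{0, 1}` for EOB. Those values agree with the corresponding rows of the hard-coded table. The leaf test uses `<= 0` rather than `< 0` because token 0 appears in the tree as `-0`.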
--- a/vp8/common/entropymode.c
+++ b/vp8/common/entropymode.c
@@ -8,22 +8,13 @@
 *  be found in the AUTHORS file in the root of the source tree.
 */

+#define USE_PREBUILT_TABLES

 #include "entropymode.h"
 #include "entropy.h"
 #include "vpx_mem/vpx_mem.h"

-static const unsigned int kf_y_mode_cts[VP8_YMODES] = { 1607, 915, 812, 811, 5455};
-static const unsigned int y_mode_cts  [VP8_YMODES] = { 8080, 1908, 1582, 1007, 5874};
-
-static const unsigned int uv_mode_cts  [VP8_UV_MODES] = { 59483, 13605, 16492, 4230};
-static const unsigned int kf_uv_mode_cts[VP8_UV_MODES] = { 5319, 1904, 1703, 674};
-
-static const unsigned int bmode_cts[VP8_BINTRAMODES] =
-{
-    43891, 17694, 10036, 3920, 3363, 2546, 5119, 3221, 2471, 1723
-};
-
+#include "vp8_entropymodedata.h"

 int vp8_mv_cont(const int_mv *l, const int_mv *a)
 {
@@ -59,7 +50,7 @@ const vp8_prob vp8_sub_mv_ref_prob2 [SUBMVREF_COUNT][VP8_SUBMVREFS-1] =



-vp8_mbsplit vp8_mbsplits [VP8_NUMMBSPLITS] =
+const vp8_mbsplit vp8_mbsplits [VP8_NUMMBSPLITS] =
 {
    {
        0,  0,  0,  0,
@@ -84,7 +75,7 @@ vp8_mbsplit vp8_mbsplits [VP8_NUMMBSPLITS] =
        4,  5,  6,  7,
        8,  9,  10, 11,
        12, 13, 14, 15,
-    },
+    }
 };

 const int vp8_mbsplit_count [VP8_NUMMBSPLITS] = { 2, 2, 4, 16};
@@ -155,17 +146,6 @@ const vp8_tree_index vp8_sub_mv_ref_tree[6] =
    -ZERO4X4, -NEW4X4
 };

-
-struct vp8_token_struct vp8_bmode_encodings   [VP8_BINTRAMODES];
-struct vp8_token_struct vp8_ymode_encodings   [VP8_YMODES];
-struct vp8_token_struct vp8_kf_ymode_encodings [VP8_YMODES];
-struct vp8_token_struct vp8_uv_mode_encodings  [VP8_UV_MODES];
-struct vp8_token_struct vp8_mbsplit_encodings [VP8_NUMMBSPLITS];
-
-struct vp8_token_struct vp8_mv_ref_encoding_array    [VP8_MVREFS];
-struct vp8_token_struct vp8_sub_mv_ref_encoding_array [VP8_SUBMVREFS];
-
-
 const vp8_tree_index vp8_small_mvtree [14] =
 {
    2, 8,
@@ -177,89 +157,21 @@ const vp8_tree_index vp8_small_mvtree [14] =
    -6, -7
 };

-struct vp8_token_struct vp8_small_mvencodings [8];
-
 void vp8_init_mbmode_probs(VP8_COMMON *x)
 {
-    unsigned int bct [VP8_YMODES] [2];      /* num Ymodes > num UV modes */
-
-    vp8_tree_probs_from_distribution(
-        VP8_YMODES, vp8_ymode_encodings, vp8_ymode_tree,
-        x->fc.ymode_prob, bct, y_mode_cts,
-        256, 1
-    );
-    vp8_tree_probs_from_distribution(
-        VP8_YMODES, vp8_kf_ymode_encodings, vp8_kf_ymode_tree,
-        x->kf_ymode_prob, bct, kf_y_mode_cts,
-        256, 1
-    );
-    vp8_tree_probs_from_distribution(
-        VP8_UV_MODES, vp8_uv_mode_encodings, vp8_uv_mode_tree,
-        x->fc.uv_mode_prob, bct, uv_mode_cts,
-        256, 1
-    );
-    vp8_tree_probs_from_distribution(
-        VP8_UV_MODES, vp8_uv_mode_encodings, vp8_uv_mode_tree,
-        x->kf_uv_mode_prob, bct, kf_uv_mode_cts,
-        256, 1
-    );
+    vpx_memcpy(x->fc.ymode_prob, vp8_ymode_prob, sizeof(vp8_ymode_prob));
+    vpx_memcpy(x->kf_ymode_prob, vp8_kf_ymode_prob, sizeof(vp8_kf_ymode_prob));
+    vpx_memcpy(x->fc.uv_mode_prob, vp8_uv_mode_prob, sizeof(vp8_uv_mode_prob));
+    vpx_memcpy(x->kf_uv_mode_prob, vp8_kf_uv_mode_prob, sizeof(vp8_kf_uv_mode_prob));
    vpx_memcpy(x->fc.sub_mv_ref_prob, sub_mv_ref_prob, sizeof(sub_mv_ref_prob));
 }

-
-static void intra_bmode_probs_from_distribution(
-    vp8_prob p [VP8_BINTRAMODES-1],
-    unsigned int branch_ct [VP8_BINTRAMODES-1] [2],
-    const unsigned int events [VP8_BINTRAMODES]
-)
-{
-    vp8_tree_probs_from_distribution(
-        VP8_BINTRAMODES, vp8_bmode_encodings, vp8_bmode_tree,
-        p, branch_ct, events,
-        256, 1
-    );
-}
-
 void vp8_default_bmode_probs(vp8_prob p [VP8_BINTRAMODES-1])
 {
-    unsigned int branch_ct [VP8_BINTRAMODES-1] [2];
-    intra_bmode_probs_from_distribution(p, branch_ct, bmode_cts);
+    vpx_memcpy(p, vp8_bmode_prob, sizeof(vp8_bmode_prob));
 }

 void vp8_kf_default_bmode_probs(vp8_prob p [VP8_BINTRAMODES] [VP8_BINTRAMODES] [VP8_BINTRAMODES-1])
 {
-    unsigned int branch_ct [VP8_BINTRAMODES-1] [2];
-
-    int i = 0;
-
-    do
-    {
-        int j = 0;
-
-        do
-        {
-            intra_bmode_probs_from_distribution(
-                p[i][j], branch_ct, vp8_kf_default_bmode_counts[i][j]);
-
-        }
-        while (++j < VP8_BINTRAMODES);
-    }
-    while (++i < VP8_BINTRAMODES);
-}
-
-
-void vp8_entropy_mode_init()
-{
-    vp8_tokens_from_tree(vp8_bmode_encodings,   vp8_bmode_tree);
-    vp8_tokens_from_tree(vp8_ymode_encodings,   vp8_ymode_tree);
-    vp8_tokens_from_tree(vp8_kf_ymode_encodings, vp8_kf_ymode_tree);
-    vp8_tokens_from_tree(vp8_uv_mode_encodings,  vp8_uv_mode_tree);
-    vp8_tokens_from_tree(vp8_mbsplit_encodings, vp8_mbsplit_tree);
-
-    vp8_tokens_from_tree_offset(vp8_mv_ref_encoding_array,
-                                vp8_mv_ref_tree, NEARESTMV);
-    vp8_tokens_from_tree_offset(vp8_sub_mv_ref_encoding_array,
-                                vp8_sub_mv_ref_tree, LEFT4X4);
-
-    vp8_tokens_from_tree(vp8_small_mvencodings, vp8_small_mvtree);
+    vpx_memcpy(p, vp8_kf_bmode_prob, sizeof(vp8_kf_bmode_prob));
 }
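The removed code derived the mode probabilities at startup by calling `vp8_tree_probs_from_distribution()` on event counts such as `kf_y_mode_cts`. The new code copies precomputed tables from `vp8_entropymodedata.h` instead. The sketch below shows the derivation for a single tree so the precomputed numbers can be checked. It makes several assumptions:

- The key-frame Y-mode tree layout is taken from RFC 6386.
- Each probability is computed as round(256 * left / total), clamped to 1..255, with 128 used when a node has no counts.
- `tree_probs_from_counts` and `subtree_count` are hypothetical names.

```c
typedef signed char tree_index;   /* stand-in for vp8_tree_index */

enum { DC_PRED, V_PRED, H_PRED, TM_PRED, B_PRED, NUM_YMODES };

/* Key-frame Y-mode tree (assumed from RFC 6386; not in this diff). */
static const tree_index kf_ymode_tree[8] = {
    -B_PRED, 2, 4, 6, -DC_PRED, -V_PRED, -H_PRED, -TM_PRED
};

/* Event counts removed by the patch (kf_y_mode_cts). */
static const unsigned int kf_y_mode_cts[NUM_YMODES] = {1607, 915, 812, 811, 5455};

/* Total count of all leaves reachable from tree entry j. */
static unsigned int subtree_count(const tree_index *t, int j, const unsigned int *cts)
{
    if (j <= 0)
        return cts[-j];
    return subtree_count(t, t[j], cts) + subtree_count(t, t[j + 1], cts);
}

/* One 8-bit probability of taking the left branch per internal node. */
void tree_probs_from_counts(unsigned char *probs, const tree_index *t,
                            int num_leaves, const unsigned int *cts)
{
    int node;

    for (node = 0; node < 2 * (num_leaves - 1); node += 2) {
        const unsigned int c0 = subtree_count(t, t[node], cts);
        const unsigned int c1 = subtree_count(t, t[node + 1], cts);
        const unsigned int tot = c0 + c1;
        unsigned int p = 128;                    /* no data: even split */

        if (tot) {
            p = (c0 * 256 + tot / 2) / tot;      /* round to nearest */
            if (p < 1) p = 1;
            if (p > 255) p = 255;
        }
        probs[node >> 1] = (unsigned char)p;
    }
}
```

Running this sketch on `kf_y_mode_cts` gives `{145, 156, 163, 128}`. RFC 6386 lists the same values for the key-frame Y-mode probabilities, which suggests that copying a table yields the same result as computing it at startup.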
--- a/vp8/common/entropymode.h
+++ b/vp8/common/entropymode.h
@@ -52,22 +52,20 @@ extern const vp8_tree_index  vp8_mbsplit_tree[];
 extern const vp8_tree_index  vp8_mv_ref_tree[];
 extern const vp8_tree_index  vp8_sub_mv_ref_tree[];

-extern struct vp8_token_struct vp8_bmode_encodings   [VP8_BINTRAMODES];
-extern struct vp8_token_struct vp8_ymode_encodings   [VP8_YMODES];
-extern struct vp8_token_struct vp8_kf_ymode_encodings [VP8_YMODES];
-extern struct vp8_token_struct vp8_uv_mode_encodings  [VP8_UV_MODES];
-extern struct vp8_token_struct vp8_mbsplit_encodings  [VP8_NUMMBSPLITS];
+extern const struct vp8_token_struct vp8_bmode_encodings[VP8_BINTRAMODES];
+extern const struct vp8_token_struct vp8_ymode_encodings[VP8_YMODES];
+extern const struct vp8_token_struct vp8_kf_ymode_encodings[VP8_YMODES];
+extern const struct vp8_token_struct vp8_uv_mode_encodings[VP8_UV_MODES];
+extern const struct vp8_token_struct vp8_mbsplit_encodings[VP8_NUMMBSPLITS];

 /* Inter mode values do not start at zero */

-extern struct vp8_token_struct vp8_mv_ref_encoding_array    [VP8_MVREFS];
-extern struct vp8_token_struct vp8_sub_mv_ref_encoding_array [VP8_SUBMVREFS];
+extern const struct vp8_token_struct vp8_mv_ref_encoding_array[VP8_MVREFS];
+extern const struct vp8_token_struct vp8_sub_mv_ref_encoding_array[VP8_SUBMVREFS];

 extern const vp8_tree_index vp8_small_mvtree[];

-extern struct vp8_token_struct vp8_small_mvencodings [8];
-
-void vp8_entropy_mode_init(void);
+extern const struct vp8_token_struct vp8_small_mvencodings[8];

 void vp8_init_mbmode_probs(VP8_COMMON *x);

--- a/vp8/common/filter.c
+++ b/vp8/common/filter.c
@@ -149,7 +149,7 @@ static void filter_block2d
 }


-void vp8_sixtap_predict_c
+void vp8_sixtap_predict4x4_c
 (
    unsigned char  *src_ptr,
    int   src_pixels_per_line,
--- a/vp8/common/generic/systemdependent.c
+++ b/vp8/common/generic/systemdependent.c
@@ -10,30 +10,33 @@


 #include "vpx_config.h"
-#include "vp8/common/subpixel.h"
-#include "vp8/common/loopfilter.h"
-#include "vp8/common/recon.h"
-#include "vp8/common/idct.h"
+#include "vpx_rtcd.h"
+#if ARCH_ARM
+#include "vpx_ports/arm.h"
+#elif ARCH_X86 || ARCH_X86_64
+#include "vpx_ports/x86.h"
+#endif
 #include "vp8/common/onyxc_int.h"

 #if CONFIG_MULTITHREAD
-#if HAVE_UNISTD_H
+#if HAVE_UNISTD_H && !defined(__OS2__)
 #include <unistd.h>
 #elif defined(_WIN32)
 #include <windows.h>
 typedef void (WINAPI *PGNSI)(LPSYSTEM_INFO);
+#elif defined(__OS2__)
+#define INCL_DOS
+#define INCL_DOSSPINLOCK
+#include <os2.h>
 #endif
 #endif

-extern void vp8_arch_x86_common_init(VP8_COMMON *ctx);
-extern void vp8_arch_arm_common_init(VP8_COMMON *ctx);
-
 #if CONFIG_MULTITHREAD
 static int get_cpu_count()
 {
    int core_count = 16;

-#if HAVE_UNISTD_H
+#if HAVE_UNISTD_H && !defined(__OS2__)
 #if defined(_SC_NPROCESSORS_ONLN)
    core_count = sysconf(_SC_NPROCESSORS_ONLN);
 #elif defined(_SC_NPROC_ONLN)
@@ -56,6 +59,21 @@ static int get_cpu_count()

        core_count = sysinfo.dwNumberOfProcessors;
    }
+#elif defined(__OS2__)
+    {
+        ULONG proc_id;
+        ULONG status;
+
+        core_count = 0;
+        for (proc_id = 1; ; proc_id++)
+        {
+            if (DosGetProcessorStatus(proc_id, &status))
+                break;
+
+            if (status == PROC_ONLINE)
+                core_count++;
+        }
+    }
 #else
    /* other platforms */
 #endif
@@ -64,78 +82,69 @@ static int get_cpu_count()
 }
 #endif

+
+#if HAVE_PTHREAD_H
+#include <pthread.h>
+static void once(void (*func)(void))
+{
+    static pthread_once_t lock = PTHREAD_ONCE_INIT;
+    pthread_once(&lock, func);
+}
+
+
+#elif defined(_WIN32)
+static void once(void (*func)(void))
+{
+    /* Using a static initializer here rather than InitializeCriticalSection()
+     * since there's no race-free context in which to execute it. Protecting
+     * it with an atomic op like InterlockedCompareExchangePointer introduces
+     * an x86 dependency, and InitOnceExecuteOnce requires Vista.
+     */
+    static CRITICAL_SECTION lock = {(void *)-1, -1, 0, 0, 0, 0};
+    static int done;
+
+    EnterCriticalSection(&lock);
+
+    if (!done)
+    {
+        func();
+        done = 1;
+    }
+
+    LeaveCriticalSection(&lock);
+}
+
+
+#else
+/* No-op version that performs no synchronization. vpx_rtcd() is idempotent,
+ * so as long as your platform provides atomic loads/stores of pointers
+ * no synchronization is strictly necessary.
+ */
+
+static void once(void (*func)(void))
+{
+    static int done;
+
+    if(!done)
+    {
+        func();
+        done = 1;
+    }
+}
+#endif
+
+
 void vp8_machine_specific_config(VP8_COMMON *ctx)
 {
-#if CONFIG_RUNTIME_CPU_DETECT
-    VP8_COMMON_RTCD *rtcd = &ctx->rtcd;
-
-
-    rtcd->dequant.block             = vp8_dequantize_b_c;
-    rtcd->dequant.idct_add          = vp8_dequant_idct_add_c;
-    rtcd->dequant.idct_add_y_block  = vp8_dequant_idct_add_y_block_c;
-    rtcd->dequant.idct_add_uv_block =
-        vp8_dequant_idct_add_uv_block_c;
-
-
-    rtcd->idct.idct16       = vp8_short_idct4x4llm_c;
-    rtcd->idct.idct1_scalar_add = vp8_dc_only_idct_add_c;
-    rtcd->idct.iwalsh1      = vp8_short_inv_walsh4x4_1_c;
-    rtcd->idct.iwalsh16     = vp8_short_inv_walsh4x4_c;
-
-    rtcd->recon.copy16x16   = vp8_copy_mem16x16_c;
-    rtcd->recon.copy8x8     = vp8_copy_mem8x8_c;
-    rtcd->recon.copy8x4     = vp8_copy_mem8x4_c;
-
-    rtcd->recon.build_intra_predictors_mby =
-        vp8_build_intra_predictors_mby;
-    rtcd->recon.build_intra_predictors_mby_s =
-        vp8_build_intra_predictors_mby_s;
-    rtcd->recon.build_intra_predictors_mbuv =
-        vp8_build_intra_predictors_mbuv;
-    rtcd->recon.build_intra_predictors_mbuv_s =
-        vp8_build_intra_predictors_mbuv_s;
-    rtcd->recon.intra4x4_predict =
-        vp8_intra4x4_predict_c;
-
-    rtcd->subpix.sixtap16x16   = vp8_sixtap_predict16x16_c;
-    rtcd->subpix.sixtap8x8     = vp8_sixtap_predict8x8_c;
-    rtcd->subpix.sixtap8x4     = vp8_sixtap_predict8x4_c;
-    rtcd->subpix.sixtap4x4     = vp8_sixtap_predict_c;
-    rtcd->subpix.bilinear16x16 = vp8_bilinear_predict16x16_c;
-    rtcd->subpix.bilinear8x8   = vp8_bilinear_predict8x8_c;
-    rtcd->subpix.bilinear8x4   = vp8_bilinear_predict8x4_c;
-    rtcd->subpix.bilinear4x4   = vp8_bilinear_predict4x4_c;
-
-    rtcd->loopfilter.normal_mb_v = vp8_loop_filter_mbv_c;
-    rtcd->loopfilter.normal_b_v  = vp8_loop_filter_bv_c;
-    rtcd->loopfilter.normal_mb_h = vp8_loop_filter_mbh_c;
-    rtcd->loopfilter.normal_b_h  = vp8_loop_filter_bh_c;
-    rtcd->loopfilter.simple_mb_v = vp8_loop_filter_simple_vertical_edge_c;
-    rtcd->loopfilter.simple_b_v  = vp8_loop_filter_bvs_c;
-    rtcd->loopfilter.simple_mb_h = vp8_loop_filter_simple_horizontal_edge_c;
-    rtcd->loopfilter.simple_b_h  = vp8_loop_filter_bhs_c;
-
-#if CONFIG_POSTPROC || (CONFIG_VP8_ENCODER && CONFIG_INTERNAL_STATS)
-    rtcd->postproc.down             = vp8_mbpost_proc_down_c;
-    rtcd->postproc.across           = vp8_mbpost_proc_across_ip_c;
-    rtcd->postproc.downacross       = vp8_post_proc_down_and_across_c;
-    rtcd->postproc.addnoise         = vp8_plane_add_noise_c;
-    rtcd->postproc.blend_mb_inner   = vp8_blend_mb_inner_c;
-    rtcd->postproc.blend_mb_outer   = vp8_blend_mb_outer_c;
-    rtcd->postproc.blend_b          = vp8_blend_b_c;
-#endif
-
-#endif
-
-#if ARCH_X86 || ARCH_X86_64
-    vp8_arch_x86_common_init(ctx);
-#endif
-
-#if ARCH_ARM
-    vp8_arch_arm_common_init(ctx);
-#endif
-
 #if CONFIG_MULTITHREAD
    ctx->processor_core_count = get_cpu_count();
 #endif /* CONFIG_MULTITHREAD */
+
+#if ARCH_ARM
+    ctx->cpu_caps = arm_cpu_caps();
+#elif ARCH_X86 || ARCH_X86_64
+    ctx->cpu_caps = x86_simd_caps();
+#endif
+
+    once(vpx_rtcd);
 }
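`vp8_machine_specific_config()` now calls `once(vpx_rtcd)`, and on POSIX systems `once()` wraps `pthread_once()`. The sketch below illustrates that contract. Several threads race into the same `once()` helper, but the initializer runs only one time. `setup_tables`, `worker` and `run_workers` are hypothetical stand-ins. Build with `-pthread` on toolchains where the threading functions live in a separate library.

```c
#include <pthread.h>
#include <stddef.h>

static int init_calls;              /* how many times the initializer really ran */

static void setup_tables(void)      /* hypothetical stand-in for vpx_rtcd() */
{
    init_calls++;
}

/* Same shape as the patch's pthread branch: one static once-control. */
static void once(void (*func)(void))
{
    static pthread_once_t lock = PTHREAD_ONCE_INIT;
    pthread_once(&lock, func);
}

static void *worker(void *arg)
{
    (void)arg;
    once(setup_tables);
    return NULL;
}

/* Starts up to 8 threads that all call once(); returns the init count. */
int run_workers(int n)
{
    pthread_t threads[8];
    int started = 0;
    int i;

    if (n > 8)
        n = 8;
    for (i = 0; i < n; i++)
        if (pthread_create(&threads[started], NULL, worker, NULL) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);   /* join also orders the read below */
    return init_calls;
}
```

The helper has a single static once-control. Because of that, it only ever runs the first function it receives, so any later call with a different function is silently skipped. That limitation is harmless here only because `vpx_rtcd` is its sole caller.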
--- a/vp8/common/idct.h
+++ b/vp8/common/idct.h
@@ -1,80 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef __INC_IDCT_H
-#define __INC_IDCT_H
-
-#define prototype_second_order(sym) \
-    void sym(short *input, short *output)
-
-#define prototype_idct(sym) \
-    void sym(short *input, unsigned char *pred, int pitch, unsigned char *dst, \
-             int dst_stride)
-
-#define prototype_idct_scalar_add(sym) \
-    void sym(short input, \
-            unsigned char *pred, int pred_stride, \
-            unsigned char *dst, \
-            int dst_stride)
-
-#if ARCH_X86 || ARCH_X86_64
-#include "x86/idct_x86.h"
-#endif
-
-#if ARCH_ARM
-#include "arm/idct_arm.h"
-#endif
-
-#ifndef vp8_idct_idct16
-#define vp8_idct_idct16 vp8_short_idct4x4llm_c
-#endif
-extern prototype_idct(vp8_idct_idct16);
-/* add this prototype to prevent compiler warning about implicit
- * declaration of vp8_short_idct4x4llm_c function in dequantize.c
- * when building, for example, neon optimized version */
-extern prototype_idct(vp8_short_idct4x4llm_c);
-
-#ifndef vp8_idct_idct1_scalar_add
-#define vp8_idct_idct1_scalar_add vp8_dc_only_idct_add_c
-#endif
-extern prototype_idct_scalar_add(vp8_idct_idct1_scalar_add);
-
-
-#ifndef vp8_idct_iwalsh1
-#define vp8_idct_iwalsh1 vp8_short_inv_walsh4x4_1_c
-#endif
-extern prototype_second_order(vp8_idct_iwalsh1);
-
-#ifndef vp8_idct_iwalsh16
-#define vp8_idct_iwalsh16 vp8_short_inv_walsh4x4_c
-#endif
-extern prototype_second_order(vp8_idct_iwalsh16);
-
-typedef prototype_idct((*vp8_idct_fn_t));
-typedef prototype_idct_scalar_add((*vp8_idct_scalar_add_fn_t));
-typedef prototype_second_order((*vp8_second_order_fn_t));
-
-typedef struct
-{
-    vp8_idct_fn_t            idct16;
-    vp8_idct_scalar_add_fn_t idct1_scalar_add;
-
-    vp8_second_order_fn_t iwalsh1;
-    vp8_second_order_fn_t iwalsh16;
-} vp8_idct_rtcd_vtable_t;
-
-#if CONFIG_RUNTIME_CPU_DETECT
-#define IDCT_INVOKE(ctx,fn) (ctx)->fn
-#else
-#define IDCT_INVOKE(ctx,fn) vp8_idct_##fn
-#endif
-
-#endif
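The deleted `idct.h` shows the dispatch scheme that `vpx_rtcd.h` replaces. In that scheme, a `prototype_*` macro declares each function signature, and a vtable struct holds one function pointer per operation. `IDCT_INVOKE` then either calls through the vtable (`CONFIG_RUNTIME_CPU_DETECT`) or token-pastes a compile-time name. The sketch below reproduces that pattern for the DC-only IDCT. Every `demo_` name is hypothetical, and the function body is a reconstruction of the C reference, not code from this diff.

```c
/* Prototype macro in the style of the deleted idct.h. */
#define prototype_idct_scalar_add(sym) \
    void sym(short input, unsigned char *pred, int pred_stride, \
             unsigned char *dst, int dst_stride)

/* Portable C version: add the rounded DC term to a 4x4 predictor, clamped to 0..255. */
prototype_idct_scalar_add(demo_dc_only_idct_add_c)
{
    const int a1 = (input + 4) >> 3;
    int r, c;

    for (r = 0; r < 4; r++) {
        for (c = 0; c < 4; c++) {
            const int v = pred[c] + a1;
            dst[c] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
        pred += pred_stride;
        dst += dst_stride;
    }
}

/* Function-pointer type and vtable, as in vp8_idct_rtcd_vtable_t. */
typedef prototype_idct_scalar_add((*demo_idct_scalar_add_fn_t));

typedef struct {
    demo_idct_scalar_add_fn_t idct1_scalar_add;
} demo_idct_vtable_t;

/* Runtime CPU detection: call through the table filled in at startup. */
#define DEMO_IDCT_INVOKE_RT(ctx, fn) (ctx)->fn

/* No runtime detection: paste the name and bind at compile time. */
#define DEMO_IDCT_INVOKE_STATIC(ctx, fn) demo_idct_##fn
#define demo_idct_idct1_scalar_add demo_dc_only_idct_add_c
```

Both macros reach the same function. The runtime form costs an indirect call, and it also requires every caller to carry an rtcd context pointer. Removing that context parameter from call sites such as `vp8_inverse_transform_mby()` is what lets later hunks in this patch drop their rtcd arguments.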
--- a/vp8/common/idct_blk.c
+++ b/vp8/common/idct_blk.c
@@ -9,8 +9,7 @@
 */

 #include "vpx_config.h"
-#include "vp8/common/idct.h"
-#include "dequantize.h"
+#include "vpx_rtcd.h"

 void vp8_dequant_idct_add_c(short *input, short *dq,
                            unsigned char *dest, int stride);
--- a/vp8/common/idctllm_test.cc
+++ b/vp8/common/idctllm_test.cc
@@ -0,0 +1,31 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+
+ extern "C" {
+    void vp8_short_idct4x4llm_c(short *input, unsigned char *pred_ptr,
+                            int pred_stride, unsigned char *dst_ptr,
+                            int dst_stride);
+}
+
+#include "vpx_config.h"
+#include "idctllm_test.h"
+namespace
+{
+
+INSTANTIATE_TEST_CASE_P(C, IDCTTest,
+                        ::testing::Values(vp8_short_idct4x4llm_c));
+
+} // namespace
+
+int main(int argc, char **argv) {
+  ::testing::InitGoogleTest(&argc, argv);
+  return RUN_ALL_TESTS();
+}
--- a/vp8/common/idctllm_test.h
+++ b/vp8/common/idctllm_test.h
@@ -0,0 +1,113 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+
+ #include "third_party/googletest/src/include/gtest/gtest.h"
+typedef void (*idct_fn_t)(short *input, unsigned char *pred_ptr,
+                            int pred_stride, unsigned char *dst_ptr,
+                            int dst_stride);
+namespace {
+class IDCTTest : public ::testing::TestWithParam<idct_fn_t>
+{
+  protected:
+    virtual void SetUp()
+    {
+        int i;
+
+        UUT = GetParam();
+        memset(input, 0, sizeof(input));
+        /* Set up guard blocks */
+        for(i=0; i<256; i++)
+            output[i] = ((i&0xF)<4&&(i<64))?0:-1;
+    }
+
+    idct_fn_t UUT;
+    short input[16];
+    unsigned char output[256];
+    unsigned char predict[256];
+};
+
+TEST_P(IDCTTest, TestGuardBlocks)
+{
+    int i;
+
+    for(i=0; i<256; i++)
+        if((i&0xF) < 4 && i<64)
+            EXPECT_EQ(0, output[i]) << i;
+        else
+            EXPECT_EQ(255, output[i]);
+}
+
+TEST_P(IDCTTest, TestAllZeros)
+{
+    int i;
+
+    UUT(input, output, 16, output, 16);
+
+    for(i=0; i<256; i++)
+        if((i&0xF) < 4 && i<64)
+            EXPECT_EQ(0, output[i]) << "i==" << i;
+        else
+            EXPECT_EQ(255, output[i]) << "i==" << i;
+}
+
+TEST_P(IDCTTest, TestAllOnes)
+{
+    int i;
+
+    input[0] = 4;
+    UUT(input, output, 16, output, 16);
+
+    for(i=0; i<256; i++)
+        if((i&0xF) < 4 && i<64)
+            EXPECT_EQ(1, output[i]) << "i==" << i;
+        else
+            EXPECT_EQ(255, output[i]) << "i==" << i;
+}
+
+TEST_P(IDCTTest, TestAddOne)
+{
+    int i;
+
+    for(i=0; i<256; i++)
+        predict[i] = i;
+
+    input[0] = 4;
+    UUT(input, predict, 16, output, 16);
+
+    for(i=0; i<256; i++)
+        if((i&0xF) < 4 && i<64)
+            EXPECT_EQ(i+1, output[i]) << "i==" << i;
+        else
+            EXPECT_EQ(255, output[i]) << "i==" << i;
+}
+
+TEST_P(IDCTTest, TestWithData)
+{
+    int i;
+
+    for(i=0; i<16; i++)
+        input[i] = i;
+
+    UUT(input, output, 16, output, 16);
+
+    for(i=0; i<256; i++)
+        if((i&0xF) > 3 || i>63)
+            EXPECT_EQ(255, output[i]) << "i==" << i;
+        else if(i == 0)
+            EXPECT_EQ(11, output[i]) << "i==" << i;
+        else if(i == 34)
+            EXPECT_EQ(1, output[i]) << "i==" << i;
+        else if(i == 2 || i == 17 || i == 32)
+            EXPECT_EQ(3, output[i]) << "i==" << i;
+        else
+            EXPECT_EQ(0, output[i]) << "i==" << i;
+}
+}
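The expected values in `TestWithData` are not obvious from the test alone. The standalone 4x4 inverse DCT below can reproduce them. It was reconstructed from the VP8 algorithm in RFC 6386 rather than copied from libvpx, so treat it as a sketch. It also assumes that right-shifting a negative `int` is arithmetic, which C leaves implementation-defined. `idct4x4_add` is a hypothetical name. Its signature mirrors `idct_fn_t` apart from the `const` qualifiers.

```c
/* Fixed-point constants (x 65536): sqrt(2)*cos(pi/8) - 1 and sqrt(2)*sin(pi/8). */
static const int cospi8sqrt2minus1 = 20091;
static const int sinpi8sqrt2 = 35468;

void idct4x4_add(const short *input, const unsigned char *pred, int pred_stride,
                 unsigned char *dst, int dst_stride)
{
    short tmp[16];
    int i, a1, b1, c1, d1;

    /* Vertical pass over each column. */
    for (i = 0; i < 4; i++) {
        const short *ip = input + i;
        a1 = ip[0] + ip[8];
        b1 = ip[0] - ip[8];
        c1 = ((ip[4] * sinpi8sqrt2) >> 16)
             - (ip[12] + ((ip[12] * cospi8sqrt2minus1) >> 16));
        d1 = (ip[4] + ((ip[4] * cospi8sqrt2minus1) >> 16))
             + ((ip[12] * sinpi8sqrt2) >> 16);
        tmp[i] = (short)(a1 + d1);
        tmp[4 + i] = (short)(b1 + c1);
        tmp[8 + i] = (short)(b1 - c1);
        tmp[12 + i] = (short)(a1 - d1);
    }

    /* Horizontal pass over each row with (x + 4) >> 3 rounding, then add
     * the prediction and clamp to 0..255. */
    for (i = 0; i < 4; i++) {
        const short *ip = tmp + 4 * i;
        int out[4], c;

        a1 = ip[0] + ip[2];
        b1 = ip[0] - ip[2];
        c1 = ((ip[1] * sinpi8sqrt2) >> 16)
             - (ip[3] + ((ip[3] * cospi8sqrt2minus1) >> 16));
        d1 = (ip[1] + ((ip[1] * cospi8sqrt2minus1) >> 16))
             + ((ip[3] * sinpi8sqrt2) >> 16);
        out[0] = (a1 + d1 + 4) >> 3;
        out[1] = (b1 + c1 + 4) >> 3;
        out[2] = (b1 - c1 + 4) >> 3;
        out[3] = (a1 - d1 + 4) >> 3;

        for (c = 0; c < 4; c++) {
            const int v = out[c] + pred[i * pred_stride + c];
            dst[i * dst_stride + c] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
    }
}
```

The test below sets `input[i] = i` with a zeroed 4x4 predictor at stride 16, as `TestWithData` does. Under those conditions the sketch writes 11 at index 0, 3 at indices 2, 17 and 32, and 1 at index 34. All other positions inside the block clamp to 0. Each output row is written only after its predictor values are read, so passing the same buffer as `pred` and `dst` (as the gtest cases do) is safe.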
--- a/vp8/common/invtrans.h
+++ b/vp8/common/invtrans.h
@@ -13,7 +13,7 @@
 #define __INC_INVTRANS_H

 #include "vpx_config.h"
-#include "idct.h"
+#include "vpx_rtcd.h"
 #include "blockd.h"
 #include "onyxc_int.h"

@@ -33,8 +33,7 @@ static void eob_adjust(char *eobs, short *diff)
    }
 }

-static void vp8_inverse_transform_mby(MACROBLOCKD *xd,
-                                      const VP8_COMMON_RTCD *rtcd)
+static void vp8_inverse_transform_mby(MACROBLOCKD *xd)
 {
    short *DQC = xd->dequant_y1;

@@ -43,19 +42,19 @@ static void vp8_inverse_transform_mby(MACROBLOCKD *xd,
        /* do 2nd order transform on the dc block */
        if (xd->eobs[24] > 1)
        {
-            IDCT_INVOKE(&rtcd->idct, iwalsh16)
+            vp8_short_inv_walsh4x4
                (&xd->block[24].dqcoeff[0], xd->qcoeff);
        }
        else
        {
-            IDCT_INVOKE(&rtcd->idct, iwalsh1)
+            vp8_short_inv_walsh4x4_1
                (&xd->block[24].dqcoeff[0], xd->qcoeff);
        }
        eob_adjust(xd->eobs, xd->qcoeff);

        DQC = xd->dequant_y1_dc;
    }
-    DEQUANT_INVOKE (&rtcd->dequant, idct_add_y_block)
+    vp8_dequant_idct_add_y_block
                    (xd->qcoeff, DQC,
                     xd->dst.y_buffer,
                     xd->dst.y_stride, xd->eobs);
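`vp8_inverse_transform_mby()` runs the full second-order transform only when `xd->eobs[24] > 1`. That branch is safe because when only the DC coefficient of the Y2 block can be nonzero, every output of the inverse Walsh-Hadamard transform equals `(dc + 3) >> 3`. The sketch below checks that claim. It is a reconstruction based on RFC 6386, and the names are hypothetical. It also simplifies the output layout: the libvpx functions scatter results into each luma block's DC slot, whereas here the output is a plain 4x4 array.

```c
/* Full 4x4 inverse Walsh-Hadamard transform (reconstruction). */
void inv_walsh4x4(const short *in, short *out)
{
    int tmp[16], i;

    for (i = 0; i < 4; i++) {               /* columns */
        const int a1 = in[i] + in[12 + i];
        const int b1 = in[4 + i] + in[8 + i];
        const int c1 = in[4 + i] - in[8 + i];
        const int d1 = in[i] - in[12 + i];
        tmp[i] = a1 + b1;
        tmp[4 + i] = c1 + d1;
        tmp[8 + i] = a1 - b1;
        tmp[12 + i] = d1 - c1;
    }
    for (i = 0; i < 4; i++) {               /* rows, with (x + 3) >> 3 rounding */
        const int *ip = tmp + 4 * i;
        const int a1 = ip[0] + ip[3];
        const int b1 = ip[1] + ip[2];
        const int c1 = ip[1] - ip[2];
        const int d1 = ip[0] - ip[3];
        out[4 * i + 0] = (short)((a1 + b1 + 3) >> 3);
        out[4 * i + 1] = (short)((c1 + d1 + 3) >> 3);
        out[4 * i + 2] = (short)((a1 - b1 + 3) >> 3);
        out[4 * i + 3] = (short)((d1 - c1 + 3) >> 3);
    }
}

/* DC-only shortcut: with a single nonzero input every output is the same. */
void inv_walsh4x4_1(const short *in, short *out)
{
    const short a1 = (short)((in[0] + 3) >> 3);
    int i;

    for (i = 0; i < 16; i++)
        out[i] = a1;
}
```

With a DC-only input, the column pass copies `dc` into the first entry of every row. The row pass then produces `dc` in all four sums before rounding. As a result, both functions agree for any DC value.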
--- a/vp8/common/loopfilter.c
+++ b/vp8/common/loopfilter.c
@@ -10,96 +10,13 @@


 #include "vpx_config.h"
+#include "vpx_rtcd.h"
 #include "loopfilter.h"
 #include "onyxc_int.h"
 #include "vpx_mem/vpx_mem.h"

 typedef unsigned char uc;

-prototype_loopfilter(vp8_loop_filter_horizontal_edge_c);
-prototype_loopfilter(vp8_loop_filter_vertical_edge_c);
-prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_c);
-prototype_loopfilter(vp8_mbloop_filter_vertical_edge_c);
-
-prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_c);
-prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_c);
-
-/* Horizontal MB filtering */
-void vp8_loop_filter_mbh_c(unsigned char *y_ptr, unsigned char *u_ptr,
-                           unsigned char *v_ptr, int y_stride, int uv_stride,
-                           loop_filter_info *lfi)
-{
-    vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
-
-    if (u_ptr)
-        vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
-
-    if (v_ptr)
-        vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
-}
-
-/* Vertical MB Filtering */
-void vp8_loop_filter_mbv_c(unsigned char *y_ptr, unsigned char *u_ptr,
-                           unsigned char *v_ptr, int y_stride, int uv_stride,
-                           loop_filter_info *lfi)
-{
-    vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
-
-    if (u_ptr)
-        vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
-
-    if (v_ptr)
-        vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
-}
-
-/* Horizontal B Filtering */
-void vp8_loop_filter_bh_c(unsigned char *y_ptr, unsigned char *u_ptr,
-                          unsigned char *v_ptr, int y_stride, int uv_stride,
-                          loop_filter_info *lfi)
-{
-    vp8_loop_filter_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-    vp8_loop_filter_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-    vp8_loop_filter_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-
-    if (u_ptr)
-        vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
-
-    if (v_ptr)
-        vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
-}
-
-void vp8_loop_filter_bhs_c(unsigned char *y_ptr, int y_stride,
-                           const unsigned char *blimit)
-{
-    vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, blimit);
-    vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, blimit);
-    vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, blimit);
-}
-
-/* Vertical B Filtering */
-void vp8_loop_filter_bv_c(unsigned char *y_ptr, unsigned char *u_ptr,
-                          unsigned char *v_ptr, int y_stride, int uv_stride,
-                          loop_filter_info *lfi)
-{
-    vp8_loop_filter_vertical_edge_c(y_ptr + 4, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-    vp8_loop_filter_vertical_edge_c(y_ptr + 8, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-    vp8_loop_filter_vertical_edge_c(y_ptr + 12, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-
-    if (u_ptr)
-        vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
-
-    if (v_ptr)
-        vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
-}
-
-void vp8_loop_filter_bvs_c(unsigned char *y_ptr, int y_stride,
-                           const unsigned char *blimit)
-{
-    vp8_loop_filter_simple_vertical_edge_c(y_ptr + 4, y_stride, blimit);
-    vp8_loop_filter_simple_vertical_edge_c(y_ptr + 8, y_stride, blimit);
-    vp8_loop_filter_simple_vertical_edge_c(y_ptr + 12, y_stride, blimit);
-}
-
 static void lf_init_lut(loop_filter_info_n *lfi)
 {
    int filt_lvl;
@@ -293,6 +210,8 @@ void vp8_loop_filter_frame

    int mb_row;
    int mb_col;
+    int mb_rows = cm->mb_rows;
+    int mb_cols = cm->mb_cols;

    int filter_level;

@@ -300,6 +219,8 @@ void vp8_loop_filter_frame

    /* Point at base of Mb MODE_INFO list */
    const MODE_INFO *mode_info_context = cm->mi;
+    int post_y_stride = post->y_stride;
+    int post_uv_stride = post->uv_stride;

    /* Initialize the loop filter for this frame. */
    vp8_loop_filter_frame_init(cm, mbd, cm->filter_level);
@@ -310,23 +231,23 @@ void vp8_loop_filter_frame
    v_ptr = post->v_buffer;

    /* vp8_filter each macro block */
-    for (mb_row = 0; mb_row < cm->mb_rows; mb_row++)
+    if (cm->filter_type == NORMAL_LOOPFILTER)
    {
-        for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
+        for (mb_row = 0; mb_row < mb_rows; mb_row++)
        {
-            int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
-                            mode_info_context->mbmi.mode != SPLITMV &&
-                            mode_info_context->mbmi.mb_skip_coeff);
-
-            const int mode_index = lfi_n->mode_lf_lut[mode_info_context->mbmi.mode];
-            const int seg = mode_info_context->mbmi.segment_id;
-            const int ref_frame = mode_info_context->mbmi.ref_frame;
-
-            filter_level = lfi_n->lvl[seg][ref_frame][mode_index];
-
-            if (filter_level)
+            for (mb_col = 0; mb_col < mb_cols; mb_col++)
            {
-                if (cm->filter_type == NORMAL_LOOPFILTER)
+                int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
+                                mode_info_context->mbmi.mode != SPLITMV &&
+                                mode_info_context->mbmi.mb_skip_coeff);
+
+                const int mode_index = lfi_n->mode_lf_lut[mode_info_context->mbmi.mode];
+                const int seg = mode_info_context->mbmi.segment_id;
+                const int ref_frame = mode_info_context->mbmi.ref_frame;
+
+                filter_level = lfi_n->lvl[seg][ref_frame][mode_index];
+
+                if (filter_level)
                {
                    const int hev_index = lfi_n->hev_thr_lut[frame_type][filter_level];
                    lfi.mblim = lfi_n->mblim[filter_level];
@@ -335,55 +256,88 @@ void vp8_loop_filter_frame
                    lfi.hev_thr = lfi_n->hev_thr[hev_index];

                    if (mb_col > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
-                        (y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
+                        vp8_loop_filter_mbv
+                        (y_ptr, u_ptr, v_ptr, post_y_stride, post_uv_stride, &lfi);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
-                        (y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
+                        vp8_loop_filter_bv
+                        (y_ptr, u_ptr, v_ptr, post_y_stride, post_uv_stride, &lfi);

                    /* don't apply across umv border */
                    if (mb_row > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
-                        (y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
+                        vp8_loop_filter_mbh
+                        (y_ptr, u_ptr, v_ptr, post_y_stride, post_uv_stride, &lfi);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
-                        (y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
+                        vp8_loop_filter_bh
+                        (y_ptr, u_ptr, v_ptr, post_y_stride, post_uv_stride, &lfi);
                }
-                else
-                {
-                    if (mb_col > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
-                        (y_ptr, post->y_stride, lfi_n->mblim[filter_level]);

-                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
-                        (y_ptr, post->y_stride, lfi_n->blim[filter_level]);
+                y_ptr += 16;
+                u_ptr += 8;
+                v_ptr += 8;

-                    /* don't apply across umv border */
-                    if (mb_row > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
-                        (y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
-
-                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
-                        (y_ptr, post->y_stride, lfi_n->blim[filter_level]);
-                }
+                mode_info_context++;     /* step to next MB */
            }
+            y_ptr += post_y_stride  * 16 - post->y_width;
+            u_ptr += post_uv_stride *  8 - post->uv_width;
+            v_ptr += post_uv_stride *  8 - post->uv_width;

-            y_ptr += 16;
-            u_ptr += 8;
-            v_ptr += 8;
+            mode_info_context++;         /* Skip border mb */

-            mode_info_context++;     /* step to next MB */
        }
+    }
+    else /* SIMPLE_LOOPFILTER */
+    {
+        for (mb_row = 0; mb_row < mb_rows; mb_row++)
+        {
+            for (mb_col = 0; mb_col < mb_cols; mb_col++)
+            {
+                int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
+                                mode_info_context->mbmi.mode != SPLITMV &&
+                                mode_info_context->mbmi.mb_skip_coeff);

-        y_ptr += post->y_stride  * 16 - post->y_width;
-        u_ptr += post->uv_stride *  8 - post->uv_width;
-        v_ptr += post->uv_stride *  8 - post->uv_width;
+                const int mode_index = lfi_n->mode_lf_lut[mode_info_context->mbmi.mode];
+                const int seg = mode_info_context->mbmi.segment_id;
+                const int ref_frame = mode_info_context->mbmi.ref_frame;

-        mode_info_context++;         /* Skip border mb */
+                filter_level = lfi_n->lvl[seg][ref_frame][mode_index];
+                if (filter_level)
+                {
+                    const unsigned char * mblim = lfi_n->mblim[filter_level];
+                    const unsigned char * blim = lfi_n->blim[filter_level];
+
+                    if (mb_col > 0)
+                        vp8_loop_filter_simple_mbv
+                        (y_ptr, post_y_stride, mblim);
+
+                    if (!skip_lf)
+                        vp8_loop_filter_simple_bv
+                        (y_ptr, post_y_stride, blim);
+
+                    /* don't apply across umv border */
+                    if (mb_row > 0)
+                        vp8_loop_filter_simple_mbh
+                        (y_ptr, post_y_stride, mblim);
+
+                    if (!skip_lf)
+                        vp8_loop_filter_simple_bh
+                        (y_ptr, post_y_stride, blim);
+                }
+
+                y_ptr += 16;
+                u_ptr += 8;
+                v_ptr += 8;
+
+                mode_info_context++;     /* step to next MB */
+            }
+            y_ptr += post_y_stride  * 16 - post->y_width;
+            u_ptr += post_uv_stride *  8 - post->uv_width;
+            v_ptr += post_uv_stride *  8 - post->uv_width;
+
+            mode_info_context++;         /* Skip border mb */
+
+        }
    }
 }
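Both the NORMAL and SIMPLE loops above advance `y_ptr` by 16 per macroblock (8 for the chroma pointers) and then by `post_y_stride * 16 - post->y_width` at the end of each row, which lands on the first pixel of the next macroblock row. The following is a minimal sketch of that arithmetic. It assumes `y_width == 16 * mb_cols`, so any padding beyond the visible width is carried only by the stride. `mb_offset_by_stepping` is a hypothetical helper, not libvpx code.

```c
#include <stddef.h>

/* Hypothetical helper (not libvpx code): returns the byte offset of the
 * top-left luma pixel of macroblock (mb_row, mb_col) by replaying the pointer
 * updates of the loop-filter loops: +16 per macroblock, then
 * y_stride * 16 - y_width at the end of each row.
 * Assumes y_width == 16 * mb_cols. */
static ptrdiff_t mb_offset_by_stepping(int mb_row, int mb_col,
                                       int mb_cols, int y_stride)
{
    const int y_width = 16 * mb_cols;
    ptrdiff_t off = 0;
    int r, c;

    for (r = 0; r < mb_row; r++)
    {
        for (c = 0; c < mb_cols; c++)
            off += 16;                     /* step to next MB */
        off += y_stride * 16 - y_width;    /* wrap to start of next MB row */
    }
    return off + 16 * mb_col;              /* columns within the final row */
}
```

For example, with 4 macroblocks per row and a 96-byte stride, macroblock (2, 3) comes out at 3120. That matches the direct formula `2 * 16 * 96 + 3 * 16`.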

@@ -446,39 +400,39 @@ void vp8_loop_filter_frame_yonly
                    lfi.hev_thr = lfi_n->hev_thr[hev_index];

                    if (mb_col > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
+                        vp8_loop_filter_mbv
                        (y_ptr, 0, 0, post->y_stride, 0, &lfi);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
+                        vp8_loop_filter_bv
                        (y_ptr, 0, 0, post->y_stride, 0, &lfi);

                    /* don't apply across umv border */
                    if (mb_row > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
+                        vp8_loop_filter_mbh
                        (y_ptr, 0, 0, post->y_stride, 0, &lfi);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
+                        vp8_loop_filter_bh
                        (y_ptr, 0, 0, post->y_stride, 0, &lfi);
                }
                else
                {
                    if (mb_col > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
+                        vp8_loop_filter_simple_mbv
                        (y_ptr, post->y_stride, lfi_n->mblim[filter_level]);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
+                        vp8_loop_filter_simple_bv
                        (y_ptr, post->y_stride, lfi_n->blim[filter_level]);

                    /* don't apply across umv border */
                    if (mb_row > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
+                        vp8_loop_filter_simple_mbh
                        (y_ptr, post->y_stride, lfi_n->mblim[filter_level]);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
+                        vp8_loop_filter_simple_bh
                        (y_ptr, post->y_stride, lfi_n->blim[filter_level]);
                }
            }
@@ -578,35 +532,35 @@ void vp8_loop_filter_partial_frame
                    lfi.hev_thr = lfi_n->hev_thr[hev_index];

                    if (mb_col > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
+                        vp8_loop_filter_mbv
                        (y_ptr, 0, 0, post->y_stride, 0, &lfi);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
+                        vp8_loop_filter_bv
                        (y_ptr, 0, 0, post->y_stride, 0, &lfi);

-                    LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
+                    vp8_loop_filter_mbh
                        (y_ptr, 0, 0, post->y_stride, 0, &lfi);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
+                        vp8_loop_filter_bh
                        (y_ptr, 0, 0, post->y_stride, 0, &lfi);
                }
                else
                {
                    if (mb_col > 0)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
+                        vp8_loop_filter_simple_mbv
                        (y_ptr, post->y_stride, lfi_n->mblim[filter_level]);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
+                        vp8_loop_filter_simple_bv
                        (y_ptr, post->y_stride, lfi_n->blim[filter_level]);

-                    LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
+                    vp8_loop_filter_simple_mbh
                        (y_ptr, post->y_stride, lfi_n->mblim[filter_level]);

                    if (!skip_lf)
-                        LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
+                        vp8_loop_filter_simple_bh
                        (y_ptr, post->y_stride, lfi_n->blim[filter_level]);
                }
            }
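Each filter path above makes the same per-macroblock decision. The left macroblock edge is filtered when `mb_col > 0`. The inner block edges are filtered when `skip_lf` is false. The top macroblock edge is filtered when `mb_row > 0`, so it is never applied across the UMV border. In the full-frame loop, `skip_lf` is set only for macroblocks that are neither `B_PRED` nor `SPLITMV` and have `mb_skip_coeff` set. The partial-frame hunk above is the exception: it filters the top edge unconditionally.

The following minimal sketch encodes that decision as a bitmask, assuming a non-zero filter level. The enum and function names are hypothetical, not libvpx API.

```c
/* Hypothetical edge flags; not part of libvpx. */
enum
{
    EDGE_MB_LEFT   = 1 << 0, /* vp8_loop_filter_mbv / _simple_mbv */
    EDGE_B_INNER_V = 1 << 1, /* vp8_loop_filter_bv  / _simple_bv  */
    EDGE_MB_TOP    = 1 << 2, /* vp8_loop_filter_mbh / _simple_mbh */
    EDGE_B_INNER_H = 1 << 3  /* vp8_loop_filter_bh  / _simple_bh  */
};

/* Mirrors the skip_lf expression used in the full-frame loop-filter loops. */
static int compute_skip_lf(int is_b_pred, int is_splitmv, int mb_skip_coeff)
{
    return !is_b_pred && !is_splitmv && mb_skip_coeff;
}

/* Returns which edges of macroblock (mb_row, mb_col) are filtered under the
 * full-frame rule that edges on the frame's top/left border are skipped. */
static unsigned edges_to_filter(int mb_row, int mb_col, int skip_lf)
{
    unsigned edges = 0;

    if (mb_col > 0)
        edges |= EDGE_MB_LEFT;
    if (!skip_lf)
        edges |= EDGE_B_INNER_V;
    if (mb_row > 0)
        edges |= EDGE_MB_TOP;
    if (!skip_lf)
        edges |= EDGE_B_INNER_H;
    return edges;
}
```

For example, the top-left macroblock of a frame with `skip_lf` set gets no filtering at all. An interior macroblock with coefficients gets all four edge groups.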
--- a/vp8/common/loopfilter.h
+++ b/vp8/common/loopfilter.h
@@ -14,6 +14,7 @@

 #include "vpx_ports/mem.h"
 #include "vpx_config.h"
+#include "vpx_rtcd.h"

 #define MAX_LOOP_FILTER             63
 /* fraction of total macroblock rows to be used in fast filter level picking */
@@ -46,7 +47,7 @@ typedef struct
    unsigned char mode_lf_lut[10];
 } loop_filter_info_n;

-typedef struct
+typedef struct loop_filter_info
 {
    const unsigned char * mblim;
    const unsigned char * blim;
@@ -55,86 +56,6 @@ typedef struct
 } loop_filter_info;


-#define prototype_loopfilter(sym) \
-    void sym(unsigned char *src, int pitch, const unsigned char *blimit,\
-             const unsigned char *limit, const unsigned char *thresh, int count)
-
-#define prototype_loopfilter_block(sym) \
-    void sym(unsigned char *y, unsigned char *u, unsigned char *v, \
-             int ystride, int uv_stride, loop_filter_info *lfi)
-
-#define prototype_simple_loopfilter(sym) \
-    void sym(unsigned char *y, int ystride, const unsigned char *blimit)
-
-#if ARCH_X86 || ARCH_X86_64
-#include "x86/loopfilter_x86.h"
-#endif
-
-#if ARCH_ARM
-#include "arm/loopfilter_arm.h"
-#endif
-
-#ifndef vp8_lf_normal_mb_v
-#define vp8_lf_normal_mb_v vp8_loop_filter_mbv_c
-#endif
-extern prototype_loopfilter_block(vp8_lf_normal_mb_v);
-
-#ifndef vp8_lf_normal_b_v
-#define vp8_lf_normal_b_v vp8_loop_filter_bv_c
-#endif
-extern prototype_loopfilter_block(vp8_lf_normal_b_v);
-
-#ifndef vp8_lf_normal_mb_h
-#define vp8_lf_normal_mb_h vp8_loop_filter_mbh_c
-#endif
-extern prototype_loopfilter_block(vp8_lf_normal_mb_h);
-
-#ifndef vp8_lf_normal_b_h
-#define vp8_lf_normal_b_h vp8_loop_filter_bh_c
-#endif
-extern prototype_loopfilter_block(vp8_lf_normal_b_h);
-
-#ifndef vp8_lf_simple_mb_v
-#define vp8_lf_simple_mb_v vp8_loop_filter_simple_vertical_edge_c
-#endif
-extern prototype_simple_loopfilter(vp8_lf_simple_mb_v);
-
-#ifndef vp8_lf_simple_b_v
-#define vp8_lf_simple_b_v vp8_loop_filter_bvs_c
-#endif
-extern prototype_simple_loopfilter(vp8_lf_simple_b_v);
-
-#ifndef vp8_lf_simple_mb_h
-#define vp8_lf_simple_mb_h vp8_loop_filter_simple_horizontal_edge_c
-#endif
-extern prototype_simple_loopfilter(vp8_lf_simple_mb_h);
-
-#ifndef vp8_lf_simple_b_h
-#define vp8_lf_simple_b_h vp8_loop_filter_bhs_c
-#endif
-extern prototype_simple_loopfilter(vp8_lf_simple_b_h);
-
-typedef prototype_loopfilter_block((*vp8_lf_block_fn_t));
-typedef prototype_simple_loopfilter((*vp8_slf_block_fn_t));
-
-typedef struct
-{
-    vp8_lf_block_fn_t  normal_mb_v;
-    vp8_lf_block_fn_t  normal_b_v;
-    vp8_lf_block_fn_t  normal_mb_h;
-    vp8_lf_block_fn_t  normal_b_h;
-    vp8_slf_block_fn_t  simple_mb_v;
-    vp8_slf_block_fn_t  simple_b_v;
-    vp8_slf_block_fn_t  simple_mb_h;
-    vp8_slf_block_fn_t  simple_b_h;
-} vp8_loopfilter_rtcd_vtable_t;
-
-#if CONFIG_RUNTIME_CPU_DETECT
-#define LF_INVOKE(ctx,fn) (ctx)->fn
-#else
-#define LF_INVOKE(ctx,fn) vp8_lf_##fn
-#endif
-
 typedef void loop_filter_uvfunction
 (
    unsigned char *u,   /* source pointer */
@@ -147,22 +68,22 @@ typedef void loop_filter_uvfunction

 /* assorted loopfilter functions which get used elsewhere */
 struct VP8Common;
-struct MacroBlockD;
+struct macroblockd;

 void vp8_loop_filter_init(struct VP8Common *cm);

 void vp8_loop_filter_frame_init(struct VP8Common *cm,
-                                struct MacroBlockD *mbd,
+                                struct macroblockd *mbd,
                                int default_filt_lvl);

-void vp8_loop_filter_frame(struct VP8Common *cm, struct MacroBlockD *mbd);
+void vp8_loop_filter_frame(struct VP8Common *cm, struct macroblockd *mbd);

 void vp8_loop_filter_partial_frame(struct VP8Common *cm,
-                                   struct MacroBlockD *mbd,
+                                   struct macroblockd *mbd,
                                   int default_filt_lvl);

 void vp8_loop_filter_frame_yonly(struct VP8Common *cm,
-                                 struct MacroBlockD *mbd,
+                                 struct macroblockd *mbd,
                                 int default_filt_lvl);

 void vp8_loop_filter_update_sharpness(loop_filter_info_n *lfi,
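The header change removes the `vp8_loopfilter_rtcd_vtable_t` table and the `LF_INVOKE` macro. Callers now name `vp8_loop_filter_mbv` and related functions directly, and `loopfilter.h` now includes the generated `vpx_rtcd.h`. This diff does not show how `vpx_rtcd.h` binds each name.

The following minimal sketch shows one common pattern, under the assumption that a runtime-CPU-detect build makes each name a function pointer assigned once at startup. Every identifier in it (`brighten_row_c`, `brighten_row_simd`, `brighten_row`, `setup_rtcd`) is hypothetical.

```c
/* Hypothetical C reference implementation: adds 1 to each pixel. */
static void brighten_row_c(unsigned char *p, int n)
{
    int i;
    for (i = 0; i < n; i++)
        p[i] = (unsigned char)(p[i] + 1);
}

/* Hypothetical "optimized" variant with identical output; it stands in for
 * an architecture-specific (e.g. SIMD) implementation. */
static void brighten_row_simd(unsigned char *p, int n)
{
    brighten_row_c(p, n);
}

/* The dispatch point: call sites invoke brighten_row(...) directly, with no
 * vtable lookup or invoke macro. Defaults to the C version. */
static void (*brighten_row)(unsigned char *p, int n) = brighten_row_c;

/* Run once at init; cpu_has_simd would come from CPU feature detection. */
static void setup_rtcd(int cpu_has_simd)
{
    brighten_row = cpu_has_simd ? brighten_row_simd : brighten_row_c;
}
```

In a build without runtime detection, the same name could instead be bound at compile time to a single implementation. In both cases, the call sites look like the new loop-filter code above.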
--- a/vp8/common/loopfilter_filters.c
+++ b/vp8/common/loopfilter_filters.c
@@ -15,7 +15,7 @@

 typedef unsigned char uc;

-static __inline signed char vp8_signed_char_clamp(int t)
+static signed char vp8_signed_char_clamp(int t)
 {
    t = (t < -128 ? -128 : t);
    t = (t > 127 ? 127 : t);
@@ -24,9 +24,9 @@ static __inline signed char vp8_signed_char_clamp(int t)


 /* should we apply any filter at all ( 11111111 yes, 00000000 no) */
-static __inline signed char vp8_filter_mask(uc limit, uc blimit,
-                                     uc p3, uc p2, uc p1, uc p0,
-                                     uc q0, uc q1, uc q2, uc q3)
+static signed char vp8_filter_mask(uc limit, uc blimit,
+                            uc p3, uc p2, uc p1, uc p0,
+                            uc q0, uc q1, uc q2, uc q3)
 {
    signed char mask = 0;
    mask |= (abs(p3 - p2) > limit);
@@ -40,7 +40,7 @@ static __inline signed char vp8_filter_mask(uc limit, uc blimit,
 }

 /* is there high variance internal edge ( 11111111 yes, 00000000 no) */
-static __inline signed char vp8_hevmask(uc thresh, uc p1, uc p0, uc q0, uc q1)
+static signed char vp8_hevmask(uc thresh, uc p1, uc p0, uc q0, uc q1)
 {
    signed char hev = 0;
    hev  |= (abs(p1 - p0) > thresh) * -1;
@@ -48,7 +48,7 @@ static __inline signed char vp8_hevmask(uc thresh, uc p1, uc p0, uc q0, uc q1)
    return hev;
 }

-static __inline void vp8_filter(signed char mask, uc hev, uc *op1,
+static void vp8_filter(signed char mask, uc hev, uc *op1,
        uc *op0, uc *oq0, uc *oq1)

 {
@@ -158,7 +158,7 @@ void vp8_loop_filter_vertical_edge_c
    while (++i < count * 8);
 }

-static __inline void vp8_mbfilter(signed char mask, uc hev,
+static void vp8_mbfilter(signed char mask, uc hev,
                           uc *op2, uc *op1, uc *op0, uc *oq0, uc *oq1, uc *oq2)
 {
    signed char s, u;
@@ -279,7 +279,7 @@ void vp8_mbloop_filter_vertical_edge_c
 }

 /* should we apply any filter at all ( 11111111 yes, 00000000 no) */
-static __inline signed char vp8_simple_filter_mask(uc blimit, uc p1, uc p0, uc q0, uc q1)
+static signed char vp8_simple_filter_mask(uc blimit, uc p1, uc p0, uc q0, uc q1)
 {
 /* Why does this cause problems for win32?
 * error C2143: syntax error : missing ';' before 'type'
@@ -289,7 +289,7 @@ static __inline signed char vp8_simple_filter_mask(uc blimit, uc p1, uc p0, uc q
    return mask;
 }

-static __inline void vp8_simple_filter(signed char mask, uc *op1, uc *op0, uc *oq0, uc *oq1)
+static void vp8_simple_filter(signed char mask, uc *op1, uc *op0, uc *oq0, uc *oq1)
 {
    signed char vp8_filter, Filter1, Filter2;
    signed char p1 = (signed char) * op1 ^ 0x80;
@@ -352,3 +352,79 @@ void vp8_loop_filter_simple_vertical_edge_c
    while (++i < 16);

 }
+
+/* Horizontal MB filtering */
+void vp8_loop_filter_mbh_c(unsigned char *y_ptr, unsigned char *u_ptr,
+                           unsigned char *v_ptr, int y_stride, int uv_stride,
+                           loop_filter_info *lfi)
+{
+    vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
+
+    if (u_ptr)
+        vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
+
+    if (v_ptr)
+        vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
+}
+
+/* Vertical MB Filtering */
+void vp8_loop_filter_mbv_c(unsigned char *y_ptr, unsigned char *u_ptr,
+                           unsigned char *v_ptr, int y_stride, int uv_stride,
+                           loop_filter_info *lfi)
+{
+    vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
+
+    if (u_ptr)
+        vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
+
+    if (v_ptr)
+        vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
+}
+
+/* Horizontal B Filtering */
+void vp8_loop_filter_bh_c(unsigned char *y_ptr, unsigned char *u_ptr,
+                          unsigned char *v_ptr, int y_stride, int uv_stride,
+                          loop_filter_info *lfi)
+{
+    vp8_loop_filter_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
+    vp8_loop_filter_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
+    vp8_loop_filter_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
+
+    if (u_ptr)
+        vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
+
+    if (v_ptr)
+        vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
+}
+
+void vp8_loop_filter_bhs_c(unsigned char *y_ptr, int y_stride,
+                           const unsigned char *blimit)
+{
+    vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, blimit);
+    vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, blimit);
+    vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, blimit);
+}
+
+/* Vertical B Filtering */
+void vp8_loop_filter_bv_c(unsigned char *y_ptr, unsigned char *u_ptr,
+                          unsigned char *v_ptr, int y_stride, int uv_stride,
+                          loop_filter_info *lfi)
+{
+    vp8_loop_filter_vertical_edge_c(y_ptr + 4, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
+    vp8_loop_filter_vertical_edge_c(y_ptr + 8, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
+    vp8_loop_filter_vertical_edge_c(y_ptr + 12, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
+
+    if (u_ptr)
+        vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
+
+    if (v_ptr)
+        vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
+}
+
+void vp8_loop_filter_bvs_c(unsigned char *y_ptr, int y_stride,
+                           const unsigned char *blimit)
+{
+    vp8_loop_filter_simple_vertical_edge_c(y_ptr + 4, y_stride, blimit);
+    vp8_loop_filter_simple_vertical_edge_c(y_ptr + 8, y_stride, blimit);
+    vp8_loop_filter_simple_vertical_edge_c(y_ptr + 12, y_stride, blimit);
+}
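The new `_bh`/`_bv` wrappers (and their simple `_bhs`/`_bvs` counterparts) filter the three inner edges of a 16x16 luma macroblock, at rows or columns 4, 8 and 12, plus the single inner edge at offset 4 of each 8x8 chroma block. The horizontal variants start each edge at `base + k * stride`, and the vertical variants start at `base + k`.

The following small sketch computes those starting offsets. `inner_edge_offsets` is a hypothetical helper, not libvpx code.

```c
/* Hypothetical helper: writes the byte offsets (from the block's top-left
 * pixel) at which vp8_loop_filter_bh_c / _bv_c start each inner-edge filter.
 * block_size is 16 for luma or 8 for chroma; horizontal != 0 selects the
 * _bh layout (edges are rows), otherwise the _bv layout (edges are columns).
 * Returns the number of offsets written (3 for luma, 1 for chroma). */
static int inner_edge_offsets(int block_size, int stride, int horizontal,
                              int out[3])
{
    int n = 0;
    int k;

    for (k = 4; k < block_size; k += 4)
        out[n++] = horizontal ? k * stride : k;
    return n;
}
```

For example, a luma block with a 32-byte stride yields offsets 128, 256 and 384 for the horizontal edges and 4, 8 and 12 for the vertical ones.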
--- a/vp8/common/mbpitch.c
+++ b/vp8/common/mbpitch.c
@@ -11,74 +11,6 @@

 #include "blockd.h"

-typedef enum
-{
-    PRED = 0,
-    DEST = 1
-} BLOCKSET;
-
-static void setup_block
-(
-    BLOCKD *b,
-    int mv_stride,
-    unsigned char **base,
-    int Stride,
-    int offset,
-    BLOCKSET bs
-)
-{
-
-    if (bs == DEST)
-    {
-        b->dst_stride = Stride;
-        b->dst = offset;
-        b->base_dst = base;
-    }
-    else
-    {
-        b->pre_stride = Stride;
-        b->pre = offset;
-        b->base_pre = base;
-    }
-
-}
-
-
-static void setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
-{
-    int block;
-
-    unsigned char **y, **u, **v;
-
-    if (bs == DEST)
-    {
-        y = &x->dst.y_buffer;
-        u = &x->dst.u_buffer;
-        v = &x->dst.v_buffer;
-    }
-    else
-    {
-        y = &x->pre.y_buffer;
-        u = &x->pre.u_buffer;
-        v = &x->pre.v_buffer;
-    }
-
-    for (block = 0; block < 16; block++) /* y blocks */
-    {
-        setup_block(&x->block[block], x->dst.y_stride, y, x->dst.y_stride,
-                        (block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4, bs);
-    }
-
-    for (block = 16; block < 20; block++) /* U and V blocks */
-    {
-        setup_block(&x->block[block], x->dst.uv_stride, u, x->dst.uv_stride,
-                        ((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
-
-        setup_block(&x->block[block+4], x->dst.uv_stride, v, x->dst.uv_stride,
-                        ((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
-    }
-}
-
 void vp8_setup_block_dptrs(MACROBLOCKD *x)
 {
    int r, c;
@@ -119,8 +51,18 @@ void vp8_setup_block_dptrs(MACROBLOCKD *x)

 void vp8_build_block_doffsets(MACROBLOCKD *x)
 {
+    int block;

-    /* handle the destination pitch features */
-    setup_macroblock(x, DEST);
-    setup_macroblock(x, PRED);
+    for (block = 0; block < 16; block++) /* y blocks */
+    {
+        x->block[block].offset =
+            (block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4;
+    }
+
+    for (block = 16; block < 20; block++) /* U and V blocks */
+    {
+        x->block[block+4].offset =
+        x->block[block].offset =
+            ((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4;
+    }
 }
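After this change, `vp8_build_block_doffsets` stores one `offset` per block instead of separate `dst`/`pre` offsets and base pointers. Luma block `b` (0 to 15) sits at row `b >> 2` and column `b & 3` of a 4x4 grid of 4x4 blocks. Chroma blocks 16 to 19 (U) and 20 to 23 (V) share the same offsets from a 2x2 grid.

The following standalone sketch reproduces that arithmetic. `block_offset` is a hypothetical name, not a libvpx function.

```c
/* Hypothetical helper reproducing the offset formula from
 * vp8_build_block_doffsets: block 0..15 is luma, 16..19 is U, 20..23 is V. */
static int block_offset(int block, int y_stride, int uv_stride)
{
    if (block < 16)
        return (block >> 2) * 4 * y_stride + (block & 3) * 4;

    if (block >= 20)
        block -= 4;                 /* V blocks reuse the U block layout */

    return ((block - 16) >> 1) * 4 * uv_stride + (block & 1) * 4;
}
```

With a 32-byte luma stride and a 16-byte chroma stride, block 5 lands at 132 and blocks 19 and 23 both land at 68.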
--- a/vp8/common/mfqe.c
+++ b/vp8/common/mfqe.c
@@ -0,0 +1,385 @@
+/*
+ *  Copyright (c) 2012 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+
+/* MFQE: Multiframe Quality Enhancement
+ * In rate-limited situations keyframes may cause significant visual artifacts,
+ * commonly referred to as "popping." This file implements a postprocessing
+ * algorithm which blends data from the preceding frame when there is no
+ * motion and the q of the previous frame is lower, which indicates that it is
+ * of higher quality.
+ */
+
+#include "postproc.h"
+#include "variance.h"
+#include "vpx_mem/vpx_mem.h"
+#include "vpx_rtcd.h"
+#include "vpx_scale/yv12config.h"
+
+#include <limits.h>
+#include <stdlib.h>
+
+static void filter_by_weight(unsigned char *src, int src_stride,
+                             unsigned char *dst, int dst_stride,
+                             int block_size, int src_weight)
+{
+    int dst_weight = (1 << MFQE_PRECISION) - src_weight;
+    int rounding_bit = 1 << (MFQE_PRECISION - 1);
+    int r, c;
+
+    for (r = 0; r < block_size; r++)
+    {
+        for (c = 0; c < block_size; c++)
+        {
+            dst[c] = (src[c] * src_weight +
+                      dst[c] * dst_weight +
+                      rounding_bit) >> MFQE_PRECISION;
+        }
+        src += src_stride;
+        dst += dst_stride;
+    }
+}
+
+void vp8_filter_by_weight16x16_c(unsigned char *src, int src_stride,
+                                 unsigned char *dst, int dst_stride,
+                                 int src_weight)
+{
+    filter_by_weight(src, src_stride, dst, dst_stride, 16, src_weight);
+}
+
+void vp8_filter_by_weight8x8_c(unsigned char *src, int src_stride,
+                               unsigned char *dst, int dst_stride,
+                               int src_weight)
+{
+    filter_by_weight(src, src_stride, dst, dst_stride, 8, src_weight);
+}
+
+void vp8_filter_by_weight4x4_c(unsigned char *src, int src_stride,
+                               unsigned char *dst, int dst_stride,
+                               int src_weight)
+{
+    filter_by_weight(src, src_stride, dst, dst_stride, 4, src_weight);
+}
+
+static void apply_ifactor(unsigned char *y_src,
+                          int y_src_stride,
+                          unsigned char *y_dst,
+                          int y_dst_stride,
+                          unsigned char *u_src,
+                          unsigned char *v_src,
+                          int uv_src_stride,
+                          unsigned char *u_dst,
+                          unsigned char *v_dst,
+                          int uv_dst_stride,
+                          int block_size,
+                          int src_weight)
+{
+    if (block_size == 16)
+    {
+        vp8_filter_by_weight16x16(y_src, y_src_stride, y_dst, y_dst_stride, src_weight);
+        vp8_filter_by_weight8x8(u_src, uv_src_stride, u_dst, uv_dst_stride, src_weight);
+        vp8_filter_by_weight8x8(v_src, uv_src_stride, v_dst, uv_dst_stride, src_weight);
+    }
+    else /* if (block_size == 8) */
+    {
+        vp8_filter_by_weight8x8(y_src, y_src_stride, y_dst, y_dst_stride, src_weight);
+        vp8_filter_by_weight4x4(u_src, uv_src_stride, u_dst, uv_dst_stride, src_weight);
+        vp8_filter_by_weight4x4(v_src, uv_src_stride, v_dst, uv_dst_stride, src_weight);
+    }
+}
+
+static unsigned int int_sqrt(unsigned int x)
+{
+    unsigned int y = x;
+    unsigned int guess;
+    int p = 1;
+    while (y>>=1) p++;
+    p>>=1;
+
+    guess=0;
+    while (p>=0)
+    {
+        guess |= (1<<p);
+        if (x<guess*guess)
+            guess -= (1<<p);
+        p--;
+    }
+    /* choose between guess or guess+1 */
+    return guess+(guess*guess+guess+1<=x);
+}
+
+#define USE_SSD
+static void multiframe_quality_enhance_block
+(
+    int blksize, /* Currently the only supported values are 16 and 8 */
+    int qcurr,
+    int qprev,
+    unsigned char *y,
+    unsigned char *u,
+    unsigned char *v,
+    int y_stride,
+    int uv_stride,
+    unsigned char *yd,
+    unsigned char *ud,
+    unsigned char *vd,
+    int yd_stride,
+    int uvd_stride
+)
+{
+    static const unsigned char VP8_ZEROS[16]=
+    {
+         0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+    };
+    int uvblksize = blksize >> 1;
+    int qdiff = qcurr - qprev;
+
+    int i;
+    unsigned char *up;
+    unsigned char *udp;
+    unsigned char *vp;
+    unsigned char *vdp;
+
+    unsigned int act, actd, sad, usad, vsad, sse, thr, thrsq, actrisk;
+
+    if (blksize == 16)
+    {
+        actd = (vp8_variance16x16(yd, yd_stride, VP8_ZEROS, 0, &sse)+128)>>8;
+        act = (vp8_variance16x16(y, y_stride, VP8_ZEROS, 0, &sse)+128)>>8;
+#ifdef USE_SSD
+        sad = (vp8_variance16x16(y, y_stride, yd, yd_stride, &sse));
+        sad = (sse + 128)>>8;
+        usad = (vp8_variance8x8(u, uv_stride, ud, uvd_stride, &sse));
+        usad = (sse + 32)>>6;
+        vsad = (vp8_variance8x8(v, uv_stride, vd, uvd_stride, &sse));
+        vsad = (sse + 32)>>6;
+#else
+        sad = (vp8_sad16x16(y, y_stride, yd, yd_stride, INT_MAX)+128)>>8;
+        usad = (vp8_sad8x8(u, uv_stride, ud, uvd_stride, INT_MAX)+32)>>6;
+        vsad = (vp8_sad8x8(v, uv_stride, vd, uvd_stride, INT_MAX)+32)>>6;
+#endif
+    }
+    else /* if (blksize == 8) */
+    {
+        actd = (vp8_variance8x8(yd, yd_stride, VP8_ZEROS, 0, &sse)+32)>>6;
+        act = (vp8_variance8x8(y, y_stride, VP8_ZEROS, 0, &sse)+32)>>6;
+#ifdef USE_SSD
+        sad = (vp8_variance8x8(y, y_stride, yd, yd_stride, &sse));
+        sad = (sse + 32)>>6;
+        usad = (vp8_variance4x4(u, uv_stride, ud, uvd_stride, &sse));
+        usad = (sse + 8)>>4;
+        vsad = (vp8_variance4x4(v, uv_stride, vd, uvd_stride, &sse));
+        vsad = (sse + 8)>>4;
+#else
+        sad = (vp8_sad8x8(y, y_stride, yd, yd_stride, INT_MAX)+32)>>6;
+        usad = (vp8_sad4x4(u, uv_stride, ud, uvd_stride, INT_MAX)+8)>>4;
+        vsad = (vp8_sad4x4(v, uv_stride, vd, uvd_stride, INT_MAX)+8)>>4;
+#endif
+    }
+
+    actrisk = (actd > act * 5);
+
+    /* thr = qdiff/8 + log2(actd) + log4(qprev) */
+    thr = (qdiff >> 3);
+    while (actd >>= 1) thr++;
+    while (qprev >>= 2) thr++;
+
+#ifdef USE_SSD
+    thrsq = thr * thr;
+    if (sad < thrsq &&
+        /* additional checks for color mismatch and excessive addition of
+         * high-frequencies */
+        4 * usad < thrsq && 4 * vsad < thrsq && !actrisk)
+#else
+    if (sad < thr &&
+        /* additional checks for color mismatch and excessive addition of
+         * high-frequencies */
+        2 * usad < thr && 2 * vsad < thr && !actrisk)
+#endif
+    {
+        int ifactor;
+#ifdef USE_SSD
+        /* TODO: optimize this later to not need a square root */
+        sad = int_sqrt(sad);
+#endif
+        ifactor = (sad << MFQE_PRECISION) / thr;
+        ifactor >>= (qdiff >> 5);
+
+        if (ifactor)
+        {
+            apply_ifactor(y, y_stride, yd, yd_stride,
+                          u, v, uv_stride,
+                          ud, vd, uvd_stride,
+                          blksize, ifactor);
+        }
+    }
+    else  /* else implicitly copy from previous frame */
+    {
+        if (blksize == 16)
+        {
+            vp8_copy_mem16x16(y, y_stride, yd, yd_stride);
+            vp8_copy_mem8x8(u, uv_stride, ud, uvd_stride);
+            vp8_copy_mem8x8(v, uv_stride, vd, uvd_stride);
+        }
+        else  /* if (blksize == 8) */
+        {
+            vp8_copy_mem8x8(y, y_stride, yd, yd_stride);
+            for (up = u, udp = ud, i = 0; i < uvblksize; ++i, up += uv_stride, udp += uvd_stride)
+                vpx_memcpy(udp, up, uvblksize);
+            for (vp = v, vdp = vd, i = 0; i < uvblksize; ++i, vp += uv_stride, vdp += uvd_stride)
+                vpx_memcpy(vdp, vp, uvblksize);
+        }
+    }
+}
+
+static int qualify_inter_mb(const MODE_INFO *mode_info_context, int *map)
+{
+    if (mode_info_context->mbmi.mb_skip_coeff)
+        map[0] = map[1] = map[2] = map[3] = 1;
+    else if (mode_info_context->mbmi.mode==SPLITMV)
+    {
+        static int ndx[4][4] =
+        {
+            {0, 1, 4, 5},
+            {2, 3, 6, 7},
+            {8, 9, 12, 13},
+            {10, 11, 14, 15}
+        };
+        int i, j;
+        for (i=0; i<4; ++i)
+        {
+            map[i] = 1;
+            for (j=0; j<4 && map[j]; ++j)
+                map[i] &= (mode_info_context->bmi[ndx[i][j]].mv.as_mv.row <= 2 &&
+                           mode_info_context->bmi[ndx[i][j]].mv.as_mv.col <= 2);
+        }
+    }
+    else
+    {
+        map[0] = map[1] = map[2] = map[3] =
+            (mode_info_context->mbmi.mode > B_PRED &&
+             abs(mode_info_context->mbmi.mv.as_mv.row) <= 2 &&
+             abs(mode_info_context->mbmi.mv.as_mv.col) <= 2);
+    }
+    return (map[0]+map[1]+map[2]+map[3]);
+}
+
+void vp8_multiframe_quality_enhance
+(
+    VP8_COMMON *cm
+)
+{
+    YV12_BUFFER_CONFIG *show = cm->frame_to_show;
+    YV12_BUFFER_CONFIG *dest = &cm->post_proc_buffer;
+
+    FRAME_TYPE frame_type = cm->frame_type;
+    /* Point at base of MB MODE_INFO list, which holds motion vectors etc. */
+    const MODE_INFO *mode_info_context = cm->mi;
+    int mb_row;
+    int mb_col;
+    int totmap, map[4];
+    int qcurr = cm->base_qindex;
+    int qprev = cm->postproc_state.last_base_qindex;
+
+    unsigned char *y_ptr, *u_ptr, *v_ptr;
+    unsigned char *yd_ptr, *ud_ptr, *vd_ptr;
+
+    /* Set up the buffer pointers */
+    y_ptr = show->y_buffer;
+    u_ptr = show->u_buffer;
+    v_ptr = show->v_buffer;
+    yd_ptr = dest->y_buffer;
+    ud_ptr = dest->u_buffer;
+    vd_ptr = dest->v_buffer;
+
+    /* postprocess each macro block */
+    for (mb_row = 0; mb_row < cm->mb_rows; mb_row++)
+    {
+        for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
+        {
+            /* if motion is high there will likely be no benefit */
+            if (frame_type == INTER_FRAME) totmap = qualify_inter_mb(mode_info_context, map);
+            else totmap = (frame_type == KEY_FRAME ? 4 : 0);
+            if (totmap)
+            {
+                if (totmap < 4)
+                {
+                    int i, j;
+                    for (i=0; i<2; ++i)
+                        for (j=0; j<2; ++j)
+                        {
+                            if (map[i*2+j])
+                            {
+                                multiframe_quality_enhance_block(8, qcurr, qprev,
+                                                                 y_ptr + 8*(i*show->y_stride+j),
+                                                                 u_ptr + 4*(i*show->uv_stride+j),
+                                                                 v_ptr + 4*(i*show->uv_stride+j),
+                                                                 show->y_stride,
+                                                                 show->uv_stride,
+                                                                 yd_ptr + 8*(i*dest->y_stride+j),
+                                                                 ud_ptr + 4*(i*dest->uv_stride+j),
+                                                                 vd_ptr + 4*(i*dest->uv_stride+j),
+                                                                 dest->y_stride,
+                                                                 dest->uv_stride);
+                            }
+                            else
+                            {
+                                /* copy an 8x8 luma block and its 4x4 chroma blocks */
+                                int k;
+                                unsigned char *up = u_ptr + 4*(i*show->uv_stride+j);
+                                unsigned char *udp = ud_ptr + 4*(i*dest->uv_stride+j);
+                                unsigned char *vp = v_ptr + 4*(i*show->uv_stride+j);
+                                unsigned char *vdp = vd_ptr + 4*(i*dest->uv_stride+j);
+                                vp8_copy_mem8x8(y_ptr + 8*(i*show->y_stride+j), show->y_stride,
+                                                yd_ptr + 8*(i*dest->y_stride+j), dest->y_stride);
+                                for (k = 0; k < 4; ++k, up += show->uv_stride, udp += dest->uv_stride,
+                                                        vp += show->uv_stride, vdp += dest->uv_stride)
+                                {
+                                    vpx_memcpy(udp, up, 4);
+                                    vpx_memcpy(vdp, vp, 4);
+                                }
+                            }
+                        }
+                }
+                else /* totmap == 4 */
+                {
+                    multiframe_quality_enhance_block(16, qcurr, qprev, y_ptr,
+                                                     u_ptr, v_ptr,
+                                                     show->y_stride,
+                                                     show->uv_stride,
+                                                     yd_ptr, ud_ptr, vd_ptr,
+                                                     dest->y_stride,
+                                                     dest->uv_stride);
+                }
+            }
+            else
+            {
+                vp8_copy_mem16x16(y_ptr, show->y_stride, yd_ptr, dest->y_stride);
+                vp8_copy_mem8x8(u_ptr, show->uv_stride, ud_ptr, dest->uv_stride);
+                vp8_copy_mem8x8(v_ptr, show->uv_stride, vd_ptr, dest->uv_stride);
+            }
+            y_ptr += 16;
+            u_ptr += 8;
+            v_ptr += 8;
+            yd_ptr += 16;
+            ud_ptr += 8;
+            vd_ptr += 8;
+            mode_info_context++;     /* step to next MB */
+        }
+
+        y_ptr += show->y_stride  * 16 - 16 * cm->mb_cols;
+        u_ptr += show->uv_stride *  8 - 8 * cm->mb_cols;
+        v_ptr += show->uv_stride *  8 - 8 * cm->mb_cols;
+        yd_ptr += dest->y_stride  * 16 - 16 * cm->mb_cols;
+        ud_ptr += dest->uv_stride *  8 - 8 * cm->mb_cols;
+        vd_ptr += dest->uv_stride *  8 - 8 * cm->mb_cols;
+
+        mode_info_context++;         /* Skip border mb */
+    }
+}
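The end-of-row update in the loop above depends on the inner loop having already advanced each pointer by `16 * cm->mb_cols` (luma) or `8 * cm->mb_cols` (chroma). Adding `stride * 16 - 16 * mb_cols` therefore lands on the first pixel of the next macroblock row. The sketch below checks that arithmetic for the luma plane, using byte offsets instead of real frame buffers. The function name and the frame sizes in the usage note are made up for illustration, and the sketch ignores `mode_info_context` and its extra border step.

```c
/* Walk mb_rows x mb_cols macroblocks the way the MFQE loop steps y_ptr,
 * using a byte offset in place of the pointer (illustrative helper).
 * Returns 1 if every macroblock is visited at its expected top-left
 * offset, 0 otherwise; *end receives the offset after the last row. */
int walk_luma_offsets(int mb_rows, int mb_cols, int y_stride, long *end)
{
    long y = 0;
    int mb_row, mb_col;

    for (mb_row = 0; mb_row < mb_rows; mb_row++)
    {
        for (mb_col = 0; mb_col < mb_cols; mb_col++)
        {
            /* Top-left pixel of macroblock (mb_row, mb_col). */
            long expected = (long)mb_row * 16 * y_stride + 16L * mb_col;
            if (y != expected)
                return 0;
            y += 16;                                  /* step to next MB */
        }
        /* Same update as the diff: skip the rest of this 16-row band. */
        y += (long)y_stride * 16 - 16L * mb_cols;
    }
    *end = y;
    return 1;
}
```

For example, a 3x2-macroblock frame with an assumed luma stride of 64 visits every macroblock at the expected offset and ends at offset `2 * 16 * 64 = 2048`. The chroma planes follow the same pattern with 8 in place of 16 and `uv_stride` in place of `y_stride`.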
--- a/vp8/common/modecontext.c
+++ b/vp8/common/modecontext.c
@@ -1,146 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#include "entropymode.h"
-
-const unsigned int vp8_kf_default_bmode_counts [VP8_BINTRAMODES] [VP8_BINTRAMODES] [VP8_BINTRAMODES] =
-{
-    {
-        /*Above Mode :  0*/
-        { 43438,   2195,    470,    316,    615,    171,    217,    412,    124,    160, }, /* left_mode 0 */
-        {  5722,   2751,    296,    291,     81,     68,     80,    101,    100,    170, }, /* left_mode 1 */
-        {  1629,    201,    307,     25,     47,     16,     34,     72,     19,     28, }, /* left_mode 2 */
-        {   332,    266,     36,    500,     20,     65,     23,     14,    154,    106, }, /* left_mode 3 */
-        {   450,     97,     10,     24,    117,     10,      2,     12,      8,     71, }, /* left_mode 4 */
-        {   384,     49,     29,     44,     12,    162,     51,      5,     87,     42, }, /* left_mode 5 */
-        {   495,     53,    157,     27,     14,     57,    180,     17,     17,     34, }, /* left_mode 6 */
-        {   695,     64,     62,      9,     27,      5,      3,    147,     10,     26, }, /* left_mode 7 */
-        {   230,     54,     20,    124,     16,    125,     29,     12,    283,     37, }, /* left_mode 8 */
-        {   260,     87,     21,    120,     32,     16,     33,     16,     33,    203, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  1*/
-        {  3934,   2573,    355,    137,    128,     87,    133,    117,     37,     27, }, /* left_mode 0 */
-        {  1036,   1929,    278,    135,     27,     37,     48,     55,     41,     91, }, /* left_mode 1 */
-        {   223,    256,    253,     15,     13,      9,     28,     64,      3,      3, }, /* left_mode 2 */
-        {   120,    129,     17,    316,     15,     11,      9,      4,     53,     74, }, /* left_mode 3 */
-        {   129,     58,      6,     11,     38,      2,      0,      5,      2,     67, }, /* left_mode 4 */
-        {    53,     22,     11,     16,      8,     26,     14,      3,     19,     12, }, /* left_mode 5 */
-        {    59,     26,     61,     11,      4,      9,     35,     13,      8,      8, }, /* left_mode 6 */
-        {   101,     52,     40,      8,      5,      2,      8,     59,      2,     20, }, /* left_mode 7 */
-        {    48,     34,     10,     52,      8,     15,      6,      6,     63,     20, }, /* left_mode 8 */
-        {    96,     48,     22,     63,     11,     14,      5,      8,      9,     96, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  2*/
-        {   709,    461,    506,     36,     27,     33,    151,     98,     24,      6, }, /* left_mode 0 */
-        {   201,    375,    442,     27,     13,      8,     46,     58,      6,     19, }, /* left_mode 1 */
-        {   122,    140,    417,      4,     13,      3,     33,     59,      4,      2, }, /* left_mode 2 */
-        {    36,     17,     22,     16,      6,      8,     12,     17,      9,     21, }, /* left_mode 3 */
-        {    51,     15,      7,      1,     14,      0,      4,      5,      3,     22, }, /* left_mode 4 */
-        {    18,     11,     30,      9,      7,     20,     11,      5,      2,      6, }, /* left_mode 5 */
-        {    38,     21,    103,      9,      4,     12,     79,     13,      2,      5, }, /* left_mode 6 */
-        {    64,     17,     66,      2,     12,      4,      2,     65,      4,      5, }, /* left_mode 7 */
-        {    14,      7,      7,     16,      3,     11,      4,     13,     15,     16, }, /* left_mode 8 */
-        {    36,      8,     32,      9,      9,      4,     14,      7,      6,     24, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  3*/
-        {  1340,    173,     36,    119,     30,     10,     13,     10,     20,     26, }, /* left_mode 0 */
-        {   156,    293,     26,    108,      5,     16,      2,      4,     23,     30, }, /* left_mode 1 */
-        {    60,     34,     13,      7,      3,      3,      0,      8,      4,      5, }, /* left_mode 2 */
-        {    72,     64,      1,    235,      3,      9,      2,      7,     28,     38, }, /* left_mode 3 */
-        {    29,     14,      1,      3,      5,      0,      2,      2,      5,     13, }, /* left_mode 4 */
-        {    22,      7,      4,     11,      2,      5,      1,      2,      6,      4, }, /* left_mode 5 */
-        {    18,     14,      5,      6,      4,      3,     14,      0,      9,      2, }, /* left_mode 6 */
-        {    41,     10,      7,      1,      2,      0,      0,     10,      2,      1, }, /* left_mode 7 */
-        {    23,     19,      2,     33,      1,      5,      2,      0,     51,      8, }, /* left_mode 8 */
-        {    33,     26,      7,     53,      3,      9,      3,      3,      9,     19, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  4*/
-        {   410,    165,     43,     31,     66,     15,     30,     54,      8,     17, }, /* left_mode 0 */
-        {   115,     64,     27,     18,     30,      7,     11,     15,      4,     19, }, /* left_mode 1 */
-        {    31,     23,     25,      1,      7,      2,      2,     10,      0,      5, }, /* left_mode 2 */
-        {    17,      4,      1,      6,      8,      2,      7,      5,      5,     21, }, /* left_mode 3 */
-        {   120,     12,      1,      2,     83,      3,      0,      4,      1,     40, }, /* left_mode 4 */
-        {     4,      3,      1,      2,      1,      2,      5,      0,      3,      6, }, /* left_mode 5 */
-        {    10,      2,     13,      6,      6,      6,      8,      2,      4,      5, }, /* left_mode 6 */
-        {    58,     10,      5,      1,     28,      1,      1,     33,      1,      9, }, /* left_mode 7 */
-        {     8,      2,      1,      4,      2,      5,      1,      1,      2,     10, }, /* left_mode 8 */
-        {    76,      7,      5,      7,     18,      2,      2,      0,      5,     45, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  5*/
-        {   444,     46,     47,     20,     14,    110,     60,     14,     60,      7, }, /* left_mode 0 */
-        {    59,     57,     25,     18,      3,     17,     21,      6,     14,      6, }, /* left_mode 1 */
-        {    24,     17,     20,      6,      4,     13,      7,      2,      3,      2, }, /* left_mode 2 */
-        {    13,     11,      5,     14,      4,      9,      2,      4,     15,      7, }, /* left_mode 3 */
-        {     8,      5,      2,      1,      4,      0,      1,      1,      2,     12, }, /* left_mode 4 */
-        {    19,      5,      5,      7,      4,     40,      6,      3,     10,      4, }, /* left_mode 5 */
-        {    16,      5,      9,      1,      1,     16,     26,      2,     10,      4, }, /* left_mode 6 */
-        {    11,      4,      8,      1,      1,      4,      4,      5,      4,      1, }, /* left_mode 7 */
-        {    15,      1,      3,      7,      3,     21,      7,      1,     34,      5, }, /* left_mode 8 */
-        {    18,      5,      1,      3,      4,      3,      7,      1,      2,      9, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  6*/
-        {   476,    149,     94,     13,     14,     77,    291,     27,     23,      3, }, /* left_mode 0 */
-        {    79,     83,     42,     14,      2,     12,     63,      2,      4,     14, }, /* left_mode 1 */
-        {    43,     36,     55,      1,      3,      8,     42,     11,      5,      1, }, /* left_mode 2 */
-        {     9,      9,      6,     16,      1,      5,      6,      3,     11,     10, }, /* left_mode 3 */
-        {    10,      3,      1,      3,     10,      1,      0,      1,      1,      4, }, /* left_mode 4 */
-        {    14,      6,     15,      5,      1,     20,     25,      2,      5,      0, }, /* left_mode 5 */
-        {    28,      7,     51,      1,      0,      8,    127,      6,      2,      5, }, /* left_mode 6 */
-        {    13,      3,      3,      2,      3,      1,      2,      8,      1,      2, }, /* left_mode 7 */
-        {    10,      3,      3,      3,      3,      8,      2,      2,      9,      3, }, /* left_mode 8 */
-        {    13,      7,     11,      4,      0,      4,      6,      2,      5,      8, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  7*/
-        {   376,    135,    119,      6,     32,      8,     31,    224,      9,      3, }, /* left_mode 0 */
-        {    93,     60,     54,      6,     13,      7,      8,     92,      2,     12, }, /* left_mode 1 */
-        {    74,     36,     84,      0,      3,      2,      9,     67,      2,      1, }, /* left_mode 2 */
-        {    19,      4,      4,      8,      8,      2,      4,      7,      6,     16, }, /* left_mode 3 */
-        {    51,      7,      4,      1,     77,      3,      0,     14,      1,     15, }, /* left_mode 4 */
-        {     7,      7,      5,      7,      4,      7,      4,      5,      0,      3, }, /* left_mode 5 */
-        {    18,      2,     19,      2,      2,      4,     12,     11,      1,      2, }, /* left_mode 6 */
-        {   129,      6,     27,      1,     21,      3,      0,    189,      0,      6, }, /* left_mode 7 */
-        {     9,      1,      2,      8,      3,      7,      0,      5,      3,      3, }, /* left_mode 8 */
-        {    20,      4,      5,     10,      4,      2,      7,     17,      3,     16, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  8*/
-        {   617,     68,     34,     79,     11,     27,     25,     14,     75,     13, }, /* left_mode 0 */
-        {    51,     82,     21,     26,      6,     12,     13,      1,     26,     16, }, /* left_mode 1 */
-        {    29,      9,     12,     11,      3,      7,      1,     10,      2,      2, }, /* left_mode 2 */
-        {    17,     19,     11,     74,      4,      3,      2,      0,     58,     13, }, /* left_mode 3 */
-        {    10,      1,      1,      3,      4,      1,      0,      2,      1,      8, }, /* left_mode 4 */
-        {    14,      4,      5,      5,      1,     13,      2,      0,     27,      8, }, /* left_mode 5 */
-        {    10,      3,      5,      4,      1,      7,      6,      4,      5,      1, }, /* left_mode 6 */
-        {    10,      2,      6,      2,      1,      1,      1,      4,      2,      1, }, /* left_mode 7 */
-        {    14,      8,      5,     23,      2,     12,      6,      2,    117,      5, }, /* left_mode 8 */
-        {     9,      6,      2,     19,      1,      6,      3,      2,      9,      9, }, /* left_mode 9 */
-    },
-    {
-        /*Above Mode :  9*/
-        {   680,     73,     22,     38,     42,      5,     11,      9,      6,     28, }, /* left_mode 0 */
-        {   113,    112,     21,     22,     10,      2,      8,      4,      6,     42, }, /* left_mode 1 */
-        {    44,     20,     24,      6,      5,      4,      3,      3,      1,      2, }, /* left_mode 2 */
-        {    40,     23,      7,     71,      5,      2,      4,      1,      7,     22, }, /* left_mode 3 */
-        {    85,      9,      4,      4,     17,      2,      0,      3,      2,     23, }, /* left_mode 4 */
-        {    13,      4,      2,      6,      1,      7,      0,      1,      7,      6, }, /* left_mode 5 */
-        {    26,      6,      8,      3,      2,      3,      8,      1,      5,      4, }, /* left_mode 6 */
-        {    54,      8,      9,      6,      7,      0,      1,     11,      1,      3, }, /* left_mode 7 */
-        {     9,     10,      4,     13,      2,      5,      4,      2,     14,      8, }, /* left_mode 8 */
-        {    92,      9,      5,     19,     15,      3,      3,      1,      6,     58, }, /* left_mode 9 */
-    },
-};
--- a/vp8/common/mv.h
+++ b/vp8/common/mv.h
@@ -19,7 +19,7 @@ typedef struct
    short col;
 } MV;

-typedef union
+typedef union int_mv
 {
    uint32_t  as_int;
    MV        as_mv;
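Giving the union a tag (`union int_mv`) does not change its layout. A tagged union can also be forward-declared elsewhere as `union int_mv;` and used through pointers without the full definition. The standalone sketch below reproduces the two members shown in this hunk (`MV` with `short` components, and `uint32_t as_int`) to illustrate comparing or clearing a whole motion vector through `as_int`. The helpers `mv_equal` and `mv_clear` are hypothetical, and the sketch assumes `sizeof(MV) == sizeof(uint32_t)` (16-bit `short`, no padding).

```c
#include <stdint.h>

/* Mirrors the MV struct from mv.h: row and column as shorts. */
typedef struct
{
    short row;
    short col;
} MV;

/* Tagged union, as introduced by this change. */
typedef union int_mv
{
    uint32_t as_int;   /* both components viewed as one 32-bit word */
    MV       as_mv;
} int_mv;

/* Hypothetical helper: compare both components with one 32-bit compare.
 * Uses the tag directly, as code seeing only a forward declaration would. */
int mv_equal(const union int_mv *a, const union int_mv *b)
{
    return a->as_int == b->as_int;
}

/* Hypothetical helper: zero both components with one 32-bit store. */
void mv_clear(union int_mv *mv)
{
    mv->as_int = 0;
}
```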
--- a/vp8/common/onyx.h
+++ b/vp8/common/onyx.h
@@ -60,19 +60,19 @@ extern "C"
        MODE_BESTQUALITY    = 0x2,
        MODE_FIRSTPASS      = 0x3,
        MODE_SECONDPASS     = 0x4,
-        MODE_SECONDPASS_BEST = 0x5,
+        MODE_SECONDPASS_BEST = 0x5
    } MODE;

    typedef enum
    {
        FRAMEFLAGS_KEY    = 1,
        FRAMEFLAGS_GOLDEN = 2,
-        FRAMEFLAGS_ALTREF = 4,
+        FRAMEFLAGS_ALTREF = 4
    } FRAMETYPE_FLAGS;


 #include <assert.h>
-    static __inline void Scale2Ratio(int mode, int *hr, int *hs)
+    static void Scale2Ratio(int mode, int *hr, int *hs)
    {
        switch (mode)
        {
@@ -207,10 +207,10 @@ extern "C"

        // Temporal scaling parameters
        unsigned int number_of_layers;
-        unsigned int target_bitrate[MAX_PERIODICITY];
-        unsigned int rate_decimator[MAX_PERIODICITY];
+        unsigned int target_bitrate[VPX_TS_MAX_PERIODICITY];
+        unsigned int rate_decimator[VPX_TS_MAX_PERIODICITY];
        unsigned int periodicity;
-        unsigned int layer_id[MAX_PERIODICITY];
+        unsigned int layer_id[VPX_TS_MAX_PERIODICITY];

 #if CONFIG_MULTI_RES_ENCODING
        /* Number of total resolutions encoded */
--- a/vp8/common/onyxc_int.h
+++ b/vp8/common/onyxc_int.h
@@ -13,25 +13,19 @@
 #define __INC_VP8C_INT_H

 #include "vpx_config.h"
+#include "vpx_rtcd.h"
 #include "vpx/internal/vpx_codec_internal.h"
 #include "loopfilter.h"
 #include "entropymv.h"
 #include "entropy.h"
-#include "idct.h"
-#include "recon.h"
 #if CONFIG_POSTPROC
 #include "postproc.h"
 #endif
-#include "dequantize.h"

 /*#ifdef PACKET_TESTING*/
 #include "header.h"
 /*#endif*/

-/* Create/destroy static data structures. */
-
-void vp8_initialize_common(void);
-
 #define MINQ 0
 #define MAXQ 127
 #define QINDEX_RANGE (MAXQ + 1)
@@ -71,23 +65,6 @@ typedef enum
    BILINEAR = 1
 } INTERPOLATIONFILTERTYPE;

-typedef struct VP8_COMMON_RTCD
-{
-#if CONFIG_RUNTIME_CPU_DETECT
-    vp8_dequant_rtcd_vtable_t        dequant;
-    vp8_idct_rtcd_vtable_t        idct;
-    vp8_recon_rtcd_vtable_t       recon;
-    vp8_subpix_rtcd_vtable_t      subpix;
-    vp8_loopfilter_rtcd_vtable_t  loopfilter;
-#if CONFIG_POSTPROC
-    vp8_postproc_rtcd_vtable_t    postproc;
-#endif
-    int                           flags;
-#else
-    int unused;
-#endif
-} VP8_COMMON_RTCD;
-
 typedef struct VP8Common

 {
@@ -111,11 +88,13 @@ typedef struct VP8Common
    int fb_idx_ref_cnt[NUM_YV12_BUFFERS];
    int new_fb_idx, lst_fb_idx, gld_fb_idx, alt_fb_idx;

-    YV12_BUFFER_CONFIG post_proc_buffer;
    YV12_BUFFER_CONFIG temp_scale_frame;

+#if CONFIG_POSTPROC
+    YV12_BUFFER_CONFIG post_proc_buffer;
    YV12_BUFFER_CONFIG post_proc_buffer_int;
    int post_proc_buffer_int_used;
+#endif

    FRAME_TYPE last_frame_type;  /* Save last frame's frame type for motion search. */
    FRAME_TYPE frame_type;
@@ -203,15 +182,13 @@ typedef struct VP8Common
    double bitrate;
    double framerate;

-#if CONFIG_RUNTIME_CPU_DETECT
-    VP8_COMMON_RTCD rtcd;
-#endif
 #if CONFIG_MULTITHREAD
    int processor_core_count;
 #endif
 #if CONFIG_POSTPROC
    struct postproc_state  postproc_state;
 #endif
+    int cpu_caps;
 } VP8_COMMON;

 #endif
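This change removes the per-codec `VP8_COMMON_RTCD` vtable. Instead, `onyxc_int.h` includes `vpx_rtcd.h` and `VP8_COMMON` stores a `cpu_caps` value. The general idea of run-time CPU detection is to choose one implementation of each function at startup based on the detected capabilities, then call it directly with no vtable lookup. The sketch below shows only that pattern. The capability flag, the `copy16_*` functions, the `copy16` pointer and `setup_rtcd` are hypothetical and do not reproduce the real `vpx_rtcd.h`.

```c
#include <string.h>

/* Hypothetical capability bit; real flag names and values differ. */
#define HAS_FAST_COPY 0x01

/* Portable reference implementation. */
static void copy16_c(const unsigned char *src, unsigned char *dst)
{
    int i;
    for (i = 0; i < 16; i++)
        dst[i] = src[i];
}

/* Stand-in for an optimized (e.g. SIMD) implementation. */
static void copy16_fast(const unsigned char *src, unsigned char *dst)
{
    memcpy(dst, src, 16);
}

/* Callers invoke this pointer directly. */
void (*copy16)(const unsigned char *src, unsigned char *dst) = copy16_c;

/* Run once at init with the detected capabilities (cf. cm->cpu_caps). */
void setup_rtcd(int cpu_caps)
{
    copy16 = copy16_c;
    if (cpu_caps & HAS_FAST_COPY)
        copy16 = copy16_fast;
}
```

Because the selection happens once, each call costs a single indirect call, and builds without run-time detection could bind the name to one implementation at compile time instead.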
--- a/vp8/common/postproc.c
+++ b/vp8/common/postproc.c
@@ -10,15 +10,14 @@


 #include "vpx_config.h"
+#include "vpx_rtcd.h"
 #include "vpx_scale/yv12config.h"
 #include "postproc.h"
 #include "common.h"
-#include "recon.h"
-#include "vpx_scale/yv12extend.h"
 #include "vpx_scale/vpxscale.h"
 #include "systemdependent.h"
-#include "../encoder/variance.h"

+#include <limits.h>
 #include <math.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -29,7 +28,6 @@
    ( (0.439*(float)(t>>16)) - (0.368*(float)(t>>8&0xff)) - (0.071*(float)(t&0xff)) + 128)

 /* global constants */
-#define MFQE_PRECISION 4
 #if CONFIG_POSTPROC_VISUALIZER
 static const unsigned char MB_PREDICTION_MODE_colors[MB_MODE_COUNT][3] =
 {
@@ -329,21 +327,19 @@ static void vp8_deblock_and_de_macro_block(YV12_BUFFER_CONFIG         *source,
        YV12_BUFFER_CONFIG         *post,
        int                         q,
        int                         low_var_thresh,
-        int                         flag,
-        vp8_postproc_rtcd_vtable_t *rtcd)
+        int                         flag)
 {
    double level = 6.0e-05 * q * q * q - .0067 * q * q + .306 * q + .0065;
    int ppl = (int)(level + .5);
    (void) low_var_thresh;
    (void) flag;

-    POSTPROC_INVOKE(rtcd, downacross)(source->y_buffer, post->y_buffer, source->y_stride,  post->y_stride, source->y_height, source->y_width,  ppl);
-    POSTPROC_INVOKE(rtcd, across)(post->y_buffer, post->y_stride, post->y_height, post->y_width, q2mbl(q));
-    POSTPROC_INVOKE(rtcd, down)(post->y_buffer, post->y_stride, post->y_height, post->y_width, q2mbl(q));
+    vp8_post_proc_down_and_across(source->y_buffer, post->y_buffer, source->y_stride,  post->y_stride, source->y_height, source->y_width,  ppl);
+    vp8_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height, post->y_width, q2mbl(q));
+    vp8_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height, post->y_width, q2mbl(q));

-
-    POSTPROC_INVOKE(rtcd, downacross)(source->u_buffer, post->u_buffer, source->uv_stride, post->uv_stride, source->uv_height, source->uv_width, ppl);
-    POSTPROC_INVOKE(rtcd, downacross)(source->v_buffer, post->v_buffer, source->uv_stride, post->uv_stride, source->uv_height, source->uv_width, ppl);
+    vp8_post_proc_down_and_across(source->u_buffer, post->u_buffer, source->uv_stride, post->uv_stride, source->uv_height, source->uv_width, ppl);
+    vp8_post_proc_down_and_across(source->v_buffer, post->v_buffer, source->uv_stride, post->uv_stride, source->uv_height, source->uv_width, ppl);

 }

@@ -351,25 +347,24 @@ void vp8_deblock(YV12_BUFFER_CONFIG         *source,
                 YV12_BUFFER_CONFIG         *post,
                 int                         q,
                 int                         low_var_thresh,
-                 int                         flag,
-                 vp8_postproc_rtcd_vtable_t *rtcd)
+                 int                         flag)
 {
    double level = 6.0e-05 * q * q * q - .0067 * q * q + .306 * q + .0065;
    int ppl = (int)(level + .5);
    (void) low_var_thresh;
    (void) flag;

-    POSTPROC_INVOKE(rtcd, downacross)(source->y_buffer, post->y_buffer, source->y_stride,  post->y_stride, source->y_height, source->y_width,   ppl);
-    POSTPROC_INVOKE(rtcd, downacross)(source->u_buffer, post->u_buffer, source->uv_stride, post->uv_stride,  source->uv_height, source->uv_width, ppl);
-    POSTPROC_INVOKE(rtcd, downacross)(source->v_buffer, post->v_buffer, source->uv_stride, post->uv_stride, source->uv_height, source->uv_width, ppl);
+    vp8_post_proc_down_and_across(source->y_buffer, post->y_buffer, source->y_stride,  post->y_stride, source->y_height, source->y_width,   ppl);
+    vp8_post_proc_down_and_across(source->u_buffer, post->u_buffer, source->uv_stride, post->uv_stride,  source->uv_height, source->uv_width, ppl);
+    vp8_post_proc_down_and_across(source->v_buffer, post->v_buffer, source->uv_stride, post->uv_stride, source->uv_height, source->uv_width, ppl);
 }

+#if !(CONFIG_TEMPORAL_DENOISING)
 void vp8_de_noise(YV12_BUFFER_CONFIG         *source,
                  YV12_BUFFER_CONFIG         *post,
                  int                         q,
                  int                         low_var_thresh,
-                  int                         flag,
-                  vp8_postproc_rtcd_vtable_t *rtcd)
+                  int                         flag)
 {
    double level = 6.0e-05 * q * q * q - .0067 * q * q + .306 * q + .0065;
    int ppl = (int)(level + .5);
@@ -377,7 +372,7 @@ void vp8_de_noise(YV12_BUFFER_CONFIG         *source,
    (void) low_var_thresh;
    (void) flag;

-    POSTPROC_INVOKE(rtcd, downacross)(
+    vp8_post_proc_down_and_across(
        source->y_buffer + 2 * source->y_stride + 2,
        source->y_buffer + 2 * source->y_stride + 2,
        source->y_stride,
@@ -385,14 +380,14 @@ void vp8_de_noise(YV12_BUFFER_CONFIG         *source,
        source->y_height - 4,
        source->y_width - 4,
        ppl);
-    POSTPROC_INVOKE(rtcd, downacross)(
+    vp8_post_proc_down_and_across(
        source->u_buffer + 2 * source->uv_stride + 2,
        source->u_buffer + 2 * source->uv_stride + 2,
        source->uv_stride,
        source->uv_stride,
        source->uv_height - 4,
        source->uv_width - 4, ppl);
-    POSTPROC_INVOKE(rtcd, downacross)(
+    vp8_post_proc_down_and_across(
        source->v_buffer + 2 * source->uv_stride + 2,
        source->v_buffer + 2 * source->uv_stride + 2,
        source->uv_stride,
@@ -401,6 +396,7 @@ void vp8_de_noise(YV12_BUFFER_CONFIG         *source,
        source->uv_width - 4, ppl);

 }
+#endif

 double vp8_gaussian(double sigma, double mu, double x)
 {
@@ -408,9 +404,6 @@ double vp8_gaussian(double sigma, double mu, double x)
           (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma)));
 }

-extern void (*vp8_clear_system_state)(void);
-
-
 static void fillrd(struct postproc_state *state, int q, int a)
 {
    char char_dist[300];
@@ -696,220 +689,7 @@ static void constrain_line (int x0, int *x1, int y0, int *y1, int width, int hei
    }
 }

-
-static void multiframe_quality_enhance_block
-(
-    int blksize, /* Currently only values supported are 16, 8, 4 */
-    int qcurr,
-    int qprev,
-    unsigned char *y,
-    unsigned char *u,
-    unsigned char *v,
-    int y_stride,
-    int uv_stride,
-    unsigned char *yd,
-    unsigned char *ud,
-    unsigned char *vd,
-    int yd_stride,
-    int uvd_stride
-)
-{
-    static const unsigned char VP8_ZEROS[16]=
-    {
-         0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
-    };
-    int blksizeby2 = blksize >> 1;
-    int qdiff = qcurr - qprev;
-
-    int i, j;
-    unsigned char *yp;
-    unsigned char *ydp;
-    unsigned char *up;
-    unsigned char *udp;
-    unsigned char *vp;
-    unsigned char *vdp;
-
-    unsigned int act, sse, sad, thr;
-    if (blksize == 16)
-    {
-        act = (vp8_variance_var16x16(yd, yd_stride, VP8_ZEROS, 0, &sse)+128)>>8;
-        sad = (vp8_variance_sad16x16(y, y_stride, yd, yd_stride, 0)+128)>>8;
-    }
-    else if (blksize == 8)
-    {
-        act = (vp8_variance_var8x8(yd, yd_stride, VP8_ZEROS, 0, &sse)+32)>>6;
-        sad = (vp8_variance_sad8x8(y, y_stride, yd, yd_stride, 0)+32)>>6;
-    }
-    else
-    {
-        act = (vp8_variance_var4x4(yd, yd_stride, VP8_ZEROS, 0, &sse)+8)>>4;
-        sad = (vp8_variance_sad4x4(y, y_stride, yd, yd_stride, 0)+8)>>4;
-    }
-    /* thr = qdiff/8 + log2(act) + log4(qprev) */
-    thr = (qdiff>>3);
-    while (act>>=1) thr++;
-    while (qprev>>=2) thr++;
-    if (sad < thr)
-    {
-        static const int roundoff = (1 << (MFQE_PRECISION - 1));
-        int ifactor = (sad << MFQE_PRECISION) / thr;
-        ifactor >>= (qdiff >> 5);
-        // TODO: SIMD optimize this section
-        if (ifactor)
-        {
-            int icfactor = (1 << MFQE_PRECISION) - ifactor;
-            for (yp = y, ydp = yd, i = 0; i < blksize; ++i, yp += y_stride, ydp += yd_stride)
-            {
-                for (j = 0; j < blksize; ++j)
-                    ydp[j] = (int)((yp[j] * ifactor + ydp[j] * icfactor + roundoff) >> MFQE_PRECISION);
-            }
-            for (up = u, udp = ud, i = 0; i < blksizeby2; ++i, up += uv_stride, udp += uvd_stride)
-            {
-                for (j = 0; j < blksizeby2; ++j)
-                    udp[j] = (int)((up[j] * ifactor + udp[j] * icfactor + roundoff) >> MFQE_PRECISION);
-            }
-            for (vp = v, vdp = vd, i = 0; i < blksizeby2; ++i, vp += uv_stride, vdp += uvd_stride)
-            {
-                for (j = 0; j < blksizeby2; ++j)
-                    vdp[j] = (int)((vp[j] * ifactor + vdp[j] * icfactor + roundoff) >> MFQE_PRECISION);
-            }
-        }
-    }
-    else
-    {
-        if (blksize == 16)
-        {
-            vp8_recon_copy16x16(y, y_stride, yd, yd_stride);
-            vp8_recon_copy8x8(u, uv_stride, ud, uvd_stride);
-            vp8_recon_copy8x8(v, uv_stride, vd, uvd_stride);
-        }
-        else if (blksize == 8)
-        {
-            vp8_recon_copy8x8(y, y_stride, yd, yd_stride);
-            for (up = u, udp = ud, i = 0; i < blksizeby2; ++i, up += uv_stride, udp += uvd_stride)
-                vpx_memcpy(udp, up, blksizeby2);
-            for (vp = v, vdp = vd, i = 0; i < blksizeby2; ++i, vp += uv_stride, vdp += uvd_stride)
-                vpx_memcpy(vdp, vp, blksizeby2);
-        }
-        else
-        {
-            for (yp = y, ydp = yd, i = 0; i < blksize; ++i, yp += y_stride, ydp += yd_stride)
-                vpx_memcpy(ydp, yp, blksize);
-            for (up = u, udp = ud, i = 0; i < blksizeby2; ++i, up += uv_stride, udp += uvd_stride)
-                vpx_memcpy(udp, up, blksizeby2);
-            for (vp = v, vdp = vd, i = 0; i < blksizeby2; ++i, vp += uv_stride, vdp += uvd_stride)
-                vpx_memcpy(vdp, vp, blksizeby2);
-        }
-    }
-}
-
-#if CONFIG_RUNTIME_CPU_DETECT
-#define RTCD_VTABLE(oci) (&(oci)->rtcd.postproc)
-#else
-#define RTCD_VTABLE(oci) NULL
-#endif
-
-void vp8_multiframe_quality_enhance
-(
-    VP8_COMMON *cm
-)
-{
-    YV12_BUFFER_CONFIG *show = cm->frame_to_show;
-    YV12_BUFFER_CONFIG *dest = &cm->post_proc_buffer;
-
-    FRAME_TYPE frame_type = cm->frame_type;
-    /* Point at base of Mb MODE_INFO list has motion vectors etc */
-    const MODE_INFO *mode_info_context = cm->mi;
-    int mb_row;
-    int mb_col;
-    int qcurr = cm->base_qindex;
-    int qprev = cm->postproc_state.last_base_qindex;
-
-    unsigned char *y_ptr, *u_ptr, *v_ptr;
-    unsigned char *yd_ptr, *ud_ptr, *vd_ptr;
-
-    /* Set up the buffer pointers */
-    y_ptr = show->y_buffer;
-    u_ptr = show->u_buffer;
-    v_ptr = show->v_buffer;
-    yd_ptr = dest->y_buffer;
-    ud_ptr = dest->u_buffer;
-    vd_ptr = dest->v_buffer;
-
-    /* postprocess each macro block */
-    for (mb_row = 0; mb_row < cm->mb_rows; mb_row++)
-    {
-        for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
-        {
-            /* if motion is high there will likely be no benefit */
-            if (((frame_type == INTER_FRAME &&
-                  abs(mode_info_context->mbmi.mv.as_mv.row) <= 10 &&
-                  abs(mode_info_context->mbmi.mv.as_mv.col) <= 10) ||
-                 (frame_type == KEY_FRAME)))
-            {
-                if (mode_info_context->mbmi.mode == B_PRED || mode_info_context->mbmi.mode == SPLITMV)
-                {
-                    int i, j;
-                    for (i=0; i<2; ++i)
-                        for (j=0; j<2; ++j)
-                            multiframe_quality_enhance_block(8,
-                                                             qcurr,
-                                                             qprev,
-                                                             y_ptr + 8*(i*show->y_stride+j),
-                                                             u_ptr + 4*(i*show->uv_stride+j),
-                                                             v_ptr + 4*(i*show->uv_stride+j),
-                                                             show->y_stride,
-                                                             show->uv_stride,
-                                                             yd_ptr + 8*(i*dest->y_stride+j),
-                                                             ud_ptr + 4*(i*dest->uv_stride+j),
-                                                             vd_ptr + 4*(i*dest->uv_stride+j),
-                                                             dest->y_stride,
-                                                             dest->uv_stride);
-                }
-                else
-                {
-                    multiframe_quality_enhance_block(16,
-                                                     qcurr,
-                                                     qprev,
-                                                     y_ptr,
-                                                     u_ptr,
-                                                     v_ptr,
-                                                     show->y_stride,
-                                                     show->uv_stride,
-                                                     yd_ptr,
-                                                     ud_ptr,
-                                                     vd_ptr,
-                                                     dest->y_stride,
-                                                     dest->uv_stride);
-
-                }
-            }
-            else
-            {
-                vp8_recon_copy16x16(y_ptr, show->y_stride, yd_ptr, dest->y_stride);
-                vp8_recon_copy8x8(u_ptr, show->uv_stride, ud_ptr, dest->uv_stride);
-                vp8_recon_copy8x8(v_ptr, show->uv_stride, vd_ptr, dest->uv_stride);
-            }
-            y_ptr += 16;
-            u_ptr += 8;
-            v_ptr += 8;
-            yd_ptr += 16;
-            ud_ptr += 8;
-            vd_ptr += 8;
-            mode_info_context++;     /* step to next MB */
-        }
-
-        y_ptr += show->y_stride  * 16 - 16 * cm->mb_cols;
-        u_ptr += show->uv_stride *  8 - 8 * cm->mb_cols;
-        v_ptr += show->uv_stride *  8 - 8 * cm->mb_cols;
-        yd_ptr += dest->y_stride  * 16 - 16 * cm->mb_cols;
-        ud_ptr += dest->uv_stride *  8 - 8 * cm->mb_cols;
-        vd_ptr += dest->uv_stride *  8 - 8 * cm->mb_cols;
-
-        mode_info_context++;         /* Skip border mb */
-    }
-}
-
+#if CONFIG_POSTPROC
 int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t *ppflags)
 {
    int q = oci->filter_level * 10 / 6;
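The removed MFQE loop walks the frame one 16x16 luma macroblock at a time. Inside a row it advances 16 bytes per macroblock. At the end of each row it adds `stride * 16 - 16 * mb_cols` to land on the first macroblock of the next row, and it also steps `mode_info_context` once more to skip the border entry. The sketch below shows only the luma pointer arithmetic. `collect_luma_mb_offsets` is a hypothetical helper, not part of libvpx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: records the byte offset of every 16x16 luma
 * macroblock, visited in raster order with the same pointer arithmetic
 * as the removed MFQE loop. */
static void collect_luma_mb_offsets(int y_stride, int mb_rows, int mb_cols,
                                    ptrdiff_t *out)
{
    ptrdiff_t off = 0;  /* stands in for y_ptr - y_buffer */
    int mb_row, mb_col, n = 0;

    assert(16 * mb_cols <= y_stride);  /* a row of MBs must fit in the stride */

    for (mb_row = 0; mb_row < mb_rows; mb_row++)
    {
        for (mb_col = 0; mb_col < mb_cols; mb_col++)
        {
            out[n++] = off;
            off += 16;                  /* y_ptr += 16: next MB in the row */
        }
        /* Down 16 lines, then back to column 0 of the next MB row. */
        off += (ptrdiff_t)y_stride * 16 - 16 * mb_cols;
    }
}
```

With a 64-byte stride and 3 macroblocks per row, the second row starts at offset 16 * 64 = 1024, as expected.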
@@ -932,6 +712,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
        dest->y_height = oci->Height;
        dest->uv_height = dest->y_height / 2;
        oci->postproc_state.last_base_qindex = oci->base_qindex;
+        oci->postproc_state.last_frame_valid = 1;
        return 0;
    }

@@ -940,13 +721,19 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
    {
        if ((flags & VP8D_DEBLOCK) || (flags & VP8D_DEMACROBLOCK))
        {
-            if (vp8_yv12_alloc_frame_buffer(&oci->post_proc_buffer_int, oci->Width, oci->Height, VP8BORDERINPIXELS) >= 0)
-            {
-                oci->post_proc_buffer_int_used = 1;
-            }
+            int width = (oci->Width + 15) & ~15;
+            int height = (oci->Height + 15) & ~15;
+
+            if (vp8_yv12_alloc_frame_buffer(&oci->post_proc_buffer_int,
+                                            width, height, VP8BORDERINPIXELS))
+                vpx_internal_error(&oci->error, VPX_CODEC_MEM_ERROR,
+                                   "Failed to allocate MFQE framebuffer");
+
+            oci->post_proc_buffer_int_used = 1;
+
            // insure that postproc is set to all 0's so that post proc
            // doesn't pull random data in from edge
-            vpx_memset((&oci->post_proc_buffer_int)->buffer_alloc,126,(&oci->post_proc_buffer)->frame_size);
+            vpx_memset((&oci->post_proc_buffer_int)->buffer_alloc,128,(&oci->post_proc_buffer)->frame_size);

        }
    }
@@ -956,6 +743,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
 #endif

    if ((flags & VP8D_MFQE) &&
+         oci->postproc_state.last_frame_valid &&
         oci->current_video_frame >= 2 &&
         oci->base_qindex - oci->postproc_state.last_base_qindex >= 10)
    {
@@ -963,16 +751,16 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
        if (((flags & VP8D_DEBLOCK) || (flags & VP8D_DEMACROBLOCK)) &&
            oci->post_proc_buffer_int_used)
        {
-            vp8_yv12_copy_frame_ptr(&oci->post_proc_buffer, &oci->post_proc_buffer_int);
+            vp8_yv12_copy_frame(&oci->post_proc_buffer, &oci->post_proc_buffer_int);
            if (flags & VP8D_DEMACROBLOCK)
            {
                vp8_deblock_and_de_macro_block(&oci->post_proc_buffer_int, &oci->post_proc_buffer,
-                                               q + (deblock_level - 5) * 10, 1, 0, RTCD_VTABLE(oci));
+                                               q + (deblock_level - 5) * 10, 1, 0);
            }
            else if (flags & VP8D_DEBLOCK)
            {
                vp8_deblock(&oci->post_proc_buffer_int, &oci->post_proc_buffer,
-                            q, 1, 0, RTCD_VTABLE(oci));
+                            q, 1, 0);
            }
        }
        /* Move partially towards the base q of the previous frame */
@@ -981,20 +769,21 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
    else if (flags & VP8D_DEMACROBLOCK)
    {
        vp8_deblock_and_de_macro_block(oci->frame_to_show, &oci->post_proc_buffer,
-                                       q + (deblock_level - 5) * 10, 1, 0, RTCD_VTABLE(oci));
+                                       q + (deblock_level - 5) * 10, 1, 0);
        oci->postproc_state.last_base_qindex = oci->base_qindex;
    }
    else if (flags & VP8D_DEBLOCK)
    {
        vp8_deblock(oci->frame_to_show, &oci->post_proc_buffer,
-                    q, 1, 0, RTCD_VTABLE(oci));
+                    q, 1, 0);
        oci->postproc_state.last_base_qindex = oci->base_qindex;
    }
    else
    {
-        vp8_yv12_copy_frame_ptr(oci->frame_to_show, &oci->post_proc_buffer);
+        vp8_yv12_copy_frame(oci->frame_to_show, &oci->post_proc_buffer);
        oci->postproc_state.last_base_qindex = oci->base_qindex;
    }
+    oci->postproc_state.last_frame_valid = 1;

    if (flags & VP8D_ADDNOISE)
    {
@@ -1004,7 +793,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
            fillrd(&oci->postproc_state, 63 - q, noise_level);
        }

-        POSTPROC_INVOKE(RTCD_VTABLE(oci), addnoise)
+        vp8_plane_add_noise
        (oci->post_proc_buffer.y_buffer,
         oci->postproc_state.noise,
         oci->postproc_state.blackclamp,
@@ -1302,7 +1091,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
                                U = B_PREDICTION_MODE_colors[bmi->as_mode][1];
                                V = B_PREDICTION_MODE_colors[bmi->as_mode][2];

-                                POSTPROC_INVOKE(RTCD_VTABLE(oci), blend_b)
+                                vp8_blend_b
                                    (yl+bx, ul+(bx>>1), vl+(bx>>1), Y, U, V, 0xc000, y_stride);
                            }
                            bmi++;
@@ -1319,7 +1108,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
                    U = MB_PREDICTION_MODE_colors[mi->mbmi.mode][1];
                    V = MB_PREDICTION_MODE_colors[mi->mbmi.mode][2];

-                    POSTPROC_INVOKE(RTCD_VTABLE(oci), blend_mb_inner)
+                    vp8_blend_mb_inner
                        (y_ptr+x, u_ptr+(x>>1), v_ptr+(x>>1), Y, U, V, 0xc000, y_stride);
                }

@@ -1358,7 +1147,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
                    U = MV_REFERENCE_FRAME_colors[mi->mbmi.ref_frame][1];
                    V = MV_REFERENCE_FRAME_colors[mi->mbmi.ref_frame][2];

-                    POSTPROC_INVOKE(RTCD_VTABLE(oci), blend_mb_outer)
+                    vp8_blend_mb_outer
                        (y_ptr+x, u_ptr+(x>>1), v_ptr+(x>>1), Y, U, V, 0xc000, y_stride);
                }

@@ -1381,3 +1170,4 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
    dest->uv_height = dest->y_height / 2;
    return 0;
 }
+#endif
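Before allocating `post_proc_buffer_int`, the patch now rounds the frame width and height up to whole 16-pixel macroblocks with `(x + 15) & ~15`. A failed allocation now raises `vpx_internal_error()`, where the old code simply left `post_proc_buffer_int_used` unset. The likely motivation for the rounding is that MFQE then has room for partial macroblocks at the right and bottom edges, but that is an inference from the code. Here is the rounding idiom on its own. `align_to_mb` is a hypothetical name.

```c
#include <assert.h>

/* Round a non-negative dimension up to the next multiple of 16 (one VP8
 * macroblock), matching the (x + 15) & ~15 expression in the patch.
 * Assumes x + 15 does not overflow int. */
static int align_to_mb(int x)
{
    assert(x >= 0);
    return (x + 15) & ~15;  /* add 15, then clear the low four bits */
}
```

For example, a 1920x1080 stream gets a 1920x1088 buffer.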
--- a/vp8/common/postproc.h
+++ b/vp8/common/postproc.h
@@ -12,92 +12,6 @@
 #ifndef POSTPROC_H
 #define POSTPROC_H

-#define prototype_postproc_inplace(sym)\
-    void sym (unsigned char *dst, int pitch, int rows, int cols,int flimit)
-
-#define prototype_postproc(sym)\
-    void sym (unsigned char *src, unsigned char *dst, int src_pitch,\
-              int dst_pitch, int rows, int cols, int flimit)
-
-#define prototype_postproc_addnoise(sym) \
-    void sym (unsigned char *s, char *noise, char blackclamp[16],\
-              char whiteclamp[16], char bothclamp[16],\
-              unsigned int w, unsigned int h, int pitch)
-
-#define prototype_postproc_blend_mb_inner(sym)\
-    void sym (unsigned char *y, unsigned char *u, unsigned char *v,\
-              int y1, int u1, int v1, int alpha, int stride)
-
-#define prototype_postproc_blend_mb_outer(sym)\
-    void sym (unsigned char *y, unsigned char *u, unsigned char *v,\
-              int y1, int u1, int v1, int alpha, int stride)
-
-#define prototype_postproc_blend_b(sym)\
-    void sym (unsigned char *y, unsigned char *u, unsigned char *v,\
-              int y1, int u1, int v1, int alpha, int stride)
-
-#if ARCH_X86 || ARCH_X86_64
-#include "x86/postproc_x86.h"
-#endif
-
-#ifndef vp8_postproc_down
-#define vp8_postproc_down vp8_mbpost_proc_down_c
-#endif
-extern prototype_postproc_inplace(vp8_postproc_down);
-
-#ifndef vp8_postproc_across
-#define vp8_postproc_across vp8_mbpost_proc_across_ip_c
-#endif
-extern prototype_postproc_inplace(vp8_postproc_across);
-
-#ifndef vp8_postproc_downacross
-#define vp8_postproc_downacross vp8_post_proc_down_and_across_c
-#endif
-extern prototype_postproc(vp8_postproc_downacross);
-
-#ifndef vp8_postproc_addnoise
-#define vp8_postproc_addnoise vp8_plane_add_noise_c
-#endif
-extern prototype_postproc_addnoise(vp8_postproc_addnoise);
-
-#ifndef vp8_postproc_blend_mb_inner
-#define vp8_postproc_blend_mb_inner vp8_blend_mb_inner_c
-#endif
-extern prototype_postproc_blend_mb_inner(vp8_postproc_blend_mb_inner);
-
-#ifndef vp8_postproc_blend_mb_outer
-#define vp8_postproc_blend_mb_outer vp8_blend_mb_outer_c
-#endif
-extern prototype_postproc_blend_mb_outer(vp8_postproc_blend_mb_outer);
-
-#ifndef vp8_postproc_blend_b
-#define vp8_postproc_blend_b vp8_blend_b_c
-#endif
-extern prototype_postproc_blend_b(vp8_postproc_blend_b);
-
-typedef prototype_postproc((*vp8_postproc_fn_t));
-typedef prototype_postproc_inplace((*vp8_postproc_inplace_fn_t));
-typedef prototype_postproc_addnoise((*vp8_postproc_addnoise_fn_t));
-typedef prototype_postproc_blend_mb_inner((*vp8_postproc_blend_mb_inner_fn_t));
-typedef prototype_postproc_blend_mb_outer((*vp8_postproc_blend_mb_outer_fn_t));
-typedef prototype_postproc_blend_b((*vp8_postproc_blend_b_fn_t));
-typedef struct
-{
-    vp8_postproc_inplace_fn_t           down;
-    vp8_postproc_inplace_fn_t           across;
-    vp8_postproc_fn_t                   downacross;
-    vp8_postproc_addnoise_fn_t          addnoise;
-    vp8_postproc_blend_mb_inner_fn_t    blend_mb_inner;
-    vp8_postproc_blend_mb_outer_fn_t    blend_mb_outer;
-    vp8_postproc_blend_b_fn_t           blend_b;
-} vp8_postproc_rtcd_vtable_t;
-
-#if CONFIG_RUNTIME_CPU_DETECT
-#define POSTPROC_INVOKE(ctx,fn) (ctx)->fn
-#else
-#define POSTPROC_INVOKE(ctx,fn) vp8_postproc_##fn
-#endif
-
 #include "vpx_ports/mem.h"
 struct postproc_state
 {
@@ -105,6 +19,7 @@ struct postproc_state
    int           last_noise;
    char          noise[3072];
    int           last_base_qindex;
+    int           last_frame_valid;
    DECLARE_ALIGNED(16, char, blackclamp[16]);
    DECLARE_ALIGNED(16, char, whiteclamp[16]);
    DECLARE_ALIGNED(16, char, bothclamp[16]);
@@ -119,13 +34,15 @@ void vp8_de_noise(YV12_BUFFER_CONFIG         *source,
                  YV12_BUFFER_CONFIG         *post,
                  int                         q,
                  int                         low_var_thresh,
-                  int                         flag,
-                  vp8_postproc_rtcd_vtable_t *rtcd);
+                  int                         flag);

 void vp8_deblock(YV12_BUFFER_CONFIG         *source,
                 YV12_BUFFER_CONFIG         *post,
                 int                         q,
                 int                         low_var_thresh,
-                 int                         flag,
-                 vp8_postproc_rtcd_vtable_t *rtcd);
+                 int                         flag);
+
+#define MFQE_PRECISION 4
+
+void vp8_multiframe_quality_enhance(struct VP8Common *cm);
 #endif
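`struct postproc_state` gains a `last_frame_valid` field, and `vp8_post_proc_frame()` now requires it before attempting MFQE. In the hunks shown, the field is set to 1 once a frame has been produced. The gate can be sketched as a pure function. `struct pp_state_sketch` and `should_run_mfqe` are hypothetical names, and only the two fields the condition reads are modeled.

```c
#include <assert.h>

/* Hypothetical, trimmed-down model of the state the MFQE gate reads. */
struct pp_state_sketch
{
    int last_base_qindex;
    int last_frame_valid;   /* nonzero once a postprocessed frame exists */
};

/* Nonzero when MFQE should run: it was requested, a previous output frame
 * is valid, at least two frames have been decoded, and the current frame's
 * base_qindex is at least 10 above the previous one (coarser quantizer). */
static int should_run_mfqe(int mfqe_requested,
                           const struct pp_state_sketch *s,
                           unsigned int current_video_frame,
                           int base_qindex)
{
    return mfqe_requested &&
           s->last_frame_valid &&
           current_video_frame >= 2 &&
           base_qindex - s->last_base_qindex >= 10;
}
```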
--- a/vp8/encoder/ppc/sad_altivec.asm
+++ b/vp8/encoder/ppc/sad_altivec.asm
--- a/vp8/encoder/ppc/variance_altivec.asm
+++ b/vp8/encoder/ppc/variance_altivec.asm
@@ -98,7 +98,7 @@
    stw     r4, 0(r7)           ;# sse

    mullw   r3, r3, r3          ;# sum*sum
-    srawi   r3, r3, \DS         ;# (sum*sum) >> DS
+    srlwi   r3, r3, \DS         ;# (sum*sum) >> DS
    subf    r3, r3, r4          ;# sse - ((sum*sum) >> DS)
 .endm

@@ -142,7 +142,7 @@
    stw     r4, 0(r7)           ;# sse

    mullw   r3, r3, r3          ;# sum*sum
-    srawi   r3, r3, \DS         ;# (sum*sum) >> 8
+    srlwi   r3, r3, \DS         ;# (sum*sum) >> 8
    subf    r3, r3, r4          ;# sse - ((sum*sum) >> 8)
 .endm

@@ -367,7 +367,7 @@ vp8_variance4x4_ppc:
    stw     r4, 0(r7)           ;# sse

    mullw   r3, r3, r3          ;# sum*sum
-    srawi   r3, r3, 4           ;# (sum*sum) >> 4
+    srlwi   r3, r3, 4           ;# (sum*sum) >> 4
    subf    r3, r3, r4          ;# sse - ((sum*sum) >> 4)

    epilogue
--- a/vp8/encoder/ppc/variance_subpixel_altivec.asm
+++ b/vp8/encoder/ppc/variance_subpixel_altivec.asm
@@ -157,7 +157,7 @@
    stw     r4, 0(r9)           ;# sse

    mullw   r3, r3, r3          ;# sum*sum
-    srawi   r3, r3, \DS         ;# (sum*sum) >> 8
+    srlwi   r3, r3, \DS         ;# (sum*sum) >> 8
    subf    r3, r3, r4          ;# sse - ((sum*sum) >> 8)
 .endm
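These hunks replace `srawi` (shift right algebraic word immediate) with `srlwi` (shift right logical word immediate) when computing `sse - ((sum*sum) >> DS)`. For a 16x16 block of 8-bit pixels, `|sum|` can reach 255 * 256 = 65280. Its square, 4261478400, fits in 32 unsigned bits but exceeds `INT32_MAX`. An algebraic shift therefore smears the top bit and produces a large negative value, while a logical shift treats the product as unsigned. The C sketch below models the two shifts. Both function names are hypothetical. The comparison helper relies on the out-of-range `int32_t` conversion wrapping and on `>>` of a negative value being arithmetic. Both are implementation-defined in ISO C but hold on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Variance as in the AltiVec macros: sse - ((sum * sum) >> ds), with the
 * 32-bit product treated as unsigned (the srlwi behaviour). */
static uint32_t variance_from_sums(uint32_t sse, int32_t sum, unsigned ds)
{
    uint32_t sq = (uint32_t)sum * (uint32_t)sum;  /* low 32 bits, like mullw */
    return sse - (sq >> ds);                      /* logical shift */
}

/* The old srawi behaviour: the same 32-bit product reinterpreted as signed
 * and shifted arithmetically (implementation-defined in ISO C). */
static int32_t arithmetic_shift_of_square(int32_t sum, unsigned ds)
{
    int32_t sq = (int32_t)((uint32_t)sum * (uint32_t)sum);
    return sq >> ds;
}
```

A block whose 256 differences are all 255 has zero variance. The logical shift gets that right, while the arithmetic version of the same shift comes out negative.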

--- a/vp8/common/recon.h
+++ b/vp8/common/recon.h
@@ -1,111 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef __INC_RECON_H
-#define __INC_RECON_H
-
-#include "blockd.h"
-
-#define prototype_copy_block(sym) \
-    void sym(unsigned char *src, int src_pitch, unsigned char *dst, int dst_pitch)
-
-#define prototype_recon_block(sym) \
-    void sym(unsigned char *pred, short *diff, int diff_stride, unsigned char *dst, int pitch)
-
-#define prototype_recon_macroblock(sym) \
-    void sym(const struct vp8_recon_rtcd_vtable *rtcd, MACROBLOCKD *x)
-
-#define prototype_build_intra_predictors(sym) \
-    void sym(MACROBLOCKD *x)
-
-#define prototype_intra4x4_predict(sym) \
-    void sym(unsigned char *src, int src_stride, int b_mode, \
-             unsigned char *dst, int dst_stride)
-
-struct vp8_recon_rtcd_vtable;
-
-#if ARCH_X86 || ARCH_X86_64
-#include "x86/recon_x86.h"
-#endif
-
-#if ARCH_ARM
-#include "arm/recon_arm.h"
-#endif
-
-#ifndef vp8_recon_copy16x16
-#define vp8_recon_copy16x16 vp8_copy_mem16x16_c
-#endif
-extern prototype_copy_block(vp8_recon_copy16x16);
-
-#ifndef vp8_recon_copy8x8
-#define vp8_recon_copy8x8 vp8_copy_mem8x8_c
-#endif
-extern prototype_copy_block(vp8_recon_copy8x8);
-
-#ifndef vp8_recon_copy8x4
-#define vp8_recon_copy8x4 vp8_copy_mem8x4_c
-#endif
-extern prototype_copy_block(vp8_recon_copy8x4);
-
-#ifndef vp8_recon_build_intra_predictors_mby
-#define vp8_recon_build_intra_predictors_mby vp8_build_intra_predictors_mby
-#endif
-extern prototype_build_intra_predictors\
-    (vp8_recon_build_intra_predictors_mby);
-
-#ifndef vp8_recon_build_intra_predictors_mby_s
-#define vp8_recon_build_intra_predictors_mby_s vp8_build_intra_predictors_mby_s
-#endif
-extern prototype_build_intra_predictors\
-    (vp8_recon_build_intra_predictors_mby_s);
-
-#ifndef vp8_recon_build_intra_predictors_mbuv
-#define vp8_recon_build_intra_predictors_mbuv vp8_build_intra_predictors_mbuv
-#endif
-extern prototype_build_intra_predictors\
-    (vp8_recon_build_intra_predictors_mbuv);
-
-#ifndef vp8_recon_build_intra_predictors_mbuv_s
-#define vp8_recon_build_intra_predictors_mbuv_s vp8_build_intra_predictors_mbuv_s
-#endif
-extern prototype_build_intra_predictors\
-    (vp8_recon_build_intra_predictors_mbuv_s);
-
-#ifndef vp8_recon_intra4x4_predict
-#define vp8_recon_intra4x4_predict vp8_intra4x4_predict_c
-#endif
-extern prototype_intra4x4_predict\
-    (vp8_recon_intra4x4_predict);
-
-
-typedef prototype_copy_block((*vp8_copy_block_fn_t));
-typedef prototype_build_intra_predictors((*vp8_build_intra_pred_fn_t));
-typedef prototype_intra4x4_predict((*vp8_intra4x4_pred_fn_t));
-typedef struct vp8_recon_rtcd_vtable
-{
-    vp8_copy_block_fn_t  copy16x16;
-    vp8_copy_block_fn_t  copy8x8;
-    vp8_copy_block_fn_t  copy8x4;
-
-    vp8_build_intra_pred_fn_t  build_intra_predictors_mby_s;
-    vp8_build_intra_pred_fn_t  build_intra_predictors_mby;
-    vp8_build_intra_pred_fn_t  build_intra_predictors_mbuv_s;
-    vp8_build_intra_pred_fn_t  build_intra_predictors_mbuv;
-    vp8_intra4x4_pred_fn_t intra4x4_predict;
-} vp8_recon_rtcd_vtable_t;
-
-#if CONFIG_RUNTIME_CPU_DETECT
-#define RECON_INVOKE(ctx,fn) (ctx)->fn
-#else
-#define RECON_INVOKE(ctx,fn) vp8_recon_##fn
-#endif
-
-#endif
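`recon.h` is deleted outright. Its `RECON_INVOKE` macro chose a per-context vtable member when `CONFIG_RUNTIME_CPU_DETECT` was set and a compile-time `#define` otherwise. The rewritten callers in `reconinter.c` instead include `vpx_rtcd.h` and call `vp8_copy_mem8x8()` and similar functions directly. The sketch below shows one common way such direct calls can still dispatch at run time: a global function pointer that a setup routine assigns once from CPU feature flags. It is a hypothetical illustration, not the contents of the generated `vpx_rtcd.h`, and every name in it is invented.

```c
#include <assert.h>
#include <string.h>

typedef void (*copy_block_fn)(const unsigned char *src, int src_stride,
                              unsigned char *dst, int dst_stride);

/* Portable C kernel: copy an 8x8 block row by row. */
static void copy_mem8x8_c(const unsigned char *src, int src_stride,
                          unsigned char *dst, int dst_stride)
{
    int r;
    for (r = 0; r < 8; r++)
    {
        memcpy(dst, src, 8);
        src += src_stride;
        dst += dst_stride;
    }
}

/* Stand-in for an optimized kernel; here it just reuses the C code. */
static void copy_mem8x8_simd(const unsigned char *src, int src_stride,
                             unsigned char *dst, int dst_stride)
{
    copy_mem8x8_c(src, src_stride, dst, dst_stride);
}

/* Callers invoke this pointer directly, with no vtable argument. */
static copy_block_fn copy_mem8x8_sketch = copy_mem8x8_c;

#define HAS_SIMD_SKETCH 0x1  /* hypothetical CPU feature bit */

/* Run once at startup to bind each kernel to the best implementation. */
static void setup_rtcd_sketch(int cpu_flags)
{
    copy_mem8x8_sketch = (cpu_flags & HAS_SIMD_SKETCH) ? copy_mem8x8_simd
                                                       : copy_mem8x8_c;
}
```

One practical consequence of this design is visible throughout the patch: functions such as `vp8_deblock()` no longer need an `rtcd` parameter.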
--- a/vp8/common/reconinter.c
+++ b/vp8/common/reconinter.c
@@ -9,10 +9,10 @@
 */


+#include <limits.h>
 #include "vpx_config.h"
+#include "vpx_rtcd.h"
 #include "vpx/vpx_integer.h"
-#include "recon.h"
-#include "subpixel.h"
 #include "blockd.h"
 #include "reconinter.h"
 #if CONFIG_RUNTIME_CPU_DETECT
@@ -123,25 +123,19 @@ void vp8_copy_mem8x4_c(
 }


-void vp8_build_inter_predictors_b(BLOCKD *d, int pitch, vp8_subpix_fn_t sppf)
+void vp8_build_inter_predictors_b(BLOCKD *d, int pitch, unsigned char *base_pre, int pre_stride, vp8_subpix_fn_t sppf)
 {
    int r;
-    unsigned char *ptr_base;
-    unsigned char *ptr;
    unsigned char *pred_ptr = d->predictor;
-
-    ptr_base = *(d->base_pre);
+    unsigned char *ptr;
+    ptr = base_pre + d->offset + (d->bmi.mv.as_mv.row >> 3) * pre_stride + (d->bmi.mv.as_mv.col >> 3);

    if (d->bmi.mv.as_mv.row & 7 || d->bmi.mv.as_mv.col & 7)
    {
-        ptr = ptr_base + d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-        sppf(ptr, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, pred_ptr, pitch);
+        sppf(ptr, pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, pred_ptr, pitch);
    }
    else
    {
-        ptr_base += d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-        ptr = ptr_base;
-
        for (r = 0; r < 4; r++)
        {
 #if !(CONFIG_FAST_UNALIGNED)
@@ -153,65 +147,53 @@ void vp8_build_inter_predictors_b(BLOCKD *d, int pitch, vp8_subpix_fn_t sppf)
            *(uint32_t *)pred_ptr = *(uint32_t *)ptr ;
 #endif
            pred_ptr     += pitch;
-            ptr         += d->pre_stride;
+            ptr         += pre_stride;
        }
    }
 }

-static void build_inter_predictors4b(MACROBLOCKD *x, BLOCKD *d, unsigned char *dst, int dst_stride)
+static void build_inter_predictors4b(MACROBLOCKD *x, BLOCKD *d, unsigned char *dst, int dst_stride, unsigned char *base_pre, int pre_stride)
 {
-    unsigned char *ptr_base;
    unsigned char *ptr;
-
-    ptr_base = *(d->base_pre);
-    ptr = ptr_base + d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
+    ptr = base_pre + d->offset + (d->bmi.mv.as_mv.row >> 3) * pre_stride + (d->bmi.mv.as_mv.col >> 3);

    if (d->bmi.mv.as_mv.row & 7 || d->bmi.mv.as_mv.col & 7)
    {
-        x->subpixel_predict8x8(ptr, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst, dst_stride);
+        x->subpixel_predict8x8(ptr, pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst, dst_stride);
    }
    else
    {
-        RECON_INVOKE(&x->rtcd->recon, copy8x8)(ptr, d->pre_stride, dst, dst_stride);
+        vp8_copy_mem8x8(ptr, pre_stride, dst, dst_stride);
    }
 }

-static void build_inter_predictors2b(MACROBLOCKD *x, BLOCKD *d, unsigned char *dst, int dst_stride)
+static void build_inter_predictors2b(MACROBLOCKD *x, BLOCKD *d, unsigned char *dst, int dst_stride, unsigned char *base_pre, int pre_stride)
 {
-    unsigned char *ptr_base;
    unsigned char *ptr;
-
-    ptr_base = *(d->base_pre);
-    ptr = ptr_base + d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
+    ptr = base_pre + d->offset + (d->bmi.mv.as_mv.row >> 3) * pre_stride + (d->bmi.mv.as_mv.col >> 3);

    if (d->bmi.mv.as_mv.row & 7 || d->bmi.mv.as_mv.col & 7)
    {
-        x->subpixel_predict8x4(ptr, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst, dst_stride);
+        x->subpixel_predict8x4(ptr, pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst, dst_stride);
    }
    else
    {
-        RECON_INVOKE(&x->rtcd->recon, copy8x4)(ptr, d->pre_stride, dst, dst_stride);
+        vp8_copy_mem8x4(ptr, pre_stride, dst, dst_stride);
    }
 }

-static void build_inter_predictors_b(BLOCKD *d, unsigned char *dst, int dst_stride, vp8_subpix_fn_t sppf)
+static void build_inter_predictors_b(BLOCKD *d, unsigned char *dst, int dst_stride, unsigned char *base_pre, int pre_stride, vp8_subpix_fn_t sppf)
 {
    int r;
-    unsigned char *ptr_base;
    unsigned char *ptr;
-
-    ptr_base = *(d->base_pre);
+    ptr = base_pre + d->offset + (d->bmi.mv.as_mv.row >> 3) * pre_stride + (d->bmi.mv.as_mv.col >> 3);

    if (d->bmi.mv.as_mv.row & 7 || d->bmi.mv.as_mv.col & 7)
    {
-        ptr = ptr_base + d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-        sppf(ptr, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst, dst_stride);
+        sppf(ptr, pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst, dst_stride);
    }
    else
    {
-        ptr_base += d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
-        ptr = ptr_base;
-
        for (r = 0; r < 4; r++)
        {
 #if !(CONFIG_FAST_UNALIGNED)
@@ -223,7 +205,7 @@ static void build_inter_predictors_b(BLOCKD *d, unsigned char *dst, int dst_stri
            *(uint32_t *)dst = *(uint32_t *)ptr ;
 #endif
            dst     += dst_stride;
-            ptr         += d->pre_stride;
+            ptr     += pre_stride;
        }
    }
 }
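After this refactor, a `BLOCKD` carries only an `offset`. The plane base pointer and stride arrive as arguments (`base_pre`, `pre_stride`). All three helpers then split each MV component into a whole-pixel step (`mv >> 3`) and a fraction (`mv & 7`), and use the sub-pixel filter only when a fraction is nonzero. A sketch of that shared arithmetic follows. The function names are hypothetical, and like the original code it assumes two's-complement ints with an arithmetic right shift, so `(mv >> 3) * 8 + (mv & 7) == mv` also holds for negative MVs.

```c
#include <assert.h>

/* Nonzero when either MV component has a nonzero three-bit fraction,
 * i.e. the sub-pixel predictor (sppf) is needed instead of a plain copy. */
static int needs_subpel(int mv_row, int mv_col)
{
    return (mv_row & 7) || (mv_col & 7);
}

/* Reference address for one block: plane base + block offset + the
 * whole-pixel part of the MV, as in build_inter_predictors_b(). */
static const unsigned char *ref_block_ptr(const unsigned char *base_pre,
                                          int offset, int pre_stride,
                                          int mv_row, int mv_col)
{
    assert(pre_stride > 0);
    return base_pre + offset + (mv_row >> 3) * pre_stride + (mv_col >> 3);
}
```

For instance, `mv = -3` splits into a step of -1 whole pixel plus a fraction of 5/8.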
@@ -239,22 +221,13 @@ void vp8_build_inter16x16_predictors_mbuv(MACROBLOCKD *x)
    int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
    int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
    int offset;
-    int pre_stride = x->block[16].pre_stride;
+    int pre_stride = x->pre.uv_stride;

    /* calc uv motion vectors */
-    if (mv_row < 0)
-        mv_row -= 1;
-    else
-        mv_row += 1;
-
-    if (mv_col < 0)
-        mv_col -= 1;
-    else
-        mv_col += 1;
-
+    mv_row += 1 | (mv_row >> (sizeof(int) * CHAR_BIT - 1));
+    mv_col += 1 | (mv_col >> (sizeof(int) * CHAR_BIT - 1));
    mv_row /= 2;
    mv_col /= 2;
-
    mv_row &= x->fullpixel_mask;
    mv_col &= x->fullpixel_mask;
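The chroma MV derivation above replaces an if/else with `v += 1 | (v >> (sizeof(int) * CHAR_BIT - 1))`. The shift yields 0 for non-negative `v` and -1 for negative `v` on compilers with an arithmetic right shift, which ISO C leaves implementation-defined. OR-ing with 1 therefore adds +1 or -1. Because C division truncates toward zero, the net effect is `v / 2` rounded half away from zero, exactly as before. The two function names below are hypothetical and serve only to show the equivalence.

```c
#include <assert.h>
#include <limits.h>

/* Branchy form removed by the patch. */
static int halve_mv_branchy(int v)
{
    if (v < 0)
        v -= 1;
    else
        v += 1;
    return v / 2;
}

/* Branchless form added by the patch: the sign mask is 0 or -1, so
 * 1 | mask is +1 or -1. */
static int halve_mv_branchless(int v)
{
    v += 1 | (v >> (sizeof(int) * CHAR_BIT - 1));
    return v / 2;
}
```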

@@ -269,8 +242,8 @@ void vp8_build_inter16x16_predictors_mbuv(MACROBLOCKD *x)
    }
    else
    {
-        RECON_INVOKE(&x->rtcd->recon, copy8x8)(uptr, pre_stride, upred_ptr, 8);
-        RECON_INVOKE(&x->rtcd->recon, copy8x8)(vptr, pre_stride, vpred_ptr, 8);
+        vp8_copy_mem8x8(uptr, pre_stride, upred_ptr, 8);
+        vp8_copy_mem8x8(vptr, pre_stride, vpred_ptr, 8);
    }
 }

@@ -278,6 +251,8 @@ void vp8_build_inter16x16_predictors_mbuv(MACROBLOCKD *x)
 void vp8_build_inter4x4_predictors_mbuv(MACROBLOCKD *x)
 {
    int i, j;
+    int pre_stride = x->pre.uv_stride;
+    unsigned char *base_pre;

    /* build uv mvs */
    for (i = 0; i < 2; i++)
@@ -295,8 +270,7 @@ void vp8_build_inter4x4_predictors_mbuv(MACROBLOCKD *x)
                   + x->block[yoffset+4].bmi.mv.as_mv.row
                   + x->block[yoffset+5].bmi.mv.as_mv.row;

-            if (temp < 0) temp -= 4;
-            else temp += 4;
+            temp += 4 + ((temp >> (sizeof(int) * CHAR_BIT - 1)) << 3);

            x->block[uoffset].bmi.mv.as_mv.row = (temp / 8) & x->fullpixel_mask;
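For 4x4 chroma blocks, `temp` is the sum of four luma MV components, and dividing it by 8 averages the four and halves the result. The patch's `temp += 4 + ((temp >> 31) << 3)` adds +4 or -4 so that the truncating division rounds half away from zero. Left-shifting a negative value is undefined behaviour in ISO C, so the sketch below multiplies the sign mask by 8 instead. It still assumes an arithmetic right shift. Both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Branchy form removed by the patch. */
static int avg4_to_chroma_branchy(int sum)
{
    if (sum < 0)
        sum -= 4;
    else
        sum += 4;
    return sum / 8;
}

/* Branchless form: mask is 0 or -1, so 4 + mask * 8 is +4 or -4.
 * (The patch writes mask << 3, which ISO C leaves undefined for -1.) */
static int avg4_to_chroma_branchless(int sum)
{
    int mask = sum >> (sizeof(int) * CHAR_BIT - 1);
    sum += 4 + mask * 8;
    return sum / 8;
}
```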

@@ -305,29 +279,41 @@ void vp8_build_inter4x4_predictors_mbuv(MACROBLOCKD *x)
                   + x->block[yoffset+4].bmi.mv.as_mv.col
                   + x->block[yoffset+5].bmi.mv.as_mv.col;

-            if (temp < 0) temp -= 4;
-            else temp += 4;
+            temp += 4 + ((temp >> (sizeof(int) * CHAR_BIT - 1)) << 3);

            x->block[uoffset].bmi.mv.as_mv.col = (temp / 8) & x->fullpixel_mask;

-            x->block[voffset].bmi.mv.as_mv.row =
-                x->block[uoffset].bmi.mv.as_mv.row ;
-            x->block[voffset].bmi.mv.as_mv.col =
-                x->block[uoffset].bmi.mv.as_mv.col ;
+            x->block[voffset].bmi.mv.as_int = x->block[uoffset].bmi.mv.as_int;
        }
    }

-    for (i = 16; i < 24; i += 2)
+    base_pre = x->pre.u_buffer;
+    for (i = 16; i < 20; i += 2)
    {
        BLOCKD *d0 = &x->block[i];
        BLOCKD *d1 = &x->block[i+1];

        if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
-            build_inter_predictors2b(x, d0, d0->predictor, 8);
+            build_inter_predictors2b(x, d0, d0->predictor, 8, base_pre, pre_stride);
        else
        {
-            vp8_build_inter_predictors_b(d0, 8, x->subpixel_predict);
-            vp8_build_inter_predictors_b(d1, 8, x->subpixel_predict);
+            vp8_build_inter_predictors_b(d0, 8, base_pre, pre_stride, x->subpixel_predict);
+            vp8_build_inter_predictors_b(d1, 8, base_pre, pre_stride, x->subpixel_predict);
+        }
+    }
+
+    base_pre = x->pre.v_buffer;
+    for (i = 20; i < 24; i += 2)
+    {
+        BLOCKD *d0 = &x->block[i];
+        BLOCKD *d1 = &x->block[i+1];
+
+        if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
+            build_inter_predictors2b(x, d0, d0->predictor, 8, base_pre, pre_stride);
+        else
+        {
+            vp8_build_inter_predictors_b(d0, 8, base_pre, pre_stride, x->subpixel_predict);
+            vp8_build_inter_predictors_b(d1, 8, base_pre, pre_stride, x->subpixel_predict);
        }
    }
 }
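Because a block's `offset` is now relative to its own plane, the single `i = 16..23` chroma loop has been split in two. Blocks 16-19 read from `pre.u_buffer` and blocks 20-23 from `pre.v_buffer`, while blocks 0-15 use `pre.y_buffer`. The sketch below encodes that index-to-plane mapping for the 24 prediction blocks. `struct planes_sketch` and `plane_for_block` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical, trimmed-down view of the three plane base pointers. */
struct planes_sketch
{
    unsigned char *y_buffer;
    unsigned char *u_buffer;
    unsigned char *v_buffer;
};

/* Blocks 0-15 are luma, 16-19 are U, 20-23 are V (the loop bounds used
 * in vp8_build_inter4x4_predictors_mbuv above). */
static unsigned char *plane_for_block(const struct planes_sketch *p, int i)
{
    assert(i >= 0 && i < 24);
    if (i < 16)
        return p->y_buffer;
    if (i < 20)
        return p->u_buffer;
    return p->v_buffer;
}
```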
@@ -342,7 +328,7 @@ void vp8_build_inter16x16_predictors_mby(MACROBLOCKD *x,
    unsigned char *ptr;
    int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
    int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
-    int pre_stride = x->block[0].pre_stride;
+    int pre_stride = x->pre.y_stride;

    ptr_base = x->pre.y_buffer;
    ptr = ptr_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
@@ -354,7 +340,7 @@ void vp8_build_inter16x16_predictors_mby(MACROBLOCKD *x,
    }
    else
    {
-        RECON_INVOKE(&x->rtcd->recon, copy16x16)(ptr, pre_stride, dst_y,
+        vp8_copy_mem16x16(ptr, pre_stride, dst_y,
            dst_ystride);
    }
 }
@@ -409,7 +395,7 @@ void vp8_build_inter16x16_predictors_mb(MACROBLOCKD *x,
    int_mv _16x16mv;

    unsigned char *ptr_base = x->pre.y_buffer;
-    int pre_stride = x->block[0].pre_stride;
+    int pre_stride = x->pre.y_stride;

    _16x16mv.as_int = x->mode_info_context->mbmi.mv.as_int;

@@ -426,23 +412,14 @@ void vp8_build_inter16x16_predictors_mb(MACROBLOCKD *x,
    }
    else
    {
-        RECON_INVOKE(&x->rtcd->recon, copy16x16)(ptr, pre_stride, dst_y, dst_ystride);
+        vp8_copy_mem16x16(ptr, pre_stride, dst_y, dst_ystride);
    }

    /* calc uv motion vectors */
-    if ( _16x16mv.as_mv.row < 0)
-      _16x16mv.as_mv.row -= 1;
-    else
-      _16x16mv.as_mv.row += 1;
-
-    if (_16x16mv.as_mv.col < 0)
-        _16x16mv.as_mv.col -= 1;
-    else
-        _16x16mv.as_mv.col += 1;
-
+    _16x16mv.as_mv.row += 1 | (_16x16mv.as_mv.row >> (sizeof(int) * CHAR_BIT - 1));
+    _16x16mv.as_mv.col += 1 | (_16x16mv.as_mv.col >> (sizeof(int) * CHAR_BIT - 1));
    _16x16mv.as_mv.row /= 2;
    _16x16mv.as_mv.col /= 2;
-
    _16x16mv.as_mv.row &= x->fullpixel_mask;
    _16x16mv.as_mv.col &= x->fullpixel_mask;

@@ -458,19 +435,21 @@ void vp8_build_inter16x16_predictors_mb(MACROBLOCKD *x,
    }
    else
    {
-        RECON_INVOKE(&x->rtcd->recon, copy8x8)(uptr, pre_stride, dst_u, dst_uvstride);
-        RECON_INVOKE(&x->rtcd->recon, copy8x8)(vptr, pre_stride, dst_v, dst_uvstride);
+        vp8_copy_mem8x8(uptr, pre_stride, dst_u, dst_uvstride);
+        vp8_copy_mem8x8(vptr, pre_stride, dst_v, dst_uvstride);
    }
 }

 static void build_inter4x4_predictors_mb(MACROBLOCKD *x)
 {
    int i;
+    unsigned char *base_dst = x->dst.y_buffer;
+    unsigned char *base_pre = x->pre.y_buffer;

    if (x->mode_info_context->mbmi.partitioning < 3)
    {
        BLOCKD *b;
-        int dst_stride = x->block[ 0].dst_stride;
+        int dst_stride = x->dst.y_stride;

        x->block[ 0].bmi = x->mode_info_context->bmi[ 0];
        x->block[ 2].bmi = x->mode_info_context->bmi[ 2];
@@ -485,13 +464,13 @@ static void build_inter4x4_predictors_mb(MACROBLOCKD *x)
        }

        b = &x->block[ 0];
-        build_inter_predictors4b(x, b, *(b->base_dst) + b->dst, dst_stride);
+        build_inter_predictors4b(x, b, base_dst + b->offset, dst_stride, base_pre, dst_stride);
        b = &x->block[ 2];
-        build_inter_predictors4b(x, b, *(b->base_dst) + b->dst, dst_stride);
+        build_inter_predictors4b(x, b, base_dst + b->offset, dst_stride, base_pre, dst_stride);
        b = &x->block[ 8];
-        build_inter_predictors4b(x, b, *(b->base_dst) + b->dst, dst_stride);
+        build_inter_predictors4b(x, b, base_dst + b->offset, dst_stride, base_pre, dst_stride);
        b = &x->block[10];
-        build_inter_predictors4b(x, b, *(b->base_dst) + b->dst, dst_stride);
+        build_inter_predictors4b(x, b, base_dst + b->offset, dst_stride, base_pre, dst_stride);
    }
    else
    {
@@ -499,7 +478,7 @@ static void build_inter4x4_predictors_mb(MACROBLOCKD *x)
        {
            BLOCKD *d0 = &x->block[i];
            BLOCKD *d1 = &x->block[i+1];
-            int dst_stride = x->block[ 0].dst_stride;
+            int dst_stride = x->dst.y_stride;

            x->block[i+0].bmi = x->mode_info_context->bmi[i+0];
            x->block[i+1].bmi = x->mode_info_context->bmi[i+1];
@@ -510,31 +489,51 @@ static void build_inter4x4_predictors_mb(MACROBLOCKD *x)
            }

            if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
-                build_inter_predictors2b(x, d0, *(d0->base_dst) + d0->dst, dst_stride);
+                build_inter_predictors2b(x, d0, base_dst + d0->offset, dst_stride, base_pre, dst_stride);
            else
            {
-                build_inter_predictors_b(d0, *(d0->base_dst) + d0->dst, dst_stride, x->subpixel_predict);
-                build_inter_predictors_b(d1, *(d1->base_dst) + d1->dst, dst_stride, x->subpixel_predict);
+                build_inter_predictors_b(d0, base_dst + d0->offset, dst_stride, base_pre, dst_stride, x->subpixel_predict);
+                build_inter_predictors_b(d1, base_dst + d1->offset, dst_stride, base_pre, dst_stride, x->subpixel_predict);
            }

        }

    }
-
-    for (i = 16; i < 24; i += 2)
+    base_dst = x->dst.u_buffer;
+    base_pre = x->pre.u_buffer;
+    for (i = 16; i < 20; i += 2)
    {
        BLOCKD *d0 = &x->block[i];
        BLOCKD *d1 = &x->block[i+1];
-        int dst_stride = x->block[ 16].dst_stride;
+        int dst_stride = x->dst.uv_stride;

        /* Note: uv mvs already clamped in build_4x4uvmvs() */

        if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
-            build_inter_predictors2b(x, d0, *(d0->base_dst) + d0->dst, dst_stride);
+            build_inter_predictors2b(x, d0, base_dst + d0->offset, dst_stride, base_pre, dst_stride);
        else
        {
-            build_inter_predictors_b(d0, *(d0->base_dst) + d0->dst, dst_stride, x->subpixel_predict);
-            build_inter_predictors_b(d1, *(d1->base_dst) + d1->dst, dst_stride, x->subpixel_predict);
+            build_inter_predictors_b(d0, base_dst + d0->offset, dst_stride, base_pre, dst_stride, x->subpixel_predict);
+            build_inter_predictors_b(d1, base_dst + d1->offset, dst_stride, base_pre, dst_stride, x->subpixel_predict);
+        }
+    }
+
+    base_dst = x->dst.v_buffer;
+    base_pre = x->pre.v_buffer;
+    for (i = 20; i < 24; i += 2)
+    {
+        BLOCKD *d0 = &x->block[i];
+        BLOCKD *d1 = &x->block[i+1];
+        int dst_stride = x->dst.uv_stride;
+
+        /* Note: uv mvs already clamped in build_4x4uvmvs() */
+
+        if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
+            build_inter_predictors2b(x, d0, base_dst + d0->offset, dst_stride, base_pre, dst_stride);
+        else
+        {
+            build_inter_predictors_b(d0, base_dst + d0->offset, dst_stride, base_pre, dst_stride, x->subpixel_predict);
+            build_inter_predictors_b(d1, base_dst + d1->offset, dst_stride, base_pre, dst_stride, x->subpixel_predict);
        }
    }
 }
@@ -559,8 +558,7 @@ void build_4x4uvmvs(MACROBLOCKD *x)
                 + x->mode_info_context->bmi[yoffset + 4].mv.as_mv.row
                 + x->mode_info_context->bmi[yoffset + 5].mv.as_mv.row;

-            if (temp < 0) temp -= 4;
-            else temp += 4;
+            temp += 4 + ((temp >> (sizeof(int) * CHAR_BIT - 1)) << 3);

            x->block[uoffset].bmi.mv.as_mv.row = (temp / 8) & x->fullpixel_mask;

@@ -569,18 +567,14 @@ void build_4x4uvmvs(MACROBLOCKD *x)
                 + x->mode_info_context->bmi[yoffset + 4].mv.as_mv.col
                 + x->mode_info_context->bmi[yoffset + 5].mv.as_mv.col;

-            if (temp < 0) temp -= 4;
-            else temp += 4;
+            temp += 4 + ((temp >> (sizeof(int) * CHAR_BIT - 1)) << 3);

            x->block[uoffset].bmi.mv.as_mv.col = (temp / 8) & x->fullpixel_mask;

            if (x->mode_info_context->mbmi.need_to_clamp_mvs)
                clamp_uvmv_to_umv_border(&x->block[uoffset].bmi.mv.as_mv, x);

-            x->block[voffset].bmi.mv.as_mv.row =
-                x->block[uoffset].bmi.mv.as_mv.row ;
-            x->block[voffset].bmi.mv.as_mv.col =
-                x->block[uoffset].bmi.mv.as_mv.col ;
+            x->block[voffset].bmi.mv.as_int = x->block[uoffset].bmi.mv.as_int;
        }
    }
 }
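In `build_4x4uvmvs()`, `temp` is the sum of four luma motion-vector components, and the stored chroma component is `temp / 8`. C integer division truncates toward zero. The old code therefore biased `temp` by +4 or -4 first, so the quotient rounds to nearest with ties away from zero. The patch computes the same bias without a branch.

The sketch below compares the two forms. The function names are hypothetical. It assumes an `int` with arithmetic right shift. In ISO C, right-shifting a negative `int` is implementation-defined and left-shifting a negative value is undefined. Mainstream compilers nevertheless produce -1 and -8 here.

```c
#include <assert.h>
#include <limits.h>

/* Form removed by the patch: bias by +4 or -4 so that the truncating
 * division by 8 rounds to nearest, with ties away from zero. */
static int round_div8_branchy(int temp)
{
    if (temp < 0) temp -= 4;
    else temp += 4;
    return temp / 8;
}

/* Form added by the patch. temp >> (bits - 1) is 0 for temp >= 0 and -1
 * for temp < 0 (assuming an arithmetic shift), so the added bias is
 * 4 + 0 = 4 or 4 + (-8) = -4, selected without a branch. */
static int round_div8_branchless(int temp)
{
    temp += 4 + ((temp >> (sizeof(int) * CHAR_BIT - 1)) << 3);
    return temp / 8;
}
```

For a 32-bit `int`, `temp >> 31` is either 0 or -1, so `(temp >> 31) << 3` is either 0 or -8. That turns the +4 bias into -4 for negative sums, for example -12 becomes -16 and yields -2, matching the branchy version.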
--- a/vp8/common/reconinter.h
+++ b/vp8/common/reconinter.h
@@ -25,6 +25,8 @@ extern void vp8_build_inter16x16_predictors_mby(MACROBLOCKD *x,
                                                unsigned char *dst_y,
                                                int dst_ystride);
 extern void vp8_build_inter_predictors_b(BLOCKD *d, int pitch,
+                                         unsigned char *base_pre,
+                                         int pre_stride,
                                         vp8_subpix_fn_t sppf);

 extern void vp8_build_inter16x16_predictors_mbuv(MACROBLOCKD *x);
--- a/vp8/common/reconintra.c
+++ b/vp8/common/reconintra.c
@@ -10,147 +10,24 @@


 #include "vpx_config.h"
-#include "recon.h"
-#include "reconintra.h"
+#include "vpx_rtcd.h"
 #include "vpx_mem/vpx_mem.h"
+#include "blockd.h"

-/* For skip_recon_mb(), add vp8_build_intra_predictors_mby_s(MACROBLOCKD *x) and
- * vp8_build_intra_predictors_mbuv_s(MACROBLOCKD *x).
- */
-
-void vp8_build_intra_predictors_mby(MACROBLOCKD *x)
+void vp8_build_intra_predictors_mby_s_c(MACROBLOCKD *x,
+                                          unsigned char * yabove_row,
+                                          unsigned char * yleft,
+                                          int left_stride,
+                                          unsigned char * ypred_ptr,
+                                          int y_stride)
 {
-
-    unsigned char *yabove_row = x->dst.y_buffer - x->dst.y_stride;
    unsigned char yleft_col[16];
    unsigned char ytop_left = yabove_row[-1];
-    unsigned char *ypred_ptr = x->predictor;
    int r, c, i;

    for (i = 0; i < 16; i++)
    {
-        yleft_col[i] = x->dst.y_buffer [i* x->dst.y_stride -1];
-    }
-
-    /* for Y */
-    switch (x->mode_info_context->mbmi.mode)
-    {
-    case DC_PRED:
-    {
-        int expected_dc;
-        int i;
-        int shift;
-        int average = 0;
-
-
-        if (x->up_available || x->left_available)
-        {
-            if (x->up_available)
-            {
-                for (i = 0; i < 16; i++)
-                {
-                    average += yabove_row[i];
-                }
-            }
-
-            if (x->left_available)
-            {
-
-                for (i = 0; i < 16; i++)
-                {
-                    average += yleft_col[i];
-                }
-
-            }
-
-
-
-            shift = 3 + x->up_available + x->left_available;
-            expected_dc = (average + (1 << (shift - 1))) >> shift;
-        }
-        else
-        {
-            expected_dc = 128;
-        }
-
-        vpx_memset(ypred_ptr, expected_dc, 256);
-    }
-    break;
-    case V_PRED:
-    {
-
-        for (r = 0; r < 16; r++)
-        {
-
-            ((int *)ypred_ptr)[0] = ((int *)yabove_row)[0];
-            ((int *)ypred_ptr)[1] = ((int *)yabove_row)[1];
-            ((int *)ypred_ptr)[2] = ((int *)yabove_row)[2];
-            ((int *)ypred_ptr)[3] = ((int *)yabove_row)[3];
-            ypred_ptr += 16;
-        }
-    }
-    break;
-    case H_PRED:
-    {
-
-        for (r = 0; r < 16; r++)
-        {
-
-            vpx_memset(ypred_ptr, yleft_col[r], 16);
-            ypred_ptr += 16;
-        }
-
-    }
-    break;
-    case TM_PRED:
-    {
-
-        for (r = 0; r < 16; r++)
-        {
-            for (c = 0; c < 16; c++)
-            {
-                int pred =  yleft_col[r] + yabove_row[ c] - ytop_left;
-
-                if (pred < 0)
-                    pred = 0;
-
-                if (pred > 255)
-                    pred = 255;
-
-                ypred_ptr[c] = pred;
-            }
-
-            ypred_ptr += 16;
-        }
-
-    }
-    break;
-    case B_PRED:
-    case NEARESTMV:
-    case NEARMV:
-    case ZEROMV:
-    case NEWMV:
-    case SPLITMV:
-    case MB_MODE_COUNT:
-        break;
-    }
-}
-
-void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x)
-{
-
-    unsigned char *yabove_row = x->dst.y_buffer - x->dst.y_stride;
-    unsigned char yleft_col[16];
-    unsigned char ytop_left = yabove_row[-1];
-    unsigned char *ypred_ptr = x->predictor;
-    int r, c, i;
-
-    int y_stride = x->dst.y_stride;
-    ypred_ptr = x->dst.y_buffer; /*x->predictor;*/
-
-    for (i = 0; i < 16; i++)
-    {
-        yleft_col[i] = x->dst.y_buffer [i* x->dst.y_stride -1];
+        yleft_col[i] = yleft[i* left_stride];
    }

    /* for Y */
@@ -198,7 +75,7 @@ void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x)
        for (r = 0; r < 16; r++)
        {
            vpx_memset(ypred_ptr, expected_dc, 16);
-            ypred_ptr += y_stride; /*16;*/
+            ypred_ptr += y_stride;
        }
    }
    break;
@@ -212,7 +89,7 @@ void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x)
            ((int *)ypred_ptr)[1] = ((int *)yabove_row)[1];
            ((int *)ypred_ptr)[2] = ((int *)yabove_row)[2];
            ((int *)ypred_ptr)[3] = ((int *)yabove_row)[3];
-            ypred_ptr += y_stride; /*16;*/
+            ypred_ptr += y_stride;
        }
    }
    break;
@@ -223,7 +100,7 @@ void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x)
        {

            vpx_memset(ypred_ptr, yleft_col[r], 16);
-            ypred_ptr += y_stride;  /*16;*/
+            ypred_ptr += y_stride;
        }

    }
@@ -246,7 +123,7 @@ void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x)
                ypred_ptr[c] = pred;
            }

-            ypred_ptr += y_stride;  /*16;*/
+            ypred_ptr += y_stride;
        }

    }
@@ -262,162 +139,27 @@ void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x)
    }
 }

-void vp8_build_intra_predictors_mbuv(MACROBLOCKD *x)
+void vp8_build_intra_predictors_mbuv_s_c(MACROBLOCKD *x,
+                                         unsigned char * uabove_row,
+                                         unsigned char * vabove_row,
+                                         unsigned char * uleft,
+                                         unsigned char * vleft,
+                                         int left_stride,
+                                         unsigned char * upred_ptr,
+                                         unsigned char * vpred_ptr,
+                                         int pred_stride)
 {
-    unsigned char *uabove_row = x->dst.u_buffer - x->dst.uv_stride;
-    unsigned char uleft_col[16];
+    unsigned char uleft_col[8];
    unsigned char utop_left = uabove_row[-1];
-    unsigned char *vabove_row = x->dst.v_buffer - x->dst.uv_stride;
-    unsigned char vleft_col[20];
+    unsigned char vleft_col[8];
    unsigned char vtop_left = vabove_row[-1];
-    unsigned char *upred_ptr = &x->predictor[256];
-    unsigned char *vpred_ptr = &x->predictor[320];
-    int i, j;
-
-    for (i = 0; i < 8; i++)
-    {
-        uleft_col[i] = x->dst.u_buffer [i* x->dst.uv_stride -1];
-        vleft_col[i] = x->dst.v_buffer [i* x->dst.uv_stride -1];
-    }
-
-    switch (x->mode_info_context->mbmi.uv_mode)
-    {
-    case DC_PRED:
-    {
-        int expected_udc;
-        int expected_vdc;
-        int i;
-        int shift;
-        int Uaverage = 0;
-        int Vaverage = 0;
-
-        if (x->up_available)
-        {
-            for (i = 0; i < 8; i++)
-            {
-                Uaverage += uabove_row[i];
-                Vaverage += vabove_row[i];
-            }
-        }
-
-        if (x->left_available)
-        {
-            for (i = 0; i < 8; i++)
-            {
-                Uaverage += uleft_col[i];
-                Vaverage += vleft_col[i];
-            }
-        }
-
-        if (!x->up_available && !x->left_available)
-        {
-            expected_udc = 128;
-            expected_vdc = 128;
-        }
-        else
-        {
-            shift = 2 + x->up_available + x->left_available;
-            expected_udc = (Uaverage + (1 << (shift - 1))) >> shift;
-            expected_vdc = (Vaverage + (1 << (shift - 1))) >> shift;
-        }
-
-
-        vpx_memset(upred_ptr, expected_udc, 64);
-        vpx_memset(vpred_ptr, expected_vdc, 64);
-
-
-    }
-    break;
-    case V_PRED:
-    {
-        int i;
-
-        for (i = 0; i < 8; i++)
-        {
-            vpx_memcpy(upred_ptr, uabove_row, 8);
-            vpx_memcpy(vpred_ptr, vabove_row, 8);
-            upred_ptr += 8;
-            vpred_ptr += 8;
-        }
-
-    }
-    break;
-    case H_PRED:
-    {
-        int i;
-
-        for (i = 0; i < 8; i++)
-        {
-            vpx_memset(upred_ptr, uleft_col[i], 8);
-            vpx_memset(vpred_ptr, vleft_col[i], 8);
-            upred_ptr += 8;
-            vpred_ptr += 8;
-        }
-    }
-
-    break;
-    case TM_PRED:
-    {
-        int i;
-
-        for (i = 0; i < 8; i++)
-        {
-            for (j = 0; j < 8; j++)
-            {
-                int predu = uleft_col[i] + uabove_row[j] - utop_left;
-                int predv = vleft_col[i] + vabove_row[j] - vtop_left;
-
-                if (predu < 0)
-                    predu = 0;
-
-                if (predu > 255)
-                    predu = 255;
-
-                if (predv < 0)
-                    predv = 0;
-
-                if (predv > 255)
-                    predv = 255;
-
-                upred_ptr[j] = predu;
-                vpred_ptr[j] = predv;
-            }
-
-            upred_ptr += 8;
-            vpred_ptr += 8;
-        }
-
-    }
-    break;
-    case B_PRED:
-    case NEARESTMV:
-    case NEARMV:
-    case ZEROMV:
-    case NEWMV:
-    case SPLITMV:
-    case MB_MODE_COUNT:
-        break;
-    }
-}
-
-void vp8_build_intra_predictors_mbuv_s(MACROBLOCKD *x)
-{
-    unsigned char *uabove_row = x->dst.u_buffer - x->dst.uv_stride;
-    unsigned char uleft_col[16];
-    unsigned char utop_left = uabove_row[-1];
-    unsigned char *vabove_row = x->dst.v_buffer - x->dst.uv_stride;
-    unsigned char vleft_col[20];
-    unsigned char vtop_left = vabove_row[-1];
-    unsigned char *upred_ptr = x->dst.u_buffer; /*&x->predictor[256];*/
-    unsigned char *vpred_ptr = x->dst.v_buffer; /*&x->predictor[320];*/
-    int uv_stride = x->dst.uv_stride;

    int i, j;

    for (i = 0; i < 8; i++)
    {
-        uleft_col[i] = x->dst.u_buffer [i* x->dst.uv_stride -1];
-        vleft_col[i] = x->dst.v_buffer [i* x->dst.uv_stride -1];
+        uleft_col[i] = uleft [i* left_stride];
+        vleft_col[i] = vleft [i* left_stride];
    }

    switch (x->mode_info_context->mbmi.uv_mode)
@@ -468,8 +210,8 @@ void vp8_build_intra_predictors_mbuv_s(MACROBLOCKD *x)
        {
            vpx_memset(upred_ptr, expected_udc, 8);
            vpx_memset(vpred_ptr, expected_vdc, 8);
-            upred_ptr += uv_stride; /*8;*/
-            vpred_ptr += uv_stride; /*8;*/
+            upred_ptr += pred_stride;
+            vpred_ptr += pred_stride;
        }
    }
    break;
@@ -481,8 +223,8 @@ void vp8_build_intra_predictors_mbuv_s(MACROBLOCKD *x)
        {
            vpx_memcpy(upred_ptr, uabove_row, 8);
            vpx_memcpy(vpred_ptr, vabove_row, 8);
-            upred_ptr += uv_stride; /*8;*/
-            vpred_ptr += uv_stride; /*8;*/
+            upred_ptr += pred_stride;
+            vpred_ptr += pred_stride;
        }

    }
@@ -495,8 +237,8 @@ void vp8_build_intra_predictors_mbuv_s(MACROBLOCKD *x)
        {
            vpx_memset(upred_ptr, uleft_col[i], 8);
            vpx_memset(vpred_ptr, vleft_col[i], 8);
-            upred_ptr += uv_stride; /*8;*/
-            vpred_ptr += uv_stride; /*8;*/
+            upred_ptr += pred_stride;
+            vpred_ptr += pred_stride;
        }
    }

@@ -528,8 +270,8 @@ void vp8_build_intra_predictors_mbuv_s(MACROBLOCKD *x)
                vpred_ptr[j] = predv;
            }

-            upred_ptr += uv_stride; /*8;*/
-            vpred_ptr += uv_stride; /*8;*/
+            upred_ptr += pred_stride;
+            vpred_ptr += pred_stride;
        }

    }
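After this change, `vp8_build_intra_predictors_mby_s_c()` and `vp8_build_intra_predictors_mbuv_s_c()` take their inputs and output as arguments instead of deriving them from `x->dst`. The arguments are:

- the above row;
- the left column, with its own `left_stride`;
- the destination pointer and its stride.

This lets a caller supply the neighbours from a buffer other than the one being written.

In the removed code, the DC_PRED branch averages whichever edges are available. Each luma edge contributes 16 samples, so `shift = 3 + up_available + left_available` divides by 16 or 32 with rounding. The prediction falls back to 128 when neither edge exists. The 8x8 chroma version uses `2 + ...` instead.

Below is a hypothetical standalone sketch of the 16x16 case. It assumes the availability flags are exactly 0 or 1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical standalone version of the 16x16 DC_PRED branch. Neighbours
 * and output are passed explicitly, mirroring the shape of the new
 * vp8_build_intra_predictors_mby_s_c() signature. */
static void dc_predict_16x16(const unsigned char *above,
                             const unsigned char *left, int left_stride,
                             int up_available, int left_available,
                             unsigned char *dst, int dst_stride)
{
    int expected_dc = 128;  /* used when no neighbour is available */
    int average = 0;
    int i, r;

    assert(up_available == 0 || up_available == 1);
    assert(left_available == 0 || left_available == 1);

    if (up_available || left_available)
    {
        /* 16 or 32 samples: shift is 4 or 5, plus half a step for rounding. */
        int shift = 3 + up_available + left_available;

        if (up_available)
            for (i = 0; i < 16; i++)
                average += above[i];

        if (left_available)
            for (i = 0; i < 16; i++)
                average += left[i * left_stride];

        expected_dc = (average + (1 << (shift - 1))) >> shift;
    }

    /* Fill the 16x16 block row by row, honouring the destination stride. */
    for (r = 0; r < 16; r++)
        memset(dst + r * dst_stride, expected_dc, 16);
}
```

With an above row of all 10s and a left column of all 20s, the sum is 480 and the result is (480 + 16) >> 5 = 15. With only the above row, it is (160 + 8) >> 4 = 10.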
--- a/vp8/common/reconintra4x4.c
+++ b/vp8/common/reconintra4x4.c
@@ -9,22 +9,23 @@
 */


-#include "recon.h"
+#include "vpx_config.h"
+#include "vpx_rtcd.h"
+#include "blockd.h"

-void vp8_intra4x4_predict_c(unsigned char *src, int src_stride,
-                            int b_mode,
-                            unsigned char *dst, int dst_stride)
+void vp8_intra4x4_predict_d_c(unsigned char *Above,
+                              unsigned char *yleft, int left_stride,
+                              int b_mode,
+                              unsigned char *dst, int dst_stride,
+                              unsigned char top_left)
 {
    int i, r, c;

-    unsigned char *Above = src - src_stride;
    unsigned char Left[4];
-    unsigned char top_left = Above[-1];
-
-    Left[0] = src[-1];
-    Left[1] = src[-1 + src_stride];
-    Left[2] = src[-1 + 2 * src_stride];
-    Left[3] = src[-1 + 3 * src_stride];
+    Left[0] = yleft[0];
+    Left[1] = yleft[left_stride];
+    Left[2] = yleft[2 * left_stride];
+    Left[3] = yleft[3 * left_stride];

    switch (b_mode)
    {
@@ -293,23 +294,15 @@ void vp8_intra4x4_predict_c(unsigned char *src, int src_stride,
    }
 }

-
-
-
-
-/* copy 4 bytes from the above right down so that the 4x4 prediction modes using pixels above and
- * to the right prediction have filled in pixels to use.
- */
-void vp8_intra_prediction_down_copy(MACROBLOCKD *x)
+void vp8_intra4x4_predict_c(unsigned char *src, int src_stride,
+                            int b_mode,
+                            unsigned char *dst, int dst_stride)
 {
-    unsigned char *above_right = *(x->block[0].base_dst) + x->block[0].dst - x->block[0].dst_stride + 16;
+    unsigned char *Above = src - src_stride;

-    unsigned int *src_ptr = (unsigned int *)above_right;
-    unsigned int *dst_ptr0 = (unsigned int *)(above_right + 4 * x->block[0].dst_stride);
-    unsigned int *dst_ptr1 = (unsigned int *)(above_right + 8 * x->block[0].dst_stride);
-    unsigned int *dst_ptr2 = (unsigned int *)(above_right + 12 * x->block[0].dst_stride);
-
-    *dst_ptr0 = *src_ptr;
-    *dst_ptr1 = *src_ptr;
-    *dst_ptr2 = *src_ptr;
+    vp8_intra4x4_predict_d_c(Above,
+                             src - 1, src_stride,
+                             b_mode,
+                             dst, dst_stride,
+                             Above[-1]);
 }
--- a/vp8/common/reconintra4x4.h
+++ b/vp8/common/reconintra4x4.h
@@ -11,7 +11,22 @@

 #ifndef __INC_RECONINTRA4x4_H
 #define __INC_RECONINTRA4x4_H
+#include "vp8/common/blockd.h"

-extern void vp8_intra_prediction_down_copy(MACROBLOCKD *x);
+static void intra_prediction_down_copy(MACROBLOCKD *xd,
+                                             unsigned char *above_right_src)
+{
+    int dst_stride = xd->dst.y_stride;
+    unsigned char *above_right_dst = xd->dst.y_buffer - dst_stride + 16;
+
+    unsigned int *src_ptr = (unsigned int *)above_right_src;
+    unsigned int *dst_ptr0 = (unsigned int *)(above_right_dst + 4 * dst_stride);
+    unsigned int *dst_ptr1 = (unsigned int *)(above_right_dst + 8 * dst_stride);
+    unsigned int *dst_ptr2 = (unsigned int *)(above_right_dst + 12 * dst_stride);
+
+    *dst_ptr0 = *src_ptr;
+    *dst_ptr1 = *src_ptr;
+    *dst_ptr2 = *src_ptr;
+}

 #endif
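The new inline `intra_prediction_down_copy()` takes the source of the four above-right pixels as a parameter. It writes copies of them into the frame at rows 3, 7 and 11 of the macroblock, in columns 16 to 19. Those positions are the above-right neighbours of the right-hand 4x4 blocks in the second, third and fourth block rows. As the comment removed from reconintra4x4.c puts it, the copy gives the 4x4 modes that use above-right pixels filled-in pixels to use.

The sketch below is a hypothetical `memcpy` variant. It avoids the `unsigned int *` casts and with them any alignment or strict-aliasing assumptions. `mb_top` is assumed to point at the macroblock's top-left pixel in a buffer at least 20 pixels wide.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical memcpy-based variant of intra_prediction_down_copy().
 * The 4 pixels above and to the right of the macroblock are replicated
 * into columns 16..19 of rows 3, 7 and 11 (relative to mb_top). */
static void down_copy_above_right(unsigned char *mb_top, int stride,
                                  const unsigned char *above_right_src)
{
    /* Row -1, column 16: the above-right corner of the macroblock. */
    unsigned char *above_right_dst = mb_top - stride + 16;

    assert(stride >= 20);

    memcpy(above_right_dst + 4 * stride, above_right_src, 4);   /* row 3 */
    memcpy(above_right_dst + 8 * stride, above_right_src, 4);   /* row 7 */
    memcpy(above_right_dst + 12 * stride, above_right_src, 4);  /* row 11 */
}
```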
--- a/vp8/decoder/x86/x86_dsystemdependent.c
+++ b/vp8/decoder/x86/x86_dsystemdependent.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *  Copyright (c) 2011 The WebM project authors. All Rights Reserved.
 *
 *  Use of this source code is governed by a BSD-style license
 *  that can be found in the LICENSE file in the root of the source
@@ -7,13 +7,6 @@
 *  in the file PATENTS.  All contributing project authors may
 *  be found in the AUTHORS file in the root of the source tree.
 */
-
-
 #include "vpx_config.h"
-#include "vpx_ports/x86.h"
-#include "vp8/decoder/onyxd_int.h"
-
-void vp8_arch_x86_decode_init(VP8D_COMP *pbi)
-{
-
-}
+#define RTCD_C
+#include "vpx_rtcd.h"
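Defining `RTCD_C` before including the generated `vpx_rtcd.h` appears to be how this file pulls in the run-time CPU detection (RTCD) setup code. The `prototype` and `specialize` lines in `vp8/common/rtcd_defs.sh` describe each function in two parts:

- a portable `_c` version;
- the SIMD variants available for it.

Lines such as `vp8_copy_mem16x16_media=vp8_copy_mem16x16_v6` rename the symbol that a specialization resolves to.

The sketch below shows the general shape of the dispatch such declarations lead to. A function pointer starts at the C version and is overridden by each supported specialization in turn. The feature bits and the unprefixed function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature bits standing in for the ones detected at run time. */
#define SKETCH_HAS_MMX  0x01
#define SKETCH_HAS_SSE2 0x04

typedef void copy_fn(const unsigned char *src, int src_pitch,
                     unsigned char *dst, int dst_pitch);

/* Portable reference version. */
static void copy_mem16x16_c(const unsigned char *src, int src_pitch,
                            unsigned char *dst, int dst_pitch)
{
    int r;
    for (r = 0; r < 16; r++)
        memcpy(dst + r * dst_pitch, src + r * src_pitch, 16);
}

/* Stand-ins for SIMD specializations; real ones would be assembly. */
static void copy_mem16x16_mmx(const unsigned char *src, int src_pitch,
                              unsigned char *dst, int dst_pitch)
{
    copy_mem16x16_c(src, src_pitch, dst, dst_pitch);
}

static void copy_mem16x16_sse2(const unsigned char *src, int src_pitch,
                               unsigned char *dst, int dst_pitch)
{
    copy_mem16x16_c(src, src_pitch, dst, dst_pitch);
}

/* The pointer every caller goes through. */
static copy_fn *copy_mem16x16;

/* Start from the C version; each supported extension overrides the
 * pointer in the order listed, so the last supported one wins. */
static void setup_rtcd_sketch(int flags)
{
    copy_mem16x16 = copy_mem16x16_c;
    if (flags & SKETCH_HAS_MMX)
        copy_mem16x16 = copy_mem16x16_mmx;
    if (flags & SKETCH_HAS_SSE2)
        copy_mem16x16 = copy_mem16x16_sse2;
}
```

In this sketch both "SIMD" versions simply forward to the C code so that the example stays portable. Only the pointer selection is being illustrated.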
--- a/vp8/common/rtcd_defs.sh
+++ b/vp8/common/rtcd_defs.sh
@@ -0,0 +1,541 @@
+common_forward_decls() {
+cat <<EOF
+struct blockd;
+struct macroblockd;
+struct loop_filter_info;
+
+/* Encoder forward decls */
+struct block;
+struct macroblock;
+struct variance_vtable;
+union int_mv;
+struct yv12_buffer_config;
+EOF
+}
+forward_decls common_forward_decls
+
+#
+# Dequant
+#
+prototype void vp8_dequantize_b "struct blockd*, short *dqc"
+specialize vp8_dequantize_b mmx media neon
+vp8_dequantize_b_media=vp8_dequantize_b_v6
+
+prototype void vp8_dequant_idct_add "short *input, short *dq, unsigned char *output, int stride"
+specialize vp8_dequant_idct_add mmx media neon
+vp8_dequant_idct_add_media=vp8_dequant_idct_add_v6
+
+prototype void vp8_dequant_idct_add_y_block "short *q, short *dq, unsigned char *dst, int stride, char *eobs"
+specialize vp8_dequant_idct_add_y_block mmx sse2 media neon
+vp8_dequant_idct_add_y_block_media=vp8_dequant_idct_add_y_block_v6
+
+prototype void vp8_dequant_idct_add_uv_block "short *q, short *dq, unsigned char *dst_u, unsigned char *dst_v, int stride, char *eobs"
+specialize vp8_dequant_idct_add_uv_block mmx sse2 media neon
+vp8_dequant_idct_add_uv_block_media=vp8_dequant_idct_add_uv_block_v6
+
+#
+# Loopfilter
+#
+prototype void vp8_loop_filter_mbv "unsigned char *y, unsigned char *u, unsigned char *v, int ystride, int uv_stride, struct loop_filter_info *lfi"
+specialize vp8_loop_filter_mbv mmx sse2 media neon
+vp8_loop_filter_mbv_media=vp8_loop_filter_mbv_armv6
+
+prototype void vp8_loop_filter_bv "unsigned char *y, unsigned char *u, unsigned char *v, int ystride, int uv_stride, struct loop_filter_info *lfi"
+specialize vp8_loop_filter_bv mmx sse2 media neon
+vp8_loop_filter_bv_media=vp8_loop_filter_bv_armv6
+
+prototype void vp8_loop_filter_mbh "unsigned char *y, unsigned char *u, unsigned char *v, int ystride, int uv_stride, struct loop_filter_info *lfi"
+specialize vp8_loop_filter_mbh mmx sse2 media neon
+vp8_loop_filter_mbh_media=vp8_loop_filter_mbh_armv6
+
+prototype void vp8_loop_filter_bh "unsigned char *y, unsigned char *u, unsigned char *v, int ystride, int uv_stride, struct loop_filter_info *lfi"
+specialize vp8_loop_filter_bh mmx sse2 media neon
+vp8_loop_filter_bh_media=vp8_loop_filter_bh_armv6
+
+
+prototype void vp8_loop_filter_simple_mbv "unsigned char *y, int ystride, const unsigned char *blimit"
+specialize vp8_loop_filter_simple_mbv mmx sse2 media neon
+vp8_loop_filter_simple_mbv_c=vp8_loop_filter_simple_vertical_edge_c
+vp8_loop_filter_simple_mbv_mmx=vp8_loop_filter_simple_vertical_edge_mmx
+vp8_loop_filter_simple_mbv_sse2=vp8_loop_filter_simple_vertical_edge_sse2
+vp8_loop_filter_simple_mbv_media=vp8_loop_filter_simple_vertical_edge_armv6
+vp8_loop_filter_simple_mbv_neon=vp8_loop_filter_mbvs_neon
+
+prototype void vp8_loop_filter_simple_mbh "unsigned char *y, int ystride, const unsigned char *blimit"
+specialize vp8_loop_filter_simple_mbh mmx sse2 media neon
+vp8_loop_filter_simple_mbh_c=vp8_loop_filter_simple_horizontal_edge_c
+vp8_loop_filter_simple_mbh_mmx=vp8_loop_filter_simple_horizontal_edge_mmx
+vp8_loop_filter_simple_mbh_sse2=vp8_loop_filter_simple_horizontal_edge_sse2
+vp8_loop_filter_simple_mbh_media=vp8_loop_filter_simple_horizontal_edge_armv6
+vp8_loop_filter_simple_mbh_neon=vp8_loop_filter_mbhs_neon
+
+prototype void vp8_loop_filter_simple_bv "unsigned char *y, int ystride, const unsigned char *blimit"
+specialize vp8_loop_filter_simple_bv mmx sse2 media neon
+vp8_loop_filter_simple_bv_c=vp8_loop_filter_bvs_c
+vp8_loop_filter_simple_bv_mmx=vp8_loop_filter_bvs_mmx
+vp8_loop_filter_simple_bv_sse2=vp8_loop_filter_bvs_sse2
+vp8_loop_filter_simple_bv_media=vp8_loop_filter_bvs_armv6
+vp8_loop_filter_simple_bv_neon=vp8_loop_filter_bvs_neon
+
+prototype void vp8_loop_filter_simple_bh "unsigned char *y, int ystride, const unsigned char *blimit"
+specialize vp8_loop_filter_simple_bh mmx sse2 media neon
+vp8_loop_filter_simple_bh_c=vp8_loop_filter_bhs_c
+vp8_loop_filter_simple_bh_mmx=vp8_loop_filter_bhs_mmx
+vp8_loop_filter_simple_bh_sse2=vp8_loop_filter_bhs_sse2
+vp8_loop_filter_simple_bh_media=vp8_loop_filter_bhs_armv6
+vp8_loop_filter_simple_bh_neon=vp8_loop_filter_bhs_neon
+
+#
+# IDCT
+#
+#idct16
+prototype void vp8_short_idct4x4llm "short *input, unsigned char *pred, int pitch, unsigned char *dst, int dst_stride"
+specialize vp8_short_idct4x4llm mmx media neon
+vp8_short_idct4x4llm_media=vp8_short_idct4x4llm_v6_dual
+
+#iwalsh1
+prototype void vp8_short_inv_walsh4x4_1 "short *input, short *output"
+# no asm yet
+
+#iwalsh16
+prototype void vp8_short_inv_walsh4x4 "short *input, short *output"
+specialize vp8_short_inv_walsh4x4 mmx sse2 media neon
+vp8_short_inv_walsh4x4_media=vp8_short_inv_walsh4x4_v6
+
+#idct1_scalar_add
+prototype void vp8_dc_only_idct_add "short input, unsigned char *pred, int pred_stride, unsigned char *dst, int dst_stride"
+specialize vp8_dc_only_idct_add	mmx media neon
+vp8_dc_only_idct_add_media=vp8_dc_only_idct_add_v6
+
+#
+# RECON
+#
+prototype void vp8_copy_mem16x16 "unsigned char *src, int src_pitch, unsigned char *dst, int dst_pitch"
+specialize vp8_copy_mem16x16 mmx sse2 media neon
+vp8_copy_mem16x16_media=vp8_copy_mem16x16_v6
+
+prototype void vp8_copy_mem8x8 "unsigned char *src, int src_pitch, unsigned char *dst, int dst_pitch"
+specialize vp8_copy_mem8x8 mmx media neon
+vp8_copy_mem8x8_media=vp8_copy_mem8x8_v6
+
+prototype void vp8_copy_mem8x4 "unsigned char *src, int src_pitch, unsigned char *dst, int dst_pitch"
+specialize vp8_copy_mem8x4 mmx media neon
+vp8_copy_mem8x4_media=vp8_copy_mem8x4_v6
+
+prototype void vp8_build_intra_predictors_mby_s "struct macroblockd *x, unsigned char * yabove_row, unsigned char * yleft, int left_stride, unsigned char * ypred_ptr, int y_stride"
+specialize vp8_build_intra_predictors_mby_s sse2 ssse3
+#TODO: fix assembly for neon
+
+prototype void vp8_build_intra_predictors_mbuv_s "struct macroblockd *x, unsigned char * uabove_row, unsigned char * vabove_row,  unsigned char *uleft, unsigned char *vleft, int left_stride, unsigned char * upred_ptr, unsigned char * vpred_ptr, int pred_stride"
+specialize vp8_build_intra_predictors_mbuv_s sse2 ssse3
+
+prototype void vp8_intra4x4_predict_d "unsigned char *above, unsigned char *left, int left_stride, int b_mode, unsigned char *dst, int dst_stride, unsigned char top_left"
+prototype void vp8_intra4x4_predict "unsigned char *src, int src_stride, int b_mode, unsigned char *dst, int dst_stride"
+specialize vp8_intra4x4_predict media
+vp8_intra4x4_predict_media=vp8_intra4x4_predict_armv6
+
+#
+# Postproc
+#
+if [ "$CONFIG_POSTPROC" = "yes" ]; then
+    prototype void vp8_mbpost_proc_down "unsigned char *dst, int pitch, int rows, int cols,int flimit"
+    specialize vp8_mbpost_proc_down mmx sse2
+    vp8_mbpost_proc_down_sse2=vp8_mbpost_proc_down_xmm
+
+    prototype void vp8_mbpost_proc_across_ip "unsigned char *dst, int pitch, int rows, int cols,int flimit"
+    specialize vp8_mbpost_proc_across_ip sse2
+    vp8_mbpost_proc_across_ip_sse2=vp8_mbpost_proc_across_ip_xmm
+
+    prototype void vp8_post_proc_down_and_across "unsigned char *src, unsigned char *dst, int src_pitch, int dst_pitch, int rows, int cols, int flimit"
+    specialize vp8_post_proc_down_and_across mmx sse2
+    vp8_post_proc_down_and_across_sse2=vp8_post_proc_down_and_across_xmm
+
+    prototype void vp8_plane_add_noise "unsigned char *s, char *noise, char blackclamp[16], char whiteclamp[16], char bothclamp[16], unsigned int w, unsigned int h, int pitch"
+    specialize vp8_plane_add_noise mmx sse2
+    vp8_plane_add_noise_sse2=vp8_plane_add_noise_wmt
+
+    prototype void vp8_blend_mb_inner "unsigned char *y, unsigned char *u, unsigned char *v, int y1, int u1, int v1, int alpha, int stride"
+    # no asm yet
+
+    prototype void vp8_blend_mb_outer "unsigned char *y, unsigned char *u, unsigned char *v, int y1, int u1, int v1, int alpha, int stride"
+    # no asm yet
+
+    prototype void vp8_blend_b "unsigned char *y, unsigned char *u, unsigned char *v, int y1, int u1, int v1, int alpha, int stride"
+    # no asm yet
+
+    prototype void vp8_filter_by_weight16x16 "unsigned char *src, int src_stride, unsigned char *dst, int dst_stride, int src_weight"
+    specialize vp8_filter_by_weight16x16 sse2
+
+    prototype void vp8_filter_by_weight8x8 "unsigned char *src, int src_stride, unsigned char *dst, int dst_stride, int src_weight"
+    specialize vp8_filter_by_weight8x8 sse2
+
+    prototype void vp8_filter_by_weight4x4 "unsigned char *src, int src_stride, unsigned char *dst, int dst_stride, int src_weight"
+    # no asm yet
+fi
+
+#
+# Subpixel
+#
+prototype void vp8_sixtap_predict16x16 "unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch"
+specialize vp8_sixtap_predict16x16 mmx sse2 ssse3 media neon
+vp8_sixtap_predict16x16_media=vp8_sixtap_predict16x16_armv6
+
+prototype void vp8_sixtap_predict8x8 "unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch"
+specialize vp8_sixtap_predict8x8 mmx sse2 ssse3 media neon
+vp8_sixtap_predict8x8_media=vp8_sixtap_predict8x8_armv6
+
+prototype void vp8_sixtap_predict8x4 "unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch"
+specialize vp8_sixtap_predict8x4 mmx sse2 ssse3 media neon
+vp8_sixtap_predict8x4_media=vp8_sixtap_predict8x4_armv6
+
+prototype void vp8_sixtap_predict4x4 "unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch"
+specialize vp8_sixtap_predict4x4 mmx ssse3 media neon
+vp8_sixtap_predict4x4_media=vp8_sixtap_predict4x4_armv6
+
+prototype void vp8_bilinear_predict16x16 "unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch"
+specialize vp8_bilinear_predict16x16 mmx sse2 ssse3 media neon
+vp8_bilinear_predict16x16_media=vp8_bilinear_predict16x16_armv6
+
+prototype void vp8_bilinear_predict8x8 "unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch"
+specialize vp8_bilinear_predict8x8 mmx sse2 ssse3 media neon
+vp8_bilinear_predict8x8_media=vp8_bilinear_predict8x8_armv6
+
+prototype void vp8_bilinear_predict8x4 "unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch"
+specialize vp8_bilinear_predict8x4 mmx media neon
+vp8_bilinear_predict8x4_media=vp8_bilinear_predict8x4_armv6
+
+prototype void vp8_bilinear_predict4x4 "unsigned char *src, int src_pitch, int xofst, int yofst, unsigned char *dst, int dst_pitch"
+specialize vp8_bilinear_predict4x4 mmx media neon
+vp8_bilinear_predict4x4_media=vp8_bilinear_predict4x4_armv6
+
+#
+# Whole-pixel Variance
+#
+prototype unsigned int vp8_variance4x4 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_variance4x4 mmx sse2
+vp8_variance4x4_sse2=vp8_variance4x4_wmt
+
+prototype unsigned int vp8_variance8x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_variance8x8 mmx sse2 media neon
+vp8_variance8x8_sse2=vp8_variance8x8_wmt
+vp8_variance8x8_media=vp8_variance8x8_armv6
+
+prototype unsigned int vp8_variance8x16 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_variance8x16 mmx sse2 neon
+vp8_variance8x16_sse2=vp8_variance8x16_wmt
+
+prototype unsigned int vp8_variance16x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_variance16x8 mmx sse2 neon
+vp8_variance16x8_sse2=vp8_variance16x8_wmt
+
+prototype unsigned int vp8_variance16x16 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_variance16x16 mmx sse2 media neon
+vp8_variance16x16_sse2=vp8_variance16x16_wmt
+vp8_variance16x16_media=vp8_variance16x16_armv6
+
+#
+# Sub-pixel Variance
+#
+prototype unsigned int vp8_sub_pixel_variance4x4 "const unsigned char  *src_ptr, int  source_stride, int  xoffset, int  yoffset, const unsigned char *ref_ptr, int Refstride, unsigned int *sse"
+specialize vp8_sub_pixel_variance4x4 mmx sse2
+vp8_sub_pixel_variance4x4_sse2=vp8_sub_pixel_variance4x4_wmt
+
+prototype unsigned int vp8_sub_pixel_variance8x8 "const unsigned char  *src_ptr, int  source_stride, int  xoffset, int  yoffset, const unsigned char *ref_ptr, int Refstride, unsigned int *sse"
+specialize vp8_sub_pixel_variance8x8 mmx sse2 media neon
+vp8_sub_pixel_variance8x8_sse2=vp8_sub_pixel_variance8x8_wmt
+vp8_sub_pixel_variance8x8_media=vp8_sub_pixel_variance8x8_armv6
+
+prototype unsigned int vp8_sub_pixel_variance8x16 "const unsigned char  *src_ptr, int  source_stride, int  xoffset, int  yoffset, const unsigned char *ref_ptr, int Refstride, unsigned int *sse"
+specialize vp8_sub_pixel_variance8x16 mmx sse2
+vp8_sub_pixel_variance8x16_sse2=vp8_sub_pixel_variance8x16_wmt
+
+prototype unsigned int vp8_sub_pixel_variance16x8 "const unsigned char  *src_ptr, int  source_stride, int  xoffset, int  yoffset, const unsigned char *ref_ptr, int Refstride, unsigned int *sse"
+specialize vp8_sub_pixel_variance16x8 mmx sse2 ssse3
+vp8_sub_pixel_variance16x8_sse2=vp8_sub_pixel_variance16x8_wmt
+
+prototype unsigned int vp8_sub_pixel_variance16x16 "const unsigned char  *src_ptr, int  source_stride, int  xoffset, int  yoffset, const unsigned char *ref_ptr, int Refstride, unsigned int *sse"
+specialize vp8_sub_pixel_variance16x16 mmx sse2 ssse3 media neon
+vp8_sub_pixel_variance16x16_sse2=vp8_sub_pixel_variance16x16_wmt
+vp8_sub_pixel_variance16x16_media=vp8_sub_pixel_variance16x16_armv6
+
+prototype unsigned int vp8_variance_halfpixvar16x16_h "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_variance_halfpixvar16x16_h mmx sse2 media neon
+vp8_variance_halfpixvar16x16_h_sse2=vp8_variance_halfpixvar16x16_h_wmt
+vp8_variance_halfpixvar16x16_h_media=vp8_variance_halfpixvar16x16_h_armv6
+
+prototype unsigned int vp8_variance_halfpixvar16x16_v "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_variance_halfpixvar16x16_v mmx sse2 media neon
+vp8_variance_halfpixvar16x16_v_sse2=vp8_variance_halfpixvar16x16_v_wmt
+vp8_variance_halfpixvar16x16_v_media=vp8_variance_halfpixvar16x16_v_armv6
+
+prototype unsigned int vp8_variance_halfpixvar16x16_hv "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_variance_halfpixvar16x16_hv mmx sse2 media neon
+vp8_variance_halfpixvar16x16_hv_sse2=vp8_variance_halfpixvar16x16_hv_wmt
+vp8_variance_halfpixvar16x16_hv_media=vp8_variance_halfpixvar16x16_hv_armv6
+
+#
+# Single block SAD
+#
+prototype unsigned int vp8_sad4x4 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride, int max_sad"
+specialize vp8_sad4x4 mmx sse2 neon
+vp8_sad4x4_sse2=vp8_sad4x4_wmt
+
+prototype unsigned int vp8_sad8x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride, int max_sad"
+specialize vp8_sad8x8 mmx sse2 neon
+vp8_sad8x8_sse2=vp8_sad8x8_wmt
+
+prototype unsigned int vp8_sad8x16 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride, int max_sad"
+specialize vp8_sad8x16 mmx sse2 neon
+vp8_sad8x16_sse2=vp8_sad8x16_wmt
+
+prototype unsigned int vp8_sad16x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride, int max_sad"
+specialize vp8_sad16x8 mmx sse2 neon
+vp8_sad16x8_sse2=vp8_sad16x8_wmt
+
+prototype unsigned int vp8_sad16x16 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride, int max_sad"
+specialize vp8_sad16x16 mmx sse2 sse3 media neon
+vp8_sad16x16_sse2=vp8_sad16x16_wmt
+vp8_sad16x16_media=vp8_sad16x16_armv6
+
+#
+# Multi-block SAD, comparing a reference to N blocks 1 pixel apart horizontally
+#
+prototype void vp8_sad4x4x3 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad4x4x3 sse3
+
+prototype void vp8_sad8x8x3 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad8x8x3 sse3
+
+prototype void vp8_sad8x16x3 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad8x16x3 sse3
+
+prototype void vp8_sad16x8x3 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad16x8x3 sse3 ssse3
+
+prototype void vp8_sad16x16x3 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad16x16x3 sse3 ssse3
+
+# Note: the only difference in the following prototypes is that they write
+# their results into an array of unsigned short
+prototype void vp8_sad4x4x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned short *sad_array"
+specialize vp8_sad4x4x8 sse4_1
+vp8_sad4x4x8_sse4_1=vp8_sad4x4x8_sse4
+
+prototype void vp8_sad8x8x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned short *sad_array"
+specialize vp8_sad8x8x8 sse4_1
+vp8_sad8x8x8_sse4_1=vp8_sad8x8x8_sse4
+
+prototype void vp8_sad8x16x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned short *sad_array"
+specialize vp8_sad8x16x8 sse4_1
+vp8_sad8x16x8_sse4_1=vp8_sad8x16x8_sse4
+
+prototype void vp8_sad16x8x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned short *sad_array"
+specialize vp8_sad16x8x8 sse4_1
+vp8_sad16x8x8_sse4_1=vp8_sad16x8x8_sse4
+
+prototype void vp8_sad16x16x8 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned short *sad_array"
+specialize vp8_sad16x16x8 sse4_1
+vp8_sad16x16x8_sse4_1=vp8_sad16x16x8_sse4
+
+#
+# Multi-block SAD, comparing a reference to N independent blocks
+#
+prototype void vp8_sad4x4x4d "const unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr[4], int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad4x4x4d sse3
+
+prototype void vp8_sad8x8x4d "const unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr[4], int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad8x8x4d sse3
+
+prototype void vp8_sad8x16x4d "const unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr[4], int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad8x16x4d sse3
+
+prototype void vp8_sad16x8x4d "const unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr[4], int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad16x8x4d sse3
+
+prototype void vp8_sad16x16x4d "const unsigned char *src_ptr, int source_stride, unsigned char *ref_ptr[4], int  ref_stride, unsigned int *sad_array"
+specialize vp8_sad16x16x4d sse3
+
+#
+# Encoder functions below this point.
+#
+if [ "$CONFIG_VP8_ENCODER" = "yes" ]; then
+
+#
+# Sum of squares (vector)
+#
+prototype unsigned int vp8_get_mb_ss "const short *"
+specialize vp8_get_mb_ss mmx sse2
+
+#
+# SSE (Sum of Squared Errors)
+#
+prototype unsigned int vp8_sub_pixel_mse16x16 "const unsigned char  *src_ptr, int  source_stride, int  xoffset, int  yoffset, const unsigned char *ref_ptr, int Refstride, unsigned int *sse"
+specialize vp8_sub_pixel_mse16x16 mmx sse2
+vp8_sub_pixel_mse16x16_sse2=vp8_sub_pixel_mse16x16_wmt
+
+prototype unsigned int vp8_mse16x16 "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride, unsigned int *sse"
+specialize vp8_mse16x16 mmx sse2 media neon
+vp8_mse16x16_sse2=vp8_mse16x16_wmt
+vp8_mse16x16_media=vp8_mse16x16_armv6
+
+prototype unsigned int vp8_get4x4sse_cs "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int  ref_stride"
+specialize vp8_get4x4sse_cs mmx neon
+
+#
+# Block copy
+#
+case $arch in
+    x86*)
+    prototype void vp8_copy32xn "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride, int n"
+    specialize vp8_copy32xn sse2 sse3
+    ;;
+esac
+
+#
+# Structural Similarity (SSIM)
+#
+if [ "$CONFIG_INTERNAL_STATS" = "yes" ]; then
+    [ $arch = "x86_64" ] && sse2_on_x86_64=sse2
+
+    prototype void vp8_ssim_parms_8x8 "unsigned char *s, int sp, unsigned char *r, int rp, unsigned long *sum_s, unsigned long *sum_r, unsigned long *sum_sq_s, unsigned long *sum_sq_r, unsigned long *sum_sxr"
+    specialize vp8_ssim_parms_8x8 $sse2_on_x86_64
+
+    prototype void vp8_ssim_parms_16x16 "unsigned char *s, int sp, unsigned char *r, int rp, unsigned long *sum_s, unsigned long *sum_r, unsigned long *sum_sq_s, unsigned long *sum_sq_r, unsigned long *sum_sxr"
+    specialize vp8_ssim_parms_16x16 $sse2_on_x86_64
+fi
+
+#
+# Forward DCT
+#
+prototype void vp8_short_fdct4x4 "short *input, short *output, int pitch"
+specialize vp8_short_fdct4x4 mmx sse2 media neon
+vp8_short_fdct4x4_media=vp8_short_fdct4x4_armv6
+
+prototype void vp8_short_fdct8x4 "short *input, short *output, int pitch"
+specialize vp8_short_fdct8x4 mmx sse2 media neon
+vp8_short_fdct8x4_media=vp8_short_fdct8x4_armv6
+
+prototype void vp8_short_walsh4x4 "short *input, short *output, int pitch"
+specialize vp8_short_walsh4x4 sse2 media neon
+vp8_short_walsh4x4_media=vp8_short_walsh4x4_armv6
+
+#
+# Quantizer
+#
+prototype void vp8_regular_quantize_b "struct block *, struct blockd *"
+specialize vp8_regular_quantize_b sse2 sse4_1
+vp8_regular_quantize_b_sse4_1=vp8_regular_quantize_b_sse4
+
+prototype void vp8_fast_quantize_b "struct block *, struct blockd *"
+specialize vp8_fast_quantize_b sse2 ssse3 media neon
+vp8_fast_quantize_b_media=vp8_fast_quantize_b_armv6
+
+prototype void vp8_regular_quantize_b_pair "struct block *b1, struct block *b2, struct blockd *d1, struct blockd *d2"
+# no asm yet
+
+prototype void vp8_fast_quantize_b_pair "struct block *b1, struct block *b2, struct blockd *d1, struct blockd *d2"
+specialize vp8_fast_quantize_b_pair neon
+
+prototype void vp8_quantize_mb "struct macroblock *"
+specialize vp8_quantize_mb neon
+
+prototype void vp8_quantize_mby "struct macroblock *"
+specialize vp8_quantize_mby neon
+
+prototype void vp8_quantize_mbuv "struct macroblock *"
+specialize vp8_quantize_mbuv neon
+
+#
+# Block subtraction
+#
+prototype int vp8_block_error "short *coeff, short *dqcoeff"
+specialize vp8_block_error mmx sse2
+vp8_block_error_sse2=vp8_block_error_xmm
+
+prototype int vp8_mbblock_error "struct macroblock *mb, int dc"
+specialize vp8_mbblock_error mmx sse2
+vp8_mbblock_error_sse2=vp8_mbblock_error_xmm
+
+prototype int vp8_mbuverror "struct macroblock *mb"
+specialize vp8_mbuverror mmx sse2
+vp8_mbuverror_sse2=vp8_mbuverror_xmm
+
+prototype void vp8_subtract_b "struct block *be, struct blockd *bd, int pitch"
+specialize vp8_subtract_b mmx sse2 media neon
+vp8_subtract_b_media=vp8_subtract_b_armv6
+
+prototype void vp8_subtract_mby "short *diff, unsigned char *src, int src_stride, unsigned char *pred, int pred_stride"
+specialize vp8_subtract_mby mmx sse2 media neon
+vp8_subtract_mby_media=vp8_subtract_mby_armv6
+
+prototype void vp8_subtract_mbuv "short *diff, unsigned char *usrc, unsigned char *vsrc, int src_stride, unsigned char *upred, unsigned char *vpred, int pred_stride"
+specialize vp8_subtract_mbuv mmx sse2 media neon
+vp8_subtract_mbuv_media=vp8_subtract_mbuv_armv6
+
+#
+# Motion search
+#
+prototype int vp8_full_search_sad "struct macroblock *x, struct block *b, struct blockd *d, union int_mv *ref_mv, int sad_per_bit, int distance, struct variance_vtable *fn_ptr, int *mvcost[2], union int_mv *center_mv"
+specialize vp8_full_search_sad sse3 sse4_1
+vp8_full_search_sad_sse3=vp8_full_search_sadx3
+vp8_full_search_sad_sse4_1=vp8_full_search_sadx8
+
+prototype int vp8_refining_search_sad "struct macroblock *x, struct block *b, struct blockd *d, union int_mv *ref_mv, int sad_per_bit, int distance, struct variance_vtable *fn_ptr, int *mvcost[2], union int_mv *center_mv"
+specialize vp8_refining_search_sad sse3
+vp8_refining_search_sad_sse3=vp8_refining_search_sadx4
+
+prototype int vp8_diamond_search_sad "struct macroblock *x, struct block *b, struct blockd *d, union int_mv *ref_mv, union int_mv *best_mv, int search_param, int sad_per_bit, int *num00, struct variance_vtable *fn_ptr, int *mvcost[2], union int_mv *center_mv"
+vp8_diamond_search_sad_sse3=vp8_diamond_search_sadx4
+
+#
+# Alt-ref Noise Reduction (ARNR)
+#
+if [ "$CONFIG_REALTIME_ONLY" != "yes" ]; then
+    prototype void vp8_temporal_filter_apply "unsigned char *frame1, unsigned int stride, unsigned char *frame2, unsigned int block_size, int strength, int filter_weight, unsigned int *accumulator, unsigned short *count"
+    specialize vp8_temporal_filter_apply sse2
+fi
+
+#
+# Pick Loopfilter
+#
+prototype void vp8_yv12_copy_partial_frame "struct yv12_buffer_config *src_ybc, struct yv12_buffer_config *dst_ybc"
+specialize vp8_yv12_copy_partial_frame neon
+
+# End of encoder only functions
+fi
+
+# Scaler functions
+if [ "$CONFIG_SPATIAL_RESAMPLING" = "yes" ]; then
+    prototype void vp8_horizontal_line_4_5_scale "const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width"
+    prototype void vp8_vertical_band_4_5_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_last_vertical_band_4_5_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_horizontal_line_2_3_scale "const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width"
+    prototype void vp8_vertical_band_2_3_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_last_vertical_band_2_3_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_horizontal_line_3_5_scale "const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width"
+    prototype void vp8_vertical_band_3_5_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_last_vertical_band_3_5_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_horizontal_line_3_4_scale "const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width"
+    prototype void vp8_vertical_band_3_4_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_last_vertical_band_3_4_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_horizontal_line_1_2_scale "const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width"
+    prototype void vp8_vertical_band_1_2_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_last_vertical_band_1_2_scale "unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_horizontal_line_5_4_scale "const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width"
+    prototype void vp8_vertical_band_5_4_scale "unsigned char *source, unsigned int src_pitch, unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_horizontal_line_5_3_scale "const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width"
+    prototype void vp8_vertical_band_5_3_scale "unsigned char *source, unsigned int src_pitch, unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_horizontal_line_2_1_scale "const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width"
+    prototype void vp8_vertical_band_2_1_scale "unsigned char *source, unsigned int src_pitch, unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+    prototype void vp8_vertical_band_2_1_scale_i "unsigned char *source, unsigned int src_pitch, unsigned char *dest, unsigned int dest_pitch, unsigned int dest_width"
+fi
+
+prototype void vp8_yv12_extend_frame_borders "struct yv12_buffer_config *ybf"
+specialize vp8_yv12_extend_frame_borders neon
+
+prototype void vp8_yv12_copy_frame "struct yv12_buffer_config *src_ybc, struct yv12_buffer_config *dst_ybc"
+specialize vp8_yv12_copy_frame neon
+
+prototype void vp8_yv12_copy_y "struct yv12_buffer_config *src_ybc, struct yv12_buffer_config *dst_ybc"
+specialize vp8_yv12_copy_y neon
+
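The definitions file above uses three kinds of lines:

- Each `prototype` line declares a function signature.
- Each `specialize` line lists the SIMD flavors that may replace the C version.
- A line of the form `slot=name` binds a flavor to an implementation with a different symbol name. For example, the sse2 slot of `vp8_sad16x16` is served by `vp8_sad16x16_wmt`.

What follows is a minimal, hand-written sketch of the run-time dispatch pattern this feeds, as we understand it. It is not the generator's actual output. The flag `HAS_SIMD_A`, the function `setup_dispatch` and the other names are hypothetical, and the "SIMD" variant simply forwards to the C code.

```c
#include <stdlib.h>

/* Hypothetical CPU capability bit; the real generator uses its own flags. */
#define HAS_SIMD_A 0x01

typedef unsigned int (*sad_fn)(const unsigned char *src, int src_stride,
                               const unsigned char *ref, int ref_stride,
                               int max_sad);

/* C reference implementation: always available as the fallback. */
static unsigned int sad16x16_c(const unsigned char *src, int src_stride,
                               const unsigned char *ref, int ref_stride,
                               int max_sad)
{
    unsigned int sad = 0;
    int r, c;
    (void)max_sad; /* optional early-exit hint, ignored here */
    for (r = 0; r < 16; r++) {
        for (c = 0; c < 16; c++)
            sad += abs(src[c] - ref[c]);
        src += src_stride;
        ref += ref_stride;
    }
    return sad;
}

/* Stand-in for a SIMD flavor whose symbol name differs from its slot name,
 * like vp8_sad16x16_sse2=vp8_sad16x16_wmt. Here it forwards to the C code. */
static unsigned int sad16x16_wmt(const unsigned char *src, int src_stride,
                                 const unsigned char *ref, int ref_stride,
                                 int max_sad)
{
    return sad16x16_c(src, src_stride, ref, ref_stride, max_sad);
}

/* The dispatch pointer that callers invoke. */
static sad_fn sad16x16;

/* Start from the C version; each matching flag overrides the choice. */
static void setup_dispatch(int cpu_flags)
{
    sad16x16 = sad16x16_c;
    if (cpu_flags & HAS_SIMD_A)
        sad16x16 = sad16x16_wmt;
}
```

In this sketch, `setup_dispatch` would be called once at startup with the detected capabilities. Later checks override earlier ones, so flavors are tested from least to most preferred.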
--- a/vp8/encoder/sad_c.c
+++ b/vp8/encoder/sad_c.c
@@ -13,40 +13,15 @@
 #include "vpx_config.h"
 #include "vpx/vpx_integer.h"

-unsigned int vp8_sad16x16_c(
-    const unsigned char *src_ptr,
-    int  src_stride,
-    const unsigned char *ref_ptr,
-    int  ref_stride,
-    int max_sad)
-{
-
-    int r, c;
-    unsigned int sad = 0;
-
-    for (r = 0; r < 16; r++)
-    {
-        for (c = 0; c < 16; c++)
-        {
-            sad += abs(src_ptr[c] - ref_ptr[c]);
-        }
-
-        src_ptr += src_stride;
-        ref_ptr += ref_stride;
-    }
-
-    return sad;
-}
-
-
-static __inline
+static
 unsigned int sad_mx_n_c(
    const unsigned char *src_ptr,
    int  src_stride,
    const unsigned char *ref_ptr,
    int  ref_stride,
-    int m,
-    int n)
+    int  max_sad,
+    int  m,
+    int  n)
 {

    int r, c;
@@ -59,6 +34,9 @@ unsigned int sad_mx_n_c(
            sad += abs(src_ptr[c] - ref_ptr[c]);
        }

+        if (sad > max_sad)
+          break;
+
        src_ptr += src_stride;
        ref_ptr += ref_stride;
    }
@@ -66,16 +44,31 @@ unsigned int sad_mx_n_c(
    return sad;
 }

+/* max_sad is provided as an optional optimization point. Alternative
+ * implementations of these functions are not required to check it.
+ */
+
+unsigned int vp8_sad16x16_c(
+    const unsigned char *src_ptr,
+    int  src_stride,
+    const unsigned char *ref_ptr,
+    int  ref_stride,
+    int  max_sad)
+{
+
+    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, max_sad, 16, 16);
+}
+

 unsigned int vp8_sad8x8_c(
    const unsigned char *src_ptr,
    int  src_stride,
    const unsigned char *ref_ptr,
    int  ref_stride,
-    int max_sad)
+    int  max_sad)
 {

-    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, 8, 8);
+    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, max_sad, 8, 8);
 }


@@ -84,10 +77,10 @@ unsigned int vp8_sad16x8_c(
    int  src_stride,
    const unsigned char *ref_ptr,
    int  ref_stride,
-    int max_sad)
+    int  max_sad)
 {

-    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, 16, 8);
+    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, max_sad, 16, 8);

 }

@@ -97,10 +90,10 @@ unsigned int vp8_sad8x16_c(
    int  src_stride,
    const unsigned char *ref_ptr,
    int  ref_stride,
-    int max_sad)
+    int  max_sad)
 {

-    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, 8, 16);
+    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, max_sad, 8, 16);
 }


@@ -109,10 +102,10 @@ unsigned int vp8_sad4x4_c(
    int  src_stride,
    const unsigned char *ref_ptr,
    int  ref_stride,
-    int max_sad)
+    int  max_sad)
 {

-    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, 4, 4);
+    return sad_mx_n_c(src_ptr, src_stride, ref_ptr, ref_stride, max_sad, 4, 4);
 }

 void vp8_sad16x16x3_c(
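The reworked `sad_mx_n_c` accumulates the SAD one row at a time and stops once the running total exceeds `max_sad`. That is enough for motion search, which only needs to know whether a candidate beats the best SAD found so far.

On early exit, the returned value is a partial sum guaranteed to be larger than `max_sad`, not the block's true SAD. Below is a standalone sketch of that behavior. It uses the hypothetical name `sad_mx_n` and an explicit cast where the patch relies on the implicit `int`-to-`unsigned` conversion.

```c
#include <stdlib.h>

/* SAD over an m-wide, n-tall block, abandoning the scan after any row
 * where the running total already exceeds max_sad. */
static unsigned int sad_mx_n(const unsigned char *src_ptr, int src_stride,
                             const unsigned char *ref_ptr, int ref_stride,
                             int max_sad, int m, int n)
{
    unsigned int sad = 0;
    int r, c;

    for (r = 0; r < n; r++) {
        for (c = 0; c < m; c++)
            sad += abs(src_ptr[c] - ref_ptr[c]);

        /* The caller only needs to know this candidate is worse than its
         * best so far, so a partial sum above max_sad is sufficient. */
        if (sad > (unsigned int)max_sad)
            break;

        src_ptr += src_stride;
        ref_ptr += ref_stride;
    }
    return sad;
}
```

Consider a 16x16 block in which every pixel differs by 10. The full SAD is 2560, while a call with `max_sad` = 100 returns 160 after the first row.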
--- a/vp8/common/subpixel.h
+++ b/vp8/common/subpixel.h
@@ -1,86 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef SUBPIXEL_H
-#define SUBPIXEL_H
-
-#define prototype_subpixel_predict(sym) \
-    void sym(unsigned char *src, int src_pitch, int xofst, int yofst, \
-             unsigned char *dst, int dst_pitch)
-
-#if ARCH_X86 || ARCH_X86_64
-#include "x86/subpixel_x86.h"
-#endif
-
-#if ARCH_ARM
-#include "arm/subpixel_arm.h"
-#endif
-
-#ifndef vp8_subpix_sixtap16x16
-#define vp8_subpix_sixtap16x16 vp8_sixtap_predict16x16_c
-#endif
-extern prototype_subpixel_predict(vp8_subpix_sixtap16x16);
-
-#ifndef vp8_subpix_sixtap8x8
-#define vp8_subpix_sixtap8x8 vp8_sixtap_predict8x8_c
-#endif
-extern prototype_subpixel_predict(vp8_subpix_sixtap8x8);
-
-#ifndef vp8_subpix_sixtap8x4
-#define vp8_subpix_sixtap8x4 vp8_sixtap_predict8x4_c
-#endif
-extern prototype_subpixel_predict(vp8_subpix_sixtap8x4);
-
-#ifndef vp8_subpix_sixtap4x4
-#define vp8_subpix_sixtap4x4 vp8_sixtap_predict_c
-#endif
-extern prototype_subpixel_predict(vp8_subpix_sixtap4x4);
-
-#ifndef vp8_subpix_bilinear16x16
-#define vp8_subpix_bilinear16x16 vp8_bilinear_predict16x16_c
-#endif
-extern prototype_subpixel_predict(vp8_subpix_bilinear16x16);
-
-#ifndef vp8_subpix_bilinear8x8
-#define vp8_subpix_bilinear8x8 vp8_bilinear_predict8x8_c
-#endif
-extern prototype_subpixel_predict(vp8_subpix_bilinear8x8);
-
-#ifndef vp8_subpix_bilinear8x4
-#define vp8_subpix_bilinear8x4 vp8_bilinear_predict8x4_c
-#endif
-extern prototype_subpixel_predict(vp8_subpix_bilinear8x4);
-
-#ifndef vp8_subpix_bilinear4x4
-#define vp8_subpix_bilinear4x4 vp8_bilinear_predict4x4_c
-#endif
-extern prototype_subpixel_predict(vp8_subpix_bilinear4x4);
-
-typedef prototype_subpixel_predict((*vp8_subpix_fn_t));
-typedef struct
-{
-    vp8_subpix_fn_t  sixtap16x16;
-    vp8_subpix_fn_t  sixtap8x8;
-    vp8_subpix_fn_t  sixtap8x4;
-    vp8_subpix_fn_t  sixtap4x4;
-    vp8_subpix_fn_t  bilinear16x16;
-    vp8_subpix_fn_t  bilinear8x8;
-    vp8_subpix_fn_t  bilinear8x4;
-    vp8_subpix_fn_t  bilinear4x4;
-} vp8_subpix_rtcd_vtable_t;
-
-#if CONFIG_RUNTIME_CPU_DETECT
-#define SUBPIX_INVOKE(ctx,fn) (ctx)->fn
-#else
-#define SUBPIX_INVOKE(ctx,fn) vp8_subpix_##fn
-#endif
-
-#endif
--- a/vp8/common/threading.h
+++ b/vp8/common/threading.h
@@ -33,6 +33,29 @@
 #define pthread_getspecific(ts_key) TlsGetValue(ts_key)
 #define pthread_setspecific(ts_key, value) TlsSetValue(ts_key, (void *)value)
 #define pthread_self() GetCurrentThreadId()
+
+#elif defined(__OS2__)
+/* OS/2 */
+#define INCL_DOS
+#include <os2.h>
+
+#include <stdlib.h>
+#define THREAD_FUNCTION void
+#define THREAD_FUNCTION_RETURN void
+#define THREAD_SPECIFIC_INDEX PULONG
+#define pthread_t TID
+#define pthread_attr_t ULONG
+#define pthread_create(thhandle,attr,thfunc,tharg) \
+    ((int)((*(thhandle)=_beginthread(thfunc,NULL,1024*1024,tharg))==-1))
+#define pthread_join(thread, result) ((int)DosWaitThread(&(thread),0))
+#define pthread_detach(thread) 0
+#define thread_sleep(nms) DosSleep(nms)
+#define pthread_cancel(thread) DosKillThread(thread)
+#define ts_key_create(ts_key, destructor) \
+    DosAllocThreadLocalMemory(1, &(ts_key));
+#define pthread_getspecific(ts_key) ((void *)(*(ts_key)))
+#define pthread_setspecific(ts_key, value) (*(ts_key)=(ULONG)(value))
+#define pthread_self() _gettid()
 #else
 #ifdef __APPLE__
 #include <mach/mach_init.h>
@@ -64,6 +87,76 @@
 #define sem_destroy(sem) if(*sem)((int)(CloseHandle(*sem))==TRUE)
 #define thread_sleep(nms) Sleep(nms)

+#elif defined(__OS2__)
+typedef struct
+{
+    HEV  event;
+    HMTX wait_mutex;
+    HMTX count_mutex;
+    int  count;
+} sem_t;
+
+static inline int sem_init(sem_t *sem, int pshared, unsigned int value)
+{
+    DosCreateEventSem(NULL, &sem->event, pshared ? DC_SEM_SHARED : 0,
+                      value > 0 ? TRUE : FALSE);
+    DosCreateMutexSem(NULL, &sem->wait_mutex, 0, FALSE);
+    DosCreateMutexSem(NULL, &sem->count_mutex, 0, FALSE);
+
+    sem->count = value;
+
+    return 0;
+}
+
+static inline int sem_wait(sem_t * sem)
+{
+    DosRequestMutexSem(sem->wait_mutex, -1);
+
+    DosWaitEventSem(sem->event, -1);
+
+    DosRequestMutexSem(sem->count_mutex, -1);
+
+    sem->count--;
+    if (sem->count == 0)
+    {
+        ULONG post_count;
+
+        DosResetEventSem(sem->event, &post_count);
+    }
+
+    DosReleaseMutexSem(sem->count_mutex);
+
+    DosReleaseMutexSem(sem->wait_mutex);
+
+    return 0;
+}
+
+static inline int sem_post(sem_t * sem)
+{
+    DosRequestMutexSem(sem->count_mutex, -1);
+
+    if (sem->count < 32768)
+    {
+        sem->count++;
+        DosPostEventSem(sem->event);
+    }
+
+    DosReleaseMutexSem(sem->count_mutex);
+
+    return 0;
+}
+
+static inline int sem_destroy(sem_t * sem)
+{
+    DosCloseEventSem(sem->event);
+    DosCloseMutexSem(sem->wait_mutex);
+    DosCloseMutexSem(sem->count_mutex);
+
+    return 0;
+}
+
+#define thread_sleep(nms) DosSleep(nms)
+
 #else

 #ifdef __APPLE__
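The OS/2 `sem_t` emulation above builds a counting semaphore from an event semaphore, a count and two mutexes. `wait_mutex` serializes waiters, while `count_mutex` protects the count and resets the event when the count reaches zero.

With POSIX threads, the same contract is more commonly expressed with a mutex and a condition variable. The sketch below shows that swapped-in technique for comparison only; it is not part of the patch. The `emu_sem_*` names and the `EMU_SEM_MAX` cap (mirroring the 32768 limit in the OS/2 `sem_post`) are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Upper bound on the count, mirroring the OS/2 emulation's cap. */
#define EMU_SEM_MAX 32768

typedef struct {
    pthread_mutex_t lock;     /* protects count */
    pthread_cond_t  nonzero;  /* signalled when count becomes positive */
    int             count;
} emu_sem_t;

static int emu_sem_init(emu_sem_t *sem, unsigned int value)
{
    if (pthread_mutex_init(&sem->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&sem->nonzero, NULL) != 0) {
        pthread_mutex_destroy(&sem->lock);
        return -1;
    }
    sem->count = (int)value;
    return 0;
}

static int emu_sem_wait(emu_sem_t *sem)
{
    pthread_mutex_lock(&sem->lock);
    /* Re-check in a loop to tolerate spurious wakeups. */
    while (sem->count == 0)
        pthread_cond_wait(&sem->nonzero, &sem->lock);
    sem->count--;
    pthread_mutex_unlock(&sem->lock);
    return 0;
}

static int emu_sem_post(emu_sem_t *sem)
{
    pthread_mutex_lock(&sem->lock);
    if (sem->count < EMU_SEM_MAX) {
        sem->count++;
        pthread_cond_signal(&sem->nonzero);
    }
    pthread_mutex_unlock(&sem->lock);
    return 0;
}

static int emu_sem_destroy(emu_sem_t *sem)
{
    pthread_cond_destroy(&sem->nonzero);
    pthread_mutex_destroy(&sem->lock);
    return 0;
}
```

The condition variable replaces both the event semaphore and the separate wait mutex. A waiter that wakes up re-checks the count under the same lock that `emu_sem_post` uses to increment it.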
--- a/vp8/common/variance.h
+++ b/vp8/common/variance.h
@@ -0,0 +1,115 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+
+#ifndef VARIANCE_H
+#define VARIANCE_H
+
+typedef unsigned int(*vp8_sad_fn_t)
+    (
+    const unsigned char *src_ptr,
+    int source_stride,
+    const unsigned char *ref_ptr,
+    int ref_stride,
+    int max_sad
+    );
+
+typedef void (*vp8_copy32xn_fn_t)(
+    const unsigned char *src_ptr,
+    int source_stride,
+    const unsigned char *ref_ptr,
+    int ref_stride,
+    int n);
+
+typedef void (*vp8_sad_multi_fn_t)(
+    const unsigned char *src_ptr,
+    int source_stride,
+    const unsigned char *ref_ptr,
+    int  ref_stride,
+    unsigned int *sad_array);
+
+typedef void (*vp8_sad_multi1_fn_t)
+    (
+     const unsigned char *src_ptr,
+     int source_stride,
+     const unsigned char *ref_ptr,
+     int  ref_stride,
+     unsigned short *sad_array
+    );
+
+typedef void (*vp8_sad_multi_d_fn_t)
+    (
+     const unsigned char *src_ptr,
+     int source_stride,
+     unsigned char *ref_ptr[4],
+     int  ref_stride,
+     unsigned int *sad_array
+    );
+
+typedef unsigned int (*vp8_variance_fn_t)
+    (
+     const unsigned char *src_ptr,
+     int source_stride,
+     const unsigned char *ref_ptr,
+     int  ref_stride,
+     unsigned int *sse
+    );
+
+typedef unsigned int (*vp8_subpixvariance_fn_t)
+    (
+      const unsigned char  *src_ptr,
+      int  source_stride,
+      int  xoffset,
+      int  yoffset,
+      const unsigned char *ref_ptr,
+      int Refstride,
+      unsigned int *sse
+    );
+
+typedef void (*vp8_ssimpf_fn_t)
+      (
+        unsigned char *s,
+        int sp,
+        unsigned char *r,
+        int rp,
+        unsigned long *sum_s,
+        unsigned long *sum_r,
+        unsigned long *sum_sq_s,
+        unsigned long *sum_sq_r,
+        unsigned long *sum_sxr
+      );
+
+typedef unsigned int (*vp8_getmbss_fn_t)(const short *);
+
+typedef unsigned int (*vp8_get16x16prederror_fn_t)
+    (
+     const unsigned char *src_ptr,
+     int source_stride,
+     const unsigned char *ref_ptr,
+     int  ref_stride
+    );
+
+typedef struct variance_vtable
+{
+    vp8_sad_fn_t            sdf;
+    vp8_variance_fn_t       vf;
+    vp8_subpixvariance_fn_t svf;
+    vp8_variance_fn_t       svf_halfpix_h;
+    vp8_variance_fn_t       svf_halfpix_v;
+    vp8_variance_fn_t       svf_halfpix_hv;
+    vp8_sad_multi_fn_t      sdx3f;
+    vp8_sad_multi1_fn_t     sdx8f;
+    vp8_sad_multi_d_fn_t    sdx4df;
+#if ARCH_X86 || ARCH_X86_64
+    vp8_copy32xn_fn_t       copymem;
+#endif
+} vp8_variance_fn_ptr_t;
+
+#endif
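`vp8_variance_fn_ptr_t` bundles the SAD, variance and multi-block SAD routines for one block size. A search routine can therefore be written once and handed whichever table matches the partition it is evaluating.

Below is a trimmed-down sketch of that idea with a single `sdf` member. All names (`block_fn_table`, `best_of_two` and the `_ref` functions) are hypothetical. The best SAD so far is passed as `max_sad`, as a caller of these functions may do.

```c
#include <stdlib.h>

typedef unsigned int (*sad_fn_t)(const unsigned char *src, int src_stride,
                                 const unsigned char *ref, int ref_stride,
                                 int max_sad);

/* Trimmed-down analogue of vp8_variance_fn_ptr_t: only the SAD slot. */
typedef struct {
    sad_fn_t sdf;
} block_fn_table;

/* Plain SAD over a w x h block. */
static unsigned int sad_wxh(const unsigned char *src, int src_stride,
                            const unsigned char *ref, int ref_stride,
                            int w, int h)
{
    unsigned int sad = 0;
    int r, c;
    for (r = 0; r < h; r++) {
        for (c = 0; c < w; c++)
            sad += abs(src[c] - ref[c]);
        src += src_stride;
        ref += ref_stride;
    }
    return sad;
}

/* These ignore max_sad, which implementations are allowed to do. */
static unsigned int sad8x8_ref(const unsigned char *src, int src_stride,
                               const unsigned char *ref, int ref_stride,
                               int max_sad)
{
    (void)max_sad;
    return sad_wxh(src, src_stride, ref, ref_stride, 8, 8);
}

static unsigned int sad16x16_ref(const unsigned char *src, int src_stride,
                                 const unsigned char *ref, int ref_stride,
                                 int max_sad)
{
    (void)max_sad;
    return sad_wxh(src, src_stride, ref, ref_stride, 16, 16);
}

/* One table per block size. */
static const block_fn_table fn_8x8   = { sad8x8_ref };
static const block_fn_table fn_16x16 = { sad16x16_ref };

/* Size-agnostic search step: returns 1 if cand1 beats cand0, else 0. */
static int best_of_two(const block_fn_table *fn, const unsigned char *src,
                       int stride, const unsigned char *cand0,
                       const unsigned char *cand1)
{
    unsigned int s0 = fn->sdf(src, stride, cand0, stride, 0x7fffffff);
    /* Pass the current best as max_sad so an early-exit SAD may stop. */
    unsigned int s1 = fn->sdf(src, stride, cand1, stride, (int)s0);
    return s1 < s0;
}
```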
--- a/vp8/encoder/variance_c.c
+++ b/vp8/encoder/variance_c.c
@@ -10,7 +10,7 @@


 #include "variance.h"
-#include "vp8/common/filter.h"
+#include "filter.h"


 unsigned int vp8_get_mb_ss_c
@@ -75,7 +75,7 @@ unsigned int vp8_variance16x16_c(

    variance(src_ptr, source_stride, ref_ptr, recon_stride, 16, 16, &var, &avg);
    *sse = var;
-    return (var - ((avg * avg) >> 8));
+    return (var - (((unsigned int)avg * avg) >> 8));
 }

 unsigned int vp8_variance8x16_c(
@@ -91,7 +91,7 @@ unsigned int vp8_variance8x16_c(

    variance(src_ptr, source_stride, ref_ptr, recon_stride, 8, 16, &var, &avg);
    *sse = var;
-    return (var - ((avg * avg) >> 7));
+    return (var - ((unsigned int)(avg * avg) >> 7));
 }

 unsigned int vp8_variance16x8_c(
@@ -107,7 +107,7 @@ unsigned int vp8_variance16x8_c(

    variance(src_ptr, source_stride, ref_ptr, recon_stride, 16, 8, &var, &avg);
    *sse = var;
-    return (var - ((avg * avg) >> 7));
+    return (var - ((unsigned int)(avg * avg) >> 7));
 }


@@ -124,7 +124,7 @@ unsigned int vp8_variance8x8_c(

    variance(src_ptr, source_stride, ref_ptr, recon_stride, 8, 8, &var, &avg);
    *sse = var;
-    return (var - ((avg * avg) >> 6));
+    return (var - ((unsigned int)(avg * avg) >> 6));
 }

 unsigned int vp8_variance4x4_c(
@@ -140,7 +140,7 @@ unsigned int vp8_variance4x4_c(

    variance(src_ptr, source_stride, ref_ptr, recon_stride, 4, 4, &var, &avg);
    *sse = var;
-    return (var - ((avg * avg) >> 4));
+    return (var - ((unsigned int)(avg * avg) >> 4));
 }


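Each `vp8_varianceWxH_c` above returns N times the block variance, where N is the pixel count. Here `var` holds the sum of squared differences and `avg` the sum of differences, so:

N * Var = sum(d^2) - (sum(d))^2 / N

Dividing by N = 256, 128, 64 or 16 is the right shift by 8, 7, 6 or 4.

For 16x16 blocks, |sum(d)| can reach 255 * 256 = 65280, whose square (4,261,478,400) exceeds `INT_MAX`. The square must therefore be formed in unsigned arithmetic. Converting only the finished product to unsigned is too late, because the overflow would already have happened in `int`.

A self-contained sketch, with the hypothetical names `block_sums` and `variance16x16`:

```c
#include <stdlib.h>

/* Sum of differences and sum of squared differences over a w x h block. */
static void block_sums(const unsigned char *src, int src_stride,
                       const unsigned char *ref, int ref_stride,
                       int w, int h, unsigned int *sse, int *sum)
{
    int r, c;
    *sse = 0;
    *sum = 0;
    for (r = 0; r < h; r++) {
        for (c = 0; c < w; c++) {
            int diff = src[c] - ref[c];
            *sum += diff;
            *sse += (unsigned int)(diff * diff);
        }
        src += src_stride;
        ref += ref_stride;
    }
}

/* 16x16 variance: N * Var = SSE - sum^2 / N with N = 256, hence >> 8.
 * Converting one operand first keeps the multiply in unsigned arithmetic;
 * |sum| <= 65280, so sum^2 < 2^32 and the product is exact. */
static unsigned int variance16x16(const unsigned char *src, int src_stride,
                                  const unsigned char *ref, int ref_stride,
                                  unsigned int *sse)
{
    int sum;
    block_sums(src, src_stride, ref, ref_stride, 16, 16, sse, &sum);
    return *sse - (((unsigned int)sum * sum) >> 8);
}
```

Two checks illustrate the formula:

- A block that differs by a constant 255 everywhere has zero variance, even though its SSE is 16,646,400.
- A block alternating between differences of 0 and 10 has a per-pixel variance of 25, so the function returns 25 * 256 = 6400.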
--- a/vp8/common/vp8_entropymodedata.h
+++ b/vp8/common/vp8_entropymodedata.h
@@ -0,0 +1,242 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+*/
+
+
+/*Generated file, included by entropymode.c*/
+
+
+const struct vp8_token_struct vp8_bmode_encodings[VP8_BINTRAMODES] =
+{
+    { 0, 1 },
+    { 2, 2 },
+    { 6, 3 },
+    { 28, 5 },
+    { 30, 5 },
+    { 58, 6 },
+    { 59, 6 },
+    { 62, 6 },
+    { 126, 7 },
+    { 127, 7 }
+};
+
+const struct vp8_token_struct vp8_ymode_encodings[VP8_YMODES] =
+{
+    { 0, 1 },
+    { 4, 3 },
+    { 5, 3 },
+    { 6, 3 },
+    { 7, 3 }
+};
+
+const struct vp8_token_struct vp8_kf_ymode_encodings[VP8_YMODES] =
+{
+    { 4, 3 },
+    { 5, 3 },
+    { 6, 3 },
+    { 7, 3 },
+    { 0, 1 }
+};
+
+const struct vp8_token_struct vp8_uv_mode_encodings[VP8_UV_MODES] =
+{
+    { 0, 1 },
+    { 2, 2 },
+    { 6, 3 },
+    { 7, 3 }
+};
+
+const struct vp8_token_struct vp8_mbsplit_encodings[VP8_NUMMBSPLITS] =
+{
+    { 6, 3 },
+    { 7, 3 },
+    { 2, 2 },
+    { 0, 1 }
+};
+
+const struct vp8_token_struct vp8_mv_ref_encoding_array[VP8_MVREFS] =
+{
+    { 2, 2 },
+    { 6, 3 },
+    { 0, 1 },
+    { 14, 4 },
+    { 15, 4 }
+};
+
+const struct vp8_token_struct vp8_sub_mv_ref_encoding_array[VP8_SUBMVREFS] =
+{
+    { 0, 1 },
+    { 2, 2 },
+    { 6, 3 },
+    { 7, 3 }
+};
+
+const struct vp8_token_struct vp8_small_mvencodings[8] =
+{
+    { 0, 3 },
+    { 1, 3 },
+    { 2, 3 },
+    { 3, 3 },
+    { 4, 3 },
+    { 5, 3 },
+    { 6, 3 },
+    { 7, 3 }
+};
+
+const vp8_prob vp8_ymode_prob[VP8_YMODES-1] =
+{
+    112, 86, 140, 37
+};
+
+const vp8_prob vp8_kf_ymode_prob[VP8_YMODES-1] =
+{
+    145, 156, 163, 128
+};
+
+const vp8_prob vp8_uv_mode_prob[VP8_UV_MODES-1] =
+{
+    162, 101, 204
+};
+
+const vp8_prob vp8_kf_uv_mode_prob[VP8_UV_MODES-1] =
+{
+    142, 114, 183
+};
+
+const vp8_prob vp8_bmode_prob[VP8_BINTRAMODES-1] =
+{
+    120, 90, 79, 133, 87, 85, 80, 111, 151
+};
+
+
+
+const vp8_prob vp8_kf_bmode_prob
+[VP8_BINTRAMODES] [VP8_BINTRAMODES] [VP8_BINTRAMODES-1] =
+{
+    {
+        { 231, 120,  48,  89, 115, 113, 120, 152, 112 },
+        { 152, 179,  64, 126, 170, 118,  46,  70,  95 },
+        { 175,  69, 143,  80,  85,  82,  72, 155, 103 },
+        {  56,  58,  10, 171, 218, 189,  17,  13, 152 },
+        { 144,  71,  10,  38, 171, 213, 144,  34,  26 },
+        { 114,  26,  17, 163,  44, 195,  21,  10, 173 },
+        { 121,  24,  80, 195,  26,  62,  44,  64,  85 },
+        { 170,  46,  55,  19, 136, 160,  33, 206,  71 },
+        {  63,  20,   8, 114, 114, 208,  12,   9, 226 },
+        {  81,  40,  11,  96, 182,  84,  29,  16,  36 }
+    },
+    {
+        { 134, 183,  89, 137,  98, 101, 106, 165, 148 },
+        {  72, 187, 100, 130, 157, 111,  32,  75,  80 },
+        {  66, 102, 167,  99,  74,  62,  40, 234, 128 },
+        {  41,  53,   9, 178, 241, 141,  26,   8, 107 },
+        { 104,  79,  12,  27, 217, 255,  87,  17,   7 },
+        {  74,  43,  26, 146,  73, 166,  49,  23, 157 },
+        {  65,  38, 105, 160,  51,  52,  31, 115, 128 },
+        {  87,  68,  71,  44, 114,  51,  15, 186,  23 },
+        {  47,  41,  14, 110, 182, 183,  21,  17, 194 },
+        {  66,  45,  25, 102, 197, 189,  23,  18,  22 }
+    },
+    {
+        {  88,  88, 147, 150,  42,  46,  45, 196, 205 },
+        {  43,  97, 183, 117,  85,  38,  35, 179,  61 },
+        {  39,  53, 200,  87,  26,  21,  43, 232, 171 },
+        {  56,  34,  51, 104, 114, 102,  29,  93,  77 },
+        { 107,  54,  32,  26,  51,   1,  81,  43,  31 },
+        {  39,  28,  85, 171,  58, 165,  90,  98,  64 },
+        {  34,  22, 116, 206,  23,  34,  43, 166,  73 },
+        {  68,  25, 106,  22,  64, 171,  36, 225, 114 },
+        {  34,  19,  21, 102, 132, 188,  16,  76, 124 },
+        {  62,  18,  78,  95,  85,  57,  50,  48,  51 }
+    },
+    {
+        { 193, 101,  35, 159, 215, 111,  89,  46, 111 },
+        {  60, 148,  31, 172, 219, 228,  21,  18, 111 },
+        { 112, 113,  77,  85, 179, 255,  38, 120, 114 },
+        {  40,  42,   1, 196, 245, 209,  10,  25, 109 },
+        { 100,  80,   8,  43, 154,   1,  51,  26,  71 },
+        {  88,  43,  29, 140, 166, 213,  37,  43, 154 },
+        {  61,  63,  30, 155,  67,  45,  68,   1, 209 },
+        { 142,  78,  78,  16, 255, 128,  34, 197, 171 },
+        {  41,  40,   5, 102, 211, 183,   4,   1, 221 },
+        {  51,  50,  17, 168, 209, 192,  23,  25,  82 }
+    },
+    {
+        { 125,  98,  42,  88, 104,  85, 117, 175,  82 },
+        {  95,  84,  53,  89, 128, 100, 113, 101,  45 },
+        {  75,  79, 123,  47,  51, 128,  81, 171,   1 },
+        {  57,  17,   5,  71, 102,  57,  53,  41,  49 },
+        { 115,  21,   2,  10, 102, 255, 166,  23,   6 },
+        {  38,  33,  13, 121,  57,  73,  26,   1,  85 },
+        {  41,  10,  67, 138,  77, 110,  90,  47, 114 },
+        { 101,  29,  16,  10,  85, 128, 101, 196,  26 },
+        {  57,  18,  10, 102, 102, 213,  34,  20,  43 },
+        { 117,  20,  15,  36, 163, 128,  68,   1,  26 }
+    },
+    {
+        { 138,  31,  36, 171,  27, 166,  38,  44, 229 },
+        {  67,  87,  58, 169,  82, 115,  26,  59, 179 },
+        {  63,  59,  90, 180,  59, 166,  93,  73, 154 },
+        {  40,  40,  21, 116, 143, 209,  34,  39, 175 },
+        {  57,  46,  22,  24, 128,   1,  54,  17,  37 },
+        {  47,  15,  16, 183,  34, 223,  49,  45, 183 },
+        {  46,  17,  33, 183,   6,  98,  15,  32, 183 },
+        {  65,  32,  73, 115,  28, 128,  23, 128, 205 },
+        {  40,   3,   9, 115,  51, 192,  18,   6, 223 },
+        {  87,  37,   9, 115,  59,  77,  64,  21,  47 }
+    },
+    {
+        { 104,  55,  44, 218,   9,  54,  53, 130, 226 },
+        {  64,  90,  70, 205,  40,  41,  23,  26,  57 },
+        {  54,  57, 112, 184,   5,  41,  38, 166, 213 },
+        {  30,  34,  26, 133, 152, 116,  10,  32, 134 },
+        {  75,  32,  12,  51, 192, 255, 160,  43,  51 },
+        {  39,  19,  53, 221,  26, 114,  32,  73, 255 },
+        {  31,   9,  65, 234,   2,  15,   1, 118,  73 },
+        {  88,  31,  35,  67, 102,  85,  55, 186,  85 },
+        {  56,  21,  23, 111,  59, 205,  45,  37, 192 },
+        {  55,  38,  70, 124,  73, 102,   1,  34,  98 }
+    },
+    {
+        { 102,  61,  71,  37,  34,  53,  31, 243, 192 },
+        {  69,  60,  71,  38,  73, 119,  28, 222,  37 },
+        {  68,  45, 128,  34,   1,  47,  11, 245, 171 },
+        {  62,  17,  19,  70, 146,  85,  55,  62,  70 },
+        {  75,  15,   9,   9,  64, 255, 184, 119,  16 },
+        {  37,  43,  37, 154, 100, 163,  85, 160,   1 },
+        {  63,   9,  92, 136,  28,  64,  32, 201,  85 },
+        {  86,   6,  28,   5,  64, 255,  25, 248,   1 },
+        {  56,   8,  17, 132, 137, 255,  55, 116, 128 },
+        {  58,  15,  20,  82, 135,  57,  26, 121,  40 }
+    },
+    {
+        { 164,  50,  31, 137, 154, 133,  25,  35, 218 },
+        {  51, 103,  44, 131, 131, 123,  31,   6, 158 },
+        {  86,  40,  64, 135, 148, 224,  45, 183, 128 },
+        {  22,  26,  17, 131, 240, 154,  14,   1, 209 },
+        {  83,  12,  13,  54, 192, 255,  68,  47,  28 },
+        {  45,  16,  21,  91,  64, 222,   7,   1, 197 },
+        {  56,  21,  39, 155,  60, 138,  23, 102, 213 },
+        {  85,  26,  85,  85, 128, 128,  32, 146, 171 },
+        {  18,  11,   7,  63, 144, 171,   4,   4, 246 },
+        {  35,  27,  10, 146, 174, 171,  12,  26, 128 }
+    },
+    {
+        { 190,  80,  35,  99, 180,  80, 126,  54,  45 },
+        {  85, 126,  47,  87, 176,  51,  41,  20,  32 },
+        { 101,  75, 128, 139, 118, 146, 116, 128,  85 },
+        {  56,  41,  15, 176, 236,  85,  37,   9,  62 },
+        { 146,  36,  19,  30, 171, 255,  97,  27,  20 },
+        {  71,  30,  17, 119, 118, 255,  17,  18, 138 },
+        { 101,  38,  60, 138,  55,  70,  43,  26, 142 },
+        { 138,  45,  61,  62, 219,   1,  81, 188,  64 },
+        {  32,  41,  20, 117, 151, 142,  20,  21, 163 },
+        { 112,  19,  12,  61, 195, 128,  48,   4,  24 }
+    }
+};
--- a/vp8/common/x86/dequantize_x86.h
+++ b/vp8/common/x86/dequantize_x86.h
@@ -1,58 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef DEQUANTIZE_X86_H
-#define DEQUANTIZE_X86_H
-
-
-/* Note:
- *
- * This platform is commonly built for runtime CPU detection. If you modify
- * any of the function mappings present in this file, be sure to also update
- * them in the function pointer initialization code
- */
-#if HAVE_MMX
-extern prototype_dequant_block(vp8_dequantize_b_mmx);
-extern prototype_dequant_idct_add(vp8_dequant_idct_add_mmx);
-extern prototype_dequant_idct_add_y_block(vp8_dequant_idct_add_y_block_mmx);
-extern prototype_dequant_idct_add_uv_block(vp8_dequant_idct_add_uv_block_mmx);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_dequant_block
-#define vp8_dequant_block vp8_dequantize_b_mmx
-
-#undef  vp8_dequant_idct_add
-#define vp8_dequant_idct_add vp8_dequant_idct_add_mmx
-
-#undef vp8_dequant_idct_add_y_block
-#define vp8_dequant_idct_add_y_block vp8_dequant_idct_add_y_block_mmx
-
-#undef vp8_dequant_idct_add_uv_block
-#define vp8_dequant_idct_add_uv_block vp8_dequant_idct_add_uv_block_mmx
-
-#endif
-#endif
-
-#if HAVE_SSE2
-extern prototype_dequant_idct_add_y_block(vp8_dequant_idct_add_y_block_sse2);
-extern prototype_dequant_idct_add_uv_block(vp8_dequant_idct_add_uv_block_sse2);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef vp8_dequant_idct_add_y_block
-#define vp8_dequant_idct_add_y_block vp8_dequant_idct_add_y_block_sse2
-
-#undef vp8_dequant_idct_add_uv_block
-#define vp8_dequant_idct_add_uv_block vp8_dequant_idct_add_uv_block_sse2
-
-#endif
-#endif
-
-#endif
--- a/vp8/common/x86/idct_blk_mmx.c
+++ b/vp8/common/x86/idct_blk_mmx.c
@@ -9,8 +9,8 @@
 */

 #include "vpx_config.h"
-#include "vp8/common/idct.h"
-#include "vp8/common/dequantize.h"
+#include "vpx_rtcd.h"
+#include "vp8/common/blockd.h"

 extern void vp8_dequantize_b_impl_mmx(short *sq, short *dq, short *q);

--- a/vp8/common/x86/idct_blk_sse2.c
+++ b/vp8/common/x86/idct_blk_sse2.c
@@ -9,8 +9,7 @@
 */

 #include "vpx_config.h"
-#include "vp8/common/idct.h"
-#include "vp8/common/dequantize.h"
+#include "vpx_rtcd.h"

 void vp8_idct_dequant_0_2x_sse2
            (short *q, short *dq ,
--- a/vp8/common/x86/idct_x86.h
+++ b/vp8/common/x86/idct_x86.h
@@ -1,56 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef IDCT_X86_H
-#define IDCT_X86_H
-
-/* Note:
- *
- * This platform is commonly built for runtime CPU detection. If you modify
- * any of the function mappings present in this file, be sure to also update
- * them in the function pointer initialization code
- */
-
-#if HAVE_MMX
-extern prototype_idct(vp8_short_idct4x4llm_mmx);
-extern prototype_idct_scalar_add(vp8_dc_only_idct_add_mmx);
-
-extern prototype_second_order(vp8_short_inv_walsh4x4_mmx);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_idct_idct16
-#define vp8_idct_idct16 vp8_short_idct4x4llm_mmx
-
-#undef  vp8_idct_idct1_scalar_add
-#define vp8_idct_idct1_scalar_add vp8_dc_only_idct_add_mmx
-
-#undef vp8_idct_iwalsh16
-#define vp8_idct_iwalsh16 vp8_short_inv_walsh4x4_mmx
-
-#endif
-#endif
-
-#if HAVE_SSE2
-
-extern prototype_second_order(vp8_short_inv_walsh4x4_sse2);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-
-#undef vp8_idct_iwalsh16
-#define vp8_idct_iwalsh16 vp8_short_inv_walsh4x4_sse2
-
-#endif
-
-#endif
-
-
-
-#endif
--- a/vp8/common/x86/idctllm_mmx_test.cc
+++ b/vp8/common/x86/idctllm_mmx_test.cc
@@ -0,0 +1,31 @@
+/*
+ *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *
+ *  Use of this source code is governed by a BSD-style license
+ *  that can be found in the LICENSE file in the root of the source
+ *  tree. An additional intellectual property rights grant can be found
+ *  in the file PATENTS.  All contributing project authors may
+ *  be found in the AUTHORS file in the root of the source tree.
+ */
+
+
+ extern "C" {
+    void vp8_short_idct4x4llm_mmx(short *input, unsigned char *pred_ptr,
+                            int pred_stride, unsigned char *dst_ptr,
+                            int dst_stride);
+}
+
+#include "vp8/common/idctllm_test.h"
+
+namespace
+{
+
+INSTANTIATE_TEST_CASE_P(MMX, IDCTTest,
+                        ::testing::Values(vp8_short_idct4x4llm_mmx));
+
+} // namespace
+
+int main(int argc, char **argv) {
+  ::testing::InitGoogleTest(&argc, argv);
+  return RUN_ALL_TESTS();
+}
--- a/vp8/common/x86/loopfilter_sse2.asm
+++ b/vp8/common/x86/loopfilter_sse2.asm
--- a/vp8/common/x86/loopfilter_x86.c
+++ b/vp8/common/x86/loopfilter_x86.c
@@ -12,20 +12,33 @@
 #include "vpx_config.h"
 #include "vp8/common/loopfilter.h"

+#define prototype_loopfilter(sym) \
+    void sym(unsigned char *src, int pitch, const unsigned char *blimit,\
+             const unsigned char *limit, const unsigned char *thresh, int count)
+
+#define prototype_loopfilter_nc(sym) \
+    void sym(unsigned char *src, int pitch, const unsigned char *blimit,\
+             const unsigned char *limit, const unsigned char *thresh)
+
+#define prototype_simple_loopfilter(sym) \
+    void sym(unsigned char *y, int ystride, const unsigned char *blimit)
+
 prototype_loopfilter(vp8_mbloop_filter_vertical_edge_mmx);
 prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_mmx);
 prototype_loopfilter(vp8_loop_filter_vertical_edge_mmx);
 prototype_loopfilter(vp8_loop_filter_horizontal_edge_mmx);
+prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_mmx);
+prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_mmx);

 #if HAVE_SSE2 && ARCH_X86_64
 prototype_loopfilter(vp8_loop_filter_bv_y_sse2);
 prototype_loopfilter(vp8_loop_filter_bh_y_sse2);
 #else
-prototype_loopfilter(vp8_loop_filter_vertical_edge_sse2);
-prototype_loopfilter(vp8_loop_filter_horizontal_edge_sse2);
+prototype_loopfilter_nc(vp8_loop_filter_vertical_edge_sse2);
+prototype_loopfilter_nc(vp8_loop_filter_horizontal_edge_sse2);
 #endif
-prototype_loopfilter(vp8_mbloop_filter_vertical_edge_sse2);
-prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_sse2);
+prototype_loopfilter_nc(vp8_mbloop_filter_vertical_edge_sse2);
+prototype_loopfilter_nc(vp8_mbloop_filter_horizontal_edge_sse2);

 extern loop_filter_uvfunction vp8_loop_filter_horizontal_edge_uv_sse2;
 extern loop_filter_uvfunction vp8_loop_filter_vertical_edge_uv_sse2;
@@ -115,7 +128,7 @@ void vp8_loop_filter_bvs_mmx(unsigned char *y_ptr, int y_stride, const unsigned
 void vp8_loop_filter_mbh_sse2(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
                              int y_stride, int uv_stride, loop_filter_info *lfi)
 {
-    vp8_mbloop_filter_horizontal_edge_sse2(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
+    vp8_mbloop_filter_horizontal_edge_sse2(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr);

    if (u_ptr)
        vp8_mbloop_filter_horizontal_edge_uv_sse2(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, v_ptr);
@@ -126,7 +139,7 @@ void vp8_loop_filter_mbh_sse2(unsigned char *y_ptr, unsigned char *u_ptr, unsign
 void vp8_loop_filter_mbv_sse2(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
                              int y_stride, int uv_stride, loop_filter_info *lfi)
 {
-    vp8_mbloop_filter_vertical_edge_sse2(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
+    vp8_mbloop_filter_vertical_edge_sse2(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr);

    if (u_ptr)
        vp8_mbloop_filter_vertical_edge_uv_sse2(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, v_ptr);
@@ -140,9 +153,9 @@ void vp8_loop_filter_bh_sse2(unsigned char *y_ptr, unsigned char *u_ptr, unsigne
 #if ARCH_X86_64
    vp8_loop_filter_bh_y_sse2(y_ptr, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
 #else
-    vp8_loop_filter_horizontal_edge_sse2(y_ptr + 4 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-    vp8_loop_filter_horizontal_edge_sse2(y_ptr + 8 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-    vp8_loop_filter_horizontal_edge_sse2(y_ptr + 12 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
+    vp8_loop_filter_horizontal_edge_sse2(y_ptr + 4 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr);
+    vp8_loop_filter_horizontal_edge_sse2(y_ptr + 8 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr);
+    vp8_loop_filter_horizontal_edge_sse2(y_ptr + 12 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr);
 #endif

    if (u_ptr)
@@ -165,9 +178,9 @@ void vp8_loop_filter_bv_sse2(unsigned char *y_ptr, unsigned char *u_ptr, unsigne
 #if ARCH_X86_64
    vp8_loop_filter_bv_y_sse2(y_ptr, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
 #else
-    vp8_loop_filter_vertical_edge_sse2(y_ptr + 4, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-    vp8_loop_filter_vertical_edge_sse2(y_ptr + 8, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
-    vp8_loop_filter_vertical_edge_sse2(y_ptr + 12, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
+    vp8_loop_filter_vertical_edge_sse2(y_ptr + 4, y_stride, lfi->blim, lfi->lim, lfi->hev_thr);
+    vp8_loop_filter_vertical_edge_sse2(y_ptr + 8, y_stride, lfi->blim, lfi->lim, lfi->hev_thr);
+    vp8_loop_filter_vertical_edge_sse2(y_ptr + 12, y_stride, lfi->blim, lfi->lim, lfi->hev_thr);
 #endif

    if (u_ptr)
--- a/vp8/common/x86/loopfilter_x86.h
+++ b/vp8/common/x86/loopfilter_x86.h
@@ -1,100 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef LOOPFILTER_X86_H
-#define LOOPFILTER_X86_H
-
-/* Note:
- *
- * This platform is commonly built for runtime CPU detection. If you modify
- * any of the function mappings present in this file, be sure to also update
- * them in the function pointer initialization code
- */
-
-#if HAVE_MMX
-extern prototype_loopfilter_block(vp8_loop_filter_mbv_mmx);
-extern prototype_loopfilter_block(vp8_loop_filter_bv_mmx);
-extern prototype_loopfilter_block(vp8_loop_filter_mbh_mmx);
-extern prototype_loopfilter_block(vp8_loop_filter_bh_mmx);
-extern prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_mmx);
-extern prototype_simple_loopfilter(vp8_loop_filter_bvs_mmx);
-extern prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_mmx);
-extern prototype_simple_loopfilter(vp8_loop_filter_bhs_mmx);
-
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_lf_normal_mb_v
-#define vp8_lf_normal_mb_v vp8_loop_filter_mbv_mmx
-
-#undef  vp8_lf_normal_b_v
-#define vp8_lf_normal_b_v vp8_loop_filter_bv_mmx
-
-#undef  vp8_lf_normal_mb_h
-#define vp8_lf_normal_mb_h vp8_loop_filter_mbh_mmx
-
-#undef  vp8_lf_normal_b_h
-#define vp8_lf_normal_b_h vp8_loop_filter_bh_mmx
-
-#undef  vp8_lf_simple_mb_v
-#define vp8_lf_simple_mb_v vp8_loop_filter_simple_vertical_edge_mmx
-
-#undef  vp8_lf_simple_b_v
-#define vp8_lf_simple_b_v vp8_loop_filter_bvs_mmx
-
-#undef  vp8_lf_simple_mb_h
-#define vp8_lf_simple_mb_h vp8_loop_filter_simple_horizontal_edge_mmx
-
-#undef  vp8_lf_simple_b_h
-#define vp8_lf_simple_b_h vp8_loop_filter_bhs_mmx
-#endif
-#endif
-
-
-#if HAVE_SSE2
-extern prototype_loopfilter_block(vp8_loop_filter_mbv_sse2);
-extern prototype_loopfilter_block(vp8_loop_filter_bv_sse2);
-extern prototype_loopfilter_block(vp8_loop_filter_mbh_sse2);
-extern prototype_loopfilter_block(vp8_loop_filter_bh_sse2);
-extern prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_sse2);
-extern prototype_simple_loopfilter(vp8_loop_filter_bvs_sse2);
-extern prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_sse2);
-extern prototype_simple_loopfilter(vp8_loop_filter_bhs_sse2);
-
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_lf_normal_mb_v
-#define vp8_lf_normal_mb_v vp8_loop_filter_mbv_sse2
-
-#undef  vp8_lf_normal_b_v
-#define vp8_lf_normal_b_v vp8_loop_filter_bv_sse2
-
-#undef  vp8_lf_normal_mb_h
-#define vp8_lf_normal_mb_h vp8_loop_filter_mbh_sse2
-
-#undef  vp8_lf_normal_b_h
-#define vp8_lf_normal_b_h vp8_loop_filter_bh_sse2
-
-#undef  vp8_lf_simple_mb_v
-#define vp8_lf_simple_mb_v vp8_loop_filter_simple_vertical_edge_sse2
-
-#undef  vp8_lf_simple_b_v
-#define vp8_lf_simple_b_v vp8_loop_filter_bvs_sse2
-
-#undef  vp8_lf_simple_mb_h
-#define vp8_lf_simple_mb_h vp8_loop_filter_simple_horizontal_edge_sse2
-
-#undef  vp8_lf_simple_b_h
-#define vp8_lf_simple_b_h vp8_loop_filter_bhs_sse2
-#endif
-#endif
-
-
-#endif
--- a/vp8/common/x86/mfqe_sse2.asm
+++ b/vp8/common/x86/mfqe_sse2.asm
@@ -0,0 +1,281 @@
+;
+;  Copyright (c) 2012 The WebM project authors. All Rights Reserved.
+;
+;  Use of this source code is governed by a BSD-style license
+;  that can be found in the LICENSE file in the root of the source
+;  tree. An additional intellectual property rights grant can be found
+;  in the file PATENTS.  All contributing project authors may
+;  be found in the AUTHORS file in the root of the source tree.
+;
+
+
+%include "vpx_ports/x86_abi_support.asm"
+
+;void vp8_filter_by_weight16x16_sse2
+;(
+;    unsigned char *src,
+;    int            src_stride,
+;    unsigned char *dst,
+;    int            dst_stride,
+;    int            src_weight
+;)
+global sym(vp8_filter_by_weight16x16_sse2)
+sym(vp8_filter_by_weight16x16_sse2):
+    push        rbp
+    mov         rbp, rsp
+    SHADOW_ARGS_TO_STACK 5
+    SAVE_XMM 6
+    GET_GOT     rbx
+    push        rsi
+    push        rdi
+    ; end prolog
+
+    movd        xmm0, arg(4)                ; src_weight
+    pshuflw     xmm0, xmm0, 0x0             ; replicate to all low words
+    punpcklqdq  xmm0, xmm0                  ; replicate to all hi words
+
+    movdqa      xmm1, [GLOBAL(tMFQE)]
+    psubw       xmm1, xmm0                  ; dst_weight
+
+    mov         rax, arg(0)                 ; src
+    mov         rsi, arg(1)                 ; src_stride
+    mov         rdx, arg(2)                 ; dst
+    mov         rdi, arg(3)                 ; dst_stride
+
+    mov         rcx, 16                     ; loop count
+    pxor        xmm6, xmm6
+
+.combine
+    movdqa      xmm2, [rax]
+    movdqa      xmm4, [rdx]
+    add         rax, rsi
+
+    ; src * src_weight
+    movdqa      xmm3, xmm2
+    punpcklbw   xmm2, xmm6
+    punpckhbw   xmm3, xmm6
+    pmullw      xmm2, xmm0
+    pmullw      xmm3, xmm0
+
+    ; dst * dst_weight
+    movdqa      xmm5, xmm4
+    punpcklbw   xmm4, xmm6
+    punpckhbw   xmm5, xmm6
+    pmullw      xmm4, xmm1
+    pmullw      xmm5, xmm1
+
+    ; sum, round and shift
+    paddw       xmm2, xmm4
+    paddw       xmm3, xmm5
+    paddw       xmm2, [GLOBAL(tMFQE_round)]
+    paddw       xmm3, [GLOBAL(tMFQE_round)]
+    psrlw       xmm2, 4
+    psrlw       xmm3, 4
+
+    packuswb    xmm2, xmm3
+    movdqa      [rdx], xmm2
+    add         rdx, rdi
+
+    dec         rcx
+    jnz         .combine
+
+    ; begin epilog
+    pop         rdi
+    pop         rsi
+    RESTORE_GOT
+    RESTORE_XMM
+    UNSHADOW_ARGS
+    pop         rbp
+
+    ret
+
+;void vp8_filter_by_weight8x8_sse2
+;(
+;    unsigned char *src,
+;    int            src_stride,
+;    unsigned char *dst,
+;    int            dst_stride,
+;    int            src_weight
+;)
+global sym(vp8_filter_by_weight8x8_sse2)
+sym(vp8_filter_by_weight8x8_sse2):
+    push        rbp
+    mov         rbp, rsp
+    SHADOW_ARGS_TO_STACK 5
+    GET_GOT     rbx
+    push        rsi
+    push        rdi
+    ; end prolog
+
+    movd        xmm0, arg(4)                ; src_weight
+    pshuflw     xmm0, xmm0, 0x0             ; replicate to all low words
+    punpcklqdq  xmm0, xmm0                  ; replicate to all hi words
+
+    movdqa      xmm1, [GLOBAL(tMFQE)]
+    psubw       xmm1, xmm0                  ; dst_weight
+
+    mov         rax, arg(0)                 ; src
+    mov         rsi, arg(1)                 ; src_stride
+    mov         rdx, arg(2)                 ; dst
+    mov         rdi, arg(3)                 ; dst_stride
+
+    mov         rcx, 8                      ; loop count
+    pxor        xmm4, xmm4
+
+.combine
+    movq        xmm2, [rax]
+    movq        xmm3, [rdx]
+    add         rax, rsi
+
+    ; src * src_weight
+    punpcklbw   xmm2, xmm4
+    pmullw      xmm2, xmm0
+
+    ; dst * dst_weight
+    punpcklbw   xmm3, xmm4
+    pmullw      xmm3, xmm1
+
+    ; sum, round and shift
+    paddw       xmm2, xmm3
+    paddw       xmm2, [GLOBAL(tMFQE_round)]
+    psrlw       xmm2, 4
+
+    packuswb    xmm2, xmm4
+    movq        [rdx], xmm2
+    add         rdx, rdi
+
+    dec         rcx
+    jnz         .combine
+
+    ; begin epilog
+    pop         rdi
+    pop         rsi
+    RESTORE_GOT
+    UNSHADOW_ARGS
+    pop         rbp
+
+    ret
+
+;void vp8_variance_and_sad_16x16_sse2 | arg
+;(
+;    unsigned char *src1,          0
+;    int            stride1,       1
+;    unsigned char *src2,          2
+;    int            stride2,       3
+;    unsigned int  *variance,      4
+;    unsigned int  *sad,           5
+;)
+global sym(vp8_variance_and_sad_16x16_sse2)
+sym(vp8_variance_and_sad_16x16_sse2):
+    push        rbp
+    mov         rbp, rsp
+    SHADOW_ARGS_TO_STACK 6
+    GET_GOT     rbx
+    push        rsi
+    push        rdi
+    ; end prolog
+
+    mov         rax,        arg(0)          ; src1
+    mov         rcx,        arg(1)          ; stride1
+    mov         rdx,        arg(2)          ; src2
+    mov         rdi,        arg(3)          ; stride2
+
+    mov         rsi,        16              ; block height
+
+    ; Prep accumulator registers
+    pxor        xmm3, xmm3                  ; SAD
+    pxor        xmm4, xmm4                  ; sum of src2
+    pxor        xmm5, xmm5                  ; sum of src2^2
+
+    ; Because we're working with the actual output frames
+    ; we can't depend on any kind of data alignment.
+.accumulate
+    movdqa      xmm0, [rax]                 ; src1
+    movdqa      xmm1, [rdx]                 ; src2
+    add         rax, rcx                    ; src1 + stride1
+    add         rdx, rdi                    ; src2 + stride2
+
+    ; SAD(src1, src2)
+    psadbw      xmm0, xmm1
+    paddusw     xmm3, xmm0
+
+    ; SUM(src2)
+    pxor        xmm2, xmm2
+    psadbw      xmm2, xmm1                  ; sum src2 by misusing SAD against 0
+    paddusw     xmm4, xmm2
+
+    ; pmaddubsw would be ideal if it took two unsigned values. instead,
+    ; it expects a signed and an unsigned value. so instead we zero extend
+    ; and operate on words.
+    pxor        xmm2, xmm2
+    movdqa      xmm0, xmm1
+    punpcklbw   xmm0, xmm2
+    punpckhbw   xmm1, xmm2
+    pmaddwd     xmm0, xmm0
+    pmaddwd     xmm1, xmm1
+    paddd       xmm5, xmm0
+    paddd       xmm5, xmm1
+
+    sub         rsi,        1
+    jnz         .accumulate
+
+    ; phaddd only operates on adjacent double words.
+    ; Finalize SAD and store
+    movdqa      xmm0, xmm3
+    psrldq      xmm0, 8
+    paddusw     xmm0, xmm3
+    paddd       xmm0, [GLOBAL(t128)]
+    psrld       xmm0, 8
+
+    mov         rax,  arg(5)
+    movd        [rax], xmm0
+
+    ; Accumulate sum of src2
+    movdqa      xmm0, xmm4
+    psrldq      xmm0, 8
+    paddusw     xmm0, xmm4
+    ; Square src2. Ignore high value
+    pmuludq     xmm0, xmm0
+    psrld       xmm0, 8
+
+    ; phaddw could be used to sum adjacent values but we want
+    ; all the values summed. promote to doubles, accumulate,
+    ; shift and sum
+    pxor        xmm2, xmm2
+    movdqa      xmm1, xmm5
+    punpckldq   xmm1, xmm2
+    punpckhdq   xmm5, xmm2
+    paddd       xmm1, xmm5
+    movdqa      xmm2, xmm1
+    psrldq      xmm1, 8
+    paddd       xmm1, xmm2
+
+    psubd       xmm1, xmm0
+
+    ; (variance + 128) >> 8
+    paddd       xmm1, [GLOBAL(t128)]
+    psrld       xmm1, 8
+    mov         rax,  arg(4)
+
+    movd        [rax], xmm1
+
+
+    ; begin epilog
+    pop         rdi
+    pop         rsi
+    RESTORE_GOT
+    UNSHADOW_ARGS
+    pop         rbp
+    ret
+
+SECTION_RODATA
+align 16
+t128:
+    ddq 128
+align 16
+tMFQE: ; 1 << MFQE_PRECISION
+    times 8 dw 0x10
+align 16
+tMFQE_round: ; 1 << (MFQE_PRECISION - 1)
+    times 8 dw 0x08
+
--- a/vp8/common/x86/postproc_x86.c
+++ b/vp8/common/x86/postproc_x86.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+ *  Copyright (c) 2012 The WebM project authors. All Rights Reserved.
 *
 *  Use of this source code is governed by a BSD-style license
 *  that can be found in the LICENSE file in the root of the source
@@ -8,10 +8,14 @@
 *  be found in the AUTHORS file in the root of the source tree.
 */

+/* On Android NDK, rand is inlined function, but postproc needs rand symbol */
+#if defined(__ANDROID__)
+#define rand __rand
+#include <stdlib.h>
+#undef rand

-#ifndef __INC_RECONINTRA_H
-#define __INC_RECONINTRA_H
-
-extern void init_intra_left_above_pixels(MACROBLOCKD *x);
-
+extern int rand(void)
+{
+  return __rand();
+}
 #endif
--- a/vp8/common/x86/postproc_x86.h
+++ b/vp8/common/x86/postproc_x86.h
@@ -1,64 +0,0 @@
-/*
- *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *
- *  Use of this source code is governed by a BSD-style license
- *  that can be found in the LICENSE file in the root of the source
- *  tree. An additional intellectual property rights grant can be found
- *  in the file PATENTS.  All contributing project authors may
- *  be found in the AUTHORS file in the root of the source tree.
- */
-
-
-#ifndef POSTPROC_X86_H
-#define POSTPROC_X86_H
-
-/* Note:
- *
- * This platform is commonly built for runtime CPU detection. If you modify
- * any of the function mappings present in this file, be sure to also update
- * them in the function pointer initialization code
- */
-
-#if HAVE_MMX
-extern prototype_postproc_inplace(vp8_mbpost_proc_down_mmx);
-extern prototype_postproc(vp8_post_proc_down_and_across_mmx);
-extern prototype_postproc_addnoise(vp8_plane_add_noise_mmx);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_postproc_down
-#define vp8_postproc_down vp8_mbpost_proc_down_mmx
-
-#undef  vp8_postproc_downacross
-#define vp8_postproc_downacross vp8_post_proc_down_and_across_mmx
-
-#undef  vp8_postproc_addnoise
-#define vp8_postproc_addnoise vp8_plane_add_noise_mmx
-
-#endif
-#endif
-
-
-#if HAVE_SSE2
-extern prototype_postproc_inplace(vp8_mbpost_proc_down_xmm);
-extern prototype_postproc_inplace(vp8_mbpost_proc_across_ip_xmm);
-extern prototype_postproc(vp8_post_proc_down_and_across_xmm);
-extern prototype_postproc_addnoise(vp8_plane_add_noise_wmt);
-
-#if !CONFIG_RUNTIME_CPU_DETECT
-#undef  vp8_postproc_down
-#define vp8_postproc_down vp8_mbpost_proc_down_xmm
-
-#undef  vp8_postproc_across
-#define vp8_postproc_across vp8_mbpost_proc_across_ip_xmm
-
-#undef  vp8_postproc_downacross
-#define vp8_postproc_downacross vp8_post_proc_down_and_across_xmm
-
-#undef  vp8_postproc_addnoise
-#define vp8_postproc_addnoise vp8_plane_add_noise_wmt
-
-
-#endif
-#endif
-
-#endif