Compare commits


5 Commits

Author SHA1 Message Date
Johann
c74bf6d889 Update CHANGELOG for v1.4.0 (Indian Runner Duck) release
Change-Id: Id31b4da40c484aefc1236f5cc568171a9fd12af2
2015-04-03 11:49:19 -07:00
James Zern
d181a627f0 vp9: fix high-bitdepth NEON build
remove incorrect specializations in rtcd and update a configuration
check in partial_idct_test.cc

(cherry picked from commit 8845334097)

Change-Id: I20f551f38ce502092b476fb16d3ca0969dba56f0
2015-04-02 15:19:46 -07:00
Adrian Grange
5ef2d1ddae Fix use of scaling in joint motion search
To enable use of the scale-invariant motion estimation
code during mode selection, each of the reference
buffers is scaled to match the size of the frame
being encoded.

This fix ensures that a unit scaling factor is used in
this case rather than the one calculated assuming that
the reference frame is not scaled.

(cherry picked from commit 8d8d7bfde5)

Change-Id: Id9a5c85dad402f3a7cc7ea9f30f204edad080ebf
2015-04-02 15:19:23 -07:00
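A minimal sketch of the idea behind this fix, using hypothetical names rather than the actual libvpx functions: because the reference buffers handed to joint motion search have already been rescaled to the coded frame size, the search should treat them as unscaled (identity scale factor) instead of applying the factor computed from the original reference dimensions.

/* Sketch only; a 14-bit fixed-point scale is assumed, and these names are
 * illustrative, not part of the libvpx API. */
#define SCALE_SHIFT 14
#define UNIT_SCALE (1 << SCALE_SHIFT) /* 1.0 in fixed point */

struct scale_factors_sketch {
  int x_scale_fp; /* horizontal scale, fixed point */
  int y_scale_fp; /* vertical scale, fixed point */
};

/* References used by joint motion search are pre-scaled to the coded frame
 * size, so hand the search a unit factor regardless of the reference's
 * original dimensions. */
static struct scale_factors_sketch
scale_for_joint_search(int ref_prescaled,
                       struct scale_factors_sketch computed) {
  if (ref_prescaled) {
    const struct scale_factors_sketch unit = { UNIT_SCALE, UNIT_SCALE };
    return unit;
  }
  return computed;
}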
Johann
bb5a39c1a7 Prepare Release Candidate for libvpx v1.4.0
Change-Id: I9ffd30c88a5e40c555bde1f5efcf8a3c9ffcf5ff
2015-03-23 23:54:52 -07:00
James Zern
19b4dead25 vp8cx.h: vpx/vpx_encoder.h -> ./vpx_encoder.h
this matches the other includes and simplifies include paths in builds
from source

(cherry picked from commit 7999c07697)

Change-Id: I344902c84f688ef93c9f3a53e7c06c30db49d8d3
2015-03-23 17:21:27 -07:00
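The include change described by this commit, shown as a before/after sketch of the line in vp8cx.h:

/* before: path under the vpx/ prefix */
#include "vpx/vpx_encoder.h"
/* after: relative to the header's own directory, matching the other includes */
#include "./vpx_encoder.h"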
51 changed files with 617 additions and 1647 deletions


@@ -1,18 +1,26 @@
Adrian Grange <agrange@google.com>
Alex Converse <aconverse@google.com> <alex.converse@gmail.com>
Alexis Ballier <aballier@gentoo.org> <alexis.ballier@gmail.com>
Alpha Lam <hclam@google.com> <hclam@chromium.org>
Deb Mukherjee <debargha@google.com>
Erik Niemeyer <erik.a.niemeyer@intel.com> <erik.a.niemeyer@gmail.com>
Guillaume Martres <gmartres@google.com> <smarter3@gmail.com>
Hangyu Kuang <hkuang@google.com>
Jim Bankoski <jimbankoski@google.com>
John Koleszar <jkoleszar@google.com>
Johann Koenig <johannkoenig@google.com>
Johann Koenig <johannkoenig@google.com> <johann.koenig@duck.com>
Johann Koenig <johannkoenig@google.com> <johannkoenig@dhcp-172-19-7-52.mtv.corp.google.com>
John Koleszar <jkoleszar@google.com>
Joshua Litt <joshualitt@google.com> <joshualitt@chromium.org>
Marco Paniconi <marpan@google.com>
Marco Paniconi <marpan@google.com> <marpan@chromium.org>
Pascal Massimino <pascal.massimino@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>
Ralph Giles <giles@xiph.org> <giles@mozilla.com>
Sami Pietilä <samipietila@google.com>
Tamar Levy <tamar.levy@intel.com>
Tamar Levy <tamar.levy@intel.com> <levytamar82@gmail.com>
Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
Timothy B. Terriberry <tterribe@xiph.org> Tim Terriberry <tterriberry@mozilla.com>
Tom Finegan <tomfinegan@google.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>
Ralph Giles <giles@xiph.org> <giles@mozilla.com>
Alpha Lam <hclam@google.com> <hclam@chromium.org>
Deb Mukherjee <debargha@google.com>
Yaowu Xu <yaowu@google.com> <yaowu@xuyaowu.com>

AUTHORS

@@ -3,10 +3,11 @@
Aaron Watry <awatry@gmail.com>
Abo Talib Mahfoodh <ab.mahfoodh@gmail.com>
Adam Xu <adam@xuyaowu.com>
Adrian Grange <agrange@google.com>
Ahmad Sharif <asharif@google.com>
Alexander Voronov <avoronov@graphics.cs.msu.ru>
Alex Converse <alex.converse@gmail.com>
Alex Converse <aconverse@google.com>
Alexis Ballier <aballier@gentoo.org>
Alok Ahuja <waveletcoeff@gmail.com>
Alpha Lam <hclam@google.com>
@@ -14,44 +15,58 @@ A.Mahfoodh <ab.mahfoodh@gmail.com>
Ami Fischman <fischman@chromium.org>
Andoni Morales Alastruey <ylatuya@gmail.com>
Andres Mejia <mcitadel@gmail.com>
Andrew Russell <anrussell@google.com>
Aron Rosenberg <arosenberg@logitech.com>
Attila Nagy <attilanagy@google.com>
changjun.yang <changjun.yang@intel.com>
Charles 'Buck' Krasic <ckrasic@google.com>
chm <chm@rock-chips.com>
Christian Duvivier <cduvivier@google.com>
Daniel Kang <ddkang@google.com>
Deb Mukherjee <debargha@google.com>
Dim Temp <dimtemp0@gmail.com>
Dmitry Kovalev <dkovalev@google.com>
Dragan Mrdjan <dmrdjan@mips.com>
Erik Niemeyer <erik.a.niemeyer@gmail.com>
Ehsan Akhgari <ehsan.akhgari@gmail.com>
Erik Niemeyer <erik.a.niemeyer@intel.com>
Fabio Pedretti <fabio.ped@libero.it>
Frank Galligan <fgalligan@google.com>
Fredrik Söderquist <fs@opera.com>
Fritz Koenig <frkoenig@google.com>
Gaute Strokkenes <gaute.strokkenes@broadcom.com>
Giuseppe Scrivano <gscrivano@gnu.org>
Gordana Cmiljanovic <gordana.cmiljanovic@imgtec.com>
Guillaume Martres <gmartres@google.com>
Guillermo Ballester Valor <gbvalor@gmail.com>
Hangyu Kuang <hkuang@google.com>
Hanno Böck <hanno@hboeck.de>
Henrik Lundin <hlundin@google.com>
Hui Su <huisu@google.com>
Ivan Maltz <ivanmaltz@google.com>
Jacek Caban <cjacek@gmail.com>
JackyChen <jackychen@google.com>
James Berry <jamesberry@google.com>
James Yu <james.yu@linaro.org>
James Zern <jzern@google.com>
Jan Gerber <j@mailb.org>
Jan Kratochvil <jan.kratochvil@redhat.com>
Janne Salonen <jsalonen@google.com>
Jeff Faust <jfaust@google.com>
Jeff Muizelaar <jmuizelaar@mozilla.com>
Jeff Petkau <jpet@chromium.org>
Jia Jia <jia.jia@linaro.org>
Jim Bankoski <jimbankoski@google.com>
Jingning Han <jingning@google.com>
Joey Parrish <joeyparrish@google.com>
Johann Koenig <johannkoenig@google.com>
John Koleszar <jkoleszar@google.com>
John Stark <jhnstrk@gmail.com>
Joshua Bleecher Snyder <josh@treelinelabs.com>
Joshua Litt <joshualitt@google.com>
Justin Clift <justin@salasaga.org>
Justin Lebar <justin.lebar@gmail.com>
KO Myung-Hun <komh@chollian.net>
Lawrence Velázquez <larryv@macports.org>
Lou Quillio <louquillio@google.com>
Luca Barbato <lu_zero@gentoo.org>
Makoto Kato <makoto.kt@gmail.com>
@@ -65,6 +80,7 @@ Michael Kohler <michaelkohler@live.com>
Mike Frysinger <vapier@chromium.org>
Mike Hommey <mhommey@mozilla.com>
Mikhal Shemer <mikhal@google.com>
Minghai Shang <minghai@google.com>
Morton Jonuschat <yabawock@gmail.com>
Parag Salasakar <img.mips1@gmail.com>
Pascal Massimino <pascal.massimino@gmail.com>
@@ -72,6 +88,8 @@ Patrik Westin <patrik.westin@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk>
Paweł Hajdan <phajdan@google.com>
Pengchong Jin <pengchong@google.com>
Peter de Rivaz <peter.derivaz@gmail.com>
Philip Jägenstedt <philipj@opera.com>
Priit Laes <plaes@plaes.org>
Rafael Ávila de Espíndola <rafael.espindola@gmail.com>
@@ -79,22 +97,29 @@ Rafaël Carré <funman@videolan.org>
Ralph Giles <giles@xiph.org>
Rob Bradford <rob@linux.intel.com>
Ronald S. Bultje <rbultje@google.com>
Rui Ueyama <ruiu@google.com>
Sami Pietilä <samipietila@google.com>
Scott Graham <scottmg@chromium.org>
Scott LaVarnway <slavarnway@google.com>
Sean McGovern <gseanmcg@gmail.com>
Sergey Ulanov <sergeyu@chromium.org>
Shimon Doodkin <helpmepro1@gmail.com>
Stefan Holmer <holmer@google.com>
Suman Sunkara <sunkaras@google.com>
Taekhyun Kim <takim@nvidia.com>
Takanori MATSUURA <t.matsuu@gmail.com>
Tamar Levy <tamar.levy@intel.com>
Tao Bai <michaelbai@chromium.org>
Tero Rintaluoma <teror@google.com>
Thijs Vermeir <thijsvermeir@gmail.com>
Tim Kopp <tkopp@google.com>
Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com>
Vignesh Venkatasubramanian <vigneshv@google.com>
Yaowu Xu <yaowu@google.com>
Yongzhe Wang <yongzhe@google.com>
Yunqing Wang <yunqingwang@google.com>
Zoe Liu <zoeliu@google.com>
Google Inc.
The Mozilla Foundation
The Xiph.Org Foundation


@@ -1,3 +1,26 @@
2015-04-03 v1.4.0 "Indian Runner Duck"
This release includes significant improvements to the VP9 codec.
- Upgrading:
This release is ABI incompatible with 1.3.0. It drops the compatibility
layer, requiring VPX_IMG_FMT_* instead of IMG_FMT_*, and adds several codec
controls for VP9.
- Enhancements:
Faster VP9 encoding and decoding
Multithreaded VP9 decoding (tile and frame-based)
Multithreaded VP9 encoding - on by default
YUV 4:2:2 and 4:4:4 support in VP9
10 and 12bit support in VP9
64bit ARM support by replacing ARM assembly with intrinsics
- Bug Fixes:
Fixes a VP9 bitstream issue in Profile 1. This only affected non-YUV 4:2:0
files.
- Known Issues:
Frame Parallel decoding fails for segmented and non-420 files.
2013-11-15 v1.3.0 "Forest"
This release introduces the VP9 codec in a backward-compatible way.
All existing users of VP8 can continue to use the library without

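A minimal illustration of the upgrade note above, assuming a caller of the public image API: the IMG_FMT_* compatibility aliases are gone in 1.4.0, so the VPX_-prefixed enums must be used.

#include "vpx/vpx_image.h"

static vpx_image_t *alloc_i420(unsigned int w, unsigned int h) {
  /* 1.4.0 and later require the VPX_ prefix; code that passed IMG_FMT_I420
   * through the old compatibility layer no longer builds. */
  return vpx_img_alloc(NULL, VPX_IMG_FMT_I420, w, h, 16);
}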
README

@@ -1,4 +1,4 @@
README - 30 May 2014
README - 23 March 2015
Welcome to the WebM VP8/VP9 Codec SDK!
@@ -78,6 +78,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86-darwin11-gcc
x86-darwin12-gcc
x86-darwin13-gcc
x86-darwin14-gcc
x86-iphonesimulator-gcc
x86-linux-gcc
x86-linux-icc
@@ -95,6 +96,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86_64-darwin11-gcc
x86_64-darwin12-gcc
x86_64-darwin13-gcc
x86_64-darwin14-gcc
x86_64-iphonesimulator-gcc
x86_64-linux-gcc
x86_64-linux-icc
@@ -111,6 +113,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
universal-darwin11-gcc
universal-darwin12-gcc
universal-darwin13-gcc
universal-darwin14-gcc
generic-gnu
The generic-gnu target, in conjunction with the CROSS environment variable,


@@ -383,8 +383,8 @@ LIBS=$(call enabled,LIBS)
.libs: $(LIBS)
@touch $@
$(foreach lib,$(filter %_g.a,$(LIBS)),$(eval $(call archive_template,$(lib))))
$(foreach lib,$(filter %so.$(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH),$(LIBS)),$(eval $(call so_template,$(lib))))
$(foreach lib,$(filter %$(VERSION_MAJOR).dylib,$(LIBS)),$(eval $(call dl_template,$(lib))))
$(foreach lib,$(filter %so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR).$(SO_VERSION_PATCH),$(LIBS)),$(eval $(call so_template,$(lib))))
$(foreach lib,$(filter %$(SO_VERSION_MAJOR).dylib,$(LIBS)),$(eval $(call dl_template,$(lib))))
INSTALL-LIBS=$(call cond_enabled,CONFIG_INSTALL_LIBS,INSTALL-LIBS)
ifeq ($(MAKECMDGOALS),dist)


@@ -1041,31 +1041,6 @@ EOF
check_add_cflags -mips32r2 -mdspr2
disable_feature fast_unaligned
fi
if [ -n "${tune_cpu}" ]; then
case ${tune_cpu} in
p5600)
add_cflags -mips32r5 -funroll-loops -mload-store-pairs
add_cflags -msched-weight -mhard-float
add_asflags -mips32r5 -mhard-float
;;
i6400)
add_cflags -mips64r6 -mabi=64 -funroll-loops -mload-store-pairs
add_cflags -msched-weight -mhard-float
add_asflags -mips64r6 -mabi=64 -mhard-float
add_ldflags -mips64r6 -mabi=64
;;
esac
if enabled msa; then
add_cflags -mmsa -mfp64 -flax-vector-conversions
add_asflags -mmsa -mfp64 -flax-vector-conversions
add_ldflags -mmsa -mfp64 -flax-vector-conversions
disable_feature fast_unaligned
fi
fi
check_add_cflags -march=${tgt_isa}
check_add_asflags -march=${tgt_isa}
check_add_asflags -KPIC


@@ -376,10 +376,6 @@ if ($opts{arch} eq 'x86') {
@ALL_ARCHS = filter("$opts{arch}", qw/dspr2/);
last;
}
if (/HAVE_MSA=yes/) {
@ALL_ARCHS = filter("$opts{arch}", qw/msa/);
last;
}
}
close CONFIG_FILE;
mips;

configure

@@ -258,7 +258,7 @@ ARCH_EXT_LIST="
mips32
dspr2
msa
mips64
mmx

libs.mk

@@ -230,25 +230,27 @@ $(BUILD_PFX)libvpx_g.a: $(LIBVPX_OBJS)
BUILD_LIBVPX_SO := $(if $(BUILD_LIBVPX),$(CONFIG_SHARED))
SO_VERSION_MAJOR := 2
SO_VERSION_MINOR := 0
SO_VERSION_PATCH := 0
ifeq ($(filter darwin%,$(TGT_OS)),$(TGT_OS))
LIBVPX_SO := libvpx.$(VERSION_MAJOR).dylib
LIBVPX_SO := libvpx.$(SO_VERSION_MAJOR).dylib
EXPORT_FILE := libvpx.syms
LIBVPX_SO_SYMLINKS := $(addprefix $(LIBSUBDIR)/, \
libvpx.dylib )
else
LIBVPX_SO := libvpx.so.$(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH)
LIBVPX_SO := libvpx.so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR).$(SO_VERSION_PATCH)
EXPORT_FILE := libvpx.ver
SYM_LINK := libvpx.so
LIBVPX_SO_SYMLINKS := $(addprefix $(LIBSUBDIR)/, \
libvpx.so libvpx.so.$(VERSION_MAJOR) \
libvpx.so.$(VERSION_MAJOR).$(VERSION_MINOR))
libvpx.so libvpx.so.$(SO_VERSION_MAJOR) \
libvpx.so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR))
endif
LIBS-$(BUILD_LIBVPX_SO) += $(BUILD_PFX)$(LIBVPX_SO)\
$(notdir $(LIBVPX_SO_SYMLINKS))
$(BUILD_PFX)$(LIBVPX_SO): $(LIBVPX_OBJS) $(EXPORT_FILE)
$(BUILD_PFX)$(LIBVPX_SO): extralibs += -lm
$(BUILD_PFX)$(LIBVPX_SO): SONAME = libvpx.so.$(VERSION_MAJOR)
$(BUILD_PFX)$(LIBVPX_SO): SONAME = libvpx.so.$(SO_VERSION_MAJOR)
$(BUILD_PFX)$(LIBVPX_SO): EXPORTS_FILE = $(EXPORT_FILE)
libvpx.ver: $(call enabled,CODEC_EXPORTS)


@@ -230,7 +230,7 @@ INSTANTIATE_TEST_CASE_P(
&vp9_idct4x4_1_add_c,
TX_4X4, 1)));
#if HAVE_NEON
#if HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(
NEON, PartialIDctTest,
::testing::Values(
@@ -258,7 +258,7 @@ INSTANTIATE_TEST_CASE_P(
&vp9_idct4x4_16_add_c,
&vp9_idct4x4_1_add_neon,
TX_4X4, 1)));
#endif // HAVE_NEON
#endif // HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
#if HAVE_SSE2 && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(


@@ -29,7 +29,7 @@ namespace {
enum DecodeMode {
kSerialMode,
kFrameParallelMode
kFrameParallMode
};
const int kDecodeMode = 0;
@@ -95,7 +95,7 @@ TEST_P(TestVectorTest, MD5Match) {
vpx_codec_dec_cfg_t cfg = {0};
char str[256];
if (mode == kFrameParallelMode) {
if (mode == kFrameParallMode) {
flags |= VPX_CODEC_USE_FRAME_THREADING;
}


@@ -858,6 +858,9 @@ static vpx_codec_err_t vp8e_encode(vpx_codec_alg_priv_t *ctx,
{
vpx_codec_err_t res = VPX_CODEC_OK;
if (!ctx->cfg.rc_target_bitrate)
return res;
if (!ctx->cfg.rc_target_bitrate)
return res;


@@ -83,7 +83,8 @@ static void free_seg_map(VP9_COMMON *cm) {
}
}
void vp9_free_ref_frame_buffers(BufferPool *pool) {
void vp9_free_ref_frame_buffers(VP9_COMMON *cm) {
BufferPool *const pool = cm->buffer_pool;
int i;
for (i = 0; i < FRAME_BUFFERS; ++i) {
@@ -96,14 +97,10 @@ void vp9_free_ref_frame_buffers(BufferPool *pool) {
pool->frame_bufs[i].mvs = NULL;
vp9_free_frame_buffer(&pool->frame_bufs[i].buf);
}
}
void vp9_free_postproc_buffers(VP9_COMMON *cm) {
#if CONFIG_VP9_POSTPROC
vp9_free_frame_buffer(&cm->post_proc_buffer);
vp9_free_frame_buffer(&cm->post_proc_buffer_int);
#else
(void)cm;
#endif
}
@@ -145,6 +142,7 @@ int vp9_alloc_context_buffers(VP9_COMMON *cm, int width, int height) {
}
void vp9_remove_common(VP9_COMMON *cm) {
vp9_free_ref_frame_buffers(cm);
vp9_free_context_buffers(cm);
vpx_free(cm->fc);


@@ -19,7 +19,6 @@ extern "C" {
#endif
struct VP9Common;
struct BufferPool;
void vp9_remove_common(struct VP9Common *cm);
@@ -27,8 +26,7 @@ int vp9_alloc_context_buffers(struct VP9Common *cm, int width, int height);
void vp9_init_context_buffers(struct VP9Common *cm);
void vp9_free_context_buffers(struct VP9Common *cm);
void vp9_free_ref_frame_buffers(struct BufferPool *pool);
void vp9_free_postproc_buffers(struct VP9Common *cm);
void vp9_free_ref_frame_buffers(struct VP9Common *cm);
int vp9_alloc_state_buffers(struct VP9Common *cm, int width, int height);
void vp9_free_state_buffers(struct VP9Common *cm);


@@ -15,18 +15,6 @@
#include "vpx_mem/vpx_mem.h"
#include "vpx/vpx_integer.h"
// Unconstrained Node Tree
const vp9_tree_index vp9_coef_con_tree[TREE_SIZE(ENTROPY_TOKENS)] = {
2, 6, // 0 = LOW_VAL
-TWO_TOKEN, 4, // 1 = TWO
-THREE_TOKEN, -FOUR_TOKEN, // 2 = THREE
8, 10, // 3 = HIGH_LOW
-CATEGORY1_TOKEN, -CATEGORY2_TOKEN, // 4 = CAT_ONE
12, 14, // 5 = CAT_THREEFOUR
-CATEGORY3_TOKEN, -CATEGORY4_TOKEN, // 6 = CAT_THREE
-CATEGORY5_TOKEN, -CATEGORY6_TOKEN // 7 = CAT_FIVE
};
const vp9_prob vp9_cat1_prob[] = { 159 };
const vp9_prob vp9_cat2_prob[] = { 165, 145 };
const vp9_prob vp9_cat3_prob[] = { 173, 148, 140 };


@@ -173,7 +173,6 @@ static INLINE const uint8_t *get_band_translate(TX_SIZE tx_size) {
#define PIVOT_NODE 2 // which node is pivot
#define MODEL_NODES (ENTROPY_NODES - UNCONSTRAINED_NODES)
extern const vp9_tree_index vp9_coef_con_tree[TREE_SIZE(ENTROPY_TOKENS)];
extern const vp9_prob vp9_pareto8_full[COEFF_PROB_MODELS][MODEL_NODES];
typedef vp9_prob vp9_coeff_probs_model[REF_TYPES][COEF_BANDS]


@@ -293,7 +293,7 @@ void vp9_loop_filter_frame_init(VP9_COMMON *cm, int default_filt_lvl) {
}
}
static void filter_selectively_vert_row2(int subsampling_factor,
static void filter_selectively_vert_row2(PLANE_TYPE plane_type,
uint8_t *s, int pitch,
unsigned int mask_16x16_l,
unsigned int mask_8x8_l,
@@ -301,9 +301,9 @@ static void filter_selectively_vert_row2(int subsampling_factor,
unsigned int mask_4x4_int_l,
const loop_filter_info_n *lfi_n,
const uint8_t *lfl) {
const int mask_shift = subsampling_factor ? 4 : 8;
const int mask_cutoff = subsampling_factor ? 0xf : 0xff;
const int lfl_forward = subsampling_factor ? 4 : 8;
const int mask_shift = plane_type ? 4 : 8;
const int mask_cutoff = plane_type ? 0xf : 0xff;
const int lfl_forward = plane_type ? 4 : 8;
unsigned int mask_16x16_0 = mask_16x16_l & mask_cutoff;
unsigned int mask_8x8_0 = mask_8x8_l & mask_cutoff;
@@ -393,7 +393,7 @@ static void filter_selectively_vert_row2(int subsampling_factor,
}
#if CONFIG_VP9_HIGHBITDEPTH
static void highbd_filter_selectively_vert_row2(int subsampling_factor,
static void highbd_filter_selectively_vert_row2(PLANE_TYPE plane_type,
uint16_t *s, int pitch,
unsigned int mask_16x16_l,
unsigned int mask_8x8_l,
@@ -401,9 +401,9 @@ static void highbd_filter_selectively_vert_row2(int subsampling_factor,
unsigned int mask_4x4_int_l,
const loop_filter_info_n *lfi_n,
const uint8_t *lfl, int bd) {
const int mask_shift = subsampling_factor ? 4 : 8;
const int mask_cutoff = subsampling_factor ? 0xf : 0xff;
const int lfl_forward = subsampling_factor ? 4 : 8;
const int mask_shift = plane_type ? 4 : 8;
const int mask_cutoff = plane_type ? 0xf : 0xff;
const int lfl_forward = plane_type ? 4 : 8;
unsigned int mask_16x16_0 = mask_16x16_l & mask_cutoff;
unsigned int mask_8x8_0 = mask_8x8_l & mask_cutoff;
@@ -1326,203 +1326,248 @@ void vp9_filter_block_plane_non420(VP9_COMMON *cm,
}
}
void vp9_filter_block_plane_ss00(VP9_COMMON *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm) {
void vp9_filter_block_plane(VP9_COMMON *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm) {
struct buf_2d *const dst = &plane->dst;
uint8_t *const dst0 = dst->buf;
int r;
uint64_t mask_16x16 = lfm->left_y[TX_16X16];
uint64_t mask_8x8 = lfm->left_y[TX_8X8];
uint64_t mask_4x4 = lfm->left_y[TX_4X4];
uint64_t mask_4x4_int = lfm->int_4x4_y;
assert(plane->subsampling_x == 0 && plane->subsampling_y == 0);
// Vertical pass: do 2 rows at one time
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 2) {
unsigned int mask_16x16_l = mask_16x16 & 0xffff;
unsigned int mask_8x8_l = mask_8x8 & 0xffff;
unsigned int mask_4x4_l = mask_4x4 & 0xffff;
unsigned int mask_4x4_int_l = mask_4x4_int & 0xffff;
// Disable filtering on the leftmost column.
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_vert_row2(
plane->subsampling_x, CONVERT_TO_SHORTPTR(dst->buf), dst->stride,
mask_16x16_l, mask_8x8_l, mask_4x4_l, mask_4x4_int_l, &cm->lf_info,
&lfm->lfl_y[r << 3], (int)cm->bit_depth);
} else {
filter_selectively_vert_row2(
plane->subsampling_x, dst->buf, dst->stride, mask_16x16_l, mask_8x8_l,
mask_4x4_l, mask_4x4_int_l, &cm->lf_info, &lfm->lfl_y[r << 3]);
}
#else
filter_selectively_vert_row2(
plane->subsampling_x, dst->buf, dst->stride, mask_16x16_l, mask_8x8_l,
mask_4x4_l, mask_4x4_int_l, &cm->lf_info, &lfm->lfl_y[r << 3]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 16 * dst->stride;
mask_16x16 >>= 16;
mask_8x8 >>= 16;
mask_4x4 >>= 16;
mask_4x4_int >>= 16;
}
// Horizontal pass
dst->buf = dst0;
mask_16x16 = lfm->above_y[TX_16X16];
mask_8x8 = lfm->above_y[TX_8X8];
mask_4x4 = lfm->above_y[TX_4X4];
mask_4x4_int = lfm->int_4x4_y;
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r++) {
unsigned int mask_16x16_r;
unsigned int mask_8x8_r;
unsigned int mask_4x4_r;
if (mi_row + r == 0) {
mask_16x16_r = 0;
mask_8x8_r = 0;
mask_4x4_r = 0;
} else {
mask_16x16_r = mask_16x16 & 0xff;
mask_8x8_r = mask_8x8 & 0xff;
mask_4x4_r = mask_4x4 & 0xff;
}
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_horiz(
CONVERT_TO_SHORTPTR(dst->buf), dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int & 0xff, &cm->lf_info, &lfm->lfl_y[r << 3],
(int)cm->bit_depth);
} else {
filter_selectively_horiz(dst->buf, dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int & 0xff, &cm->lf_info,
&lfm->lfl_y[r << 3]);
}
#else
filter_selectively_horiz(dst->buf, dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int & 0xff, &cm->lf_info,
&lfm->lfl_y[r << 3]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 8 * dst->stride;
mask_16x16 >>= 8;
mask_8x8 >>= 8;
mask_4x4 >>= 8;
mask_4x4_int >>= 8;
}
}
void vp9_filter_block_plane_ss11(VP9_COMMON *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm) {
struct buf_2d *const dst = &plane->dst;
uint8_t *const dst0 = dst->buf;
uint8_t* const dst0 = dst->buf;
int r, c;
uint16_t mask_16x16 = lfm->left_uv[TX_16X16];
uint16_t mask_8x8 = lfm->left_uv[TX_8X8];
uint16_t mask_4x4 = lfm->left_uv[TX_4X4];
uint16_t mask_4x4_int = lfm->int_4x4_uv;
if (!plane->plane_type) {
uint64_t mask_16x16 = lfm->left_y[TX_16X16];
uint64_t mask_8x8 = lfm->left_y[TX_8X8];
uint64_t mask_4x4 = lfm->left_y[TX_4X4];
uint64_t mask_4x4_int = lfm->int_4x4_y;
assert(plane->subsampling_x == 1 && plane->subsampling_y == 1);
// Vertical pass: do 2 rows at one time
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 2) {
unsigned int mask_16x16_l = mask_16x16 & 0xffff;
unsigned int mask_8x8_l = mask_8x8 & 0xffff;
unsigned int mask_4x4_l = mask_4x4 & 0xffff;
unsigned int mask_4x4_int_l = mask_4x4_int & 0xffff;
// Vertical pass: do 2 rows at one time
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 4) {
if (plane->plane_type == 1) {
for (c = 0; c < (MI_BLOCK_SIZE >> 1); c++) {
lfm->lfl_uv[(r << 1) + c] = lfm->lfl_y[(r << 3) + (c << 1)];
lfm->lfl_uv[((r + 2) << 1) + c] = lfm->lfl_y[((r + 2) << 3) + (c << 1)];
}
}
{
unsigned int mask_16x16_l = mask_16x16 & 0xff;
unsigned int mask_8x8_l = mask_8x8 & 0xff;
unsigned int mask_4x4_l = mask_4x4 & 0xff;
unsigned int mask_4x4_int_l = mask_4x4_int & 0xff;
// Disable filtering on the leftmost column.
// Disable filtering on the leftmost column.
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_vert_row2(
plane->subsampling_x, CONVERT_TO_SHORTPTR(dst->buf), dst->stride,
mask_16x16_l, mask_8x8_l, mask_4x4_l, mask_4x4_int_l, &cm->lf_info,
&lfm->lfl_uv[r << 1], (int)cm->bit_depth);
highbd_filter_selectively_vert_row2(plane->plane_type,
CONVERT_TO_SHORTPTR(dst->buf),
dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info, &lfm->lfl_y[r << 3],
(int)cm->bit_depth);
} else {
filter_selectively_vert_row2(
plane->subsampling_x, dst->buf, dst->stride,
mask_16x16_l, mask_8x8_l, mask_4x4_l, mask_4x4_int_l, &cm->lf_info,
&lfm->lfl_uv[r << 1]);
filter_selectively_vert_row2(plane->plane_type,
dst->buf, dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info,
&lfm->lfl_y[r << 3]);
}
#else
filter_selectively_vert_row2(
plane->subsampling_x, dst->buf, dst->stride,
mask_16x16_l, mask_8x8_l, mask_4x4_l, mask_4x4_int_l, &cm->lf_info,
&lfm->lfl_uv[r << 1]);
filter_selectively_vert_row2(plane->plane_type,
dst->buf, dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info, &lfm->lfl_y[r << 3]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 16 * dst->stride;
mask_16x16 >>= 16;
mask_8x8 >>= 16;
mask_4x4 >>= 16;
mask_4x4_int >>= 16;
}
// Horizontal pass
dst->buf = dst0;
mask_16x16 = lfm->above_y[TX_16X16];
mask_8x8 = lfm->above_y[TX_8X8];
mask_4x4 = lfm->above_y[TX_4X4];
mask_4x4_int = lfm->int_4x4_y;
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r++) {
unsigned int mask_16x16_r;
unsigned int mask_8x8_r;
unsigned int mask_4x4_r;
if (mi_row + r == 0) {
mask_16x16_r = 0;
mask_8x8_r = 0;
mask_4x4_r = 0;
} else {
mask_16x16_r = mask_16x16 & 0xff;
mask_8x8_r = mask_8x8 & 0xff;
mask_4x4_r = mask_4x4 & 0xff;
}
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_horiz(CONVERT_TO_SHORTPTR(dst->buf),
dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int & 0xff,
&cm->lf_info,
&lfm->lfl_y[r << 3],
(int)cm->bit_depth);
} else {
filter_selectively_horiz(dst->buf, dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int & 0xff,
&cm->lf_info,
&lfm->lfl_y[r << 3]);
}
#else
filter_selectively_horiz(dst->buf, dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int & 0xff,
&cm->lf_info,
&lfm->lfl_y[r << 3]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 16 * dst->stride;
dst->buf += 8 * dst->stride;
mask_16x16 >>= 8;
mask_8x8 >>= 8;
mask_4x4 >>= 8;
mask_4x4_int >>= 8;
}
}
} else {
uint16_t mask_16x16 = lfm->left_uv[TX_16X16];
uint16_t mask_8x8 = lfm->left_uv[TX_8X8];
uint16_t mask_4x4 = lfm->left_uv[TX_4X4];
uint16_t mask_4x4_int = lfm->int_4x4_uv;
// Horizontal pass
dst->buf = dst0;
mask_16x16 = lfm->above_uv[TX_16X16];
mask_8x8 = lfm->above_uv[TX_8X8];
mask_4x4 = lfm->above_uv[TX_4X4];
mask_4x4_int = lfm->int_4x4_uv;
// Vertical pass: do 2 rows at one time
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 4) {
if (plane->plane_type == 1) {
for (c = 0; c < (MI_BLOCK_SIZE >> 1); c++) {
lfm->lfl_uv[(r << 1) + c] = lfm->lfl_y[(r << 3) + (c << 1)];
lfm->lfl_uv[((r + 2) << 1) + c] = lfm->lfl_y[((r + 2) << 3) +
(c << 1)];
}
}
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 2) {
const int skip_border_4x4_r = mi_row + r == cm->mi_rows - 1;
const unsigned int mask_4x4_int_r =
skip_border_4x4_r ? 0 : (mask_4x4_int & 0xf);
unsigned int mask_16x16_r;
unsigned int mask_8x8_r;
unsigned int mask_4x4_r;
if (mi_row + r == 0) {
mask_16x16_r = 0;
mask_8x8_r = 0;
mask_4x4_r = 0;
} else {
mask_16x16_r = mask_16x16 & 0xf;
mask_8x8_r = mask_8x8 & 0xf;
mask_4x4_r = mask_4x4 & 0xf;
}
{
unsigned int mask_16x16_l = mask_16x16 & 0xff;
unsigned int mask_8x8_l = mask_8x8 & 0xff;
unsigned int mask_4x4_l = mask_4x4 & 0xff;
unsigned int mask_4x4_int_l = mask_4x4_int & 0xff;
// Disable filtering on the leftmost column.
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_horiz(CONVERT_TO_SHORTPTR(dst->buf),
dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int_r, &cm->lf_info,
&lfm->lfl_uv[r << 1], (int)cm->bit_depth);
} else {
filter_selectively_horiz(dst->buf, dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int_r, &cm->lf_info,
&lfm->lfl_uv[r << 1]);
}
if (cm->use_highbitdepth) {
highbd_filter_selectively_vert_row2(plane->plane_type,
CONVERT_TO_SHORTPTR(dst->buf),
dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info,
&lfm->lfl_uv[r << 1],
(int)cm->bit_depth);
} else {
filter_selectively_vert_row2(plane->plane_type,
dst->buf, dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info,
&lfm->lfl_uv[r << 1]);
}
#else
filter_selectively_horiz(dst->buf, dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int_r, &cm->lf_info,
&lfm->lfl_uv[r << 1]);
filter_selectively_vert_row2(plane->plane_type,
dst->buf, dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info,
&lfm->lfl_uv[r << 1]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 8 * dst->stride;
mask_16x16 >>= 4;
mask_8x8 >>= 4;
mask_4x4 >>= 4;
mask_4x4_int >>= 4;
dst->buf += 16 * dst->stride;
mask_16x16 >>= 8;
mask_8x8 >>= 8;
mask_4x4 >>= 8;
mask_4x4_int >>= 8;
}
}
// Horizontal pass
dst->buf = dst0;
mask_16x16 = lfm->above_uv[TX_16X16];
mask_8x8 = lfm->above_uv[TX_8X8];
mask_4x4 = lfm->above_uv[TX_4X4];
mask_4x4_int = lfm->int_4x4_uv;
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 2) {
const int skip_border_4x4_r = mi_row + r == cm->mi_rows - 1;
const unsigned int mask_4x4_int_r = skip_border_4x4_r ?
0 : (mask_4x4_int & 0xf);
unsigned int mask_16x16_r;
unsigned int mask_8x8_r;
unsigned int mask_4x4_r;
if (mi_row + r == 0) {
mask_16x16_r = 0;
mask_8x8_r = 0;
mask_4x4_r = 0;
} else {
mask_16x16_r = mask_16x16 & 0xf;
mask_8x8_r = mask_8x8 & 0xf;
mask_4x4_r = mask_4x4 & 0xf;
}
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_horiz(CONVERT_TO_SHORTPTR(dst->buf),
dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int_r,
&cm->lf_info,
&lfm->lfl_uv[r << 1],
(int)cm->bit_depth);
} else {
filter_selectively_horiz(dst->buf, dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int_r,
&cm->lf_info,
&lfm->lfl_uv[r << 1]);
}
#else
filter_selectively_horiz(dst->buf, dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int_r,
&cm->lf_info,
&lfm->lfl_uv[r << 1]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 8 * dst->stride;
mask_16x16 >>= 4;
mask_8x8 >>= 4;
mask_4x4 >>= 4;
mask_4x4_int >>= 4;
}
}
}
@@ -1531,19 +1576,11 @@ void vp9_loop_filter_rows(YV12_BUFFER_CONFIG *frame_buffer,
struct macroblockd_plane planes[MAX_MB_PLANE],
int start, int stop, int y_only) {
const int num_planes = y_only ? 1 : MAX_MB_PLANE;
enum lf_path path;
const int use_420 = y_only || (planes[1].subsampling_y == 1 &&
planes[1].subsampling_x == 1);
LOOP_FILTER_MASK lfm;
int mi_row, mi_col;
if (y_only)
path = LF_PATH_444;
else if (planes[1].subsampling_y == 1 && planes[1].subsampling_x == 1)
path = LF_PATH_420;
else if (planes[1].subsampling_y == 0 && planes[1].subsampling_x == 0)
path = LF_PATH_444;
else
path = LF_PATH_SLOW;
for (mi_row = start; mi_row < stop; mi_row += MI_BLOCK_SIZE) {
MODE_INFO *mi = cm->mi + mi_row * cm->mi_stride;
@@ -1553,23 +1590,16 @@ void vp9_loop_filter_rows(YV12_BUFFER_CONFIG *frame_buffer,
vp9_setup_dst_planes(planes, frame_buffer, mi_row, mi_col);
// TODO(JBB): Make setup_mask work for non 420.
vp9_setup_mask(cm, mi_row, mi_col, mi + mi_col, cm->mi_stride,
&lfm);
if (use_420)
vp9_setup_mask(cm, mi_row, mi_col, mi + mi_col, cm->mi_stride,
&lfm);
vp9_filter_block_plane_ss00(cm, &planes[0], mi_row, &lfm);
for (plane = 1; plane < num_planes; ++plane) {
switch (path) {
case LF_PATH_420:
vp9_filter_block_plane_ss11(cm, &planes[plane], mi_row, &lfm);
break;
case LF_PATH_444:
vp9_filter_block_plane_ss00(cm, &planes[plane], mi_row, &lfm);
break;
case LF_PATH_SLOW:
vp9_filter_block_plane_non420(cm, &planes[plane], mi + mi_col,
mi_row, mi_col);
break;
}
for (plane = 0; plane < num_planes; ++plane) {
if (use_420)
vp9_filter_block_plane(cm, &planes[plane], mi_row, &lfm);
else
vp9_filter_block_plane_non420(cm, &planes[plane], mi + mi_col,
mi_row, mi_col);
}
}
}


@@ -29,12 +29,6 @@ extern "C" {
#define MAX_REF_LF_DELTAS 4
#define MAX_MODE_LF_DELTAS 2
enum lf_path {
LF_PATH_420,
LF_PATH_444,
LF_PATH_SLOW,
};
struct loopfilter {
int filter_level;
@@ -98,15 +92,10 @@ void vp9_setup_mask(struct VP9Common *const cm,
MODE_INFO *mi_8x8, const int mode_info_stride,
LOOP_FILTER_MASK *lfm);
void vp9_filter_block_plane_ss00(struct VP9Common *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm);
void vp9_filter_block_plane_ss11(struct VP9Common *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm);
void vp9_filter_block_plane(struct VP9Common *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm);
void vp9_filter_block_plane_non420(struct VP9Common *cm,
struct macroblockd_plane *plane,


@@ -88,7 +88,7 @@ typedef struct {
int col;
} RefCntBuffer;
typedef struct BufferPool {
typedef struct {
// Protect BufferPool from being accessed by several FrameWorkers at
// the same time during frame parallel decode.
// TODO(hkuang): Try to use atomic variable instead of locking the whole pool.


@@ -91,7 +91,10 @@ void vp9_post_proc_down_and_across_c(const uint8_t *src_ptr,
int flimit) {
uint8_t const *p_src;
uint8_t *p_dst;
int row, col, i, v, kernel;
int row;
int col;
int i;
int v;
int pitch = src_pixels_per_line;
uint8_t d[8];
(void)dst_pixels_per_line;
@@ -102,8 +105,8 @@ void vp9_post_proc_down_and_across_c(const uint8_t *src_ptr,
p_dst = dst_ptr;
for (col = 0; col < cols; col++) {
kernel = 4;
v = p_src[col];
int kernel = 4;
int v = p_src[col];
for (i = -2; i <= 2; i++) {
if (abs(v - p_src[col + i * pitch]) > flimit)
@@ -125,7 +128,7 @@ void vp9_post_proc_down_and_across_c(const uint8_t *src_ptr,
d[i] = p_src[i];
for (col = 0; col < cols; col++) {
kernel = 4;
int kernel = 4;
v = p_src[col];
d[col & 7] = v;
@@ -165,7 +168,10 @@ void vp9_highbd_post_proc_down_and_across_c(const uint16_t *src_ptr,
int flimit) {
uint16_t const *p_src;
uint16_t *p_dst;
int row, col, i, v, kernel;
int row;
int col;
int i;
int v;
int pitch = src_pixels_per_line;
uint16_t d[8];
@@ -175,8 +181,8 @@ void vp9_highbd_post_proc_down_and_across_c(const uint16_t *src_ptr,
p_dst = dst_ptr;
for (col = 0; col < cols; col++) {
kernel = 4;
v = p_src[col];
int kernel = 4;
int v = p_src[col];
for (i = -2; i <= 2; i++) {
if (abs(v - p_src[col + i * pitch]) > flimit)
@@ -199,7 +205,7 @@ void vp9_highbd_post_proc_down_and_across_c(const uint16_t *src_ptr,
d[i] = p_src[i];
for (col = 0; col < cols; col++) {
kernel = 4;
int kernel = 4;
v = p_src[col];
d[col & 7] = v;
@@ -512,24 +518,22 @@ void vp9_denoise(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst,
assert((src->flags & YV12_FLAG_HIGHBITDEPTH) ==
(dst->flags & YV12_FLAG_HIGHBITDEPTH));
if (src->flags & YV12_FLAG_HIGHBITDEPTH) {
const uint16_t *const src_plane = CONVERT_TO_SHORTPTR(
srcs[i] + 2 * src_stride + 2);
uint16_t *const dst_plane = CONVERT_TO_SHORTPTR(
dsts[i] + 2 * dst_stride + 2);
vp9_highbd_post_proc_down_and_across(src_plane, dst_plane, src_stride,
dst_stride, src_height, src_width,
ppl);
const uint16_t *const src = CONVERT_TO_SHORTPTR(srcs[i] + 2 * src_stride
+ 2);
uint16_t *const dst = CONVERT_TO_SHORTPTR(dsts[i] + 2 * dst_stride + 2);
vp9_highbd_post_proc_down_and_across(src, dst, src_stride, dst_stride,
src_height, src_width, ppl);
} else {
const uint8_t *const src_plane = srcs[i] + 2 * src_stride + 2;
uint8_t *const dst_plane = dsts[i] + 2 * dst_stride + 2;
const uint8_t *const src = srcs[i] + 2 * src_stride + 2;
uint8_t *const dst = dsts[i] + 2 * dst_stride + 2;
vp9_post_proc_down_and_across(src_plane, dst_plane, src_stride,
dst_stride, src_height, src_width, ppl);
vp9_post_proc_down_and_across(src, dst, src_stride, dst_stride,
src_height, src_width, ppl);
}
#else
const uint8_t *const src_plane = srcs[i] + 2 * src_stride + 2;
uint8_t *const dst_plane = dsts[i] + 2 * dst_stride + 2;
vp9_post_proc_down_and_across(src_plane, dst_plane, src_stride, dst_stride,
const uint8_t *const src = srcs[i] + 2 * src_stride + 2;
uint8_t *const dst = dsts[i] + 2 * dst_stride + 2;
vp9_post_proc_down_and_across(src, dst, src_stride, dst_stride,
src_height, src_width, ppl);
#endif
}
@@ -554,15 +558,16 @@ static void fillrd(struct postproc_state *state, int q, int a) {
* a gaussian distribution with sigma determined by q.
*/
{
double i;
int next, j;
next = 0;
for (i = -32; i < 32; i++) {
int a_i = (int)(0.5 + 256 * gaussian(sigma, 0, i));
int a = (int)(0.5 + 256 * gaussian(sigma, 0, i));
if (a_i) {
for (j = 0; j < a_i; j++) {
if (a) {
for (j = 0; j < a; j++) {
char_dist[next + j] = (char) i;
}


@@ -30,25 +30,6 @@ const TX_TYPE intra_mode_to_tx_type_lookup[INTRA_MODES] = {
ADST_ADST, // TM
};
enum {
NEED_LEFT = 1 << 1,
NEED_ABOVE = 1 << 2,
NEED_ABOVERIGHT = 1 << 3,
};
static const uint8_t extend_modes[INTRA_MODES] = {
NEED_ABOVE | NEED_LEFT, // DC
NEED_ABOVE, // V
NEED_LEFT, // H
NEED_ABOVERIGHT, // D45
NEED_LEFT | NEED_ABOVE, // D135
NEED_LEFT | NEED_ABOVE, // D117
NEED_LEFT | NEED_ABOVE, // D153
NEED_LEFT, // D207
NEED_ABOVERIGHT, // D63
NEED_LEFT | NEED_ABOVE, // TM
};
// This serves as a wrapper function, so that all the prediction functions
// can be unified and accessed as a pointer array. Note that the boundary
// above and left are not necessarily used all the time.
@@ -809,106 +790,75 @@ static void build_intra_predictors(const MACROBLOCKD *xd, const uint8_t *ref,
x0 = (-xd->mb_to_left_edge >> (3 + pd->subsampling_x)) + x;
y0 = (-xd->mb_to_top_edge >> (3 + pd->subsampling_y)) + y;
// NEED_LEFT
if (extend_modes[mode] & NEED_LEFT) {
if (left_available) {
if (xd->mb_to_bottom_edge < 0) {
/* slower path if the block needs border extension */
if (y0 + bs <= frame_height) {
for (i = 0; i < bs; ++i)
left_col[i] = ref[i * ref_stride - 1];
} else {
const int extend_bottom = frame_height - y0;
for (i = 0; i < extend_bottom; ++i)
left_col[i] = ref[i * ref_stride - 1];
for (; i < bs; ++i)
left_col[i] = ref[(extend_bottom - 1) * ref_stride - 1];
}
} else {
/* faster path if the block does not need extension */
vpx_memset(left_col, 129, 64);
// left
if (left_available) {
if (xd->mb_to_bottom_edge < 0) {
/* slower path if the block needs border extension */
if (y0 + bs <= frame_height) {
for (i = 0; i < bs; ++i)
left_col[i] = ref[i * ref_stride - 1];
} else {
const int extend_bottom = frame_height - y0;
for (i = 0; i < extend_bottom; ++i)
left_col[i] = ref[i * ref_stride - 1];
for (; i < bs; ++i)
left_col[i] = ref[(extend_bottom - 1) * ref_stride - 1];
}
} else {
vpx_memset(left_col, 129, bs);
/* faster path if the block does not need extension */
for (i = 0; i < bs; ++i)
left_col[i] = ref[i * ref_stride - 1];
}
}
// NEED_ABOVE
if (extend_modes[mode] & NEED_ABOVE) {
if (up_available) {
const uint8_t *above_ref = ref - ref_stride;
if (xd->mb_to_right_edge < 0) {
/* slower path if the block needs border extension */
if (x0 + bs <= frame_width) {
vpx_memcpy(above_row, above_ref, bs);
} else if (x0 <= frame_width) {
const int r = frame_width - x0;
vpx_memcpy(above_row, above_ref, r);
vpx_memset(above_row + r, above_row[r - 1],
x0 + bs - frame_width);
}
} else {
/* faster path if the block does not need extension */
if (bs == 4 && right_available && left_available) {
const_above_row = above_ref;
// TODO(hkuang) do not extend 2*bs pixels for all modes.
// above
if (up_available) {
const uint8_t *above_ref = ref - ref_stride;
if (xd->mb_to_right_edge < 0) {
/* slower path if the block needs border extension */
if (x0 + 2 * bs <= frame_width) {
if (right_available && bs == 4) {
vpx_memcpy(above_row, above_ref, 2 * bs);
} else {
vpx_memcpy(above_row, above_ref, bs);
vpx_memset(above_row + bs, above_row[bs - 1], bs);
}
}
above_row[-1] = left_available ? above_ref[-1] : 129;
} else {
vpx_memset(above_row, 127, bs);
above_row[-1] = 127;
}
}
// NEED_ABOVERIGHT
if (extend_modes[mode] & NEED_ABOVERIGHT) {
if (up_available) {
const uint8_t *above_ref = ref - ref_stride;
if (xd->mb_to_right_edge < 0) {
/* slower path if the block needs border extension */
if (x0 + 2 * bs <= frame_width) {
if (right_available && bs == 4) {
vpx_memcpy(above_row, above_ref, 2 * bs);
} else {
vpx_memcpy(above_row, above_ref, bs);
vpx_memset(above_row + bs, above_row[bs - 1], bs);
}
} else if (x0 + bs <= frame_width) {
const int r = frame_width - x0;
if (right_available && bs == 4) {
vpx_memcpy(above_row, above_ref, r);
vpx_memset(above_row + r, above_row[r - 1],
x0 + 2 * bs - frame_width);
} else {
vpx_memcpy(above_row, above_ref, bs);
vpx_memset(above_row + bs, above_row[bs - 1], bs);
}
} else if (x0 <= frame_width) {
const int r = frame_width - x0;
} else if (x0 + bs <= frame_width) {
const int r = frame_width - x0;
if (right_available && bs == 4) {
vpx_memcpy(above_row, above_ref, r);
vpx_memset(above_row + r, above_row[r - 1],
x0 + 2 * bs - frame_width);
}
} else {
/* faster path if the block does not need extension */
if (bs == 4 && right_available && left_available) {
const_above_row = above_ref;
} else {
vpx_memcpy(above_row, above_ref, bs);
if (bs == 4 && right_available)
vpx_memcpy(above_row + bs, above_ref + bs, bs);
else
vpx_memset(above_row + bs, above_row[bs - 1], bs);
vpx_memset(above_row + bs, above_row[bs - 1], bs);
}
} else if (x0 <= frame_width) {
const int r = frame_width - x0;
vpx_memcpy(above_row, above_ref, r);
vpx_memset(above_row + r, above_row[r - 1],
x0 + 2 * bs - frame_width);
}
above_row[-1] = left_available ? above_ref[-1] : 129;
} else {
vpx_memset(above_row, 127, bs * 2);
above_row[-1] = 127;
/* faster path if the block does not need extension */
if (bs == 4 && right_available && left_available) {
const_above_row = above_ref;
} else {
vpx_memcpy(above_row, above_ref, bs);
if (bs == 4 && right_available)
vpx_memcpy(above_row + bs, above_ref + bs, bs);
else
vpx_memset(above_row + bs, above_row[bs - 1], bs);
above_row[-1] = left_available ? above_ref[-1] : 129;
}
}
} else {
vpx_memset(above_row, 127, bs * 2);
above_row[-1] = 127;
}
// predict


@@ -499,7 +499,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
specialize qw/vp9_highbd_d153_predictor_4x4/;
add_proto qw/void vp9_highbd_v_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vp9_highbd_v_predictor_4x4 neon/, "$sse_x86inc";
specialize qw/vp9_highbd_v_predictor_4x4/, "$sse_x86inc";
add_proto qw/void vp9_highbd_tm_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vp9_highbd_tm_predictor_4x4/, "$sse_x86inc";
@@ -577,7 +577,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
specialize qw/vp9_highbd_d153_predictor_16x16/;
add_proto qw/void vp9_highbd_v_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vp9_highbd_v_predictor_16x16 neon/, "$sse2_x86inc";
specialize qw/vp9_highbd_v_predictor_16x16/, "$sse2_x86inc";
add_proto qw/void vp9_highbd_tm_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vp9_highbd_tm_predictor_16x16/, "$sse2_x86_64";
@@ -1109,15 +1109,6 @@ specialize qw/vp9_avg_8x8 sse2 neon/;
add_proto qw/unsigned int vp9_avg_4x4/, "const uint8_t *, int p";
specialize qw/vp9_avg_4x4 sse2/;
add_proto qw/void vp9_hadamard_8x8/, "int16_t const *src_diff, int src_stride, int16_t *coeff";
specialize qw/vp9_hadamard_8x8 sse2/;
add_proto qw/void vp9_hadamard_16x16/, "int16_t const *src_diff, int src_stride, int16_t *coeff";
specialize qw/vp9_hadamard_16x16 sse2/;
add_proto qw/int16_t vp9_satd/, "const int16_t *coeff, int length";
specialize qw/vp9_satd sse2/;
add_proto qw/void vp9_int_pro_row/, "int16_t *hbuf, uint8_t const *ref, const int ref_stride, const int height";
specialize qw/vp9_int_pro_row sse2/;
@@ -1171,9 +1162,6 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
add_proto qw/int64_t vp9_block_error/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz";
specialize qw/vp9_block_error avx2/, "$sse2_x86inc";
add_proto qw/int64_t vp9_block_error_fp/, "const int16_t *coeff, const int16_t *dqcoeff, int block_size";
specialize qw/vp9_block_error_fp sse2/;
add_proto qw/void vp9_quantize_fp/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
specialize qw/vp9_quantize_fp neon sse2/, "$ssse3_x86_64";


@@ -13,7 +13,6 @@
#include "vp9/common/vp9_entropymode.h"
#include "vp9/common/vp9_thread_common.h"
#include "vp9/common/vp9_reconinter.h"
#include "vp9/common/vp9_loopfilter.h"
#if CONFIG_MULTITHREAD
static INLINE void mutex_lock(pthread_mutex_t *const mutex) {
@@ -93,17 +92,10 @@ void thread_loop_filter_rows(const YV12_BUFFER_CONFIG *const frame_buffer,
int start, int stop, int y_only,
VP9LfSync *const lf_sync) {
const int num_planes = y_only ? 1 : MAX_MB_PLANE;
const int use_420 = y_only || (planes[1].subsampling_y == 1 &&
planes[1].subsampling_x == 1);
const int sb_cols = mi_cols_aligned_to_sb(cm->mi_cols) >> MI_BLOCK_SIZE_LOG2;
int mi_row, mi_col;
enum lf_path path;
if (y_only)
path = LF_PATH_444;
else if (planes[1].subsampling_y == 1 && planes[1].subsampling_x == 1)
path = LF_PATH_420;
else if (planes[1].subsampling_y == 0 && planes[1].subsampling_x == 0)
path = LF_PATH_444;
else
path = LF_PATH_SLOW;
for (mi_row = start; mi_row < stop;
mi_row += lf_sync->num_workers * MI_BLOCK_SIZE) {
@@ -120,23 +112,16 @@ void thread_loop_filter_rows(const YV12_BUFFER_CONFIG *const frame_buffer,
vp9_setup_dst_planes(planes, frame_buffer, mi_row, mi_col);
// TODO(JBB): Make setup_mask work for non 420.
vp9_setup_mask(cm, mi_row, mi_col, mi + mi_col, cm->mi_stride,
&lfm);
if (use_420)
vp9_setup_mask(cm, mi_row, mi_col, mi + mi_col, cm->mi_stride,
&lfm);
vp9_filter_block_plane_ss00(cm, &planes[0], mi_row, &lfm);
for (plane = 1; plane < num_planes; ++plane) {
switch (path) {
case LF_PATH_420:
vp9_filter_block_plane_ss11(cm, &planes[plane], mi_row, &lfm);
break;
case LF_PATH_444:
vp9_filter_block_plane_ss00(cm, &planes[plane], mi_row, &lfm);
break;
case LF_PATH_SLOW:
vp9_filter_block_plane_non420(cm, &planes[plane], mi + mi_col,
mi_row, mi_col);
break;
}
for (plane = 0; plane < num_planes; ++plane) {
if (use_420)
vp9_filter_block_plane(cm, &planes[plane], mi_row, &lfm);
else
vp9_filter_block_plane_non420(cm, &planes[plane], mi + mi_col,
mi_row, mi_col);
}
sync_write(lf_sync, r, c, sb_cols);


@@ -1509,7 +1509,7 @@ static int read_compressed_header(VP9Decoder *pbi, const uint8_t *data,
if (vp9_reader_init(&r, data, partition_size, pbi->decrypt_cb,
pbi->decrypt_state))
vpx_internal_error(&cm->error, VPX_CODEC_MEM_ERROR,
"Failed to allocate boon decoder 0");
"Failed to allocate bool decoder 0");
cm->tx_mode = xd->lossless ? ONLY_4X4 : read_tx_mode(&r);
if (cm->tx_mode == TX_MODE_SELECT)


@@ -60,35 +60,6 @@ static int read_segment_id(vp9_reader *r, const struct segmentation *seg) {
return vp9_read_tree(r, vp9_segment_tree, seg->tree_probs);
}
static void read_tx_size_inter(VP9_COMMON *cm, MACROBLOCKD *xd,
TX_SIZE tx_size, int mi_row, int mi_col,
vp9_reader *r) {
MB_MODE_INFO *mbmi = &xd->mi[0].src_mi->mbmi;
int is_split = vp9_read_bit(r);
if (!is_split) {
mbmi->tx_size = tx_size;
} else {
BLOCK_SIZE bsize = txsize_to_bsize[tx_size];
int bh = num_8x8_blocks_high_lookup[bsize];
int i;
if (tx_size == TX_8X8) {
mbmi->tx_size = TX_4X4;
return;
}
for (i = 0; i < 4; ++i) {
int offsetr = (i >> 1) * bh / 2;
int offsetc = (i & 0x01) * bh / 2;
if ((mi_row + offsetr < cm->mi_rows) &&
(mi_col + offsetc < cm->mi_cols))
read_tx_size_inter(cm, xd, tx_size - 1,
mi_row + offsetr, mi_col + offsetc, r);
}
}
}
static TX_SIZE read_selected_tx_size(VP9_COMMON *cm, MACROBLOCKD *xd,
FRAME_COUNTS *counts,
TX_SIZE max_tx_size, vp9_reader *r) {
@@ -598,36 +569,13 @@ static void read_inter_frame_mode_info(VP9Decoder *const pbi,
MODE_INFO *const mi = xd->mi[0].src_mi;
MB_MODE_INFO *const mbmi = &mi->mbmi;
int inter_block;
BLOCK_SIZE bsize = mbmi->sb_type;
mbmi->mv[0].as_int = 0;
mbmi->mv[1].as_int = 0;
mbmi->segment_id = read_inter_segment_id(cm, xd, mi_row, mi_col, r);
mbmi->skip = read_skip(cm, xd, counts, mbmi->segment_id, r);
inter_block = read_is_inter_block(cm, xd, counts, mbmi->segment_id, r);
{
FILE *pf = fopen("dec_modes.txt", "a");
fprintf(pf, "pos (%d, %d), frame %d, range %d\n",
mi_row, mi_col, cm->current_video_frame, r->range);
fclose(pf);
}
if (mbmi->sb_type >= BLOCK_8X8 && cm->tx_mode == TX_MODE_SELECT &&
!mbmi->skip && inter_block) {
int txb_size = txsize_to_bsize[max_txsize_lookup[bsize]];
int bh = num_8x8_blocks_wide_lookup[txb_size];
int width = num_8x8_blocks_wide_lookup[bsize];
int height = num_8x8_blocks_high_lookup[bsize];
int idx, idy;
for (idy = 0; idy < height; idy += bh)
for (idx = 0; idx < width; idx += bh)
read_tx_size_inter(cm, xd, max_txsize_lookup[mbmi->sb_type],
mi_row + idy, mi_col + idx, r);
} else {
mbmi->tx_size = read_tx_size(cm, xd, counts,
!mbmi->skip || !inter_block, r);
}
mbmi->tx_size = read_tx_size(cm, xd, counts, !mbmi->skip || !inter_block, r);
if (inter_block)
read_inter_block_mode_info(pbi, xd, counts, tile, mi, mi_row, mi_col, r);


@@ -45,6 +45,17 @@ static INLINE int read_coeff(const vp9_prob *probs, int n, vp9_reader *r) {
return val;
}
static const vp9_tree_index coeff_subtree_high[TREE_SIZE(ENTROPY_TOKENS)] = {
2, 6, /* 0 = LOW_VAL */
-TWO_TOKEN, 4, /* 1 = TWO */
-THREE_TOKEN, -FOUR_TOKEN, /* 2 = THREE */
8, 10, /* 3 = HIGH_LOW */
-CATEGORY1_TOKEN, -CATEGORY2_TOKEN, /* 4 = CAT_ONE */
12, 14, /* 5 = CAT_THREEFOUR */
-CATEGORY3_TOKEN, -CATEGORY4_TOKEN, /* 6 = CAT_THREE */
-CATEGORY5_TOKEN, -CATEGORY6_TOKEN /* 7 = CAT_FIVE */
};
static int decode_coefs(VP9_COMMON *cm, const MACROBLOCKD *xd,
FRAME_COUNTS *counts, PLANE_TYPE type,
tran_low_t *dqcoeff, TX_SIZE tx_size, const int16_t *dq,
@@ -136,7 +147,7 @@ static int decode_coefs(VP9_COMMON *cm, const MACROBLOCKD *xd,
val = 1;
} else {
INCREMENT_COUNT(TWO_TOKEN);
token = vp9_read_tree(r, vp9_coef_con_tree,
token = vp9_read_tree(r, coeff_subtree_high,
vp9_pareto8_full[prob[PIVOT_NODE] - 1]);
switch (token) {
case TWO_TOKEN:

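For context on the coeff_subtree_high table shown above: a vp9_tree_index array encodes a binary tree in which non-negative entries index the next pair of children and negative entries are negated leaf tokens. A sketch of how such a table is walked, assuming a generic bit-reading callback rather than the actual vp9_reader API:

#include <stdint.h>

/* read_bit(prob, ctx) stands in for one arithmetic-decoder bit read. */
static int read_tree_sketch(const int8_t *tree, const uint8_t *probs,
                            int (*read_bit)(uint8_t prob, void *ctx),
                            void *ctx) {
  int8_t i = 0;
  do {
    i = tree[i + read_bit(probs[i >> 1], ctx)]; /* left (0) or right (1) child */
  } while (i > 0);
  return -i; /* leaf: the decoded token, e.g. THREE_TOKEN stored as -THREE_TOKEN */
}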

@@ -28,94 +28,6 @@ unsigned int vp9_avg_4x4_c(const uint8_t *s, int p) {
return (sum + 8) >> 4;
}
static void hadamard_col8(const int16_t *src_diff, int src_stride,
int16_t *coeff) {
int16_t b0 = src_diff[0 * src_stride] + src_diff[1 * src_stride];
int16_t b1 = src_diff[0 * src_stride] - src_diff[1 * src_stride];
int16_t b2 = src_diff[2 * src_stride] + src_diff[3 * src_stride];
int16_t b3 = src_diff[2 * src_stride] - src_diff[3 * src_stride];
int16_t b4 = src_diff[4 * src_stride] + src_diff[5 * src_stride];
int16_t b5 = src_diff[4 * src_stride] - src_diff[5 * src_stride];
int16_t b6 = src_diff[6 * src_stride] + src_diff[7 * src_stride];
int16_t b7 = src_diff[6 * src_stride] - src_diff[7 * src_stride];
int16_t c0 = b0 + b2;
int16_t c1 = b1 + b3;
int16_t c2 = b0 - b2;
int16_t c3 = b1 - b3;
int16_t c4 = b4 + b6;
int16_t c5 = b5 + b7;
int16_t c6 = b4 - b6;
int16_t c7 = b5 - b7;
coeff[0] = c0 + c4;
coeff[7] = c1 + c5;
coeff[3] = c2 + c6;
coeff[4] = c3 + c7;
coeff[2] = c0 - c4;
coeff[6] = c1 - c5;
coeff[1] = c2 - c6;
coeff[5] = c3 - c7;
}
void vp9_hadamard_8x8_c(int16_t const *src_diff, int src_stride,
int16_t *coeff) {
int idx;
int16_t buffer[64];
int16_t *tmp_buf = &buffer[0];
for (idx = 0; idx < 8; ++idx) {
hadamard_col8(src_diff, src_stride, tmp_buf);
tmp_buf += 8;
++src_diff;
}
tmp_buf = &buffer[0];
for (idx = 0; idx < 8; ++idx) {
hadamard_col8(tmp_buf, 8, coeff);
coeff += 8;
++tmp_buf;
}
}
// In place 16x16 2D Hadamard transform
void vp9_hadamard_16x16_c(int16_t const *src_diff, int src_stride,
int16_t *coeff) {
int idx;
for (idx = 0; idx < 4; ++idx) {
int16_t const *src_ptr = src_diff + (idx >> 1) * 8 * src_stride
+ (idx & 0x01) * 8;
vp9_hadamard_8x8_c(src_ptr, src_stride, coeff + idx * 64);
}
for (idx = 0; idx < 64; ++idx) {
int16_t a0 = coeff[0];
int16_t a1 = coeff[64];
int16_t a2 = coeff[128];
int16_t a3 = coeff[192];
int16_t b0 = a0 + a1;
int16_t b1 = a0 - a1;
int16_t b2 = a2 + a3;
int16_t b3 = a2 - a3;
coeff[0] = (b0 + b2) >> 1;
coeff[64] = (b1 + b3) >> 1;
coeff[128] = (b0 - b2) >> 1;
coeff[192] = (b1 - b3) >> 1;
++coeff;
}
}
int16_t vp9_satd_c(const int16_t *coeff, int length) {
int i;
int satd = 0;
for (i = 0; i < length; ++i)
satd += abs(coeff[i]);
return (int16_t)satd;
}
// Integer projection onto row vectors.
void vp9_int_pro_row_c(int16_t *hbuf, uint8_t const *ref,
const int ref_stride, const int height) {


@@ -76,35 +76,6 @@ static void prob_diff_update(const vp9_tree_index *tree,
vp9_cond_prob_diff_update(w, &probs[i], branch_ct[i]);
}
static void write_tx_size_inter(const VP9_COMMON *cm, const MACROBLOCKD *xd,
TX_SIZE tx_size, int mi_row, int mi_col,
vp9_writer *w) {
MB_MODE_INFO *mbmi = &xd->mi[0].src_mi->mbmi;
// TODO(jingning): this assumes support of the possible 64x64 transform.
if (tx_size == mbmi->tx_size) {
vp9_write_bit(w, 0);
} else { // further split
BLOCK_SIZE bsize = txsize_to_bsize[tx_size];
int bh = num_8x8_blocks_high_lookup[bsize];
int i;
vp9_write_bit(w, 1);
if (tx_size == TX_8X8)
return;
for (i = 0; i < 4; ++i) {
int offsetr = (i >> 1) * bh / 2;
int offsetc = (i & 0x01) * bh / 2;
if ((mi_row + offsetr < cm->mi_rows) &&
(mi_col + offsetc < cm->mi_cols))
write_tx_size_inter(cm, xd, tx_size - 1,
mi_row + offsetr, mi_col + offsetc, w);
}
}
}
static void write_selected_tx_size(const VP9_COMMON *cm,
const MACROBLOCKD *xd, vp9_writer *w) {
TX_SIZE tx_size = xd->mi[0].src_mi->mbmi.tx_size;
@@ -264,7 +235,6 @@ static void write_ref_frames(const VP9_COMMON *cm, const MACROBLOCKD *xd,
}
static void pack_inter_mode_mvs(VP9_COMP *cpi, const MODE_INFO *mi,
int mi_row, int mi_col,
vp9_writer *w) {
VP9_COMMON *const cm = &cpi->common;
const nmv_context *nmvc = &cm->fc->nmvc;
@@ -298,20 +268,9 @@ static void pack_inter_mode_mvs(VP9_COMP *cpi, const MODE_INFO *mi,
vp9_write(w, is_inter, vp9_get_intra_inter_prob(cm, xd));
if (bsize >= BLOCK_8X8 && cm->tx_mode == TX_MODE_SELECT &&
!(is_inter && skip)) {
if (!is_inter) {
write_selected_tx_size(cm, xd, w);
} else {
int txb_size = txsize_to_bsize[max_txsize_lookup[bsize]];
int bh = num_8x8_blocks_wide_lookup[txb_size];
int width = num_8x8_blocks_wide_lookup[bsize];
int height = num_8x8_blocks_high_lookup[bsize];
int idx, idy;
for (idy = 0; idy < height; idy += bh)
for (idx = 0; idx < width; idx += bh)
write_tx_size_inter(cm, xd, max_txsize_lookup[bsize],
mi_row + idy, mi_col + idx, w);
}
!(is_inter &&
(skip || vp9_segfeature_active(seg, segment_id, SEG_LVL_SKIP)))) {
write_selected_tx_size(cm, xd, w);
}
if (!is_inter) {
@@ -433,7 +392,7 @@ static void write_modes_b(VP9_COMP *cpi, const TileInfo *const tile,
if (frame_is_intra_only(cm)) {
write_mb_modes_kf(cm, xd, xd->mi, w);
} else {
pack_inter_mode_mvs(cpi, m, mi_row, mi_col, w);
pack_inter_mode_mvs(cpi, m, w);
}
assert(*tok < tok_end);
@@ -854,10 +813,6 @@ static void encode_txfm_probs(VP9_COMMON *cm, vp9_writer *w,
if (cm->tx_mode >= ALLOW_32X32)
vp9_write_bit(w, cm->tx_mode == TX_MODE_SELECT);
if (cm->tx_mode != TX_MODE_SELECT) {
int a = 10;
}
// Probabilities
if (cm->tx_mode == TX_MODE_SELECT) {
int i, j;


@@ -99,9 +99,9 @@ static const uint16_t VP9_HIGH_VAR_OFFS_12[64] = {
};
#endif // CONFIG_VP9_HIGHBITDEPTH
unsigned int vp9_get_sby_perpixel_variance(VP9_COMP *cpi,
const struct buf_2d *ref,
BLOCK_SIZE bs) {
static unsigned int get_sby_perpixel_variance(VP9_COMP *cpi,
const struct buf_2d *ref,
BLOCK_SIZE bs) {
unsigned int sse;
const unsigned int var = cpi->fn_ptr[bs].vf(ref->buf, ref->stride,
VP9_VAR_OFFS, 0, &sse);
@@ -109,7 +109,7 @@ unsigned int vp9_get_sby_perpixel_variance(VP9_COMP *cpi,
}
#if CONFIG_VP9_HIGHBITDEPTH
unsigned int vp9_high_get_sby_perpixel_variance(
static unsigned int high_get_sby_perpixel_variance(
VP9_COMP *cpi, const struct buf_2d *ref, BLOCK_SIZE bs, int bd) {
unsigned int var, sse;
switch (bd) {
@@ -165,6 +165,21 @@ static BLOCK_SIZE get_rd_var_based_fixed_partition(VP9_COMP *cpi, MACROBLOCK *x,
return BLOCK_8X8;
}
static BLOCK_SIZE get_nonrd_var_based_fixed_partition(VP9_COMP *cpi,
MACROBLOCK *x,
int mi_row,
int mi_col) {
unsigned int var = get_sby_perpixel_diff_variance(cpi, &x->plane[0].src,
mi_row, mi_col,
BLOCK_64X64);
if (var < 4)
return BLOCK_64X64;
else if (var < 10)
return BLOCK_32X32;
else
return BLOCK_16X16;
}
// Lighter version of set_offsets that only sets the mode info
// pointers.
static INLINE void set_mode_info_offsets(VP9_COMMON *const cm,
@@ -467,9 +482,9 @@ void vp9_set_vbp_thresholds(VP9_COMP *cpi, int q) {
} else {
VP9_COMMON *const cm = &cpi->common;
const int is_key_frame = (cm->frame_type == KEY_FRAME);
const int threshold_multiplier = is_key_frame ? 20 : 1;
const int threshold_multiplier = is_key_frame ? 80 : 4;
const int64_t threshold_base = (int64_t)(threshold_multiplier *
cpi->y_dequant[q][1]);
vp9_convert_qindex_to_q(q, cm->bit_depth));
// TODO(marpan): Allow 4x4 partitions for inter-frames.
// use_4x4_partition = (variance4x4downsample[i2 + j] == 1);
@@ -477,20 +492,21 @@ void vp9_set_vbp_thresholds(VP9_COMP *cpi, int q) {
// if variance of 16x16 block is very high, so use larger threshold
// for 16x16 (threshold_bsize_min) in that case.
if (is_key_frame) {
cpi->vbp_threshold_64x64 = threshold_base;
cpi->vbp_threshold_32x32 = threshold_base >> 2;
cpi->vbp_threshold_16x16 = threshold_base >> 2;
cpi->vbp_threshold_8x8 = threshold_base << 2;
cpi->vbp_threshold = threshold_base >> 2;
cpi->vbp_threshold_bsize_max = threshold_base;
cpi->vbp_threshold_bsize_min = threshold_base << 2;
cpi->vbp_threshold_16x16 = cpi->vbp_threshold;
cpi->vbp_bsize_min = BLOCK_8X8;
} else {
cpi->vbp_threshold_32x32 = threshold_base;
cpi->vbp_threshold = threshold_base;
if (cm->width <= 352 && cm->height <= 288) {
cpi->vbp_threshold_64x64 = threshold_base >> 2;
cpi->vbp_threshold_16x16 = threshold_base << 3;
cpi->vbp_threshold_bsize_max = threshold_base >> 2;
cpi->vbp_threshold_bsize_min = threshold_base << 3;
} else {
cpi->vbp_threshold_64x64 = threshold_base;
cpi->vbp_threshold_16x16 = threshold_base << cpi->oxcf.speed;
cpi->vbp_threshold_bsize_max = threshold_base;
cpi->vbp_threshold_bsize_min = threshold_base << cpi->oxcf.speed;
}
cpi->vbp_threshold_16x16 = cpi->vbp_threshold_bsize_min;
cpi->vbp_bsize_min = BLOCK_16X16;
}
}
@@ -544,10 +560,18 @@ static void choose_partitioning(VP9_COMP *cpi,
const YV12_BUFFER_CONFIG *yv12_g = get_ref_frame_buffer(cpi, GOLDEN_FRAME);
unsigned int y_sad, y_sad_g;
const BLOCK_SIZE bsize = BLOCK_32X32
+ (mi_col + 4 < cm->mi_cols) * 2 + (mi_row + 4 < cm->mi_rows);
BLOCK_SIZE bsize;
if (mi_row + 4 < cm->mi_rows && mi_col + 4 < cm->mi_cols)
bsize = BLOCK_64X64;
else if (mi_row + 4 < cm->mi_rows && mi_col + 4 >= cm->mi_cols)
bsize = BLOCK_32X64;
else if (mi_row + 4 >= cm->mi_rows && mi_col + 4 < cm->mi_cols)
bsize = BLOCK_64X32;
else
bsize = BLOCK_32X32;
assert(yv12 != NULL);
if (yv12_g && yv12_g != yv12) {
vp9_setup_pre_planes(xd, 0, yv12_g, mi_row, mi_col,
&cm->frame_refs[GOLDEN_FRAME - 1].sf);
@@ -668,7 +692,7 @@ static void choose_partitioning(VP9_COMP *cpi,
}
if (is_key_frame || (low_res &&
vt.split[i].split[j].part_variances.none.variance >
(cpi->vbp_threshold_32x32 << 1))) {
(cpi->vbp_threshold << 1))) {
// Go down to 4x4 down-sampling for variance.
variance4x4downsample[i2 + j] = 1;
for (k = 0; k < 4; k++) {
@@ -733,7 +757,7 @@ static void choose_partitioning(VP9_COMP *cpi,
// If variance of this 32x32 block is above the threshold, force the block
// to split. This also forces a split on the upper (64x64) level.
get_variance(&vt.split[i].part_variances.none);
if (vt.split[i].part_variances.none.variance > cpi->vbp_threshold_32x32) {
if (vt.split[i].part_variances.none.variance > cpi->vbp_threshold) {
force_split[i + 1] = 1;
force_split[0] = 1;
}
@@ -745,7 +769,7 @@ static void choose_partitioning(VP9_COMP *cpi,
// we get to one that's got a variance lower than our threshold.
if ( mi_col + 8 > cm->mi_cols || mi_row + 8 > cm->mi_rows ||
!set_vt_partitioning(cpi, xd, &vt, BLOCK_64X64, mi_row, mi_col,
cpi->vbp_threshold_64x64, BLOCK_16X16,
cpi->vbp_threshold_bsize_max, BLOCK_16X16,
force_split[0])) {
for (i = 0; i < 4; ++i) {
const int x32_idx = ((i & 1) << 2);
@@ -753,7 +777,7 @@ static void choose_partitioning(VP9_COMP *cpi,
const int i2 = i << 2;
if (!set_vt_partitioning(cpi, xd, &vt.split[i], BLOCK_32X32,
(mi_row + y32_idx), (mi_col + x32_idx),
cpi->vbp_threshold_32x32,
cpi->vbp_threshold,
BLOCK_16X16, force_split[i + 1])) {
for (j = 0; j < 4; ++j) {
const int x16_idx = ((j & 1) << 1);
@@ -777,7 +801,7 @@ static void choose_partitioning(VP9_COMP *cpi,
BLOCK_8X8,
mi_row + y32_idx + y16_idx + y8_idx,
mi_col + x32_idx + x16_idx + x8_idx,
cpi->vbp_threshold_8x8,
cpi->vbp_threshold_bsize_min,
BLOCK_8X8, 0)) {
set_block_size(cpi, xd,
(mi_row + y32_idx + y16_idx + y8_idx),
@@ -1049,15 +1073,13 @@ static void rd_pick_sb_modes(VP9_COMP *cpi,
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
x->source_variance =
vp9_high_get_sby_perpixel_variance(cpi, &x->plane[0].src,
bsize, xd->bd);
high_get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize, xd->bd);
} else {
x->source_variance =
vp9_get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize);
get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize);
}
#else
x->source_variance =
vp9_get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize);
x->source_variance = get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize);
#endif // CONFIG_VP9_HIGHBITDEPTH
// Save rdmult before it might be changed, so it can be restored later.
@@ -1081,9 +1103,8 @@ static void rd_pick_sb_modes(VP9_COMP *cpi,
} else if (aq_mode == CYCLIC_REFRESH_AQ) {
const uint8_t *const map = cm->seg.update_map ? cpi->segmentation_map
: cm->last_frame_seg_map;
// If segment is boosted, use rdmult for that segment.
if (cyclic_refresh_segment_id_boosted(
vp9_get_segment_id(cm, map, bsize, mi_row, mi_col)))
// If segment 1, use rdmult for that segment.
if (vp9_get_segment_id(cm, map, bsize, mi_row, mi_col))
x->rdmult = vp9_cyclic_refresh_get_rdmult(cpi->cyclic_refresh);
}
@@ -2821,9 +2842,6 @@ static MV_REFERENCE_FRAME get_frame_type(const VP9_COMP *cpi) {
static TX_MODE select_tx_mode(const VP9_COMP *cpi, MACROBLOCKD *const xd) {
if (xd->lossless)
return ONLY_4X4;
return TX_MODE_SELECT;
if (cpi->common.frame_type == KEY_FRAME &&
cpi->sf.use_nonrd_pick_mode &&
cpi->sf.partition_search_type == VAR_BASED_PARTITION)
@@ -2859,7 +2877,7 @@ static void nonrd_pick_sb_modes(VP9_COMP *cpi,
mbmi->sb_type = bsize;
if (cpi->oxcf.aq_mode == CYCLIC_REFRESH_AQ && cm->seg.enabled)
if (cyclic_refresh_segment_id_boosted(mbmi->segment_id))
if (mbmi->segment_id)
x->rdmult = vp9_cyclic_refresh_get_rdmult(cpi->cyclic_refresh);
if (cm->frame_type == KEY_FRAME)
@@ -4090,9 +4108,8 @@ static void encode_superblock(VP9_COMP *cpi, ThreadData *td,
if (cm->tx_mode == TX_MODE_SELECT &&
mbmi->sb_type >= BLOCK_8X8 &&
!(is_inter_block(mbmi) && (mbmi->skip || seg_skip))) {
if (!is_inter_block(mbmi))
++get_tx_counts(max_txsize_lookup[bsize], vp9_get_tx_size_context(xd),
&td->counts->tx)[mbmi->tx_size];
++get_tx_counts(max_txsize_lookup[bsize], vp9_get_tx_size_context(xd),
&td->counts->tx)[mbmi->tx_size];
} else {
int x, y;
TX_SIZE tx_size;

View File

@@ -126,25 +126,14 @@ void vp9_apply_active_map(VP9_COMP *cpi) {
assert(AM_SEGMENT_ID_ACTIVE == CR_SEGMENT_ID_BASE);
if (frame_is_intra_only(&cpi->common)) {
cpi->active_map.enabled = 0;
cpi->active_map.update = 1;
}
if (cpi->active_map.update) {
if (cpi->active_map.enabled) {
for (i = 0; i < cpi->common.mi_rows * cpi->common.mi_cols; ++i)
if (seg_map[i] == AM_SEGMENT_ID_ACTIVE) seg_map[i] = active_map[i];
vp9_enable_segmentation(seg);
vp9_enable_segfeature(seg, AM_SEGMENT_ID_INACTIVE, SEG_LVL_SKIP);
vp9_enable_segfeature(seg, AM_SEGMENT_ID_INACTIVE, SEG_LVL_ALT_LF);
// Setting the data to -MAX_LOOP_FILTER will result in the computed loop
// filter level being zero regardless of the value of seg->abs_delta.
vp9_set_segdata(seg, AM_SEGMENT_ID_INACTIVE,
SEG_LVL_ALT_LF, -MAX_LOOP_FILTER);
} else {
vp9_disable_segfeature(seg, AM_SEGMENT_ID_INACTIVE, SEG_LVL_SKIP);
vp9_disable_segfeature(seg, AM_SEGMENT_ID_INACTIVE, SEG_LVL_ALT_LF);
if (seg->enabled) {
seg->update_data = 1;
seg->update_map = 1;
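The hunk above marks inactive regions by enabling segmentation, giving the inactive segment SEG_LVL_SKIP, and pinning its loop-filter delta to -MAX_LOOP_FILTER so the computed filter level becomes zero. On the application side the map itself is supplied through the public encoder interface; a hedged sketch, assuming the usual VP8E_SET_ACTIVEMAP control and vpx_active_map_t layout (16x16 macroblock units) and omitting error handling:
#include <stdlib.h>
#include <string.h>
#include "vpx/vpx_encoder.h"
#include "vpx/vp8cx.h"
/* Mark only the centre of the frame as active; inactive blocks are coded as
 * skipped with loop filtering disabled, as in the segment setup above. */
static int set_centre_active(vpx_codec_ctx_t *codec, int width, int height) {
  vpx_active_map_t map;
  unsigned int r, c;
  int ret = -1;
  map.rows = (height + 15) / 16;
  map.cols = (width + 15) / 16;
  map.active_map = malloc(map.rows * map.cols);
  if (!map.active_map) return -1;
  memset(map.active_map, 0, map.rows * map.cols);          /* 0 = inactive */
  for (r = map.rows / 4; r < 3 * map.rows / 4; ++r)
    for (c = map.cols / 4; c < 3 * map.cols / 4; ++c)
      map.active_map[r * map.cols + c] = 1;                /* 1 = active   */
  if (vpx_codec_control(codec, VP8E_SET_ACTIVEMAP, &map) == VPX_CODEC_OK)
    ret = 0;
  free(map.active_map);
  return ret;
}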
@@ -183,33 +172,6 @@ int vp9_set_active_map(VP9_COMP* cpi,
}
}
int vp9_get_active_map(VP9_COMP* cpi,
unsigned char* new_map_16x16,
int rows,
int cols) {
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols &&
new_map_16x16) {
unsigned char* const seg_map_8x8 = cpi->segmentation_map;
const int mi_rows = cpi->common.mi_rows;
const int mi_cols = cpi->common.mi_cols;
vpx_memset(new_map_16x16, !cpi->active_map.enabled, rows * cols);
if (cpi->active_map.enabled) {
int r, c;
for (r = 0; r < mi_rows; ++r) {
for (c = 0; c < mi_cols; ++c) {
// Cyclic refresh segments are considered active despite not having
// AM_SEGMENT_ID_ACTIVE
new_map_16x16[(r >> 1) * cols + (c >> 1)] |=
seg_map_8x8[r * mi_cols + c] != AM_SEGMENT_ID_INACTIVE;
}
}
}
return 0;
} else {
return -1;
}
}
void vp9_set_high_precision_mv(VP9_COMP *cpi, int allow_high_precision_mv) {
MACROBLOCK *const mb = &cpi->td.mb;
cpi->common.allow_high_precision_mv = allow_high_precision_mv;
@@ -341,10 +303,7 @@ static void dealloc_compressor_data(VP9_COMP *cpi) {
vpx_free(cpi->active_map.map);
cpi->active_map.map = NULL;
vp9_free_ref_frame_buffers(cm->buffer_pool);
#if CONFIG_VP9_POSTPROC
vp9_free_postproc_buffers(cm);
#endif
vp9_free_ref_frame_buffers(cm);
vp9_free_context_buffers(cm);
vp9_free_frame_buffer(&cpi->last_frame_uf);
@@ -1949,10 +1908,6 @@ void vp9_remove_compressor(VP9_COMP *cpi) {
#endif
vp9_remove_common(cm);
vp9_free_ref_frame_buffers(cm->buffer_pool);
#if CONFIG_VP9_POSTPROC
vp9_free_postproc_buffers(cm);
#endif
vpx_free(cpi);
#if CONFIG_VP9_TEMPORAL_DENOISING

View File

@@ -416,7 +416,6 @@ typedef struct VP9_COMP {
double total_ssimg_all;
int b_calculate_ssimg;
int dummy_writing;
#endif
int b_calculate_psnr;
@@ -461,10 +460,10 @@ typedef struct VP9_COMP {
int resize_pending;
// VAR_BASED_PARTITION thresholds
int64_t vbp_threshold_64x64;
int64_t vbp_threshold_32x32;
int64_t vbp_threshold;
int64_t vbp_threshold_bsize_min;
int64_t vbp_threshold_bsize_max;
int64_t vbp_threshold_16x16;
int64_t vbp_threshold_8x8;
BLOCK_SIZE vbp_bsize_min;
// Multi-threading
@@ -509,8 +508,6 @@ int vp9_update_entropy(VP9_COMP *cpi, int update);
int vp9_set_active_map(VP9_COMP *cpi, unsigned char *map, int rows, int cols);
int vp9_get_active_map(VP9_COMP *cpi, unsigned char *map, int rows, int cols);
int vp9_set_internal_size(VP9_COMP *cpi,
VPX_SCALING horiz_mode, VPX_SCALING vert_mode);

View File

@@ -38,7 +38,7 @@
#define OUTPUT_FPF 0
#define ARF_STATS_OUTPUT 0
#define GROUP_ADAPTIVE_MAXQ 1
#define GROUP_ADAPTIVE_MAXQ 0
#define BOOST_BREAKOUT 12.5
#define BOOST_FACTOR 12.5
@@ -61,9 +61,12 @@
#define RC_FACTOR_MAX 1.75
#define INTRA_WEIGHT_EXPERIMENT 0
#if INTRA_WEIGHT_EXPERIMENT
#define NCOUNT_INTRA_THRESH 8192
#define NCOUNT_INTRA_FACTOR 3
#define NCOUNT_FRAME_II_THRESH 5.0
#endif
#define DOUBLE_DIVIDE_CHECK(x) ((x) < 0 ? (x) - 0.000001 : (x) + 0.000001)
@@ -829,6 +832,7 @@ void vp9_first_pass(VP9_COMP *cpi, const struct lookahead_entry *source) {
// Keep a count of cases where the inter and intra were very close
// and very low. This helps with scene cut detection for example in
// cropped clips with black bars at the sides or top and bottom.
#if INTRA_WEIGHT_EXPERIMENT
if (((this_error - intrapenalty) * 9 <= motion_error * 10) &&
(this_error < (2 * intrapenalty))) {
neutral_count += 1.0;
@@ -839,6 +843,12 @@ void vp9_first_pass(VP9_COMP *cpi, const struct lookahead_entry *source) {
neutral_count += (double)motion_error /
DOUBLE_DIVIDE_CHECK((double)this_error);
}
#else
if (((this_error - intrapenalty) * 9 <= motion_error * 10) &&
(this_error < (2 * intrapenalty))) {
neutral_count += 1.0;
}
#endif
mv.row *= 8;
mv.col *= 8;
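Both variants above increment neutral_count when the intra and inter errors for a block are close and both small; this statistic later feeds scene-cut detection. A hedged helper expressing that predicate (the name is illustrative; the types mirror the integer errors used above):
/* A block is "neutral" when the intra error (less the fixed intra penalty)
 * is within roughly 10/9 of the best motion error and is small in absolute
 * terms, i.e. neither predictor is clearly better. */
static int is_neutral_block(int this_error, int motion_error,
                            int intrapenalty) {
  return ((this_error - intrapenalty) * 9 <= motion_error * 10) &&
         (this_error < 2 * intrapenalty);
}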
@@ -1281,10 +1291,11 @@ static double get_sr_decay_rate(const VP9_COMP *cpi,
frame->pcnt_motion * ((frame->mvc_abs + frame->mvr_abs) / 2);
modified_pct_inter = frame->pcnt_inter;
#if INTRA_WEIGHT_EXPERIMENT
if ((frame->intra_error / DOUBLE_DIVIDE_CHECK(frame->coded_error)) <
(double)NCOUNT_FRAME_II_THRESH) {
(double)NCOUNT_FRAME_II_THRESH)
modified_pct_inter = frame->pcnt_inter - frame->pcnt_neutral;
}
#endif
modified_pcnt_intra = 100 * (1.0 - modified_pct_inter);

View File

@@ -20,11 +20,9 @@
#include "vp9/common/vp9_blockd.h"
#include "vp9/common/vp9_common.h"
#include "vp9/common/vp9_mvref_common.h"
#include "vp9/common/vp9_pred_common.h"
#include "vp9/common/vp9_reconinter.h"
#include "vp9/common/vp9_reconintra.h"
#include "vp9/encoder/vp9_cost.h"
#include "vp9/encoder/vp9_encoder.h"
#include "vp9/encoder/vp9_pickmode.h"
#include "vp9/encoder/vp9_ratectrl.h"
@@ -190,8 +188,6 @@ static int combined_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
cond_cost_list(cpi, cost_list),
x->nmvjointcost, x->mvcost,
&dis, &x->pred_sse[ref], NULL, 0, 0);
*rate_mv = vp9_mv_bit_cost(&tmp_mv->as_mv, &ref_mv,
x->nmvjointcost, x->mvcost, MV_COST_WEIGHT);
}
if (scaled_ref_frame) {
@@ -202,247 +198,6 @@ static int combined_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
return rv;
}
static void block_variance(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride,
int w, int h, unsigned int *sse, int *sum,
int block_size, unsigned int *sse8x8,
int *sum8x8, unsigned int *var8x8) {
int i, j, k = 0;
*sse = 0;
*sum = 0;
for (i = 0; i < h; i += block_size) {
for (j = 0; j < w; j += block_size) {
vp9_get8x8var(src + src_stride * i + j, src_stride,
ref + ref_stride * i + j, ref_stride,
&sse8x8[k], &sum8x8[k]);
*sse += sse8x8[k];
*sum += sum8x8[k];
var8x8[k] = sse8x8[k] - (((unsigned int)sum8x8[k] * sum8x8[k]) >> 6);
k++;
}
}
}
static void calculate_variance(int bw, int bh, TX_SIZE tx_size,
unsigned int *sse_i, int *sum_i,
unsigned int *var_o, unsigned int *sse_o,
int *sum_o) {
const BLOCK_SIZE unit_size = txsize_to_bsize[tx_size];
const int nw = 1 << (bw - b_width_log2_lookup[unit_size]);
const int nh = 1 << (bh - b_height_log2_lookup[unit_size]);
int i, j, k = 0;
for (i = 0; i < nh; i += 2) {
for (j = 0; j < nw; j += 2) {
sse_o[k] = sse_i[i * nw + j] + sse_i[i * nw + j + 1] +
sse_i[(i + 1) * nw + j] + sse_i[(i + 1) * nw + j + 1];
sum_o[k] = sum_i[i * nw + j] + sum_i[i * nw + j + 1] +
sum_i[(i + 1) * nw + j] + sum_i[(i + 1) * nw + j + 1];
var_o[k] = sse_o[k] - (((unsigned int)sum_o[k] * sum_o[k]) >>
(b_width_log2_lookup[unit_size] +
b_height_log2_lookup[unit_size] + 6));
k++;
}
}
}
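block_variance and calculate_variance above carry per-block second moments (sse, sum) and turn them into a variance via var = sse - sum^2 / N, where N is the sample count; the >> 6 in the 8x8 case is the division by 64 samples, and the aggregated sizes just add the corresponding shift. A one-line sketch of that identity (name and parameterization are illustrative):
/* Unnormalized variance of a block from its sum of squares and sum:
 * for an 8x8 block, log2_samples is 6 (64 samples). */
static unsigned int block_var_from_moments(unsigned int sse, int sum,
                                           int log2_samples) {
  return sse - (((unsigned int)sum * sum) >> log2_samples);
}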
static void model_rd_for_sb_y_large(VP9_COMP *cpi, BLOCK_SIZE bsize,
MACROBLOCK *x, MACROBLOCKD *xd,
int *out_rate_sum, int64_t *out_dist_sum,
unsigned int *var_y, unsigned int *sse_y,
int mi_row, int mi_col, int *early_term) {
// Note our transform coeffs are 8 times an orthogonal transform.
// Hence quantizer step is also 8 times. To get effective quantizer
// we need to divide by 8 before sending to modeling function.
unsigned int sse;
int rate;
int64_t dist;
struct macroblock_plane *const p = &x->plane[0];
struct macroblockd_plane *const pd = &xd->plane[0];
const uint32_t dc_quant = pd->dequant[0];
const uint32_t ac_quant = pd->dequant[1];
const int64_t dc_thr = dc_quant * dc_quant >> 6;
const int64_t ac_thr = ac_quant * ac_quant >> 6;
unsigned int var;
int sum;
int skip_dc = 0;
const int bw = b_width_log2_lookup[bsize];
const int bh = b_height_log2_lookup[bsize];
const int num8x8 = 1 << (bw + bh - 2);
unsigned int sse8x8[64] = {0};
int sum8x8[64] = {0};
unsigned int var8x8[64] = {0};
TX_SIZE tx_size;
int i, k;
// Calculate variance for whole partition, and also save 8x8 blocks' variance
// to be used in following transform skipping test.
block_variance(p->src.buf, p->src.stride, pd->dst.buf, pd->dst.stride,
4 << bw, 4 << bh, &sse, &sum, 8, sse8x8, sum8x8, var8x8);
var = sse - (((int64_t)sum * sum) >> (bw + bh + 4));
*var_y = var;
*sse_y = sse;
if (cpi->common.tx_mode == TX_MODE_SELECT) {
if (sse > (var << 2))
tx_size = MIN(max_txsize_lookup[bsize],
tx_mode_to_biggest_tx_size[cpi->common.tx_mode]);
else
tx_size = TX_8X8;
if (cpi->sf.partition_search_type == VAR_BASED_PARTITION) {
if (cpi->oxcf.aq_mode == CYCLIC_REFRESH_AQ &&
cyclic_refresh_segment_id_boosted(xd->mi[0].src_mi->mbmi.segment_id))
tx_size = TX_8X8;
else if (tx_size > TX_16X16)
tx_size = TX_16X16;
}
} else {
tx_size = MIN(max_txsize_lookup[bsize],
tx_mode_to_biggest_tx_size[cpi->common.tx_mode]);
}
assert(tx_size >= TX_8X8);
xd->mi[0].src_mi->mbmi.tx_size = tx_size;
// Evaluate if the partition block is a skippable block in Y plane.
{
unsigned int sse16x16[16] = {0};
int sum16x16[16] = {0};
unsigned int var16x16[16] = {0};
const int num16x16 = num8x8 >> 2;
unsigned int sse32x32[4] = {0};
int sum32x32[4] = {0};
unsigned int var32x32[4] = {0};
const int num32x32 = num8x8 >> 4;
int ac_test = 1;
int dc_test = 1;
const int num = (tx_size == TX_8X8) ? num8x8 :
((tx_size == TX_16X16) ? num16x16 : num32x32);
const unsigned int *sse_tx = (tx_size == TX_8X8) ? sse8x8 :
((tx_size == TX_16X16) ? sse16x16 : sse32x32);
const unsigned int *var_tx = (tx_size == TX_8X8) ? var8x8 :
((tx_size == TX_16X16) ? var16x16 : var32x32);
// Calculate variance if tx_size > TX_8X8
if (tx_size >= TX_16X16)
calculate_variance(bw, bh, TX_8X8, sse8x8, sum8x8, var16x16, sse16x16,
sum16x16);
if (tx_size == TX_32X32)
calculate_variance(bw, bh, TX_16X16, sse16x16, sum16x16, var32x32,
sse32x32, sum32x32);
// Skipping test
x->skip_txfm[0] = 0;
for (k = 0; k < num; k++)
// Check if all ac coefficients can be quantized to zero.
if (!(var_tx[k] < ac_thr || var == 0)) {
ac_test = 0;
break;
}
for (k = 0; k < num; k++)
// Check if dc coefficient can be quantized to zero.
if (!(sse_tx[k] - var_tx[k] < dc_thr || sse == var)) {
dc_test = 0;
break;
}
if (ac_test) {
x->skip_txfm[0] = 2;
if (dc_test)
x->skip_txfm[0] = 1;
} else if (dc_test) {
skip_dc = 1;
}
}
if (x->skip_txfm[0] == 1) {
int skip_uv[2] = {0};
unsigned int var_uv[2];
unsigned int sse_uv[2];
*out_rate_sum = 0;
*out_dist_sum = sse << 4;
// Transform skipping test in UV planes.
for (i = 1; i <= 2; i++) {
struct macroblock_plane *const p = &x->plane[i];
struct macroblockd_plane *const pd = &xd->plane[i];
const TX_SIZE uv_tx_size = get_uv_tx_size(&xd->mi[0].src_mi->mbmi, pd);
const BLOCK_SIZE unit_size = txsize_to_bsize[uv_tx_size];
const int sf = (bw - b_width_log2_lookup[unit_size]) +
(bh - b_height_log2_lookup[unit_size]);
const BLOCK_SIZE bs = get_plane_block_size(bsize, pd);
const uint32_t uv_dc_thr = pd->dequant[0] * pd->dequant[0] >> (6 - sf);
const uint32_t uv_ac_thr = pd->dequant[1] * pd->dequant[1] >> (6 - sf);
int j = i - 1;
vp9_build_inter_predictors_sbp(xd, mi_row, mi_col, bsize, i);
var_uv[j] = cpi->fn_ptr[bs].vf(p->src.buf, p->src.stride,
pd->dst.buf, pd->dst.stride, &sse_uv[j]);
if (var_uv[j] < uv_ac_thr || var_uv[j] == 0) {
if (sse_uv[j] - var_uv[j] < uv_dc_thr || sse_uv[j] == var_uv[j])
skip_uv[j] = 1;
}
}
// If the transforms in the YUV planes are skippable, the mode search checks
// fewer inter modes and doesn't check intra modes.

if (skip_uv[0] & skip_uv[1]) {
*early_term = 1;
}
return;
}
if (!skip_dc) {
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
vp9_model_rd_from_var_lapndz(sse - var, num_pels_log2_lookup[bsize],
dc_quant >> (xd->bd - 5), &rate, &dist);
} else {
vp9_model_rd_from_var_lapndz(sse - var, num_pels_log2_lookup[bsize],
dc_quant >> 3, &rate, &dist);
}
#else
vp9_model_rd_from_var_lapndz(sse - var, num_pels_log2_lookup[bsize],
dc_quant >> 3, &rate, &dist);
#endif // CONFIG_VP9_HIGHBITDEPTH
}
if (!skip_dc) {
*out_rate_sum = rate >> 1;
*out_dist_sum = dist << 3;
} else {
*out_rate_sum = 0;
*out_dist_sum = (sse - var) << 4;
}
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
vp9_model_rd_from_var_lapndz(var, num_pels_log2_lookup[bsize],
ac_quant >> (xd->bd - 5), &rate, &dist);
} else {
vp9_model_rd_from_var_lapndz(var, num_pels_log2_lookup[bsize],
ac_quant >> 3, &rate, &dist);
}
#else
vp9_model_rd_from_var_lapndz(var, num_pels_log2_lookup[bsize],
ac_quant >> 3, &rate, &dist);
#endif // CONFIG_VP9_HIGHBITDEPTH
*out_rate_sum += rate;
*out_dist_sum += dist << 4;
}
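The skip logic above compares per-transform-block variances against thresholds derived from the dequantizer: because the transform coefficients are 8 times an orthonormal transform, the effective quantizer step is dequant/8, so the energy thresholds become dequant^2 >> 6. A hedged scalar sketch of that test, with illustrative names:
#include <stdint.h>
/* skip_ac: every sub-block's AC variance is below ac_thr, so AC coefficients
 * are expected to quantize to zero; skip_dc: the mean energy (sse - var) of
 * every sub-block is below dc_thr. */
static void skip_test_sketch(const unsigned int *sse_tx,
                             const unsigned int *var_tx, int num_blocks,
                             unsigned int sse, unsigned int var,
                             uint32_t dc_quant, uint32_t ac_quant,
                             int *skip_ac, int *skip_dc) {
  const int64_t dc_thr = (int64_t)dc_quant * dc_quant >> 6;
  const int64_t ac_thr = (int64_t)ac_quant * ac_quant >> 6;
  int k;
  *skip_ac = 1;
  *skip_dc = 1;
  for (k = 0; k < num_blocks; ++k) {
    if (!(var_tx[k] < ac_thr || var == 0)) *skip_ac = 0;
    if (!(sse_tx[k] - var_tx[k] < dc_thr || sse == var)) *skip_dc = 0;
  }
}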
static void model_rd_for_sb_y(VP9_COMP *cpi, BLOCK_SIZE bsize,
MACROBLOCK *x, MACROBLOCKD *xd,
@@ -557,132 +312,6 @@ static void model_rd_for_sb_y(VP9_COMP *cpi, BLOCK_SIZE bsize,
*out_dist_sum += dist << 4;
}
#if CONFIG_VP9_HIGHBITDEPTH
static void block_yrd(VP9_COMP *cpi, MACROBLOCK *x, int *rate, int64_t *dist,
int *skippable, int64_t *sse, int plane,
BLOCK_SIZE bsize, TX_SIZE tx_size) {
MACROBLOCKD *xd = &x->e_mbd;
unsigned int var_y, sse_y;
(void)plane;
(void)tx_size;
model_rd_for_sb_y(cpi, bsize, x, xd, rate, dist, &var_y, &sse_y);
*sse = INT_MAX;
*skippable = 0;
return;
}
#else
static void block_yrd(VP9_COMP *cpi, MACROBLOCK *x, int *rate, int64_t *dist,
int *skippable, int64_t *sse, int plane,
BLOCK_SIZE bsize, TX_SIZE tx_size) {
MACROBLOCKD *xd = &x->e_mbd;
const struct macroblockd_plane *pd = &xd->plane[plane];
const struct macroblock_plane *const p = &x->plane[plane];
const int num_4x4_w = num_4x4_blocks_wide_lookup[bsize];
const int num_4x4_h = num_4x4_blocks_high_lookup[bsize];
const int step = 1 << (tx_size << 1);
const int block_step = (1 << tx_size);
int block = 0, r, c;
int shift = tx_size == TX_32X32 ? 0 : 2;
const int max_blocks_wide = num_4x4_w + (xd->mb_to_right_edge >= 0 ? 0 :
xd->mb_to_right_edge >> (5 + pd->subsampling_x));
const int max_blocks_high = num_4x4_h + (xd->mb_to_bottom_edge >= 0 ? 0 :
xd->mb_to_bottom_edge >> (5 + pd->subsampling_y));
int eob_cost = 0;
(void)cpi;
vp9_subtract_plane(x, bsize, plane);
*skippable = 1;
// Keep track of the row and column of the blocks we use so that we know
// if we are in the unrestricted motion border.
for (r = 0; r < max_blocks_high; r += block_step) {
for (c = 0; c < num_4x4_w; c += block_step) {
if (c < max_blocks_wide) {
const scan_order *const scan_order = &vp9_default_scan_orders[tx_size];
tran_low_t *const coeff = BLOCK_OFFSET(p->coeff, block);
tran_low_t *const qcoeff = BLOCK_OFFSET(p->qcoeff, block);
tran_low_t *const dqcoeff = BLOCK_OFFSET(pd->dqcoeff, block);
uint16_t *const eob = &p->eobs[block];
const int diff_stride = 4 * num_4x4_blocks_wide_lookup[bsize];
const int16_t *src_diff;
src_diff = &p->src_diff[(r * diff_stride + c) << 2];
switch (tx_size) {
case TX_32X32:
vp9_fdct32x32_rd(src_diff, coeff, diff_stride);
vp9_quantize_fp_32x32(coeff, 1024, x->skip_block, p->zbin,
p->round_fp, p->quant_fp, p->quant_shift,
qcoeff, dqcoeff, pd->dequant, eob,
scan_order->scan, scan_order->iscan);
break;
case TX_16X16:
vp9_hadamard_16x16(src_diff, diff_stride, (int16_t *)coeff);
vp9_quantize_fp(coeff, 256, x->skip_block, p->zbin, p->round_fp,
p->quant_fp, p->quant_shift, qcoeff, dqcoeff,
pd->dequant, eob,
scan_order->scan, scan_order->iscan);
break;
case TX_8X8:
vp9_hadamard_8x8(src_diff, diff_stride, (int16_t *)coeff);
vp9_quantize_fp(coeff, 64, x->skip_block, p->zbin, p->round_fp,
p->quant_fp, p->quant_shift, qcoeff, dqcoeff,
pd->dequant, eob,
scan_order->scan, scan_order->iscan);
break;
case TX_4X4:
x->fwd_txm4x4(src_diff, coeff, diff_stride);
vp9_quantize_fp(coeff, 16, x->skip_block, p->zbin, p->round_fp,
p->quant_fp, p->quant_shift, qcoeff, dqcoeff,
pd->dequant, eob,
scan_order->scan, scan_order->iscan);
break;
default:
assert(0);
break;
}
*skippable &= (*eob == 0);
eob_cost += 1;
}
block += step;
}
}
if (*skippable && *sse < INT64_MAX) {
*rate = 0;
*dist = (*sse << 6) >> shift;
*sse = *dist;
return;
}
block = 0;
*rate = 0;
*dist = 0;
*sse = (*sse << 6) >> shift;
for (r = 0; r < max_blocks_high; r += block_step) {
for (c = 0; c < num_4x4_w; c += block_step) {
if (c < max_blocks_wide) {
tran_low_t *const coeff = BLOCK_OFFSET(p->coeff, block);
tran_low_t *const qcoeff = BLOCK_OFFSET(p->qcoeff, block);
tran_low_t *const dqcoeff = BLOCK_OFFSET(pd->dqcoeff, block);
uint16_t *const eob = &p->eobs[block];
if (*eob == 1)
*rate += (int)abs(qcoeff[0]);
else if (*eob > 1)
*rate += (int)vp9_satd((const int16_t *)qcoeff, step << 4);
*dist += vp9_block_error_fp(coeff, dqcoeff, step << 4) >> shift;
}
block += step;
}
}
if (*skippable == 0) {
*rate <<= 10;
*rate += (eob_cost << 8);
}
}
#endif
static void model_rd_for_sb_uv(VP9_COMP *cpi, BLOCK_SIZE bsize,
MACROBLOCK *x, MACROBLOCKD *xd,
int *out_rate_sum, int64_t *out_dist_sum,
@@ -889,9 +518,7 @@ static void estimate_block_intra(int plane, int block, BLOCK_SIZE plane_bsize,
int i, j;
int rate;
int64_t dist;
int64_t this_sse = INT64_MAX;
int is_skippable;
unsigned int var_y, sse_y;
txfrm_block_to_raster_xy(plane_bsize, tx_size, block, &i, &j);
assert(plane == 0);
(void) plane;
@@ -906,13 +533,8 @@ static void estimate_block_intra(int plane, int block, BLOCK_SIZE plane_bsize,
x->skip_encode ? src_stride : dst_stride,
pd->dst.buf, dst_stride,
i, j, 0);
// TODO(jingning): This needs further refactoring.
block_yrd(cpi, x, &rate, &dist, &is_skippable, &this_sse, 0,
bsize_tx, MIN(tx_size, TX_16X16));
x->skip_txfm[0] = is_skippable;
rate += vp9_cost_bit(vp9_get_skip_prob(&cpi->common, xd), is_skippable);
// This procedure assumes zero offset from p->src.buf and pd->dst.buf.
model_rd_for_sb_y(cpi, bsize_tx, x, xd, &rate, &dist, &var_y, &sse_y);
p->src.buf = src_buf_base;
pd->dst.buf = dst_buf_base;
args->rate += rate;
@@ -980,6 +602,10 @@ void vp9_pick_intra_mode(VP9_COMP *cpi, MACROBLOCK *x, RD_COST *rd_cost,
*rd_cost = best_rdc;
}
static const int ref_frame_cost[MAX_REF_FRAMES] = {
1235, 229, 530, 615,
};
typedef struct {
MV_REFERENCE_FRAME ref_frame;
PREDICTION_MODE pred_mode;
@@ -1056,21 +682,6 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
int ref_frame_skip_mask = 0;
int idx;
int best_pred_sad = INT_MAX;
int best_early_term = 0;
int ref_frame_cost[MAX_REF_FRAMES];
vp9_prob intra_inter_p = vp9_get_intra_inter_prob(cm, xd);
vp9_prob ref_single_p1 = vp9_get_pred_prob_single_ref_p1(cm, xd);
vp9_prob ref_single_p2 = vp9_get_pred_prob_single_ref_p2(cm, xd);
ref_frame_cost[INTRA_FRAME] = vp9_cost_bit(intra_inter_p, 0);
ref_frame_cost[LAST_FRAME] = ref_frame_cost[GOLDEN_FRAME] =
ref_frame_cost[ALTREF_FRAME] = vp9_cost_bit(intra_inter_p, 1);
ref_frame_cost[LAST_FRAME] += vp9_cost_bit(ref_single_p1, 0);
ref_frame_cost[GOLDEN_FRAME] += vp9_cost_bit(ref_single_p1, 1);
ref_frame_cost[ALTREF_FRAME] += vp9_cost_bit(ref_single_p1, 1);
ref_frame_cost[GOLDEN_FRAME] += vp9_cost_bit(ref_single_p2, 0);
ref_frame_cost[ALTREF_FRAME] += vp9_cost_bit(ref_single_p2, 1);
if (reuse_inter_pred) {
int i;
@@ -1162,10 +773,6 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
int mode_index;
int i;
PREDICTION_MODE this_mode = ref_mode_set[idx].pred_mode;
int64_t this_sse;
int is_skippable;
int this_early_term = 0;
if (!(cpi->sf.inter_mode_mask[bsize] & (1 << this_mode)))
continue;
@@ -1243,7 +850,6 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
best_pred_sad = cpi->fn_ptr[bsize].sdf(x->plane[0].src.buf,
x->plane[0].src.stride,
pre_buf, pre_stride);
x->pred_mv_sad[LAST_FRAME] = best_pred_sad;
}
if (this_mode != NEARESTMV &&
@@ -1318,54 +924,17 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
var_y = pf_var[best_filter];
sse_y = pf_sse[best_filter];
x->skip_txfm[0] = skip_txfm;
if (reuse_inter_pred) {
pd->dst.buf = this_mode_pred->data;
pd->dst.stride = this_mode_pred->stride;
}
} else {
mbmi->interp_filter = (filter_ref == SWITCHABLE) ? EIGHTTAP : filter_ref;
vp9_build_inter_predictors_sby(xd, mi_row, mi_col, bsize);
// For large partition blocks, extra testing is done.
if (bsize > BLOCK_32X32 && xd->mi[0].src_mi->mbmi.segment_id != 1 &&
cm->base_qindex) {
model_rd_for_sb_y_large(cpi, bsize, x, xd, &this_rdc.rate,
&this_rdc.dist, &var_y, &sse_y, mi_row, mi_col,
&this_early_term);
} else {
model_rd_for_sb_y(cpi, bsize, x, xd, &this_rdc.rate, &this_rdc.dist,
&var_y, &sse_y);
}
}
if (!this_early_term) {
this_sse = (int64_t)sse_y;
block_yrd(cpi, x, &this_rdc.rate, &this_rdc.dist, &is_skippable,
&this_sse, 0, bsize, MIN(mbmi->tx_size, TX_16X16));
x->skip_txfm[0] = is_skippable;
if (is_skippable) {
this_rdc.rate = vp9_cost_bit(vp9_get_skip_prob(cm, xd), 1);
} else {
if (RDCOST(x->rdmult, x->rddiv, this_rdc.rate, this_rdc.dist) <
RDCOST(x->rdmult, x->rddiv, 0, this_sse)) {
this_rdc.rate += vp9_cost_bit(vp9_get_skip_prob(cm, xd), 0);
} else {
this_rdc.rate = vp9_cost_bit(vp9_get_skip_prob(cm, xd), 1);
this_rdc.dist = this_sse;
x->skip_txfm[0] = 1;
}
}
if (cm->interp_filter == SWITCHABLE) {
if ((mbmi->mv[0].as_mv.row | mbmi->mv[0].as_mv.col) & 0x07)
this_rdc.rate += vp9_get_switchable_rate(cpi, xd);
}
} else {
this_rdc.rate += cm->interp_filter == SWITCHABLE ?
vp9_get_switchable_rate(cpi, xd) : 0;
this_rdc.rate += vp9_cost_bit(vp9_get_skip_prob(cm, xd), 1);
model_rd_for_sb_y(cpi, bsize, x, xd, &this_rdc.rate, &this_rdc.dist,
&var_y, &sse_y);
this_rdc.rate +=
cm->interp_filter == SWITCHABLE ?
vp9_get_switchable_rate(cpi, xd) : 0;
}
// chroma component rate-distortion cost modeling
if (x->color_sensitivity[0] || x->color_sensitivity[1]) {
int uv_rate = 0;
int64_t uv_dist = 0;
@@ -1373,8 +942,7 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
vp9_build_inter_predictors_sbp(xd, mi_row, mi_col, bsize, 1);
if (x->color_sensitivity[1])
vp9_build_inter_predictors_sbp(xd, mi_row, mi_col, bsize, 2);
model_rd_for_sb_uv(cpi, bsize, x, xd, &uv_rate, &uv_dist,
&var_y, &sse_y);
model_rd_for_sb_uv(cpi, bsize, x, xd, &uv_rate, &uv_dist, &var_y, &sse_y);
this_rdc.rate += uv_rate;
this_rdc.dist += uv_dist;
}
@@ -1413,7 +981,6 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
best_tx_size = mbmi->tx_size;
best_ref_frame = ref_frame;
best_mode_skip_txfm = x->skip_txfm[0];
best_early_term = this_early_term;
if (reuse_inter_pred) {
free_pred_buffer(best_pred);
@@ -1426,13 +993,6 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
if (x->skip)
break;
// If early termination flag is 1 and at least 2 modes are checked,
// the mode search is terminated.
if (best_early_term && idx > 0) {
x->skip = 1;
break;
}
}
mbmi->mode = best_mode;
@@ -1481,8 +1041,6 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
const PREDICTION_MODE this_mode = intra_mode_list[i];
if (!((1 << this_mode) & cpi->sf.intra_y_mode_mask[intra_tx_size]))
continue;
mbmi->mode = this_mode;
mbmi->ref_frame[0] = INTRA_FRAME;
args.mode = this_mode;
args.rate = 0;
args.dist = 0;
@@ -1499,17 +1057,17 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
if (this_rdc.rdcost < best_rdc.rdcost) {
best_rdc = this_rdc;
best_mode = this_mode;
mbmi->mode = this_mode;
best_intra_tx_size = mbmi->tx_size;
best_ref_frame = INTRA_FRAME;
mbmi->ref_frame[0] = INTRA_FRAME;
mbmi->uv_mode = this_mode;
mbmi->mv[0].as_int = INVALID_MV;
best_mode_skip_txfm = x->skip_txfm[0];
}
}
// Reset mb_mode_info to the best inter mode.
if (best_ref_frame != INTRA_FRAME) {
if (mbmi->ref_frame[0] != INTRA_FRAME) {
x->skip_txfm[0] = best_mode_skip_txfm;
mbmi->tx_size = best_tx_size;
} else {
mbmi->tx_size = best_intra_tx_size;
@@ -1517,9 +1075,6 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
}
pd->dst = orig_dst;
mbmi->mode = best_mode;
mbmi->ref_frame[0] = best_ref_frame;
x->skip_txfm[0] = best_mode_skip_txfm;
if (reuse_inter_pred && best_pred != NULL) {
if (best_pred->data != orig_dst.buf && is_inter_mode(mbmi->mode)) {

View File

@@ -457,7 +457,6 @@ void vp9_mv_pred(VP9_COMP *cpi, MACROBLOCK *x,
int best_sad = INT_MAX;
int this_sad = INT_MAX;
int max_mv = 0;
int near_same_nearest;
uint8_t *src_y_ptr = x->plane[0].src.buf;
uint8_t *ref_y_ptr;
const int num_mv_refs = MAX_MV_REF_CANDIDATES +
@@ -470,27 +469,23 @@ void vp9_mv_pred(VP9_COMP *cpi, MACROBLOCK *x,
pred_mv[2] = x->pred_mv[ref_frame];
assert(num_mv_refs <= (int)(sizeof(pred_mv) / sizeof(pred_mv[0])));
near_same_nearest =
mbmi->ref_mvs[ref_frame][0].as_int == mbmi->ref_mvs[ref_frame][1].as_int;
// Get the sad for each candidate reference mv.
for (i = 0; i < num_mv_refs; ++i) {
const MV *this_mv = &pred_mv[i];
int fp_row, fp_col;
if (i == 1 && near_same_nearest)
continue;
fp_row = (this_mv->row + 3 + (this_mv->row >= 0)) >> 3;
fp_col = (this_mv->col + 3 + (this_mv->col >= 0)) >> 3;
max_mv = MAX(max_mv, MAX(abs(this_mv->row), abs(this_mv->col)) >> 3);
if (fp_row ==0 && fp_col == 0 && zero_seen)
if (is_zero_mv(this_mv) && zero_seen)
continue;
zero_seen |= (fp_row ==0 && fp_col == 0);
ref_y_ptr =&ref_y_buffer[ref_y_stride * fp_row + fp_col];
zero_seen |= is_zero_mv(this_mv);
ref_y_ptr =
&ref_y_buffer[ref_y_stride * (this_mv->row >> 3) + (this_mv->col >> 3)];
// Find sad for current vector.
this_sad = cpi->fn_ptr[block_size].sdf(src_y_ptr, x->plane[0].src.stride,
ref_y_ptr, ref_y_stride);
// Note if it is the best so far.
if (this_sad < best_sad) {
best_sad = this_sad;
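The two conversions above turn a 1/8-pel motion-vector component into a full-pel offset for the SAD probe: one rounds to the nearest integer pel, the other simply shifts. For example, a component of 12 (1.5 pel) rounds to 2 but truncates to 1, while 9 (1.125 pel) gives 1 either way. A small sketch of both, with illustrative names:
static int fullpel_round(int mv_eighth_pel) {          /* round to nearest */
  return (mv_eighth_pel + 3 + (mv_eighth_pel >= 0)) >> 3;
}
static int fullpel_trunc(int mv_eighth_pel) {          /* arithmetic shift */
  return mv_eighth_pel >> 3;
}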

View File

@@ -292,18 +292,6 @@ int64_t vp9_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff,
return error;
}
int64_t vp9_block_error_fp_c(const int16_t *coeff, const int16_t *dqcoeff,
int block_size) {
int i;
int64_t error = 0;
for (i = 0; i < block_size; i++) {
const int diff = coeff[i] - dqcoeff[i];
error += diff * diff;
}
return error;
}
#if CONFIG_VP9_HIGHBITDEPTH
int64_t vp9_highbd_block_error_c(const tran_low_t *coeff,
@@ -1561,6 +1549,13 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
mbmi->ref_frame[1] < 0 ? 0 : mbmi->ref_frame[1]};
int_mv ref_mv[2];
int ite, ref;
// Prediction buffer from second frame.
#if CONFIG_VP9_HIGHBITDEPTH
uint8_t *second_pred;
uint8_t *second_pred_alloc;
#else
uint8_t *second_pred = vpx_memalign(16, pw * ph * sizeof(uint8_t));
#endif // CONFIG_VP9_HIGHBITDEPTH
const InterpKernel *kernel = vp9_get_interp_kernel(mbmi->interp_filter);
struct scale_factors sf;
@@ -1571,13 +1566,14 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
vp9_get_scaled_ref_frame(cpi, mbmi->ref_frame[0]),
vp9_get_scaled_ref_frame(cpi, mbmi->ref_frame[1])
};
// Prediction buffer from second frame.
#if CONFIG_VP9_HIGHBITDEPTH
DECLARE_ALIGNED_ARRAY(16, uint16_t, second_pred_alloc_16, 64 * 64);
uint8_t *second_pred;
#else
DECLARE_ALIGNED_ARRAY(16, uint8_t, second_pred, 64 * 64);
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
second_pred_alloc = vpx_memalign(16, pw * ph * sizeof(uint16_t));
second_pred = CONVERT_TO_BYTEPTR(second_pred_alloc);
} else {
second_pred_alloc = vpx_memalign(16, pw * ph * sizeof(uint8_t));
second_pred = second_pred_alloc;
}
#endif // CONFIG_VP9_HIGHBITDEPTH
for (ref = 0; ref < 2; ++ref) {
@@ -1632,7 +1628,6 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
// Get the prediction block from the 'other' reference frame.
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
second_pred = CONVERT_TO_BYTEPTR(second_pred_alloc_16);
vp9_highbd_build_inter_predictor(ref_yv12[!id].buf,
ref_yv12[!id].stride,
second_pred, pw,
@@ -1642,7 +1637,6 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
mi_col * MI_SIZE, mi_row * MI_SIZE,
xd->bd);
} else {
second_pred = (uint8_t *)second_pred_alloc_16;
vp9_build_inter_predictor(ref_yv12[!id].buf,
ref_yv12[!id].stride,
second_pred, pw,
@@ -1728,6 +1722,12 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
&mbmi->ref_mvs[refs[ref]][0].as_mv,
x->nmvjointcost, x->mvcost, MV_COST_WEIGHT);
}
#if CONFIG_VP9_HIGHBITDEPTH
vpx_free(second_pred_alloc);
#else
vpx_free(second_pred);
#endif // CONFIG_VP9_HIGHBITDEPTH
}
static int64_t rd_pick_best_sub8x8_mode(VP9_COMP *cpi, MACROBLOCK *x,
@@ -2422,6 +2422,7 @@ static int64_t handle_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
int_mv cur_mv[2];
#if CONFIG_VP9_HIGHBITDEPTH
DECLARE_ALIGNED_ARRAY(16, uint16_t, tmp_buf16, MAX_MB_PLANE * 64 * 64);
DECLARE_ALIGNED_ARRAY(16, uint8_t, tmp_buf8, MAX_MB_PLANE * 64 * 64);
uint8_t *tmp_buf;
#else
DECLARE_ALIGNED_ARRAY(16, uint8_t, tmp_buf, MAX_MB_PLANE * 64 * 64);
@@ -2450,7 +2451,7 @@ static int64_t handle_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
tmp_buf = CONVERT_TO_BYTEPTR(tmp_buf16);
} else {
tmp_buf = (uint8_t *)tmp_buf16;
tmp_buf = tmp_buf8;
}
#endif // CONFIG_VP9_HIGHBITDEPTH
@@ -2830,65 +2831,6 @@ void vp9_rd_pick_intra_mode_sb(VP9_COMP *cpi, MACROBLOCK *x,
rd_cost->rdcost = RDCOST(x->rdmult, x->rddiv, rd_cost->rate, rd_cost->dist);
}
// This function is designed to apply a bias or adjustment to an rd value based
// on the relative variance of the source and reconstruction.
#define LOW_VAR_THRESH 16
#define VLOW_ADJ_MAX 25
#define VHIGH_ADJ_MAX 8
static void rd_variance_adjustment(VP9_COMP *cpi,
MACROBLOCK *x,
BLOCK_SIZE bsize,
int64_t *this_rd,
MV_REFERENCE_FRAME ref_frame,
unsigned int source_variance) {
MACROBLOCKD *const xd = &x->e_mbd;
unsigned int recon_variance;
unsigned int absvar_diff = 0;
int64_t var_error = 0;
int64_t var_factor = 0;
if (*this_rd == INT64_MAX)
return;
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
recon_variance =
vp9_high_get_sby_perpixel_variance(cpi, &xd->plane[0].dst, bsize, xd->bd);
} else {
recon_variance =
vp9_get_sby_perpixel_variance(cpi, &xd->plane[0].dst, bsize);
}
#else
recon_variance =
vp9_get_sby_perpixel_variance(cpi, &xd->plane[0].dst, bsize);
#endif // CONFIG_VP9_HIGHBITDEPTH
if ((source_variance + recon_variance) > LOW_VAR_THRESH) {
absvar_diff = (source_variance > recon_variance)
? (source_variance - recon_variance)
: (recon_variance - source_variance);
var_error = (200 * source_variance * recon_variance) /
((source_variance * source_variance) +
(recon_variance * recon_variance));
var_error = 100 - var_error;
}
// Source variance above a threshold and ref frame is intra.
// This case is targeted mainly at discouraging intra modes that give rise
// to a predictor with a low spatial complexity compared to the source.
if ((source_variance > LOW_VAR_THRESH) && (ref_frame == INTRA_FRAME) &&
(source_variance > recon_variance)) {
var_factor = MIN(absvar_diff, MIN(VLOW_ADJ_MAX, var_error));
// A second possible case of interest is where the source variance
// is very low and we wish to discourage false texture or motion trails.
} else if ((source_variance < (LOW_VAR_THRESH >> 1)) &&
(recon_variance > source_variance)) {
var_factor = MIN(absvar_diff, MIN(VHIGH_ADJ_MAX, var_error));
}
*this_rd += (*this_rd * var_factor) / 100;
}
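A worked example of the adjustment above, with illustrative numbers (the constants are the ones defined in the hunk):
/* source_variance = 100, recon_variance = 40, ref_frame = INTRA_FRAME:
 *   absvar_diff = 100 - 40 = 60
 *   var_error   = 100 - (200 * 100 * 40) / (100*100 + 40*40)
 *               = 100 - 800000 / 11600 = 100 - 68 = 32   (integer division)
 *   var_factor  = MIN(absvar_diff, MIN(VLOW_ADJ_MAX, var_error))
 *               = MIN(60, MIN(25, 32)) = 25
 * so this_rd is inflated by 25%, discouraging an intra predictor that is
 * much flatter than the source block. */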
void vp9_rd_pick_inter_mode_sb(VP9_COMP *cpi,
TileDataEnc *tile_data,
MACROBLOCK *x,
@@ -3345,11 +3287,6 @@ void vp9_rd_pick_inter_mode_sb(VP9_COMP *cpi,
this_rd = RDCOST(x->rdmult, x->rddiv, rate2, distortion2);
}
// Apply an adjustment to the rd value based on the similarity of the
// source variance and reconstructed variance.
rd_variance_adjustment(cpi, x, bsize, &this_rd,
ref_frame, x->source_variance);
if (ref_frame == INTRA_FRAME) {
// Keep record of best intra rd
if (this_rd < best_intra_rd) {

View File

@@ -29,15 +29,6 @@ void vp9_rd_pick_intra_mode_sb(struct VP9_COMP *cpi, struct macroblock *x,
struct RD_COST *rd_cost, BLOCK_SIZE bsize,
PICK_MODE_CONTEXT *ctx, int64_t best_rd);
unsigned int vp9_get_sby_perpixel_variance(VP9_COMP *cpi,
const struct buf_2d *ref,
BLOCK_SIZE bs);
#if CONFIG_VP9_HIGHBITDEPTH
unsigned int vp9_high_get_sby_perpixel_variance(VP9_COMP *cpi,
const struct buf_2d *ref,
BLOCK_SIZE bs, int bd);
#endif
void vp9_rd_pick_inter_mode_sb(struct VP9_COMP *cpi,
struct TileDataEnc *tile_data,
struct macroblock *x,

View File

@@ -301,7 +301,7 @@ static void set_rt_speed_feature(VP9_COMP *cpi, SPEED_FEATURES *sf,
(frames_since_key % (sf->last_partitioning_redo_frequency << 1) == 1);
sf->max_delta_qindex = is_keyframe ? 20 : 15;
sf->partition_search_type = REFERENCE_PARTITION;
sf->use_nonrd_pick_mode = 1;
sf->use_nonrd_pick_mode = !is_keyframe;
sf->allow_skip_recode = 0;
sf->inter_mode_mask[BLOCK_32X32] = INTER_NEAREST_NEW_ZERO;
sf->inter_mode_mask[BLOCK_32X64] = INTER_NEAREST_NEW_ZERO;

View File

@@ -65,6 +65,18 @@ const vp9_tree_index vp9_coef_tree[TREE_SIZE(ENTROPY_TOKENS)] = {
-CATEGORY5_TOKEN, -CATEGORY6_TOKEN // 10 = CAT_FIVE
};
// Unconstrained Node Tree
const vp9_tree_index vp9_coef_con_tree[TREE_SIZE(ENTROPY_TOKENS)] = {
2, 6, // 0 = LOW_VAL
-TWO_TOKEN, 4, // 1 = TWO
-THREE_TOKEN, -FOUR_TOKEN, // 2 = THREE
8, 10, // 3 = HIGH_LOW
-CATEGORY1_TOKEN, -CATEGORY2_TOKEN, // 4 = CAT_ONE
12, 14, // 5 = CAT_THREEFOUR
-CATEGORY3_TOKEN, -CATEGORY4_TOKEN, // 6 = CAT_THREE
-CATEGORY5_TOKEN, -CATEGORY6_TOKEN // 7 = CAT_FIVE
};
static const vp9_tree_index cat1[2] = {0, 0};
static const vp9_tree_index cat2[4] = {2, 2, 0, 0};
static const vp9_tree_index cat3[6] = {2, 2, 4, 4, 0, 0};

View File

@@ -57,179 +57,6 @@ unsigned int vp9_avg_4x4_sse2(const uint8_t *s, int p) {
return (avg + 8) >> 4;
}
static void hadamard_col8_sse2(__m128i *in, int iter) {
__m128i a0 = in[0];
__m128i a1 = in[1];
__m128i a2 = in[2];
__m128i a3 = in[3];
__m128i a4 = in[4];
__m128i a5 = in[5];
__m128i a6 = in[6];
__m128i a7 = in[7];
__m128i b0 = _mm_add_epi16(a0, a1);
__m128i b1 = _mm_sub_epi16(a0, a1);
__m128i b2 = _mm_add_epi16(a2, a3);
__m128i b3 = _mm_sub_epi16(a2, a3);
__m128i b4 = _mm_add_epi16(a4, a5);
__m128i b5 = _mm_sub_epi16(a4, a5);
__m128i b6 = _mm_add_epi16(a6, a7);
__m128i b7 = _mm_sub_epi16(a6, a7);
a0 = _mm_add_epi16(b0, b2);
a1 = _mm_add_epi16(b1, b3);
a2 = _mm_sub_epi16(b0, b2);
a3 = _mm_sub_epi16(b1, b3);
a4 = _mm_add_epi16(b4, b6);
a5 = _mm_add_epi16(b5, b7);
a6 = _mm_sub_epi16(b4, b6);
a7 = _mm_sub_epi16(b5, b7);
if (iter == 0) {
b0 = _mm_add_epi16(a0, a4);
b7 = _mm_add_epi16(a1, a5);
b3 = _mm_add_epi16(a2, a6);
b4 = _mm_add_epi16(a3, a7);
b2 = _mm_sub_epi16(a0, a4);
b6 = _mm_sub_epi16(a1, a5);
b1 = _mm_sub_epi16(a2, a6);
b5 = _mm_sub_epi16(a3, a7);
a0 = _mm_unpacklo_epi16(b0, b1);
a1 = _mm_unpacklo_epi16(b2, b3);
a2 = _mm_unpackhi_epi16(b0, b1);
a3 = _mm_unpackhi_epi16(b2, b3);
a4 = _mm_unpacklo_epi16(b4, b5);
a5 = _mm_unpacklo_epi16(b6, b7);
a6 = _mm_unpackhi_epi16(b4, b5);
a7 = _mm_unpackhi_epi16(b6, b7);
b0 = _mm_unpacklo_epi32(a0, a1);
b1 = _mm_unpacklo_epi32(a4, a5);
b2 = _mm_unpackhi_epi32(a0, a1);
b3 = _mm_unpackhi_epi32(a4, a5);
b4 = _mm_unpacklo_epi32(a2, a3);
b5 = _mm_unpacklo_epi32(a6, a7);
b6 = _mm_unpackhi_epi32(a2, a3);
b7 = _mm_unpackhi_epi32(a6, a7);
in[0] = _mm_unpacklo_epi64(b0, b1);
in[1] = _mm_unpackhi_epi64(b0, b1);
in[2] = _mm_unpacklo_epi64(b2, b3);
in[3] = _mm_unpackhi_epi64(b2, b3);
in[4] = _mm_unpacklo_epi64(b4, b5);
in[5] = _mm_unpackhi_epi64(b4, b5);
in[6] = _mm_unpacklo_epi64(b6, b7);
in[7] = _mm_unpackhi_epi64(b6, b7);
} else {
in[0] = _mm_add_epi16(a0, a4);
in[7] = _mm_add_epi16(a1, a5);
in[3] = _mm_add_epi16(a2, a6);
in[4] = _mm_add_epi16(a3, a7);
in[2] = _mm_sub_epi16(a0, a4);
in[6] = _mm_sub_epi16(a1, a5);
in[1] = _mm_sub_epi16(a2, a6);
in[5] = _mm_sub_epi16(a3, a7);
}
}
void vp9_hadamard_8x8_sse2(int16_t const *src_diff, int src_stride,
int16_t *coeff) {
__m128i src[8];
src[0] = _mm_load_si128((const __m128i *)src_diff);
src[1] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[2] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[3] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[4] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[5] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[6] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[7] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
hadamard_col8_sse2(src, 0);
hadamard_col8_sse2(src, 1);
_mm_store_si128((__m128i *)coeff, src[0]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[1]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[2]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[3]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[4]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[5]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[6]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[7]);
}
void vp9_hadamard_16x16_sse2(int16_t const *src_diff, int src_stride,
int16_t *coeff) {
int idx;
for (idx = 0; idx < 4; ++idx) {
int16_t const *src_ptr = src_diff + (idx >> 1) * 8 * src_stride
+ (idx & 0x01) * 8;
vp9_hadamard_8x8_sse2(src_ptr, src_stride, coeff + idx * 64);
}
for (idx = 0; idx < 64; idx += 8) {
__m128i coeff0 = _mm_load_si128((const __m128i *)coeff);
__m128i coeff1 = _mm_load_si128((const __m128i *)(coeff + 64));
__m128i coeff2 = _mm_load_si128((const __m128i *)(coeff + 128));
__m128i coeff3 = _mm_load_si128((const __m128i *)(coeff + 192));
__m128i b0 = _mm_add_epi16(coeff0, coeff1);
__m128i b1 = _mm_sub_epi16(coeff0, coeff1);
__m128i b2 = _mm_add_epi16(coeff2, coeff3);
__m128i b3 = _mm_sub_epi16(coeff2, coeff3);
coeff0 = _mm_add_epi16(b0, b2);
coeff1 = _mm_add_epi16(b1, b3);
coeff0 = _mm_srai_epi16(coeff0, 1);
coeff1 = _mm_srai_epi16(coeff1, 1);
_mm_store_si128((__m128i *)coeff, coeff0);
_mm_store_si128((__m128i *)(coeff + 64), coeff1);
coeff2 = _mm_sub_epi16(b0, b2);
coeff3 = _mm_sub_epi16(b1, b3);
coeff2 = _mm_srai_epi16(coeff2, 1);
coeff3 = _mm_srai_epi16(coeff3, 1);
_mm_store_si128((__m128i *)(coeff + 128), coeff2);
_mm_store_si128((__m128i *)(coeff + 192), coeff3);
coeff += 8;
}
}
int16_t vp9_satd_sse2(const int16_t *coeff, int length) {
int i;
__m128i sum = _mm_load_si128((const __m128i *)coeff);
__m128i sign = _mm_srai_epi16(sum, 15);
__m128i val = _mm_xor_si128(sum, sign);
sum = _mm_sub_epi16(val, sign);
coeff += 8;
for (i = 8; i < length; i += 8) {
__m128i src_line = _mm_load_si128((const __m128i *)coeff);
sign = _mm_srai_epi16(src_line, 15);
val = _mm_xor_si128(src_line, sign);
val = _mm_sub_epi16(val, sign);
sum = _mm_add_epi16(sum, val);
coeff += 8;
}
val = _mm_srli_si128(sum, 8);
sum = _mm_add_epi16(sum, val);
val = _mm_srli_epi64(sum, 32);
sum = _mm_add_epi16(sum, val);
val = _mm_srli_epi32(sum, 16);
sum = _mm_add_epi16(sum, val);
return _mm_extract_epi16(sum, 0);
}
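vp9_satd_sse2 above is the vectorized sum of absolute Hadamard coefficients. A hedged scalar reference for the same quantity (illustrative name; the SSE2 version additionally assumes 16-byte-aligned input and a length that is a multiple of 8):
#include <stdint.h>
#include <stdlib.h>
static int satd_scalar(const int16_t *coeff, int length) {
  int i, sum = 0;
  for (i = 0; i < length; ++i)
    sum += abs(coeff[i]);
  return sum;
}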
void vp9_int_pro_row_sse2(int16_t *hbuf, uint8_t const*ref,
const int ref_stride, const int height) {
int idx;

View File

@@ -293,8 +293,7 @@ void vp9_fdct8x8_quant_ssse3(const int16_t *input, int stride,
if (!skip_block) {
__m128i eob;
__m128i round, quant, dequant, thr;
int16_t nzflag;
__m128i round, quant, dequant;
{
__m128i coeff0, coeff1;
@@ -369,7 +368,6 @@ void vp9_fdct8x8_quant_ssse3(const int16_t *input, int stride,
// AC only loop
index = 2;
thr = _mm_srai_epi16(dequant, 1);
while (n_coeffs < 0) {
__m128i coeff0, coeff1;
{
@@ -389,39 +387,28 @@ void vp9_fdct8x8_quant_ssse3(const int16_t *input, int stride,
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
nzflag = _mm_movemask_epi8(_mm_cmpgt_epi16(qcoeff0, thr)) |
_mm_movemask_epi8(_mm_cmpgt_epi16(qcoeff1, thr));
qcoeff0 = _mm_adds_epi16(qcoeff0, round);
qcoeff1 = _mm_adds_epi16(qcoeff1, round);
qtmp0 = _mm_mulhi_epi16(qcoeff0, quant);
qtmp1 = _mm_mulhi_epi16(qcoeff1, quant);
if (nzflag) {
qcoeff0 = _mm_adds_epi16(qcoeff0, round);
qcoeff1 = _mm_adds_epi16(qcoeff1, round);
qtmp0 = _mm_mulhi_epi16(qcoeff0, quant);
qtmp1 = _mm_mulhi_epi16(qcoeff1, quant);
// Reinsert signs
qcoeff0 = _mm_xor_si128(qtmp0, coeff0_sign);
qcoeff1 = _mm_xor_si128(qtmp1, coeff1_sign);
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
// Reinsert signs
qcoeff0 = _mm_xor_si128(qtmp0, coeff0_sign);
qcoeff1 = _mm_xor_si128(qtmp1, coeff1_sign);
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), qcoeff0);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, qcoeff1);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), qcoeff0);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, qcoeff1);
coeff0 = _mm_mullo_epi16(qcoeff0, dequant);
coeff1 = _mm_mullo_epi16(qcoeff1, dequant);
coeff0 = _mm_mullo_epi16(qcoeff0, dequant);
coeff1 = _mm_mullo_epi16(qcoeff1, dequant);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), coeff0);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, coeff1);
} else {
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), zero);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, zero);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), zero);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, zero);
}
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), coeff0);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, coeff1);
}
if (nzflag) {
{
// Scan for eob
__m128i zero_coeff0, zero_coeff1;
__m128i nzero_coeff0, nzero_coeff1;

View File

@@ -72,49 +72,3 @@ cglobal block_error, 3, 3, 8, uqc, dqc, size, ssz
movd edx, m5
%endif
RET
; Compute the sum of squared difference between two int16_t vectors.
; int64_t vp9_block_error_fp(int16_t *coeff, int16_t *dqcoeff,
; intptr_t block_size)
INIT_XMM sse2
cglobal block_error_fp, 3, 3, 8, uqc, dqc, size
pxor m4, m4 ; sse accumulator
pxor m5, m5 ; dedicated zero register
lea uqcq, [uqcq+sizeq*2]
lea dqcq, [dqcq+sizeq*2]
neg sizeq
.loop:
mova m2, [uqcq+sizeq*2]
mova m0, [dqcq+sizeq*2]
mova m3, [uqcq+sizeq*2+mmsize]
mova m1, [dqcq+sizeq*2+mmsize]
psubw m0, m2
psubw m1, m3
; individual errors are max. 15bit+sign, so squares are 30bit, and
; thus the sum of 2 should fit in a 31bit integer (+ unused sign bit)
pmaddwd m0, m0
pmaddwd m1, m1
; accumulate in 64bit
punpckldq m7, m0, m5
punpckhdq m0, m5
paddq m4, m7
punpckldq m7, m1, m5
paddq m4, m0
punpckhdq m1, m5
paddq m4, m7
paddq m4, m1
add sizeq, mmsize
jl .loop
; accumulate horizontally and store in return value
movhlps m5, m4
paddq m4, m5
%if ARCH_X86_64
movq rax, m4
%else
pshufd m5, m4, 0x1
movd eax, m4
movd edx, m5
%endif
RET

View File

@@ -230,8 +230,6 @@ void vp9_quantize_fp_sse2(const int16_t* coeff_ptr, intptr_t n_coeffs,
const int16_t* scan_ptr,
const int16_t* iscan_ptr) {
__m128i zero;
__m128i thr;
int16_t nzflag;
(void)scan_ptr;
(void)zbin_ptr;
(void)quant_shift_ptr;
@@ -318,8 +316,6 @@ void vp9_quantize_fp_sse2(const int16_t* coeff_ptr, intptr_t n_coeffs,
n_coeffs += 8 * 2;
}
thr = _mm_srai_epi16(dequant, 1);
// AC only loop
while (n_coeffs < 0) {
__m128i coeff0, coeff1;
@@ -339,39 +335,28 @@ void vp9_quantize_fp_sse2(const int16_t* coeff_ptr, intptr_t n_coeffs,
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
nzflag = _mm_movemask_epi8(_mm_cmpgt_epi16(qcoeff0, thr)) |
_mm_movemask_epi8(_mm_cmpgt_epi16(qcoeff1, thr));
qcoeff0 = _mm_adds_epi16(qcoeff0, round);
qcoeff1 = _mm_adds_epi16(qcoeff1, round);
qtmp0 = _mm_mulhi_epi16(qcoeff0, quant);
qtmp1 = _mm_mulhi_epi16(qcoeff1, quant);
if (nzflag) {
qcoeff0 = _mm_adds_epi16(qcoeff0, round);
qcoeff1 = _mm_adds_epi16(qcoeff1, round);
qtmp0 = _mm_mulhi_epi16(qcoeff0, quant);
qtmp1 = _mm_mulhi_epi16(qcoeff1, quant);
// Reinsert signs
qcoeff0 = _mm_xor_si128(qtmp0, coeff0_sign);
qcoeff1 = _mm_xor_si128(qtmp1, coeff1_sign);
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
// Reinsert signs
qcoeff0 = _mm_xor_si128(qtmp0, coeff0_sign);
qcoeff1 = _mm_xor_si128(qtmp1, coeff1_sign);
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), qcoeff0);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, qcoeff1);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), qcoeff0);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, qcoeff1);
coeff0 = _mm_mullo_epi16(qcoeff0, dequant);
coeff1 = _mm_mullo_epi16(qcoeff1, dequant);
coeff0 = _mm_mullo_epi16(qcoeff0, dequant);
coeff1 = _mm_mullo_epi16(qcoeff1, dequant);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), coeff0);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, coeff1);
} else {
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), zero);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, zero);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), zero);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, zero);
}
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), coeff0);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, coeff1);
}
if (nzflag) {
{
// Scan for eob
__m128i zero_coeff0, zero_coeff1;
__m128i nzero_coeff0, nzero_coeff1;

View File

@@ -282,8 +282,6 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
psignw m8, m9
psignw m13, m10
psrlw m0, m3, 2
%else
psrlw m0, m3, 1
%endif
mova [r4q+ncoeffq*2+ 0], m8
mova [r4q+ncoeffq*2+16], m13
@@ -304,7 +302,7 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
mova m10, [ coeffq+ncoeffq*2+16] ; m10 = c[i]
pabsw m6, m9 ; m6 = abs(m9)
pabsw m11, m10 ; m11 = abs(m10)
%ifidn %1, fp_32x32
pcmpgtw m7, m6, m0
pcmpgtw m12, m11, m0
pmovmskb r6d, m7
@@ -312,7 +310,7 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
or r6, r2
jz .skip_iter
%endif
pcmpeqw m7, m7
paddsw m6, m1 ; m6 += round
@@ -350,6 +348,7 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
add ncoeffq, mmsize
jl .ac_only_loop
%ifidn %1, fp_32x32
jmp .accumulate_eob
.skip_iter:
mova [r3q+ncoeffq*2+ 0], m5
@@ -358,6 +357,7 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
mova [r4q+ncoeffq*2+16], m5
add ncoeffq, mmsize
jl .ac_only_loop
%endif
.accumulate_eob:
; horizontally accumulate/max eobs and write into [eob] memory pointer

View File

@@ -1260,21 +1260,6 @@ static vpx_codec_err_t ctrl_set_active_map(vpx_codec_alg_priv_t *ctx,
}
}
static vpx_codec_err_t ctrl_get_active_map(vpx_codec_alg_priv_t *ctx,
va_list args) {
vpx_active_map_t *const map = va_arg(args, vpx_active_map_t *);
if (map) {
if (!vp9_get_active_map(ctx->cpi, map->active_map,
(int)map->rows, (int)map->cols))
return VPX_CODEC_OK;
else
return VPX_CODEC_INVALID_PARAM;
} else {
return VPX_CODEC_INVALID_PARAM;
}
}
static vpx_codec_err_t ctrl_set_scale_mode(vpx_codec_alg_priv_t *ctx,
va_list args) {
vpx_scaling_mode_t *const mode = va_arg(args, vpx_scaling_mode_t *);
@@ -1432,7 +1417,6 @@ static vpx_codec_ctrl_fn_map_t encoder_ctrl_maps[] = {
#if VPX_ENCODER_ABI_VERSION > (4 + VPX_CODEC_ABI_VERSION)
{VP9E_GET_SVC_LAYER_ID, ctrl_get_svc_layer_id},
#endif
{VP9E_GET_ACTIVEMAP, ctrl_get_active_map},
{ -1, NULL},
};

View File

@@ -116,9 +116,6 @@ static vpx_codec_err_t decoder_destroy(vpx_codec_alg_priv_t *ctx) {
(FrameWorkerData *)worker->data1;
vp9_get_worker_interface()->end(worker);
vp9_remove_common(&frame_worker_data->pbi->common);
#if CONFIG_VP9_POSTPROC
vp9_free_postproc_buffers(&frame_worker_data->pbi->common);
#endif
vp9_decoder_remove(frame_worker_data->pbi);
vpx_free(frame_worker_data->scratch_buffer);
#if CONFIG_MULTITHREAD
@@ -132,10 +129,8 @@ static vpx_codec_err_t decoder_destroy(vpx_codec_alg_priv_t *ctx) {
#endif
}
if (ctx->buffer_pool) {
vp9_free_ref_frame_buffers(ctx->buffer_pool);
if (ctx->buffer_pool)
vp9_free_internal_frame_buffers(&ctx->buffer_pool->int_frame_buffers);
}
vpx_free(ctx->frame_workers);
vpx_free(ctx->buffer_pool);
@@ -755,8 +750,6 @@ static vpx_image_t *decoder_get_frame(vpx_codec_alg_priv_t *ctx,
(FrameWorkerData *)worker->data1;
ctx->next_output_worker_id =
(ctx->next_output_worker_id + 1) % ctx->num_frame_workers;
if (ctx->base.init_flags & VPX_CODEC_USE_POSTPROC)
set_ppflags(ctx, &flags);
// Wait for the frame from worker thread.
if (winterface->sync(worker)) {
// Check if worker has received any frames.

View File

@@ -425,18 +425,10 @@ struct vpx_internal_error_info {
jmp_buf jmp;
};
#define CLANG_ANALYZER_NORETURN
#if defined(__has_feature)
#if __has_feature(attribute_analyzer_noreturn)
#undef CLANG_ANALYZER_NORETURN
#define CLANG_ANALYZER_NORETURN __attribute__((analyzer_noreturn))
#endif
#endif
void vpx_internal_error(struct vpx_internal_error_info *info,
vpx_codec_err_t error,
const char *fmt,
...) CLANG_ANALYZER_NORETURN;
...);
#ifdef __cplusplus
} // extern "C"

View File

@@ -508,12 +508,6 @@ enum vp8e_enc_control_id {
* Supported in codecs: VP9
*/
VP9E_SET_COLOR_SPACE,
/*!\brief Codec control function to get an Active map back from the encoder.
*
* Supported in codecs: VP9
*/
VP9E_GET_ACTIVEMAP,
};
/*!\brief vpx 1-D scaling mode
@@ -697,8 +691,6 @@ VPX_CTRL_USE_TYPE(VP9E_SET_NOISE_SENSITIVITY, unsigned int)
VPX_CTRL_USE_TYPE(VP9E_SET_TUNE_CONTENT, int) /* vp9e_tune_content */
VPX_CTRL_USE_TYPE(VP9E_SET_COLOR_SPACE, int)
VPX_CTRL_USE_TYPE(VP9E_GET_ACTIVEMAP, vpx_active_map_t *)
/*! @} - end defgroup vp8_encoder */
#ifdef __cplusplus
} // extern "C"

View File

@@ -59,7 +59,7 @@ extern "C" {
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define VPX_ENCODER_ABI_VERSION (4 + 1 + VPX_CODEC_ABI_VERSION) /**<\hideinitializer*/
#define VPX_ENCODER_ABI_VERSION (4 + VPX_CODEC_ABI_VERSION) /**<\hideinitializer*/
/*! \brief Encoder capabilities bitfield

View File

@@ -1080,6 +1080,9 @@ int main_loop(int argc, const char **argv_) {
}
}
}
if (stop_after && frame_in >= stop_after)
break;
}
if (summary || progress) {

View File

@@ -63,7 +63,6 @@ int file_is_webm(struct WebmInputContext *webm_ctx,
struct VpxInputContext *vpx_ctx) {
mkvparser::MkvReader *const reader = new mkvparser::MkvReader(vpx_ctx->file);
webm_ctx->reader = reader;
webm_ctx->reached_eos = 0;
mkvparser::EBMLHeader header;
long long pos = 0;
@@ -122,11 +121,6 @@ int webm_read_frame(struct WebmInputContext *webm_ctx,
uint8_t **buffer,
size_t *bytes_in_buffer,
size_t *buffer_size) {
// This check is needed for frame parallel decoding, in which case this
// function could be called even after it has reached end of input stream.
if (webm_ctx->reached_eos) {
return 1;
}
mkvparser::Segment *const segment =
reinterpret_cast<mkvparser::Segment*>(webm_ctx->segment);
const mkvparser::Cluster* cluster =
@@ -146,7 +140,6 @@ int webm_read_frame(struct WebmInputContext *webm_ctx,
cluster = segment->GetNext(cluster);
if (cluster == NULL || cluster->EOS()) {
*bytes_in_buffer = 0;
webm_ctx->reached_eos = 1;
return 1;
}
status = cluster->GetFirst(block_entry);

View File

@@ -29,7 +29,6 @@ struct WebmInputContext {
int video_track_index;
uint64_t timestamp_ns;
int is_key_frame;
int reached_eos;
};
// Checks if the input is a WebM file. If so, initializes WebMInputContext so