Compare commits


80 Commits

Author SHA1 Message Date
Jingning Han
f734e231cc Syntax coding
Change-Id: I6cac24c4f1e44f29ffcc9b87ba1167eeb32d1b69
2015-04-15 16:48:45 -07:00
Jingning Han
208aa6158b Remove get_nonrd_var_based_fixed_partition function
This function has been replaced by other approaches and is not
in use now.

Change-Id: I387f45b5607d202539e482468ccc70e6c0f9341f
2015-04-09 09:49:55 -07:00
Jingning Han
25206e7b7f Compute prediction filter type cost only when needed
Skip the redundant prediction filter type cost in the filter search
loop if the rate value will be reset by the Hadamard transform based
rate distortion estimate.

Change-Id: Ie5221f4bc8da9461c449df367251aeeac52c6e5d
2015-04-07 12:41:46 -07:00
Jingning Han
9922e4344a Enable Hadamard transform based cost estimate for all block sizes
This commit turns on the Hadamard transform based rate distortion
estimate for all block sizes in RTC coding mode. It conditionally
skips the rate distortion estimation if the all-zero block flag is
set. No significant encoding speed change is observed. The
compression performance of speed -6 is improved by 1.7% over using
it only for block sizes of 32x32 and below.

Change-Id: I768145e6f05c737b05b5b5f1ee674e929532cafb
2015-04-04 09:58:45 -07:00
Jingning Han
60e01c6530 Account for eob cost in the RTC mode decision process
This commit accounts for the transform block end of coefficient flag
cost in the RTC mode decision process. This allows a more precise
rate estimate. It also turns on the model for block sizes up to 32x32.
The test sequences show about a 3% - 5% speed penalty for speed -6.
The average compression performance improvement for speed -6 is
1.58% in PSNR. The compression gains for hard clips like jimredvga,
mmmoving, and tacomascmv at low bit-rate range are 1.8%, 2.1%, and
3.2%, respectively.

Change-Id: Ic2ae211888e25a93979eac56b274c6e5ebcc21fb
2015-04-03 10:31:51 -07:00
Jingning Han
657cabe0f7 Tune SSSE3 assembly implementation to improve quantization speed
Change-Id: If0ca8b25b4800d4336e6cbc97194cd9b01c5b5a3
2015-04-01 15:28:01 -07:00
Yaowu Xu
fff4654d36 Merge "Simplify bsize calculation" 2015-04-01 15:06:55 -07:00
Jingning Han
cf4447339e Merge "Optimize quantization simd implementation" 2015-04-01 14:55:18 -07:00
Jingning Han
a4364e5146 Merge "Simplify effective src_diff address computation" 2015-04-01 14:55:03 -07:00
Jingning Han
7acb2a8795 Merge "Refactor block_yrd function for RTC coding mode" 2015-04-01 14:54:24 -07:00
Yaowu Xu
ba91b54d7c Simplify bsize calculation
Change-Id: Ibc514684def9914c66f04cb7931f773e2b79c168
2015-04-01 12:15:06 -07:00
Jingning Han
19da916716 Simplify effective src_diff address computation
Remove redundant offset calculation for effective src_diff address.

Change-Id: I4aab241a36abcef7fd8adf74aed5e12b8b88e0ef
2015-04-01 12:07:47 -07:00
Jingning Han
1470529f62 Refactor block_yrd function for RTC coding mode
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.

Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
2015-04-01 12:00:43 -07:00
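
For illustration, a minimal C sketch of the kind of block error function
described above, which returns only the squared error and skips the sum of
squared residuals (ssz) that the full block error computes. The name echoes
the vp9_block_error_fp prototype added in this series, but the body is only
a sketch, not the library code.

#include <stdint.h>

/* Squared error between original and dequantized coefficients, with no ssz
 * accumulation. */
static int64_t block_error_fp_sketch(const int16_t *coeff,
                                     const int16_t *dqcoeff,
                                     int block_size) {
  int64_t error = 0;
  int i;
  for (i = 0; i < block_size; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    error += (int64_t)diff * diff;
  }
  return error;
}
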
Jingning Han
eed1badedd Optimize quantization simd implementation
This commit allows the quantizer to compare the AC coefficients to
the quantization step size to determine if further multiplication
operations are needed. It makes the quantization process 20% faster
without coding statistics change.

Change-Id: I735aaf6a9c0874c82175bb565b20e131464db64a
2015-04-01 11:47:09 -07:00
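
A scalar sketch of the idea described above (the actual change is in the
SIMD quantizer): compare the coefficient magnitude against the quantization
step so the multiply and shift can be skipped for coefficients that will
quantize to zero anyway. The names and exact rounding are illustrative
assumptions, not the encoder's code.

#include <stdint.h>
#include <stdlib.h>

static void quantize_fp_sketch(const int16_t *coeff, int n,
                               int16_t round, int16_t quant, int16_t dequant,
                               int16_t *qcoeff, int16_t *dqcoeff) {
  int i;
  for (i = 0; i < n; ++i) {
    const int abs_coeff = abs(coeff[i]);
    if (abs_coeff + round < dequant) {
      /* With quant roughly (1 << 16) / dequant this coefficient quantizes
       * to zero, so skip the multiplication entirely. */
      qcoeff[i] = 0;
      dqcoeff[i] = 0;
    } else {
      const int sign = coeff[i] < 0 ? -1 : 1;
      const int abs_q = (int)(((abs_coeff + round) * (int64_t)quant) >> 16);
      qcoeff[i] = (int16_t)(sign * abs_q);
      dqcoeff[i] = (int16_t)(qcoeff[i] * dequant);
    }
  }
}
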
Yunqing Wang
a0043c6d30 Enhance the transform skipping decision-making in non-rd mode
For large partition blocks (block_size > 32x32), the variance
calculation is modified so that every 8x8 block's variance
is stored during the calculation, which is used in the
following transform skipping test. Also, the variance for
every tx block is calculated. The skipping test checks all tx
blocks in the partition, and sets the skip flag only if all tx
blocks are skippable. If the skip flag of Y plane is 1, a
quick evaluation is done on UV planes. If the current partition
block is skippable in YUV planes, the mode search checks fewer
inter modes and doesn't check intra modes.

The rtc set borg test(at speed 6) showed that:
Overall psnr: -0.527%; Avg psnr: -0.510%; ssim: -0.573%.
Average single-thread speedup on rtc set was 3.5%.
For 720p clips, more speedups were seen.
gipsrecmotion: 13%
gipsrestat: 12%
vidyo: 5 - 9%
dark: 15%
niklas: 6%

Change-Id: I8d8ebec0cb305f1de016516400bf007c3042666e
2015-04-01 09:43:40 -07:00
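
For illustration, a sketch of the all-blocks-skippable rule described above:
the partition-level skip flag is set only when every transform block inside
it passes the variance test. The structure and names are assumptions, not
the encoder's code.

/* Return 1 only if every tx block in the partition is below the skip
 * threshold; a single failing block disqualifies the whole partition. */
static int all_tx_blocks_skippable(const unsigned int *tx_block_var,
                                   int num_tx_blocks,
                                   unsigned int skip_thresh) {
  int i;
  for (i = 0; i < num_tx_blocks; ++i) {
    if (tx_block_var[i] > skip_thresh)
      return 0;
  }
  return 1;
}
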
Yunqing Wang
fc98114761 Merge "Rename vbp thresholds" 2015-03-31 16:33:30 -07:00
Vignesh Venkatasubramanian
639955f66e Merge "webmdec: Fix read_frame return value for calls after EOS" 2015-03-31 16:11:56 -07:00
Marco
c2b8218eba Merge "Set postproc flags in decoder_get_frame." 2015-03-31 15:22:14 -07:00
Yunqing Wang
c28ff1a9de Rename vbp thresholds
Code refactoring

Change-Id: I410fcce1bc6d95c62c474445f4c97ea8469f1e79
2015-03-31 15:14:44 -07:00
Jingning Han
502ac72233 Merge "Tuning SATD rate calculation for speed" 2015-03-31 14:24:26 -07:00
Jingning Han
1c39c5b96f Merge "Use aligned copy in 8x8 Hadamard transform SSE2" 2015-03-31 12:16:47 -07:00
Jingning Han
fa4289522e Merge "Allow block skip coding option in RTC mode" 2015-03-31 12:16:36 -07:00
Jingning Han
1638d7dc96 Merge "Fix 8x8 Hadamard SSE2 implementation" 2015-03-31 12:16:27 -07:00
Alex Converse
9670d766ab Merge "VP9E_GET_ACTIVE_MAP API function." 2015-03-31 11:52:56 -07:00
Jingning Han
531468a07a Tuning SATD rate calculation for speed
This commit allows the encoder to check the eob per transform
block to decide how to compute the SATD rate cost. If the entire
block is quantized to zero, there is no need to add anything; if
only the DC coefficient is non-zero, add its absolute value;
otherwise, sum over the block. This reduces the CPU cycles spent
on vp9_satd_sse2 to one third.

Change-Id: I0d56044b793b286efc0875fafc0b8bf2d2047e32
2015-03-31 11:02:20 -07:00
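
For illustration, a sketch of the eob-driven rate estimate described above:
add nothing for an all-zero block, only |DC| when eob is 1, and a full sum
of absolute values (what vp9_satd computes) otherwise. The function name is
illustrative.

#include <stdint.h>
#include <stdlib.h>

static int satd_rate_sketch(const int16_t *coeff, int num_coeffs, int eob) {
  int sum = 0, i;
  if (eob == 0)
    return 0;              /* whole block quantized to zero */
  if (eob == 1)
    return abs(coeff[0]);  /* only the DC coefficient is non-zero */
  for (i = 0; i < num_coeffs; ++i)
    sum += abs(coeff[i]);  /* full SATD-style sum over the block */
  return sum;
}
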
hui su
d4f2f1dd5b Merge "Move vp9_coef_con_tree to common/" 2015-03-31 10:51:10 -07:00
Jingning Han
014fa45298 Use aligned copy in 8x8 Hadamard transform SSE2
This reduces the 8x8 Hadamard transform cycles by 20%.

Change-Id: If34c5e02f3afa42244c6efabe121f7cf5d2df41b
2015-03-31 10:21:52 -07:00
Jingning Han
db5ec37edc Merge "Enable 16x16 Hadamard transform in SATD based mode decision" 2015-03-31 09:55:41 -07:00
Jingning Han
8c5670bb6f Merge "Use SATD based mode decision for block sizes below 16x16" 2015-03-31 09:47:47 -07:00
Jingning Han
ebe1be9186 Allow block skip coding option in RTC mode
When the estimated rate-distortion cost of skip coding mode is
lower than that of sending quantized coefficients, allow the
encoder to drop these coefficients. This improves the compression
performance of speed -6 by 0.268% and makes the encoding speed
slightly faster.

Change-Id: Idff2d7ba59f27ead33dd5a0e9f68746ed3c2ab68
2015-03-31 09:32:53 -07:00
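
For illustration, a sketch of the skip decision described above: compare a
lambda-weighted rate-distortion cost of sending the coefficients against the
cost of the skip flag alone. The cost formula is generic, not the encoder's
exact RDCOST macro.

#include <stdint.h>

static int64_t rd_cost_sketch(int64_t rdmult, int rate, int64_t dist) {
  /* generic lambda-weighted cost: scaled rate term plus distortion */
  return ((rate * rdmult + 128) >> 8) + dist;
}

static int prefer_skip(int64_t rdmult,
                       int rate_coeffs, int64_t dist_coeffs,
                       int rate_skip, int64_t dist_skip) {
  const int64_t cost_coeffs = rd_cost_sketch(rdmult, rate_coeffs, dist_coeffs);
  const int64_t cost_skip = rd_cost_sketch(rdmult, rate_skip, dist_skip);
  return cost_skip < cost_coeffs;  /* 1 means drop the quantized coefficients */
}
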
hui su
302e24cb3e Move vp9_coef_con_tree to common/
This tree should be defined in common/, as it is needed for
both encoder and decoder.

Change-Id: I4f5cbc80025cf2ced14182c98f7c82dc7d0f87db
2015-03-31 09:20:46 -07:00
Marco
385ca8f741 Set postproc flags in decoder_get_frame.
The postproc settings were not set in decoder_get_frame().

Change-Id: I20d23de3ea18f6df061a53d691d4095d5c62532a
2015-03-30 16:15:57 -07:00
Jingning Han
9b99eb2e12 Merge "Reuse inter prediction pixel block for Hadamard transform" 2015-03-30 16:09:38 -07:00
Jingning Han
34a996ac1e Fix 8x8 Hadamard SSE2 implementation
This commit fixes the alignment in the SSE2 version of the 8x8
Hadamard transform and makes it consistent with the C version.

Change-Id: I1304e5f97e0e5ef2d798fe38081609c39f5bfe74
2015-03-30 15:54:08 -07:00
Jingning Han
26d3d3af6a Enable 16x16 Hadamard transform in SATD based mode decision
This commit replaces the 16x16 2D-DCT transform with Hadamard
transform for RTC coding mode. It reduces the CPU cycles cost
on 16x16 transform by 5X. Overall it makes the speed -6 encoding
speed 1.5% faster without compromising compression performance.

Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b
2015-03-30 15:43:31 -07:00
Jingning Han
f0ac5aaa08 Merge "Hadamard transform based coding mode decision process" 2015-03-30 15:43:15 -07:00
Jingning Han
b4b5af6acd Use SATD based mode decision for block sizes below 16x16
This commit makes the encoder select between SATD and variance as
the metric for mode decision. It also allows chroma component costs
to be accounted for in the mode decision. The overall encoding
time increase as compared to variance based mode selection is about
15% for speed -6. The compression performance is on average 2.2%
better than variance based approach, with about 5% compression
performance gains for hard clips (e.g., jimredvga, niklas720p, and
mmmoving) at lower bit-rate range.

Change-Id: I4d04a31d36f4fcb3f5f491dacd6e7fe44cb9d815
2015-03-30 15:20:07 -07:00
Jingning Han
8a927a1b7a Reuse inter prediction pixel block for Hadamard transform
It saves one unnecessary motion compensated prediction constructed
with the 8-tap filter.

Change-Id: I101215131e6f38621d5935885f94cc74de6a5377
2015-03-30 15:04:33 -07:00
Jingning Han
8c411f74e0 Hadamard transform based coding mode decision process
This commit uses Hadamard transform based rate-distortion cost
estimate for rtc coding mode decision. It improves the compression
performance of speed -6 for many hard clips at lower bit-rates.
For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for
niklas720p. This will introduce extra encoding cycle costs at
this point.

Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375
2015-03-30 14:46:05 -07:00
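
For reference, a plain-C 4x4 Walsh-Hadamard butterfly of the kind the 8x8 and
16x16 Hadamard transforms used here generalize; it needs only additions and
subtractions, which is why it is so much cheaper than a DCT. This is an
illustrative sketch, not the library's vp9_hadamard_8x8.

#include <stdint.h>

static void hadamard_4x4_sketch(const int16_t *src, int stride, int16_t *out) {
  int16_t tmp[16];
  int i;
  for (i = 0; i < 4; ++i) {  /* transform the rows */
    const int16_t a = src[i * stride + 0], b = src[i * stride + 1];
    const int16_t c = src[i * stride + 2], d = src[i * stride + 3];
    const int16_t e = a + b, f = a - b, g = c + d, h = c - d;
    tmp[i * 4 + 0] = e + g;
    tmp[i * 4 + 1] = f + h;
    tmp[i * 4 + 2] = e - g;
    tmp[i * 4 + 3] = f - h;
  }
  for (i = 0; i < 4; ++i) {  /* transform the columns */
    const int16_t a = tmp[0 * 4 + i], b = tmp[1 * 4 + i];
    const int16_t c = tmp[2 * 4 + i], d = tmp[3 * 4 + i];
    const int16_t e = a + b, f = a - b, g = c + d, h = c - d;
    out[0 * 4 + i] = e + g;
    out[1 * 4 + i] = f + h;
    out[2 * 4 + i] = e - g;
    out[3 * 4 + i] = f - h;
  }
}
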
Vignesh Venkatasubramanian
1f05b19e69 webmdec: Fix read_frame return value for calls after EOS
webm_read_frame assumes that it won't be called once end of file
is reached, but for frame parallel mode that turns out not to be
true. This patch fixes that behavior by checking for EOS and
returning the appropriate value for subsequent calls.

Change-Id: Ie2fddbe00493a0f96c4172c67be1eb719f0fe8ed
2015-03-30 12:58:26 -07:00
Alex Converse
bf7def9a43 Merge "Simplify skip check." 2015-03-30 11:31:45 -07:00
jackychen
b38b32a794 Merge "vp9_postproc.c: eliminate -Wshadow build warnings." 2015-03-30 10:29:39 -07:00
jackychen
68610ae568 vp9_postproc.c: eliminate -Wshadow build warnings.
Change-Id: I6df525a9ad1ae3cfbba8710d21db8fee76e64dbb
2015-03-27 20:27:30 -07:00
Marco
fa20a60f0d Speed 5: use non-rd mode for key frame coding.
Metrics on RTC set go down by ~1.5% on average.
Key frame encoding time goes down by a factor of ~5.

Change-Id: Ia83acc55848613870e5ac6efe7f3d904d877febb
2015-03-27 16:19:26 -07:00
hkuang
0c85718954 Merge "Fix the issue that --limit is not working in --frame-parallel mode." 2015-03-27 10:12:45 -07:00
Adrian Grange
553792cee2 Merge "Remove 8-bit array in HBD" 2015-03-26 16:31:27 -07:00
Adrian Grange
300d428ecd Merge "Replace heap with stack memory allocation" 2015-03-26 16:31:06 -07:00
Adrian Grange
9931110971 Merge "Fix use of scaling in joint motion search" 2015-03-26 16:30:35 -07:00
hkuang
ffafcd6281 Fix the issue that --limit is not working in --frame-parallel mode.
The reason is an early break-out before all the frames inside the
decoder have been output.

Change-Id: I4a138fba08d12935c39bd7602c95f8c18b474e29
2015-03-26 15:36:22 -07:00
Johann
46ce6954cc Remove duplicate code from merge
Change-Id: I5e2a1270001b7e29f3f198d57ea40e1efccef367
2015-03-26 14:56:24 -07:00
Adrian Grange
ad18b2b641 Remove 8-bit array in HBD
Creating both 8- and 16-bit arrays and then only using one
of them is wasteful.

Change-Id: Ic5b397c283efaff7bcfff2d2413838ba3e065561
2015-03-25 15:37:03 -07:00
Adrian Grange
65df3d138a Replace heap with stack memory allocation
Replaced the dynamic memory allocation of the
second_pred buffer with an allocation on the stack.

Change-Id: I2716c46b71e8587714ca5733a99eca2c68419b23
2015-03-25 15:36:43 -07:00
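
For illustration, the shape of the change described above, assuming a
worst-case 64x64 prediction block; in libvpx the stack array would be
aligned with DECLARE_ALIGNED, a plain array stands in here.

#include <stdint.h>

static void joint_motion_search_sketch(void) {
  /* Previously: a heap buffer (e.g. via vpx_memalign) allocated and freed on
   * every call, with an allocation-failure path to handle. Now: a stack
   * buffer sized for the largest prediction block. */
  uint8_t second_pred[64 * 64];
  (void)second_pred;  /* the compound prediction would be built in here */
}
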
Adrian Grange
8d8d7bfde5 Fix use of scaling in joint motion search
To enable us to use the scale-invariant motion estimation
code during mode selection, each of the reference
buffers is scaled to match the size of the frame
being encoded.

This fix ensures that a unit scaling factor is used in
this case rather than the one calculated assuming that
the reference frame is not scaled.

Change-Id: Id9a5c85dad402f3a7cc7ea9f30f204edad080ebf
2015-03-25 15:35:29 -07:00
Johann
ba13ff8501 Parall -> Parallel
Change-Id: I565fef382fa17a00d5ae54e980ef14d9f0ad4f55
2015-03-25 12:45:36 -07:00
James Zern
e865be95bf Merge "fix static analysis warnings related to CHECK_MEM_ERROR" 2015-03-24 23:56:04 -07:00
Parag Salasakar
84ec68d21a mips msa configuration patch for MIPS SIMD Arch (MSA) P5600 and I6400
For P5600:
CROSS=$MTI/bin/mips-mti-linux-gnu- CFLAGS='-EL' CXXFLAGS='-EL' LDFLAGS='-EL'\
 ../configure --target=mips32-linux-gcc --cpu=p5600 --enable-msa

For I6400:
CROSS=$IMG/bin/mips-img-linux-gnu- CFLAGS='-EL' CXXFLAGS='-EL' LDFLAGS='-EL'\
 ../configure --target=mips64-linux-gcc --cpu=i6400 --enable-msa

Change-Id: Id25f721ea1f1991d5116e04dba713aebd7378f05
2015-03-24 15:18:38 -07:00
paulwilkins
ab788c5380 Merge "Enable group adaptive max q by default." 2015-03-24 15:00:12 -07:00
Alex Converse
4dcb839607 VP9E_GET_ACTIVE_MAP API function.
This is useful when aq mode 3 (cyclic refresh) reactivates segments for refresh.

Change-Id: I3ad1d9410b899ede393d82bb8db14e2da4d84eca
2015-03-24 11:19:47 -07:00
Alex Converse
a1e20ec58f Refactor fast loop filter code to handle 444.
Change-Id: I921b1ebabdf617049f8fa26fbe462c3ff115c1ce
2015-03-24 11:17:50 -07:00
Yaowu Xu
c77d4dcb35 Merge "vp9_pred_mv(): misc fixes and optimizations" 2015-03-24 10:36:51 -07:00
Alex Converse
02697e35dc Merge "A tiny cyclic refresh / active map fix." 2015-03-24 09:43:24 -07:00
paulwilkins
8ea7bafdaa Merge "Revised rd adjustment for variance." 2015-03-24 03:12:56 -07:00
paulwilkins
c0b71cf82f Merge "Experimental rd bias based on source vs recon variance." 2015-03-24 03:12:41 -07:00
Alex Converse
31f1563a92 A tiny cyclic refresh / active map fix.
Change-Id: I198727461455c8c198a0c892d02ed3cb1673aa50
2015-03-23 18:51:00 -07:00
James Zern
7cc3e70394 Merge "vp8cx.h: vpx/vpx_encoder.h -> ./vpx_encoder.h" 2015-03-23 17:19:52 -07:00
hkuang
9f4f98fdbd Merge "Optimize the intra frame decode to skip some unnecessary copy." 2015-03-23 16:50:37 -07:00
hkuang
cd1d40ff5d Merge "Safely free all the frame buffers after all the workers finish the work." 2015-03-23 16:50:15 -07:00
James Zern
7999c07697 vp8cx.h: vpx/vpx_encoder.h -> ./vpx_encoder.h
this matches the other includes and simplifies include paths in builds
from source

Change-Id: I344902c84f688ef93c9f3a53e7c06c30db49d8d3
2015-03-23 16:07:21 -07:00
Alex Converse
b7605a9d70 Simplify skip check.
SEG_LVL_SKIP implies skip. This is enforced by skip = write_skip().

Change-Id: I61c79581c9c53deae36685c2bcf388cb4d8827d3
2015-03-23 10:53:31 -07:00
hkuang
85107641a4 Optimize the intra frame decode to skip some unnecessary copy.
This speeds up a normal YT style 1080P clip decode by ~1% on nexus 7.

Change-Id: Ied7fa0d8bc941b2adb4db9382f549ee4d5654f3a
2015-03-23 10:11:49 -07:00
Alex Converse
f7bcce91af Merge "Don't apply active map on key frames." 2015-03-23 10:04:39 -07:00
Alex Converse
03177cb7fa Merge "Set loop filter level to zero on inactive segment." 2015-03-23 10:04:29 -07:00
paulwilkins
691ec45b4e Enable group adaptive max q by default.
Set the GF group adaptive max Q compile flag to 1 by default.

This change has quite a big visual impact in some clips and also
contributes to tighter rate control.

For short test clips with consistent content the impact on metrics
is quite small, but for more varied long-form clips there is a drop
in overall psnr and a sharp rise in average psnr, caused by greater
expenditure on some easier sections and tighter rate clipping in
hard sections.

In chunked encodes some of the effect will already be present due
to the independent rate control in each chunk but this change takes
the control down to a smaller scale.

yt hd +10.67%, -3.77%, -1.56%
yt +9.654%, -3.6%, -1.82%
std hd +0.25%, -0.85%, -0.42%
derf +0.25%, -1.1%, -0.87%

Change-Id: Ibbc39b800d99d053939f4c6712d715124082843e
2015-03-23 15:57:09 +00:00
Yaowu Xu
9fd8abc541 vp9_pred_mv(): misc fixes and optimizations
1. Skip near if it is the same as nearest.
2. Correct rounding when converting the mv to a fullpel position.
3. Update pred_mv_sad after the new mv search.

Overall 0.1%~0.25% compression gains on the rtc set for speeds 5, 6, 7, 8.

Change-Id: Ic300ca53f7da18073771f1bb993c58cde9deee89
2015-03-20 17:17:04 -07:00
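
For illustration, a sketch of item 2 above: round, rather than truncate, when
converting an eighth-pel motion vector component to a full-pel position. The
helper name is hypothetical.

/* Sign-aware rounding from 1/8-pel MV units to full-pel units. */
static int mv_q3_to_fullpel_sketch(int mv_q3) {
  return mv_q3 >= 0 ? (mv_q3 + 4) >> 3 : -((-mv_q3 + 4) >> 3);
}
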
Alex Converse
6d6ef8eb3c Don't apply active map on key frames.
This allows applications to be KF oblivious.

Change-Id: Ic02712eae6ad8d6b3eaec26548299d24ca0d5cc0
2015-03-20 14:57:24 -07:00
Alex Converse
e032fc7b9e Set loop filter level to zero on inactive segment.
Change-Id: I6022a79351882a72a219aee13563bf21bcd70383
2015-03-20 14:43:06 -07:00
paulwilkins
7e234b9228 Revised rd adjustment for variance.
Revised adjustment for rd based on source complexity.
Two cases:

1) Bias against low variance intra predictors
when the actual source variance is higher.

2) When the source variance is very low, give a slight
bias against predictors that might introduce false texture
or features.

The impact on metrics of this change across the test sets is
small and mixed.

derf -0.073%, -0.049%, -0.291%
std hd -0.093%, -0.1%, -0.557%
yt +0.186%, +0.04%, -0.074%
ythd +0.625%, +0.563%, +0.584%

Medium to strong psycho-visual improvements in some
problem clips.

This feature and the intra weight on GF group length are now
turned on by default.

Change-Id: Idefc8b633a7b7bc56c42dbe19f6b2f872d73851e
2015-03-20 11:59:39 +00:00
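
For illustration, a sketch of the two-case bias described above; the scale
factors and thresholds are made-up placeholders, not the encoder's tuning.

#include <stdint.h>

static int64_t biased_rd_cost_sketch(int64_t rd_cost, int is_intra,
                                     unsigned int source_var,
                                     unsigned int pred_var) {
  /* Case 1: penalize low-variance (flat) intra predictors when the source
   * actually has more detail. */
  if (is_intra && pred_var < source_var)
    return rd_cost + (rd_cost >> 3);
  /* Case 2: with a nearly flat source, slightly penalize predictors that
   * would introduce texture or features that are not there. */
  if (source_var < 4 && pred_var > source_var)
    return rd_cost + (rd_cost >> 4);
  return rd_cost;
}
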
paulwilkins
9a1ce7be7d Experimental rd bias based on source vs recon variance.
This experiment biases the rd decision based on the impact
a mode decision has on the relative spatial complexity of the
reconstruction vs the source.

The aim is to better retain a semblance of texture even if it
is slightly misaligned / wrong, rather than use a simple rd
measure that tends to favor use of a flat predictor if a perfect
match can't be found.

This improves the appearance of texture and visual quality
on specific test clips but is hidden under a flag and currently
off by default pending visual quality testing on a wider Yt set.

Change-Id: Idf6e754a8949bf39ed9d314c6f2daaa20c888aad
2015-03-20 11:57:36 +00:00
hkuang
b88dac8938 Safely free all the frame buffers after all the workers finish the work.
Issue: 978

Change-Id: Ia7aa809095008f6819a44d7ecb0329def79b1117
2015-03-19 12:21:00 -07:00
James Zern
3ab1c0227a fix static analysis warnings related to CHECK_MEM_ERROR
mark vpx_internal_error as noreturn under the analyzer

Change-Id: If214a0e740aab9b82cc04f4492eb77a7a07ef7ab
2015-03-18 14:35:49 -07:00
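
For illustration, the usual way to express this in C: an attribute visible
only to the Clang static analyzer, so it treats the error helper as
non-returning. The macro and function names are placeholders, not libvpx's
declarations.

#if defined(__clang_analyzer__)
#define ANALYZER_NORETURN __attribute__((noreturn))
#else
#define ANALYZER_NORETURN
#endif

/* The analyzer now knows control never continues past a fatal error, so
 * CHECK_MEM_ERROR-style code is not flagged for using a failed allocation. */
void fatal_error_sketch(const char *msg) ANALYZER_NORETURN;
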
51 changed files with 1640 additions and 617 deletions


@@ -1,26 +1,18 @@
Adrian Grange <agrange@google.com>
Alex Converse <aconverse@google.com> <alex.converse@gmail.com>
Alexis Ballier <aballier@gentoo.org> <alexis.ballier@gmail.com>
Alpha Lam <hclam@google.com> <hclam@chromium.org>
Deb Mukherjee <debargha@google.com>
Erik Niemeyer <erik.a.niemeyer@intel.com> <erik.a.niemeyer@gmail.com>
Guillaume Martres <gmartres@google.com> <smarter3@gmail.com>
Hangyu Kuang <hkuang@google.com>
Jim Bankoski <jimbankoski@google.com>
John Koleszar <jkoleszar@google.com>
Johann Koenig <johannkoenig@google.com>
Johann Koenig <johannkoenig@google.com> <johann.koenig@duck.com>
John Koleszar <jkoleszar@google.com>
Joshua Litt <joshualitt@google.com> <joshualitt@chromium.org>
Marco Paniconi <marpan@google.com>
Marco Paniconi <marpan@google.com> <marpan@chromium.org>
Johann Koenig <johannkoenig@google.com> <johannkoenig@dhcp-172-19-7-52.mtv.corp.google.com>
Pascal Massimino <pascal.massimino@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>
Ralph Giles <giles@xiph.org> <giles@mozilla.com>
Sami Pietilä <samipietila@google.com>
Tamar Levy <tamar.levy@intel.com>
Tamar Levy <tamar.levy@intel.com> <levytamar82@gmail.com>
Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
Timothy B. Terriberry <tterribe@xiph.org> Tim Terriberry <tterriberry@mozilla.com>
Tom Finegan <tomfinegan@google.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>
Ralph Giles <giles@xiph.org> <giles@mozilla.com>
Alpha Lam <hclam@google.com> <hclam@chromium.org>
Deb Mukherjee <debargha@google.com>
Yaowu Xu <yaowu@google.com> <yaowu@xuyaowu.com>

29
AUTHORS

@@ -3,11 +3,10 @@
Aaron Watry <awatry@gmail.com>
Abo Talib Mahfoodh <ab.mahfoodh@gmail.com>
Adam Xu <adam@xuyaowu.com>
Adrian Grange <agrange@google.com>
Ahmad Sharif <asharif@google.com>
Alexander Voronov <avoronov@graphics.cs.msu.ru>
Alex Converse <aconverse@google.com>
Alex Converse <alex.converse@gmail.com>
Alexis Ballier <aballier@gentoo.org>
Alok Ahuja <waveletcoeff@gmail.com>
Alpha Lam <hclam@google.com>
@@ -15,58 +14,44 @@ A.Mahfoodh <ab.mahfoodh@gmail.com>
Ami Fischman <fischman@chromium.org>
Andoni Morales Alastruey <ylatuya@gmail.com>
Andres Mejia <mcitadel@gmail.com>
Andrew Russell <anrussell@google.com>
Aron Rosenberg <arosenberg@logitech.com>
Attila Nagy <attilanagy@google.com>
changjun.yang <changjun.yang@intel.com>
Charles 'Buck' Krasic <ckrasic@google.com>
chm <chm@rock-chips.com>
Christian Duvivier <cduvivier@google.com>
Daniel Kang <ddkang@google.com>
Deb Mukherjee <debargha@google.com>
Dim Temp <dimtemp0@gmail.com>
Dmitry Kovalev <dkovalev@google.com>
Dragan Mrdjan <dmrdjan@mips.com>
Ehsan Akhgari <ehsan.akhgari@gmail.com>
Erik Niemeyer <erik.a.niemeyer@intel.com>
Erik Niemeyer <erik.a.niemeyer@gmail.com>
Fabio Pedretti <fabio.ped@libero.it>
Frank Galligan <fgalligan@google.com>
Fredrik Söderquist <fs@opera.com>
Fritz Koenig <frkoenig@google.com>
Gaute Strokkenes <gaute.strokkenes@broadcom.com>
Giuseppe Scrivano <gscrivano@gnu.org>
Gordana Cmiljanovic <gordana.cmiljanovic@imgtec.com>
Guillaume Martres <gmartres@google.com>
Guillermo Ballester Valor <gbvalor@gmail.com>
Hangyu Kuang <hkuang@google.com>
Hanno Böck <hanno@hboeck.de>
Henrik Lundin <hlundin@google.com>
Hui Su <huisu@google.com>
Ivan Maltz <ivanmaltz@google.com>
Jacek Caban <cjacek@gmail.com>
JackyChen <jackychen@google.com>
James Berry <jamesberry@google.com>
James Yu <james.yu@linaro.org>
James Zern <jzern@google.com>
Jan Gerber <j@mailb.org>
Jan Kratochvil <jan.kratochvil@redhat.com>
Janne Salonen <jsalonen@google.com>
Jeff Faust <jfaust@google.com>
Jeff Muizelaar <jmuizelaar@mozilla.com>
Jeff Petkau <jpet@chromium.org>
Jia Jia <jia.jia@linaro.org>
Jim Bankoski <jimbankoski@google.com>
Jingning Han <jingning@google.com>
Joey Parrish <joeyparrish@google.com>
Johann Koenig <johannkoenig@google.com>
John Koleszar <jkoleszar@google.com>
John Stark <jhnstrk@gmail.com>
Joshua Bleecher Snyder <josh@treelinelabs.com>
Joshua Litt <joshualitt@google.com>
Justin Clift <justin@salasaga.org>
Justin Lebar <justin.lebar@gmail.com>
KO Myung-Hun <komh@chollian.net>
Lawrence Velázquez <larryv@macports.org>
Lou Quillio <louquillio@google.com>
Luca Barbato <lu_zero@gentoo.org>
Makoto Kato <makoto.kt@gmail.com>
@@ -80,7 +65,6 @@ Michael Kohler <michaelkohler@live.com>
Mike Frysinger <vapier@chromium.org>
Mike Hommey <mhommey@mozilla.com>
Mikhal Shemer <mikhal@google.com>
Minghai Shang <minghai@google.com>
Morton Jonuschat <yabawock@gmail.com>
Parag Salasakar <img.mips1@gmail.com>
Pascal Massimino <pascal.massimino@gmail.com>
@@ -88,8 +72,6 @@ Patrik Westin <patrik.westin@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk>
Paweł Hajdan <phajdan@google.com>
Pengchong Jin <pengchong@google.com>
Peter de Rivaz <peter.derivaz@gmail.com>
Philip Jägenstedt <philipj@opera.com>
Priit Laes <plaes@plaes.org>
Rafael Ávila de Espíndola <rafael.espindola@gmail.com>
@@ -97,29 +79,22 @@ Rafaël Carré <funman@videolan.org>
Ralph Giles <giles@xiph.org>
Rob Bradford <rob@linux.intel.com>
Ronald S. Bultje <rbultje@google.com>
Rui Ueyama <ruiu@google.com>
Sami Pietilä <samipietila@google.com>
Scott Graham <scottmg@chromium.org>
Scott LaVarnway <slavarnway@google.com>
Sean McGovern <gseanmcg@gmail.com>
Sergey Ulanov <sergeyu@chromium.org>
Shimon Doodkin <helpmepro1@gmail.com>
Stefan Holmer <holmer@google.com>
Suman Sunkara <sunkaras@google.com>
Taekhyun Kim <takim@nvidia.com>
Takanori MATSUURA <t.matsuu@gmail.com>
Tamar Levy <tamar.levy@intel.com>
Tao Bai <michaelbai@chromium.org>
Tero Rintaluoma <teror@google.com>
Thijs Vermeir <thijsvermeir@gmail.com>
Tim Kopp <tkopp@google.com>
Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com>
Vignesh Venkatasubramanian <vigneshv@google.com>
Yaowu Xu <yaowu@google.com>
Yongzhe Wang <yongzhe@google.com>
Yunqing Wang <yunqingwang@google.com>
Zoe Liu <zoeliu@google.com>
Google Inc.
The Mozilla Foundation
The Xiph.Org Foundation


@@ -1,26 +1,3 @@
2015-04-03 v1.4.0 "Indian Runner Duck"
This release includes significant improvements to the VP9 codec.
- Upgrading:
This release is ABI incompatible with 1.3.0. It drops the compatibility
layer, requiring VPX_IMG_FMT_* instead of IMG_FMT_*, and adds several codec
controls for VP9.
- Enhancements:
Faster VP9 encoding and decoding
Multithreaded VP9 decoding (tile and frame-based)
Multithreaded VP9 encoding - on by default
YUV 4:2:2 and 4:4:4 support in VP9
10 and 12bit support in VP9
64bit ARM support by replacing ARM assembly with intrinsics
- Bug Fixes:
Fixes a VP9 bitstream issue in Profile 1. This only affected non-YUV 4:2:0
files.
- Known Issues:
Frame Parallel decoding fails for segmented and non-420 files.
2013-11-15 v1.3.0 "Forest"
This release introduces the VP9 codec in a backward-compatible way.
All existing users of VP8 can continue to use the library without

5
README

@@ -1,4 +1,4 @@
README - 23 March 2015
README - 30 May 2014
Welcome to the WebM VP8/VP9 Codec SDK!
@@ -78,7 +78,6 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86-darwin11-gcc
x86-darwin12-gcc
x86-darwin13-gcc
x86-darwin14-gcc
x86-iphonesimulator-gcc
x86-linux-gcc
x86-linux-icc
@@ -96,7 +95,6 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86_64-darwin11-gcc
x86_64-darwin12-gcc
x86_64-darwin13-gcc
x86_64-darwin14-gcc
x86_64-iphonesimulator-gcc
x86_64-linux-gcc
x86_64-linux-icc
@@ -113,7 +111,6 @@ COMPILING THE APPLICATIONS/LIBRARIES:
universal-darwin11-gcc
universal-darwin12-gcc
universal-darwin13-gcc
universal-darwin14-gcc
generic-gnu
The generic-gnu target, in conjunction with the CROSS environment variable,


@@ -383,8 +383,8 @@ LIBS=$(call enabled,LIBS)
.libs: $(LIBS)
@touch $@
$(foreach lib,$(filter %_g.a,$(LIBS)),$(eval $(call archive_template,$(lib))))
$(foreach lib,$(filter %so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR).$(SO_VERSION_PATCH),$(LIBS)),$(eval $(call so_template,$(lib))))
$(foreach lib,$(filter %$(SO_VERSION_MAJOR).dylib,$(LIBS)),$(eval $(call dl_template,$(lib))))
$(foreach lib,$(filter %so.$(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH),$(LIBS)),$(eval $(call so_template,$(lib))))
$(foreach lib,$(filter %$(VERSION_MAJOR).dylib,$(LIBS)),$(eval $(call dl_template,$(lib))))
INSTALL-LIBS=$(call cond_enabled,CONFIG_INSTALL_LIBS,INSTALL-LIBS)
ifeq ($(MAKECMDGOALS),dist)


@@ -1041,6 +1041,31 @@ EOF
check_add_cflags -mips32r2 -mdspr2
disable_feature fast_unaligned
fi
if [ -n "${tune_cpu}" ]; then
case ${tune_cpu} in
p5600)
add_cflags -mips32r5 -funroll-loops -mload-store-pairs
add_cflags -msched-weight -mhard-float
add_asflags -mips32r5 -mhard-float
;;
i6400)
add_cflags -mips64r6 -mabi=64 -funroll-loops -mload-store-pairs
add_cflags -msched-weight -mhard-float
add_asflags -mips64r6 -mabi=64 -mhard-float
add_ldflags -mips64r6 -mabi=64
;;
esac
if enabled msa; then
add_cflags -mmsa -mfp64 -flax-vector-conversions
add_asflags -mmsa -mfp64 -flax-vector-conversions
add_ldflags -mmsa -mfp64 -flax-vector-conversions
disable_feature fast_unaligned
fi
fi
check_add_cflags -march=${tgt_isa}
check_add_asflags -march=${tgt_isa}
check_add_asflags -KPIC


@@ -376,6 +376,10 @@ if ($opts{arch} eq 'x86') {
@ALL_ARCHS = filter("$opts{arch}", qw/dspr2/);
last;
}
if (/HAVE_MSA=yes/) {
@ALL_ARCHS = filter("$opts{arch}", qw/msa/);
last;
}
}
close CONFIG_FILE;
mips;

2
configure

@@ -258,7 +258,7 @@ ARCH_EXT_LIST="
mips32
dspr2
msa
mips64
mmx

14
libs.mk

@@ -230,27 +230,25 @@ $(BUILD_PFX)libvpx_g.a: $(LIBVPX_OBJS)
BUILD_LIBVPX_SO := $(if $(BUILD_LIBVPX),$(CONFIG_SHARED))
SO_VERSION_MAJOR := 2
SO_VERSION_MINOR := 0
SO_VERSION_PATCH := 0
ifeq ($(filter darwin%,$(TGT_OS)),$(TGT_OS))
LIBVPX_SO := libvpx.$(SO_VERSION_MAJOR).dylib
LIBVPX_SO := libvpx.$(VERSION_MAJOR).dylib
EXPORT_FILE := libvpx.syms
LIBVPX_SO_SYMLINKS := $(addprefix $(LIBSUBDIR)/, \
libvpx.dylib )
else
LIBVPX_SO := libvpx.so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR).$(SO_VERSION_PATCH)
LIBVPX_SO := libvpx.so.$(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH)
EXPORT_FILE := libvpx.ver
SYM_LINK := libvpx.so
LIBVPX_SO_SYMLINKS := $(addprefix $(LIBSUBDIR)/, \
libvpx.so libvpx.so.$(SO_VERSION_MAJOR) \
libvpx.so.$(SO_VERSION_MAJOR).$(SO_VERSION_MINOR))
libvpx.so libvpx.so.$(VERSION_MAJOR) \
libvpx.so.$(VERSION_MAJOR).$(VERSION_MINOR))
endif
LIBS-$(BUILD_LIBVPX_SO) += $(BUILD_PFX)$(LIBVPX_SO)\
$(notdir $(LIBVPX_SO_SYMLINKS))
$(BUILD_PFX)$(LIBVPX_SO): $(LIBVPX_OBJS) $(EXPORT_FILE)
$(BUILD_PFX)$(LIBVPX_SO): extralibs += -lm
$(BUILD_PFX)$(LIBVPX_SO): SONAME = libvpx.so.$(SO_VERSION_MAJOR)
$(BUILD_PFX)$(LIBVPX_SO): SONAME = libvpx.so.$(VERSION_MAJOR)
$(BUILD_PFX)$(LIBVPX_SO): EXPORTS_FILE = $(EXPORT_FILE)
libvpx.ver: $(call enabled,CODEC_EXPORTS)


@@ -230,7 +230,7 @@ INSTANTIATE_TEST_CASE_P(
&vp9_idct4x4_1_add_c,
TX_4X4, 1)));
#if HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
#if HAVE_NEON
INSTANTIATE_TEST_CASE_P(
NEON, PartialIDctTest,
::testing::Values(
@@ -258,7 +258,7 @@ INSTANTIATE_TEST_CASE_P(
&vp9_idct4x4_16_add_c,
&vp9_idct4x4_1_add_neon,
TX_4X4, 1)));
#endif // HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
#endif // HAVE_NEON
#if HAVE_SSE2 && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(


@@ -29,7 +29,7 @@ namespace {
enum DecodeMode {
kSerialMode,
kFrameParallMode
kFrameParallelMode
};
const int kDecodeMode = 0;
@@ -95,7 +95,7 @@ TEST_P(TestVectorTest, MD5Match) {
vpx_codec_dec_cfg_t cfg = {0};
char str[256];
if (mode == kFrameParallMode) {
if (mode == kFrameParallelMode) {
flags |= VPX_CODEC_USE_FRAME_THREADING;
}


@@ -858,9 +858,6 @@ static vpx_codec_err_t vp8e_encode(vpx_codec_alg_priv_t *ctx,
{
vpx_codec_err_t res = VPX_CODEC_OK;
if (!ctx->cfg.rc_target_bitrate)
return res;
if (!ctx->cfg.rc_target_bitrate)
return res;


@@ -83,8 +83,7 @@ static void free_seg_map(VP9_COMMON *cm) {
}
}
void vp9_free_ref_frame_buffers(VP9_COMMON *cm) {
BufferPool *const pool = cm->buffer_pool;
void vp9_free_ref_frame_buffers(BufferPool *pool) {
int i;
for (i = 0; i < FRAME_BUFFERS; ++i) {
@@ -97,10 +96,14 @@ void vp9_free_ref_frame_buffers(VP9_COMMON *cm) {
pool->frame_bufs[i].mvs = NULL;
vp9_free_frame_buffer(&pool->frame_bufs[i].buf);
}
}
void vp9_free_postproc_buffers(VP9_COMMON *cm) {
#if CONFIG_VP9_POSTPROC
vp9_free_frame_buffer(&cm->post_proc_buffer);
vp9_free_frame_buffer(&cm->post_proc_buffer_int);
#else
(void)cm;
#endif
}
@@ -142,7 +145,6 @@ int vp9_alloc_context_buffers(VP9_COMMON *cm, int width, int height) {
}
void vp9_remove_common(VP9_COMMON *cm) {
vp9_free_ref_frame_buffers(cm);
vp9_free_context_buffers(cm);
vpx_free(cm->fc);


@@ -19,6 +19,7 @@ extern "C" {
#endif
struct VP9Common;
struct BufferPool;
void vp9_remove_common(struct VP9Common *cm);
@@ -26,7 +27,8 @@ int vp9_alloc_context_buffers(struct VP9Common *cm, int width, int height);
void vp9_init_context_buffers(struct VP9Common *cm);
void vp9_free_context_buffers(struct VP9Common *cm);
void vp9_free_ref_frame_buffers(struct VP9Common *cm);
void vp9_free_ref_frame_buffers(struct BufferPool *pool);
void vp9_free_postproc_buffers(struct VP9Common *cm);
int vp9_alloc_state_buffers(struct VP9Common *cm, int width, int height);
void vp9_free_state_buffers(struct VP9Common *cm);


@@ -15,6 +15,18 @@
#include "vpx_mem/vpx_mem.h"
#include "vpx/vpx_integer.h"
// Unconstrained Node Tree
const vp9_tree_index vp9_coef_con_tree[TREE_SIZE(ENTROPY_TOKENS)] = {
2, 6, // 0 = LOW_VAL
-TWO_TOKEN, 4, // 1 = TWO
-THREE_TOKEN, -FOUR_TOKEN, // 2 = THREE
8, 10, // 3 = HIGH_LOW
-CATEGORY1_TOKEN, -CATEGORY2_TOKEN, // 4 = CAT_ONE
12, 14, // 5 = CAT_THREEFOUR
-CATEGORY3_TOKEN, -CATEGORY4_TOKEN, // 6 = CAT_THREE
-CATEGORY5_TOKEN, -CATEGORY6_TOKEN // 7 = CAT_FIVE
};
const vp9_prob vp9_cat1_prob[] = { 159 };
const vp9_prob vp9_cat2_prob[] = { 165, 145 };
const vp9_prob vp9_cat3_prob[] = { 173, 148, 140 };


@@ -173,6 +173,7 @@ static INLINE const uint8_t *get_band_translate(TX_SIZE tx_size) {
#define PIVOT_NODE 2 // which node is pivot
#define MODEL_NODES (ENTROPY_NODES - UNCONSTRAINED_NODES)
extern const vp9_tree_index vp9_coef_con_tree[TREE_SIZE(ENTROPY_TOKENS)];
extern const vp9_prob vp9_pareto8_full[COEFF_PROB_MODELS][MODEL_NODES];
typedef vp9_prob vp9_coeff_probs_model[REF_TYPES][COEF_BANDS]


@@ -293,7 +293,7 @@ void vp9_loop_filter_frame_init(VP9_COMMON *cm, int default_filt_lvl) {
}
}
static void filter_selectively_vert_row2(PLANE_TYPE plane_type,
static void filter_selectively_vert_row2(int subsampling_factor,
uint8_t *s, int pitch,
unsigned int mask_16x16_l,
unsigned int mask_8x8_l,
@@ -301,9 +301,9 @@ static void filter_selectively_vert_row2(PLANE_TYPE plane_type,
unsigned int mask_4x4_int_l,
const loop_filter_info_n *lfi_n,
const uint8_t *lfl) {
const int mask_shift = plane_type ? 4 : 8;
const int mask_cutoff = plane_type ? 0xf : 0xff;
const int lfl_forward = plane_type ? 4 : 8;
const int mask_shift = subsampling_factor ? 4 : 8;
const int mask_cutoff = subsampling_factor ? 0xf : 0xff;
const int lfl_forward = subsampling_factor ? 4 : 8;
unsigned int mask_16x16_0 = mask_16x16_l & mask_cutoff;
unsigned int mask_8x8_0 = mask_8x8_l & mask_cutoff;
@@ -393,7 +393,7 @@ static void filter_selectively_vert_row2(PLANE_TYPE plane_type,
}
#if CONFIG_VP9_HIGHBITDEPTH
static void highbd_filter_selectively_vert_row2(PLANE_TYPE plane_type,
static void highbd_filter_selectively_vert_row2(int subsampling_factor,
uint16_t *s, int pitch,
unsigned int mask_16x16_l,
unsigned int mask_8x8_l,
@@ -401,9 +401,9 @@ static void highbd_filter_selectively_vert_row2(PLANE_TYPE plane_type,
unsigned int mask_4x4_int_l,
const loop_filter_info_n *lfi_n,
const uint8_t *lfl, int bd) {
const int mask_shift = plane_type ? 4 : 8;
const int mask_cutoff = plane_type ? 0xf : 0xff;
const int lfl_forward = plane_type ? 4 : 8;
const int mask_shift = subsampling_factor ? 4 : 8;
const int mask_cutoff = subsampling_factor ? 0xf : 0xff;
const int lfl_forward = subsampling_factor ? 4 : 8;
unsigned int mask_16x16_0 = mask_16x16_l & mask_cutoff;
unsigned int mask_8x8_0 = mask_8x8_l & mask_cutoff;
@@ -1326,248 +1326,203 @@ void vp9_filter_block_plane_non420(VP9_COMMON *cm,
}
}
void vp9_filter_block_plane(VP9_COMMON *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm) {
void vp9_filter_block_plane_ss00(VP9_COMMON *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm) {
struct buf_2d *const dst = &plane->dst;
uint8_t* const dst0 = dst->buf;
int r, c;
uint8_t *const dst0 = dst->buf;
int r;
uint64_t mask_16x16 = lfm->left_y[TX_16X16];
uint64_t mask_8x8 = lfm->left_y[TX_8X8];
uint64_t mask_4x4 = lfm->left_y[TX_4X4];
uint64_t mask_4x4_int = lfm->int_4x4_y;
if (!plane->plane_type) {
uint64_t mask_16x16 = lfm->left_y[TX_16X16];
uint64_t mask_8x8 = lfm->left_y[TX_8X8];
uint64_t mask_4x4 = lfm->left_y[TX_4X4];
uint64_t mask_4x4_int = lfm->int_4x4_y;
assert(plane->subsampling_x == 0 && plane->subsampling_y == 0);
// Vertical pass: do 2 rows at one time
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 2) {
unsigned int mask_16x16_l = mask_16x16 & 0xffff;
unsigned int mask_8x8_l = mask_8x8 & 0xffff;
unsigned int mask_4x4_l = mask_4x4 & 0xffff;
unsigned int mask_4x4_int_l = mask_4x4_int & 0xffff;
// Vertical pass: do 2 rows at one time
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 2) {
unsigned int mask_16x16_l = mask_16x16 & 0xffff;
unsigned int mask_8x8_l = mask_8x8 & 0xffff;
unsigned int mask_4x4_l = mask_4x4 & 0xffff;
unsigned int mask_4x4_int_l = mask_4x4_int & 0xffff;
// Disable filtering on the leftmost column.
// Disable filtering on the leftmost column.
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_vert_row2(plane->plane_type,
CONVERT_TO_SHORTPTR(dst->buf),
dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info, &lfm->lfl_y[r << 3],
(int)cm->bit_depth);
} else {
filter_selectively_vert_row2(plane->plane_type,
dst->buf, dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info,
&lfm->lfl_y[r << 3]);
}
if (cm->use_highbitdepth) {
highbd_filter_selectively_vert_row2(
plane->subsampling_x, CONVERT_TO_SHORTPTR(dst->buf), dst->stride,
mask_16x16_l, mask_8x8_l, mask_4x4_l, mask_4x4_int_l, &cm->lf_info,
&lfm->lfl_y[r << 3], (int)cm->bit_depth);
} else {
filter_selectively_vert_row2(
plane->subsampling_x, dst->buf, dst->stride, mask_16x16_l, mask_8x8_l,
mask_4x4_l, mask_4x4_int_l, &cm->lf_info, &lfm->lfl_y[r << 3]);
}
#else
filter_selectively_vert_row2(plane->plane_type,
dst->buf, dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info, &lfm->lfl_y[r << 3]);
filter_selectively_vert_row2(
plane->subsampling_x, dst->buf, dst->stride, mask_16x16_l, mask_8x8_l,
mask_4x4_l, mask_4x4_int_l, &cm->lf_info, &lfm->lfl_y[r << 3]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 16 * dst->stride;
mask_16x16 >>= 16;
mask_8x8 >>= 16;
mask_4x4 >>= 16;
mask_4x4_int >>= 16;
dst->buf += 16 * dst->stride;
mask_16x16 >>= 16;
mask_8x8 >>= 16;
mask_4x4 >>= 16;
mask_4x4_int >>= 16;
}
// Horizontal pass
dst->buf = dst0;
mask_16x16 = lfm->above_y[TX_16X16];
mask_8x8 = lfm->above_y[TX_8X8];
mask_4x4 = lfm->above_y[TX_4X4];
mask_4x4_int = lfm->int_4x4_y;
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r++) {
unsigned int mask_16x16_r;
unsigned int mask_8x8_r;
unsigned int mask_4x4_r;
if (mi_row + r == 0) {
mask_16x16_r = 0;
mask_8x8_r = 0;
mask_4x4_r = 0;
} else {
mask_16x16_r = mask_16x16 & 0xff;
mask_8x8_r = mask_8x8 & 0xff;
mask_4x4_r = mask_4x4 & 0xff;
}
// Horizontal pass
dst->buf = dst0;
mask_16x16 = lfm->above_y[TX_16X16];
mask_8x8 = lfm->above_y[TX_8X8];
mask_4x4 = lfm->above_y[TX_4X4];
mask_4x4_int = lfm->int_4x4_y;
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r++) {
unsigned int mask_16x16_r;
unsigned int mask_8x8_r;
unsigned int mask_4x4_r;
if (mi_row + r == 0) {
mask_16x16_r = 0;
mask_8x8_r = 0;
mask_4x4_r = 0;
} else {
mask_16x16_r = mask_16x16 & 0xff;
mask_8x8_r = mask_8x8 & 0xff;
mask_4x4_r = mask_4x4 & 0xff;
}
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_horiz(CONVERT_TO_SHORTPTR(dst->buf),
dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int & 0xff,
&cm->lf_info,
&lfm->lfl_y[r << 3],
(int)cm->bit_depth);
} else {
filter_selectively_horiz(dst->buf, dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int & 0xff,
&cm->lf_info,
&lfm->lfl_y[r << 3]);
}
#else
filter_selectively_horiz(dst->buf, dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int & 0xff,
&cm->lf_info,
if (cm->use_highbitdepth) {
highbd_filter_selectively_horiz(
CONVERT_TO_SHORTPTR(dst->buf), dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int & 0xff, &cm->lf_info, &lfm->lfl_y[r << 3],
(int)cm->bit_depth);
} else {
filter_selectively_horiz(dst->buf, dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int & 0xff, &cm->lf_info,
&lfm->lfl_y[r << 3]);
}
#else
filter_selectively_horiz(dst->buf, dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int & 0xff, &cm->lf_info,
&lfm->lfl_y[r << 3]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 8 * dst->stride;
dst->buf += 8 * dst->stride;
mask_16x16 >>= 8;
mask_8x8 >>= 8;
mask_4x4 >>= 8;
mask_4x4_int >>= 8;
}
}
void vp9_filter_block_plane_ss11(VP9_COMMON *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm) {
struct buf_2d *const dst = &plane->dst;
uint8_t *const dst0 = dst->buf;
int r, c;
uint16_t mask_16x16 = lfm->left_uv[TX_16X16];
uint16_t mask_8x8 = lfm->left_uv[TX_8X8];
uint16_t mask_4x4 = lfm->left_uv[TX_4X4];
uint16_t mask_4x4_int = lfm->int_4x4_uv;
assert(plane->subsampling_x == 1 && plane->subsampling_y == 1);
// Vertical pass: do 2 rows at one time
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 4) {
if (plane->plane_type == 1) {
for (c = 0; c < (MI_BLOCK_SIZE >> 1); c++) {
lfm->lfl_uv[(r << 1) + c] = lfm->lfl_y[(r << 3) + (c << 1)];
lfm->lfl_uv[((r + 2) << 1) + c] = lfm->lfl_y[((r + 2) << 3) + (c << 1)];
}
}
{
unsigned int mask_16x16_l = mask_16x16 & 0xff;
unsigned int mask_8x8_l = mask_8x8 & 0xff;
unsigned int mask_4x4_l = mask_4x4 & 0xff;
unsigned int mask_4x4_int_l = mask_4x4_int & 0xff;
// Disable filtering on the leftmost column.
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_vert_row2(
plane->subsampling_x, CONVERT_TO_SHORTPTR(dst->buf), dst->stride,
mask_16x16_l, mask_8x8_l, mask_4x4_l, mask_4x4_int_l, &cm->lf_info,
&lfm->lfl_uv[r << 1], (int)cm->bit_depth);
} else {
filter_selectively_vert_row2(
plane->subsampling_x, dst->buf, dst->stride,
mask_16x16_l, mask_8x8_l, mask_4x4_l, mask_4x4_int_l, &cm->lf_info,
&lfm->lfl_uv[r << 1]);
}
#else
filter_selectively_vert_row2(
plane->subsampling_x, dst->buf, dst->stride,
mask_16x16_l, mask_8x8_l, mask_4x4_l, mask_4x4_int_l, &cm->lf_info,
&lfm->lfl_uv[r << 1]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 16 * dst->stride;
mask_16x16 >>= 8;
mask_8x8 >>= 8;
mask_4x4 >>= 8;
mask_4x4_int >>= 8;
}
} else {
uint16_t mask_16x16 = lfm->left_uv[TX_16X16];
uint16_t mask_8x8 = lfm->left_uv[TX_8X8];
uint16_t mask_4x4 = lfm->left_uv[TX_4X4];
uint16_t mask_4x4_int = lfm->int_4x4_uv;
}
// Vertical pass: do 2 rows at one time
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 4) {
if (plane->plane_type == 1) {
for (c = 0; c < (MI_BLOCK_SIZE >> 1); c++) {
lfm->lfl_uv[(r << 1) + c] = lfm->lfl_y[(r << 3) + (c << 1)];
lfm->lfl_uv[((r + 2) << 1) + c] = lfm->lfl_y[((r + 2) << 3) +
(c << 1)];
}
}
// Horizontal pass
dst->buf = dst0;
mask_16x16 = lfm->above_uv[TX_16X16];
mask_8x8 = lfm->above_uv[TX_8X8];
mask_4x4 = lfm->above_uv[TX_4X4];
mask_4x4_int = lfm->int_4x4_uv;
{
unsigned int mask_16x16_l = mask_16x16 & 0xff;
unsigned int mask_8x8_l = mask_8x8 & 0xff;
unsigned int mask_4x4_l = mask_4x4 & 0xff;
unsigned int mask_4x4_int_l = mask_4x4_int & 0xff;
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 2) {
const int skip_border_4x4_r = mi_row + r == cm->mi_rows - 1;
const unsigned int mask_4x4_int_r =
skip_border_4x4_r ? 0 : (mask_4x4_int & 0xf);
unsigned int mask_16x16_r;
unsigned int mask_8x8_r;
unsigned int mask_4x4_r;
// Disable filtering on the leftmost column.
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_vert_row2(plane->plane_type,
CONVERT_TO_SHORTPTR(dst->buf),
dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info,
&lfm->lfl_uv[r << 1],
(int)cm->bit_depth);
} else {
filter_selectively_vert_row2(plane->plane_type,
dst->buf, dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info,
&lfm->lfl_uv[r << 1]);
}
#else
filter_selectively_vert_row2(plane->plane_type,
dst->buf, dst->stride,
mask_16x16_l,
mask_8x8_l,
mask_4x4_l,
mask_4x4_int_l,
&cm->lf_info,
&lfm->lfl_uv[r << 1]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 16 * dst->stride;
mask_16x16 >>= 8;
mask_8x8 >>= 8;
mask_4x4 >>= 8;
mask_4x4_int >>= 8;
}
if (mi_row + r == 0) {
mask_16x16_r = 0;
mask_8x8_r = 0;
mask_4x4_r = 0;
} else {
mask_16x16_r = mask_16x16 & 0xf;
mask_8x8_r = mask_8x8 & 0xf;
mask_4x4_r = mask_4x4 & 0xf;
}
// Horizontal pass
dst->buf = dst0;
mask_16x16 = lfm->above_uv[TX_16X16];
mask_8x8 = lfm->above_uv[TX_8X8];
mask_4x4 = lfm->above_uv[TX_4X4];
mask_4x4_int = lfm->int_4x4_uv;
for (r = 0; r < MI_BLOCK_SIZE && mi_row + r < cm->mi_rows; r += 2) {
const int skip_border_4x4_r = mi_row + r == cm->mi_rows - 1;
const unsigned int mask_4x4_int_r = skip_border_4x4_r ?
0 : (mask_4x4_int & 0xf);
unsigned int mask_16x16_r;
unsigned int mask_8x8_r;
unsigned int mask_4x4_r;
if (mi_row + r == 0) {
mask_16x16_r = 0;
mask_8x8_r = 0;
mask_4x4_r = 0;
} else {
mask_16x16_r = mask_16x16 & 0xf;
mask_8x8_r = mask_8x8 & 0xf;
mask_4x4_r = mask_4x4 & 0xf;
}
#if CONFIG_VP9_HIGHBITDEPTH
if (cm->use_highbitdepth) {
highbd_filter_selectively_horiz(CONVERT_TO_SHORTPTR(dst->buf),
dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int_r,
&cm->lf_info,
&lfm->lfl_uv[r << 1],
(int)cm->bit_depth);
} else {
filter_selectively_horiz(dst->buf, dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int_r,
&cm->lf_info,
&lfm->lfl_uv[r << 1]);
}
#else
filter_selectively_horiz(dst->buf, dst->stride,
mask_16x16_r,
mask_8x8_r,
mask_4x4_r,
mask_4x4_int_r,
&cm->lf_info,
if (cm->use_highbitdepth) {
highbd_filter_selectively_horiz(CONVERT_TO_SHORTPTR(dst->buf),
dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int_r, &cm->lf_info,
&lfm->lfl_uv[r << 1], (int)cm->bit_depth);
} else {
filter_selectively_horiz(dst->buf, dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int_r, &cm->lf_info,
&lfm->lfl_uv[r << 1]);
}
#else
filter_selectively_horiz(dst->buf, dst->stride, mask_16x16_r, mask_8x8_r,
mask_4x4_r, mask_4x4_int_r, &cm->lf_info,
&lfm->lfl_uv[r << 1]);
#endif // CONFIG_VP9_HIGHBITDEPTH
dst->buf += 8 * dst->stride;
mask_16x16 >>= 4;
mask_8x8 >>= 4;
mask_4x4 >>= 4;
mask_4x4_int >>= 4;
}
dst->buf += 8 * dst->stride;
mask_16x16 >>= 4;
mask_8x8 >>= 4;
mask_4x4 >>= 4;
mask_4x4_int >>= 4;
}
}
@@ -1576,11 +1531,19 @@ void vp9_loop_filter_rows(YV12_BUFFER_CONFIG *frame_buffer,
struct macroblockd_plane planes[MAX_MB_PLANE],
int start, int stop, int y_only) {
const int num_planes = y_only ? 1 : MAX_MB_PLANE;
const int use_420 = y_only || (planes[1].subsampling_y == 1 &&
planes[1].subsampling_x == 1);
enum lf_path path;
LOOP_FILTER_MASK lfm;
int mi_row, mi_col;
if (y_only)
path = LF_PATH_444;
else if (planes[1].subsampling_y == 1 && planes[1].subsampling_x == 1)
path = LF_PATH_420;
else if (planes[1].subsampling_y == 0 && planes[1].subsampling_x == 0)
path = LF_PATH_444;
else
path = LF_PATH_SLOW;
for (mi_row = start; mi_row < stop; mi_row += MI_BLOCK_SIZE) {
MODE_INFO *mi = cm->mi + mi_row * cm->mi_stride;
@@ -1590,16 +1553,23 @@ void vp9_loop_filter_rows(YV12_BUFFER_CONFIG *frame_buffer,
vp9_setup_dst_planes(planes, frame_buffer, mi_row, mi_col);
// TODO(JBB): Make setup_mask work for non 420.
if (use_420)
vp9_setup_mask(cm, mi_row, mi_col, mi + mi_col, cm->mi_stride,
&lfm);
vp9_setup_mask(cm, mi_row, mi_col, mi + mi_col, cm->mi_stride,
&lfm);
for (plane = 0; plane < num_planes; ++plane) {
if (use_420)
vp9_filter_block_plane(cm, &planes[plane], mi_row, &lfm);
else
vp9_filter_block_plane_non420(cm, &planes[plane], mi + mi_col,
mi_row, mi_col);
vp9_filter_block_plane_ss00(cm, &planes[0], mi_row, &lfm);
for (plane = 1; plane < num_planes; ++plane) {
switch (path) {
case LF_PATH_420:
vp9_filter_block_plane_ss11(cm, &planes[plane], mi_row, &lfm);
break;
case LF_PATH_444:
vp9_filter_block_plane_ss00(cm, &planes[plane], mi_row, &lfm);
break;
case LF_PATH_SLOW:
vp9_filter_block_plane_non420(cm, &planes[plane], mi + mi_col,
mi_row, mi_col);
break;
}
}
}
}


@@ -29,6 +29,12 @@ extern "C" {
#define MAX_REF_LF_DELTAS 4
#define MAX_MODE_LF_DELTAS 2
enum lf_path {
LF_PATH_420,
LF_PATH_444,
LF_PATH_SLOW,
};
struct loopfilter {
int filter_level;
@@ -92,10 +98,15 @@ void vp9_setup_mask(struct VP9Common *const cm,
MODE_INFO *mi_8x8, const int mode_info_stride,
LOOP_FILTER_MASK *lfm);
void vp9_filter_block_plane(struct VP9Common *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm);
void vp9_filter_block_plane_ss00(struct VP9Common *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm);
void vp9_filter_block_plane_ss11(struct VP9Common *const cm,
struct macroblockd_plane *const plane,
int mi_row,
LOOP_FILTER_MASK *lfm);
void vp9_filter_block_plane_non420(struct VP9Common *cm,
struct macroblockd_plane *plane,


@@ -88,7 +88,7 @@ typedef struct {
int col;
} RefCntBuffer;
typedef struct {
typedef struct BufferPool {
// Protect BufferPool from being accessed by several FrameWorkers at
// the same time during frame parallel decode.
// TODO(hkuang): Try to use atomic variable instead of locking the whole pool.


@@ -91,10 +91,7 @@ void vp9_post_proc_down_and_across_c(const uint8_t *src_ptr,
int flimit) {
uint8_t const *p_src;
uint8_t *p_dst;
int row;
int col;
int i;
int v;
int row, col, i, v, kernel;
int pitch = src_pixels_per_line;
uint8_t d[8];
(void)dst_pixels_per_line;
@@ -105,8 +102,8 @@ void vp9_post_proc_down_and_across_c(const uint8_t *src_ptr,
p_dst = dst_ptr;
for (col = 0; col < cols; col++) {
int kernel = 4;
int v = p_src[col];
kernel = 4;
v = p_src[col];
for (i = -2; i <= 2; i++) {
if (abs(v - p_src[col + i * pitch]) > flimit)
@@ -128,7 +125,7 @@ void vp9_post_proc_down_and_across_c(const uint8_t *src_ptr,
d[i] = p_src[i];
for (col = 0; col < cols; col++) {
int kernel = 4;
kernel = 4;
v = p_src[col];
d[col & 7] = v;
@@ -168,10 +165,7 @@ void vp9_highbd_post_proc_down_and_across_c(const uint16_t *src_ptr,
int flimit) {
uint16_t const *p_src;
uint16_t *p_dst;
int row;
int col;
int i;
int v;
int row, col, i, v, kernel;
int pitch = src_pixels_per_line;
uint16_t d[8];
@@ -181,8 +175,8 @@ void vp9_highbd_post_proc_down_and_across_c(const uint16_t *src_ptr,
p_dst = dst_ptr;
for (col = 0; col < cols; col++) {
int kernel = 4;
int v = p_src[col];
kernel = 4;
v = p_src[col];
for (i = -2; i <= 2; i++) {
if (abs(v - p_src[col + i * pitch]) > flimit)
@@ -205,7 +199,7 @@ void vp9_highbd_post_proc_down_and_across_c(const uint16_t *src_ptr,
d[i] = p_src[i];
for (col = 0; col < cols; col++) {
int kernel = 4;
kernel = 4;
v = p_src[col];
d[col & 7] = v;
@@ -518,22 +512,24 @@ void vp9_denoise(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst,
assert((src->flags & YV12_FLAG_HIGHBITDEPTH) ==
(dst->flags & YV12_FLAG_HIGHBITDEPTH));
if (src->flags & YV12_FLAG_HIGHBITDEPTH) {
const uint16_t *const src = CONVERT_TO_SHORTPTR(srcs[i] + 2 * src_stride
+ 2);
uint16_t *const dst = CONVERT_TO_SHORTPTR(dsts[i] + 2 * dst_stride + 2);
vp9_highbd_post_proc_down_and_across(src, dst, src_stride, dst_stride,
src_height, src_width, ppl);
const uint16_t *const src_plane = CONVERT_TO_SHORTPTR(
srcs[i] + 2 * src_stride + 2);
uint16_t *const dst_plane = CONVERT_TO_SHORTPTR(
dsts[i] + 2 * dst_stride + 2);
vp9_highbd_post_proc_down_and_across(src_plane, dst_plane, src_stride,
dst_stride, src_height, src_width,
ppl);
} else {
const uint8_t *const src = srcs[i] + 2 * src_stride + 2;
uint8_t *const dst = dsts[i] + 2 * dst_stride + 2;
const uint8_t *const src_plane = srcs[i] + 2 * src_stride + 2;
uint8_t *const dst_plane = dsts[i] + 2 * dst_stride + 2;
vp9_post_proc_down_and_across(src, dst, src_stride, dst_stride,
src_height, src_width, ppl);
vp9_post_proc_down_and_across(src_plane, dst_plane, src_stride,
dst_stride, src_height, src_width, ppl);
}
#else
const uint8_t *const src = srcs[i] + 2 * src_stride + 2;
uint8_t *const dst = dsts[i] + 2 * dst_stride + 2;
vp9_post_proc_down_and_across(src, dst, src_stride, dst_stride,
const uint8_t *const src_plane = srcs[i] + 2 * src_stride + 2;
uint8_t *const dst_plane = dsts[i] + 2 * dst_stride + 2;
vp9_post_proc_down_and_across(src_plane, dst_plane, src_stride, dst_stride,
src_height, src_width, ppl);
#endif
}
@@ -558,16 +554,15 @@ static void fillrd(struct postproc_state *state, int q, int a) {
* a gaussian distribution with sigma determined by q.
*/
{
double i;
int next, j;
next = 0;
for (i = -32; i < 32; i++) {
int a = (int)(0.5 + 256 * gaussian(sigma, 0, i));
int a_i = (int)(0.5 + 256 * gaussian(sigma, 0, i));
if (a) {
for (j = 0; j < a; j++) {
if (a_i) {
for (j = 0; j < a_i; j++) {
char_dist[next + j] = (char) i;
}


@@ -30,6 +30,25 @@ const TX_TYPE intra_mode_to_tx_type_lookup[INTRA_MODES] = {
ADST_ADST, // TM
};
enum {
NEED_LEFT = 1 << 1,
NEED_ABOVE = 1 << 2,
NEED_ABOVERIGHT = 1 << 3,
};
static const uint8_t extend_modes[INTRA_MODES] = {
NEED_ABOVE | NEED_LEFT, // DC
NEED_ABOVE, // V
NEED_LEFT, // H
NEED_ABOVERIGHT, // D45
NEED_LEFT | NEED_ABOVE, // D135
NEED_LEFT | NEED_ABOVE, // D117
NEED_LEFT | NEED_ABOVE, // D153
NEED_LEFT, // D207
NEED_ABOVERIGHT, // D63
NEED_LEFT | NEED_ABOVE, // TM
};
// This serves as a wrapper function, so that all the prediction functions
// can be unified and accessed as a pointer array. Note that the boundary
// above and left are not necessarily used all the time.
@@ -790,75 +809,106 @@ static void build_intra_predictors(const MACROBLOCKD *xd, const uint8_t *ref,
x0 = (-xd->mb_to_left_edge >> (3 + pd->subsampling_x)) + x;
y0 = (-xd->mb_to_top_edge >> (3 + pd->subsampling_y)) + y;

// NEED_LEFT
if (extend_modes[mode] & NEED_LEFT) {
  if (left_available) {
    if (xd->mb_to_bottom_edge < 0) {
      /* slower path if the block needs border extension */
      if (y0 + bs <= frame_height) {
        for (i = 0; i < bs; ++i)
          left_col[i] = ref[i * ref_stride - 1];
      } else {
        const int extend_bottom = frame_height - y0;
        for (i = 0; i < extend_bottom; ++i)
          left_col[i] = ref[i * ref_stride - 1];
        for (; i < bs; ++i)
          left_col[i] = ref[(extend_bottom - 1) * ref_stride - 1];
      }
    } else {
      /* faster path if the block does not need extension */
      for (i = 0; i < bs; ++i)
        left_col[i] = ref[i * ref_stride - 1];
    }
  } else {
    vpx_memset(left_col, 129, bs);
  }
}

// TODO(hkuang) do not extend 2*bs pixels for all modes.
// NEED_ABOVE
if (extend_modes[mode] & NEED_ABOVE) {
  if (up_available) {
    const uint8_t *above_ref = ref - ref_stride;
    if (xd->mb_to_right_edge < 0) {
      /* slower path if the block needs border extension */
      if (x0 + bs <= frame_width) {
        vpx_memcpy(above_row, above_ref, bs);
      } else if (x0 <= frame_width) {
        const int r = frame_width - x0;
        vpx_memcpy(above_row, above_ref, r);
        vpx_memset(above_row + r, above_row[r - 1],
                   x0 + bs - frame_width);
      }
    } else {
      /* faster path if the block does not need extension */
      if (bs == 4 && right_available && left_available) {
        const_above_row = above_ref;
      } else {
        vpx_memcpy(above_row, above_ref, bs);
      }
    }
    above_row[-1] = left_available ? above_ref[-1] : 129;
  } else {
    vpx_memset(above_row, 127, bs);
    above_row[-1] = 127;
  }
}

// NEED_ABOVERIGHT
if (extend_modes[mode] & NEED_ABOVERIGHT) {
  if (up_available) {
    const uint8_t *above_ref = ref - ref_stride;
    if (xd->mb_to_right_edge < 0) {
      /* slower path if the block needs border extension */
      if (x0 + 2 * bs <= frame_width) {
        if (right_available && bs == 4) {
          vpx_memcpy(above_row, above_ref, 2 * bs);
        } else {
          vpx_memcpy(above_row, above_ref, bs);
          vpx_memset(above_row + bs, above_row[bs - 1], bs);
        }
      } else if (x0 + bs <= frame_width) {
        const int r = frame_width - x0;
        if (right_available && bs == 4) {
          vpx_memcpy(above_row, above_ref, r);
          vpx_memset(above_row + r, above_row[r - 1],
                     x0 + 2 * bs - frame_width);
        } else {
          vpx_memcpy(above_row, above_ref, bs);
          vpx_memset(above_row + bs, above_row[bs - 1], bs);
        }
      } else if (x0 <= frame_width) {
        const int r = frame_width - x0;
        vpx_memcpy(above_row, above_ref, r);
        vpx_memset(above_row + r, above_row[r - 1],
                   x0 + 2 * bs - frame_width);
      }
    } else {
      /* faster path if the block does not need extension */
      if (bs == 4 && right_available && left_available) {
        const_above_row = above_ref;
      } else {
        vpx_memcpy(above_row, above_ref, bs);
        if (bs == 4 && right_available)
          vpx_memcpy(above_row + bs, above_ref + bs, bs);
        else
          vpx_memset(above_row + bs, above_row[bs - 1], bs);
      }
    }
    above_row[-1] = left_available ? above_ref[-1] : 129;
  } else {
    vpx_memset(above_row, 127, bs * 2);
    above_row[-1] = 127;
  }
}
// predict

View File

@@ -499,7 +499,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
specialize qw/vp9_highbd_d153_predictor_4x4/;
add_proto qw/void vp9_highbd_v_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vp9_highbd_v_predictor_4x4/, "$sse_x86inc";
specialize qw/vp9_highbd_v_predictor_4x4 neon/, "$sse_x86inc";
add_proto qw/void vp9_highbd_tm_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vp9_highbd_tm_predictor_4x4/, "$sse_x86inc";
@@ -577,7 +577,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
specialize qw/vp9_highbd_d153_predictor_16x16/;
add_proto qw/void vp9_highbd_v_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vp9_highbd_v_predictor_16x16/, "$sse2_x86inc";
specialize qw/vp9_highbd_v_predictor_16x16 neon/, "$sse2_x86inc";
add_proto qw/void vp9_highbd_tm_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vp9_highbd_tm_predictor_16x16/, "$sse2_x86_64";
@@ -1109,6 +1109,15 @@ specialize qw/vp9_avg_8x8 sse2 neon/;
add_proto qw/unsigned int vp9_avg_4x4/, "const uint8_t *, int p";
specialize qw/vp9_avg_4x4 sse2/;
add_proto qw/void vp9_hadamard_8x8/, "int16_t const *src_diff, int src_stride, int16_t *coeff";
specialize qw/vp9_hadamard_8x8 sse2/;
add_proto qw/void vp9_hadamard_16x16/, "int16_t const *src_diff, int src_stride, int16_t *coeff";
specialize qw/vp9_hadamard_16x16 sse2/;
add_proto qw/int16_t vp9_satd/, "const int16_t *coeff, int length";
specialize qw/vp9_satd sse2/;
add_proto qw/void vp9_int_pro_row/, "int16_t *hbuf, uint8_t const *ref, const int ref_stride, const int height";
specialize qw/vp9_int_pro_row sse2/;
@@ -1162,6 +1171,9 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
add_proto qw/int64_t vp9_block_error/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz";
specialize qw/vp9_block_error avx2/, "$sse2_x86inc";
add_proto qw/int64_t vp9_block_error_fp/, "const int16_t *coeff, const int16_t *dqcoeff, int block_size";
specialize qw/vp9_block_error_fp sse2/;
add_proto qw/void vp9_quantize_fp/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
specialize qw/vp9_quantize_fp neon sse2/, "$ssse3_x86_64";

View File

@@ -13,6 +13,7 @@
#include "vp9/common/vp9_entropymode.h"
#include "vp9/common/vp9_thread_common.h"
#include "vp9/common/vp9_reconinter.h"
#include "vp9/common/vp9_loopfilter.h"
#if CONFIG_MULTITHREAD
static INLINE void mutex_lock(pthread_mutex_t *const mutex) {
@@ -92,10 +93,17 @@ void thread_loop_filter_rows(const YV12_BUFFER_CONFIG *const frame_buffer,
int start, int stop, int y_only,
VP9LfSync *const lf_sync) {
const int num_planes = y_only ? 1 : MAX_MB_PLANE;
const int use_420 = y_only || (planes[1].subsampling_y == 1 &&
planes[1].subsampling_x == 1);
const int sb_cols = mi_cols_aligned_to_sb(cm->mi_cols) >> MI_BLOCK_SIZE_LOG2;
int mi_row, mi_col;
enum lf_path path;
if (y_only)
path = LF_PATH_444;
else if (planes[1].subsampling_y == 1 && planes[1].subsampling_x == 1)
path = LF_PATH_420;
else if (planes[1].subsampling_y == 0 && planes[1].subsampling_x == 0)
path = LF_PATH_444;
else
path = LF_PATH_SLOW;
for (mi_row = start; mi_row < stop;
mi_row += lf_sync->num_workers * MI_BLOCK_SIZE) {
@@ -112,16 +120,23 @@ void thread_loop_filter_rows(const YV12_BUFFER_CONFIG *const frame_buffer,
vp9_setup_dst_planes(planes, frame_buffer, mi_row, mi_col);
// TODO(JBB): Make setup_mask work for non 420.
if (use_420)
vp9_setup_mask(cm, mi_row, mi_col, mi + mi_col, cm->mi_stride,
&lfm);
vp9_setup_mask(cm, mi_row, mi_col, mi + mi_col, cm->mi_stride,
&lfm);
for (plane = 0; plane < num_planes; ++plane) {
if (use_420)
vp9_filter_block_plane(cm, &planes[plane], mi_row, &lfm);
else
vp9_filter_block_plane_non420(cm, &planes[plane], mi + mi_col,
mi_row, mi_col);
vp9_filter_block_plane_ss00(cm, &planes[0], mi_row, &lfm);
for (plane = 1; plane < num_planes; ++plane) {
switch (path) {
case LF_PATH_420:
vp9_filter_block_plane_ss11(cm, &planes[plane], mi_row, &lfm);
break;
case LF_PATH_444:
vp9_filter_block_plane_ss00(cm, &planes[plane], mi_row, &lfm);
break;
case LF_PATH_SLOW:
vp9_filter_block_plane_non420(cm, &planes[plane], mi + mi_col,
mi_row, mi_col);
break;
}
}
sync_write(lf_sync, r, c, sb_cols);

View File

@@ -1509,7 +1509,7 @@ static int read_compressed_header(VP9Decoder *pbi, const uint8_t *data,
if (vp9_reader_init(&r, data, partition_size, pbi->decrypt_cb,
pbi->decrypt_state))
vpx_internal_error(&cm->error, VPX_CODEC_MEM_ERROR,
"Failed to allocate bool decoder 0");
"Failed to allocate boon decoder 0");
cm->tx_mode = xd->lossless ? ONLY_4X4 : read_tx_mode(&r);
if (cm->tx_mode == TX_MODE_SELECT)

View File

@@ -60,6 +60,35 @@ static int read_segment_id(vp9_reader *r, const struct segmentation *seg) {
return vp9_read_tree(r, vp9_segment_tree, seg->tree_probs);
}
static void read_tx_size_inter(VP9_COMMON *cm, MACROBLOCKD *xd,
TX_SIZE tx_size, int mi_row, int mi_col,
vp9_reader *r) {
MB_MODE_INFO *mbmi = &xd->mi[0].src_mi->mbmi;
int is_split = vp9_read_bit(r);
if (!is_split) {
mbmi->tx_size = tx_size;
} else {
BLOCK_SIZE bsize = txsize_to_bsize[tx_size];
int bh = num_8x8_blocks_high_lookup[bsize];
int i;
if (tx_size == TX_8X8) {
mbmi->tx_size = TX_4X4;
return;
}
for (i = 0; i < 4; ++i) {
int offsetr = (i >> 1) * bh / 2;
int offsetc = (i & 0x01) * bh / 2;
if ((mi_row + offsetr < cm->mi_rows) &&
(mi_col + offsetc < cm->mi_cols))
read_tx_size_inter(cm, xd, tx_size - 1,
mi_row + offsetr, mi_col + offsetc, r);
}
}
}
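To make the recursion above concrete, the quadrant offsets follow directly from bh, the transform block's height in 8x8 mi units. A small standalone sketch (the helper is hypothetical, not part of the source) prints the mi offsets visited at one split level:

#include <stdio.h>

/* For a transform block spanning bh mi units per side, one split level visits
 * the four quadrant origins: (0,0), (0,bh/2), (bh/2,0), (bh/2,bh/2). */
static void print_split_offsets(int bh) {
  int i;
  for (i = 0; i < 4; ++i) {
    const int offsetr = (i >> 1) * bh / 2;   /* quadrant row offset */
    const int offsetc = (i & 0x01) * bh / 2; /* quadrant column offset */
    printf("sub-block %d -> mi offset (%d, %d)\n", i, offsetr, offsetc);
  }
}

For tx_size = TX_32X32 the block spans bh = 4 mi units, so the recursion descends into the four 16x16 quadrants at mi offsets (0,0), (0,2), (2,0) and (2,2), skipping any quadrant that starts outside the frame.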
static TX_SIZE read_selected_tx_size(VP9_COMMON *cm, MACROBLOCKD *xd,
FRAME_COUNTS *counts,
TX_SIZE max_tx_size, vp9_reader *r) {
@@ -569,13 +598,29 @@ static void read_inter_frame_mode_info(VP9Decoder *const pbi,
MODE_INFO *const mi = xd->mi[0].src_mi;
MB_MODE_INFO *const mbmi = &mi->mbmi;
int inter_block;
BLOCK_SIZE bsize = mbmi->sb_type;
mbmi->mv[0].as_int = 0;
mbmi->mv[1].as_int = 0;
mbmi->segment_id = read_inter_segment_id(cm, xd, mi_row, mi_col, r);
mbmi->skip = read_skip(cm, xd, counts, mbmi->segment_id, r);
inter_block = read_is_inter_block(cm, xd, counts, mbmi->segment_id, r);
mbmi->tx_size = read_tx_size(cm, xd, counts, !mbmi->skip || !inter_block, r);
if (mbmi->sb_type >= BLOCK_8X8 && cm->tx_mode == TX_MODE_SELECT &&
!mbmi->skip && inter_block) {
int txb_size = txsize_to_bsize[max_txsize_lookup[bsize]];
int bh = num_8x8_blocks_wide_lookup[txb_size];
int width = num_8x8_blocks_wide_lookup[bsize];
int height = num_8x8_blocks_high_lookup[bsize];
int idx, idy;
for (idy = 0; idy < height; idy += bh)
for (idx = 0; idx < width; idx += bh)
read_tx_size_inter(cm, xd, max_txsize_lookup[mbmi->sb_type],
mi_row + idy, mi_col + idx, r);
} else {
mbmi->tx_size = read_tx_size(cm, xd, counts,
!mbmi->skip || !inter_block, r);
}
if (inter_block)
read_inter_block_mode_info(pbi, xd, counts, tile, mi, mi_row, mi_col, r);

View File

@@ -45,17 +45,6 @@ static INLINE int read_coeff(const vp9_prob *probs, int n, vp9_reader *r) {
return val;
}
static const vp9_tree_index coeff_subtree_high[TREE_SIZE(ENTROPY_TOKENS)] = {
2, 6, /* 0 = LOW_VAL */
-TWO_TOKEN, 4, /* 1 = TWO */
-THREE_TOKEN, -FOUR_TOKEN, /* 2 = THREE */
8, 10, /* 3 = HIGH_LOW */
-CATEGORY1_TOKEN, -CATEGORY2_TOKEN, /* 4 = CAT_ONE */
12, 14, /* 5 = CAT_THREEFOUR */
-CATEGORY3_TOKEN, -CATEGORY4_TOKEN, /* 6 = CAT_THREE */
-CATEGORY5_TOKEN, -CATEGORY6_TOKEN /* 7 = CAT_FIVE */
};
static int decode_coefs(VP9_COMMON *cm, const MACROBLOCKD *xd,
FRAME_COUNTS *counts, PLANE_TYPE type,
tran_low_t *dqcoeff, TX_SIZE tx_size, const int16_t *dq,
@@ -147,7 +136,7 @@ static int decode_coefs(VP9_COMMON *cm, const MACROBLOCKD *xd,
val = 1;
} else {
INCREMENT_COUNT(TWO_TOKEN);
token = vp9_read_tree(r, coeff_subtree_high,
token = vp9_read_tree(r, vp9_coef_con_tree,
vp9_pareto8_full[prob[PIVOT_NODE] - 1]);
switch (token) {
case TWO_TOKEN:

View File

@@ -28,6 +28,94 @@ unsigned int vp9_avg_4x4_c(const uint8_t *s, int p) {
return (sum + 8) >> 4;
}
static void hadamard_col8(const int16_t *src_diff, int src_stride,
int16_t *coeff) {
int16_t b0 = src_diff[0 * src_stride] + src_diff[1 * src_stride];
int16_t b1 = src_diff[0 * src_stride] - src_diff[1 * src_stride];
int16_t b2 = src_diff[2 * src_stride] + src_diff[3 * src_stride];
int16_t b3 = src_diff[2 * src_stride] - src_diff[3 * src_stride];
int16_t b4 = src_diff[4 * src_stride] + src_diff[5 * src_stride];
int16_t b5 = src_diff[4 * src_stride] - src_diff[5 * src_stride];
int16_t b6 = src_diff[6 * src_stride] + src_diff[7 * src_stride];
int16_t b7 = src_diff[6 * src_stride] - src_diff[7 * src_stride];
int16_t c0 = b0 + b2;
int16_t c1 = b1 + b3;
int16_t c2 = b0 - b2;
int16_t c3 = b1 - b3;
int16_t c4 = b4 + b6;
int16_t c5 = b5 + b7;
int16_t c6 = b4 - b6;
int16_t c7 = b5 - b7;
coeff[0] = c0 + c4;
coeff[7] = c1 + c5;
coeff[3] = c2 + c6;
coeff[4] = c3 + c7;
coeff[2] = c0 - c4;
coeff[6] = c1 - c5;
coeff[1] = c2 - c6;
coeff[5] = c3 - c7;
}
void vp9_hadamard_8x8_c(int16_t const *src_diff, int src_stride,
int16_t *coeff) {
int idx;
int16_t buffer[64];
int16_t *tmp_buf = &buffer[0];
for (idx = 0; idx < 8; ++idx) {
hadamard_col8(src_diff, src_stride, tmp_buf);
tmp_buf += 8;
++src_diff;
}
tmp_buf = &buffer[0];
for (idx = 0; idx < 8; ++idx) {
hadamard_col8(tmp_buf, 8, coeff);
coeff += 8;
++tmp_buf;
}
}
// In place 16x16 2D Hadamard transform
void vp9_hadamard_16x16_c(int16_t const *src_diff, int src_stride,
int16_t *coeff) {
int idx;
for (idx = 0; idx < 4; ++idx) {
int16_t const *src_ptr = src_diff + (idx >> 1) * 8 * src_stride
+ (idx & 0x01) * 8;
vp9_hadamard_8x8_c(src_ptr, src_stride, coeff + idx * 64);
}
for (idx = 0; idx < 64; ++idx) {
int16_t a0 = coeff[0];
int16_t a1 = coeff[64];
int16_t a2 = coeff[128];
int16_t a3 = coeff[192];
int16_t b0 = a0 + a1;
int16_t b1 = a0 - a1;
int16_t b2 = a2 + a3;
int16_t b3 = a2 - a3;
coeff[0] = (b0 + b2) >> 1;
coeff[64] = (b1 + b3) >> 1;
coeff[128] = (b0 - b2) >> 1;
coeff[192] = (b1 - b3) >> 1;
++coeff;
}
}
int16_t vp9_satd_c(const int16_t *coeff, int length) {
int i;
int satd = 0;
for (i = 0; i < length; ++i)
satd += abs(coeff[i]);
return (int16_t)satd;
}
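The Hadamard and SATD helpers above are the building blocks of the transform-domain rate estimate used later in block_yrd. A minimal sketch of how a caller might combine them for one 8x8 residual block (the wrapper and buffer names are illustrative, not from the source):

#include <stdint.h>

void vp9_hadamard_8x8_c(int16_t const *src_diff, int src_stride,
                        int16_t *coeff);
int16_t vp9_satd_c(const int16_t *coeff, int length);

/* Cheap activity/rate proxy for an 8x8 residual block: forward Hadamard
 * followed by the sum of absolute transformed differences. */
static int block_satd_8x8(const int16_t *src_diff, int diff_stride) {
  int16_t coeff[64];
  vp9_hadamard_8x8_c(src_diff, diff_stride, coeff);
  return vp9_satd_c(coeff, 64);
}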
// Integer projection onto row vectors.
void vp9_int_pro_row_c(int16_t *hbuf, uint8_t const *ref,
const int ref_stride, const int height) {

View File

@@ -76,6 +76,35 @@ static void prob_diff_update(const vp9_tree_index *tree,
vp9_cond_prob_diff_update(w, &probs[i], branch_ct[i]);
}
static void write_tx_size_inter(const VP9_COMMON *cm, const MACROBLOCKD *xd,
TX_SIZE tx_size, int mi_row, int mi_col,
vp9_writer *w) {
MB_MODE_INFO *mbmi = &xd->mi[0].src_mi->mbmi;
// TODO(jingning): this assumes support of the possible 64x64 transform.
if (tx_size == mbmi->tx_size) {
vp9_write_bit(w, 0);
} else { // further split
BLOCK_SIZE bsize = txsize_to_bsize[tx_size];
int bh = num_8x8_blocks_high_lookup[bsize];
int i;
vp9_write_bit(w, 1);
if (tx_size == TX_8X8)
return;
for (i = 0; i < 4; ++i) {
int offsetr = (i >> 1) * bh / 2;
int offsetc = (i & 0x01) * bh / 2;
if ((mi_row + offsetr < cm->mi_rows) &&
(mi_col + offsetc < cm->mi_cols))
write_tx_size_inter(cm, xd, tx_size - 1,
mi_row + offsetr, mi_col + offsetc, w);
}
}
}
static void write_selected_tx_size(const VP9_COMMON *cm,
const MACROBLOCKD *xd, vp9_writer *w) {
TX_SIZE tx_size = xd->mi[0].src_mi->mbmi.tx_size;
@@ -235,6 +264,7 @@ static void write_ref_frames(const VP9_COMMON *cm, const MACROBLOCKD *xd,
}
static void pack_inter_mode_mvs(VP9_COMP *cpi, const MODE_INFO *mi,
int mi_row, int mi_col,
vp9_writer *w) {
VP9_COMMON *const cm = &cpi->common;
const nmv_context *nmvc = &cm->fc->nmvc;
@@ -268,9 +298,20 @@ static void pack_inter_mode_mvs(VP9_COMP *cpi, const MODE_INFO *mi,
vp9_write(w, is_inter, vp9_get_intra_inter_prob(cm, xd));
if (bsize >= BLOCK_8X8 && cm->tx_mode == TX_MODE_SELECT &&
!(is_inter &&
(skip || vp9_segfeature_active(seg, segment_id, SEG_LVL_SKIP)))) {
write_selected_tx_size(cm, xd, w);
!(is_inter && skip)) {
if (!is_inter) {
write_selected_tx_size(cm, xd, w);
} else {
int txb_size = txsize_to_bsize[max_txsize_lookup[bsize]];
int bh = num_8x8_blocks_wide_lookup[txb_size];
int width = num_8x8_blocks_wide_lookup[bsize];
int height = num_8x8_blocks_high_lookup[bsize];
int idx, idy;
for (idy = 0; idy < height; idy += bh)
for (idx = 0; idx < width; idx += bh)
write_tx_size_inter(cm, xd, max_txsize_lookup[bsize],
mi_row + idy, mi_col + idx, w);
}
}
if (!is_inter) {
@@ -392,7 +433,7 @@ static void write_modes_b(VP9_COMP *cpi, const TileInfo *const tile,
if (frame_is_intra_only(cm)) {
write_mb_modes_kf(cm, xd, xd->mi, w);
} else {
pack_inter_mode_mvs(cpi, m, w);
pack_inter_mode_mvs(cpi, m, mi_row, mi_col, w);
}
assert(*tok < tok_end);
@@ -813,6 +854,10 @@ static void encode_txfm_probs(VP9_COMMON *cm, vp9_writer *w,
if (cm->tx_mode >= ALLOW_32X32)
vp9_write_bit(w, cm->tx_mode == TX_MODE_SELECT);
if (cm->tx_mode != TX_MODE_SELECT) {
int a = 10;
}
// Probabilities
if (cm->tx_mode == TX_MODE_SELECT) {
int i, j;

View File

@@ -99,9 +99,9 @@ static const uint16_t VP9_HIGH_VAR_OFFS_12[64] = {
};
#endif // CONFIG_VP9_HIGHBITDEPTH
static unsigned int get_sby_perpixel_variance(VP9_COMP *cpi,
const struct buf_2d *ref,
BLOCK_SIZE bs) {
unsigned int vp9_get_sby_perpixel_variance(VP9_COMP *cpi,
const struct buf_2d *ref,
BLOCK_SIZE bs) {
unsigned int sse;
const unsigned int var = cpi->fn_ptr[bs].vf(ref->buf, ref->stride,
VP9_VAR_OFFS, 0, &sse);
@@ -109,7 +109,7 @@ static unsigned int get_sby_perpixel_variance(VP9_COMP *cpi,
}
#if CONFIG_VP9_HIGHBITDEPTH
static unsigned int high_get_sby_perpixel_variance(
unsigned int vp9_high_get_sby_perpixel_variance(
VP9_COMP *cpi, const struct buf_2d *ref, BLOCK_SIZE bs, int bd) {
unsigned int var, sse;
switch (bd) {
@@ -165,21 +165,6 @@ static BLOCK_SIZE get_rd_var_based_fixed_partition(VP9_COMP *cpi, MACROBLOCK *x,
return BLOCK_8X8;
}
static BLOCK_SIZE get_nonrd_var_based_fixed_partition(VP9_COMP *cpi,
MACROBLOCK *x,
int mi_row,
int mi_col) {
unsigned int var = get_sby_perpixel_diff_variance(cpi, &x->plane[0].src,
mi_row, mi_col,
BLOCK_64X64);
if (var < 4)
return BLOCK_64X64;
else if (var < 10)
return BLOCK_32X32;
else
return BLOCK_16X16;
}
// Lighter version of set_offsets that only sets the mode info
// pointers.
static INLINE void set_mode_info_offsets(VP9_COMMON *const cm,
@@ -482,9 +467,9 @@ void vp9_set_vbp_thresholds(VP9_COMP *cpi, int q) {
} else {
VP9_COMMON *const cm = &cpi->common;
const int is_key_frame = (cm->frame_type == KEY_FRAME);
const int threshold_multiplier = is_key_frame ? 80 : 4;
const int threshold_multiplier = is_key_frame ? 20 : 1;
const int64_t threshold_base = (int64_t)(threshold_multiplier *
vp9_convert_qindex_to_q(q, cm->bit_depth));
cpi->y_dequant[q][1]);
// TODO(marpan): Allow 4x4 partitions for inter-frames.
// use_4x4_partition = (variance4x4downsample[i2 + j] == 1);
@@ -492,21 +477,20 @@ void vp9_set_vbp_thresholds(VP9_COMP *cpi, int q) {
// if variance of 16x16 block is very high, so use larger threshold
// for 16x16 (threshold_bsize_min) in that case.
if (is_key_frame) {
cpi->vbp_threshold = threshold_base >> 2;
cpi->vbp_threshold_bsize_max = threshold_base;
cpi->vbp_threshold_bsize_min = threshold_base << 2;
cpi->vbp_threshold_16x16 = cpi->vbp_threshold;
cpi->vbp_threshold_64x64 = threshold_base;
cpi->vbp_threshold_32x32 = threshold_base >> 2;
cpi->vbp_threshold_16x16 = threshold_base >> 2;
cpi->vbp_threshold_8x8 = threshold_base << 2;
cpi->vbp_bsize_min = BLOCK_8X8;
} else {
cpi->vbp_threshold = threshold_base;
cpi->vbp_threshold_32x32 = threshold_base;
if (cm->width <= 352 && cm->height <= 288) {
cpi->vbp_threshold_bsize_max = threshold_base >> 2;
cpi->vbp_threshold_bsize_min = threshold_base << 3;
cpi->vbp_threshold_64x64 = threshold_base >> 2;
cpi->vbp_threshold_16x16 = threshold_base << 3;
} else {
cpi->vbp_threshold_bsize_max = threshold_base;
cpi->vbp_threshold_bsize_min = threshold_base << cpi->oxcf.speed;
cpi->vbp_threshold_64x64 = threshold_base;
cpi->vbp_threshold_16x16 = threshold_base << cpi->oxcf.speed;
}
cpi->vbp_threshold_16x16 = cpi->vbp_threshold_bsize_min;
cpi->vbp_bsize_min = BLOCK_16X16;
}
}
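The hunk above replaces the single vbp_threshold with per-block-size thresholds and derives the base value from the AC dequant step instead of from q directly. A hedged sketch of the resulting key-frame values (the struct and wrapper are illustrative; the shifts mirror the diff):

#include <stdint.h>

typedef struct {
  int64_t t_64x64, t_32x32, t_16x16, t_8x8;
} vbp_thresholds_sketch;

/* Key frame: threshold_base = 20 * y_dequant[q][1] (the AC dequant step),
 * then scaled per partition size as in the code above. */
static vbp_thresholds_sketch key_frame_vbp_thresholds(int64_t ac_dequant) {
  const int64_t threshold_base = 20 * ac_dequant;
  vbp_thresholds_sketch t;
  t.t_64x64 = threshold_base;
  t.t_32x32 = threshold_base >> 2;
  t.t_16x16 = threshold_base >> 2;
  t.t_8x8 = threshold_base << 2;
  return t;
}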
@@ -560,18 +544,10 @@ static void choose_partitioning(VP9_COMP *cpi,
const YV12_BUFFER_CONFIG *yv12_g = get_ref_frame_buffer(cpi, GOLDEN_FRAME);
unsigned int y_sad, y_sad_g;
BLOCK_SIZE bsize;
if (mi_row + 4 < cm->mi_rows && mi_col + 4 < cm->mi_cols)
bsize = BLOCK_64X64;
else if (mi_row + 4 < cm->mi_rows && mi_col + 4 >= cm->mi_cols)
bsize = BLOCK_32X64;
else if (mi_row + 4 >= cm->mi_rows && mi_col + 4 < cm->mi_cols)
bsize = BLOCK_64X32;
else
bsize = BLOCK_32X32;
const BLOCK_SIZE bsize = BLOCK_32X32
+ (mi_col + 4 < cm->mi_cols) * 2 + (mi_row + 4 < cm->mi_rows);
assert(yv12 != NULL);
if (yv12_g && yv12_g != yv12) {
vp9_setup_pre_planes(xd, 0, yv12_g, mi_row, mi_col,
&cm->frame_refs[GOLDEN_FRAME - 1].sf);
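The bsize expression introduced a few lines above collapses the removed if/else ladder into arithmetic on the BLOCK_SIZE enum; this only works because BLOCK_32X32, BLOCK_32X64, BLOCK_64X32 and BLOCK_64X64 are consecutive values, in that order. A small verification sketch, with the enum values written out as an assumption from vp9_enums.h:

/* Assumed ordering from vp9_enums.h. */
enum { SKETCH_BLOCK_32X32 = 9, SKETCH_BLOCK_32X64 = 10,
       SKETCH_BLOCK_64X32 = 11, SKETCH_BLOCK_64X64 = 12 };

static int sb_bsize_sketch(int mi_row, int mi_col, int mi_rows, int mi_cols) {
  /* +2 when a full 64-wide column range fits, +1 when a full row range fits */
  return SKETCH_BLOCK_32X32 + (mi_col + 4 < mi_cols) * 2 +
         (mi_row + 4 < mi_rows);
}
/* Both fit      -> 9 + 2 + 1 = 12 (BLOCK_64X64)
 * Only rows fit -> 9 + 0 + 1 = 10 (BLOCK_32X64)
 * Only cols fit -> 9 + 2 + 0 = 11 (BLOCK_64X32)
 * Neither fits  -> 9          (BLOCK_32X32), matching the removed ladder. */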
@@ -692,7 +668,7 @@ static void choose_partitioning(VP9_COMP *cpi,
}
if (is_key_frame || (low_res &&
vt.split[i].split[j].part_variances.none.variance >
(cpi->vbp_threshold << 1))) {
(cpi->vbp_threshold_32x32 << 1))) {
// Go down to 4x4 down-sampling for variance.
variance4x4downsample[i2 + j] = 1;
for (k = 0; k < 4; k++) {
@@ -757,7 +733,7 @@ static void choose_partitioning(VP9_COMP *cpi,
// If variance of this 32x32 block is above the threshold, force the block
// to split. This also forces a split on the upper (64x64) level.
get_variance(&vt.split[i].part_variances.none);
if (vt.split[i].part_variances.none.variance > cpi->vbp_threshold) {
if (vt.split[i].part_variances.none.variance > cpi->vbp_threshold_32x32) {
force_split[i + 1] = 1;
force_split[0] = 1;
}
@@ -769,7 +745,7 @@ static void choose_partitioning(VP9_COMP *cpi,
// we get to one that's got a variance lower than our threshold.
if ( mi_col + 8 > cm->mi_cols || mi_row + 8 > cm->mi_rows ||
!set_vt_partitioning(cpi, xd, &vt, BLOCK_64X64, mi_row, mi_col,
cpi->vbp_threshold_bsize_max, BLOCK_16X16,
cpi->vbp_threshold_64x64, BLOCK_16X16,
force_split[0])) {
for (i = 0; i < 4; ++i) {
const int x32_idx = ((i & 1) << 2);
@@ -777,7 +753,7 @@ static void choose_partitioning(VP9_COMP *cpi,
const int i2 = i << 2;
if (!set_vt_partitioning(cpi, xd, &vt.split[i], BLOCK_32X32,
(mi_row + y32_idx), (mi_col + x32_idx),
cpi->vbp_threshold,
cpi->vbp_threshold_32x32,
BLOCK_16X16, force_split[i + 1])) {
for (j = 0; j < 4; ++j) {
const int x16_idx = ((j & 1) << 1);
@@ -801,7 +777,7 @@ static void choose_partitioning(VP9_COMP *cpi,
BLOCK_8X8,
mi_row + y32_idx + y16_idx + y8_idx,
mi_col + x32_idx + x16_idx + x8_idx,
cpi->vbp_threshold_bsize_min,
cpi->vbp_threshold_8x8,
BLOCK_8X8, 0)) {
set_block_size(cpi, xd,
(mi_row + y32_idx + y16_idx + y8_idx),
@@ -1073,13 +1049,15 @@ static void rd_pick_sb_modes(VP9_COMP *cpi,
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
x->source_variance =
high_get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize, xd->bd);
vp9_high_get_sby_perpixel_variance(cpi, &x->plane[0].src,
bsize, xd->bd);
} else {
x->source_variance =
get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize);
vp9_get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize);
}
#else
x->source_variance = get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize);
x->source_variance =
vp9_get_sby_perpixel_variance(cpi, &x->plane[0].src, bsize);
#endif // CONFIG_VP9_HIGHBITDEPTH
// Save rdmult before it might be changed, so it can be restored later.
@@ -1103,8 +1081,9 @@ static void rd_pick_sb_modes(VP9_COMP *cpi,
} else if (aq_mode == CYCLIC_REFRESH_AQ) {
const uint8_t *const map = cm->seg.update_map ? cpi->segmentation_map
: cm->last_frame_seg_map;
// If segment 1, use rdmult for that segment.
if (vp9_get_segment_id(cm, map, bsize, mi_row, mi_col))
// If segment is boosted, use rdmult for that segment.
if (cyclic_refresh_segment_id_boosted(
vp9_get_segment_id(cm, map, bsize, mi_row, mi_col)))
x->rdmult = vp9_cyclic_refresh_get_rdmult(cpi->cyclic_refresh);
}
@@ -2842,6 +2821,9 @@ static MV_REFERENCE_FRAME get_frame_type(const VP9_COMP *cpi) {
static TX_MODE select_tx_mode(const VP9_COMP *cpi, MACROBLOCKD *const xd) {
if (xd->lossless)
return ONLY_4X4;
return TX_MODE_SELECT;
if (cpi->common.frame_type == KEY_FRAME &&
cpi->sf.use_nonrd_pick_mode &&
cpi->sf.partition_search_type == VAR_BASED_PARTITION)
@@ -2877,7 +2859,7 @@ static void nonrd_pick_sb_modes(VP9_COMP *cpi,
mbmi->sb_type = bsize;
if (cpi->oxcf.aq_mode == CYCLIC_REFRESH_AQ && cm->seg.enabled)
if (mbmi->segment_id)
if (cyclic_refresh_segment_id_boosted(mbmi->segment_id))
x->rdmult = vp9_cyclic_refresh_get_rdmult(cpi->cyclic_refresh);
if (cm->frame_type == KEY_FRAME)
@@ -4108,8 +4090,9 @@ static void encode_superblock(VP9_COMP *cpi, ThreadData *td,
if (cm->tx_mode == TX_MODE_SELECT &&
mbmi->sb_type >= BLOCK_8X8 &&
!(is_inter_block(mbmi) && (mbmi->skip || seg_skip))) {
++get_tx_counts(max_txsize_lookup[bsize], vp9_get_tx_size_context(xd),
&td->counts->tx)[mbmi->tx_size];
if (!is_inter_block(mbmi))
++get_tx_counts(max_txsize_lookup[bsize], vp9_get_tx_size_context(xd),
&td->counts->tx)[mbmi->tx_size];
} else {
int x, y;
TX_SIZE tx_size;

View File

@@ -126,14 +126,25 @@ void vp9_apply_active_map(VP9_COMP *cpi) {
assert(AM_SEGMENT_ID_ACTIVE == CR_SEGMENT_ID_BASE);
if (frame_is_intra_only(&cpi->common)) {
cpi->active_map.enabled = 0;
cpi->active_map.update = 1;
}
if (cpi->active_map.update) {
if (cpi->active_map.enabled) {
for (i = 0; i < cpi->common.mi_rows * cpi->common.mi_cols; ++i)
if (seg_map[i] == AM_SEGMENT_ID_ACTIVE) seg_map[i] = active_map[i];
vp9_enable_segmentation(seg);
vp9_enable_segfeature(seg, AM_SEGMENT_ID_INACTIVE, SEG_LVL_SKIP);
vp9_enable_segfeature(seg, AM_SEGMENT_ID_INACTIVE, SEG_LVL_ALT_LF);
// Setting the data to -MAX_LOOP_FILTER will result in the computed loop
// filter level being zero regardless of the value of seg->abs_delta.
vp9_set_segdata(seg, AM_SEGMENT_ID_INACTIVE,
SEG_LVL_ALT_LF, -MAX_LOOP_FILTER);
} else {
vp9_disable_segfeature(seg, AM_SEGMENT_ID_INACTIVE, SEG_LVL_SKIP);
vp9_disable_segfeature(seg, AM_SEGMENT_ID_INACTIVE, SEG_LVL_ALT_LF);
if (seg->enabled) {
seg->update_data = 1;
seg->update_map = 1;
@@ -172,6 +183,33 @@ int vp9_set_active_map(VP9_COMP* cpi,
}
}
int vp9_get_active_map(VP9_COMP* cpi,
unsigned char* new_map_16x16,
int rows,
int cols) {
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols &&
new_map_16x16) {
unsigned char* const seg_map_8x8 = cpi->segmentation_map;
const int mi_rows = cpi->common.mi_rows;
const int mi_cols = cpi->common.mi_cols;
vpx_memset(new_map_16x16, !cpi->active_map.enabled, rows * cols);
if (cpi->active_map.enabled) {
int r, c;
for (r = 0; r < mi_rows; ++r) {
for (c = 0; c < mi_cols; ++c) {
// Cyclic refresh segments are considered active despite not having
// AM_SEGMENT_ID_ACTIVE
new_map_16x16[(r >> 1) * cols + (c >> 1)] |=
seg_map_8x8[r * mi_cols + c] != AM_SEGMENT_ID_INACTIVE;
}
}
}
return 0;
} else {
return -1;
}
}
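For context, a minimal caller-side sketch of the new getter next to the existing setter. It assumes the supplied dimensions match the coded frame in 16x16 units, and that passing a NULL map to vp9_set_active_map leaves every block active (the setter's no-map path); the wrapper itself is illustrative:

/* Reset the active map, then read back the encoder's current
 * 16x16-granularity activity map (requires vp9_encoder.h). */
static int refresh_active_map_sketch(VP9_COMP *cpi, unsigned char *map_16x16,
                                     int mb_rows, int mb_cols) {
  if (vp9_set_active_map(cpi, NULL, mb_rows, mb_cols))  /* NULL => all active */
    return -1;
  return vp9_get_active_map(cpi, map_16x16, mb_rows, mb_cols);
}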
void vp9_set_high_precision_mv(VP9_COMP *cpi, int allow_high_precision_mv) {
MACROBLOCK *const mb = &cpi->td.mb;
cpi->common.allow_high_precision_mv = allow_high_precision_mv;
@@ -303,7 +341,10 @@ static void dealloc_compressor_data(VP9_COMP *cpi) {
vpx_free(cpi->active_map.map);
cpi->active_map.map = NULL;
vp9_free_ref_frame_buffers(cm);
vp9_free_ref_frame_buffers(cm->buffer_pool);
#if CONFIG_VP9_POSTPROC
vp9_free_postproc_buffers(cm);
#endif
vp9_free_context_buffers(cm);
vp9_free_frame_buffer(&cpi->last_frame_uf);
@@ -1908,6 +1949,10 @@ void vp9_remove_compressor(VP9_COMP *cpi) {
#endif
vp9_remove_common(cm);
vp9_free_ref_frame_buffers(cm->buffer_pool);
#if CONFIG_VP9_POSTPROC
vp9_free_postproc_buffers(cm);
#endif
vpx_free(cpi);
#if CONFIG_VP9_TEMPORAL_DENOISING

View File

@@ -416,6 +416,7 @@ typedef struct VP9_COMP {
double total_ssimg_all;
int b_calculate_ssimg;
int dummy_writing;
#endif
int b_calculate_psnr;
@@ -460,10 +461,10 @@ typedef struct VP9_COMP {
int resize_pending;
// VAR_BASED_PARTITION thresholds
int64_t vbp_threshold;
int64_t vbp_threshold_bsize_min;
int64_t vbp_threshold_bsize_max;
int64_t vbp_threshold_64x64;
int64_t vbp_threshold_32x32;
int64_t vbp_threshold_16x16;
int64_t vbp_threshold_8x8;
BLOCK_SIZE vbp_bsize_min;
// Multi-threading
@@ -508,6 +509,8 @@ int vp9_update_entropy(VP9_COMP *cpi, int update);
int vp9_set_active_map(VP9_COMP *cpi, unsigned char *map, int rows, int cols);
int vp9_get_active_map(VP9_COMP *cpi, unsigned char *map, int rows, int cols);
int vp9_set_internal_size(VP9_COMP *cpi,
VPX_SCALING horiz_mode, VPX_SCALING vert_mode);

View File

@@ -38,7 +38,7 @@
#define OUTPUT_FPF 0
#define ARF_STATS_OUTPUT 0
#define GROUP_ADAPTIVE_MAXQ 0
#define GROUP_ADAPTIVE_MAXQ 1
#define BOOST_BREAKOUT 12.5
#define BOOST_FACTOR 12.5
@@ -61,12 +61,9 @@
#define RC_FACTOR_MAX 1.75
#define INTRA_WEIGHT_EXPERIMENT 0
#if INTRA_WEIGHT_EXPERIMENT
#define NCOUNT_INTRA_THRESH 8192
#define NCOUNT_INTRA_FACTOR 3
#define NCOUNT_FRAME_II_THRESH 5.0
#endif
#define DOUBLE_DIVIDE_CHECK(x) ((x) < 0 ? (x) - 0.000001 : (x) + 0.000001)
@@ -832,7 +829,6 @@ void vp9_first_pass(VP9_COMP *cpi, const struct lookahead_entry *source) {
// Keep a count of cases where the inter and intra were very close
// and very low. This helps with scene cut detection for example in
// cropped clips with black bars at the sides or top and bottom.
#if INTRA_WEIGHT_EXPERIMENT
if (((this_error - intrapenalty) * 9 <= motion_error * 10) &&
(this_error < (2 * intrapenalty))) {
neutral_count += 1.0;
@@ -843,12 +839,6 @@ void vp9_first_pass(VP9_COMP *cpi, const struct lookahead_entry *source) {
neutral_count += (double)motion_error /
DOUBLE_DIVIDE_CHECK((double)this_error);
}
#else
if (((this_error - intrapenalty) * 9 <= motion_error * 10) &&
(this_error < (2 * intrapenalty))) {
neutral_count += 1.0;
}
#endif
mv.row *= 8;
mv.col *= 8;
@@ -1291,11 +1281,10 @@ static double get_sr_decay_rate(const VP9_COMP *cpi,
frame->pcnt_motion * ((frame->mvc_abs + frame->mvr_abs) / 2);
modified_pct_inter = frame->pcnt_inter;
#if INTRA_WEIGHT_EXPERIMENT
if ((frame->intra_error / DOUBLE_DIVIDE_CHECK(frame->coded_error)) <
(double)NCOUNT_FRAME_II_THRESH)
(double)NCOUNT_FRAME_II_THRESH) {
modified_pct_inter = frame->pcnt_inter - frame->pcnt_neutral;
#endif
}
modified_pcnt_intra = 100 * (1.0 - modified_pct_inter);

View File

@@ -20,9 +20,11 @@
#include "vp9/common/vp9_blockd.h"
#include "vp9/common/vp9_common.h"
#include "vp9/common/vp9_mvref_common.h"
#include "vp9/common/vp9_pred_common.h"
#include "vp9/common/vp9_reconinter.h"
#include "vp9/common/vp9_reconintra.h"
#include "vp9/encoder/vp9_cost.h"
#include "vp9/encoder/vp9_encoder.h"
#include "vp9/encoder/vp9_pickmode.h"
#include "vp9/encoder/vp9_ratectrl.h"
@@ -188,6 +190,8 @@ static int combined_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
cond_cost_list(cpi, cost_list),
x->nmvjointcost, x->mvcost,
&dis, &x->pred_sse[ref], NULL, 0, 0);
*rate_mv = vp9_mv_bit_cost(&tmp_mv->as_mv, &ref_mv,
x->nmvjointcost, x->mvcost, MV_COST_WEIGHT);
}
if (scaled_ref_frame) {
@@ -198,6 +202,247 @@ static int combined_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
return rv;
}
static void block_variance(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride,
int w, int h, unsigned int *sse, int *sum,
int block_size, unsigned int *sse8x8,
int *sum8x8, unsigned int *var8x8) {
int i, j, k = 0;
*sse = 0;
*sum = 0;
for (i = 0; i < h; i += block_size) {
for (j = 0; j < w; j += block_size) {
vp9_get8x8var(src + src_stride * i + j, src_stride,
ref + ref_stride * i + j, ref_stride,
&sse8x8[k], &sum8x8[k]);
*sse += sse8x8[k];
*sum += sum8x8[k];
var8x8[k] = sse8x8[k] - (((unsigned int)sum8x8[k] * sum8x8[k]) >> 6);
k++;
}
}
}
static void calculate_variance(int bw, int bh, TX_SIZE tx_size,
unsigned int *sse_i, int *sum_i,
unsigned int *var_o, unsigned int *sse_o,
int *sum_o) {
const BLOCK_SIZE unit_size = txsize_to_bsize[tx_size];
const int nw = 1 << (bw - b_width_log2_lookup[unit_size]);
const int nh = 1 << (bh - b_height_log2_lookup[unit_size]);
int i, j, k = 0;
for (i = 0; i < nh; i += 2) {
for (j = 0; j < nw; j += 2) {
sse_o[k] = sse_i[i * nw + j] + sse_i[i * nw + j + 1] +
sse_i[(i + 1) * nw + j] + sse_i[(i + 1) * nw + j + 1];
sum_o[k] = sum_i[i * nw + j] + sum_i[i * nw + j + 1] +
sum_i[(i + 1) * nw + j] + sum_i[(i + 1) * nw + j + 1];
var_o[k] = sse_o[k] - (((unsigned int)sum_o[k] * sum_o[k]) >>
(b_width_log2_lookup[unit_size] +
b_height_log2_lookup[unit_size] + 6));
k++;
}
}
}
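The aggregation above simply sums the child blocks' SSE and sum values and recomputes variance as SSE minus the squared mean. Spelled out for one 16x16 parent built from four 8x8 children (a standalone sketch; the helper name is illustrative):

#include <stdint.h>

/* A 16x16 parent covers 256 pixels, so the mean-square term is
 * (sum * sum) >> 8, i.e. b_width_log2 + b_height_log2 + 6 with 8x8 units. */
static void combine_to_16x16_sketch(const uint32_t sse8x8[4],
                                    const int sum8x8[4],
                                    uint32_t *var16, uint32_t *sse16,
                                    int *sum16) {
  *sse16 = sse8x8[0] + sse8x8[1] + sse8x8[2] + sse8x8[3];
  *sum16 = sum8x8[0] + sum8x8[1] + sum8x8[2] + sum8x8[3];
  *var16 = *sse16 - (((unsigned int)*sum16 * *sum16) >> 8);
}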
static void model_rd_for_sb_y_large(VP9_COMP *cpi, BLOCK_SIZE bsize,
MACROBLOCK *x, MACROBLOCKD *xd,
int *out_rate_sum, int64_t *out_dist_sum,
unsigned int *var_y, unsigned int *sse_y,
int mi_row, int mi_col, int *early_term) {
// Note our transform coeffs are 8 times an orthogonal transform.
// Hence quantizer step is also 8 times. To get effective quantizer
// we need to divide by 8 before sending to modeling function.
unsigned int sse;
int rate;
int64_t dist;
struct macroblock_plane *const p = &x->plane[0];
struct macroblockd_plane *const pd = &xd->plane[0];
const uint32_t dc_quant = pd->dequant[0];
const uint32_t ac_quant = pd->dequant[1];
const int64_t dc_thr = dc_quant * dc_quant >> 6;
const int64_t ac_thr = ac_quant * ac_quant >> 6;
unsigned int var;
int sum;
int skip_dc = 0;
const int bw = b_width_log2_lookup[bsize];
const int bh = b_height_log2_lookup[bsize];
const int num8x8 = 1 << (bw + bh - 2);
unsigned int sse8x8[64] = {0};
int sum8x8[64] = {0};
unsigned int var8x8[64] = {0};
TX_SIZE tx_size;
int i, k;
// Calculate variance for whole partition, and also save 8x8 blocks' variance
// to be used in following transform skipping test.
block_variance(p->src.buf, p->src.stride, pd->dst.buf, pd->dst.stride,
4 << bw, 4 << bh, &sse, &sum, 8, sse8x8, sum8x8, var8x8);
var = sse - (((int64_t)sum * sum) >> (bw + bh + 4));
*var_y = var;
*sse_y = sse;
if (cpi->common.tx_mode == TX_MODE_SELECT) {
if (sse > (var << 2))
tx_size = MIN(max_txsize_lookup[bsize],
tx_mode_to_biggest_tx_size[cpi->common.tx_mode]);
else
tx_size = TX_8X8;
if (cpi->sf.partition_search_type == VAR_BASED_PARTITION) {
if (cpi->oxcf.aq_mode == CYCLIC_REFRESH_AQ &&
cyclic_refresh_segment_id_boosted(xd->mi[0].src_mi->mbmi.segment_id))
tx_size = TX_8X8;
else if (tx_size > TX_16X16)
tx_size = TX_16X16;
}
} else {
tx_size = MIN(max_txsize_lookup[bsize],
tx_mode_to_biggest_tx_size[cpi->common.tx_mode]);
}
assert(tx_size >= TX_8X8);
xd->mi[0].src_mi->mbmi.tx_size = tx_size;
// Evaluate if the partition block is a skippable block in Y plane.
{
unsigned int sse16x16[16] = {0};
int sum16x16[16] = {0};
unsigned int var16x16[16] = {0};
const int num16x16 = num8x8 >> 2;
unsigned int sse32x32[4] = {0};
int sum32x32[4] = {0};
unsigned int var32x32[4] = {0};
const int num32x32 = num8x8 >> 4;
int ac_test = 1;
int dc_test = 1;
const int num = (tx_size == TX_8X8) ? num8x8 :
((tx_size == TX_16X16) ? num16x16 : num32x32);
const unsigned int *sse_tx = (tx_size == TX_8X8) ? sse8x8 :
((tx_size == TX_16X16) ? sse16x16 : sse32x32);
const unsigned int *var_tx = (tx_size == TX_8X8) ? var8x8 :
((tx_size == TX_16X16) ? var16x16 : var32x32);
// Calculate variance if tx_size > TX_8X8
if (tx_size >= TX_16X16)
calculate_variance(bw, bh, TX_8X8, sse8x8, sum8x8, var16x16, sse16x16,
sum16x16);
if (tx_size == TX_32X32)
calculate_variance(bw, bh, TX_16X16, sse16x16, sum16x16, var32x32,
sse32x32, sum32x32);
// Skipping test
x->skip_txfm[0] = 0;
for (k = 0; k < num; k++)
// Check if all ac coefficients can be quantized to zero.
if (!(var_tx[k] < ac_thr || var == 0)) {
ac_test = 0;
break;
}
for (k = 0; k < num; k++)
// Check if dc coefficient can be quantized to zero.
if (!(sse_tx[k] - var_tx[k] < dc_thr || sse == var)) {
dc_test = 0;
break;
}
if (ac_test) {
x->skip_txfm[0] = 2;
if (dc_test)
x->skip_txfm[0] = 1;
} else if (dc_test) {
skip_dc = 1;
}
}
if (x->skip_txfm[0] == 1) {
int skip_uv[2] = {0};
unsigned int var_uv[2];
unsigned int sse_uv[2];
*out_rate_sum = 0;
*out_dist_sum = sse << 4;
// Transform skipping test in UV planes.
for (i = 1; i <= 2; i++) {
struct macroblock_plane *const p = &x->plane[i];
struct macroblockd_plane *const pd = &xd->plane[i];
const TX_SIZE uv_tx_size = get_uv_tx_size(&xd->mi[0].src_mi->mbmi, pd);
const BLOCK_SIZE unit_size = txsize_to_bsize[uv_tx_size];
const int sf = (bw - b_width_log2_lookup[unit_size]) +
(bh - b_height_log2_lookup[unit_size]);
const BLOCK_SIZE bs = get_plane_block_size(bsize, pd);
const uint32_t uv_dc_thr = pd->dequant[0] * pd->dequant[0] >> (6 - sf);
const uint32_t uv_ac_thr = pd->dequant[1] * pd->dequant[1] >> (6 - sf);
int j = i - 1;
vp9_build_inter_predictors_sbp(xd, mi_row, mi_col, bsize, i);
var_uv[j] = cpi->fn_ptr[bs].vf(p->src.buf, p->src.stride,
pd->dst.buf, pd->dst.stride, &sse_uv[j]);
if (var_uv[j] < uv_ac_thr || var_uv[j] == 0) {
if (sse_uv[j] - var_uv[j] < uv_dc_thr || sse_uv[j] == var_uv[j])
skip_uv[j] = 1;
}
}
// If the transforms in the YUV planes are all skippable, the mode search
// checks fewer inter modes and doesn't check intra modes.
if (skip_uv[0] & skip_uv[1]) {
*early_term = 1;
}
return;
}
if (!skip_dc) {
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
vp9_model_rd_from_var_lapndz(sse - var, num_pels_log2_lookup[bsize],
dc_quant >> (xd->bd - 5), &rate, &dist);
} else {
vp9_model_rd_from_var_lapndz(sse - var, num_pels_log2_lookup[bsize],
dc_quant >> 3, &rate, &dist);
}
#else
vp9_model_rd_from_var_lapndz(sse - var, num_pels_log2_lookup[bsize],
dc_quant >> 3, &rate, &dist);
#endif // CONFIG_VP9_HIGHBITDEPTH
}
if (!skip_dc) {
*out_rate_sum = rate >> 1;
*out_dist_sum = dist << 3;
} else {
*out_rate_sum = 0;
*out_dist_sum = (sse - var) << 4;
}
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
vp9_model_rd_from_var_lapndz(var, num_pels_log2_lookup[bsize],
ac_quant >> (xd->bd - 5), &rate, &dist);
} else {
vp9_model_rd_from_var_lapndz(var, num_pels_log2_lookup[bsize],
ac_quant >> 3, &rate, &dist);
}
#else
vp9_model_rd_from_var_lapndz(var, num_pels_log2_lookup[bsize],
ac_quant >> 3, &rate, &dist);
#endif // CONFIG_VP9_HIGHBITDEPTH
*out_rate_sum += rate;
*out_dist_sum += dist << 4;
}
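The skipping test in model_rd_for_sb_y_large boils down to comparing each transform block's variance against thresholds derived from the dequant step. A standalone sketch of that per-block decision (names are illustrative; the thresholds are the squared dequant steps from the function above):

#include <stdint.h>

/* sse and var are one transform block's SSE and variance against the
 * predictor; dc_q and ac_q are the dequant steps (pd->dequant[0] / [1]). */
static void tx_skip_decision_sketch(uint32_t sse, uint32_t var,
                                    uint32_t dc_q, uint32_t ac_q,
                                    int *dc_skippable, int *ac_skippable) {
  const int64_t dc_thr = (int64_t)dc_q * dc_q >> 6;
  const int64_t ac_thr = (int64_t)ac_q * ac_q >> 6;
  *ac_skippable = (var < ac_thr || var == 0);          /* AC likely all zero */
  *dc_skippable = (sse - var < dc_thr || sse == var);  /* DC likely zero */
}

When every block passes the AC test, x->skip_txfm[0] becomes 2, and 1 if the DC test also passes; the early-termination path then only needs the corresponding UV checks.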
static void model_rd_for_sb_y(VP9_COMP *cpi, BLOCK_SIZE bsize,
MACROBLOCK *x, MACROBLOCKD *xd,
@@ -312,6 +557,132 @@ static void model_rd_for_sb_y(VP9_COMP *cpi, BLOCK_SIZE bsize,
*out_dist_sum += dist << 4;
}
#if CONFIG_VP9_HIGHBITDEPTH
static void block_yrd(VP9_COMP *cpi, MACROBLOCK *x, int *rate, int64_t *dist,
int *skippable, int64_t *sse, int plane,
BLOCK_SIZE bsize, TX_SIZE tx_size) {
MACROBLOCKD *xd = &x->e_mbd;
unsigned int var_y, sse_y;
(void)plane;
(void)tx_size;
model_rd_for_sb_y(cpi, bsize, x, xd, rate, dist, &var_y, &sse_y);
*sse = INT_MAX;
*skippable = 0;
return;
}
#else
static void block_yrd(VP9_COMP *cpi, MACROBLOCK *x, int *rate, int64_t *dist,
int *skippable, int64_t *sse, int plane,
BLOCK_SIZE bsize, TX_SIZE tx_size) {
MACROBLOCKD *xd = &x->e_mbd;
const struct macroblockd_plane *pd = &xd->plane[plane];
const struct macroblock_plane *const p = &x->plane[plane];
const int num_4x4_w = num_4x4_blocks_wide_lookup[bsize];
const int num_4x4_h = num_4x4_blocks_high_lookup[bsize];
const int step = 1 << (tx_size << 1);
const int block_step = (1 << tx_size);
int block = 0, r, c;
int shift = tx_size == TX_32X32 ? 0 : 2;
const int max_blocks_wide = num_4x4_w + (xd->mb_to_right_edge >= 0 ? 0 :
xd->mb_to_right_edge >> (5 + pd->subsampling_x));
const int max_blocks_high = num_4x4_h + (xd->mb_to_bottom_edge >= 0 ? 0 :
xd->mb_to_bottom_edge >> (5 + pd->subsampling_y));
int eob_cost = 0;
(void)cpi;
vp9_subtract_plane(x, bsize, plane);
*skippable = 1;
// Keep track of the row and column of the blocks we use so that we know
// if we are in the unrestricted motion border.
for (r = 0; r < max_blocks_high; r += block_step) {
for (c = 0; c < num_4x4_w; c += block_step) {
if (c < max_blocks_wide) {
const scan_order *const scan_order = &vp9_default_scan_orders[tx_size];
tran_low_t *const coeff = BLOCK_OFFSET(p->coeff, block);
tran_low_t *const qcoeff = BLOCK_OFFSET(p->qcoeff, block);
tran_low_t *const dqcoeff = BLOCK_OFFSET(pd->dqcoeff, block);
uint16_t *const eob = &p->eobs[block];
const int diff_stride = 4 * num_4x4_blocks_wide_lookup[bsize];
const int16_t *src_diff;
src_diff = &p->src_diff[(r * diff_stride + c) << 2];
switch (tx_size) {
case TX_32X32:
vp9_fdct32x32_rd(src_diff, coeff, diff_stride);
vp9_quantize_fp_32x32(coeff, 1024, x->skip_block, p->zbin,
p->round_fp, p->quant_fp, p->quant_shift,
qcoeff, dqcoeff, pd->dequant, eob,
scan_order->scan, scan_order->iscan);
break;
case TX_16X16:
vp9_hadamard_16x16(src_diff, diff_stride, (int16_t *)coeff);
vp9_quantize_fp(coeff, 256, x->skip_block, p->zbin, p->round_fp,
p->quant_fp, p->quant_shift, qcoeff, dqcoeff,
pd->dequant, eob,
scan_order->scan, scan_order->iscan);
break;
case TX_8X8:
vp9_hadamard_8x8(src_diff, diff_stride, (int16_t *)coeff);
vp9_quantize_fp(coeff, 64, x->skip_block, p->zbin, p->round_fp,
p->quant_fp, p->quant_shift, qcoeff, dqcoeff,
pd->dequant, eob,
scan_order->scan, scan_order->iscan);
break;
case TX_4X4:
x->fwd_txm4x4(src_diff, coeff, diff_stride);
vp9_quantize_fp(coeff, 16, x->skip_block, p->zbin, p->round_fp,
p->quant_fp, p->quant_shift, qcoeff, dqcoeff,
pd->dequant, eob,
scan_order->scan, scan_order->iscan);
break;
default:
assert(0);
break;
}
*skippable &= (*eob == 0);
eob_cost += 1;
}
block += step;
}
}
if (*skippable && *sse < INT64_MAX) {
*rate = 0;
*dist = (*sse << 6) >> shift;
*sse = *dist;
return;
}
block = 0;
*rate = 0;
*dist = 0;
*sse = (*sse << 6) >> shift;
for (r = 0; r < max_blocks_high; r += block_step) {
for (c = 0; c < num_4x4_w; c += block_step) {
if (c < max_blocks_wide) {
tran_low_t *const coeff = BLOCK_OFFSET(p->coeff, block);
tran_low_t *const qcoeff = BLOCK_OFFSET(p->qcoeff, block);
tran_low_t *const dqcoeff = BLOCK_OFFSET(pd->dqcoeff, block);
uint16_t *const eob = &p->eobs[block];
if (*eob == 1)
*rate += (int)abs(qcoeff[0]);
else if (*eob > 1)
*rate += (int)vp9_satd((const int16_t *)qcoeff, step << 4);
*dist += vp9_block_error_fp(coeff, dqcoeff, step << 4) >> shift;
}
block += step;
}
}
if (*skippable == 0) {
*rate <<= 10;
*rate += (eob_cost << 8);
}
}
#endif
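The non-high-bit-depth block_yrd above converts SATD plus the eob count into rate units and the fixed-point block error into distortion units before they meet in RDCOST. A compact sketch of that scaling for a single already-quantized transform block (the wrapper is illustrative):

#include <stdint.h>
#include <stdlib.h>

int16_t vp9_satd_c(const int16_t *coeff, int length);
int64_t vp9_block_error_fp_c(const int16_t *coeff, const int16_t *dqcoeff,
                             int block_size);

/* num_coeffs = step << 4; shift is 0 for TX_32X32 and 2 otherwise. */
static void tx_block_rd_sketch(const int16_t *qcoeff, const int16_t *coeff,
                               const int16_t *dqcoeff, int num_coeffs,
                               int eob, int shift, int *rate, int64_t *dist) {
  if (eob == 1)
    *rate = abs(qcoeff[0]);                  /* DC-only block */
  else if (eob > 1)
    *rate = vp9_satd_c(qcoeff, num_coeffs);  /* SATD of quantized coeffs */
  else
    *rate = 0;
  *dist = vp9_block_error_fp_c(coeff, dqcoeff, num_coeffs) >> shift;
  /* block_yrd then applies *rate <<= 10 plus (eob_cost << 8) over all blocks */
}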
static void model_rd_for_sb_uv(VP9_COMP *cpi, BLOCK_SIZE bsize,
MACROBLOCK *x, MACROBLOCKD *xd,
int *out_rate_sum, int64_t *out_dist_sum,
@@ -518,7 +889,9 @@ static void estimate_block_intra(int plane, int block, BLOCK_SIZE plane_bsize,
int i, j;
int rate;
int64_t dist;
unsigned int var_y, sse_y;
int64_t this_sse = INT64_MAX;
int is_skippable;
txfrm_block_to_raster_xy(plane_bsize, tx_size, block, &i, &j);
assert(plane == 0);
(void) plane;
@@ -533,8 +906,13 @@ static void estimate_block_intra(int plane, int block, BLOCK_SIZE plane_bsize,
x->skip_encode ? src_stride : dst_stride,
pd->dst.buf, dst_stride,
i, j, 0);
// This procedure assumes zero offset from p->src.buf and pd->dst.buf.
model_rd_for_sb_y(cpi, bsize_tx, x, xd, &rate, &dist, &var_y, &sse_y);
// TODO(jingning): This needs further refactoring.
block_yrd(cpi, x, &rate, &dist, &is_skippable, &this_sse, 0,
bsize_tx, MIN(tx_size, TX_16X16));
x->skip_txfm[0] = is_skippable;
rate += vp9_cost_bit(vp9_get_skip_prob(&cpi->common, xd), is_skippable);
p->src.buf = src_buf_base;
pd->dst.buf = dst_buf_base;
args->rate += rate;
@@ -602,10 +980,6 @@ void vp9_pick_intra_mode(VP9_COMP *cpi, MACROBLOCK *x, RD_COST *rd_cost,
*rd_cost = best_rdc;
}
static const int ref_frame_cost[MAX_REF_FRAMES] = {
1235, 229, 530, 615,
};
typedef struct {
MV_REFERENCE_FRAME ref_frame;
PREDICTION_MODE pred_mode;
@@ -682,6 +1056,21 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
int ref_frame_skip_mask = 0;
int idx;
int best_pred_sad = INT_MAX;
int best_early_term = 0;
int ref_frame_cost[MAX_REF_FRAMES];
vp9_prob intra_inter_p = vp9_get_intra_inter_prob(cm, xd);
vp9_prob ref_single_p1 = vp9_get_pred_prob_single_ref_p1(cm, xd);
vp9_prob ref_single_p2 = vp9_get_pred_prob_single_ref_p2(cm, xd);
ref_frame_cost[INTRA_FRAME] = vp9_cost_bit(intra_inter_p, 0);
ref_frame_cost[LAST_FRAME] = ref_frame_cost[GOLDEN_FRAME] =
ref_frame_cost[ALTREF_FRAME] = vp9_cost_bit(intra_inter_p, 1);
ref_frame_cost[LAST_FRAME] += vp9_cost_bit(ref_single_p1, 0);
ref_frame_cost[GOLDEN_FRAME] += vp9_cost_bit(ref_single_p1, 1);
ref_frame_cost[ALTREF_FRAME] += vp9_cost_bit(ref_single_p1, 1);
ref_frame_cost[GOLDEN_FRAME] += vp9_cost_bit(ref_single_p2, 0);
ref_frame_cost[ALTREF_FRAME] += vp9_cost_bit(ref_single_p2, 1);
if (reuse_inter_pred) {
int i;
@@ -773,6 +1162,10 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
int mode_index;
int i;
PREDICTION_MODE this_mode = ref_mode_set[idx].pred_mode;
int64_t this_sse;
int is_skippable;
int this_early_term = 0;
if (!(cpi->sf.inter_mode_mask[bsize] & (1 << this_mode)))
continue;
@@ -850,6 +1243,7 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
best_pred_sad = cpi->fn_ptr[bsize].sdf(x->plane[0].src.buf,
x->plane[0].src.stride,
pre_buf, pre_stride);
x->pred_mv_sad[LAST_FRAME] = best_pred_sad;
}
if (this_mode != NEARESTMV &&
@@ -924,17 +1318,54 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
var_y = pf_var[best_filter];
sse_y = pf_sse[best_filter];
x->skip_txfm[0] = skip_txfm;
if (reuse_inter_pred) {
pd->dst.buf = this_mode_pred->data;
pd->dst.stride = this_mode_pred->stride;
}
} else {
mbmi->interp_filter = (filter_ref == SWITCHABLE) ? EIGHTTAP : filter_ref;
vp9_build_inter_predictors_sby(xd, mi_row, mi_col, bsize);
model_rd_for_sb_y(cpi, bsize, x, xd, &this_rdc.rate, &this_rdc.dist,
&var_y, &sse_y);
this_rdc.rate +=
cm->interp_filter == SWITCHABLE ?
vp9_get_switchable_rate(cpi, xd) : 0;
// For large partition blocks, extra testing is done.
if (bsize > BLOCK_32X32 && xd->mi[0].src_mi->mbmi.segment_id != 1 &&
cm->base_qindex) {
model_rd_for_sb_y_large(cpi, bsize, x, xd, &this_rdc.rate,
&this_rdc.dist, &var_y, &sse_y, mi_row, mi_col,
&this_early_term);
} else {
model_rd_for_sb_y(cpi, bsize, x, xd, &this_rdc.rate, &this_rdc.dist,
&var_y, &sse_y);
}
}
if (!this_early_term) {
this_sse = (int64_t)sse_y;
block_yrd(cpi, x, &this_rdc.rate, &this_rdc.dist, &is_skippable,
&this_sse, 0, bsize, MIN(mbmi->tx_size, TX_16X16));
x->skip_txfm[0] = is_skippable;
if (is_skippable) {
this_rdc.rate = vp9_cost_bit(vp9_get_skip_prob(cm, xd), 1);
} else {
if (RDCOST(x->rdmult, x->rddiv, this_rdc.rate, this_rdc.dist) <
RDCOST(x->rdmult, x->rddiv, 0, this_sse)) {
this_rdc.rate += vp9_cost_bit(vp9_get_skip_prob(cm, xd), 0);
} else {
this_rdc.rate = vp9_cost_bit(vp9_get_skip_prob(cm, xd), 1);
this_rdc.dist = this_sse;
x->skip_txfm[0] = 1;
}
}
if (cm->interp_filter == SWITCHABLE) {
if ((mbmi->mv[0].as_mv.row | mbmi->mv[0].as_mv.col) & 0x07)
this_rdc.rate += vp9_get_switchable_rate(cpi, xd);
}
} else {
this_rdc.rate += cm->interp_filter == SWITCHABLE ?
vp9_get_switchable_rate(cpi, xd) : 0;
this_rdc.rate += vp9_cost_bit(vp9_get_skip_prob(cm, xd), 1);
}
// chroma component rate-distortion cost modeling
if (x->color_sensitivity[0] || x->color_sensitivity[1]) {
int uv_rate = 0;
int64_t uv_dist = 0;
@@ -942,7 +1373,8 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
vp9_build_inter_predictors_sbp(xd, mi_row, mi_col, bsize, 1);
if (x->color_sensitivity[1])
vp9_build_inter_predictors_sbp(xd, mi_row, mi_col, bsize, 2);
model_rd_for_sb_uv(cpi, bsize, x, xd, &uv_rate, &uv_dist, &var_y, &sse_y);
model_rd_for_sb_uv(cpi, bsize, x, xd, &uv_rate, &uv_dist,
&var_y, &sse_y);
this_rdc.rate += uv_rate;
this_rdc.dist += uv_dist;
}
@@ -981,6 +1413,7 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
best_tx_size = mbmi->tx_size;
best_ref_frame = ref_frame;
best_mode_skip_txfm = x->skip_txfm[0];
best_early_term = this_early_term;
if (reuse_inter_pred) {
free_pred_buffer(best_pred);
@@ -993,6 +1426,13 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
if (x->skip)
break;
// If the early termination flag is set and at least two modes have been
// checked, terminate the mode search.
if (best_early_term && idx > 0) {
x->skip = 1;
break;
}
}
mbmi->mode = best_mode;
@@ -1041,6 +1481,8 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
const PREDICTION_MODE this_mode = intra_mode_list[i];
if (!((1 << this_mode) & cpi->sf.intra_y_mode_mask[intra_tx_size]))
continue;
mbmi->mode = this_mode;
mbmi->ref_frame[0] = INTRA_FRAME;
args.mode = this_mode;
args.rate = 0;
args.dist = 0;
@@ -1057,17 +1499,17 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
if (this_rdc.rdcost < best_rdc.rdcost) {
best_rdc = this_rdc;
mbmi->mode = this_mode;
best_mode = this_mode;
best_intra_tx_size = mbmi->tx_size;
mbmi->ref_frame[0] = INTRA_FRAME;
best_ref_frame = INTRA_FRAME;
mbmi->uv_mode = this_mode;
mbmi->mv[0].as_int = INVALID_MV;
best_mode_skip_txfm = x->skip_txfm[0];
}
}
// Reset mb_mode_info to the best inter mode.
if (mbmi->ref_frame[0] != INTRA_FRAME) {
x->skip_txfm[0] = best_mode_skip_txfm;
if (best_ref_frame != INTRA_FRAME) {
mbmi->tx_size = best_tx_size;
} else {
mbmi->tx_size = best_intra_tx_size;
@@ -1075,6 +1517,9 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
}
pd->dst = orig_dst;
mbmi->mode = best_mode;
mbmi->ref_frame[0] = best_ref_frame;
x->skip_txfm[0] = best_mode_skip_txfm;
if (reuse_inter_pred && best_pred != NULL) {
if (best_pred->data != orig_dst.buf && is_inter_mode(mbmi->mode)) {

View File

@@ -457,6 +457,7 @@ void vp9_mv_pred(VP9_COMP *cpi, MACROBLOCK *x,
int best_sad = INT_MAX;
int this_sad = INT_MAX;
int max_mv = 0;
int near_same_nearest;
uint8_t *src_y_ptr = x->plane[0].src.buf;
uint8_t *ref_y_ptr;
const int num_mv_refs = MAX_MV_REF_CANDIDATES +
@@ -469,23 +470,27 @@ void vp9_mv_pred(VP9_COMP *cpi, MACROBLOCK *x,
pred_mv[2] = x->pred_mv[ref_frame];
assert(num_mv_refs <= (int)(sizeof(pred_mv) / sizeof(pred_mv[0])));
near_same_nearest =
mbmi->ref_mvs[ref_frame][0].as_int == mbmi->ref_mvs[ref_frame][1].as_int;
// Get the sad for each candidate reference mv.
for (i = 0; i < num_mv_refs; ++i) {
const MV *this_mv = &pred_mv[i];
int fp_row, fp_col;
max_mv = MAX(max_mv, MAX(abs(this_mv->row), abs(this_mv->col)) >> 3);
if (is_zero_mv(this_mv) && zero_seen)
if (i == 1 && near_same_nearest)
continue;
fp_row = (this_mv->row + 3 + (this_mv->row >= 0)) >> 3;
fp_col = (this_mv->col + 3 + (this_mv->col >= 0)) >> 3;
max_mv = MAX(max_mv, MAX(abs(this_mv->row), abs(this_mv->col)) >> 3);
zero_seen |= is_zero_mv(this_mv);
ref_y_ptr =
&ref_y_buffer[ref_y_stride * (this_mv->row >> 3) + (this_mv->col >> 3)];
if (fp_row == 0 && fp_col == 0 && zero_seen)
continue;
zero_seen |= (fp_row == 0 && fp_col == 0);
ref_y_ptr = &ref_y_buffer[ref_y_stride * fp_row + fp_col];
// Find sad for current vector.
this_sad = cpi->fn_ptr[block_size].sdf(src_y_ptr, x->plane[0].src.stride,
ref_y_ptr, ref_y_stride);
// Note if it is the best so far.
if (this_sad < best_sad) {
best_sad = this_sad;

View File

@@ -292,6 +292,18 @@ int64_t vp9_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff,
return error;
}
int64_t vp9_block_error_fp_c(const int16_t *coeff, const int16_t *dqcoeff,
int block_size) {
int i;
int64_t error = 0;
for (i = 0; i < block_size; i++) {
const int diff = coeff[i] - dqcoeff[i];
error += diff * diff;
}
return error;
}
#if CONFIG_VP9_HIGHBITDEPTH
int64_t vp9_highbd_block_error_c(const tran_low_t *coeff,
@@ -1549,13 +1561,6 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
mbmi->ref_frame[1] < 0 ? 0 : mbmi->ref_frame[1]};
int_mv ref_mv[2];
int ite, ref;
// Prediction buffer from second frame.
#if CONFIG_VP9_HIGHBITDEPTH
uint8_t *second_pred;
uint8_t *second_pred_alloc;
#else
uint8_t *second_pred = vpx_memalign(16, pw * ph * sizeof(uint8_t));
#endif // CONFIG_VP9_HIGHBITDEPTH
const InterpKernel *kernel = vp9_get_interp_kernel(mbmi->interp_filter);
struct scale_factors sf;
@@ -1566,14 +1571,13 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
vp9_get_scaled_ref_frame(cpi, mbmi->ref_frame[0]),
vp9_get_scaled_ref_frame(cpi, mbmi->ref_frame[1])
};
// Prediction buffer from second frame.
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
second_pred_alloc = vpx_memalign(16, pw * ph * sizeof(uint16_t));
second_pred = CONVERT_TO_BYTEPTR(second_pred_alloc);
} else {
second_pred_alloc = vpx_memalign(16, pw * ph * sizeof(uint8_t));
second_pred = second_pred_alloc;
}
DECLARE_ALIGNED_ARRAY(16, uint16_t, second_pred_alloc_16, 64 * 64);
uint8_t *second_pred;
#else
DECLARE_ALIGNED_ARRAY(16, uint8_t, second_pred, 64 * 64);
#endif // CONFIG_VP9_HIGHBITDEPTH
for (ref = 0; ref < 2; ++ref) {
@@ -1628,6 +1632,7 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
// Get the prediction block from the 'other' reference frame.
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
second_pred = CONVERT_TO_BYTEPTR(second_pred_alloc_16);
vp9_highbd_build_inter_predictor(ref_yv12[!id].buf,
ref_yv12[!id].stride,
second_pred, pw,
@@ -1637,6 +1642,7 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
mi_col * MI_SIZE, mi_row * MI_SIZE,
xd->bd);
} else {
second_pred = (uint8_t *)second_pred_alloc_16;
vp9_build_inter_predictor(ref_yv12[!id].buf,
ref_yv12[!id].stride,
second_pred, pw,
@@ -1722,12 +1728,6 @@ static void joint_motion_search(VP9_COMP *cpi, MACROBLOCK *x,
&mbmi->ref_mvs[refs[ref]][0].as_mv,
x->nmvjointcost, x->mvcost, MV_COST_WEIGHT);
}
#if CONFIG_VP9_HIGHBITDEPTH
vpx_free(second_pred_alloc);
#else
vpx_free(second_pred);
#endif // CONFIG_VP9_HIGHBITDEPTH
}
static int64_t rd_pick_best_sub8x8_mode(VP9_COMP *cpi, MACROBLOCK *x,
@@ -2422,7 +2422,6 @@ static int64_t handle_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
int_mv cur_mv[2];
#if CONFIG_VP9_HIGHBITDEPTH
DECLARE_ALIGNED_ARRAY(16, uint16_t, tmp_buf16, MAX_MB_PLANE * 64 * 64);
DECLARE_ALIGNED_ARRAY(16, uint8_t, tmp_buf8, MAX_MB_PLANE * 64 * 64);
uint8_t *tmp_buf;
#else
DECLARE_ALIGNED_ARRAY(16, uint8_t, tmp_buf, MAX_MB_PLANE * 64 * 64);
@@ -2451,7 +2450,7 @@ static int64_t handle_inter_mode(VP9_COMP *cpi, MACROBLOCK *x,
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
tmp_buf = CONVERT_TO_BYTEPTR(tmp_buf16);
} else {
tmp_buf = tmp_buf8;
tmp_buf = (uint8_t *)tmp_buf16;
}
#endif // CONFIG_VP9_HIGHBITDEPTH
@@ -2831,6 +2830,65 @@ void vp9_rd_pick_intra_mode_sb(VP9_COMP *cpi, MACROBLOCK *x,
rd_cost->rdcost = RDCOST(x->rdmult, x->rddiv, rd_cost->rate, rd_cost->dist);
}
// This function is designed to apply a bias or adjustment to an rd value based
// on the relative variance of the source and reconstruction.
#define LOW_VAR_THRESH 16
#define VLOW_ADJ_MAX 25
#define VHIGH_ADJ_MAX 8
static void rd_variance_adjustment(VP9_COMP *cpi,
MACROBLOCK *x,
BLOCK_SIZE bsize,
int64_t *this_rd,
MV_REFERENCE_FRAME ref_frame,
unsigned int source_variance) {
MACROBLOCKD *const xd = &x->e_mbd;
unsigned int recon_variance;
unsigned int absvar_diff = 0;
int64_t var_error = 0;
int64_t var_factor = 0;
if (*this_rd == INT64_MAX)
return;
#if CONFIG_VP9_HIGHBITDEPTH
if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
recon_variance =
vp9_high_get_sby_perpixel_variance(cpi, &xd->plane[0].dst, bsize, xd->bd);
} else {
recon_variance =
vp9_get_sby_perpixel_variance(cpi, &xd->plane[0].dst, bsize);
}
#else
recon_variance =
vp9_get_sby_perpixel_variance(cpi, &xd->plane[0].dst, bsize);
#endif // CONFIG_VP9_HIGHBITDEPTH
if ((source_variance + recon_variance) > LOW_VAR_THRESH) {
absvar_diff = (source_variance > recon_variance)
? (source_variance - recon_variance)
: (recon_variance - source_variance);
var_error = (200 * source_variance * recon_variance) /
((source_variance * source_variance) +
(recon_variance * recon_variance));
var_error = 100 - var_error;
}
// Source variance above a threshold and ref frame is intra.
// This case is targeted mainly at discouraging intra modes that give rise
// to a predictor with a low spatial complexity compared to the source.
if ((source_variance > LOW_VAR_THRESH) && (ref_frame == INTRA_FRAME) &&
(source_variance > recon_variance)) {
var_factor = MIN(absvar_diff, MIN(VLOW_ADJ_MAX, var_error));
// A second possible case of interest is where the source variance
// is very low and we wish to discourage false texture or motion trails.
} else if ((source_variance < (LOW_VAR_THRESH >> 1)) &&
(recon_variance > source_variance)) {
var_factor = MIN(absvar_diff, MIN(VHIGH_ADJ_MAX, var_error));
}
*this_rd += (*this_rd * var_factor) / 100;
}
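As a worked example of the adjustment above: with source_variance = 400 and recon_variance = 100, var_error = 100 - (200 * 400 * 100) / (400 * 400 + 100 * 100) = 100 - 47 = 53 and absvar_diff = 300. For an intra candidate (source variance above LOW_VAR_THRESH and larger than the reconstruction's), var_factor = MIN(300, MIN(VLOW_ADJ_MAX, 53)) = 25, so this_rd is inflated by 25%, penalizing an intra predictor that is much smoother than the source it is meant to represent.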
void vp9_rd_pick_inter_mode_sb(VP9_COMP *cpi,
TileDataEnc *tile_data,
MACROBLOCK *x,
@@ -3287,6 +3345,11 @@ void vp9_rd_pick_inter_mode_sb(VP9_COMP *cpi,
this_rd = RDCOST(x->rdmult, x->rddiv, rate2, distortion2);
}
// Apply an adjustment to the rd value based on the similarity of the
// source variance and reconstructed variance.
rd_variance_adjustment(cpi, x, bsize, &this_rd,
ref_frame, x->source_variance);
if (ref_frame == INTRA_FRAME) {
// Keep record of best intra rd
if (this_rd < best_intra_rd) {

View File

@@ -29,6 +29,15 @@ void vp9_rd_pick_intra_mode_sb(struct VP9_COMP *cpi, struct macroblock *x,
struct RD_COST *rd_cost, BLOCK_SIZE bsize,
PICK_MODE_CONTEXT *ctx, int64_t best_rd);
unsigned int vp9_get_sby_perpixel_variance(VP9_COMP *cpi,
const struct buf_2d *ref,
BLOCK_SIZE bs);
#if CONFIG_VP9_HIGHBITDEPTH
unsigned int vp9_high_get_sby_perpixel_variance(VP9_COMP *cpi,
const struct buf_2d *ref,
BLOCK_SIZE bs, int bd);
#endif
void vp9_rd_pick_inter_mode_sb(struct VP9_COMP *cpi,
struct TileDataEnc *tile_data,
struct macroblock *x,

View File

@@ -301,7 +301,7 @@ static void set_rt_speed_feature(VP9_COMP *cpi, SPEED_FEATURES *sf,
(frames_since_key % (sf->last_partitioning_redo_frequency << 1) == 1);
sf->max_delta_qindex = is_keyframe ? 20 : 15;
sf->partition_search_type = REFERENCE_PARTITION;
sf->use_nonrd_pick_mode = !is_keyframe;
sf->use_nonrd_pick_mode = 1;
sf->allow_skip_recode = 0;
sf->inter_mode_mask[BLOCK_32X32] = INTER_NEAREST_NEW_ZERO;
sf->inter_mode_mask[BLOCK_32X64] = INTER_NEAREST_NEW_ZERO;

View File

@@ -65,18 +65,6 @@ const vp9_tree_index vp9_coef_tree[TREE_SIZE(ENTROPY_TOKENS)] = {
-CATEGORY5_TOKEN, -CATEGORY6_TOKEN // 10 = CAT_FIVE
};
// Unconstrained Node Tree
const vp9_tree_index vp9_coef_con_tree[TREE_SIZE(ENTROPY_TOKENS)] = {
2, 6, // 0 = LOW_VAL
-TWO_TOKEN, 4, // 1 = TWO
-THREE_TOKEN, -FOUR_TOKEN, // 2 = THREE
8, 10, // 3 = HIGH_LOW
-CATEGORY1_TOKEN, -CATEGORY2_TOKEN, // 4 = CAT_ONE
12, 14, // 5 = CAT_THREEFOUR
-CATEGORY3_TOKEN, -CATEGORY4_TOKEN, // 6 = CAT_THREE
-CATEGORY5_TOKEN, -CATEGORY6_TOKEN // 7 = CAT_FIVE
};
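The constrained token tree above follows the usual vp9_tree_index convention: entries come in pairs of children selected by the decoded bit, non-negative entries are offsets to the next internal node, and negated token values mark leaves. A hedged sketch of the walk a reader performs over such a table, mirroring vp9_read_tree with the bit source abstracted away (the typedef is an assumption about the header):

#include <stdint.h>

typedef int8_t vp9_tree_index;  /* assumed to match the libvpx typedef */

/* next_bit() stands in for vp9_read(r, probs[i >> 1]); returns 0 or 1. */
static int read_tree_sketch(const vp9_tree_index *tree,
                            int (*next_bit)(void)) {
  vp9_tree_index i = 0;
  while ((i = tree[i + next_bit()]) > 0)
    continue;
  return -i;  /* leaf entries store the negated token value */
}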
static const vp9_tree_index cat1[2] = {0, 0};
static const vp9_tree_index cat2[4] = {2, 2, 0, 0};
static const vp9_tree_index cat3[6] = {2, 2, 4, 4, 0, 0};

View File

@@ -57,6 +57,179 @@ unsigned int vp9_avg_4x4_sse2(const uint8_t *s, int p) {
return (avg + 8) >> 4;
}
static void hadamard_col8_sse2(__m128i *in, int iter) {
__m128i a0 = in[0];
__m128i a1 = in[1];
__m128i a2 = in[2];
__m128i a3 = in[3];
__m128i a4 = in[4];
__m128i a5 = in[5];
__m128i a6 = in[6];
__m128i a7 = in[7];
__m128i b0 = _mm_add_epi16(a0, a1);
__m128i b1 = _mm_sub_epi16(a0, a1);
__m128i b2 = _mm_add_epi16(a2, a3);
__m128i b3 = _mm_sub_epi16(a2, a3);
__m128i b4 = _mm_add_epi16(a4, a5);
__m128i b5 = _mm_sub_epi16(a4, a5);
__m128i b6 = _mm_add_epi16(a6, a7);
__m128i b7 = _mm_sub_epi16(a6, a7);
a0 = _mm_add_epi16(b0, b2);
a1 = _mm_add_epi16(b1, b3);
a2 = _mm_sub_epi16(b0, b2);
a3 = _mm_sub_epi16(b1, b3);
a4 = _mm_add_epi16(b4, b6);
a5 = _mm_add_epi16(b5, b7);
a6 = _mm_sub_epi16(b4, b6);
a7 = _mm_sub_epi16(b5, b7);
if (iter == 0) {
b0 = _mm_add_epi16(a0, a4);
b7 = _mm_add_epi16(a1, a5);
b3 = _mm_add_epi16(a2, a6);
b4 = _mm_add_epi16(a3, a7);
b2 = _mm_sub_epi16(a0, a4);
b6 = _mm_sub_epi16(a1, a5);
b1 = _mm_sub_epi16(a2, a6);
b5 = _mm_sub_epi16(a3, a7);
a0 = _mm_unpacklo_epi16(b0, b1);
a1 = _mm_unpacklo_epi16(b2, b3);
a2 = _mm_unpackhi_epi16(b0, b1);
a3 = _mm_unpackhi_epi16(b2, b3);
a4 = _mm_unpacklo_epi16(b4, b5);
a5 = _mm_unpacklo_epi16(b6, b7);
a6 = _mm_unpackhi_epi16(b4, b5);
a7 = _mm_unpackhi_epi16(b6, b7);
b0 = _mm_unpacklo_epi32(a0, a1);
b1 = _mm_unpacklo_epi32(a4, a5);
b2 = _mm_unpackhi_epi32(a0, a1);
b3 = _mm_unpackhi_epi32(a4, a5);
b4 = _mm_unpacklo_epi32(a2, a3);
b5 = _mm_unpacklo_epi32(a6, a7);
b6 = _mm_unpackhi_epi32(a2, a3);
b7 = _mm_unpackhi_epi32(a6, a7);
in[0] = _mm_unpacklo_epi64(b0, b1);
in[1] = _mm_unpackhi_epi64(b0, b1);
in[2] = _mm_unpacklo_epi64(b2, b3);
in[3] = _mm_unpackhi_epi64(b2, b3);
in[4] = _mm_unpacklo_epi64(b4, b5);
in[5] = _mm_unpackhi_epi64(b4, b5);
in[6] = _mm_unpacklo_epi64(b6, b7);
in[7] = _mm_unpackhi_epi64(b6, b7);
} else {
in[0] = _mm_add_epi16(a0, a4);
in[7] = _mm_add_epi16(a1, a5);
in[3] = _mm_add_epi16(a2, a6);
in[4] = _mm_add_epi16(a3, a7);
in[2] = _mm_sub_epi16(a0, a4);
in[6] = _mm_sub_epi16(a1, a5);
in[1] = _mm_sub_epi16(a2, a6);
in[5] = _mm_sub_epi16(a3, a7);
}
}
void vp9_hadamard_8x8_sse2(int16_t const *src_diff, int src_stride,
int16_t *coeff) {
__m128i src[8];
src[0] = _mm_load_si128((const __m128i *)src_diff);
src[1] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[2] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[3] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[4] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[5] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[6] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
src[7] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
hadamard_col8_sse2(src, 0);
hadamard_col8_sse2(src, 1);
_mm_store_si128((__m128i *)coeff, src[0]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[1]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[2]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[3]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[4]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[5]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[6]);
coeff += 8;
_mm_store_si128((__m128i *)coeff, src[7]);
}
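For reference, a scalar sketch of the same 8x8 Hadamard transform (illustrative only; helper names are hypothetical, and the output permutation is written to match the iter != 0 branch of hadamard_col8_sse2 above):

static void hadamard_col8_ref(const int16_t *src, int stride, int16_t *out) {
  // Stage 1 and 2 butterflies on one column of eight residuals.
  const int16_t b0 = src[0 * stride] + src[1 * stride];
  const int16_t b1 = src[0 * stride] - src[1 * stride];
  const int16_t b2 = src[2 * stride] + src[3 * stride];
  const int16_t b3 = src[2 * stride] - src[3 * stride];
  const int16_t b4 = src[4 * stride] + src[5 * stride];
  const int16_t b5 = src[4 * stride] - src[5 * stride];
  const int16_t b6 = src[6 * stride] + src[7 * stride];
  const int16_t b7 = src[6 * stride] - src[7 * stride];
  const int16_t c0 = b0 + b2, c1 = b1 + b3, c2 = b0 - b2, c3 = b1 - b3;
  const int16_t c4 = b4 + b6, c5 = b5 + b7, c6 = b4 - b6, c7 = b5 - b7;
  // Stage 3, written out in the same output order as the SIMD version.
  out[0] = c0 + c4;  out[7] = c1 + c5;  out[3] = c2 + c6;  out[4] = c3 + c7;
  out[2] = c0 - c4;  out[6] = c1 - c5;  out[1] = c2 - c6;  out[5] = c3 - c7;
}

static void hadamard_8x8_ref(const int16_t *src_diff, int src_stride,
                             int16_t *coeff) {
  int i;
  int16_t buf[64];
  for (i = 0; i < 8; ++i)          // column pass over the 8x8 residual block
    hadamard_col8_ref(src_diff + i, src_stride, buf + 8 * i);
  for (i = 0; i < 8; ++i)          // row pass over the intermediate results
    hadamard_col8_ref(buf + i, 8, coeff + 8 * i);
}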
void vp9_hadamard_16x16_sse2(int16_t const *src_diff, int src_stride,
int16_t *coeff) {
int idx;
for (idx = 0; idx < 4; ++idx) {
int16_t const *src_ptr = src_diff + (idx >> 1) * 8 * src_stride
+ (idx & 0x01) * 8;
vp9_hadamard_8x8_sse2(src_ptr, src_stride, coeff + idx * 64);
}
for (idx = 0; idx < 64; idx += 8) {
__m128i coeff0 = _mm_load_si128((const __m128i *)coeff);
__m128i coeff1 = _mm_load_si128((const __m128i *)(coeff + 64));
__m128i coeff2 = _mm_load_si128((const __m128i *)(coeff + 128));
__m128i coeff3 = _mm_load_si128((const __m128i *)(coeff + 192));
__m128i b0 = _mm_add_epi16(coeff0, coeff1);
__m128i b1 = _mm_sub_epi16(coeff0, coeff1);
__m128i b2 = _mm_add_epi16(coeff2, coeff3);
__m128i b3 = _mm_sub_epi16(coeff2, coeff3);
coeff0 = _mm_add_epi16(b0, b2);
coeff1 = _mm_add_epi16(b1, b3);
coeff0 = _mm_srai_epi16(coeff0, 1);
coeff1 = _mm_srai_epi16(coeff1, 1);
_mm_store_si128((__m128i *)coeff, coeff0);
_mm_store_si128((__m128i *)(coeff + 64), coeff1);
coeff2 = _mm_sub_epi16(b0, b2);
coeff3 = _mm_sub_epi16(b1, b3);
coeff2 = _mm_srai_epi16(coeff2, 1);
coeff3 = _mm_srai_epi16(coeff3, 1);
_mm_store_si128((__m128i *)(coeff + 128), coeff2);
_mm_store_si128((__m128i *)(coeff + 192), coeff3);
coeff += 8;
}
}
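A scalar sketch of the combining step above: the four 8x8 coefficient blocks are merged with a 2x2 Hadamard per coefficient position, and each result is halved (arithmetic shift, as with _mm_srai_epi16) so the values stay within int16_t:

// coeff points at the 256 coefficients produced by the four 8x8 transforms.
int i;
for (i = 0; i < 64; ++i) {
  const int16_t a0 = coeff[i];
  const int16_t a1 = coeff[i + 64];
  const int16_t a2 = coeff[i + 128];
  const int16_t a3 = coeff[i + 192];
  const int16_t b0 = a0 + a1, b1 = a0 - a1;
  const int16_t b2 = a2 + a3, b3 = a2 - a3;
  coeff[i]       = (b0 + b2) >> 1;
  coeff[i + 64]  = (b1 + b3) >> 1;
  coeff[i + 128] = (b0 - b2) >> 1;
  coeff[i + 192] = (b1 - b3) >> 1;
}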
int16_t vp9_satd_sse2(const int16_t *coeff, int length) {
int i;
__m128i sum = _mm_load_si128((const __m128i *)coeff);
__m128i sign = _mm_srai_epi16(sum, 15);
__m128i val = _mm_xor_si128(sum, sign);
sum = _mm_sub_epi16(val, sign);
coeff += 8;
for (i = 8; i < length; i += 8) {
__m128i src_line = _mm_load_si128((const __m128i *)coeff);
sign = _mm_srai_epi16(src_line, 15);
val = _mm_xor_si128(src_line, sign);
val = _mm_sub_epi16(val, sign);
sum = _mm_add_epi16(sum, val);
coeff += 8;
}
val = _mm_srli_si128(sum, 8);
sum = _mm_add_epi16(sum, val);
val = _mm_srli_epi64(sum, 32);
sum = _mm_add_epi16(sum, val);
val = _mm_srli_epi32(sum, 16);
sum = _mm_add_epi16(sum, val);
return _mm_extract_epi16(sum, 0);
}
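vp9_satd_sse2() sums the absolute values of the Hadamard coefficients using 16-bit lanes. A scalar sketch of the same computation (illustrative; the 16-bit accumulator and return type bound the coefficient sums this is expected to handle):

static int16_t satd_ref(const int16_t *coeff, int length) {
  int i;
  int sum = 0;
  for (i = 0; i < length; ++i)
    sum += abs(coeff[i]);     // <stdlib.h> abs(); coefficients fit in 16 bits
  return (int16_t)sum;        // narrowed to mirror the SIMD return type
}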
void vp9_int_pro_row_sse2(int16_t *hbuf, uint8_t const*ref,
const int ref_stride, const int height) {
int idx;


@@ -293,7 +293,8 @@ void vp9_fdct8x8_quant_ssse3(const int16_t *input, int stride,
if (!skip_block) {
__m128i eob;
__m128i round, quant, dequant;
__m128i round, quant, dequant, thr;
int16_t nzflag;
{
__m128i coeff0, coeff1;
@@ -368,6 +369,7 @@ void vp9_fdct8x8_quant_ssse3(const int16_t *input, int stride,
// AC only loop
index = 2;
thr = _mm_srai_epi16(dequant, 1);
while (n_coeffs < 0) {
__m128i coeff0, coeff1;
{
@@ -387,28 +389,39 @@ void vp9_fdct8x8_quant_ssse3(const int16_t *input, int stride,
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
qcoeff0 = _mm_adds_epi16(qcoeff0, round);
qcoeff1 = _mm_adds_epi16(qcoeff1, round);
qtmp0 = _mm_mulhi_epi16(qcoeff0, quant);
qtmp1 = _mm_mulhi_epi16(qcoeff1, quant);
nzflag = _mm_movemask_epi8(_mm_cmpgt_epi16(qcoeff0, thr)) |
_mm_movemask_epi8(_mm_cmpgt_epi16(qcoeff1, thr));
// Reinsert signs
qcoeff0 = _mm_xor_si128(qtmp0, coeff0_sign);
qcoeff1 = _mm_xor_si128(qtmp1, coeff1_sign);
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
if (nzflag) {
qcoeff0 = _mm_adds_epi16(qcoeff0, round);
qcoeff1 = _mm_adds_epi16(qcoeff1, round);
qtmp0 = _mm_mulhi_epi16(qcoeff0, quant);
qtmp1 = _mm_mulhi_epi16(qcoeff1, quant);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), qcoeff0);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, qcoeff1);
// Reinsert signs
qcoeff0 = _mm_xor_si128(qtmp0, coeff0_sign);
qcoeff1 = _mm_xor_si128(qtmp1, coeff1_sign);
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
coeff0 = _mm_mullo_epi16(qcoeff0, dequant);
coeff1 = _mm_mullo_epi16(qcoeff1, dequant);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), qcoeff0);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, qcoeff1);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), coeff0);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, coeff1);
coeff0 = _mm_mullo_epi16(qcoeff0, dequant);
coeff1 = _mm_mullo_epi16(qcoeff1, dequant);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), coeff0);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, coeff1);
} else {
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), zero);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, zero);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), zero);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, zero);
}
}
{
if (nzflag) {
// Scan for eob
__m128i zero_coeff0, zero_coeff1;
__m128i nzero_coeff0, nzero_coeff1;


@@ -72,3 +72,49 @@ cglobal block_error, 3, 3, 8, uqc, dqc, size, ssz
movd edx, m5
%endif
RET
; Compute the sum of squared difference between two int16_t vectors.
; int64_t vp9_block_error_fp(int16_t *coeff, int16_t *dqcoeff,
; intptr_t block_size)
INIT_XMM sse2
cglobal block_error_fp, 3, 3, 8, uqc, dqc, size
pxor m4, m4 ; sse accumulator
pxor m5, m5 ; dedicated zero register
lea uqcq, [uqcq+sizeq*2]
lea dqcq, [dqcq+sizeq*2]
neg sizeq
.loop:
mova m2, [uqcq+sizeq*2]
mova m0, [dqcq+sizeq*2]
mova m3, [uqcq+sizeq*2+mmsize]
mova m1, [dqcq+sizeq*2+mmsize]
psubw m0, m2
psubw m1, m3
; individual errors are max. 15bit+sign, so squares are 30bit, and
; thus the sum of 2 should fit in a 31bit integer (+ unused sign bit)
pmaddwd m0, m0
pmaddwd m1, m1
; accumulate in 64bit
punpckldq m7, m0, m5
punpckhdq m0, m5
paddq m4, m7
punpckldq m7, m1, m5
paddq m4, m0
punpckhdq m1, m5
paddq m4, m7
paddq m4, m1
add sizeq, mmsize
jl .loop
; accumulate horizontally and store in return value
movhlps m5, m4
paddq m4, m5
%if ARCH_X86_64
movq rax, m4
%else
pshufd m5, m4, 0x1
movd eax, m4
movd edx, m5
%endif
RET
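A scalar sketch of what block_error_fp computes, matching the prototype in the comment above (illustrative; the assembly likewise accumulates the squared differences in 64 bits):

int64_t block_error_fp_ref(const int16_t *coeff, const int16_t *dqcoeff,
                           intptr_t block_size) {
  int64_t error = 0;
  intptr_t i;
  for (i = 0; i < block_size; ++i) {
    const int diff = coeff[i] - dqcoeff[i];  // at most 15 bits + sign
    error += (int64_t)diff * diff;           // squares accumulated in 64 bits
  }
  return error;
}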


@@ -230,6 +230,8 @@ void vp9_quantize_fp_sse2(const int16_t* coeff_ptr, intptr_t n_coeffs,
const int16_t* scan_ptr,
const int16_t* iscan_ptr) {
__m128i zero;
__m128i thr;
int16_t nzflag;
(void)scan_ptr;
(void)zbin_ptr;
(void)quant_shift_ptr;
@@ -316,6 +318,8 @@ void vp9_quantize_fp_sse2(const int16_t* coeff_ptr, intptr_t n_coeffs,
n_coeffs += 8 * 2;
}
thr = _mm_srai_epi16(dequant, 1);
// AC only loop
while (n_coeffs < 0) {
__m128i coeff0, coeff1;
@@ -335,28 +339,39 @@ void vp9_quantize_fp_sse2(const int16_t* coeff_ptr, intptr_t n_coeffs,
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
qcoeff0 = _mm_adds_epi16(qcoeff0, round);
qcoeff1 = _mm_adds_epi16(qcoeff1, round);
qtmp0 = _mm_mulhi_epi16(qcoeff0, quant);
qtmp1 = _mm_mulhi_epi16(qcoeff1, quant);
nzflag = _mm_movemask_epi8(_mm_cmpgt_epi16(qcoeff0, thr)) |
_mm_movemask_epi8(_mm_cmpgt_epi16(qcoeff1, thr));
// Reinsert signs
qcoeff0 = _mm_xor_si128(qtmp0, coeff0_sign);
qcoeff1 = _mm_xor_si128(qtmp1, coeff1_sign);
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
if (nzflag) {
qcoeff0 = _mm_adds_epi16(qcoeff0, round);
qcoeff1 = _mm_adds_epi16(qcoeff1, round);
qtmp0 = _mm_mulhi_epi16(qcoeff0, quant);
qtmp1 = _mm_mulhi_epi16(qcoeff1, quant);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), qcoeff0);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, qcoeff1);
// Reinsert signs
qcoeff0 = _mm_xor_si128(qtmp0, coeff0_sign);
qcoeff1 = _mm_xor_si128(qtmp1, coeff1_sign);
qcoeff0 = _mm_sub_epi16(qcoeff0, coeff0_sign);
qcoeff1 = _mm_sub_epi16(qcoeff1, coeff1_sign);
coeff0 = _mm_mullo_epi16(qcoeff0, dequant);
coeff1 = _mm_mullo_epi16(qcoeff1, dequant);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), qcoeff0);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, qcoeff1);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), coeff0);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, coeff1);
coeff0 = _mm_mullo_epi16(qcoeff0, dequant);
coeff1 = _mm_mullo_epi16(qcoeff1, dequant);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), coeff0);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, coeff1);
} else {
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs), zero);
_mm_store_si128((__m128i*)(qcoeff_ptr + n_coeffs) + 1, zero);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs), zero);
_mm_store_si128((__m128i*)(dqcoeff_ptr + n_coeffs) + 1, zero);
}
}
{
if (nzflag) {
// Scan for eob
__m128i zero_coeff0, zero_coeff1;
__m128i nzero_coeff0, nzero_coeff1;
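Both quantizer hunks above apply the same shortcut: in the AC-only loop, each group of 16 coefficients is first compared in absolute value against half the AC dequant step; if none exceeds that threshold the whole group quantizes to zero, so the multiplies and coefficient stores are skipped. A scalar sketch of the per-group logic (names are hypothetical and saturation is omitted; rounding follows the fp-style quantizer):

static void quantize_fp_group16_sketch(const int16_t *coeff, int16_t round,
                                       int16_t quant, int16_t dequant,
                                       int16_t *qcoeff, int16_t *dqcoeff) {
  const int16_t thr = dequant >> 1;   // skip threshold: half the AC step size
  int nzflag = 0, j;
  for (j = 0; j < 16; ++j)
    nzflag |= (abs(coeff[j]) > thr);
  if (!nzflag) {                      // every coefficient quantizes to zero
    memset(qcoeff, 0, 16 * sizeof(*qcoeff));
    memset(dqcoeff, 0, 16 * sizeof(*dqcoeff));
    return;
  }
  for (j = 0; j < 16; ++j) {
    const int c = coeff[j];
    const int q = ((abs(c) + round) * quant) >> 16;  // fp rounding, Q16 quant
    qcoeff[j] = (int16_t)(c < 0 ? -q : q);
    dqcoeff[j] = (int16_t)(qcoeff[j] * dequant);
  }
}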


@@ -282,6 +282,8 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
psignw m8, m9
psignw m13, m10
psrlw m0, m3, 2
%else
psrlw m0, m3, 1
%endif
mova [r4q+ncoeffq*2+ 0], m8
mova [r4q+ncoeffq*2+16], m13
@@ -302,7 +304,7 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
mova m10, [ coeffq+ncoeffq*2+16] ; m10 = c[i]
pabsw m6, m9 ; m6 = abs(m9)
pabsw m11, m10 ; m11 = abs(m10)
%ifidn %1, fp_32x32
pcmpgtw m7, m6, m0
pcmpgtw m12, m11, m0
pmovmskb r6d, m7
@@ -310,7 +312,7 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
or r6, r2
jz .skip_iter
%endif
pcmpeqw m7, m7
paddsw m6, m1 ; m6 += round
@@ -348,7 +350,6 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
add ncoeffq, mmsize
jl .ac_only_loop
%ifidn %1, fp_32x32
jmp .accumulate_eob
.skip_iter:
mova [r3q+ncoeffq*2+ 0], m5
@@ -357,7 +358,6 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, zbin, round, quant, \
mova [r4q+ncoeffq*2+16], m5
add ncoeffq, mmsize
jl .ac_only_loop
%endif
.accumulate_eob:
; horizontally accumulate/max eobs and write into [eob] memory pointer


@@ -1260,6 +1260,21 @@ static vpx_codec_err_t ctrl_set_active_map(vpx_codec_alg_priv_t *ctx,
}
}
static vpx_codec_err_t ctrl_get_active_map(vpx_codec_alg_priv_t *ctx,
va_list args) {
vpx_active_map_t *const map = va_arg(args, vpx_active_map_t *);
if (map) {
if (!vp9_get_active_map(ctx->cpi, map->active_map,
(int)map->rows, (int)map->cols))
return VPX_CODEC_OK;
else
return VPX_CODEC_INVALID_PARAM;
} else {
return VPX_CODEC_INVALID_PARAM;
}
}
static vpx_codec_err_t ctrl_set_scale_mode(vpx_codec_alg_priv_t *ctx,
va_list args) {
vpx_scaling_mode_t *const mode = va_arg(args, vpx_scaling_mode_t *);
@@ -1417,6 +1432,7 @@ static vpx_codec_ctrl_fn_map_t encoder_ctrl_maps[] = {
#if VPX_ENCODER_ABI_VERSION > (4 + VPX_CODEC_ABI_VERSION)
{VP9E_GET_SVC_LAYER_ID, ctrl_get_svc_layer_id},
#endif
{VP9E_GET_ACTIVEMAP, ctrl_get_active_map},
{ -1, NULL},
};


@@ -116,6 +116,9 @@ static vpx_codec_err_t decoder_destroy(vpx_codec_alg_priv_t *ctx) {
(FrameWorkerData *)worker->data1;
vp9_get_worker_interface()->end(worker);
vp9_remove_common(&frame_worker_data->pbi->common);
#if CONFIG_VP9_POSTPROC
vp9_free_postproc_buffers(&frame_worker_data->pbi->common);
#endif
vp9_decoder_remove(frame_worker_data->pbi);
vpx_free(frame_worker_data->scratch_buffer);
#if CONFIG_MULTITHREAD
@@ -129,8 +132,10 @@ static vpx_codec_err_t decoder_destroy(vpx_codec_alg_priv_t *ctx) {
#endif
}
if (ctx->buffer_pool)
if (ctx->buffer_pool) {
vp9_free_ref_frame_buffers(ctx->buffer_pool);
vp9_free_internal_frame_buffers(&ctx->buffer_pool->int_frame_buffers);
}
vpx_free(ctx->frame_workers);
vpx_free(ctx->buffer_pool);
@@ -750,6 +755,8 @@ static vpx_image_t *decoder_get_frame(vpx_codec_alg_priv_t *ctx,
(FrameWorkerData *)worker->data1;
ctx->next_output_worker_id =
(ctx->next_output_worker_id + 1) % ctx->num_frame_workers;
if (ctx->base.init_flags & VPX_CODEC_USE_POSTPROC)
set_ppflags(ctx, &flags);
// Wait for the frame from worker thread.
if (winterface->sync(worker)) {
// Check if worker has received any frames.


@@ -425,10 +425,18 @@ struct vpx_internal_error_info {
jmp_buf jmp;
};
#define CLANG_ANALYZER_NORETURN
#if defined(__has_feature)
#if __has_feature(attribute_analyzer_noreturn)
#undef CLANG_ANALYZER_NORETURN
#define CLANG_ANALYZER_NORETURN __attribute__((analyzer_noreturn))
#endif
#endif
void vpx_internal_error(struct vpx_internal_error_info *info,
vpx_codec_err_t error,
const char *fmt,
...);
...) CLANG_ANALYZER_NORETURN;
#ifdef __cplusplus
} // extern "C"
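The attribute only affects clang's static analyzer: it records that vpx_internal_error() does not return along analyzed paths (it typically longjmp()s back to the caller's setjmp point), which suppresses false positives after error bail-outs. A hedged sketch of the pattern it helps with (buf, size, and error_info are hypothetical):

uint8_t *buf = (uint8_t *)vpx_malloc(size);
if (buf == NULL)
  vpx_internal_error(&error_info, VPX_CODEC_MEM_ERROR, "Failed to allocate buf");
// Without analyzer_noreturn the analyzer assumes execution can reach here
// with buf == NULL and flags the dereference below as a NULL-pointer bug.
buf[0] = 0;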


@@ -508,6 +508,12 @@ enum vp8e_enc_control_id {
* Supported in codecs: VP9
*/
VP9E_SET_COLOR_SPACE,
/*!\brief Codec control function to get an Active map back from the encoder.
*
* Supported in codecs: VP9
*/
VP9E_GET_ACTIVEMAP,
};
/*!\brief vpx 1-D scaling mode
@@ -691,6 +697,8 @@ VPX_CTRL_USE_TYPE(VP9E_SET_NOISE_SENSITIVITY, unsigned int)
VPX_CTRL_USE_TYPE(VP9E_SET_TUNE_CONTENT, int) /* vp9e_tune_content */
VPX_CTRL_USE_TYPE(VP9E_SET_COLOR_SPACE, int)
VPX_CTRL_USE_TYPE(VP9E_GET_ACTIVEMAP, vpx_active_map_t *)
/*! @} - end defgroup vp8_encoder */
#ifdef __cplusplus
} // extern "C"
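Together with the VPX_CTRL_USE_TYPE entry above, the new control is used like any other encoder control. A hedged usage sketch (the map dimensions and codec handle are illustrative; rows/cols must match what the encoder expects, per the vp9_get_active_map() check shown earlier):

vpx_active_map_t map;
unsigned char active[MAP_ROWS * MAP_COLS];  // MAP_ROWS/MAP_COLS: hypothetical
map.rows = MAP_ROWS;
map.cols = MAP_COLS;
map.active_map = active;
if (vpx_codec_control(&codec, VP9E_GET_ACTIVEMAP, &map) != VPX_CODEC_OK) {
  // Mismatched dimensions or a NULL map yield VPX_CODEC_INVALID_PARAM.
}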


@@ -59,7 +59,7 @@ extern "C" {
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define VPX_ENCODER_ABI_VERSION (4 + VPX_CODEC_ABI_VERSION) /**<\hideinitializer*/
#define VPX_ENCODER_ABI_VERSION (4 + 1 + VPX_CODEC_ABI_VERSION) /**<\hideinitializer*/
/*! \brief Encoder capabilities bitfield


@@ -1080,9 +1080,6 @@ int main_loop(int argc, const char **argv_) {
}
}
}
if (stop_after && frame_in >= stop_after)
break;
}
if (summary || progress) {


@@ -63,6 +63,7 @@ int file_is_webm(struct WebmInputContext *webm_ctx,
struct VpxInputContext *vpx_ctx) {
mkvparser::MkvReader *const reader = new mkvparser::MkvReader(vpx_ctx->file);
webm_ctx->reader = reader;
webm_ctx->reached_eos = 0;
mkvparser::EBMLHeader header;
long long pos = 0;
@@ -121,6 +122,11 @@ int webm_read_frame(struct WebmInputContext *webm_ctx,
uint8_t **buffer,
size_t *bytes_in_buffer,
size_t *buffer_size) {
// This check is needed for frame parallel decoding, in which case this
// function could be called even after it has reached end of input stream.
if (webm_ctx->reached_eos) {
return 1;
}
mkvparser::Segment *const segment =
reinterpret_cast<mkvparser::Segment*>(webm_ctx->segment);
const mkvparser::Cluster* cluster =
@@ -140,6 +146,7 @@ int webm_read_frame(struct WebmInputContext *webm_ctx,
cluster = segment->GetNext(cluster);
if (cluster == NULL || cluster->EOS()) {
*bytes_in_buffer = 0;
webm_ctx->reached_eos = 1;
return 1;
}
status = cluster->GetFirst(block_entry);


@@ -29,6 +29,7 @@ struct WebmInputContext {
int video_track_index;
uint64_t timestamp_ns;
int is_key_frame;
int reached_eos;
};
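The new reached_eos flag makes webm_read_frame() safe to call again after the last cluster, which frame-parallel decoding relies on. A hedged sketch of the calling pattern (the return-value convention is taken from the hunk above, where nonzero means end of stream or error):

uint8_t *buf = NULL;
size_t bytes_in_buffer = 0, buffer_size = 0;
while (webm_read_frame(&webm_ctx, &buf, &bytes_in_buffer, &buffer_size) == 0) {
  // hand buf / bytes_in_buffer to the decoder
}
// Further calls now return 1 immediately via reached_eos instead of
// walking past the final cluster again.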
// Checks if the input is a WebM file. If so, initializes WebMInputContext so