Release 1.7.0 Mandarin Duck

Change-Id: I186440f3643a85694f45400393efb661f6d012fc
vpx_codec_enc_init_multi: fix segfault w/vp9
2018-01-24 14:25:44 -08:00 · 2018-01-23 18:14:23 -08:00 · 2018-01-23 18:14:14 -08:00 · 2018-01-17 20:17:33 +00:00 · 2018-01-17 11:28:24 -08:00 · 2018-01-16 18:27:42 -08:00
43 changed files with 1304 additions and 900 deletions
--- a/.mailmap
+++ b/.mailmap
@@ -3,6 +3,7 @@ Aℓex Converse <aconverse@google.com>
 Aℓex Converse <aconverse@google.com> <alex.converse@gmail.com>
 Alexis Ballier <aballier@gentoo.org> <alexis.ballier@gmail.com>
 Alpha Lam <hclam@google.com> <hclam@chromium.org>
+Chris Cunningham <chcunningham@chromium.org>
 Daniele Castagna <dcastagna@chromium.org> <dcastagna@google.com>
 Deb Mukherjee <debargha@google.com>
 Erik Niemeyer <erik.a.niemeyer@intel.com> <erik.a.niemeyer@gmail.com>
@@ -21,18 +22,21 @@ Marco Paniconi <marpan@google.com>
 Marco Paniconi <marpan@google.com> <marpan@chromium.org>
 Pascal Massimino <pascal.massimino@gmail.com>
 Paul Wilkins <paulwilkins@google.com>
+Peter Boström <pbos@chromium.org> <pbos@google.com>
 Peter de Rivaz <peter.derivaz@gmail.com>
 Peter de Rivaz <peter.derivaz@gmail.com> <peter.derivaz@argondesign.com>
 Ralph Giles <giles@xiph.org> <giles@entropywave.com>
 Ralph Giles <giles@xiph.org> <giles@mozilla.com>
 Ronald S. Bultje <rsbultje@gmail.com> <rbultje@google.com>
 Sami Pietilä <samipietila@google.com>
+Shiyou Yin <yinshiyou-hf@loongson.cn>
 Tamar Levy <tamar.levy@intel.com>
 Tamar Levy <tamar.levy@intel.com> <levytamar82@gmail.com>
 Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
 Timothy B. Terriberry <tterribe@xiph.org> <tterriberry@mozilla.com>
 Tom Finegan <tomfinegan@google.com>
 Tom Finegan <tomfinegan@google.com> <tomfinegan@chromium.org>
+Urvang Joshi <urvang@google.com> <urvang@chromium.org>
 Yaowu Xu <yaowu@google.com> <adam@xuyaowu.com>
 Yaowu Xu <yaowu@google.com> <yaowu@xuyaowu.com>
 Yaowu Xu <yaowu@google.com> <Yaowu Xu>
--- a/16
+++ b/16
@@ -3,13 +3,13 @@

 Aaron Watry <awatry@gmail.com>
 Abo Talib Mahfoodh <ab.mahfoodh@gmail.com>
-Adam Xu <adam@xuyaowu.com>
 Adrian Grange <agrange@google.com>
 Aℓex Converse <aconverse@google.com>
 Ahmad Sharif <asharif@google.com>
 Aleksey Vasenev <margtu-fivt@ya.ru>
 Alexander Potapenko <glider@google.com>
 Alexander Voronov <avoronov@graphics.cs.msu.ru>
+Alexandra Hájková <alexandra.khirnova@gmail.com>
 Alexis Ballier <aballier@gentoo.org>
 Alok Ahuja <waveletcoeff@gmail.com>
 Alpha Lam <hclam@google.com>
@@ -17,6 +17,7 @@ A.Mahfoodh <ab.mahfoodh@gmail.com>
 Ami Fischman <fischman@chromium.org>
 Andoni Morales Alastruey <ylatuya@gmail.com>
 Andres Mejia <mcitadel@gmail.com>
+Andrew Lewis <andrewlewis@google.com>
 Andrew Russell <anrussell@google.com>
 Angie Chiang <angiebird@google.com>
 Aron Rosenberg <arosenberg@logitech.com>
@@ -24,7 +25,9 @@ Attila Nagy <attilanagy@google.com>
 Brion Vibber <bvibber@wikimedia.org>
 changjun.yang <changjun.yang@intel.com>
 Charles 'Buck' Krasic <ckrasic@google.com>
+Cheng Chen <chengchen@google.com>
 chm <chm@rock-chips.com>
+Chris Cunningham <chcunningham@chromium.org>
 Christian Duvivier <cduvivier@google.com>
 Daniele Castagna <dcastagna@chromium.org>
 Daniel Kang <ddkang@google.com>
@@ -46,10 +49,12 @@ Geza Lore <gezalore@gmail.com>
 Ghislain MARY <ghislainmary2@gmail.com>
 Giuseppe Scrivano <gscrivano@gnu.org>
 Gordana Cmiljanovic <gordana.cmiljanovic@imgtec.com>
+Gregor Jasny <gjasny@gmail.com>
 Guillaume Martres <gmartres@google.com>
 Guillermo Ballester Valor <gbvalor@gmail.com>
 Hangyu Kuang <hkuang@google.com>
 Hanno Böck <hanno@hboeck.de>
+Han Shen <shenhan@google.com>
 Henrik Lundin <hlundin@google.com>
 Hui Su <huisu@google.com>
 Ivan Krasin <krasin@chromium.org>
@@ -83,6 +88,7 @@ Justin Clift <justin@salasaga.org>
 Justin Lebar <justin.lebar@gmail.com>
 Kaustubh Raste <kaustubh.raste@imgtec.com>
 KO Myung-Hun <komh@chollian.net>
+Kyle Siefring <kylesiefring@gmail.com>
 Lawrence Velázquez <larryv@macports.org>
 Linfeng Zhang <linfengz@google.com>
 Lou Quillio <louquillio@google.com>
@@ -101,6 +107,7 @@ Mikhal Shemer <mikhal@google.com>
 Min Chen <chenm003@gmail.com>
 Minghai Shang <minghai@google.com>
 Min Ye <yeemmi@google.com>
+Moriyoshi Koizumi <mozo@mozo.jp>
 Morton Jonuschat <yabawock@gmail.com>
 Nathan E. Egge <negge@mozilla.com>
 Nico Weber <thakis@chromium.org>
@@ -111,12 +118,15 @@ Paul Wilkins <paulwilkins@google.com>
 Pavol Rusnak <stick@gk2.sk>
 Paweł Hajdan <phajdan@google.com>
 Pengchong Jin <pengchong@google.com>
-Peter Boström <pbos@google.com>
+Peter Boström <pbos@chromium.org>
+Peter Collingbourne <pcc@chromium.org>
 Peter de Rivaz <peter.derivaz@gmail.com>
 Philip Jägenstedt <philipj@opera.com>
 Priit Laes <plaes@plaes.org>
 Rafael Ávila de Espíndola <rafael.espindola@gmail.com>
 Rafaël Carré <funman@videolan.org>
+Rafael de Lucena Valle <rafaeldelucena@gmail.com>
+Rahul Chaudhry <rahulchaudhry@google.com>
 Ralph Giles <giles@xiph.org>
 Ranjit Kumar Tulabandu <ranjit.tulabandu@ittiam.com>
 Rob Bradford <rob@linux.intel.com>
@@ -135,6 +145,7 @@ Shiyou Yin <yinshiyou-hf@loongson.cn>
 Shunyao Li <shunyaoli@google.com>
 Stefan Holmer <holmer@google.com>
 Suman Sunkara <sunkaras@google.com>
+Sylvestre Ledru <sylvestre@mozilla.com>
 Taekhyun Kim <takim@nvidia.com>
 Takanori MATSUURA <t.matsuu@gmail.com>
 Tamar Levy <tamar.levy@intel.com>
@@ -147,6 +158,7 @@ Tom Finegan <tomfinegan@google.com>
 Tristan Matthews <le.businessman@gmail.com>
 Urvang Joshi <urvang@google.com>
 Vignesh Venkatasubramanian <vigneshv@google.com>
+Vlad Tsyrklevich <vtsyrklevich@chromium.org>
 Yaowu Xu <yaowu@google.com>
 Yi Luo <luoyi@google.com>
 Yongzhe Wang <yongzhe@google.com>
--- a/25
+++ b/25
@@ -1,3 +1,28 @@
+2017-01-04 v1.7.0 "Mandarin Duck"
+  This release focused on high bit depth performance (10/12 bit) and vp9
+  encoding improvements.
+
+  - Upgrading:
+    This release is ABI incompatible due to new vp9 encoder features.
+
+    Frame parallel decoding for vp9 has been removed.
+
+  - Enhancements:
+    vp9 encoding supports additional threads with --row-mt. This can be greater
+    than the number of tiles.
+
+    Two new vp9 encoder options have been added:
+      --corpus-complexity
+      --tune-content=film
+
+    Additional tooling for respecting the vp9 "level" profiles has been added.
+
+  - Bug fixes:
+    A variety of fuzzing issues.
+    vp8 threading fix for ARM.
+    Codec control VP9_SET_SKIP_LOOP_FILTER fixed.
+    Reject invalid multi resolution configurations.
+
 2017-01-09 v1.6.1 "Long Tailed Duck"
  This release improves upon the VP9 encoder and speeds up the encoding and
  decoding processes.
--- a/4
+++ b/4
@@ -1,4 +1,4 @@
-README - 26 January 2017
+README - 24 January 2018

 Welcome to the WebM VP8/VP9 Codec SDK!

@@ -63,6 +63,8 @@ COMPILING THE APPLICATIONS/LIBRARIES:
    armv8-linux-gcc
    mips32-linux-gcc
    mips64-linux-gcc
+    ppc64-linux-gcc
+    ppc64le-linux-gcc
    sparc-solaris-gcc
    x86-android-gcc
    x86-darwin8-gcc
--- a/build/make/configure.sh
+++ b/build/make/configure.sh
@@ -1312,6 +1312,11 @@ EOF
            check_gcc_machine_option ${ext%_*} $ext
          fi
        fi
+
+        # https://bugs.chromium.org/p/webm/issues/detail?id=1464
+        # The assembly optimizations for vpx_sub_pixel_variance do not link with
+        # gcc 6.
+        enabled sse2 && soft_enable pic
      done

      if enabled external_build; then
--- a/build/make/rtcd.pl
+++ b/build/make/rtcd.pl
@@ -1,4 +1,13 @@
 #!/usr/bin/env perl
+##
+##  Copyright (c) 2017 The WebM project authors. All Rights Reserved.
+##
+##  Use of this source code is governed by a BSD-style license
+##  that can be found in the LICENSE file in the root of the source
+##  tree. An additional intellectual property rights grant can be found
+##  in the file PATENTS.  All contributing project authors may
+##  be found in the AUTHORS file in the root of the source tree.
+##

 no strict 'refs';
 use warnings;
@@ -200,6 +209,7 @@ sub filter {
 sub common_top() {
  my $include_guard = uc($opts{sym})."_H_";
  print <<EOF;
+// This file is generated. Do not edit.
 #ifndef ${include_guard}
 #define ${include_guard}

--- a/build/make/version.sh
+++ b/build/make/version.sh
@@ -60,6 +60,7 @@ if [ ${bare} ]; then
    echo "${changelog_version}${git_version_id}" > $$.tmp
 else
    cat<<EOF>$$.tmp
+// This file is generated. Do not edit.
 #define VERSION_MAJOR  $major_version
 #define VERSION_MINOR  $minor_version
 #define VERSION_PATCH  $patch_version
--- a/2
+++ b/2
@@ -665,7 +665,7 @@ process_toolchain() {
             gen_vcproj_cmd=${source_path}/build/make/gen_msvs_vcxproj.sh
             enabled werror && gen_vcproj_cmd="${gen_vcproj_cmd} --enable-werror"
             all_targets="${all_targets} solution"
-             INLINE="__forceinline"
+             INLINE="__inline"
        ;;
    esac

--- a/examples/vpx_temporal_svc_encoder.c
+++ b/examples/vpx_temporal_svc_encoder.c
@@ -32,13 +32,21 @@ static const char *exec_name;

 void usage_exit(void) { exit(EXIT_FAILURE); }

-// Denoiser states, for temporal denoising.
-enum denoiserState {
-  kDenoiserOff,
-  kDenoiserOnYOnly,
-  kDenoiserOnYUV,
-  kDenoiserOnYUVAggressive,
-  kDenoiserOnAdaptive
+// Denoiser states for vp8, for temporal denoising.
+enum denoiserStateVp8 {
+  kVp8DenoiserOff,
+  kVp8DenoiserOnYOnly,
+  kVp8DenoiserOnYUV,
+  kVp8DenoiserOnYUVAggressive,
+  kVp8DenoiserOnAdaptive
+};
+
+// Denoiser states for vp9, for temporal denoising.
+enum denoiserStateVp9 {
+  kVp9DenoiserOff,
+  kVp9DenoiserOnYOnly,
+  // For SVC: denoise the top two spatial layers.
+  kVp9DenoiserOnYTwoSpatialLayers
 };

 static int mode_to_num_layers[13] = { 1, 2, 2, 3, 3, 3, 3, 5, 2, 3, 3, 3, 3 };
@@ -755,7 +763,7 @@ int main(int argc, char **argv) {

  if (strncmp(encoder->name, "vp8", 3) == 0) {
    vpx_codec_control(&codec, VP8E_SET_CPUUSED, -speed);
-    vpx_codec_control(&codec, VP8E_SET_NOISE_SENSITIVITY, kDenoiserOff);
+    vpx_codec_control(&codec, VP8E_SET_NOISE_SENSITIVITY, kVp8DenoiserOff);
    vpx_codec_control(&codec, VP8E_SET_STATIC_THRESHOLD, 1);
    vpx_codec_control(&codec, VP8E_SET_GF_CBR_BOOST_PCT, 0);
 #if VP8_ROI_MAP
@@ -772,7 +780,7 @@ int main(int argc, char **argv) {
    vpx_codec_control(&codec, VP9E_SET_GF_CBR_BOOST_PCT, 0);
    vpx_codec_control(&codec, VP9E_SET_FRAME_PARALLEL_DECODING, 0);
    vpx_codec_control(&codec, VP9E_SET_FRAME_PERIODIC_BOOST, 0);
-    vpx_codec_control(&codec, VP9E_SET_NOISE_SENSITIVITY, kDenoiserOff);
+    vpx_codec_control(&codec, VP9E_SET_NOISE_SENSITIVITY, kVp9DenoiserOff);
    vpx_codec_control(&codec, VP8E_SET_STATIC_THRESHOLD, 1);
    vpx_codec_control(&codec, VP9E_SET_TUNE_CONTENT, 0);
    vpx_codec_control(&codec, VP9E_SET_TILE_COLUMNS, (cfg.g_threads >> 1));
--- a/libs.mk
+++ b/libs.mk
@@ -233,8 +233,8 @@ OBJS-yes += $(LIBVPX_OBJS)
 LIBS-$(if yes,$(CONFIG_STATIC)) += $(BUILD_PFX)libvpx.a $(BUILD_PFX)libvpx_g.a
 $(BUILD_PFX)libvpx_g.a: $(LIBVPX_OBJS)

-SO_VERSION_MAJOR := 4
-SO_VERSION_MINOR := 1
+SO_VERSION_MAJOR := 5
+SO_VERSION_MINOR := 0
 SO_VERSION_PATCH := 0
 ifeq ($(filter darwin%,$(TGT_OS)),$(TGT_OS))
 LIBVPX_SO               := libvpx.$(SO_VERSION_MAJOR).dylib
--- a/test/datarate_test.cc
+++ b/test/datarate_test.cc
@@ -1216,16 +1216,17 @@ class DatarateOnePassCbrSvc
  }
  virtual void ResetModel() {
    last_pts_ = 0;
-    bits_in_buffer_model_ = cfg_.rc_target_bitrate * cfg_.rc_buf_initial_sz;
-    frame_number_ = 0;
-    first_drop_ = 0;
-    bits_total_ = 0;
    duration_ = 0.0;
    mismatch_psnr_ = 0.0;
    mismatch_nframes_ = 0;
    denoiser_on_ = 0;
    tune_content_ = 0;
    base_speed_setting_ = 5;
+    spatial_layer_id_ = 0;
+    temporal_layer_id_ = 0;
+    memset(bits_in_buffer_model_, 0, sizeof(bits_in_buffer_model_));
+    memset(bits_total_, 0, sizeof(bits_total_));
+    memset(layer_target_avg_bandwidth_, 0, sizeof(layer_target_avg_bandwidth_));
  }
  virtual void BeginPassHook(unsigned int /*pass*/) {}
  virtual void PreEncodeFrameHook(::libvpx_test::VideoSource *video,
@@ -1256,32 +1257,94 @@ class DatarateOnePassCbrSvc
    timebase_ = static_cast<double>(tb.num) / tb.den;
    duration_ = 0;
  }
+
+  virtual void PostEncodeFrameHook(::libvpx_test::Encoder *encoder) {
+    vpx_svc_layer_id_t layer_id;
+    encoder->Control(VP9E_GET_SVC_LAYER_ID, &layer_id);
+    spatial_layer_id_ = layer_id.spatial_layer_id;
+    temporal_layer_id_ = layer_id.temporal_layer_id;
+    // Update buffer with per-layer target frame bandwidth, this is done
+    // for every frame passed to the encoder (encoded or dropped).
+    // For temporal layers, update the cumulative buffer level.
+    for (int sl = 0; sl < number_spatial_layers_; ++sl) {
+      for (int tl = temporal_layer_id_; tl < number_temporal_layers_; ++tl) {
+        const int layer = sl * number_temporal_layers_ + tl;
+        bits_in_buffer_model_[layer] +=
+            static_cast<int64_t>(layer_target_avg_bandwidth_[layer]);
+      }
+    }
+  }
+
+  vpx_codec_err_t parse_superframe_index(const uint8_t *data, size_t data_sz,
+                                         uint32_t sizes[8], int *count) {
+    uint8_t marker;
+    marker = *(data + data_sz - 1);
+    *count = 0;
+    if ((marker & 0xe0) == 0xc0) {
+      const uint32_t frames = (marker & 0x7) + 1;
+      const uint32_t mag = ((marker >> 3) & 0x3) + 1;
+      const size_t index_sz = 2 + mag * frames;
+      // This chunk is marked as having a superframe index but doesn't have
+      // enough data for it, thus it's an invalid superframe index.
+      if (data_sz < index_sz) return VPX_CODEC_CORRUPT_FRAME;
+      {
+        const uint8_t marker2 = *(data + data_sz - index_sz);
+        // This chunk is marked as having a superframe index but doesn't have
+        // the matching marker byte at the front of the index therefore it's an
+        // invalid chunk.
+        if (marker != marker2) return VPX_CODEC_CORRUPT_FRAME;
+      }
+      {
+        uint32_t i, j;
+        const uint8_t *x = &data[data_sz - index_sz + 1];
+        for (i = 0; i < frames; ++i) {
+          uint32_t this_sz = 0;
+
+          for (j = 0; j < mag; ++j) this_sz |= (*x++) << (j * 8);
+          sizes[i] = this_sz;
+        }
+        *count = frames;
+      }
+    }
+    return VPX_CODEC_OK;
+  }
+
  virtual void FramePktHook(const vpx_codec_cx_pkt_t *pkt) {
-    vpx_codec_pts_t duration = pkt->data.frame.pts - last_pts_;
-    if (last_pts_ == 0) duration = 1;
-    bits_in_buffer_model_ += static_cast<int64_t>(
-        duration * timebase_ * cfg_.rc_target_bitrate * 1000);
+    uint32_t sizes[8] = { 0 };
+    int count = 0;
+    last_pts_ = pkt->data.frame.pts;
    const bool key_frame =
        (pkt->data.frame.flags & VPX_FRAME_IS_KEY) ? true : false;
-    if (!key_frame) {
-      // TODO(marpan): This check currently fails for some of the SVC tests,
-      // re-enable when issue (webm:1350) is resolved.
-      //  ASSERT_GE(bits_in_buffer_model_, 0) << "Buffer Underrun at frame "
-      //                                      << pkt->data.frame.pts;
+    parse_superframe_index(static_cast<const uint8_t *>(pkt->data.frame.buf),
+                           pkt->data.frame.sz, sizes, &count);
+    ASSERT_EQ(count, number_spatial_layers_);
+    for (int sl = 0; sl < number_spatial_layers_; ++sl) {
+      sizes[sl] = sizes[sl] << 3;
+      // Update the total encoded bits per layer.
+      // For temporal layers, update the cumulative encoded bits per layer.
+      for (int tl = temporal_layer_id_; tl < number_temporal_layers_; ++tl) {
+        const int layer = sl * number_temporal_layers_ + tl;
+        bits_total_[layer] += static_cast<int64_t>(sizes[sl]);
+        // Update the per-layer buffer level with the encoded frame size.
+        bits_in_buffer_model_[layer] -= static_cast<int64_t>(sizes[sl]);
+        // There should be no buffer underrun, except on the base
+        // temporal layer, since there may be key frames there.
+        if (!key_frame && tl > 0) {
+          ASSERT_GE(bits_in_buffer_model_[layer], 0)
+              << "Buffer Underrun at frame " << pkt->data.frame.pts;
        }
-    const size_t frame_size_in_bits = pkt->data.frame.sz * 8;
-    bits_in_buffer_model_ -= static_cast<int64_t>(frame_size_in_bits);
-    bits_total_ += frame_size_in_bits;
-    if (!first_drop_ && duration > 1) first_drop_ = last_pts_ + 1;
-    last_pts_ = pkt->data.frame.pts;
-    bits_in_last_frame_ = frame_size_in_bits;
-    ++frame_number_;
      }
+    }
+  }
+
  virtual void EndPassHook(void) {
-    if (bits_total_) {
-      const double file_size_in_kb = bits_total_ / 1000.;  // bits per kilobit
+    for (int sl = 0; sl < number_spatial_layers_; ++sl) {
+      for (int tl = 0; tl < number_temporal_layers_; ++tl) {
+        const int layer = sl * number_temporal_layers_ + tl;
+        const double file_size_in_kb = bits_total_[layer] / 1000.;
        duration_ = (last_pts_ + 1) * timebase_;
-      file_datarate_ = file_size_in_kb / duration_;
+        file_datarate_[layer] = file_size_in_kb / duration_;
+      }
    }
  }

@@ -1294,13 +1357,11 @@ class DatarateOnePassCbrSvc
  unsigned int GetMismatchFrames() { return mismatch_nframes_; }

  vpx_codec_pts_t last_pts_;
-  int64_t bits_in_buffer_model_;
+  int64_t bits_in_buffer_model_[VPX_MAX_LAYERS];
  double timebase_;
-  int frame_number_;
-  vpx_codec_pts_t first_drop_;
-  int64_t bits_total_;
+  int64_t bits_total_[VPX_MAX_LAYERS];
  double duration_;
-  double file_datarate_;
+  double file_datarate_[VPX_MAX_LAYERS];
  size_t bits_in_last_frame_;
  vpx_svc_extra_cfg_t svc_params_;
  int speed_setting_;
@@ -1309,14 +1370,22 @@ class DatarateOnePassCbrSvc
  int denoiser_on_;
  int tune_content_;
  int base_speed_setting_;
+  int spatial_layer_id_;
+  int temporal_layer_id_;
+  int number_spatial_layers_;
+  int number_temporal_layers_;
+  int layer_target_avg_bandwidth_[VPX_MAX_LAYERS];
 };
 static void assign_layer_bitrates(vpx_codec_enc_cfg_t *const enc_cfg,
                                  const vpx_svc_extra_cfg_t *svc_params,
                                  int spatial_layers, int temporal_layers,
-                                  int temporal_layering_mode) {
+                                  int temporal_layering_mode,
+                                  int *layer_target_avg_bandwidth,
+                                  int64_t *bits_in_buffer_model) {
  int sl, spatial_layer_target;
  float total = 0;
  float alloc_ratio[VPX_MAX_LAYERS] = { 0 };
+  float framerate = 30.0;
  for (sl = 0; sl < spatial_layers; ++sl) {
    if (svc_params->scaling_factor_den[sl] > 0) {
      alloc_ratio[sl] = (float)(svc_params->scaling_factor_num[sl] * 1.0 /
@@ -1336,8 +1405,41 @@ static void assign_layer_bitrates(vpx_codec_enc_cfg_t *const enc_cfg,
    } else if (temporal_layering_mode == 2) {
      enc_cfg->layer_target_bitrate[index] = spatial_layer_target * 2 / 3;
      enc_cfg->layer_target_bitrate[index + 1] = spatial_layer_target;
+    } else if (temporal_layering_mode <= 1) {
+      enc_cfg->layer_target_bitrate[index] = spatial_layer_target;
    }
  }
+  for (sl = 0; sl < spatial_layers; ++sl) {
+    for (int tl = 0; tl < temporal_layers; ++tl) {
+      const int layer = sl * temporal_layers + tl;
+      float layer_framerate = framerate;
+      if (temporal_layers == 2 && tl == 0) layer_framerate = framerate / 2;
+      if (temporal_layers == 3 && tl == 0) layer_framerate = framerate / 4;
+      if (temporal_layers == 3 && tl == 1) layer_framerate = framerate / 2;
+      layer_target_avg_bandwidth[layer] = static_cast<int>(
+          enc_cfg->layer_target_bitrate[layer] * 1000.0 / layer_framerate);
+      bits_in_buffer_model[layer] =
+          enc_cfg->layer_target_bitrate[layer] * enc_cfg->rc_buf_initial_sz;
+    }
+  }
+}
+
+static void CheckLayerRateTargeting(vpx_codec_enc_cfg_t *const cfg,
+                                    int number_spatial_layers,
+                                    int number_temporal_layers,
+                                    double *file_datarate,
+                                    double thresh_overshoot,
+                                    double thresh_undershoot) {
+  for (int sl = 0; sl < number_spatial_layers; ++sl)
+    for (int tl = 0; tl < number_temporal_layers; ++tl) {
+      const int layer = sl * number_temporal_layers + tl;
+      ASSERT_GE(cfg->layer_target_bitrate[layer],
+                file_datarate[layer] * thresh_overshoot)
+          << " The datarate for the file exceeds the target by too much!";
+      ASSERT_LE(cfg->layer_target_bitrate[layer],
+                file_datarate[layer] * thresh_undershoot)
+          << " The datarate for the file is lower than the target by too much!";
+    }
 }

 // Check basic rate targeting for 1 pass CBR SVC: 2 spatial layers and 1
@@ -1363,14 +1465,19 @@ TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc2SL1TLScreenContent1) {
  svc_params_.scaling_factor_den[1] = 288;
  cfg_.rc_dropframe_thresh = 10;
  cfg_.kf_max_dist = 9999;
-  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 300);
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
+  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 60);
  cfg_.rc_target_bitrate = 500;
  ResetModel();
  tune_content_ = 1;
  base_speed_setting_ = speed_setting_;
  assign_layer_bitrates(&cfg_, &svc_params_, cfg_.ss_number_layers,
-                        cfg_.ts_number_layers, cfg_.temporal_layering_mode);
+                        cfg_.ts_number_layers, cfg_.temporal_layering_mode,
+                        layer_target_avg_bandwidth_, bits_in_buffer_model_);
  ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+  CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                          number_temporal_layers_, file_datarate_, 0.78, 1.15);
  EXPECT_EQ(static_cast<unsigned int>(0), GetMismatchFrames());
 }

@@ -1398,26 +1505,28 @@ TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc2SL3TL) {
  svc_params_.scaling_factor_den[1] = 288;
  cfg_.rc_dropframe_thresh = 0;
  cfg_.kf_max_dist = 9999;
-  ::libvpx_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352, 288,
-                                       30, 1, 0, 200);
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
+  ::libvpx_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480, 30, 1,
+                                       0, 400);
  // TODO(marpan): Check that effective_datarate for each layer hits the
  // layer target_bitrate.
  for (int i = 200; i <= 800; i += 200) {
    cfg_.rc_target_bitrate = i;
    ResetModel();
    assign_layer_bitrates(&cfg_, &svc_params_, cfg_.ss_number_layers,
-                          cfg_.ts_number_layers, cfg_.temporal_layering_mode);
+                          cfg_.ts_number_layers, cfg_.temporal_layering_mode,
+                          layer_target_avg_bandwidth_, bits_in_buffer_model_);
    ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
-    ASSERT_GE(cfg_.rc_target_bitrate, file_datarate_ * 0.78)
-        << " The datarate for the file exceeds the target by too much!";
-    ASSERT_LE(cfg_.rc_target_bitrate, file_datarate_ * 1.15)
-        << " The datarate for the file is lower than the target by too much!";
+    CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                            number_temporal_layers_, file_datarate_, 0.78,
+                            1.15);
 #if CONFIG_VP9_DECODER
    // Number of temporal layers > 1, so half of the frames in this SVC pattern
    // will be non-reference frame and hence encoder will avoid loopfilter.
-    // Since frame dropper is off, we can expcet 100 (half of the sequence)
+    // Since frame dropper is off, we can expect 200 (half of the sequence)
    // mismatched frames.
-    EXPECT_EQ(static_cast<unsigned int>(100), GetMismatchFrames());
+    EXPECT_EQ(static_cast<unsigned int>(200), GetMismatchFrames());
 #endif
  }
 }
@@ -1446,33 +1555,41 @@ TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc2SL3TLDenoiserOn) {
  svc_params_.scaling_factor_den[1] = 288;
  cfg_.rc_dropframe_thresh = 0;
  cfg_.kf_max_dist = 9999;
-  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 300);
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
+  ::libvpx_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480, 30, 1,
+                                       0, 400);
  // TODO(marpan): Check that effective_datarate for each layer hits the
  // layer target_bitrate.
+  // For SVC, noise_sen = 1 means denoising only the top spatial layer
+  // noise_sen = 2 means denoising the two top spatial layers.
+  for (int noise_sen = 1; noise_sen <= 2; noise_sen++) {
    for (int i = 600; i <= 1000; i += 200) {
      cfg_.rc_target_bitrate = i;
      ResetModel();
-    denoiser_on_ = 1;
+      denoiser_on_ = noise_sen;
      assign_layer_bitrates(&cfg_, &svc_params_, cfg_.ss_number_layers,
-                          cfg_.ts_number_layers, cfg_.temporal_layering_mode);
+                            cfg_.ts_number_layers, cfg_.temporal_layering_mode,
+                            layer_target_avg_bandwidth_, bits_in_buffer_model_);
      ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
-    ASSERT_GE(cfg_.rc_target_bitrate, file_datarate_ * 0.78)
-        << " The datarate for the file exceeds the target by too much!";
-    ASSERT_LE(cfg_.rc_target_bitrate, file_datarate_ * 1.15)
-        << " The datarate for the file is lower than the target by too much!";
+      CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                              number_temporal_layers_, file_datarate_, 0.78,
+                              1.15);
 #if CONFIG_VP9_DECODER
-    // Number of temporal layers > 1, so half of the frames in this SVC pattern
+      // Number of temporal layers > 1, so half of the frames in this SVC
+      // pattern
      // will be non-reference frame and hence encoder will avoid loopfilter.
-    // Since frame dropper is off, we can expcet 150 (half of the sequence)
+      // Since frame dropper is off, we can expect 200 (half of the sequence)
      // mismatched frames.
-    EXPECT_EQ(static_cast<unsigned int>(150), GetMismatchFrames());
+      EXPECT_EQ(static_cast<unsigned int>(200), GetMismatchFrames());
 #endif
    }
+  }
 }

 // Check basic rate targeting for 1 pass CBR SVC: 2 spatial layers and 3
 // temporal layers. Run CIF clip with 1 thread, and few short key frame periods.
-TEST_P(DatarateOnePassCbrSvc, DISABLED_OnePassCbrSvc2SL3TLSmallKf) {
+TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc2SL3TLSmallKf) {
  cfg_.rc_buf_initial_sz = 500;
  cfg_.rc_buf_optimal_sz = 500;
  cfg_.rc_buf_sz = 1000;
@@ -1493,21 +1610,23 @@ TEST_P(DatarateOnePassCbrSvc, DISABLED_OnePassCbrSvc2SL3TLSmallKf) {
  svc_params_.scaling_factor_num[1] = 288;
  svc_params_.scaling_factor_den[1] = 288;
  cfg_.rc_dropframe_thresh = 10;
-  ::libvpx_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352, 288,
-                                       30, 1, 0, 200);
  cfg_.rc_target_bitrate = 400;
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
+  ::libvpx_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480, 30, 1,
+                                       0, 400);
  // For this 3 temporal layer case, pattern repeats every 4 frames, so choose
  // 4 key neighboring key frame periods (so key frame will land on 0-2-1-2).
  for (int j = 64; j <= 67; j++) {
    cfg_.kf_max_dist = j;
    ResetModel();
    assign_layer_bitrates(&cfg_, &svc_params_, cfg_.ss_number_layers,
-                          cfg_.ts_number_layers, cfg_.temporal_layering_mode);
+                          cfg_.ts_number_layers, cfg_.temporal_layering_mode,
+                          layer_target_avg_bandwidth_, bits_in_buffer_model_);
    ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
-    ASSERT_GE(cfg_.rc_target_bitrate, file_datarate_ * 0.80)
-        << " The datarate for the file exceeds the target by too much!";
-    ASSERT_LE(cfg_.rc_target_bitrate, file_datarate_ * 1.15)
-        << " The datarate for the file is lower than the target by too much!";
+    CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                            number_temporal_layers_, file_datarate_, 0.78,
+                            1.15);
  }
 }

@@ -1535,22 +1654,23 @@ TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc2SL3TL4Threads) {
  svc_params_.scaling_factor_den[1] = 288;
  cfg_.rc_dropframe_thresh = 0;
  cfg_.kf_max_dist = 9999;
-  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 300);
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
+  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 60);
  cfg_.rc_target_bitrate = 800;
  ResetModel();
  assign_layer_bitrates(&cfg_, &svc_params_, cfg_.ss_number_layers,
-                        cfg_.ts_number_layers, cfg_.temporal_layering_mode);
+                        cfg_.ts_number_layers, cfg_.temporal_layering_mode,
+                        layer_target_avg_bandwidth_, bits_in_buffer_model_);
  ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
-  ASSERT_GE(cfg_.rc_target_bitrate, file_datarate_ * 0.78)
-      << " The datarate for the file exceeds the target by too much!";
-  ASSERT_LE(cfg_.rc_target_bitrate, file_datarate_ * 1.15)
-      << " The datarate for the file is lower than the target by too much!";
+  CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                          number_temporal_layers_, file_datarate_, 0.78, 1.15);
 #if CONFIG_VP9_DECODER
  // Number of temporal layers > 1, so half of the frames in this SVC pattern
  // will be non-reference frame and hence encoder will avoid loopfilter.
-  // Since frame dropper is off, we can expcet 150 (half of the sequence)
+  // Since frame dropper is off, we can expect 30 (half of the sequence)
  // mismatched frames.
-  EXPECT_EQ(static_cast<unsigned int>(150), GetMismatchFrames());
+  EXPECT_EQ(static_cast<unsigned int>(30), GetMismatchFrames());
 #endif
 }

@@ -1580,22 +1700,24 @@ TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc3SL3TL) {
  svc_params_.scaling_factor_den[2] = 288;
  cfg_.rc_dropframe_thresh = 0;
  cfg_.kf_max_dist = 9999;
-  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 300);
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
+  ::libvpx_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480, 30, 1,
+                                       0, 400);
  cfg_.rc_target_bitrate = 800;
  ResetModel();
  assign_layer_bitrates(&cfg_, &svc_params_, cfg_.ss_number_layers,
-                        cfg_.ts_number_layers, cfg_.temporal_layering_mode);
+                        cfg_.ts_number_layers, cfg_.temporal_layering_mode,
+                        layer_target_avg_bandwidth_, bits_in_buffer_model_);
  ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
-  ASSERT_GE(cfg_.rc_target_bitrate, file_datarate_ * 0.78)
-      << " The datarate for the file exceeds the target by too much!";
-  ASSERT_LE(cfg_.rc_target_bitrate, file_datarate_ * 1.22)
-      << " The datarate for the file is lower than the target by too much!";
+  CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                          number_temporal_layers_, file_datarate_, 0.78, 1.15);
 #if CONFIG_VP9_DECODER
  // Number of temporal layers > 1, so half of the frames in this SVC pattern
  // will be non-reference frame and hence encoder will avoid loopfilter.
-  // Since frame dropper is off, we can expcet 150 (half of the sequence)
+  // Since frame dropper is off, we can expect 200 (half of the sequence)
  // mismatched frames.
-  EXPECT_EQ(static_cast<unsigned int>(150), GetMismatchFrames());
+  EXPECT_EQ(static_cast<unsigned int>(200), GetMismatchFrames());
 #endif
 }

@@ -1624,20 +1746,23 @@ TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc3SL3TLSmallKf) {
  svc_params_.scaling_factor_num[2] = 288;
  svc_params_.scaling_factor_den[2] = 288;
  cfg_.rc_dropframe_thresh = 10;
-  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 300);
  cfg_.rc_target_bitrate = 800;
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
+  ::libvpx_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480, 30, 1,
+                                       0, 400);
  // For this 3 temporal layer case, pattern repeats every 4 frames, so choose
  // 4 key neighboring key frame periods (so key frame will land on 0-2-1-2).
  for (int j = 32; j <= 35; j++) {
    cfg_.kf_max_dist = j;
    ResetModel();
    assign_layer_bitrates(&cfg_, &svc_params_, cfg_.ss_number_layers,
-                          cfg_.ts_number_layers, cfg_.temporal_layering_mode);
+                          cfg_.ts_number_layers, cfg_.temporal_layering_mode,
+                          layer_target_avg_bandwidth_, bits_in_buffer_model_);
    ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
-    ASSERT_GE(cfg_.rc_target_bitrate, file_datarate_ * 0.80)
-        << " The datarate for the file exceeds the target by too much!";
-    ASSERT_LE(cfg_.rc_target_bitrate, file_datarate_ * 1.30)
-        << " The datarate for the file is lower than the target by too much!";
+    CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                            number_temporal_layers_, file_datarate_, 0.78,
+                            1.15);
  }
 }

@@ -1667,22 +1792,23 @@ TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc3SL3TL4threads) {
  svc_params_.scaling_factor_den[2] = 288;
  cfg_.rc_dropframe_thresh = 0;
  cfg_.kf_max_dist = 9999;
-  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 300);
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
+  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 60);
  cfg_.rc_target_bitrate = 800;
  ResetModel();
  assign_layer_bitrates(&cfg_, &svc_params_, cfg_.ss_number_layers,
-                        cfg_.ts_number_layers, cfg_.temporal_layering_mode);
+                        cfg_.ts_number_layers, cfg_.temporal_layering_mode,
+                        layer_target_avg_bandwidth_, bits_in_buffer_model_);
  ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
-  ASSERT_GE(cfg_.rc_target_bitrate, file_datarate_ * 0.78)
-      << " The datarate for the file exceeds the target by too much!";
-  ASSERT_LE(cfg_.rc_target_bitrate, file_datarate_ * 1.22)
-      << " The datarate for the file is lower than the target by too much!";
+  CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                          number_temporal_layers_, file_datarate_, 0.78, 1.15);
 #if CONFIG_VP9_DECODER
  // Number of temporal layers > 1, so half of the frames in this SVC pattern
  // will be non-reference frame and hence encoder will avoid loopfilter.
-  // Since frame dropper is off, we can expcet 150 (half of the sequence)
+  // Since frame dropper is off, we can expect 30 (half of the sequence)
  // mismatched frames.
-  EXPECT_EQ(static_cast<unsigned int>(150), GetMismatchFrames());
+  EXPECT_EQ(static_cast<unsigned int>(30), GetMismatchFrames());
 #endif
 }

@@ -1714,9 +1840,19 @@ TEST_P(DatarateOnePassCbrSvc, OnePassCbrSvc2SL1TL5x5MultipleRuns) {
  cfg_.layer_target_bitrate[0] = 300;
  cfg_.layer_target_bitrate[1] = 1400;
  cfg_.rc_target_bitrate = 1700;
-  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 300);
+  number_spatial_layers_ = cfg_.ss_number_layers;
+  number_temporal_layers_ = cfg_.ts_number_layers;
  ResetModel();
+  layer_target_avg_bandwidth_[0] = cfg_.layer_target_bitrate[0] * 1000 / 30;
+  bits_in_buffer_model_[0] =
+      cfg_.layer_target_bitrate[0] * cfg_.rc_buf_initial_sz;
+  layer_target_avg_bandwidth_[1] = cfg_.layer_target_bitrate[1] * 1000 / 30;
+  bits_in_buffer_model_[1] =
+      cfg_.layer_target_bitrate[1] * cfg_.rc_buf_initial_sz;
+  ::libvpx_test::Y4mVideoSource video("niklas_1280_720_30.y4m", 0, 60);
  ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+  CheckLayerRateTargeting(&cfg_, number_spatial_layers_,
+                          number_temporal_layers_, file_datarate_, 0.78, 1.15);
  EXPECT_EQ(static_cast<unsigned int>(0), GetMismatchFrames());
 }

--- a/test/encode_api_test.cc
+++ b/test/encode_api_test.cc
@@ -106,4 +106,90 @@ TEST(EncodeAPI, ImageSizeSetting) {
 }
 #endif

+// Set up 2 spatial streams with 2 temporal layers per stream, and generate
+// invalid configuration by setting the temporal layer rate allocation
+// (ts_target_bitrate[]) to 0 for both layers. This should fail independent of
+// CONFIG_MULTI_RES_ENCODING.
+TEST(EncodeAPI, MultiResEncode) {
+  static const vpx_codec_iface_t *kCodecs[] = {
+#if CONFIG_VP8_ENCODER
+    &vpx_codec_vp8_cx_algo,
+#endif
+#if CONFIG_VP9_ENCODER
+    &vpx_codec_vp9_cx_algo,
+#endif
+  };
+  const int width = 1280;
+  const int height = 720;
+  const int width_down = width / 2;
+  const int height_down = height / 2;
+  const int target_bitrate = 1000;
+  const int framerate = 30;
+
+  for (int c = 0; c < NELEMENTS(kCodecs); ++c) {
+    const vpx_codec_iface_t *const iface = kCodecs[c];
+    vpx_codec_ctx_t enc[2];
+    vpx_codec_enc_cfg_t cfg[2];
+    vpx_rational_t dsf[2] = { { 2, 1 }, { 2, 1 } };
+
+    memset(enc, 0, sizeof(enc));
+
+    for (int i = 0; i < 2; i++) {
+      vpx_codec_enc_config_default(iface, &cfg[i], 0);
+    }
+
+    /* Highest-resolution encoder settings */
+    cfg[0].g_w = width;
+    cfg[0].g_h = height;
+    cfg[0].rc_dropframe_thresh = 0;
+    cfg[0].rc_end_usage = VPX_CBR;
+    cfg[0].rc_resize_allowed = 0;
+    cfg[0].rc_min_quantizer = 2;
+    cfg[0].rc_max_quantizer = 56;
+    cfg[0].rc_undershoot_pct = 100;
+    cfg[0].rc_overshoot_pct = 15;
+    cfg[0].rc_buf_initial_sz = 500;
+    cfg[0].rc_buf_optimal_sz = 600;
+    cfg[0].rc_buf_sz = 1000;
+    cfg[0].g_error_resilient = 1; /* Enable error resilient mode */
+    cfg[0].g_lag_in_frames = 0;
+
+    cfg[0].kf_mode = VPX_KF_AUTO;
+    cfg[0].kf_min_dist = 3000;
+    cfg[0].kf_max_dist = 3000;
+
+    cfg[0].rc_target_bitrate = target_bitrate; /* Set target bitrate */
+    cfg[0].g_timebase.num = 1;                 /* Set fps */
+    cfg[0].g_timebase.den = framerate;
+
+    memcpy(&cfg[1], &cfg[0], sizeof(cfg[0]));
+    cfg[1].rc_target_bitrate = 500;
+    cfg[1].g_w = width_down;
+    cfg[1].g_h = height_down;
+
+    for (int i = 0; i < 2; i++) {
+      cfg[i].ts_number_layers = 2;
+      cfg[i].ts_periodicity = 2;
+      cfg[i].ts_rate_decimator[0] = 2;
+      cfg[i].ts_rate_decimator[1] = 1;
+      cfg[i].ts_layer_id[0] = 0;
+      cfg[i].ts_layer_id[1] = 1;
+      // Invalid parameters.
+      cfg[i].ts_target_bitrate[0] = 0;
+      cfg[i].ts_target_bitrate[1] = 0;
+    }
+
+    // VP9 should report incapable, VP8 invalid for all configurations.
+    const char kVP9Name[] = "WebM Project VP9";
+    const bool is_vp9 = strncmp(kVP9Name, vpx_codec_iface_name(iface),
+                                sizeof(kVP9Name) - 1) == 0;
+    EXPECT_EQ(is_vp9 ? VPX_CODEC_INCAPABLE : VPX_CODEC_INVALID_PARAM,
+              vpx_codec_enc_init_multi(&enc[0], iface, &cfg[0], 2, 0, &dsf[0]));
+
+    for (int i = 0; i < 2; i++) {
+      vpx_codec_destroy(&enc[i]);
+    }
+  }
+}
+
 }  // namespace
--- a/test/encode_test_driver.cc
+++ b/test/encode_test_driver.cc
@@ -201,6 +201,8 @@ void EncoderTest::RunLoop(VideoSource *video) {
      PreEncodeFrameHook(video, encoder.get());
      encoder->EncodeFrame(video, frame_flags_);

+      PostEncodeFrameHook(encoder.get());
+
      CxDataIterator iter = encoder->GetCxData();

      bool has_cxdata = false;
--- a/test/encode_test_driver.h
+++ b/test/encode_test_driver.h
@@ -219,6 +219,8 @@ class EncoderTest {
  virtual void PreEncodeFrameHook(VideoSource * /*video*/,
                                  Encoder * /*encoder*/) {}

+  virtual void PostEncodeFrameHook(Encoder * /*encoder*/) {}
+
  // Hook to be called on every compressed data packet.
  virtual void FramePktHook(const vpx_codec_cx_pkt_t * /*pkt*/) {}

--- a/test/lpf_test.cc
+++ b/test/lpf_test.cc
@@ -114,6 +114,18 @@ void InitInput(Pixel *s, Pixel *ref_s, ACMRandom *rnd, const uint8_t limit,
  }
 }

+uint8_t GetOuterThresh(ACMRandom *rnd) {
+  return static_cast<uint8_t>(rnd->RandRange(3 * MAX_LOOP_FILTER + 5));
+}
+
+uint8_t GetInnerThresh(ACMRandom *rnd) {
+  return static_cast<uint8_t>(rnd->RandRange(MAX_LOOP_FILTER + 1));
+}
+
+uint8_t GetHevThresh(ACMRandom *rnd) {
+  return static_cast<uint8_t>(rnd->RandRange(MAX_LOOP_FILTER + 1) >> 4);
+}
+
 class Loop8Test6Param : public ::testing::TestWithParam<loop8_param_t> {
 public:
  virtual ~Loop8Test6Param() {}
@@ -162,15 +174,15 @@ TEST_P(Loop8Test6Param, OperationCheck) {
  int first_failure = -1;
  for (int i = 0; i < count_test_block; ++i) {
    int err_count = 0;
-    uint8_t tmp = static_cast<uint8_t>(rnd(3 * MAX_LOOP_FILTER + 4));
+    uint8_t tmp = GetOuterThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    blimit[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                    tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = static_cast<uint8_t>(rnd(MAX_LOOP_FILTER));
+    tmp = GetInnerThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    limit[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                   tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = rnd.Rand8();
+    tmp = GetHevThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    thresh[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                    tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
@@ -221,15 +233,15 @@ TEST_P(Loop8Test6Param, ValueCheck) {

  for (int i = 0; i < count_test_block; ++i) {
    int err_count = 0;
-    uint8_t tmp = static_cast<uint8_t>(rnd(3 * MAX_LOOP_FILTER + 4));
+    uint8_t tmp = GetOuterThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    blimit[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                    tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = static_cast<uint8_t>(rnd(MAX_LOOP_FILTER));
+    tmp = GetInnerThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    limit[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                   tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = rnd.Rand8();
+    tmp = GetHevThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    thresh[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                    tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
@@ -271,27 +283,27 @@ TEST_P(Loop8Test9Param, OperationCheck) {
  int first_failure = -1;
  for (int i = 0; i < count_test_block; ++i) {
    int err_count = 0;
-    uint8_t tmp = static_cast<uint8_t>(rnd(3 * MAX_LOOP_FILTER + 4));
+    uint8_t tmp = GetOuterThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    blimit0[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                     tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = static_cast<uint8_t>(rnd(MAX_LOOP_FILTER));
+    tmp = GetInnerThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    limit0[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                    tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = rnd.Rand8();
+    tmp = GetHevThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    thresh0[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                     tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = static_cast<uint8_t>(rnd(3 * MAX_LOOP_FILTER + 4));
+    tmp = GetOuterThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    blimit1[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                     tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = static_cast<uint8_t>(rnd(MAX_LOOP_FILTER));
+    tmp = GetInnerThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    limit1[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                    tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = rnd.Rand8();
+    tmp = GetHevThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    thresh1[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                     tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
@@ -334,27 +346,27 @@ TEST_P(Loop8Test9Param, ValueCheck) {
  int first_failure = -1;
  for (int i = 0; i < count_test_block; ++i) {
    int err_count = 0;
-    uint8_t tmp = static_cast<uint8_t>(rnd(3 * MAX_LOOP_FILTER + 4));
+    uint8_t tmp = GetOuterThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    blimit0[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                     tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = static_cast<uint8_t>(rnd(MAX_LOOP_FILTER));
+    tmp = GetInnerThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    limit0[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                    tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = rnd.Rand8();
+    tmp = GetHevThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    thresh0[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                     tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = static_cast<uint8_t>(rnd(3 * MAX_LOOP_FILTER + 4));
+    tmp = GetOuterThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    blimit1[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                     tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = static_cast<uint8_t>(rnd(MAX_LOOP_FILTER));
+    tmp = GetInnerThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    limit1[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                    tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
-    tmp = rnd.Rand8();
+    tmp = GetHevThresh(&rnd);
    DECLARE_ALIGNED(16, const uint8_t,
                    thresh1[16]) = { tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp,
                                     tmp, tmp, tmp, tmp, tmp, tmp, tmp, tmp };
--- a/test/vp9_quantize_test.cc
+++ b/test/vp9_quantize_test.cc
@@ -14,9 +14,9 @@

 #include "third_party/googletest/src/include/gtest/gtest.h"

+#include "./vp9_rtcd.h"
 #include "./vpx_config.h"
 #include "./vpx_dsp_rtcd.h"
-#include "./vp9_rtcd.h"
 #include "test/acm_random.h"
 #include "test/buffer.h"
 #include "test/clear_system_state.h"
@@ -42,7 +42,7 @@ typedef void (*QuantizeFunc)(const tran_low_t *coeff, intptr_t count,
                             uint16_t *eob, const int16_t *scan,
                             const int16_t *iscan);
 typedef std::tr1::tuple<QuantizeFunc, QuantizeFunc, vpx_bit_depth_t,
-                        int /*max_size*/>
+                        int /*max_size*/, bool /*is_fp*/>
    QuantizeParam;

 // Wrapper for FP version which does not use zbin or quant_shift.
@@ -69,11 +69,15 @@ void QuantFPWrapper(const tran_low_t *coeff, intptr_t count, int skip_block,

 class VP9QuantizeBase {
 public:
-  VP9QuantizeBase(vpx_bit_depth_t bit_depth, int max_size)
-      : bit_depth_(bit_depth), max_size_(max_size) {
+  VP9QuantizeBase(vpx_bit_depth_t bit_depth, int max_size, bool is_fp)
+      : bit_depth_(bit_depth), max_size_(max_size), is_fp_(is_fp) {
    max_value_ = (1 << bit_depth_) - 1;
    zbin_ptr_ =
        reinterpret_cast<int16_t *>(vpx_memalign(16, 8 * sizeof(*zbin_ptr_)));
+    round_fp_ptr_ = reinterpret_cast<int16_t *>(
+        vpx_memalign(16, 8 * sizeof(*round_fp_ptr_)));
+    quant_fp_ptr_ = reinterpret_cast<int16_t *>(
+        vpx_memalign(16, 8 * sizeof(*quant_fp_ptr_)));
    round_ptr_ =
        reinterpret_cast<int16_t *>(vpx_memalign(16, 8 * sizeof(*round_ptr_)));
    quant_ptr_ =
@@ -86,11 +90,15 @@ class VP9QuantizeBase {

  ~VP9QuantizeBase() {
    vpx_free(zbin_ptr_);
+    vpx_free(round_fp_ptr_);
+    vpx_free(quant_fp_ptr_);
    vpx_free(round_ptr_);
    vpx_free(quant_ptr_);
    vpx_free(quant_shift_ptr_);
    vpx_free(dequant_ptr_);
    zbin_ptr_ = NULL;
+    round_fp_ptr_ = NULL;
+    quant_fp_ptr_ = NULL;
    round_ptr_ = NULL;
    quant_ptr_ = NULL;
    quant_shift_ptr_ = NULL;
@@ -100,6 +108,8 @@ class VP9QuantizeBase {

 protected:
  int16_t *zbin_ptr_;
+  int16_t *round_fp_ptr_;
+  int16_t *quant_fp_ptr_;
  int16_t *round_ptr_;
  int16_t *quant_ptr_;
  int16_t *quant_shift_ptr_;
@@ -107,29 +117,103 @@ class VP9QuantizeBase {
  const vpx_bit_depth_t bit_depth_;
  int max_value_;
  const int max_size_;
+  const bool is_fp_;
 };

 class VP9QuantizeTest : public VP9QuantizeBase,
                        public ::testing::TestWithParam<QuantizeParam> {
 public:
  VP9QuantizeTest()
-      : VP9QuantizeBase(GET_PARAM(2), GET_PARAM(3)), quantize_op_(GET_PARAM(0)),
-        ref_quantize_op_(GET_PARAM(1)) {}
+      : VP9QuantizeBase(GET_PARAM(2), GET_PARAM(3), GET_PARAM(4)),
+        quantize_op_(GET_PARAM(0)), ref_quantize_op_(GET_PARAM(1)) {}

 protected:
  const QuantizeFunc quantize_op_;
  const QuantizeFunc ref_quantize_op_;
 };

+// This quantizer compares the AC coefficients to the quantization step size to
+// determine if further multiplication operations are needed.
+// Based on vp9_quantize_fp_sse2().
+void quantize_fp_nz_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
+                      int skip_block, const int16_t *round_ptr,
+                      const int16_t *quant_ptr, tran_low_t *qcoeff_ptr,
+                      tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr,
+                      uint16_t *eob_ptr, const int16_t *scan,
+                      const int16_t *iscan) {
+  int i, eob = -1;
+  const int thr = dequant_ptr[1] >> 1;
+  (void)iscan;
+  (void)skip_block;
+  assert(!skip_block);
+
+  // Quantization pass: All coefficients with index >= zero_flag are
+  // skippable. Note: zero_flag can be zero.
+  for (i = 0; i < n_coeffs; i += 16) {
+    int y;
+    int nzflag_cnt = 0;
+    int abs_coeff[16];
+    int coeff_sign[16];
+
+    // count nzflag for each row (16 tran_low_t)
+    for (y = 0; y < 16; ++y) {
+      const int rc = i + y;
+      const int coeff = coeff_ptr[rc];
+      coeff_sign[y] = (coeff >> 31);
+      abs_coeff[y] = (coeff ^ coeff_sign[y]) - coeff_sign[y];
+      // The first 16 are skipped in the sse2 code.  Do the same here to match.
+      if (i >= 16 && (abs_coeff[y] <= thr)) {
+        nzflag_cnt++;
+      }
+    }
+
+    for (y = 0; y < 16; ++y) {
+      const int rc = i + y;
+      // If all of the AC coeffs in a row has magnitude less than the
+      // quantization step_size/2, quantize to zero.
+      if (nzflag_cnt < 16) {
+        int tmp =
+            clamp(abs_coeff[y] + round_ptr[rc != 0], INT16_MIN, INT16_MAX);
+        tmp = (tmp * quant_ptr[rc != 0]) >> 16;
+        qcoeff_ptr[rc] = (tmp ^ coeff_sign[y]) - coeff_sign[y];
+        dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant_ptr[rc != 0];
+      } else {
+        qcoeff_ptr[rc] = 0;
+        dqcoeff_ptr[rc] = 0;
+      }
+    }
+  }
+
+  // Scan for eob.
+  for (i = 0; i < n_coeffs; i++) {
+    // Use the scan order to find the correct eob.
+    const int rc = scan[i];
+    if (qcoeff_ptr[rc]) {
+      eob = i;
+    }
+  }
+  *eob_ptr = eob + 1;
+}
+
 void GenerateHelperArrays(ACMRandom *rnd, int16_t *zbin, int16_t *round,
                          int16_t *quant, int16_t *quant_shift,
-                          int16_t *dequant) {
+                          int16_t *dequant, int16_t *round_fp,
+                          int16_t *quant_fp) {
+  // Max when q == 0.  Otherwise, it is 48 for Y and 42 for U/V.
+  const int max_qrounding_factor_fp = 64;
+
  for (int j = 0; j < 2; j++) {
+    // The range is 4 to 1828 in the VP9 tables.
+    const int qlookup = rnd->RandRange(1825) + 4;
+    round_fp[j] = (max_qrounding_factor_fp * qlookup) >> 7;
+    quant_fp[j] = (1 << 16) / qlookup;
+
    // Values determined by deconstructing vp9_init_quantizer().
    // zbin may be up to 1143 for 8 and 10 bit Y values, or 1200 for 12 bit Y
    // values or U/V values of any bit depth. This is because y_delta is not
    // factored into the vp9_ac_quant() call.
    zbin[j] = rnd->RandRange(1200);
+
    // round may be up to 685 for Y values or 914 for U/V.
    round[j] = rnd->RandRange(914);
    // quant ranges from 1 to -32703
@@ -141,6 +225,8 @@ void GenerateHelperArrays(ACMRandom *rnd, int16_t *zbin, int16_t *round,
  }
  for (int j = 2; j < 8; j++) {
    zbin[j] = zbin[1];
+    round_fp[j] = round_fp[1];
+    quant_fp[j] = quant_fp[1];
    round[j] = round[1];
    quant[j] = quant[1];
    quant_shift[j] = quant_shift[1];
@@ -179,18 +265,18 @@ TEST_P(VP9QuantizeTest, OperationCheck) {
    const int count = (4 << sz) * (4 << sz);
    coeff.Set(&rnd, -max_value_, max_value_);
    GenerateHelperArrays(&rnd, zbin_ptr_, round_ptr_, quant_ptr_,
-                         quant_shift_ptr_, dequant_ptr_);
+                         quant_shift_ptr_, dequant_ptr_, round_fp_ptr_,
+                         quant_fp_ptr_);
+    int16_t *r_ptr = (is_fp_) ? round_fp_ptr_ : round_ptr_;
+    int16_t *q_ptr = (is_fp_) ? quant_fp_ptr_ : quant_ptr_;
+    ref_quantize_op_(coeff.TopLeftPixel(), count, skip_block, zbin_ptr_, r_ptr,
+                     q_ptr, quant_shift_ptr_, ref_qcoeff.TopLeftPixel(),
+                     ref_dqcoeff.TopLeftPixel(), dequant_ptr_, &ref_eob,
+                     scan_order->scan, scan_order->iscan);

-    ref_quantize_op_(coeff.TopLeftPixel(), count, skip_block, zbin_ptr_,
-                     round_ptr_, quant_ptr_, quant_shift_ptr_,
-                     ref_qcoeff.TopLeftPixel(), ref_dqcoeff.TopLeftPixel(),
-                     dequant_ptr_, &ref_eob, scan_order->scan,
-                     scan_order->iscan);
-
-    ASM_REGISTER_STATE_CHECK(
-        quantize_op_(coeff.TopLeftPixel(), count, skip_block, zbin_ptr_,
-                     round_ptr_, quant_ptr_, quant_shift_ptr_,
-                     qcoeff.TopLeftPixel(), dqcoeff.TopLeftPixel(),
+    ASM_REGISTER_STATE_CHECK(quantize_op_(
+        coeff.TopLeftPixel(), count, skip_block, zbin_ptr_, r_ptr, q_ptr,
+        quant_shift_ptr_, qcoeff.TopLeftPixel(), dqcoeff.TopLeftPixel(),
        dequant_ptr_, &eob, scan_order->scan, scan_order->iscan));

    EXPECT_TRUE(qcoeff.CheckValues(ref_qcoeff));
@@ -241,18 +327,18 @@ TEST_P(VP9QuantizeTest, EOBCheck) {
    coeff.TopLeftPixel()[rnd(count)] =
        static_cast<int>(rnd.RandRange(max_value_ * 2)) - max_value_;
    GenerateHelperArrays(&rnd, zbin_ptr_, round_ptr_, quant_ptr_,
-                         quant_shift_ptr_, dequant_ptr_);
+                         quant_shift_ptr_, dequant_ptr_, round_fp_ptr_,
+                         quant_fp_ptr_);
+    int16_t *r_ptr = (is_fp_) ? round_fp_ptr_ : round_ptr_;
+    int16_t *q_ptr = (is_fp_) ? quant_fp_ptr_ : quant_ptr_;
+    ref_quantize_op_(coeff.TopLeftPixel(), count, skip_block, zbin_ptr_, r_ptr,
+                     q_ptr, quant_shift_ptr_, ref_qcoeff.TopLeftPixel(),
+                     ref_dqcoeff.TopLeftPixel(), dequant_ptr_, &ref_eob,
+                     scan_order->scan, scan_order->iscan);

-    ref_quantize_op_(coeff.TopLeftPixel(), count, skip_block, zbin_ptr_,
-                     round_ptr_, quant_ptr_, quant_shift_ptr_,
-                     ref_qcoeff.TopLeftPixel(), ref_dqcoeff.TopLeftPixel(),
-                     dequant_ptr_, &ref_eob, scan_order->scan,
-                     scan_order->iscan);
-
-    ASM_REGISTER_STATE_CHECK(
-        quantize_op_(coeff.TopLeftPixel(), count, skip_block, zbin_ptr_,
-                     round_ptr_, quant_ptr_, quant_shift_ptr_,
-                     qcoeff.TopLeftPixel(), dqcoeff.TopLeftPixel(),
+    ASM_REGISTER_STATE_CHECK(quantize_op_(
+        coeff.TopLeftPixel(), count, skip_block, zbin_ptr_, r_ptr, q_ptr,
+        quant_shift_ptr_, qcoeff.TopLeftPixel(), dqcoeff.TopLeftPixel(),
        dequant_ptr_, &eob, scan_order->scan, scan_order->iscan));

    EXPECT_TRUE(qcoeff.CheckValues(ref_qcoeff));
@@ -299,7 +385,10 @@ TEST_P(VP9QuantizeTest, DISABLED_Speed) {
      const int count = (4 << sz) * (4 << sz);

      GenerateHelperArrays(&rnd, zbin_ptr_, round_ptr_, quant_ptr_,
-                           quant_shift_ptr_, dequant_ptr_);
+                           quant_shift_ptr_, dequant_ptr_, round_fp_ptr_,
+                           quant_fp_ptr_);
+      int16_t *r_ptr = (is_fp_) ? round_fp_ptr_ : round_ptr_;
+      int16_t *q_ptr = (is_fp_) ? quant_fp_ptr_ : quant_ptr_;

      if (i == 0) {
        // When |coeff values| are less than zbin the results are 0.
@@ -319,10 +408,10 @@ TEST_P(VP9QuantizeTest, DISABLED_Speed) {
      vpx_usec_timer timer;
      vpx_usec_timer_start(&timer);
      for (int j = 0; j < 100000000 / count; ++j) {
-        quantize_op_(coeff.TopLeftPixel(), count, skip_block, zbin_ptr_,
-                     round_ptr_, quant_ptr_, quant_shift_ptr_,
-                     qcoeff.TopLeftPixel(), dqcoeff.TopLeftPixel(),
-                     dequant_ptr_, &eob, scan_order->scan, scan_order->iscan);
+        quantize_op_(coeff.TopLeftPixel(), count, skip_block, zbin_ptr_, r_ptr,
+                     q_ptr, quant_shift_ptr_, qcoeff.TopLeftPixel(),
+                     dqcoeff.TopLeftPixel(), dequant_ptr_, &eob,
+                     scan_order->scan, scan_order->iscan);
      }
      vpx_usec_timer_mark(&timer);
      const int elapsed_time = static_cast<int>(vpx_usec_timer_elapsed(&timer));
@@ -345,50 +434,55 @@ INSTANTIATE_TEST_CASE_P(
    SSE2, VP9QuantizeTest,
    ::testing::Values(
        make_tuple(&vpx_highbd_quantize_b_sse2, &vpx_highbd_quantize_b_c,
-                   VPX_BITS_8, 16),
+                   VPX_BITS_8, 16, false),
        make_tuple(&vpx_highbd_quantize_b_sse2, &vpx_highbd_quantize_b_c,
-                   VPX_BITS_10, 16),
+                   VPX_BITS_10, 16, false),
        make_tuple(&vpx_highbd_quantize_b_sse2, &vpx_highbd_quantize_b_c,
-                   VPX_BITS_12, 16),
+                   VPX_BITS_12, 16, false),
        make_tuple(&vpx_highbd_quantize_b_32x32_sse2,
-                   &vpx_highbd_quantize_b_32x32_c, VPX_BITS_8, 32),
+                   &vpx_highbd_quantize_b_32x32_c, VPX_BITS_8, 32, false),
        make_tuple(&vpx_highbd_quantize_b_32x32_sse2,
-                   &vpx_highbd_quantize_b_32x32_c, VPX_BITS_10, 32),
+                   &vpx_highbd_quantize_b_32x32_c, VPX_BITS_10, 32, false),
        make_tuple(&vpx_highbd_quantize_b_32x32_sse2,
-                   &vpx_highbd_quantize_b_32x32_c, VPX_BITS_12, 32)));
+                   &vpx_highbd_quantize_b_32x32_c, VPX_BITS_12, 32, false)));

 #else
-INSTANTIATE_TEST_CASE_P(SSE2, VP9QuantizeTest,
-                        ::testing::Values(make_tuple(&vpx_quantize_b_sse2,
-                                                     &vpx_quantize_b_c,
-                                                     VPX_BITS_8, 16)));
-#endif  // CONFIG_VP9_HIGHBITDEPTH
-
 INSTANTIATE_TEST_CASE_P(
-    DISABLED_SSE2, VP9QuantizeTest,
-    ::testing::Values(make_tuple(&QuantFPWrapper<vp9_quantize_fp_sse2>,
-                                 &QuantFPWrapper<vp9_quantize_fp_c>, VPX_BITS_8,
-                                 16)));
+    SSE2, VP9QuantizeTest,
+    ::testing::Values(make_tuple(&vpx_quantize_b_sse2, &vpx_quantize_b_c,
+                                 VPX_BITS_8, 16, false),
+                      make_tuple(&QuantFPWrapper<vp9_quantize_fp_sse2>,
+                                 &QuantFPWrapper<quantize_fp_nz_c>, VPX_BITS_8,
+                                 16, true)));
+#endif  // CONFIG_VP9_HIGHBITDEPTH
 #endif  // HAVE_SSE2

 #if HAVE_SSSE3 && !CONFIG_VP9_HIGHBITDEPTH
+#if ARCH_X86_64
+INSTANTIATE_TEST_CASE_P(
+    SSSE3, VP9QuantizeTest,
+    ::testing::Values(make_tuple(&vpx_quantize_b_ssse3, &vpx_quantize_b_c,
+                                 VPX_BITS_8, 16, false),
+                      make_tuple(&QuantFPWrapper<vp9_quantize_fp_ssse3>,
+                                 &QuantFPWrapper<quantize_fp_nz_c>, VPX_BITS_8,
+                                 16, true)));
+#else
 INSTANTIATE_TEST_CASE_P(SSSE3, VP9QuantizeTest,
                        ::testing::Values(make_tuple(&vpx_quantize_b_ssse3,
                                                     &vpx_quantize_b_c,
-                                                     VPX_BITS_8, 16)));
+                                                     VPX_BITS_8, 16, false)));
+#endif

 #if ARCH_X86_64
 // TODO(johannkoenig): SSSE3 optimizations do not yet pass this test.
 INSTANTIATE_TEST_CASE_P(
    DISABLED_SSSE3, VP9QuantizeTest,
    ::testing::Values(make_tuple(&vpx_quantize_b_32x32_ssse3,
-                                 &vpx_quantize_b_32x32_c, VPX_BITS_8, 32),
-                      make_tuple(&QuantFPWrapper<vp9_quantize_fp_ssse3>,
-                                 &QuantFPWrapper<vp9_quantize_fp_c>, VPX_BITS_8,
-                                 16),
+                                 &vpx_quantize_b_32x32_c, VPX_BITS_8, 32,
+                                 false),
                      make_tuple(&QuantFPWrapper<vp9_quantize_fp_32x32_ssse3>,
                                 &QuantFPWrapper<vp9_quantize_fp_32x32_c>,
-                                 VPX_BITS_8, 32)));
+                                 VPX_BITS_8, 32, true)));
 #endif  // ARCH_X86_64
 #endif  // HAVE_SSSE3 && !CONFIG_VP9_HIGHBITDEPTH

@@ -398,36 +492,43 @@ INSTANTIATE_TEST_CASE_P(
 INSTANTIATE_TEST_CASE_P(
    AVX, VP9QuantizeTest,
    ::testing::Values(make_tuple(&vpx_quantize_b_avx, &vpx_quantize_b_c,
-                                 VPX_BITS_8, 16),
+                                 VPX_BITS_8, 16, false),
                      // Even though SSSE3 and AVX do not match the reference
                      // code, we can keep them in sync with each other.
                      make_tuple(&vpx_quantize_b_32x32_avx,
-                                 &vpx_quantize_b_32x32_ssse3, VPX_BITS_8, 32)));
+                                 &vpx_quantize_b_32x32_ssse3, VPX_BITS_8, 32,
+                                 false)));
 #endif  // HAVE_AVX && !CONFIG_VP9_HIGHBITDEPTH

 // TODO(webm:1448): dqcoeff is not handled correctly in HBD builds.
 #if HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH
 INSTANTIATE_TEST_CASE_P(
    NEON, VP9QuantizeTest,
-    ::testing::Values(
-        make_tuple(&vpx_quantize_b_neon, &vpx_quantize_b_c, VPX_BITS_8, 16),
-        make_tuple(&vpx_quantize_b_32x32_neon, &vpx_quantize_b_32x32_c,
-                   VPX_BITS_8, 32),
+    ::testing::Values(make_tuple(&vpx_quantize_b_neon, &vpx_quantize_b_c,
+                                 VPX_BITS_8, 16, false),
+                      make_tuple(&vpx_quantize_b_32x32_neon,
+                                 &vpx_quantize_b_32x32_c, VPX_BITS_8, 32,
+                                 false),
                      make_tuple(&QuantFPWrapper<vp9_quantize_fp_neon>,
-                   &QuantFPWrapper<vp9_quantize_fp_c>, VPX_BITS_8, 16),
+                                 &QuantFPWrapper<vp9_quantize_fp_c>, VPX_BITS_8,
+                                 16, true),
                      make_tuple(&QuantFPWrapper<vp9_quantize_fp_32x32_neon>,
-                   &QuantFPWrapper<vp9_quantize_fp_32x32_c>, VPX_BITS_8, 32)));
+                                 &QuantFPWrapper<vp9_quantize_fp_32x32_c>,
+                                 VPX_BITS_8, 32, true)));
 #endif  // HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH

 // Only useful to compare "Speed" test results.
 INSTANTIATE_TEST_CASE_P(
    DISABLED_C, VP9QuantizeTest,
    ::testing::Values(
-        make_tuple(&vpx_quantize_b_c, &vpx_quantize_b_c, VPX_BITS_8, 16),
+        make_tuple(&vpx_quantize_b_c, &vpx_quantize_b_c, VPX_BITS_8, 16, false),
        make_tuple(&vpx_quantize_b_32x32_c, &vpx_quantize_b_32x32_c, VPX_BITS_8,
-                   32),
+                   32, false),
        make_tuple(&QuantFPWrapper<vp9_quantize_fp_c>,
-                   &QuantFPWrapper<vp9_quantize_fp_c>, VPX_BITS_8, 16),
+                   &QuantFPWrapper<vp9_quantize_fp_c>, VPX_BITS_8, 16, true),
+        make_tuple(&QuantFPWrapper<quantize_fp_nz_c>,
+                   &QuantFPWrapper<quantize_fp_nz_c>, VPX_BITS_8, 16, true),
        make_tuple(&QuantFPWrapper<vp9_quantize_fp_32x32_c>,
-                   &QuantFPWrapper<vp9_quantize_fp_32x32_c>, VPX_BITS_8, 32)));
+                   &QuantFPWrapper<vp9_quantize_fp_32x32_c>, VPX_BITS_8, 32,
+                   true)));
 }  // namespace
--- a/tools/all_builds.py
+++ b/tools/all_builds.py
@@ -1,72 +0,0 @@
-#!/usr/bin/python
-
-import getopt
-import subprocess
-import sys
-
-LONG_OPTIONS = ["shard=", "shards="]
-BASE_COMMAND = "./configure --enable-internal-stats --enable-experimental"
-
-def RunCommand(command):
-  run = subprocess.Popen(command, shell=True)
-  output = run.communicate()
-  if run.returncode:
-    print "Non-zero return code: " + str(run.returncode) + " => exiting!"
-    sys.exit(1)
-
-def list_of_experiments():
-  experiments = []
-  configure_file = open("configure")
-  list_start = False
-  for line in configure_file.read().split("\n"):
-    if line == 'EXPERIMENT_LIST="':
-      list_start = True
-    elif line == '"':
-      list_start = False
-    elif list_start:
-      currently_broken = ["csm"]
-      experiment = line[4:]
-      if experiment not in currently_broken:
-        experiments.append(experiment)
-  return experiments
-
-def main(argv):
-  # Parse arguments
-  options = {"--shard": 0, "--shards": 1}
-  if "--" in argv:
-    opt_end_index = argv.index("--")
-  else:
-    opt_end_index = len(argv)
-  try:
-    o, _ = getopt.getopt(argv[1:opt_end_index], None, LONG_OPTIONS)
-  except getopt.GetoptError, err:
-    print str(err)
-    print "Usage: %s [--shard=<n> --shards=<n>] -- [configure flag ...]"%argv[0]
-    sys.exit(2)
-
-  options.update(o)
-  extra_args = argv[opt_end_index + 1:]
-
-  # Shard experiment list
-  shard = int(options["--shard"])
-  shards = int(options["--shards"])
-  experiments = list_of_experiments()
-  base_command = " ".join([BASE_COMMAND] + extra_args)
-  configs = [base_command]
-  configs += ["%s --enable-%s" % (base_command, e) for e in experiments]
-  my_configs = zip(configs, range(len(configs)))
-  my_configs = filter(lambda x: x[1] % shards == shard, my_configs)
-  my_configs = [e[0] for e in my_configs]
-
-  # Run configs for this shard
-  for config in my_configs:
-    test_build(config)
-
-def test_build(configure_command):
-  print "\033[34m\033[47mTesting %s\033[0m" % (configure_command)
-  RunCommand(configure_command)
-  RunCommand("make clean")
-  RunCommand("make")
-
-if __name__ == "__main__":
-  main(sys.argv)
--- a/tools/author_first_release.sh
+++ b/tools/author_first_release.sh
@@ -1,15 +0,0 @@
-#!/bin/bash
-##
-## List the release each author first contributed to.
-##
-## Usage: author_first_release.sh [TAGS]
-##
-## If the TAGS arguments are unspecified, all tags reported by `git tag`
-## will be considered.
-##
-tags=${@:-$(git tag)}
-for tag in $tags; do
-  git shortlog -n -e -s $tag |
-      cut -f2- |
-      awk "{print \"${tag#v}\t\"\$0}"
-done | sort -k2  | uniq -f2
--- a/tools/ftfy.sh
+++ b/tools/ftfy.sh
@@ -1,158 +0,0 @@
-#!/bin/sh
-self="$0"
-dirname_self=$(dirname "$self")
-
-usage() {
-  cat <<EOF >&2
-Usage: $self [option]
-
-This script applies a whitespace transformation to the commit at HEAD. If no
-options are given, then the modified files are left in the working tree.
-
-Options:
-  -h, --help     Shows this message
-  -n, --dry-run  Shows a diff of the changes to be made.
-  --amend        Squashes the changes into the commit at HEAD
-                     This option will also reformat the commit message.
-  --commit       Creates a new commit containing only the whitespace changes
-  --msg-only     Reformat the commit message only, ignore the patch itself.
-
-EOF
-  rm -f ${CLEAN_FILES}
-  exit 1
-}
-
-
-log() {
-  echo "${self##*/}: $@" >&2
-}
-
-
-vpx_style() {
-  for f; do
-    case "$f" in
-      *.h|*.c|*.cc)
-        clang-format -i --style=file "$f"
-        ;;
-    esac
-  done
-}
-
-
-apply() {
-  [ $INTERSECT_RESULT -ne 0 ] && patch -p1 < "$1"
-}
-
-
-commit() {
-  LAST_CHANGEID=$(git show | awk '/Change-Id:/{print $2}')
-  if [ -z "$LAST_CHANGEID" ]; then
-    log "HEAD doesn't have a Change-Id, unable to generate a new commit"
-    exit 1
-  fi
-
-  # Build a deterministic Change-Id from the parent's
-  NEW_CHANGEID=${LAST_CHANGEID}-styled
-  NEW_CHANGEID=I$(echo $NEW_CHANGEID | git hash-object --stdin)
-
-  # Commit, preserving authorship from the parent commit.
-  git commit -a -C HEAD > /dev/null
-  git commit --amend -F- << EOF
-Cosmetic: Fix whitespace in change ${LAST_CHANGEID:0:9}
-
-Change-Id: ${NEW_CHANGEID}
-EOF
-}
-
-
-show_commit_msg_diff() {
-  if [ $DIFF_MSG_RESULT -ne 0 ]; then
-    log "Modified commit message:"
-    diff -u "$ORIG_COMMIT_MSG" "$NEW_COMMIT_MSG" | tail -n +3
-  fi
-}
-
-
-amend() {
-  show_commit_msg_diff
-  if [ $DIFF_MSG_RESULT -ne 0 ] || [ $INTERSECT_RESULT -ne 0 ]; then
-    git commit -a --amend -F "$NEW_COMMIT_MSG"
-  fi
-}
-
-
-diff_msg() {
-  git log -1 --format=%B > "$ORIG_COMMIT_MSG"
-  "${dirname_self}"/wrap-commit-msg.py \
-      < "$ORIG_COMMIT_MSG" > "$NEW_COMMIT_MSG"
-  cmp -s "$ORIG_COMMIT_MSG" "$NEW_COMMIT_MSG"
-  DIFF_MSG_RESULT=$?
-}
-
-
-# Temporary files
-ORIG_DIFF=orig.diff.$$
-MODIFIED_DIFF=modified.diff.$$
-FINAL_DIFF=final.diff.$$
-ORIG_COMMIT_MSG=orig.commit-msg.$$
-NEW_COMMIT_MSG=new.commit-msg.$$
-CLEAN_FILES="${ORIG_DIFF} ${MODIFIED_DIFF} ${FINAL_DIFF}"
-CLEAN_FILES="${CLEAN_FILES} ${ORIG_COMMIT_MSG} ${NEW_COMMIT_MSG}"
-
-# Preconditions
-[ $# -lt 2 ] || usage
-
-if ! clang-format -version >/dev/null 2>&1; then
-  log "clang-format not found"
-  exit 1
-fi
-
-if ! git diff --quiet HEAD; then
-  log "Working tree is dirty, commit your changes first"
-  exit 1
-fi
-
-# Need to be in the root
-cd "$(git rev-parse --show-toplevel)"
-
-# Collect the original diff
-git show > "${ORIG_DIFF}"
-
-# Apply the style guide on new and modified files and collect its diff
-for f in $(git diff HEAD^ --name-only -M90 --diff-filter=AM); do
-  case "$f" in
-    third_party/*) continue;;
-  esac
-  vpx_style "$f"
-done
-git diff --no-color --no-ext-diff > "${MODIFIED_DIFF}"
-
-# Intersect the two diffs
-"${dirname_self}"/intersect-diffs.py \
-    "${ORIG_DIFF}" "${MODIFIED_DIFF}" > "${FINAL_DIFF}"
-INTERSECT_RESULT=$?
-git reset --hard >/dev/null
-
-# Fixup the commit message
-diff_msg
-
-# Handle options
-if [ -n "$1" ]; then
-  case "$1" in
-    -h|--help) usage;;
-    -n|--dry-run) cat "${FINAL_DIFF}"; show_commit_msg_diff;;
-    --commit) apply "${FINAL_DIFF}"; commit;;
-    --amend) apply "${FINAL_DIFF}"; amend;;
-    --msg-only) amend;;
-    *) usage;;
-  esac
-else
-  apply "${FINAL_DIFF}"
-  if ! git diff --quiet; then
-    log "Formatting changes applied, verify and commit."
-    log "See also: http://www.webmproject.org/code/contribute/conventions/"
-    git diff --stat
-  fi
-fi
-
-rm -f ${CLEAN_FILES}
--- a/vp8/common/mips/mmi/loopfilter_filters_mmi.c
+++ b/vp8/common/mips/mmi/loopfilter_filters_mmi.c
@@ -461,96 +461,87 @@ void vp8_loop_filter_vertical_edge_mmi(unsigned char *src_ptr,
  );
 }

+/* clang-format off */
 #define VP8_MBLOOP_HPSRAB                                               \
-  "xor        %[ftmp3],   %[ftmp3],           %[ftmp3]            \n\t" \
-  "xor        %[ftmp8],   %[ftmp8],           %[ftmp8]            \n\t" \
-  "punpcklbh  %[ftmp3],   %[ftmp3],           %[ftmp0]            \n\t" \
-  "punpckhbh  %[ftmp8],   %[ftmp8],           %[ftmp0]            \n\t" \
-  "psrah      %[ftmp3],   %[ftmp3],           %[ftmp9]            \n\t" \
-  "psrah      %[ftmp8],   %[ftmp8],           %[ftmp9]            \n\t" \
-  "packsshb   %[ftmp0],   %[ftmp3],           %[ftmp8]            \n\t"
-
-#define VP8_MBLOOP_HPSRAB_PMULHH(reg1, reg2) \
-  "pmulhh   " #reg1 ",  " #reg1 ",  " #reg2 "                     \n\t"
+  "punpcklbh  %[ftmp10],  %[ftmp10],          %[ftmp0]            \n\t" \
+  "punpckhbh  %[ftmp11],  %[ftmp11],          %[ftmp0]            \n\t" \
+  "psrah      %[ftmp10],  %[ftmp10],          %[ftmp9]            \n\t" \
+  "psrah      %[ftmp11],  %[ftmp11],          %[ftmp9]            \n\t" \
+  "packsshb   %[ftmp0],   %[ftmp10],          %[ftmp11]            \n\t"

 #define VP8_MBLOOP_HPSRAB_ADD(reg)                                      \
-  "xor        %[ftmp3],   %[ftmp3],           %[ftmp3]            \n\t" \
-  "xor        %[ftmp8],   %[ftmp8],           %[ftmp8]            \n\t" \
-  "punpcklbh  %[ftmp3],   %[ftmp3],           %[ftmp2]            \n\t" \
-  "punpckhbh  %[ftmp8],   %[ftmp8],           %[ftmp2]            \n\t" \
-  VP8_MBLOOP_HPSRAB_PMULHH(%[ftmp3], reg)                               \
-  VP8_MBLOOP_HPSRAB_PMULHH(%[ftmp8], reg)                               \
-  "paddh      %[ftmp3],   %[ftmp3],           %[ff_ph_003f]       \n\t" \
-  "paddh      %[ftmp8],   %[ftmp8],           %[ff_ph_003f]       \n\t" \
-  "psrah      %[ftmp3],   %[ftmp3],           %[ftmp9]            \n\t" \
-  "psrah      %[ftmp8],   %[ftmp8],           %[ftmp9]            \n\t" \
-  "packsshb   %[ftmp3],   %[ftmp3],           %[ftmp8]            \n\t"
+  "punpcklbh  %[ftmp1],   %[ftmp0],           %[ftmp12]           \n\t" \
+  "punpckhbh  %[ftmp2],   %[ftmp0],           %[ftmp12]           \n\t" \
+  "pmulhh     %[ftmp1],   %[ftmp1],         " #reg "              \n\t" \
+  "pmulhh     %[ftmp2],   %[ftmp2],         " #reg "              \n\t" \
+  "paddh      %[ftmp1],   %[ftmp1],           %[ff_ph_003f]       \n\t" \
+  "paddh      %[ftmp2],   %[ftmp2],           %[ff_ph_003f]       \n\t" \
+  "psrah      %[ftmp1],   %[ftmp1],           %[ftmp9]            \n\t" \
+  "psrah      %[ftmp2],   %[ftmp2],           %[ftmp9]            \n\t" \
+  "packsshb   %[ftmp1],   %[ftmp1],           %[ftmp2]            \n\t"
+/* clang-format on */

 void vp8_mbloop_filter_horizontal_edge_mmi(
    unsigned char *src_ptr, int src_pixel_step, const unsigned char *blimit,
    const unsigned char *limit, const unsigned char *thresh, int count) {
  uint32_t tmp[1];
-  mips_reg addr[2];
-  DECLARE_ALIGNED(8, const uint64_t, srct[1]);
-  double ftmp[10];
+  double ftmp[13];

  __asm__ volatile (
+    MMI_SLL(%[tmp0], %[src_pixel_step], 0x02)
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[tmp0])
    "1:                                                             \n\t"
    "gsldlc1    %[ftmp9],   0x07(%[limit])                          \n\t"
    "gsldrc1    %[ftmp9],   0x00(%[limit])                          \n\t"
+    /* ftmp1: p3 */
+    "gsldlc1    %[ftmp1],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp1],   0x00(%[src_ptr])                        \n\t"
+    /* ftmp3: p2 */
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp3],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp3],   0x00(%[src_ptr])                        \n\t"
+    /* ftmp4: p1 */
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp4],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp4],   0x00(%[src_ptr])                        \n\t"
+    /* ftmp5: p0 */
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp5],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp5],   0x00(%[src_ptr])                        \n\t"
+    /* ftmp6: q0 */
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp6],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp6],   0x00(%[src_ptr])                        \n\t"
+    /* ftmp7: q1 */
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp7],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp7],   0x00(%[src_ptr])                        \n\t"
+    /* ftmp8: q2 */
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp8],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp8],   0x00(%[src_ptr])                        \n\t"
+    /* ftmp2: q3 */
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp2],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp2],   0x00(%[src_ptr])                        \n\t"

-    MMI_ADDU(%[addr0], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp12],  0x07(%[blimit])                         \n\t"
+    "gsldrc1    %[ftmp12],  0x00(%[blimit])                         \n\t"

-    MMI_SLL(%[tmp0], %[src_pixel_step], 0x02)
-    MMI_SUBU(%[addr1], %[src_ptr], %[tmp0])
-    "gsldlc1    %[ftmp1],   0x07(%[addr1])                          \n\t"
-    "gsldrc1    %[ftmp1],   0x00(%[addr1])                          \n\t"
-    MMI_SUBU(%[addr1], %[addr0], %[tmp0])
-    "gsldlc1    %[ftmp3],   0x07(%[addr1])                          \n\t"
-    "gsldrc1    %[ftmp3],   0x00(%[addr1])                          \n\t"
    "pasubub    %[ftmp0],   %[ftmp1],           %[ftmp3]            \n\t"
    "psubusb    %[ftmp0],   %[ftmp0],           %[ftmp9]            \n\t"
-
-    /* ftmp4:p1 */
-    MMI_SLL(%[tmp0], %[src_pixel_step], 0x01)
-    MMI_SUBU(%[addr1], %[src_ptr], %[tmp0])
-    "gsldlc1    %[ftmp4],   0x07(%[addr1])                          \n\t"
-    "gsldrc1    %[ftmp4],   0x00(%[addr1])                          \n\t"
    "pasubub    %[ftmp1],   %[ftmp3],           %[ftmp4]            \n\t"
    "psubusb    %[ftmp1],   %[ftmp1],           %[ftmp9]            \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp1]            \n\t"
-
-    /* ftmp5:p0 */
-    MMI_SUBU(%[addr1], %[src_ptr], %[src_pixel_step])
-    "gsldlc1    %[ftmp5],   0x07(%[addr1])                          \n\t"
-    "gsldrc1    %[ftmp5],   0x00(%[addr1])                          \n\t"
-    "pasubub    %[ftmp1],   %[ftmp4],           %[ftmp5]            \n\t"
-    "sdc1       %[ftmp1],   0x00(%[srct])                           \n\t"
-    "psubusb    %[ftmp1],   %[ftmp1],           %[ftmp9]            \n\t"
+    "pasubub    %[ftmp10],  %[ftmp4],           %[ftmp5]            \n\t"
+    "psubusb    %[ftmp1],   %[ftmp10],          %[ftmp9]            \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp1]            \n\t"
-
-    /* ftmp6:q0 */
-    "gsldlc1    %[ftmp6],   0x07(%[src_ptr])                        \n\t"
-    "gsldrc1    %[ftmp6],   0x00(%[src_ptr])                        \n\t"
-
-    /* ftmp7:q1 */
-    "gsldlc1    %[ftmp7],   0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp7],   0x00(%[addr0])                          \n\t"
-    "pasubub    %[ftmp1],   %[ftmp7],           %[ftmp6]            \n\t"
-    "sdc1       %[ftmp1],   0x08(%[srct])                           \n\t"
-    "psubusb    %[ftmp1],   %[ftmp1],           %[ftmp9]            \n\t"
+    "pasubub    %[ftmp11],  %[ftmp7],           %[ftmp6]            \n\t"
+    "psubusb    %[ftmp1],   %[ftmp11],          %[ftmp9]            \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp1]            \n\t"
-
-    MMI_ADDU(%[addr1], %[src_ptr], %[tmp0])
-    "gsldlc1    %[ftmp8],   0x07(%[addr1])                          \n\t"
-    "gsldrc1    %[ftmp8],   0x00(%[addr1])                          \n\t"
    "pasubub    %[ftmp1],   %[ftmp8],           %[ftmp7]            \n\t"
    "psubusb    %[ftmp1],   %[ftmp1],           %[ftmp9]            \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp1]            \n\t"
-
-    MMI_ADDU(%[addr1], %[addr0], %[tmp0])
-    "gsldlc1    %[ftmp2],   0x07(%[addr1])                          \n\t"
-    "gsldrc1    %[ftmp2],   0x00(%[addr1])                          \n\t"
    "pasubub    %[ftmp1],   %[ftmp2],           %[ftmp8]            \n\t"
    "psubusb    %[ftmp1],   %[ftmp1],           %[ftmp9]            \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp1]            \n\t"
@@ -563,9 +554,7 @@ void vp8_mbloop_filter_horizontal_edge_mmi(
    "mtc1       %[tmp0],    %[ftmp9]                                \n\t"
    "psrlh      %[ftmp2],   %[ftmp2],           %[ftmp9]            \n\t"
    "paddusb    %[ftmp1],   %[ftmp1],           %[ftmp2]            \n\t"
-    "gsldlc1    %[ftmp9],   0x07(%[blimit])                         \n\t"
-    "gsldrc1    %[ftmp9],   0x00(%[blimit])                         \n\t"
-    "psubusb    %[ftmp1],   %[ftmp1],           %[ftmp9]            \n\t"
+    "psubusb    %[ftmp1],   %[ftmp1],           %[ftmp12]           \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp1]            \n\t"
    "xor        %[ftmp9],   %[ftmp9],           %[ftmp9]            \n\t"
    /* ftmp0: mask */
@@ -573,29 +562,26 @@ void vp8_mbloop_filter_horizontal_edge_mmi(

    "gsldlc1    %[ftmp9],   0x07(%[thresh])                         \n\t"
    "gsldrc1    %[ftmp9],   0x00(%[thresh])                         \n\t"
-    "ldc1       %[ftmp1],   0x00(%[srct])                           \n\t"
-    "psubusb    %[ftmp1],   %[ftmp1],           %[ftmp9]            \n\t"
-    "ldc1       %[ftmp2],   0x08(%[srct])                           \n\t"
-    "psubusb    %[ftmp2],   %[ftmp2],           %[ftmp9]            \n\t"
+    "psubusb    %[ftmp1],   %[ftmp10],          %[ftmp9]            \n\t"
+    "psubusb    %[ftmp2],   %[ftmp11],          %[ftmp9]            \n\t"
    "paddb      %[ftmp1],   %[ftmp1],           %[ftmp2]            \n\t"
    "xor        %[ftmp2],   %[ftmp2],           %[ftmp2]            \n\t"
    "pcmpeqb    %[ftmp1],   %[ftmp1],           %[ftmp2]            \n\t"
    "pcmpeqb    %[ftmp2],   %[ftmp2],           %[ftmp2]            \n\t"
-    /* ftmp1:hev*/
+    /* ftmp1: hev */
    "xor        %[ftmp1],   %[ftmp1],           %[ftmp2]            \n\t"

    "xor        %[ftmp4],   %[ftmp4],           %[ff_pb_80]         \n\t"
    "xor        %[ftmp5],   %[ftmp5],           %[ff_pb_80]         \n\t"
    "xor        %[ftmp6],   %[ftmp6],           %[ff_pb_80]         \n\t"
    "xor        %[ftmp7],   %[ftmp7],           %[ff_pb_80]         \n\t"
-
    "psubsb     %[ftmp2],   %[ftmp4],           %[ftmp7]            \n\t"
    "psubsb     %[ftmp9],   %[ftmp6],           %[ftmp5]            \n\t"
    "paddsb     %[ftmp2],   %[ftmp2],           %[ftmp9]            \n\t"
    "paddsb     %[ftmp2],   %[ftmp2],           %[ftmp9]            \n\t"
    "paddsb     %[ftmp2],   %[ftmp2],           %[ftmp9]            \n\t"
    "and        %[ftmp2],   %[ftmp2],           %[ftmp0]            \n\t"
-    "sdc1       %[ftmp2],   0x00(%[srct])                           \n\t"
+    "pandn      %[ftmp12],  %[ftmp1],           %[ftmp2]            \n\t"
    "and        %[ftmp2],   %[ftmp2],           %[ftmp1]            \n\t"

    "li         %[tmp0],    0x0b                                    \n\t"
@@ -606,70 +592,66 @@ void vp8_mbloop_filter_horizontal_edge_mmi(
    "paddsb     %[ftmp0],   %[ftmp2],           %[ff_pb_04]         \n\t"
    VP8_MBLOOP_HPSRAB
    "psubsb     %[ftmp6],   %[ftmp6],           %[ftmp0]            \n\t"
-    "ldc1       %[ftmp2],   0x00(%[srct])                           \n\t"
-    "pandn      %[ftmp2],   %[ftmp1],           %[ftmp2]            \n\t"

    "li         %[tmp0],    0x07                                    \n\t"
    "mtc1       %[tmp0],    %[ftmp9]                                \n\t"
+    "xor        %[ftmp0],   %[ftmp0],           %[ftmp0]            \n\t"
+
    VP8_MBLOOP_HPSRAB_ADD(%[ff_ph_1b00])
-    "psubsb     %[ftmp6],   %[ftmp6],           %[ftmp3]            \n\t"
-    "paddsb     %[ftmp5],   %[ftmp5],           %[ftmp3]            \n\t"
+    "psubsb     %[ftmp6],   %[ftmp6],           %[ftmp1]            \n\t"
+    "paddsb     %[ftmp5],   %[ftmp5],           %[ftmp1]            \n\t"
    "xor        %[ftmp6],   %[ftmp6],           %[ff_pb_80]         \n\t"
    "xor        %[ftmp5],   %[ftmp5],           %[ff_pb_80]         \n\t"
-
-    MMI_SUBU(%[addr1], %[src_ptr], %[src_pixel_step])
-    "gssdlc1    %[ftmp5],   0x07(%[addr1])                          \n\t"
-    "gssdrc1    %[ftmp5],   0x00(%[addr1])                          \n\t"
+    MMI_SLL(%[tmp0], %[src_pixel_step], 0x02)
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[tmp0])
+    "gssdlc1    %[ftmp5],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp5],   0x00(%[src_ptr])                        \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
    "gssdlc1    %[ftmp6],   0x07(%[src_ptr])                        \n\t"
    "gssdrc1    %[ftmp6],   0x00(%[src_ptr])                        \n\t"
+
    VP8_MBLOOP_HPSRAB_ADD(%[ff_ph_1200])
-    "paddsb     %[ftmp4],   %[ftmp4],           %[ftmp3]            \n\t"
-    "psubsb     %[ftmp7],   %[ftmp7],           %[ftmp3]            \n\t"
+    "paddsb     %[ftmp4],   %[ftmp4],           %[ftmp1]            \n\t"
+    "psubsb     %[ftmp7],   %[ftmp7],           %[ftmp1]            \n\t"
    "xor        %[ftmp4],   %[ftmp4],           %[ff_pb_80]         \n\t"
    "xor        %[ftmp7],   %[ftmp7],           %[ff_pb_80]         \n\t"
-
-    "gssdlc1    %[ftmp7],   0x07(%[addr0])                          \n\t"
-    "gssdrc1    %[ftmp7],   0x00(%[addr0])                          \n\t"
-    MMI_SLL(%[tmp0], %[src_pixel_step], 0x01)
-    MMI_SUBU(%[addr1], %[src_ptr], %[tmp0])
-    "gssdlc1    %[ftmp4],   0x07(%[addr1])                          \n\t"
-    "gssdrc1    %[ftmp4],   0x00(%[addr1])                          \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp7],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp7],   0x00(%[src_ptr])                        \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[tmp0])
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp4],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp4],   0x00(%[src_ptr])                        \n\t"

    VP8_MBLOOP_HPSRAB_ADD(%[ff_ph_0900])
-    MMI_SLL(%[tmp0], %[src_pixel_step], 0x02)
-    MMI_SUBU(%[addr1], %[addr0], %[tmp0])
-    "gsldlc1    %[ftmp4],   0x07(%[addr1])                          \n\t"
-    "gsldrc1    %[ftmp4],   0x00(%[addr1])                          \n\t"
-    MMI_ADDU(%[addr1], %[addr0], %[src_pixel_step])
-    "gsldlc1    %[ftmp7],   0x07(%[addr1])                          \n\t"
-    "gsldrc1    %[ftmp7],   0x00(%[addr1])                          \n\t"
+    "xor        %[ftmp3],   %[ftmp3],           %[ff_pb_80]         \n\t"
+    "xor        %[ftmp8],   %[ftmp8],           %[ff_pb_80]         \n\t"
+    "paddsb     %[ftmp3],   %[ftmp3],           %[ftmp1]            \n\t"
+    "psubsb     %[ftmp8],   %[ftmp8],           %[ftmp1]            \n\t"
+    "xor        %[ftmp3],   %[ftmp3],           %[ff_pb_80]         \n\t"
+    "xor        %[ftmp8],   %[ftmp8],           %[ff_pb_80]         \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[tmp0])
+    "gssdlc1    %[ftmp8],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp8],   0x00(%[src_ptr])                        \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[tmp0])
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp3],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp3],   0x00(%[src_ptr])                        \n\t"

-    "xor        %[ftmp4],   %[ftmp4],           %[ff_pb_80]         \n\t"
-    "xor        %[ftmp7],   %[ftmp7],           %[ff_pb_80]         \n\t"
-    "paddsb     %[ftmp4],   %[ftmp4],           %[ftmp3]            \n\t"
-    "psubsb     %[ftmp7],   %[ftmp7],           %[ftmp3]            \n\t"
-    "xor        %[ftmp4],   %[ftmp4],           %[ff_pb_80]         \n\t"
-    "xor        %[ftmp7],   %[ftmp7],           %[ff_pb_80]         \n\t"
-    MMI_ADDU(%[addr1], %[addr0], %[src_pixel_step])
-    "gssdlc1    %[ftmp7],   0x07(%[addr1])                          \n\t"
-    "gssdrc1    %[ftmp7],   0x00(%[addr1])                          \n\t"
-    MMI_SUBU(%[addr1], %[addr0], %[tmp0])
-    "gssdlc1    %[ftmp4],   0x07(%[addr1])                          \n\t"
-    "gssdrc1    %[ftmp4],   0x00(%[addr1])                          \n\t"
-
-    "addiu      %[count],   %[count],           -0x01               \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
    MMI_ADDIU(%[src_ptr], %[src_ptr], 0x08)
+    "addiu      %[count],   %[count],           -0x01               \n\t"
    "bnez       %[count],   1b                                      \n\t"
    : [ftmp0]"=&f"(ftmp[0]),              [ftmp1]"=&f"(ftmp[1]),
      [ftmp2]"=&f"(ftmp[2]),              [ftmp3]"=&f"(ftmp[3]),
      [ftmp4]"=&f"(ftmp[4]),              [ftmp5]"=&f"(ftmp[5]),
      [ftmp6]"=&f"(ftmp[6]),              [ftmp7]"=&f"(ftmp[7]),
      [ftmp8]"=&f"(ftmp[8]),              [ftmp9]"=&f"(ftmp[9]),
-      [tmp0]"=&r"(tmp[0]),
-      [addr0]"=&r"(addr[0]),            [addr1]"=&r"(addr[1]),
+      [ftmp10]"=&f"(ftmp[10]),            [ftmp11]"=&f"(ftmp[11]),
+      [ftmp12]"=&f"(ftmp[12]),            [tmp0]"=&r"(tmp[0]),
      [src_ptr]"+&r"(src_ptr),            [count]"+&r"(count)
    : [limit]"r"(limit),                  [blimit]"r"(blimit),
-      [srct]"r"(srct),                  [thresh]"r"(thresh),
+      [thresh]"r"(thresh),
      [src_pixel_step]"r"((mips_reg)src_pixel_step),
      [ff_pb_fe]"f"(ff_pb_fe),            [ff_pb_80]"f"(ff_pb_80),
      [ff_pb_04]"f"(ff_pb_04),            [ff_pb_03]"f"(ff_pb_03),
@@ -696,64 +678,60 @@ void vp8_mbloop_filter_vertical_edge_mmi(
    unsigned char *src_ptr, int src_pixel_step, const unsigned char *blimit,
    const unsigned char *limit, const unsigned char *thresh, int count) {
  mips_reg tmp[1];
-  mips_reg addr[2];
  DECLARE_ALIGNED(8, const uint64_t, srct[1]);
-  double ftmp[13];
+  double ftmp[14];

  __asm__ volatile (
-    MMI_SLL(%[tmp0], %[src_pixel_step], 0x02)
-    MMI_ADDU(%[src_ptr], %[src_ptr], %[tmp0])
    MMI_SUBU(%[src_ptr], %[src_ptr], 0x04)

    "1:                                                             \n\t"
-    MMI_SLL (%[tmp0], %[src_pixel_step], 0x01)
-    MMI_ADDU(%[addr0], %[src_ptr], %[tmp0])
-    "gsldlc1    %[ftmp11],  0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp11],  0x00(%[addr0])                          \n\t"
-    MMI_ADDU(%[addr0], %[addr0], %[src_pixel_step])
-    "gsldlc1    %[ftmp12],  0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp12],  0x00(%[addr0])                          \n\t"
-    "punpcklbh  %[ftmp1],   %[ftmp11],          %[ftmp12]           \n\t"
-    "punpckhbh  %[ftmp2],   %[ftmp11],          %[ftmp12]           \n\t"
+    "gsldlc1    %[ftmp5],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp5],   0x00(%[src_ptr])                        \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp6],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp6],   0x00(%[src_ptr])                        \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp7],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp7],   0x00(%[src_ptr])                        \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp8],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp8],   0x00(%[src_ptr])                        \n\t"

-    "gsldlc1    %[ftmp11],  0x07(%[src_ptr])                        \n\t"
-    "gsldrc1    %[ftmp11],  0x00(%[src_ptr])                        \n\t"
-    MMI_ADDU(%[addr0], %[src_ptr], %[src_pixel_step])
-    "gsldlc1    %[ftmp12],  0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp12],  0x00(%[addr0])                          \n\t"
-    "punpcklbh  %[ftmp3],   %[ftmp11],          %[ftmp12]           \n\t"
-    "punpckhbh  %[ftmp4],   %[ftmp11],          %[ftmp12]           \n\t"
+    "punpcklbh  %[ftmp11],  %[ftmp5],           %[ftmp6]            \n\t"
+    "punpckhbh  %[ftmp12],  %[ftmp5],           %[ftmp6]            \n\t"
+    "punpcklbh  %[ftmp9],   %[ftmp7],           %[ftmp8]            \n\t"
+    "punpckhbh  %[ftmp10],  %[ftmp7],           %[ftmp8]            \n\t"

-    "punpcklhw  %[ftmp5],   %[ftmp4],           %[ftmp2]            \n\t"
-    "punpckhhw  %[ftmp6],   %[ftmp4],           %[ftmp2]            \n\t"
-    "punpcklhw  %[ftmp7],   %[ftmp3],           %[ftmp1]            \n\t"
-    "punpckhhw  %[ftmp8],   %[ftmp3],           %[ftmp1]            \n\t"
+    "punpcklhw  %[ftmp1],   %[ftmp12],          %[ftmp10]           \n\t"
+    "punpckhhw  %[ftmp2],   %[ftmp12],          %[ftmp10]           \n\t"
+    "punpcklhw  %[ftmp3],   %[ftmp11],          %[ftmp9]            \n\t"
+    "punpckhhw  %[ftmp4],   %[ftmp11],          %[ftmp9]            \n\t"

-    MMI_SLL(%[tmp0], %[src_pixel_step], 0x01)
-    MMI_SUBU(%[addr0], %[src_ptr], %[tmp0])
-    "gsldlc1    %[ftmp11],  0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp11],  0x00(%[addr0])                          \n\t"
-    MMI_SUBU(%[addr0], %[src_ptr], %[src_pixel_step])
-    "gsldlc1    %[ftmp12],  0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp12],  0x00(%[addr0])                          \n\t"
-    "punpcklbh  %[ftmp9],   %[ftmp11],          %[ftmp12]           \n\t"
-    "punpckhbh  %[ftmp10],  %[ftmp11],          %[ftmp12]           \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp5],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp5],   0x00(%[src_ptr])                        \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp6],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp6],   0x00(%[src_ptr])                        \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp7],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp7],   0x00(%[src_ptr])                        \n\t"
+    MMI_ADDU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gsldlc1    %[ftmp8],   0x07(%[src_ptr])                        \n\t"
+    "gsldrc1    %[ftmp8],   0x00(%[src_ptr])                        \n\t"

-    MMI_SLL(%[tmp0], %[src_pixel_step], 0x02)
-    MMI_SUBU(%[addr0], %[src_ptr], %[tmp0])
-    "gsldlc1    %[ftmp11],  0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp11],  0x00(%[addr0])                          \n\t"
-    MMI_ADDU(%[addr0], %[addr0], %[src_pixel_step])
-    "gsldlc1    %[ftmp12],  0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp12],  0x00(%[addr0])                          \n\t"
-    "punpcklbh  %[ftmp0],   %[ftmp11],          %[ftmp12]           \n\t"
-    "punpckhbh  %[ftmp11],  %[ftmp11],          %[ftmp12]           \n\t"
+    "punpcklbh  %[ftmp11],  %[ftmp5],           %[ftmp6]            \n\t"
+    "punpckhbh  %[ftmp12],  %[ftmp5],           %[ftmp6]            \n\t"
+    "punpcklbh  %[ftmp9],   %[ftmp7],           %[ftmp8]            \n\t"
+    "punpckhbh  %[ftmp10],  %[ftmp7],           %[ftmp8]            \n\t"

-    "punpcklhw  %[ftmp1],   %[ftmp11],          %[ftmp10]           \n\t"
-    "punpckhhw  %[ftmp2],   %[ftmp11],          %[ftmp10]           \n\t"
-    "punpcklhw  %[ftmp3],   %[ftmp0],           %[ftmp9]            \n\t"
-    "punpckhhw  %[ftmp4],   %[ftmp0],           %[ftmp9]            \n\t"
+    "punpcklhw  %[ftmp5],   %[ftmp12],          %[ftmp10]           \n\t"
+    "punpckhhw  %[ftmp6],   %[ftmp12],          %[ftmp10]           \n\t"
+    "punpcklhw  %[ftmp7],   %[ftmp11],          %[ftmp9]            \n\t"
+    "punpckhhw  %[ftmp8],   %[ftmp11],          %[ftmp9]            \n\t"

+    "gsldlc1    %[ftmp13],  0x07(%[limit])                          \n\t"
+    "gsldrc1    %[ftmp13],  0x00(%[limit])                          \n\t"
    /* ftmp9:q0  ftmp10:q1 */
    "punpcklwd  %[ftmp9],   %[ftmp1],           %[ftmp5]            \n\t"
    "punpckhwd  %[ftmp10],  %[ftmp1],           %[ftmp5]            \n\t"
@@ -771,60 +749,61 @@ void vp8_mbloop_filter_vertical_edge_mmi(
    "punpcklwd  %[ftmp5],   %[ftmp4],           %[ftmp8]            \n\t"
    "punpckhwd  %[ftmp6],   %[ftmp4],           %[ftmp8]            \n\t"

-    "gsldlc1    %[ftmp8],   0x07(%[limit])                          \n\t"
-    "gsldrc1    %[ftmp8],   0x00(%[limit])                          \n\t"
-
    /* abs (q3-q2) */
    "pasubub    %[ftmp7],   %[ftmp12],          %[ftmp11]           \n\t"
-    "psubusb    %[ftmp0],   %[ftmp7],           %[ftmp8]            \n\t"
+    "psubusb    %[ftmp0],   %[ftmp7],           %[ftmp13]           \n\t"
    /* abs (q2-q1) */
    "pasubub    %[ftmp7],   %[ftmp11],          %[ftmp10]           \n\t"
-    "psubusb    %[ftmp7],   %[ftmp7],           %[ftmp8]            \n\t"
+    "psubusb    %[ftmp7],   %[ftmp7],           %[ftmp13]           \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp7]            \n\t"
    /* ftmp3: abs(q1-q0) */
    "pasubub    %[ftmp3],   %[ftmp10],          %[ftmp9]            \n\t"
-    "psubusb    %[ftmp7],   %[ftmp3],           %[ftmp8]            \n\t"
+    "psubusb    %[ftmp7],   %[ftmp3],           %[ftmp13]           \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp7]            \n\t"
    /* ftmp4: abs(p1-p0) */
    "pasubub    %[ftmp4],   %[ftmp5],           %[ftmp6]            \n\t"
-    "psubusb    %[ftmp7],   %[ftmp4],           %[ftmp8]            \n\t"
+    "psubusb    %[ftmp7],   %[ftmp4],           %[ftmp13]           \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp7]            \n\t"
    /* abs (p2-p1) */
    "pasubub    %[ftmp7],   %[ftmp2],           %[ftmp5]            \n\t"
-    "psubusb    %[ftmp7],   %[ftmp7],           %[ftmp8]            \n\t"
+    "psubusb    %[ftmp7],   %[ftmp7],           %[ftmp13]           \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp7]            \n\t"
    /* abs (p3-p2) */
    "pasubub    %[ftmp7],   %[ftmp1],           %[ftmp2]            \n\t"
-    "psubusb    %[ftmp7],   %[ftmp7],           %[ftmp8]            \n\t"
+    "psubusb    %[ftmp7],   %[ftmp7],           %[ftmp13]           \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp7]            \n\t"
-    /* abs (p0-q0) */
+
+    "gsldlc1    %[ftmp13],  0x07(%[blimit])                         \n\t"
+    "gsldrc1    %[ftmp13],  0x00(%[blimit])                         \n\t"
+    "gsldlc1    %[ftmp7],   0x07(%[thresh])                         \n\t"
+    "gsldrc1    %[ftmp7],   0x00(%[thresh])                         \n\t"
+    /* abs (p0-q0) * 2 */
    "pasubub    %[ftmp1],   %[ftmp9],           %[ftmp6]            \n\t"
    "paddusb    %[ftmp1],   %[ftmp1],           %[ftmp1]            \n\t"
-    /* abs (p1-q1) */
+    /* abs (p1-q1) / 2 */
    "pasubub    %[ftmp12],  %[ftmp10],          %[ftmp5]            \n\t"
    "and        %[ftmp12],  %[ftmp12],          %[ff_pb_fe]         \n\t"
    "li         %[tmp0],    0x01                                    \n\t"
    "mtc1       %[tmp0],    %[ftmp8]                                \n\t"
    "psrlh      %[ftmp12],  %[ftmp12],          %[ftmp8]            \n\t"
    "paddusb    %[ftmp12],  %[ftmp1],           %[ftmp12]           \n\t"
-
-    "gsldlc1    %[ftmp8],   0x07(%[blimit])                         \n\t"
-    "gsldrc1    %[ftmp8],   0x00(%[blimit])                         \n\t"
-    "psubusb    %[ftmp12],  %[ftmp12],          %[ftmp8]            \n\t"
+    "psubusb    %[ftmp12],  %[ftmp12],          %[ftmp13]           \n\t"
    "or         %[ftmp0],   %[ftmp0],           %[ftmp12]           \n\t"
    "xor        %[ftmp12],  %[ftmp12],          %[ftmp12]           \n\t"
+    /* ftmp0: mask */
    "pcmpeqb    %[ftmp0],   %[ftmp0],           %[ftmp12]           \n\t"

-    "gsldlc1    %[ftmp8],   0x07(%[thresh])                         \n\t"
-    "gsldrc1    %[ftmp8],   0x00(%[thresh])                         \n\t"
-    /* ftmp3: abs(q1-q0)  ftmp4: abs(p1-p0) */
-    "psubusb    %[ftmp4],   %[ftmp4],           %[ftmp8]            \n\t"
-    "psubusb    %[ftmp3],   %[ftmp3],           %[ftmp8]            \n\t"
+    /* abs(p1-p0) - thresh */
+    "psubusb    %[ftmp4],   %[ftmp4],           %[ftmp7]            \n\t"
+    /* abs(q1-q0) - thresh */
+    "psubusb    %[ftmp3],   %[ftmp3],           %[ftmp7]            \n\t"
    "or         %[ftmp3],   %[ftmp4],           %[ftmp3]            \n\t"
    "pcmpeqb    %[ftmp3],   %[ftmp3],           %[ftmp12]           \n\t"
    "pcmpeqb    %[ftmp1],   %[ftmp1],           %[ftmp1]            \n\t"
+    /* ftmp1: hev */
    "xor        %[ftmp1],   %[ftmp3],           %[ftmp1]            \n\t"

+    /* ftmp2:ps2, ftmp5:ps1, ftmp6:ps0, ftmp9:qs0, ftmp10:qs1, ftmp11:qs2 */
    "xor        %[ftmp11],  %[ftmp11],          %[ff_pb_80]         \n\t"
    "xor        %[ftmp10],  %[ftmp10],          %[ff_pb_80]         \n\t"
    "xor        %[ftmp9],   %[ftmp9],           %[ff_pb_80]         \n\t"
@@ -837,30 +816,30 @@ void vp8_mbloop_filter_vertical_edge_mmi(
    "paddsb     %[ftmp3],   %[ftmp3],           %[ftmp4]            \n\t"
    "paddsb     %[ftmp3],   %[ftmp3],           %[ftmp4]            \n\t"
    "paddsb     %[ftmp3],   %[ftmp3],           %[ftmp4]            \n\t"
+    /* filter_value &= mask */
    "and        %[ftmp0],   %[ftmp0],           %[ftmp3]            \n\t"
+    /* Filter2 = filter_value & hev */
    "and        %[ftmp3],   %[ftmp1],           %[ftmp0]            \n\t"
+    /* filter_value &= ~hev */
    "pandn      %[ftmp0],   %[ftmp1],           %[ftmp0]            \n\t"

    "paddsb     %[ftmp4],   %[ftmp3],           %[ff_pb_04]         \n\t"
    "li         %[tmp0],    0x0b                                    \n\t"
    "mtc1       %[tmp0],    %[ftmp12]                               \n\t"
-    "xor        %[ftmp7],   %[ftmp7],           %[ftmp7]            \n\t"
-    "xor        %[ftmp8],   %[ftmp8],           %[ftmp8]            \n\t"
    "punpcklbh  %[ftmp7],   %[ftmp7],           %[ftmp4]            \n\t"
    "punpckhbh  %[ftmp8],   %[ftmp8],           %[ftmp4]            \n\t"
    "psrah      %[ftmp7],   %[ftmp7],           %[ftmp12]           \n\t"
    "psrah      %[ftmp8],   %[ftmp8],           %[ftmp12]           \n\t"
    "packsshb   %[ftmp4],   %[ftmp7],           %[ftmp8]            \n\t"
+    /* ftmp9: qs0 */
    "psubsb     %[ftmp9],   %[ftmp9],           %[ftmp4]            \n\t"
    "paddsb     %[ftmp3],   %[ftmp3],           %[ff_pb_03]         \n\t"
-    "xor        %[ftmp7],   %[ftmp7],           %[ftmp7]            \n\t"
-    "xor        %[ftmp8],   %[ftmp8],           %[ftmp8]            \n\t"
    "punpcklbh  %[ftmp7],   %[ftmp7],           %[ftmp3]            \n\t"
    "punpckhbh  %[ftmp8],   %[ftmp8],           %[ftmp3]            \n\t"
    "psrah      %[ftmp7],   %[ftmp7],           %[ftmp12]           \n\t"
    "psrah      %[ftmp8],   %[ftmp8],           %[ftmp12]           \n\t"
    "packsshb   %[ftmp3],   %[ftmp7],           %[ftmp8]            \n\t"
-
+    /* ftmp6: ps0 */
    "paddsb     %[ftmp6],   %[ftmp6],           %[ftmp3]            \n\t"

    "li         %[tmp0],    0x07                                    \n\t"
@@ -872,8 +851,10 @@ void vp8_mbloop_filter_vertical_edge_mmi(
    "pmulhh     %[ftmp8],   %[ftmp8],           %[ftmp1]            \n\t"
    VP8_MBLOOP_VPSRAB_ADDT
    "psubsb     %[ftmp4],   %[ftmp9],           %[ftmp3]            \n\t"
+    /* ftmp9: oq0 */
    "xor        %[ftmp9],   %[ftmp4],           %[ff_pb_80]         \n\t"
    "paddsb     %[ftmp4],   %[ftmp6],           %[ftmp3]            \n\t"
+    /* ftmp6: op0 */
    "xor        %[ftmp6],   %[ftmp4],           %[ff_pb_80]         \n\t"

    VP8_MBLOOP_VPSRAB_ADDH
@@ -882,8 +863,10 @@ void vp8_mbloop_filter_vertical_edge_mmi(
    "pmulhh     %[ftmp8],   %[ftmp8],           %[ftmp1]            \n\t"
    VP8_MBLOOP_VPSRAB_ADDT
    "psubsb     %[ftmp4],   %[ftmp10],          %[ftmp3]            \n\t"
+    /* ftmp10: oq1 */
    "xor        %[ftmp10],   %[ftmp4],          %[ff_pb_80]         \n\t"
    "paddsb     %[ftmp4],   %[ftmp5],           %[ftmp3]            \n\t"
+    /* ftmp5: op1 */
    "xor        %[ftmp5],   %[ftmp4],           %[ff_pb_80]         \n\t"

    VP8_MBLOOP_VPSRAB_ADDH
@@ -891,8 +874,10 @@ void vp8_mbloop_filter_vertical_edge_mmi(
    "pmulhh     %[ftmp8],   %[ftmp8],           %[ff_ph_0900]       \n\t"
    VP8_MBLOOP_VPSRAB_ADDT
    "psubsb     %[ftmp4],   %[ftmp11],          %[ftmp3]            \n\t"
+    /* ftmp11: oq2 */
    "xor        %[ftmp11],  %[ftmp4],           %[ff_pb_80]         \n\t"
    "paddsb     %[ftmp4],   %[ftmp2],           %[ftmp3]            \n\t"
+    /* ftmp2: op2 */
    "xor        %[ftmp2],   %[ftmp4],           %[ff_pb_80]         \n\t"

    "ldc1       %[ftmp12],  0x00(%[srct])                           \n\t"
@@ -916,41 +901,40 @@ void vp8_mbloop_filter_vertical_edge_mmi(
    "punpcklhw  %[ftmp10],  %[ftmp1],           %[ftmp3]            \n\t"
    "punpckhhw  %[ftmp11],  %[ftmp1],           %[ftmp3]            \n\t"

-    "punpcklwd  %[ftmp0],   %[ftmp6],           %[ftmp10]           \n\t"
-    "punpckhwd  %[ftmp1],   %[ftmp6],           %[ftmp10]           \n\t"
-
-    "gssdlc1    %[ftmp0],   0x07(%[src_ptr])                        \n\t"
-    "gssdrc1    %[ftmp0],   0x00(%[src_ptr])                        \n\t"
-    MMI_ADDU(%[addr0], %[src_ptr], %[src_pixel_step])
-    "gssdlc1    %[ftmp1],   0x07(%[addr0])                          \n\t"
-    "gssdrc1    %[ftmp1],   0x00(%[addr0])                          \n\t"
-
    "punpcklwd  %[ftmp0],   %[ftmp7],           %[ftmp11]           \n\t"
    "punpckhwd  %[ftmp1],   %[ftmp7],           %[ftmp11]           \n\t"
-    MMI_ADDU(%[addr0], %[addr0], %[src_pixel_step])
-    "gssdlc1    %[ftmp0],   0x07(%[addr0])                          \n\t"
-    "gssdrc1    %[ftmp0],   0x00(%[addr0])                          \n\t"
-    MMI_ADDU(%[addr0], %[addr0], %[src_pixel_step])
-    "gssdlc1    %[ftmp1],   0x07(%[addr0])                          \n\t"
-    "gssdrc1    %[ftmp1],   0x00(%[addr0])                          \n\t"
+    "gssdlc1    %[ftmp1],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp1],   0x00(%[src_ptr])                        \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp0],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp0],   0x00(%[src_ptr])                        \n\t"
+
+    "punpcklwd  %[ftmp0],   %[ftmp6],           %[ftmp10]           \n\t"
+    "punpckhwd  %[ftmp1],   %[ftmp6],           %[ftmp10]           \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp1],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp1],   0x00(%[src_ptr])                        \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp0],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp0],   0x00(%[src_ptr])                        \n\t"

    "punpcklwd  %[ftmp1],   %[ftmp5],           %[ftmp9]            \n\t"
    "punpckhwd  %[ftmp0],   %[ftmp5],           %[ftmp9]            \n\t"
-    MMI_SUBU(%[addr0], %[src_ptr], %[src_pixel_step])
-    "gssdlc1    %[ftmp0],   0x07(%[addr0])                          \n\t"
-    "gssdrc1    %[ftmp0],   0x00(%[addr0])                          \n\t"
-    MMI_SUBU(%[addr0], %[addr0], %[src_pixel_step])
-    "gssdlc1    %[ftmp1],   0x07(%[addr0])                          \n\t"
-    "gssdrc1    %[ftmp1],   0x00(%[addr0])                          \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp0],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp0],   0x00(%[src_ptr])                        \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp1],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp1],   0x00(%[src_ptr])                        \n\t"

    "punpcklwd  %[ftmp1],   %[ftmp4],           %[ftmp8]            \n\t"
    "punpckhwd  %[ftmp0],   %[ftmp4],           %[ftmp8]            \n\t"
-    MMI_SUBU(%[addr0], %[addr0], %[src_pixel_step])
-    "gssdlc1    %[ftmp0],   0x07(%[addr0])                          \n\t"
-    "gssdrc1    %[ftmp0],   0x00(%[addr0])                          \n\t"
-    MMI_SUBU(%[addr0], %[addr0], %[src_pixel_step])
-    "gssdlc1    %[ftmp1],   0x07(%[addr0])                          \n\t"
-    "gssdrc1    %[ftmp1],   0x00(%[addr0])                          \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp0],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp0],   0x00(%[src_ptr])                        \n\t"
+    MMI_SUBU(%[src_ptr], %[src_ptr], %[src_pixel_step])
+    "gssdlc1    %[ftmp1],   0x07(%[src_ptr])                        \n\t"
+    "gssdrc1    %[ftmp1],   0x00(%[src_ptr])                        \n\t"
    "addiu      %[count],   %[count],           -0x01               \n\t"

    MMI_SLL(%[tmp0], %[src_pixel_step], 0x03)
@@ -962,9 +946,9 @@ void vp8_mbloop_filter_vertical_edge_mmi(
      [ftmp6]"=&f"(ftmp[6]),              [ftmp7]"=&f"(ftmp[7]),
      [ftmp8]"=&f"(ftmp[8]),              [ftmp9]"=&f"(ftmp[9]),
      [ftmp10]"=&f"(ftmp[10]),            [ftmp11]"=&f"(ftmp[11]),
-      [ftmp12]"=&f"(ftmp[12]),            [tmp0]"=&r"(tmp[0]),
-      [addr0]"=&r"(addr[0]),
-      [src_ptr]"+&r"(src_ptr),          [count]"+&r"(count)
+      [ftmp12]"=&f"(ftmp[12]),            [ftmp13]"=&f"(ftmp[13]),
+      [tmp0]"=&r"(tmp[0]),                [src_ptr]"+&r"(src_ptr),
+      [count]"+&r"(count)
    : [limit]"r"(limit),                [blimit]"r"(blimit),
      [srct]"r"(srct),                  [thresh]"r"(thresh),
      [src_pixel_step]"r"((mips_reg)src_pixel_step),
--- a/vp8/common/mips/mmi/sixtap_filter_mmi.c
+++ b/vp8/common/mips/mmi/sixtap_filter_mmi.c
@@ -86,6 +86,7 @@ static INLINE void vp8_filter_block1d_h6_mmi(unsigned char *src_ptr,
  register double ftmp8 asm("$f18");
  register double ftmp9 asm("$f20");
  register double ftmp10 asm("$f22");
+  register double ftmp11 asm("$f24");
 #else
  register double fzero asm("$f0");
  register double ftmp0 asm("$f1");
@@ -99,6 +100,7 @@ static INLINE void vp8_filter_block1d_h6_mmi(unsigned char *src_ptr,
  register double ftmp8 asm("$f9");
  register double ftmp9 asm("$f10");
  register double ftmp10 asm("$f11");
+  register double ftmp11 asm("$f12");
 #endif  // _MIPS_SIM == _ABIO32

  __asm__ volatile (
@@ -112,11 +114,13 @@ static INLINE void vp8_filter_block1d_h6_mmi(unsigned char *src_ptr,
    "li         %[tmp0],        0x07                                  \n\t"
    "mtc1       %[tmp0],        %[ftmp7]                              \n\t"
    "li         %[tmp0],        0x08                                  \n\t"
-    "mtc1       %[tmp0],        %[ftmp10]                             \n\t"
+    "mtc1       %[tmp0],        %[ftmp11]                             \n\t"

    "1:                                                               \n\t"
    "gsldlc1    %[ftmp9],       0x05(%[src_ptr])                      \n\t"
    "gsldrc1    %[ftmp9],       -0x02(%[src_ptr])                     \n\t"
+    "gsldlc1    %[ftmp10],      0x06(%[src_ptr])                      \n\t"
+    "gsldrc1    %[ftmp10],      -0x01(%[src_ptr])                     \n\t"

    "punpcklbh  %[ftmp6],       %[ftmp9],          %[fzero]           \n\t"
    "pmullh     %[ftmp8],       %[ftmp6],          %[ftmp0]           \n\t"
@@ -125,24 +129,21 @@ static INLINE void vp8_filter_block1d_h6_mmi(unsigned char *src_ptr,
    "pmullh     %[ftmp6],       %[ftmp6],          %[ftmp4]           \n\t"
    "paddsh     %[ftmp8],       %[ftmp8],          %[ftmp6]           \n\t"

-    "gsldlc1    %[ftmp9],       0x06(%[src_ptr])                      \n\t"
-    "gsldrc1    %[ftmp9],      -0x01(%[src_ptr])                      \n\t"
-
-    "punpcklbh  %[ftmp6],       %[ftmp9],          %[fzero]           \n\t"
+    "punpcklbh  %[ftmp6],       %[ftmp10],         %[fzero]           \n\t"
    "pmullh     %[ftmp6],       %[ftmp6],          %[ftmp1]           \n\t"
    "paddsh     %[ftmp8],       %[ftmp8],          %[ftmp6]           \n\t"

-    "punpckhbh  %[ftmp6],       %[ftmp9],          %[fzero]           \n\t"
+    "punpckhbh  %[ftmp6],       %[ftmp10],         %[fzero]           \n\t"
    "pmullh     %[ftmp6],       %[ftmp6],          %[ftmp5]           \n\t"
    "paddsh     %[ftmp8],       %[ftmp8],          %[ftmp6]           \n\t"

-    "dsrl       %[ftmp9],       %[ftmp9],          %[ftmp10]          \n\t"
-    "punpcklbh  %[ftmp6],       %[ftmp9],          %[fzero]           \n\t"
+    "dsrl       %[ftmp10],      %[ftmp10],         %[ftmp11]          \n\t"
+    "punpcklbh  %[ftmp6],       %[ftmp10],         %[fzero]           \n\t"
    "pmullh     %[ftmp6],       %[ftmp6],          %[ftmp2]           \n\t"
    "paddsh     %[ftmp8],       %[ftmp8],          %[ftmp6]           \n\t"

-    "dsrl       %[ftmp9],       %[ftmp9],          %[ftmp10]          \n\t"
-    "punpcklbh  %[ftmp6],       %[ftmp9],          %[fzero]           \n\t"
+    "dsrl       %[ftmp10],      %[ftmp10],         %[ftmp11]          \n\t"
+    "punpcklbh  %[ftmp6],       %[ftmp10],         %[fzero]           \n\t"
    "pmullh     %[ftmp6],       %[ftmp6],          %[ftmp3]           \n\t"
    "paddsh     %[ftmp8],       %[ftmp8],          %[ftmp6]           \n\t"

@@ -163,8 +164,9 @@ static INLINE void vp8_filter_block1d_h6_mmi(unsigned char *src_ptr,
      [ftmp5]"=&f"(ftmp5),              [ftmp6]"=&f"(ftmp6),
      [ftmp7]"=&f"(ftmp7),              [ftmp8]"=&f"(ftmp8),
      [ftmp9]"=&f"(ftmp9),              [ftmp10]"=&f"(ftmp10),
-      [tmp0]"=&r"(tmp[0]),              [src_ptr]"+&r"(src_ptr),
-      [output_ptr]"+&r"(output_ptr),    [output_height]"+&r"(output_height)
+      [ftmp11]"=&f"(ftmp11),            [tmp0]"=&r"(tmp[0]),
+      [output_ptr]"+&r"(output_ptr),    [output_height]"+&r"(output_height),
+      [src_ptr]"+&r"(src_ptr)
    : [src_pixels_per_line]"r"((mips_reg)src_pixels_per_line),
      [vp8_filter]"r"(vp8_filter),      [output_width]"r"(output_width),
      [ff_ph_40]"f"(ff_ph_40)
@@ -190,6 +192,11 @@ static INLINE void vp8_filter_block1dc_v6_mmi(
  register double ftmp6 asm("$f14");
  register double ftmp7 asm("$f16");
  register double ftmp8 asm("$f18");
+  register double ftmp9 asm("$f20");
+  register double ftmp10 asm("$f22");
+  register double ftmp11 asm("$f24");
+  register double ftmp12 asm("$f26");
+  register double ftmp13 asm("$f28");
 #else
  register double fzero asm("$f0");
  register double ftmp0 asm("$f1");
@@ -201,6 +208,11 @@ static INLINE void vp8_filter_block1dc_v6_mmi(
  register double ftmp6 asm("$f7");
  register double ftmp7 asm("$f8");
  register double ftmp8 asm("$f9");
+  register double ftmp9 asm("$f10");
+  register double ftmp10 asm("$f11");
+  register double ftmp11 asm("$f12");
+  register double ftmp12 asm("$f13");
+  register double ftmp13 asm("$f14");
 #endif  // _MIPS_SIM == _ABIO32

  __asm__ volatile (
@@ -210,52 +222,56 @@ static INLINE void vp8_filter_block1dc_v6_mmi(
    "ldc1       %[ftmp3],     0x30(%[vp8_filter])                     \n\t"
    "ldc1       %[ftmp4],     0x40(%[vp8_filter])                     \n\t"
    "ldc1       %[ftmp5],     0x50(%[vp8_filter])                     \n\t"
-    MMI_SUBU(%[src_ptr],   %[src_ptr],      %[pixels_per_line_x2])
    "xor        %[fzero],     %[fzero],        %[fzero]               \n\t"
    "li         %[tmp0],      0x07                                    \n\t"
-    "mtc1       %[tmp0],      %[ftmp7]                                \n\t"
+    "mtc1       %[tmp0],      %[ftmp13]                               \n\t"

+    /* In order to make full use of memory load delay slot,
+     * Operation of memory loading and calculating has been rearranged.
+     */
    "1:                                                               \n\t"
    "gsldlc1    %[ftmp6],     0x07(%[src_ptr])                        \n\t"
    "gsldrc1    %[ftmp6],     0x00(%[src_ptr])                        \n\t"
-    "pmullh     %[ftmp8],     %[ftmp6],        %[ftmp0]               \n\t"
-
    MMI_ADDU(%[addr0],     %[src_ptr],      %[pixels_per_line])
-    "gsldlc1    %[ftmp6],     0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp6],     0x00(%[addr0])                          \n\t"
-    "pmullh     %[ftmp6],     %[ftmp6],        %[ftmp1]               \n\t"
-    "paddsh     %[ftmp8],     %[ftmp8],        %[ftmp6]               \n\t"
-
+    "gsldlc1    %[ftmp7],     0x07(%[addr0])                          \n\t"
+    "gsldrc1    %[ftmp7],     0x00(%[addr0])                          \n\t"
    MMI_ADDU(%[addr0],     %[src_ptr],      %[pixels_per_line_x2])
-    "gsldlc1    %[ftmp6],     0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp6],     0x00(%[addr0])                          \n\t"
-    "pmullh     %[ftmp6],     %[ftmp6],        %[ftmp2]               \n\t"
-    "paddsh     %[ftmp8],     %[ftmp8],        %[ftmp6]               \n\t"
+    "gsldlc1    %[ftmp8],     0x07(%[addr0])                          \n\t"
+    "gsldrc1    %[ftmp8],     0x00(%[addr0])                          \n\t"

    MMI_ADDU(%[addr0],     %[src_ptr],      %[pixels_per_line_x4])
-    "gsldlc1    %[ftmp6],     0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp6],     0x00(%[addr0])                          \n\t"
-    "pmullh     %[ftmp6],     %[ftmp6],        %[ftmp4]               \n\t"
-    "paddsh     %[ftmp8],     %[ftmp8],        %[ftmp6]               \n\t"
-
+    "gsldlc1    %[ftmp9],     0x07(%[addr0])                          \n\t"
+    "gsldrc1    %[ftmp9],     0x00(%[addr0])                          \n\t"
    MMI_ADDU(%[src_ptr],   %[src_ptr],      %[pixels_per_line])
    MMI_ADDU(%[addr0],     %[src_ptr],      %[pixels_per_line_x2])
-    "gsldlc1    %[ftmp6],     0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp6],     0x00(%[addr0])                          \n\t"
-    "pmullh     %[ftmp6],     %[ftmp6],        %[ftmp3]               \n\t"
-    "paddsh     %[ftmp8],     %[ftmp8],        %[ftmp6]               \n\t"
-
+    "gsldlc1    %[ftmp10],    0x07(%[addr0])                          \n\t"
+    "gsldrc1    %[ftmp10],    0x00(%[addr0])                          \n\t"
    MMI_ADDU(%[addr0],     %[src_ptr],      %[pixels_per_line_x4])
-    "gsldlc1    %[ftmp6],     0x07(%[addr0])                          \n\t"
-    "gsldrc1    %[ftmp6],     0x00(%[addr0])                          \n\t"
-    "pmullh     %[ftmp6],     %[ftmp6],        %[ftmp5]               \n\t"
-    "paddsh     %[ftmp8],     %[ftmp8],        %[ftmp6]               \n\t"
+    "gsldlc1    %[ftmp11],    0x07(%[addr0])                          \n\t"
+    "gsldrc1    %[ftmp11],    0x00(%[addr0])                          \n\t"

-    "paddsh     %[ftmp8],     %[ftmp8],        %[ff_ph_40]            \n\t"
-    "psrah      %[ftmp8],     %[ftmp8],        %[ftmp7]               \n\t"
-    "packushb   %[ftmp8],     %[ftmp8],        %[fzero]               \n\t"
-    "gsswlc1    %[ftmp8],     0x03(%[output_ptr])                     \n\t"
-    "gsswrc1    %[ftmp8],     0x00(%[output_ptr])                     \n\t"
+    "pmullh     %[ftmp12],    %[ftmp6],        %[ftmp0]               \n\t"
+
+    "pmullh     %[ftmp7],     %[ftmp7],        %[ftmp1]               \n\t"
+    "paddsh     %[ftmp12],    %[ftmp12],       %[ftmp7]               \n\t"
+
+    "pmullh     %[ftmp8],     %[ftmp8],        %[ftmp2]               \n\t"
+    "paddsh     %[ftmp12],    %[ftmp12],       %[ftmp8]               \n\t"
+
+    "pmullh     %[ftmp9],     %[ftmp9],        %[ftmp4]               \n\t"
+    "paddsh     %[ftmp12],    %[ftmp12],       %[ftmp9]               \n\t"
+
+    "pmullh     %[ftmp10],    %[ftmp10],       %[ftmp3]               \n\t"
+    "paddsh     %[ftmp12],    %[ftmp12],       %[ftmp10]              \n\t"
+
+    "pmullh     %[ftmp11],    %[ftmp11],       %[ftmp5]               \n\t"
+    "paddsh     %[ftmp12],    %[ftmp12],       %[ftmp11]              \n\t"
+
+    "paddsh     %[ftmp12],    %[ftmp12],       %[ff_ph_40]            \n\t"
+    "psrah      %[ftmp12],    %[ftmp12],       %[ftmp13]              \n\t"
+    "packushb   %[ftmp12],    %[ftmp12],       %[fzero]               \n\t"
+    "gsswlc1    %[ftmp12],    0x03(%[output_ptr])                     \n\t"
+    "gsswrc1    %[ftmp12],    0x00(%[output_ptr])                     \n\t"

    MMI_ADDIU(%[output_height], %[output_height], -0x01)
    MMI_ADDU(%[output_ptr], %[output_ptr], %[output_pitch])
@@ -265,9 +281,11 @@ static INLINE void vp8_filter_block1dc_v6_mmi(
      [ftmp3]"=&f"(ftmp3),              [ftmp4]"=&f"(ftmp4),
      [ftmp5]"=&f"(ftmp5),              [ftmp6]"=&f"(ftmp6),
      [ftmp7]"=&f"(ftmp7),              [ftmp8]"=&f"(ftmp8),
-      [tmp0]"=&r"(tmp[0]),              [addr0]"=&r"(addr[0]),
-      [src_ptr]"+&r"(src_ptr),          [output_ptr]"+&r"(output_ptr),
-      [output_height]"+&r"(output_height)
+      [ftmp9]"=&f"(ftmp9),              [ftmp10]"=&f"(ftmp10),
+      [ftmp11]"=&f"(ftmp11),            [ftmp12]"=&f"(ftmp12),
+      [ftmp13]"=&f"(ftmp13),            [tmp0]"=&r"(tmp[0]),
+      [addr0]"=&r"(addr[0]),            [src_ptr]"+&r"(src_ptr),
+      [output_ptr]"+&r"(output_ptr),    [output_height]"+&r"(output_height)
    : [pixels_per_line]"r"((mips_reg)pixels_per_line),
      [pixels_per_line_x2]"r"((mips_reg)(pixels_per_line<<1)),
      [pixels_per_line_x4]"r"((mips_reg)(pixels_per_line<<2)),
@@ -301,6 +319,7 @@ static INLINE void vp8_filter_block1d_h6_filter0_mmi(
    "1:                                                               \n\t"
    "gsldlc1    %[ftmp0],       0x07(%[src_ptr])                      \n\t"
    "gsldrc1    %[ftmp0],       0x00(%[src_ptr])                      \n\t"
+    MMI_ADDU(%[src_ptr],  %[src_ptr], %[src_pixels_per_line])

    "punpcklbh  %[ftmp1],       %[ftmp0],          %[fzero]           \n\t"
    "gssdlc1    %[ftmp1],       0x07(%[output_ptr])                   \n\t"
@@ -308,7 +327,6 @@ static INLINE void vp8_filter_block1d_h6_filter0_mmi(

    "addiu      %[output_height], %[output_height], -0x01             \n\t"
    MMI_ADDU(%[output_ptr],  %[output_ptr],    %[output_width])
-    MMI_ADDU(%[src_ptr],  %[src_ptr], %[src_pixels_per_line])
    "bnez       %[output_height],               1b                    \n\t"
    : [fzero]"=&f"(fzero),              [ftmp0]"=&f"(ftmp0),
      [ftmp1]"=&f"(ftmp1),              [src_ptr]"+&r"(src_ptr),
@@ -338,12 +356,12 @@ static INLINE void vp8_filter_block1dc_v6_filter0_mmi(
    "1:                                                               \n\t"
    "gsldlc1    %[ftmp0],     0x07(%[src_ptr])                        \n\t"
    "gsldrc1    %[ftmp0],     0x00(%[src_ptr])                        \n\t"
+    MMI_ADDU(%[src_ptr],   %[src_ptr],      %[pixels_per_line])
+    MMI_ADDIU(%[output_height], %[output_height], -0x01)
    "packushb   %[ftmp1],     %[ftmp0],        %[fzero]               \n\t"
    "gsswlc1    %[ftmp1],     0x03(%[output_ptr])                     \n\t"
    "gsswrc1    %[ftmp1],     0x00(%[output_ptr])                     \n\t"

-    MMI_ADDU(%[src_ptr],   %[src_ptr],      %[pixels_per_line])
-    MMI_ADDIU(%[output_height], %[output_height], -0x01)
    MMI_ADDU(%[output_ptr], %[output_ptr], %[output_pitch])
    "bnez       %[output_height], 1b                                  \n\t"
    : [fzero]"=&f"(fzero),              [ftmp0]"=&f"(ftmp0),
@@ -386,7 +404,7 @@ static INLINE void vp8_filter_block1dc_v6_filter0_mmi(
      }                                                                        \
    } else {                                                                   \
      for (i = 0; i < loop; ++i) {                                             \
-        vp8_filter_block1dc_v6_mmi(FData2 + n * 2 + i * 4, dst_ptr + i * 4, m, \
+        vp8_filter_block1dc_v6_mmi(FData2 + i * 4, dst_ptr + i * 4, m,         \
                                   dst_pitch, n * 2, VFilter);                 \
      }                                                                        \
    }                                                                          \
--- a/vp8/common/rtcd_defs.pl
+++ b/vp8/common/rtcd_defs.pl
@@ -1,3 +1,13 @@
+##
+##  Copyright (c) 2017 The WebM project authors. All Rights Reserved.
+##
+##  Use of this source code is governed by a BSD-style license
+##  that can be found in the LICENSE file in the root of the source
+##  tree. An additional intellectual property rights grant can be found
+##  in the file PATENTS.  All contributing project authors may
+##  be found in the AUTHORS file in the root of the source tree.
+##
+
 sub vp8_common_forward_decls() {
 print <<EOF
 /*
--- a/vp8/encoder/rdopt.c
+++ b/vp8/encoder/rdopt.c
@@ -23,6 +23,7 @@
 #include "modecosts.h"
 #include "encodeintra.h"
 #include "pickinter.h"
+#include "vp8/common/common.h"
 #include "vp8/common/entropymode.h"
 #include "vp8/common/reconinter.h"
 #include "vp8/common/reconintra.h"
@@ -959,19 +960,13 @@ static void rd_check_segment(VP8_COMP *cpi, MACROBLOCK *x, BEST_SEG_INFO *bsi,
  vp8_variance_fn_ptr_t *v_fn_ptr;

  ENTROPY_CONTEXT_PLANES t_above, t_left;
-  ENTROPY_CONTEXT *ta;
-  ENTROPY_CONTEXT *tl;
  ENTROPY_CONTEXT_PLANES t_above_b, t_left_b;
-  ENTROPY_CONTEXT *ta_b;
-  ENTROPY_CONTEXT *tl_b;

  memcpy(&t_above, x->e_mbd.above_context, sizeof(ENTROPY_CONTEXT_PLANES));
  memcpy(&t_left, x->e_mbd.left_context, sizeof(ENTROPY_CONTEXT_PLANES));

-  ta = (ENTROPY_CONTEXT *)&t_above;
-  tl = (ENTROPY_CONTEXT *)&t_left;
-  ta_b = (ENTROPY_CONTEXT *)&t_above_b;
-  tl_b = (ENTROPY_CONTEXT *)&t_left_b;
+  vp8_zero(t_above_b);
+  vp8_zero(t_left_b);

  br = 0;
  bd = 0;
@@ -1151,13 +1146,13 @@ static void rd_check_segment(VP8_COMP *cpi, MACROBLOCK *x, BEST_SEG_INFO *bsi,
        mode_selected = this_mode;
        best_label_rd = this_rd;

-        memcpy(ta_b, ta_s, sizeof(ENTROPY_CONTEXT_PLANES));
-        memcpy(tl_b, tl_s, sizeof(ENTROPY_CONTEXT_PLANES));
+        memcpy(&t_above_b, &t_above_s, sizeof(ENTROPY_CONTEXT_PLANES));
+        memcpy(&t_left_b, &t_left_s, sizeof(ENTROPY_CONTEXT_PLANES));
      }
    } /*for each 4x4 mode*/

-    memcpy(ta, ta_b, sizeof(ENTROPY_CONTEXT_PLANES));
-    memcpy(tl, tl_b, sizeof(ENTROPY_CONTEXT_PLANES));
+    memcpy(&t_above, &t_above_b, sizeof(ENTROPY_CONTEXT_PLANES));
+    memcpy(&t_left, &t_left_b, sizeof(ENTROPY_CONTEXT_PLANES));

    labels2mode(x, labels, i, mode_selected, &mode_mv[mode_selected],
                bsi->ref_mv, x->mvcost);
--- a/vp9/common/vp9_rtcd_defs.pl
+++ b/vp9/common/vp9_rtcd_defs.pl
@@ -1,3 +1,13 @@
+##
+##  Copyright (c) 2017 The WebM project authors. All Rights Reserved.
+##
+##  Use of this source code is governed by a BSD-style license
+##  that can be found in the LICENSE file in the root of the source
+##  tree. An additional intellectual property rights grant can be found
+##  in the file PATENTS.  All contributing project authors may
+##  be found in the AUTHORS file in the root of the source tree.
+##
+
 sub vp9_common_forward_decls() {
 print <<EOF
 /*
--- a/vp9/encoder/vp9_denoiser.c
+++ b/vp9/encoder/vp9_denoiser.c
@@ -189,11 +189,12 @@ static VP9_DENOISER_DECISION perform_motion_compensation(
    int increase_denoising, int mi_row, int mi_col, PICK_MODE_CONTEXT *ctx,
    int motion_magnitude, int is_skin, int *zeromv_filter, int consec_zeromv,
    int num_spatial_layers, int width, int lst_fb_idx, int gld_fb_idx,
-    int use_svc) {
+    int use_svc, int spatial_layer) {
  const int sse_diff = (ctx->newmv_sse == UINT_MAX)
                           ? 0
                           : ((int)ctx->zeromv_sse - (int)ctx->newmv_sse);
  int frame;
+  int denoise_layer_idx = 0;
  MACROBLOCKD *filter_mbd = &mb->e_mbd;
  MODE_INFO *mi = filter_mbd->mi[0];
  MODE_INFO saved_mi;
@@ -254,6 +255,10 @@ static VP9_DENOISER_DECISION perform_motion_compensation(
      frame = lst_fb_idx + 1;
    else if (frame == GOLDEN_FRAME)
      frame = gld_fb_idx + 1;
+    // Shift for the second spatial layer.
+    if (num_spatial_layers - spatial_layer == 2)
+      frame = frame + denoiser->num_ref_frames;
+    denoise_layer_idx = num_spatial_layers - spatial_layer - 1;
  }

  if (ctx->newmv_sse > sse_thresh(bs, increase_denoising)) {
@@ -289,18 +294,21 @@ static VP9_DENOISER_DECISION perform_motion_compensation(
                  denoiser->running_avg_y[frame].uv_stride, mi_row, mi_col);
  filter_mbd->plane[2].pre[0].stride = denoiser->running_avg_y[frame].uv_stride;

-  filter_mbd->plane[0].dst.buf =
-      block_start(denoiser->mc_running_avg_y.y_buffer,
-                  denoiser->mc_running_avg_y.y_stride, mi_row, mi_col);
-  filter_mbd->plane[0].dst.stride = denoiser->mc_running_avg_y.y_stride;
-  filter_mbd->plane[1].dst.buf =
-      block_start(denoiser->mc_running_avg_y.u_buffer,
-                  denoiser->mc_running_avg_y.uv_stride, mi_row, mi_col);
-  filter_mbd->plane[1].dst.stride = denoiser->mc_running_avg_y.uv_stride;
-  filter_mbd->plane[2].dst.buf =
-      block_start(denoiser->mc_running_avg_y.v_buffer,
-                  denoiser->mc_running_avg_y.uv_stride, mi_row, mi_col);
-  filter_mbd->plane[2].dst.stride = denoiser->mc_running_avg_y.uv_stride;
+  filter_mbd->plane[0].dst.buf = block_start(
+      denoiser->mc_running_avg_y[denoise_layer_idx].y_buffer,
+      denoiser->mc_running_avg_y[denoise_layer_idx].y_stride, mi_row, mi_col);
+  filter_mbd->plane[0].dst.stride =
+      denoiser->mc_running_avg_y[denoise_layer_idx].y_stride;
+  filter_mbd->plane[1].dst.buf = block_start(
+      denoiser->mc_running_avg_y[denoise_layer_idx].u_buffer,
+      denoiser->mc_running_avg_y[denoise_layer_idx].uv_stride, mi_row, mi_col);
+  filter_mbd->plane[1].dst.stride =
+      denoiser->mc_running_avg_y[denoise_layer_idx].uv_stride;
+  filter_mbd->plane[2].dst.buf = block_start(
+      denoiser->mc_running_avg_y[denoise_layer_idx].v_buffer,
+      denoiser->mc_running_avg_y[denoise_layer_idx].uv_stride, mi_row, mi_col);
+  filter_mbd->plane[2].dst.stride =
+      denoiser->mc_running_avg_y[denoise_layer_idx].uv_stride;

  set_ref_ptrs(cm, filter_mbd, saved_frame, NONE);
  vp9_build_inter_predictors_sby(filter_mbd, mi_row, mi_col, bs);
@@ -324,9 +332,17 @@ void vp9_denoiser_denoise(VP9_COMP *cpi, MACROBLOCK *mb, int mi_row, int mi_col,
  int zeromv_filter = 0;
  VP9_DENOISER *denoiser = &cpi->denoiser;
  VP9_DENOISER_DECISION decision = COPY_BLOCK;
-  YV12_BUFFER_CONFIG avg = denoiser->running_avg_y[INTRA_FRAME];
-  YV12_BUFFER_CONFIG mc_avg = denoiser->mc_running_avg_y;
+
+  const int shift =
+      cpi->svc.number_spatial_layers - cpi->svc.spatial_layer_id == 2
+          ? denoiser->num_ref_frames
+          : 0;
+  YV12_BUFFER_CONFIG avg = denoiser->running_avg_y[INTRA_FRAME + shift];
+  const int denoise_layer_index =
+      cpi->svc.number_spatial_layers - cpi->svc.spatial_layer_id - 1;
+  YV12_BUFFER_CONFIG mc_avg = denoiser->mc_running_avg_y[denoise_layer_index];
  uint8_t *avg_start = block_start(avg.y_buffer, avg.y_stride, mi_row, mi_col);
+
  uint8_t *mc_avg_start =
      block_start(mc_avg.y_buffer, mc_avg.y_stride, mi_row, mi_col);
  struct buf_2d src = mb->plane[0].src;
@@ -381,7 +397,7 @@ void vp9_denoiser_denoise(VP9_COMP *cpi, MACROBLOCK *mb, int mi_row, int mi_col,
        &cpi->common, denoiser, mb, bs, increase_denoising, mi_row, mi_col, ctx,
        motion_magnitude, is_skin, &zeromv_filter, consec_zeromv,
        cpi->svc.number_spatial_layers, cpi->Source->y_width, cpi->lst_fb_idx,
-        cpi->gld_fb_idx, cpi->use_svc);
+        cpi->gld_fb_idx, cpi->use_svc, cpi->svc.spatial_layer_id);

  if (decision == FILTER_BLOCK) {
    decision = vp9_denoiser_filter(src.buf, src.stride, mc_avg_start,
@@ -432,7 +448,8 @@ void vp9_denoiser_update_frame_info(
    VP9_DENOISER *denoiser, YV12_BUFFER_CONFIG src, FRAME_TYPE frame_type,
    int refresh_alt_ref_frame, int refresh_golden_frame, int refresh_last_frame,
    int alt_fb_idx, int gld_fb_idx, int lst_fb_idx, int resized,
-    int svc_base_is_key) {
+    int svc_base_is_key, int second_spatial_layer) {
+  const int shift = second_spatial_layer ? denoiser->num_ref_frames : 0;
  // Copy source into denoised reference buffers on KEY_FRAME or
  // if the just encoded frame was resized. For SVC, copy source if the base
  // spatial layer was key frame.
@@ -441,8 +458,8 @@ void vp9_denoiser_update_frame_info(
    int i;
    // Start at 1 so as not to overwrite the INTRA_FRAME
    for (i = 1; i < denoiser->num_ref_frames; ++i) {
-      if (denoiser->running_avg_y[i].buffer_alloc != NULL)
-        copy_frame(&denoiser->running_avg_y[i], &src);
+      if (denoiser->running_avg_y[i + shift].buffer_alloc != NULL)
+        copy_frame(&denoiser->running_avg_y[i + shift], &src);
    }
    denoiser->reset = 0;
    return;
@@ -451,29 +468,29 @@ void vp9_denoiser_update_frame_info(
  // If more than one refresh occurs, must copy frame buffer.
  if ((refresh_alt_ref_frame + refresh_golden_frame + refresh_last_frame) > 1) {
    if (refresh_alt_ref_frame) {
-      copy_frame(&denoiser->running_avg_y[alt_fb_idx + 1],
-                 &denoiser->running_avg_y[INTRA_FRAME]);
+      copy_frame(&denoiser->running_avg_y[alt_fb_idx + 1 + shift],
+                 &denoiser->running_avg_y[INTRA_FRAME + shift]);
    }
    if (refresh_golden_frame) {
-      copy_frame(&denoiser->running_avg_y[gld_fb_idx + 1],
-                 &denoiser->running_avg_y[INTRA_FRAME]);
+      copy_frame(&denoiser->running_avg_y[gld_fb_idx + 1 + shift],
+                 &denoiser->running_avg_y[INTRA_FRAME + shift]);
    }
    if (refresh_last_frame) {
-      copy_frame(&denoiser->running_avg_y[lst_fb_idx + 1],
-                 &denoiser->running_avg_y[INTRA_FRAME]);
+      copy_frame(&denoiser->running_avg_y[lst_fb_idx + 1 + shift],
+                 &denoiser->running_avg_y[INTRA_FRAME + shift]);
    }
  } else {
    if (refresh_alt_ref_frame) {
-      swap_frame_buffer(&denoiser->running_avg_y[alt_fb_idx + 1],
-                        &denoiser->running_avg_y[INTRA_FRAME]);
+      swap_frame_buffer(&denoiser->running_avg_y[alt_fb_idx + 1 + shift],
+                        &denoiser->running_avg_y[INTRA_FRAME + shift]);
    }
    if (refresh_golden_frame) {
-      swap_frame_buffer(&denoiser->running_avg_y[gld_fb_idx + 1],
-                        &denoiser->running_avg_y[INTRA_FRAME]);
+      swap_frame_buffer(&denoiser->running_avg_y[gld_fb_idx + 1 + shift],
+                        &denoiser->running_avg_y[INTRA_FRAME + shift]);
    }
    if (refresh_last_frame) {
-      swap_frame_buffer(&denoiser->running_avg_y[lst_fb_idx + 1],
-                        &denoiser->running_avg_y[INTRA_FRAME]);
+      swap_frame_buffer(&denoiser->running_avg_y[lst_fb_idx + 1 + shift],
+                        &denoiser->running_avg_y[INTRA_FRAME + shift]);
    }
  }
 }
@@ -522,44 +539,75 @@ static int vp9_denoiser_realloc_svc_helper(VP9_COMMON *cm,
 }

 int vp9_denoiser_realloc_svc(VP9_COMMON *cm, VP9_DENOISER *denoiser,
-                             int refresh_alt, int refresh_gld, int refresh_lst,
-                             int alt_fb_idx, int gld_fb_idx, int lst_fb_idx) {
+                             int svc_buf_shift, int refresh_alt,
+                             int refresh_gld, int refresh_lst, int alt_fb_idx,
+                             int gld_fb_idx, int lst_fb_idx) {
  int fail = 0;
  if (refresh_alt) {
    // Increase the frame buffer index by 1 to map it to the buffer index in the
    // denoiser.
-    fail = vp9_denoiser_realloc_svc_helper(cm, denoiser, alt_fb_idx + 1);
+    fail = vp9_denoiser_realloc_svc_helper(cm, denoiser,
+                                           alt_fb_idx + 1 + svc_buf_shift);
    if (fail) return 1;
  }
  if (refresh_gld) {
-    fail = vp9_denoiser_realloc_svc_helper(cm, denoiser, gld_fb_idx + 1);
+    fail = vp9_denoiser_realloc_svc_helper(cm, denoiser,
+                                           gld_fb_idx + 1 + svc_buf_shift);
    if (fail) return 1;
  }
  if (refresh_lst) {
-    fail = vp9_denoiser_realloc_svc_helper(cm, denoiser, lst_fb_idx + 1);
+    fail = vp9_denoiser_realloc_svc_helper(cm, denoiser,
+                                           lst_fb_idx + 1 + svc_buf_shift);
    if (fail) return 1;
  }
  return 0;
 }

-int vp9_denoiser_alloc(VP9_COMMON *cm, int use_svc, VP9_DENOISER *denoiser,
-                       int width, int height, int ssx, int ssy,
+int vp9_denoiser_alloc(VP9_COMMON *cm, struct SVC *svc, VP9_DENOISER *denoiser,
+                       int use_svc, int noise_sen, int width, int height,
+                       int ssx, int ssy,
 #if CONFIG_VP9_HIGHBITDEPTH
                       int use_highbitdepth,
 #endif
                       int border) {
-  int i, fail, init_num_ref_frames;
+  int i, layer, fail, init_num_ref_frames;
  const int legacy_byte_alignment = 0;
+  int num_layers = 1;
+  int scaled_width = width;
+  int scaled_height = height;
+  if (use_svc) {
+    LAYER_CONTEXT *lc = &svc->layer_context[svc->spatial_layer_id *
+                                                svc->number_temporal_layers +
+                                            svc->temporal_layer_id];
+    get_layer_resolution(width, height, lc->scaling_factor_num,
+                         lc->scaling_factor_den, &scaled_width, &scaled_height);
+    // For SVC: only denoise at most 2 spatial (highest) layers.
+    if (noise_sen >= 2)
+      // Denoise from one spatial layer below the top.
+      svc->first_layer_denoise = VPXMAX(svc->number_spatial_layers - 2, 0);
+    else
+      // Only denoise the top spatial layer.
+      svc->first_layer_denoise = VPXMAX(svc->number_spatial_layers - 1, 0);
+    num_layers = svc->number_spatial_layers - svc->first_layer_denoise;
+  }
  assert(denoiser != NULL);
-
  denoiser->num_ref_frames = use_svc ? SVC_REF_FRAMES : NONSVC_REF_FRAMES;
  init_num_ref_frames = use_svc ? MAX_REF_FRAMES : NONSVC_REF_FRAMES;
+  denoiser->num_layers = num_layers;
+  CHECK_MEM_ERROR(cm, denoiser->running_avg_y,
+                  vpx_calloc(denoiser->num_ref_frames * num_layers,
+                             sizeof(denoiser->running_avg_y[0])));
  CHECK_MEM_ERROR(
-      cm, denoiser->running_avg_y,
-      vpx_calloc(denoiser->num_ref_frames, sizeof(denoiser->running_avg_y[0])));
+      cm, denoiser->mc_running_avg_y,
+      vpx_calloc(num_layers, sizeof(denoiser->mc_running_avg_y[0])));
+
+  for (layer = 0; layer < num_layers; ++layer) {
+    const int denoise_width = (layer == 0) ? width : scaled_width;
+    const int denoise_height = (layer == 0) ? height : scaled_height;
    for (i = 0; i < init_num_ref_frames; ++i) {
-    fail = vpx_alloc_frame_buffer(&denoiser->running_avg_y[i], width, height,
-                                  ssx, ssy,
+      fail = vpx_alloc_frame_buffer(
+          &denoiser->running_avg_y[i + denoiser->num_ref_frames * layer],
+          denoise_width, denoise_height, ssx, ssy,
 #if CONFIG_VP9_HIGHBITDEPTH
          use_highbitdepth,
 #endif
@@ -573,8 +621,8 @@ int vp9_denoiser_alloc(VP9_COMMON *cm, int use_svc, VP9_DENOISER *denoiser,
 #endif
    }

-  fail = vpx_alloc_frame_buffer(&denoiser->mc_running_avg_y, width, height, ssx,
-                                ssy,
+    fail = vpx_alloc_frame_buffer(&denoiser->mc_running_avg_y[layer],
+                                  denoise_width, denoise_height, ssx, ssy,
 #if CONFIG_VP9_HIGHBITDEPTH
                                  use_highbitdepth,
 #endif
@@ -583,7 +631,10 @@ int vp9_denoiser_alloc(VP9_COMMON *cm, int use_svc, VP9_DENOISER *denoiser,
      vp9_denoiser_free(denoiser);
      return 1;
    }
+  }

+  // denoiser->last_source only used for noise_estimation, so only for top
+  // layer.
  fail = vpx_alloc_frame_buffer(&denoiser->last_source, width, height, ssx, ssy,
 #if CONFIG_VP9_HIGHBITDEPTH
                                use_highbitdepth,
@@ -609,12 +660,18 @@ void vp9_denoiser_free(VP9_DENOISER *denoiser) {
    return;
  }
  denoiser->frame_buffer_initialized = 0;
-  for (i = 0; i < denoiser->num_ref_frames; ++i) {
+  for (i = 0; i < denoiser->num_ref_frames * denoiser->num_layers; ++i) {
    vpx_free_frame_buffer(&denoiser->running_avg_y[i]);
  }
  vpx_free(denoiser->running_avg_y);
  denoiser->running_avg_y = NULL;
-  vpx_free_frame_buffer(&denoiser->mc_running_avg_y);
+
+  for (i = 0; i < denoiser->num_layers; ++i) {
+    vpx_free_frame_buffer(&denoiser->mc_running_avg_y[i]);
+  }
+
+  vpx_free(denoiser->mc_running_avg_y);
+  denoiser->mc_running_avg_y = NULL;
  vpx_free_frame_buffer(&denoiser->last_source);
 }

--- a/vp9/encoder/vp9_denoiser.h
+++ b/vp9/encoder/vp9_denoiser.h
@@ -44,11 +44,12 @@ typedef enum vp9_denoiser_level {

 typedef struct vp9_denoiser {
  YV12_BUFFER_CONFIG *running_avg_y;
-  YV12_BUFFER_CONFIG mc_running_avg_y;
+  YV12_BUFFER_CONFIG *mc_running_avg_y;
  YV12_BUFFER_CONFIG last_source;
  int frame_buffer_initialized;
  int reset;
  int num_ref_frames;
+  int num_layers;
  VP9_DENOISER_LEVEL denoising_level;
  VP9_DENOISER_LEVEL prev_denoising_level;
 } VP9_DENOISER;
@@ -66,12 +67,13 @@ typedef struct {
 } VP9_PICKMODE_CTX_DEN;

 struct VP9_COMP;
+struct SVC;

 void vp9_denoiser_update_frame_info(
    VP9_DENOISER *denoiser, YV12_BUFFER_CONFIG src, FRAME_TYPE frame_type,
    int refresh_alt_ref_frame, int refresh_golden_frame, int refresh_last_frame,
    int alt_fb_idx, int gld_fb_idx, int lst_fb_idx, int resized,
-    int svc_base_is_key);
+    int svc_base_is_key, int second_spatial_layer);

 void vp9_denoiser_denoise(struct VP9_COMP *cpi, MACROBLOCK *mb, int mi_row,
                          int mi_col, BLOCK_SIZE bs, PICK_MODE_CONTEXT *ctx,
@@ -84,11 +86,13 @@ void vp9_denoiser_update_frame_stats(MODE_INFO *mi, unsigned int sse,
                                     PICK_MODE_CONTEXT *ctx);

 int vp9_denoiser_realloc_svc(VP9_COMMON *cm, VP9_DENOISER *denoiser,
-                             int refresh_alt, int refresh_gld, int refresh_lst,
-                             int alt_fb_idx, int gld_fb_idx, int lst_fb_idx);
+                             int svc_buf_shift, int refresh_alt,
+                             int refresh_gld, int refresh_lst, int alt_fb_idx,
+                             int gld_fb_idx, int lst_fb_idx);

-int vp9_denoiser_alloc(VP9_COMMON *cm, int use_svc, VP9_DENOISER *denoiser,
-                       int width, int height, int ssx, int ssy,
+int vp9_denoiser_alloc(VP9_COMMON *cm, struct SVC *svc, VP9_DENOISER *denoiser,
+                       int use_svc, int noise_sen, int width, int height,
+                       int ssx, int ssy,
 #if CONFIG_VP9_HIGHBITDEPTH
                       int use_highbitdepth,
 #endif
--- a/vp9/encoder/vp9_encoder.c
+++ b/vp9/encoder/vp9_encoder.c
@@ -437,34 +437,37 @@ static int is_psnr_calc_enabled(VP9_COMP *cpi) {

 /* clang-format off */
 const Vp9LevelSpec vp9_level_defs[VP9_LEVELS] = {
-  { LEVEL_1,   829440,      36864,    200,    400,    2, 1,  4,  8 },
-  { LEVEL_1_1, 2764800,     73728,    800,    1000,   2, 1,  4,  8 },
-  { LEVEL_2,   4608000,     122880,   1800,   1500,   2, 1,  4,  8 },
-  { LEVEL_2_1, 9216000,     245760,   3600,   2800,   2, 2,  4,  8 },
-  { LEVEL_3,   20736000,    552960,   7200,   6000,   2, 4,  4,  8 },
-  { LEVEL_3_1, 36864000,    983040,   12000,  10000,  2, 4,  4,  8 },
-  { LEVEL_4,   83558400,    2228224,  18000,  16000,  4, 4,  4,  8 },
-  { LEVEL_4_1, 160432128,   2228224,  30000,  18000,  4, 4,  5,  6 },
-  { LEVEL_5,   311951360,   8912896,  60000,  36000,  6, 8,  6,  4 },
-  { LEVEL_5_1, 588251136,   8912896,  120000, 46000,  8, 8,  10, 4 },
+  //         sample rate    size   breadth  bitrate  cpb
+  { LEVEL_1,   829440,      36864,    512,   200,    400,    2, 1,  4,  8 },
+  { LEVEL_1_1, 2764800,     73728,    768,   800,    1000,   2, 1,  4,  8 },
+  { LEVEL_2,   4608000,     122880,   960,   1800,   1500,   2, 1,  4,  8 },
+  { LEVEL_2_1, 9216000,     245760,   1344,  3600,   2800,   2, 2,  4,  8 },
+  { LEVEL_3,   20736000,    552960,   2048,  7200,   6000,   2, 4,  4,  8 },
+  { LEVEL_3_1, 36864000,    983040,   2752,  12000,  10000,  2, 4,  4,  8 },
+  { LEVEL_4,   83558400,    2228224,  4160,  18000,  16000,  4, 4,  4,  8 },
+  { LEVEL_4_1, 160432128,   2228224,  4160,  30000,  18000,  4, 4,  5,  6 },
+  { LEVEL_5,   311951360,   8912896,  8384,  60000,  36000,  6, 8,  6,  4 },
+  { LEVEL_5_1, 588251136,   8912896,  8384,  120000, 46000,  8, 8,  10, 4 },
  // TODO(huisu): update max_cpb_size for level 5_2 ~ 6_2 when
  // they are finalized (currently tentative).
-  { LEVEL_5_2, 1176502272,  8912896,  180000, 90000,  8, 8,  10, 4 },
-  { LEVEL_6,   1176502272,  35651584, 180000, 90000,  8, 16, 10, 4 },
-  { LEVEL_6_1, 2353004544u, 35651584, 240000, 180000, 8, 16, 10, 4 },
-  { LEVEL_6_2, 4706009088u, 35651584, 480000, 360000, 8, 16, 10, 4 },
+  { LEVEL_5_2, 1176502272,  8912896,  8384,  180000, 90000,  8, 8,  10, 4 },
+  { LEVEL_6,   1176502272,  35651584, 16832, 180000, 90000,  8, 16, 10, 4 },
+  { LEVEL_6_1, 2353004544u, 35651584, 16832, 240000, 180000, 8, 16, 10, 4 },
+  { LEVEL_6_2, 4706009088u, 35651584, 16832, 480000, 360000, 8, 16, 10, 4 },
 };
 /* clang-format on */

-static const char *level_fail_messages[TARGET_LEVEL_FAIL_IDS] =
-    { "The average bit-rate is too high.",
+static const char *level_fail_messages[TARGET_LEVEL_FAIL_IDS] = {
+  "The average bit-rate is too high.",
  "The picture size is too large.",
+  "The picture width/height is too large.",
  "The luma sample rate is too large.",
  "The CPB size is too large.",
  "The compression ratio is too small",
  "Too many column tiles are used.",
  "The alt-ref distance is too small.",
-      "Too many reference buffers are used." };
+  "Too many reference buffers are used."
+};

 static INLINE void Scale2Ratio(VPX_SCALING mode, int *hr, int *hs) {
  switch (mode) {
@@ -566,6 +569,8 @@ VP9_LEVEL vp9_get_level(const Vp9LevelSpec *const level_spec) {
            (double)this_level->max_luma_sample_rate *
                (1 + SAMPLE_RATE_GRACE_P) ||
        level_spec->max_luma_picture_size > this_level->max_luma_picture_size ||
+        level_spec->max_luma_picture_breadth >
+            this_level->max_luma_picture_breadth ||
        level_spec->average_bitrate > this_level->average_bitrate ||
        level_spec->max_cpb_size > this_level->max_cpb_size ||
        level_spec->compression_ratio < this_level->compression_ratio ||
@@ -1219,8 +1224,8 @@ static void set_tile_limits(VP9_COMP *cpi) {
  }

  if (cpi->oxcf.target_level == LEVEL_AUTO) {
-    const uint32_t pic_size = cpi->common.width * cpi->common.height;
-    const int level_tile_cols = log_tile_cols_from_picsize_level(pic_size);
+    const int level_tile_cols =
+        log_tile_cols_from_picsize_level(cpi->common.width, cpi->common.height);
    if (cm->log2_tile_cols > level_tile_cols) {
      cm->log2_tile_cols = VPXMAX(level_tile_cols, min_log2_tile_cols);
    }
@@ -1848,6 +1853,8 @@ void vp9_change_config(struct VP9_COMP *cpi, const VP9EncoderConfig *oxcf) {
           cm->mi_rows * cm->mi_cols * sizeof(*cpi->consec_zero_mv));
    if (cpi->oxcf.aq_mode == CYCLIC_REFRESH_AQ)
      vp9_cyclic_refresh_reset_resize(cpi);
+    rc->rc_1_frame = 0;
+    rc->rc_2_frame = 0;
  }

  if ((cpi->svc.number_temporal_layers > 1 && cpi->oxcf.rc_mode == VPX_CBR) ||
@@ -1858,6 +1865,24 @@ void vp9_change_config(struct VP9_COMP *cpi, const VP9EncoderConfig *oxcf) {
                                           (int)cpi->oxcf.target_bandwidth);
  }

+  // Check for resetting the rc flags (rc_1_frame, rc_2_frame) if the
+  // configuration change has a large change in avg_frame_bandwidth.
+  // For SVC check for resetting based on spatial layer average bandwidth.
+  // Also reset buffer level to optimal level.
+  if (cm->current_video_frame > 0) {
+    if (cpi->use_svc) {
+      vp9_svc_check_reset_layer_rc_flag(cpi);
+    } else {
+      if (rc->avg_frame_bandwidth > (3 * rc->last_avg_frame_bandwidth >> 1) ||
+          rc->avg_frame_bandwidth < (rc->last_avg_frame_bandwidth >> 1)) {
+        rc->rc_1_frame = 0;
+        rc->rc_2_frame = 0;
+        rc->bits_off_target = rc->optimal_buffer_level;
+        rc->buffer_level = rc->optimal_buffer_level;
+      }
+    }
+  }
+
  cpi->alt_ref_source = NULL;
  rc->is_src_frame_alt_ref = 0;

@@ -2856,18 +2881,26 @@ void vp9_update_reference_frames(VP9_COMP *cpi) {
  if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi) &&
      cpi->denoiser.denoising_level > kDenLowLow) {
    int svc_base_is_key = 0;
+    int denoise_svc_second_layer = 0;
    if (cpi->use_svc) {
      int realloc_fail = 0;
+      const int svc_buf_shift =
+          cpi->svc.number_spatial_layers - cpi->svc.spatial_layer_id == 2
+              ? cpi->denoiser.num_ref_frames
+              : 0;
      int layer = LAYER_IDS_TO_IDX(cpi->svc.spatial_layer_id,
                                   cpi->svc.temporal_layer_id,
                                   cpi->svc.number_temporal_layers);
      LAYER_CONTEXT *lc = &cpi->svc.layer_context[layer];
      svc_base_is_key = lc->is_key_frame;
-
-      // Check if we need to allocate extra buffers in the denoiser for
+      denoise_svc_second_layer =
+          cpi->svc.number_spatial_layers - cpi->svc.spatial_layer_id == 2 ? 1
+                                                                          : 0;
+      // Check if we need to allocate extra buffers in the denoiser
+      // for
      // refreshed frames.
      realloc_fail = vp9_denoiser_realloc_svc(
-          cm, &cpi->denoiser, cpi->refresh_alt_ref_frame,
+          cm, &cpi->denoiser, svc_buf_shift, cpi->refresh_alt_ref_frame,
          cpi->refresh_golden_frame, cpi->refresh_last_frame, cpi->alt_fb_idx,
          cpi->gld_fb_idx, cpi->lst_fb_idx);
      if (realloc_fail)
@@ -2878,7 +2911,8 @@ void vp9_update_reference_frames(VP9_COMP *cpi) {
        &cpi->denoiser, *cpi->Source, cpi->common.frame_type,
        cpi->refresh_alt_ref_frame, cpi->refresh_golden_frame,
        cpi->refresh_last_frame, cpi->alt_fb_idx, cpi->gld_fb_idx,
-        cpi->lst_fb_idx, cpi->resize_pending, svc_base_is_key);
+        cpi->lst_fb_idx, cpi->resize_pending, svc_base_is_key,
+        denoise_svc_second_layer);
  }
 #endif
  if (is_one_pass_cbr_svc(cpi)) {
@@ -3313,8 +3347,9 @@ static void setup_denoiser_buffer(VP9_COMP *cpi) {
  VP9_COMMON *const cm = &cpi->common;
  if (cpi->oxcf.noise_sensitivity > 0 &&
      !cpi->denoiser.frame_buffer_initialized) {
-    if (vp9_denoiser_alloc(cm, cpi->use_svc, &cpi->denoiser, cm->width,
-                           cm->height, cm->subsampling_x, cm->subsampling_y,
+    if (vp9_denoiser_alloc(cm, &cpi->svc, &cpi->denoiser, cpi->use_svc,
+                           cpi->oxcf.noise_sensitivity, cm->width, cm->height,
+                           cm->subsampling_x, cm->subsampling_y,
 #if CONFIG_VP9_HIGHBITDEPTH
                           cm->use_highbitdepth,
 #endif
@@ -4829,6 +4864,7 @@ static void update_level_info(VP9_COMP *cpi, size_t *size, int arf_src_index) {
  int i, idx;
  uint64_t luma_samples, dur_end;
  const uint32_t luma_pic_size = cm->width * cm->height;
+  const uint32_t luma_pic_breadth = VPXMAX(cm->width, cm->height);
  LevelConstraint *const level_constraint = &cpi->level_constraint;
  const int8_t level_index = level_constraint->level_index;
  double cpb_data_size;
@@ -4932,6 +4968,11 @@ static void update_level_info(VP9_COMP *cpi, size_t *size, int arf_src_index) {
    level_spec->max_luma_picture_size = luma_pic_size;
  }

+  // update max_luma_picture_breadth
+  if (luma_pic_breadth > level_spec->max_luma_picture_breadth) {
+    level_spec->max_luma_picture_breadth = luma_pic_breadth;
+  }
+
  // update compression_ratio
  level_spec->compression_ratio = (double)level_stats->total_uncompressed_size *
                                  cm->bit_depth /
@@ -4952,6 +4993,15 @@ static void update_level_info(VP9_COMP *cpi, size_t *size, int arf_src_index) {
                         level_fail_messages[LUMA_PIC_SIZE_TOO_LARGE]);
    }

+    if (level_spec->max_luma_picture_breadth >
+        vp9_level_defs[level_index].max_luma_picture_breadth) {
+      level_constraint->fail_flag |= (1 << LUMA_PIC_BREADTH_TOO_LARGE);
+      vpx_internal_error(&cm->error, VPX_CODEC_ERROR,
+                         "Failed to encode to the target level %d. %s",
+                         vp9_level_defs[level_index].level,
+                         level_fail_messages[LUMA_PIC_BREADTH_TOO_LARGE]);
+    }
+
    if ((double)level_spec->max_luma_sample_rate >
        (double)vp9_level_defs[level_index].max_luma_sample_rate *
            (1 + SAMPLE_RATE_GRACE_P)) {
@@ -5152,8 +5202,6 @@ int vp9_get_compressed_data(VP9_COMP *cpi, unsigned int *frame_flags,
      cm->intra_only = 0;
      // if the flags indicate intra frame, but if the current picture is for
      // non-zero spatial layer, it should not be an intra picture.
-      // TODO(Won Kap): this needs to change if per-layer intra frame is
-      // allowed.
      if ((source->flags & VPX_EFLAG_FORCE_KF) &&
          cpi->svc.spatial_layer_id > cpi->svc.first_spatial_layer_to_encode) {
        source->flags &= ~(unsigned int)(VPX_EFLAG_FORCE_KF);
--- a/vp9/encoder/vp9_encoder.h
+++ b/vp9/encoder/vp9_encoder.h
@@ -383,6 +383,7 @@ typedef struct {
  VP9_LEVEL level;
  uint64_t max_luma_sample_rate;
  uint32_t max_luma_picture_size;
+  uint32_t max_luma_picture_breadth;
  double average_bitrate;  // in kilobits per second
  double max_cpb_size;     // in kilobits
  double compression_ratio;
@@ -422,14 +423,15 @@ typedef struct {

 typedef enum {
  BITRATE_TOO_LARGE = 0,
-  LUMA_PIC_SIZE_TOO_LARGE = 1,
-  LUMA_SAMPLE_RATE_TOO_LARGE = 2,
-  CPB_TOO_LARGE = 3,
-  COMPRESSION_RATIO_TOO_SMALL = 4,
-  TOO_MANY_COLUMN_TILE = 5,
-  ALTREF_DIST_TOO_SMALL = 6,
-  TOO_MANY_REF_BUFFER = 7,
-  TARGET_LEVEL_FAIL_IDS = 8
+  LUMA_PIC_SIZE_TOO_LARGE,
+  LUMA_PIC_BREADTH_TOO_LARGE,
+  LUMA_SAMPLE_RATE_TOO_LARGE,
+  CPB_TOO_LARGE,
+  COMPRESSION_RATIO_TOO_SMALL,
+  TOO_MANY_COLUMN_TILE,
+  ALTREF_DIST_TOO_SMALL,
+  TOO_MANY_REF_BUFFER,
+  TARGET_LEVEL_FAIL_IDS
 } TARGET_LEVEL_FAIL_ID;

 typedef struct {
@@ -868,7 +870,7 @@ static INLINE int is_one_pass_cbr_svc(const struct VP9_COMP *const cpi) {
 static INLINE int denoise_svc(const struct VP9_COMP *const cpi) {
  return (!cpi->use_svc ||
          (cpi->use_svc &&
-           cpi->svc.spatial_layer_id == cpi->svc.number_spatial_layers - 1));
+           cpi->svc.spatial_layer_id >= cpi->svc.first_layer_denoise));
 }
 #endif

@@ -920,10 +922,14 @@ static INLINE int get_level_index(VP9_LEVEL level) {

 // Return the log2 value of max column tiles corresponding to the level that
 // the picture size fits into.
-static INLINE int log_tile_cols_from_picsize_level(uint32_t pic_size) {
+static INLINE int log_tile_cols_from_picsize_level(uint32_t width,
+                                                   uint32_t height) {
  int i;
+  const uint32_t pic_size = width * height;
+  const uint32_t pic_breadth = VPXMAX(width, height);
  for (i = LEVEL_1; i < LEVEL_MAX; ++i) {
-    if (vp9_level_defs[i].max_luma_picture_size > pic_size) {
+    if (vp9_level_defs[i].max_luma_picture_size >= pic_size &&
+        vp9_level_defs[i].max_luma_picture_breadth >= pic_breadth) {
      return get_msb(vp9_level_defs[i].max_col_tiles);
    }
  }
--- a/vp9/encoder/vp9_ethread.c
+++ b/vp9/encoder/vp9_ethread.c
@@ -66,8 +66,8 @@ static int get_max_tile_cols(VP9_COMP *cpi) {
  log2_tile_cols =
      clamp(cpi->oxcf.tile_columns, min_log2_tile_cols, max_log2_tile_cols);
  if (cpi->oxcf.target_level == LEVEL_AUTO) {
-    const uint32_t pic_size = cpi->common.width * cpi->common.height;
-    const int level_tile_cols = log_tile_cols_from_picsize_level(pic_size);
+    const int level_tile_cols =
+        log_tile_cols_from_picsize_level(cpi->common.width, cpi->common.height);
    if (log2_tile_cols > level_tile_cols) {
      log2_tile_cols = VPXMAX(level_tile_cols, min_log2_tile_cols);
    }
--- a/vp9/encoder/vp9_firstpass.c
+++ b/vp9/encoder/vp9_firstpass.c
@@ -44,7 +44,6 @@
 #define COMPLEXITY_STATS_OUTPUT 0

 #define FIRST_PASS_Q 10.0
-#define GF_MAX_BOOST 96.0
 #define INTRA_MODE_PENALTY 1024
 #define MIN_ARF_GF_BOOST 240
 #define MIN_DECAY_FACTOR 0.01
@@ -1949,6 +1948,7 @@ static void accumulate_frame_motion_stats(const FIRSTPASS_STATS *stats,
 }

 #define BASELINE_ERR_PER_MB 12500.0
+#define GF_MAX_BOOST 96.0
 static double calc_frame_boost(VP9_COMP *cpi, const FIRSTPASS_STATS *this_frame,
                               double this_frame_mv_in_out) {
  double frame_boost;
@@ -2383,6 +2383,8 @@ static void adjust_group_arnr_filter(VP9_COMP *cpi, double section_noise,

 // Analyse and define a gf/arf group.
 #define ARF_DECAY_BREAKOUT 0.10
+#define ARF_ABS_ZOOM_THRESH 4.0
+
 static void define_gf_group(VP9_COMP *cpi, FIRSTPASS_STATS *this_frame) {
  VP9_COMMON *const cm = &cpi->common;
  RATE_CONTROL *const rc = &cpi->rc;
@@ -2411,7 +2413,6 @@ static void define_gf_group(VP9_COMP *cpi, FIRSTPASS_STATS *this_frame) {
  double mv_in_out_accumulator = 0.0;
  double abs_mv_in_out_accumulator = 0.0;
  double mv_ratio_accumulator_thresh;
-  double mv_in_out_thresh;
  double abs_mv_in_out_thresh;
  double sr_accumulator = 0.0;
  const double av_err = get_distribution_av_err(cpi, twopass);
@@ -2457,8 +2458,7 @@ static void define_gf_group(VP9_COMP *cpi, FIRSTPASS_STATS *this_frame) {
  // Motion breakout threshold for loop below depends on image size.
  mv_ratio_accumulator_thresh =
      (cpi->initial_height + cpi->initial_width) / 4.0;
-  mv_in_out_thresh = (cpi->initial_height + cpi->initial_width) / 300.0;
-  abs_mv_in_out_thresh = (cpi->initial_height + cpi->initial_width) / 200.0;
+  abs_mv_in_out_thresh = ARF_ABS_ZOOM_THRESH;

  // Set a maximum and minimum interval for the GF group.
  // If the image appears almost completely static we can extend beyond this.
@@ -2543,8 +2543,11 @@ static void define_gf_group(VP9_COMP *cpi, FIRSTPASS_STATS *this_frame) {
      // Update the accumulator for second ref error difference.
      // This is intended to give an indication of how much the coded error is
      // increasing over time.
+      if (i == 1) {
+        sr_accumulator += next_frame.coded_error;
+      } else {
        sr_accumulator += (next_frame.sr_coded_error - next_frame.coded_error);
-      sr_accumulator = VPXMAX(0.0, sr_accumulator);
+      }
    }

    // Break out conditions.
@@ -2559,7 +2562,6 @@ static void define_gf_group(VP9_COMP *cpi, FIRSTPASS_STATS *this_frame) {
            (!flash_detected) &&
            ((mv_ratio_accumulator > mv_ratio_accumulator_thresh) ||
             (abs_mv_in_out_accumulator > abs_mv_in_out_thresh) ||
-             (mv_in_out_accumulator < -mv_in_out_thresh) ||
             (sr_accumulator > next_frame.intra_error)))) {
      break;
    }
--- a/vp9/encoder/vp9_noise_estimate.c
+++ b/vp9/encoder/vp9_noise_estimate.c
@@ -21,6 +21,15 @@
 #include "vp9/encoder/vp9_noise_estimate.h"
 #include "vp9/encoder/vp9_encoder.h"

+#if CONFIG_VP9_TEMPORAL_DENOISING
+// For SVC: only do noise estimation on top spatial layer.
+static INLINE int noise_est_svc(const struct VP9_COMP *const cpi) {
+  return (!cpi->use_svc ||
+          (cpi->use_svc &&
+           cpi->svc.spatial_layer_id == cpi->svc.number_spatial_layers - 1));
+}
+#endif
+
 void vp9_noise_estimate_init(NOISE_ESTIMATE *const ne, int width, int height) {
  ne->enabled = 0;
  ne->level = kLowLow;
@@ -45,7 +54,7 @@ static int enable_noise_estimation(VP9_COMP *const cpi) {
 #endif
 // Enable noise estimation if denoising is on.
 #if CONFIG_VP9_TEMPORAL_DENOISING
-  if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi) &&
+  if (cpi->oxcf.noise_sensitivity > 0 && noise_est_svc(cpi) &&
      cpi->common.width >= 320 && cpi->common.height >= 180)
    return 1;
 #endif
@@ -111,7 +120,7 @@ void vp9_update_noise_estimate(VP9_COMP *const cpi) {
  // Estimate is between current source and last source.
  YV12_BUFFER_CONFIG *last_source = cpi->Last_Source;
 #if CONFIG_VP9_TEMPORAL_DENOISING
-  if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi)) {
+  if (cpi->oxcf.noise_sensitivity > 0 && noise_est_svc(cpi)) {
    last_source = &cpi->denoiser.last_source;
    // Tune these thresholds for different resolutions when denoising is
    // enabled.
@@ -131,7 +140,7 @@ void vp9_update_noise_estimate(VP9_COMP *const cpi) {
      (cpi->svc.number_spatial_layers == 1 &&
       (ne->last_w != cm->width || ne->last_h != cm->height))) {
 #if CONFIG_VP9_TEMPORAL_DENOISING
-    if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi))
+    if (cpi->oxcf.noise_sensitivity > 0 && noise_est_svc(cpi))
      copy_frame(&cpi->denoiser.last_source, cpi->Source);
 #endif
    if (last_source != NULL) {
@@ -146,7 +155,7 @@ void vp9_update_noise_estimate(VP9_COMP *const cpi) {
    ne->count = 0;
    ne->num_frames_estimate = 10;
 #if CONFIG_VP9_TEMPORAL_DENOISING
-    if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi) &&
+    if (cpi->oxcf.noise_sensitivity > 0 && noise_est_svc(cpi) &&
        cpi->svc.current_superframe > 1) {
      vp9_denoiser_set_noise_level(&cpi->denoiser, ne->level);
      copy_frame(&cpi->denoiser.last_source, cpi->Source);
@@ -257,14 +266,14 @@ void vp9_update_noise_estimate(VP9_COMP *const cpi) {
        ne->count = 0;
        ne->level = vp9_noise_estimate_extract_level(ne);
 #if CONFIG_VP9_TEMPORAL_DENOISING
-        if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi))
+        if (cpi->oxcf.noise_sensitivity > 0 && noise_est_svc(cpi))
          vp9_denoiser_set_noise_level(&cpi->denoiser, ne->level);
 #endif
      }
    }
  }
 #if CONFIG_VP9_TEMPORAL_DENOISING
-  if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi))
+  if (cpi->oxcf.noise_sensitivity > 0 && noise_est_svc(cpi))
    copy_frame(&cpi->denoiser.last_source, cpi->Source);
 #endif
 }
--- a/vp9/encoder/vp9_pickmode.c
+++ b/vp9/encoder/vp9_pickmode.c
@@ -1498,6 +1498,9 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
  MV_REFERENCE_FRAME best_second_ref_frame = NONE;
  int comp_modes = 0;
  int num_inter_modes = (cpi->use_svc) ? RT_INTER_MODES_SVC : RT_INTER_MODES;
+  int flag_svc_subpel = 0;
+  int svc_mv_col = 0;
+  int svc_mv_row = 0;

  init_ref_frame_cost(cm, xd, ref_frame_cost);

@@ -1647,6 +1650,18 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
  if (cpi->use_svc || cpi->oxcf.speed <= 7 || bsize < BLOCK_32X32)
    x->sb_use_mv_part = 0;

+  // Set the flag_svc_subpel to 1 for SVC if the lower spatial layer used
+  // an averaging filter for downsampling (phase = 8). If so, we will test
+  // a nonzero motion mode on the spatial (goldeen) reference.
+  // The nonzero motion is half pixel shifted to left and top (-4, -4).
+  if (cpi->use_svc && cpi->svc.spatial_layer_id > 0 &&
+      svc_force_zero_mode[GOLDEN_FRAME - 1] &&
+      cpi->svc.downsample_filter_phase[cpi->svc.spatial_layer_id - 1] == 8) {
+    svc_mv_col = -4;
+    svc_mv_row = -4;
+    flag_svc_subpel = 1;
+  }
+
  for (idx = 0; idx < num_inter_modes + comp_modes; ++idx) {
    int rate_mv = 0;
    int mode_rd_thresh;
@@ -1660,6 +1675,7 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
    int inter_mv_mode = 0;
    int skip_this_mv = 0;
    int comp_pred = 0;
+    int force_gf_mv = 0;
    PREDICTION_MODE this_mode;
    second_ref_frame = NONE;

@@ -1680,6 +1696,22 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
      comp_pred = 1;
    }

+    if (ref_frame > usable_ref_frame) continue;
+    if (skip_ref_find_pred[ref_frame]) continue;
+
+    if (flag_svc_subpel && ref_frame == GOLDEN_FRAME) {
+      force_gf_mv = 1;
+      // Only test mode if NEARESTMV/NEARMV is (svc_mv_col, svc_mv_row),
+      // otherwise set NEWMV to (svc_mv_col, svc_mv_row).
+      if (this_mode == NEWMV) {
+        frame_mv[this_mode][ref_frame].as_mv.col = svc_mv_col;
+        frame_mv[this_mode][ref_frame].as_mv.row = svc_mv_row;
+      } else if (frame_mv[this_mode][ref_frame].as_mv.col != svc_mv_col ||
+                 frame_mv[this_mode][ref_frame].as_mv.row != svc_mv_row) {
+        continue;
+      }
+    }
+
    if (comp_pred) {
      const struct segmentation *const seg = &cm->seg;
      if (!cpi->allow_comp_inter_inter) continue;
@@ -1690,9 +1722,6 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
      if (segfeature_active(seg, mi->segment_id, SEG_LVL_REF_FRAME)) continue;
    }

-    if (ref_frame > usable_ref_frame) continue;
-    if (skip_ref_find_pred[ref_frame]) continue;
-
    // For SVC, skip the golden (spatial) reference search if sse of zeromv_last
    // is below threshold.
    if (cpi->use_svc && ref_frame == GOLDEN_FRAME &&
@@ -1737,7 +1766,7 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
    // Skip non-zeromv mode search for golden frame if force_skip_low_temp_var
    // is set. If nearestmv for golden frame is 0, zeromv mode will be skipped
    // later.
-    if (force_skip_low_temp_var && ref_frame == GOLDEN_FRAME &&
+    if (!force_gf_mv && force_skip_low_temp_var && ref_frame == GOLDEN_FRAME &&
        frame_mv[this_mode][ref_frame].as_int != 0) {
      continue;
    }
@@ -1751,7 +1780,7 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
    }

    if (cpi->use_svc) {
-      if (svc_force_zero_mode[ref_frame - 1] &&
+      if (!force_gf_mv && svc_force_zero_mode[ref_frame - 1] &&
          frame_mv[this_mode][ref_frame].as_int != 0)
        continue;
    }
@@ -1808,7 +1837,7 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
                             &rd_thresh_freq_fact[mode_index])))
      continue;

-    if (this_mode == NEWMV) {
+    if (this_mode == NEWMV && !force_gf_mv) {
      if (ref_frame > LAST_FRAME && !cpi->use_svc &&
          cpi->oxcf.rc_mode == VPX_CBR) {
        int tmp_sad;
@@ -1949,7 +1978,7 @@ void vp9_pick_inter_mode(VP9_COMP *cpi, MACROBLOCK *x, TileDataEnc *tile_data,
    if ((this_mode == NEWMV || filter_ref == SWITCHABLE) &&
        pred_filter_search &&
        (ref_frame == LAST_FRAME ||
-         (ref_frame == GOLDEN_FRAME &&
+         (ref_frame == GOLDEN_FRAME && !force_gf_mv &&
          (cpi->use_svc || cpi->oxcf.rc_mode == VPX_VBR))) &&
        (((mi->mv[0].as_mv.row | mi->mv[0].as_mv.col) & 0x07) != 0)) {
      int pf_rate[3];
--- a/vp9/encoder/vp9_ratectrl.c
+++ b/vp9/encoder/vp9_ratectrl.c
@@ -1488,6 +1488,8 @@ void vp9_rc_postencode_update(VP9_COMP *cpi, uint64_t bytes_used) {
    cpi->rc.last_frame_is_src_altref = cpi->rc.is_src_frame_alt_ref;
  }
  if (cm->frame_type != KEY_FRAME) rc->reset_high_source_sad = 0;
+
+  rc->last_avg_frame_bandwidth = rc->avg_frame_bandwidth;
 }

 void vp9_rc_postencode_update_drop_frame(VP9_COMP *cpi) {
@@ -1873,9 +1875,12 @@ void vp9_rc_set_gf_interval_range(const VP9_COMP *const cpi,

    if (oxcf->target_level == LEVEL_AUTO) {
      const uint32_t pic_size = cpi->common.width * cpi->common.height;
+      const uint32_t pic_breadth =
+          VPXMAX(cpi->common.width, cpi->common.height);
      int i;
      for (i = LEVEL_1; i < LEVEL_MAX; ++i) {
-        if (vp9_level_defs[i].max_luma_picture_size > pic_size) {
+        if (vp9_level_defs[i].max_luma_picture_size >= pic_size &&
+            vp9_level_defs[i].max_luma_picture_breadth >= pic_breadth) {
          if (rc->min_gf_interval <=
              (int)vp9_level_defs[i].min_altref_distance) {
            rc->min_gf_interval =
--- a/vp9/encoder/vp9_ratectrl.h
+++ b/vp9/encoder/vp9_ratectrl.h
@@ -152,6 +152,8 @@ typedef struct {
  int rc_2_frame;
  int q_1_frame;
  int q_2_frame;
+  // Keep track of the last target average frame bandwidth.
+  int last_avg_frame_bandwidth;

  // Auto frame-scaling variables.
  FRAME_SCALE_LEVEL frame_size_selector;
--- a/vp9/encoder/vp9_svc_layercontext.c
+++ b/vp9/encoder/vp9_svc_layercontext.c
@@ -37,6 +37,7 @@ void vp9_init_layer_context(VP9_COMP *const cpi) {
  svc->scaled_one_half = 0;
  svc->current_superframe = 0;
  svc->non_reference_frame = 0;
+
  for (i = 0; i < REF_FRAMES; ++i) svc->ref_frame_index[i] = -1;
  for (sl = 0; sl < oxcf->ss_number_layers; ++sl) {
    svc->ext_frame_flags[sl] = 0;
@@ -389,7 +390,7 @@ int vp9_is_upper_layer_key_frame(const VP9_COMP *const cpi) {
             .is_key_frame;
 }

-static void get_layer_resolution(const int width_org, const int height_org,
+void get_layer_resolution(const int width_org, const int height_org,
                          const int num, const int den, int *width_out,
                          int *height_out) {
  int w, h;
@@ -656,9 +657,9 @@ int vp9_one_pass_cbr_svc_start_layer(VP9_COMP *const cpi) {
                       lc->scaling_factor_num, lc->scaling_factor_den, &width,
                       &height);

-  // For resolutions <= QVGA: set phase of the filter = 8 (for symmetric
+  // For resolutions <= VGA: set phase of the filter = 8 (for symmetric
  // averaging filter), use bilinear for now.
-  if (width * height <= 320 * 240) {
+  if (width * height <= 640 * 480) {
    cpi->svc.downsample_filter_type[cpi->svc.spatial_layer_id] = BILINEAR;
    cpi->svc.downsample_filter_phase[cpi->svc.spatial_layer_id] = 8;
  }
@@ -861,3 +862,28 @@ void vp9_svc_reset_key_frame(VP9_COMP *const cpi) {
  vp9_update_temporal_layer_framerate(cpi);
  vp9_restore_layer_context(cpi);
 }
+
+void vp9_svc_check_reset_layer_rc_flag(VP9_COMP *const cpi) {
+  SVC *svc = &cpi->svc;
+  int sl, tl;
+  for (sl = 0; sl < svc->number_spatial_layers; ++sl) {
+    // Check for reset based on avg_frame_bandwidth for spatial layer sl.
+    int layer = LAYER_IDS_TO_IDX(sl, svc->number_temporal_layers - 1,
+                                 svc->number_temporal_layers);
+    LAYER_CONTEXT *lc = &svc->layer_context[layer];
+    RATE_CONTROL *lrc = &lc->rc;
+    if (lrc->avg_frame_bandwidth > (3 * lrc->last_avg_frame_bandwidth >> 1) ||
+        lrc->avg_frame_bandwidth < (lrc->last_avg_frame_bandwidth >> 1)) {
+      // Reset for all temporal layers with spatial layer sl.
+      for (tl = 0; tl < svc->number_temporal_layers; ++tl) {
+        int layer = LAYER_IDS_TO_IDX(sl, tl, svc->number_temporal_layers);
+        LAYER_CONTEXT *lc = &svc->layer_context[layer];
+        RATE_CONTROL *lrc = &lc->rc;
+        lrc->rc_1_frame = 0;
+        lrc->rc_2_frame = 0;
+        lrc->bits_off_target = lrc->optimal_buffer_level;
+        lrc->buffer_level = lrc->optimal_buffer_level;
+      }
+    }
+  }
+}
--- a/vp9/encoder/vp9_svc_layercontext.h
+++ b/vp9/encoder/vp9_svc_layercontext.h
@@ -49,7 +49,7 @@ typedef struct {
  uint8_t speed;
 } LAYER_CONTEXT;

-typedef struct {
+typedef struct SVC {
  int spatial_layer_id;
  int temporal_layer_id;
  int number_spatial_layers;
@@ -99,6 +99,8 @@ typedef struct {

  BLOCK_SIZE *prev_partition_svc;
  int mi_stride[VPX_MAX_LAYERS];
+
+  int first_layer_denoise;
 } SVC;

 struct VP9_COMP;
@@ -128,6 +130,10 @@ void vp9_save_layer_context(struct VP9_COMP *const cpi);
 // Initialize second pass rc for spatial svc.
 void vp9_init_second_pass_spatial_svc(struct VP9_COMP *cpi);

+void get_layer_resolution(const int width_org, const int height_org,
+                          const int num, const int den, int *width_out,
+                          int *height_out);
+
 // Increment number of video frames in layer
 void vp9_inc_frame_in_layer(struct VP9_COMP *const cpi);

@@ -148,6 +154,8 @@ void vp9_free_svc_cyclic_refresh(struct VP9_COMP *const cpi);

 void vp9_svc_reset_key_frame(struct VP9_COMP *const cpi);

+void vp9_svc_check_reset_layer_rc_flag(struct VP9_COMP *const cpi);
+
 #ifdef __cplusplus
 }  // extern "C"
 #endif
--- a/vp9/encoder/x86/vp9_error_avx2.c
+++ b/vp9/encoder/x86/vp9_error_avx2.c
@@ -1,7 +1,7 @@
 /*
 *  Copyright (c) 2014 The WebM project authors. All Rights Reserved.
 *
- *  Usee of this source code is governed by a BSD-style license
+ *  Use of this source code is governed by a BSD-style license
 *  that can be found in the LICENSE file in the root of the source
 *  tree. An additional intellectual property rights grant can be found
 *  in the file PATENTS.  All contributing project authors may
--- a/vp9/encoder/x86/vp9_quantize_ssse3_x86_64.asm
+++ b/vp9/encoder/x86/vp9_quantize_ssse3_x86_64.asm
@@ -169,7 +169,7 @@ cglobal quantize_%1, 0, %2, 15, coeff, ncoeff, skip, round, quant, \
  pshuflw                         m7, m8, 0x1
  pmaxsw                          m8, m7
  pextrw                          r6, m8, 0
-  mov                           [r2], r6
+  mov                           [r2], r6w
  RET
 %endmacro

--- a/vpx/src/vpx_encoder.c
+++ b/vpx/src/vpx_encoder.c
@@ -12,8 +12,11 @@
 * \brief Provides the high level interface to wrap encoder algorithms.
 *
 */
+#include <assert.h>
 #include <limits.h>
+#include <stdlib.h>
 #include <string.h>
+#include "vp8/common/blockd.h"
 #include "vpx_config.h"
 #include "vpx/internal/vpx_codec_internal.h"

@@ -81,6 +84,8 @@ vpx_codec_err_t vpx_codec_enc_init_multi_ver(
    int i;
    void *mem_loc = NULL;

+    if (iface->enc.mr_get_mem_loc == NULL) return VPX_CODEC_INCAPABLE;
+
    if (!(res = iface->enc.mr_get_mem_loc(cfg, &mem_loc))) {
      for (i = 0; i < num_enc; i++) {
        vpx_codec_priv_enc_mr_cfg_t mr_cfg;
@@ -89,9 +94,7 @@ vpx_codec_err_t vpx_codec_enc_init_multi_ver(
        if (dsf->num < 1 || dsf->num > 4096 || dsf->den < 1 ||
            dsf->den > dsf->num) {
          res = VPX_CODEC_INVALID_PARAM;
-          break;
-        }
-
+        } else {
          mr_cfg.mr_low_res_mode_info = mem_loc;
          mr_cfg.mr_total_resolutions = num_enc;
          mr_cfg.mr_encoder_id = num_enc - 1 - i;
@@ -110,6 +113,7 @@ vpx_codec_err_t vpx_codec_enc_init_multi_ver(
          ctx->init_flags = flags;
          ctx->config.enc = cfg;
          res = ctx->iface->init(ctx, &mr_cfg);
+        }

        if (res) {
          const char *error_detail = ctx->priv ? ctx->priv->err_detail : NULL;
@@ -124,10 +128,14 @@ vpx_codec_err_t vpx_codec_enc_init_multi_ver(
            vpx_codec_destroy(ctx);
            i--;
          }
+#if CONFIG_MULTI_RES_ENCODING
+          assert(mem_loc);
+          free(((LOWER_RES_FRAME_INFO *)mem_loc)->mb_info);
+          free(mem_loc);
+#endif
+          return SAVE_STATUS(ctx, res);
        }

-        if (res) break;
-
        ctx++;
        cfg++;
        dsf++;
--- a/vpx/vp8cx.h
+++ b/vpx/vp8cx.h
@@ -408,7 +408,7 @@ enum vp8e_enc_control_id {

  /*!\brief Codec control function to set noise sensitivity.
   *
-   *  0: off, 1: On(YOnly)
+   *  0: off, 1: On(YOnly), 2: For SVC only, on top two spatial layers(YOnly)
   *
   * Supported in codecs: VP9
   */
--- a/vpx/vp8dx.h
+++ b/vpx/vp8dx.h
@@ -179,6 +179,8 @@ VPX_CTRL_USE_TYPE(VP9_INVERT_TILE_DECODE_ORDER, int)
 #define VPX_CTRL_VP9_INVERT_TILE_DECODE_ORDER
 #define VPX_CTRL_VP9_DECODE_SVC_SPATIAL_LAYER
 VPX_CTRL_USE_TYPE(VP9_DECODE_SVC_SPATIAL_LAYER, int)
+#define VPX_CTRL_VP9_SET_SKIP_LOOP_FILTER
+VPX_CTRL_USE_TYPE(VP9_SET_SKIP_LOOP_FILTER, int)

 /*!\endcond */
 /*! @} - end defgroup vp8_decoder */
--- a/vpx_dsp/vpx_dsp_rtcd_defs.pl
+++ b/vpx_dsp/vpx_dsp_rtcd_defs.pl
@@ -1,3 +1,13 @@
+##
+##  Copyright (c) 2017 The WebM project authors. All Rights Reserved.
+##
+##  Use of this source code is governed by a BSD-style license
+##  that can be found in the LICENSE file in the root of the source
+##  tree. An additional intellectual property rights grant can be found
+##  in the file PATENTS.  All contributing project authors may
+##  be found in the AUTHORS file in the root of the source tree.
+##
+
 sub vpx_dsp_forward_decls() {
 print <<EOF
 /*
--- a/vpx_scale/vpx_scale_rtcd.pl
+++ b/vpx_scale/vpx_scale_rtcd.pl
@@ -1,3 +1,13 @@
+##
+##  Copyright (c) 2017 The WebM project authors. All Rights Reserved.
+##
+##  Use of this source code is governed by a BSD-style license
+##  that can be found in the LICENSE file in the root of the source
+##  tree. An additional intellectual property rights grant can be found
+##  in the file PATENTS.  All contributing project authors may
+##  be found in the AUTHORS file in the root of the source tree.
+##
+
 sub vpx_scale_forward_decls() {
 print <<EOF
 struct yv12_buffer_config;
Author	SHA1	Message	Date
Johann	f80be22a10	Release 1.7.0 Mandarin Duck Change-Id: I186440f3643a85694f45400393efb661f6d012fc	2018-01-24 14:25:44 -08:00
James Zern	81d66e2cc6	vpx_codec_enc_init_multi: fix segfault w/vp9 vp9 does not support multi-res encoding, the request should not crash. + encode_api_test: unconditionally expose multi-res test vpx_codec_enc_init_multi should fail independent of CONFIG_MULTI_RES_ENCODING if not for the same reason. Change-Id: I44fc58ef70ee4e0e482cb6a5736885f4cb2a8517 (cherry picked from commit `004fb91416`)	2018-01-23 18:14:23 -08:00
Jerome Jiang	9f36419bf2	Fix crash invalid params for vp8 multres. Add test. Fix is from the patch in the issue. Release memories allocated before early exit. BUG=webm:1482 Change-Id: I64952af99c58241496e03fa55da09fd129a07c77 (cherry picked from commit `5b6ae020b6`)	2018-01-23 18:14:14 -08:00
Johann Koenig	740883897a	Merge "Revert "Add frame width & height to frame pkt. Add test."" into mandarinduck	2018-01-17 20:17:33 +00:00
Johann	f87a4594fb	Revert "Add frame width & height to frame pkt. Add test." This reverts commit `bd1d995cd3`. Remove the feature from the release as it requires additional work. BUG=webm:1485 Change-Id: I1a01ac2525703af97a456a3eed85718306c0f734	2018-01-17 11:28:24 -08:00
Vignesh Venkatasubramanian	eedda5f924	vp8dx.h: Add macro for skipping loop filter Without this applications cannot use the vpx_codec_control macro for VP9_SET_SKIP_LOOP_FILTER. The tests only cover the underscored version vpx_codec_control_(). Change-Id: I3e6c1888307b76636fdc1a8deae70b5c14238163 (cherry picked from commit `373e08f921`)	2018-01-16 18:27:42 -08:00
paulwilkins	7d19739949	Fix bug in use of zoom metric as part of arf breakout. The in/out (or zoom metrics) in accumulate_frame_motion_stats() are in effect a % of the blocks that have a motion vector pointing either towards or away from the center. As such they are already normalized in terms of image size and the thresholds against which these are tested should be image size independent. In practice a zoom either in or out is an indicator for a shorter group length so the abs value is more important as a breakout clause. This patch fixes the threshold test. Clips without noticeable zoom show no effect but some with strong zooms such as "station" show a big gain (5-10%). Average psnr-hvs gain on hdres set was 0.292% Change-Id: I4f97a72b0e273e4e844ade15285749c32cd81c1c (cherry picked from commit `0226ce79e9`)	2018-01-16 11:21:29 -08:00
Johann	c5dc3373db	work around pic issue with gcc 6 Enable pic when building sse2 or higher optimizations. BUG=webm:1464 Change-Id: I36c6e83ed716649f3d9ee10ce3aa9bb847cac2d9	2018-01-09 12:46:45 -08:00
Scott LaVarnway	8a4336ed2e	Add vp9_quantize_fp_nz_c() -- 2 This c version uses the shortcuts found in the x86 vp9_quantize_fp functions. The test was updated to use the correct quant/round range. Change-Id: Ie5871f710d9eb39047d8d9f48b907c0633e1f830	2017-12-21 15:26:36 -08:00
James Zern	1a7bf0d1f9	Merge "vp9_quantize_ssse3_x86_64: fix out of bounds write"	2017-12-21 23:02:32 +00:00
Ralph Giles	117893a717	Don't force inlining for msvc targets. INLINE is defined as __forceinline for vs* configs, but is the normal, compiler-discretion inline for gcc/clang configs. This makes many functions very large when building for windows targets, much larger than they are elsewhere. Use '__inline' as a consistent definition to get consistent function sizes. Although Visual Studio documentation says that 'inline' is only available in C+ code. This is probably incorrect, since Visual Studio 2017 accepts C99 'inline' even when passed /TC. Nevertheless, this commit uses the recommended '__inline' for consistency. Thanks to David Major for the diagnosis. Change-Id: Ib0b31a3afcea77822c84fe3c6cd452add66d825a	2017-12-21 13:55:18 -08:00
James Zern	84a7263d4c	vp9_quantize_ssse3_x86_64: fix out of bounds write eob is a pointer to a uint16_t. previously the code would store 64-bits causing a crash or test failure with the right stack layout. Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a	2017-12-21 16:53:14 -05:00
James Zern	7a245adb18	Revert "Add vp9_quantize_fp_nz_c()" This reverts commit `86842855d3`. SSSE3/VP9QuantizeTest.EOBCheck/1 fails on Mac and the build breaks under visual studio due to a #if within another macro. Change-Id: I475095a04aafcc714fade2b24e4df7b682be2cd1	2017-12-21 06:05:19 -08:00
Scott LaVarnway	de50e8052c	Merge "Add vp9_quantize_fp_nz_c()"	2017-12-20 23:15:11 +00:00
James Zern	1a9c7bee88	Merge "lpf_test: correct threshold ranges"	2017-12-20 20:22:34 +00:00
Marco	9ca9c12dbd	vp9-svc: Add layer bitrate targeting to SVC datarate tests. Modify and update the SVC datarate unittests to verify the rate targeting for each spatial-temporal layer. The current tests were only verifying the rate targeting of the full SVC stream, not individual layers. Also re-enabled a test that was disabled. This is a stronger verification of the layered rate control for SVC for 1 pass CBR encoding. Added PostEncodeFrameHook, needed to get the layer_id and update the layer buffer level. Change-Id: I9fd54ad474686b20a6de3250d587e2cec194a56f	2017-12-19 19:48:47 -08:00
Scott LaVarnway	86842855d3	Add vp9_quantize_fp_nz_c() This c version uses the shortcuts found in the x86 vp9_quantize_fp functions. The test was updated to use the correct quant/round range. Change-Id: I5d19f8af2fddda8e50910249eafb740acb29415b	2017-12-19 12:48:45 -08:00
Marco	a2127236ae	vp9: Reset buffer level on large bitrate changes. For a large change in the target avg_frame_bandwidth, via the update in change_config()), reset the buffer_level to optimal_level. This fix prevents possible frame drops, where for example, encoder suddenly goes from lower to higher bitrate. Change-Id: I2f844c41d04c01240e85f574e59d2b9075c7eb6d	2017-12-19 09:57:21 -08:00
James Zern	5203b40a2a	lpf_test: correct threshold ranges the random number generator creates values from [0, range) add 1 to all and make hev more realistic by mirroring its calculation of level >> 4, i.e., [0, 3] Change-Id: Ic19be5d7ba668deb17c96f143b739116a4b5d21c	2017-12-18 23:17:45 -08:00
Shiyou Yin	08a668af32	vp8: [loongson] optimize loopfilter v2. Optimize function vp8_mbloop_filter_vertical_edge_mmi and function vp8_mbloop_filter_horizontal_edge_mmi. Make full use of memory loading delay slot and reduce unnecessary instructions. Change-Id: I61da2c3a44c06044225461f46bf487d83cba6c16	2017-12-15 17:06:47 +08:00
Shiyou Yin	09519a55c7	Merge "vp8: [loongson] optimize sixtab predict v2."	2017-12-15 00:53:21 +00:00
Johann Koenig	7970cc02df	Merge "add copyright to rtcd files"	2017-12-14 23:44:30 +00:00
Johann Koenig	d95ddc7c71	Merge "mark generated version header"	2017-12-14 23:44:04 +00:00
Johann	e4b3f03c64	add copyright to rtcd files Allows them to pass the license check in chromium. BUG=chromium:98319 Change-Id: Iefc1706152a549d8c4ae774c917596bf1c9492d8	2017-12-14 22:50:08 +00:00
Johann Koenig	7d1bf5d12a	Merge "remove unused tools"	2017-12-14 21:19:59 +00:00
Johann Koenig	9f8433ffe2	Merge "fix typo in boilerplate"	2017-12-14 21:19:47 +00:00
Johann	920ba82409	remove unused tools all_builds.py has been more or less replaced by Jenkins. author_first_release.sh is unused. ftfy.sh has been obviated by having the whole tree clang-format clean. Change-Id: I741315ad9042e6e901f07410e93f28371db703b2	2017-12-14 20:34:14 +00:00
Johann	fe4de1ff63	mark generated version header Allows it to pass the license check in chromium. BUG=chromium:98319 Change-Id: I5ba9c8c81ab9eb4168df09db9d2eab846e99e981	2017-12-14 11:58:10 -08:00
Johann	6746ba6d01	fix typo in boilerplate The extra 'e' was causing the chromium license check to flag this file. BUG=chromium:98319 Change-Id: Ic875ba66370298bf998438d14ff5f7e760293706	2017-12-14 11:54:16 -08:00
Johann	05e6e9ac83	mark generated rtcd headers Allows them to pass the license check in chromium. BUG=chromium:98319 Change-Id: Ib37bf45bdac8cf1edc62037dea17b734a5e37fa7	2017-12-14 11:48:46 -08:00
Shiyou Yin	f2ad523461	vp8: [loongson] optimize sixtab predict v2. 1. Delete unnecessary zero setting process. 2. Optimize the method of calculating SSE in vpx_varianceWxH. Change-Id: I8bab801416e7f4958c28c6d080e3cf785a50f82b	2017-12-14 16:29:58 +08:00
Marco	c58f01724c	vp9: Update to SVC datarate tests. With recent fixes to rate control for SVC the buffer underrun in the tests does not happen, so comment and TODO can be removed. Also, in some of these SVC tests, replace the HD clip with the corresponding VGA clip, which has > 400 frames. For the (niklas) HD clip: it has only 60 frames but the test was running up to 300 frames. Fixed it to 60 frames. Keep some tests with the HD clip, needed for the 4 thread and 5 level scaling test. Change-Id: I0a2356a908e8b2271c7a422eb8b15c0d56eec968	2017-12-13 14:07:52 -08:00
Marco Paniconi	028429310a	Merge "vp9: Reset rc flags on some configuration changes."	2017-12-13 21:03:40 +00:00
Marco	e9ad5d2aee	vp9: Cleanup/remove TODO comment. Change-Id: I2bd43e996909ad688b7e00b81ee19a5fc4df460b	2017-12-13 11:30:09 -08:00
Marco	a40fa1f95d	vp9: Reset rc flags on some configuration changes. For large dynamic changes in target avg_frame_bandwidth, or a change in resolution, via the update in change_config()), reset the under/overshoot flags (rc_1_frame, rc_2_frame) to prevent constraining the QP for the first few frames following the change. For SVC use the spatial stream avg_frame_bandwidth in reset condition. For the avg_frame_bandwidth condition, use fairly large threshold (~50%) for now in reset. This allows for better/faster QP response if, for example, application dynamically changes bitrate by large amount. Change-Id: Ib6e3761732d956949d79c9247e50dba744a535c0	2017-12-13 10:41:38 -08:00
Paul Wilkins	94eaecaa91	Merge "Bug fix for second reference stats."	2017-12-12 11:56:10 +00:00
Jerome Jiang	f9ecdc35ec	Merge "vp9 svc: Allow denoising next to highest resolution."	2017-12-12 05:27:11 +00:00
Jerome Jiang	c1e511fd82	vp9 svc: Allow denoising next to highest resolution. Denoise 2 spatial layes at most. Add noise sensitivity level 2 for vp9 such that applications can control whether to denoise the second highest spatial layer. Add tests to cover this case. Change-Id: Ic327d14b29adeba3f0dae547629f43b98d22997f	2017-12-11 15:20:19 -08:00
Jerome Jiang	a1689ed16b	Merge "Fix build warnings for gcc 6.3"	2017-12-11 18:27:17 +00:00
paulwilkins	f1ce050f44	Bug fix for second reference stats. Immediately following a key frame the trailing second reference error in the first pass stats will be based on a reference frame from the prior key frame group and will thus usually be much larger. This fix eliminates that effect (which typically triggers a short arf group immediately after a key frame). It also changes the accounting for the first frame in each new arf group. This change gives large gains on a couple of clips that contain mid sequence key frames (e.g. 6% on 1080P tennis). Overall there was a net gain in PSNR and PSNR-HVS ~(0.05- 0.4%) and mixed results for SSIM (+/- 0.2%). Change-Id: I8e00538ac2c0b5c2e7e637903cac329ce5c2a375	2017-12-08 10:05:36 +00:00
Jerome Jiang	2a602f745d	Fix build warnings for gcc 6.3 Clean up some alias. BUG=webm:1465 Change-Id: I99e186162db9f9e15375fef01564692434eda619	2017-12-07 13:42:10 -08:00
Jerome Jiang	14dbdd95e6	Merge "Add frame width & height to frame pkt. Add test."	2017-12-06 22:37:15 +00:00
Jerome Jiang	bd1d995cd3	Add frame width & height to frame pkt. Add test. Used to return correct frame width and height when dynamic resizing happens. BUG=webm:1474 Change-Id: Ia2043f7e1635b3821848a67b9b134f47f14b0f3a	2017-12-06 13:55:18 -08:00
Marco	3562d6b0a2	vp9-svc: Set downsampling filter for VGA layer. Downsampling filter for SVC was set to subsample (phase 0) for HD -> VGA, and bilinear averaging (phase 8) for VGA -> QVGA. This change makes it bilinear averaging for HD -> VGA. Given the recent commit `9f9d4f8`, quality is improved with this change: avgPSNR/SSIM up ~1-3% on HD clips in RTC set. Speed decrease of ~1% for 3 layer SVC. Change-Id: If834a320e372b8b922a6bf7cab4227703b1beae6	2017-12-06 12:01:24 -08:00
Marco Paniconi	575c1933ea	Merge "vp9: Nonrd-pickmode: move some early exits up."	2017-12-06 19:18:51 +00:00
Hui Su	2e44f16443	Merge "Add max luma picture width/height constraint in VP9 level"	2017-12-06 18:46:19 +00:00
Marco	33953f310e	vp9: Nonrd-pickmode: move some early exits up. Move the early exit checks on usable_ref_frame and skip_ref_find_pref up before the check on flag_svc_subpel. The code under flag_svc_subpel requires frame_mv to be set for the golden/spatial reference, which is only set if the both those exits don't pass. No change in behavior. Change-Id: Id304276c745eeb389ff85fa2dcf510d5976bc413	2017-12-06 10:18:44 -08:00
Marco	9f9d4f8dc9	vp9-svc: Allow for nonzero motion on spatial reference. For nonrd pickmode on a given spatial layer, the spatial (golden) reference was always only using zeromv for prediction. In this patch if the downsampling filter used for generating the lower spatial layer is an averaging filter (nonzero phase), we allow for subpel motion on the spatial (golden) reference to compensate for the shift. This is done by forcing the testing of nonzero motion mode to compensate for spatial downsampling shift. Improvement for cases where the downsampling is averaging filter. In the current code this is only done for generating resolutions <= QVGA. Improvement for avgPSNR/SSIM on RTC set for speed 7: ~1.2%. Gain is larger (~2-3%) for VGA clips with 2 spatial layers. ~1% speed slowdown for 3 layer SVC on mac. Change-Id: I9ec4fa20a38947934fc650594596c25280c3b289	2017-12-05 22:41:07 -08:00
Hui Su	07b12aad77	Add max luma picture width/height constraint in VP9 level BUG=b/65412009 Change-Id: I9e1478dcbd2ef9e97f5f8fb5a1c733b5f5cdf396	2017-12-01 16:29:40 -08:00