Values of MODE_UPDATE_PROB and VP9_COEF_UPDATE_PROB are equal, so replacing
them with one constant. Inlining appropriate arguments for functions:
vp9_cond_prob_diff_update (encoder)
vp9_diff_update_prob (decoder)
Change-Id: I1255a1cb477743b799b3bfbbcd8de6b32b067338
Converts the constant rddiv parameter to 128 (from 100) and
implements RDCOST with bit-shift rather than multiplication.
Other parameters are also adjusted to roughly keep the same
balance between Rate and Distortion.
There is a slight speed-up of about 0.5-1% (at speed 0) as
tested on football_cif.
There is a slight change in performance due to small change
in the parameters.
derfraw300: +0.033%
stdhdraw250: +0.102%
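A minimal sketch of the idea, not the actual RDCOST macro: when the
distortion weight is a power of two (128 = 1 << 7), the multiplication
can be replaced with a left shift. The function names and the exact form
of the rate term are assumptions made for illustration.

```c
#include <stdint.h>

/* Rate term shared by both variants: rate * multiplier, rounded and
 * scaled down by 256. The exact form is an assumption of this sketch. */
static int64_t sketch_rate_term(int rdmult, int rate) {
  return (128 + (int64_t)rate * rdmult) >> 8;
}

/* Before: distortion weighted by an arbitrary divisor such as 100,
 * which needs a multiplication. */
int64_t sketch_rdcost_mul(int rdmult, int rddiv, int rate, int64_t dist) {
  return sketch_rate_term(rdmult, rate) + rddiv * dist;
}

/* After: the weight is 2^rddiv_bits (7 gives 128), so a shift suffices.
 * dist is assumed to be non-negative. */
int64_t sketch_rdcost_shift(int rdmult, int rddiv_bits, int rate,
                            int64_t dist) {
  return sketch_rate_term(rdmult, rate) + (dist << rddiv_bits);
}
```

With rddiv_bits = 7 the shifted form gives the same cost as multiplying
by 128, which is why the other parameters only need rebalancing against
the old value of 100.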
Change-Id: I70ac69f58fa71c83108f68fe41796cd19d1fc760
This commit masks the intra prediction modes available for testing
based on the prediction block size.
With this patch, encoding time at CpuUsed 2 is reduced by 10% to 20% for
HD clips, with a compression drop of 0.2%.
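A rough sketch of per-block-size mode masking, assuming one bit per
intra mode. The enums, the table contents and the function name are
hypothetical and are not the libvpx definitions; which modes are masked
for which block size is purely illustrative.

```c
#include <stdint.h>

/* Hypothetical intra modes and block sizes for this sketch only. */
enum sk_mode { SK_DC, SK_V, SK_H, SK_D45, SK_TM, SK_MODES };
enum sk_bsize { SK_8X8, SK_16X16, SK_32X32, SK_64X64, SK_BSIZES };

#define SK_BIT(m) (1u << (m))
#define SK_ALL_MODES (SK_BIT(SK_MODES) - 1u)

/* One mask per block size: a set bit means the mode is tested in the
 * rate-distortion loop. Larger blocks test fewer modes here. */
static const uint32_t sk_intra_mode_mask[SK_BSIZES] = {
  SK_ALL_MODES,
  SK_ALL_MODES,
  SK_BIT(SK_DC) | SK_BIT(SK_V) | SK_BIT(SK_H) | SK_BIT(SK_TM),
  SK_BIT(SK_DC) | SK_BIT(SK_TM)
};

/* Returns 1 if the mode should be tested for this block size. */
int sk_mode_is_tested(enum sk_bsize bsize, enum sk_mode mode) {
  return (int)((sk_intra_mode_mask[bsize] >> mode) & 1u);
}
```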
Change-Id: I65f320f1237c0f5ae3a355bf7caf447f55625455
When the codec in VBR (or cq) mode hits its max q limits and is
struggling to hit a target bandwidth, the bit target per frame collapses.
In the first instance normal frames cap out at the maximum allowed
Q and then the ARF and GFs do the same. This latter behavior is not
generally desirable as GFs and ARFs are only effective from a quality
and data rate perspective if they have at least some level of -Q delta
compared to the surrounding frames.
In this patch I define a separate max Q for GFs and ARFs that is
derived from but somewhat lower than that defined for normal frames.
In effect there is a minimum Q delta that will always be available for
GFs and ARFs regardless of the target rate and MAXQ setting.
This may of course mean that the absolute lowest rate obtainable for
a given clip is somewhat higher.
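The clamping described above might look roughly like the following.
This is a sketch under stated assumptions: the function names are
hypothetical, and reserving one eighth of the Q range as the minimum
delta is an illustrative choice, not the value used by the patch.

```c
/* Hypothetical: maximum Q allowed for golden and alt-ref frames,
 * derived from the normal-frame limit but always somewhat lower. */
int sk_gf_arf_max_q(int best_q, int worst_q) {
  const int min_delta = (worst_q - best_q) / 8; /* illustrative fraction */
  return worst_q - min_delta;
}

/* Clamp a candidate Q to the range that applies to the frame type. */
int sk_clamp_q(int q, int best_q, int worst_q, int is_gf_or_arf) {
  const int max_q =
      is_gf_or_arf ? sk_gf_arf_max_q(best_q, worst_q) : worst_q;
  if (q < best_q) return best_q;
  if (q > max_q) return max_q;
  return q;
}
```

Even when normal frames are pinned at worst_q, a GF or ARF in this
sketch still codes at a lower Q, which preserves the -Q delta.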
Change-Id: I268868b28401900d0cd87e51e609cd3b784ab54a
We have two SSE2-optimized functions for idct4_1d:
vp9_idct4_1d_sse2 <-- removing this one
idct4_1d_sse2
vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
vp9_idct4x4_16_add_c -> vp9_idct4x4_16_add_sse2
idct8_1d -> vp9_idct8x8_{16, 10, 1}_sse2
vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_sse2
Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.
Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
Renames:
fdct4_1d -> fdct4
fadst4_1d -> fadst4
fdct8_1d -> fdct8
fadst8_1d -> fadst8
fdct16_1d -> fdct16
fadst16_1d -> fadst16
"_1d" suffix is redundant, so removing it. The same will happen with idct
in the next change sets.
Change-Id: Ibf421cd2f569146c6079269df7a31819c098265e
This commit re-designs the per transformed block rate-distortion
costs tracking buffers. It removes redundant buffer usage, makes
the needed context memory allocation per VP9_COMP instance and
reuses the same buffer sets inside the rate-distortion optimization
search loop, thereby avoiding repeatedly requiring memory space.
It reduces speed 0 runtime:
bus at 2000 kbps from 166763ms to 158967ms,
football at 600 kbps from 246614ms to 234257ms.
Both about 5% speed-up. Local tests suggest about 2% to 5% speed-up
for speed 1 and 2 settings. This does not change compression
performance.
Change-Id: I363514c5276b5cf9a38c7251088ffc6ab7f9a4c3
Increases these parameters.
There is a small efficiency gain.
Change-Id: Ie5f0ddb39c907d335e0dafa5eb112365a81f4542
derfraw300: +0.091%
stdhdraw250: +0.238%
The intra mode distortion adjustment for skip_encode feature was
broken in the refactoring cc91851. This commit fixes it and tunes
the distortion models used therein.
Change-Id: I0d676e82f8e855536a90cf9b3e3fdefafcd886c6
snprintf is not supported by MSVC, so this commit replaces it with the
MSVC variant _snprintf to enable the build.
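A minimal sketch of this kind of portability shim. The version guard is
an assumption added here: MSVC gained a C99 snprintf in Visual Studio
2015 (_MSC_VER 1900), so the mapping is only needed for older compilers.
The helper name is hypothetical. Note that _snprintf does not write a
terminating null when the output is truncated, so the sketch terminates
the buffer explicitly.

```c
#include <stddef.h>
#include <stdio.h>

/* Older MSVC has no snprintf; map it to the vendor variant. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif

/* Hypothetical helper: formats "frame_<n>" into buf. The return value
 * comes straight from the underlying call and may differ between
 * snprintf and _snprintf when the output is truncated. */
int sk_format_frame_name(char *buf, size_t size, int frame) {
  int n;
  if (size == 0) return -1;
  n = snprintf(buf, size, "frame_%d", frame);
  buf[size - 1] = '\0'; /* _snprintf may leave the buffer unterminated */
  return n;
}
```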
Change-Id: I686943a78c289bae6b486a5e75effad5f86c24de
Use b_mode_info to store the inter prediction mode of sub8x8 block,
in replacement of the use of partition_info. Remove redundant buffer
update for partition_info. For bus_cif at 2000 kbps, this seems to make
speed 0 about 1% faster.
Change-Id: Id1b3be45e75a24fb4b42335ac480c23e440978f6
When all coefficients are zeros, skip the corresponding 1-D inverse
transform. This practice has been used in the SSE2 implementation of
inverse 32x32 DCT. This commit imports this algorithm into the C code.
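The technique can be sketched on a small block. The 4-point butterfly
below is a stand-in, not the VP9 32x32 inverse DCT, and all names are
hypothetical. Because the 1-D transform is linear, an all-zero row
produces an all-zero result, so the row pass can zero-fill instead of
running the transform.

```c
#include <stdint.h>
#include <string.h>

#define SK_N 4

/* Stand-in 1-D transform: a 4-point Walsh-Hadamard style butterfly. */
static void sk_itx_1d(const int16_t *in, int16_t *out) {
  const int a = in[0] + in[1], b = in[0] - in[1];
  const int c = in[2] + in[3], d = in[2] - in[3];
  out[0] = (int16_t)(a + c);
  out[1] = (int16_t)(b + d);
  out[2] = (int16_t)(a - c);
  out[3] = (int16_t)(b - d);
}

/* 2-D transform: rows first, then columns. Rows whose coefficients are
 * all zero are zero-filled instead of transformed. Returns the number
 * of row transforms that were skipped. */
int sk_itx_2d(const int16_t *input, int16_t *output) {
  int16_t tmp[SK_N * SK_N];
  int16_t col_in[SK_N], col_out[SK_N];
  int i, j, skipped = 0;

  for (i = 0; i < SK_N; ++i) {
    const int16_t *row = input + i * SK_N;
    int nonzero = 0;
    for (j = 0; j < SK_N; ++j) nonzero |= row[j];
    if (nonzero) {
      sk_itx_1d(row, tmp + i * SK_N);
    } else {
      memset(tmp + i * SK_N, 0, SK_N * sizeof(*tmp));
      ++skipped;
    }
  }

  for (j = 0; j < SK_N; ++j) {
    for (i = 0; i < SK_N; ++i) col_in[i] = tmp[i * SK_N + j];
    sk_itx_1d(col_in, col_out);
    for (i = 0; i < SK_N; ++i) output[i * SK_N + j] = col_out[i];
  }
  return skipped;
}
```

The saving grows with the transform size, since quantized blocks often
have non-zero coefficients only in the first few rows.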
Change-Id: I0f58bfcb183a569fab85d524d5d9cf8ae8653f86
We already have itxm_add member in MACROBLOCKD structure. Both
inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
different eob values. But eob logic is already implemented in
vp9_iwht4x4_add and vp9_idct4x4_add (that's why also removing
inverse_transform_b_4x4_add).
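The eob dispatch mentioned above can be sketched as follows. The
kernels here are stubs that only record which variant was selected, and
every name is a hypothetical stand-in; the threshold of eob > 1 is an
assumption based on the "_1" variant handling the DC-only case.

```c
#include <stdint.h>

/* Records which stub kernel ran last (1 or 16); for illustration only. */
int sk_last_kernel = 0;

/* Stub for the DC-only variant (at most one non-zero coefficient). */
static void sk_idct4x4_1_add(const int16_t *input, uint8_t *dest,
                             int stride) {
  (void)input; (void)dest; (void)stride;
  sk_last_kernel = 1;
}

/* Stub for the full variant (up to 16 non-zero coefficients). */
static void sk_idct4x4_16_add(const int16_t *input, uint8_t *dest,
                              int stride) {
  (void)input; (void)dest; (void)stride;
  sk_last_kernel = 16;
}

/* Single entry point: callers pass eob and no longer choose a variant
 * themselves, so separate per-eob function pointers are unnecessary. */
void sk_idct4x4_add(const int16_t *input, uint8_t *dest, int stride,
                    int eob) {
  if (eob > 1)
    sk_idct4x4_16_add(input, dest, stride);
  else
    sk_idct4x4_1_add(input, dest, stride);
}
```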
Change-Id: I80bec9b6f7d40c5e5033c613faca5c819c3e6326
For CpuUsed 1 & 2, this commit allows skipping the rectangular partition
check when NONE is better than SPLIT. It also applies this logic to alt
ref frame coding rather than using only square partitions there. The
change gains about 0.3% compression on yt and ythd, and about 0.6% on
cif and stdhd, for both CpuUsed 1 & 2.
Change-Id: I814b653baf89f59acd20e042629a12938a1bd4e5
The entropy code is now separate from the scan/iscan code. The next step
is to move the iscan code from the common part to the encoder.
Change-Id: Id9732f7d80aec00af35c1d58d1137c4c96c91451
A new set of MSVC warnings was introduced by change
I3f36d3f7cd8d15195a6e2fafd1777cdaf9ecb847
In particular, MSVC does not like:
typedef const int16_t subpel_kernel[SUBPEL_TAPS];
struct subpix_fn_table {
const subpel_kernel *filter_x;
const subpel_kernel *filter_y;
};
This causes a new warning in MSVC:
warning C4114: same type qualifier used more than once
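One possible way to avoid the duplicate qualifier, shown as a sketch
rather than as the exact change made here: drop const from the typedef
and keep it at each point of use. The value of SUBPEL_TAPS, the sample
kernels and the helper function are assumptions for illustration.

```c
#include <stdint.h>

#define SUBPEL_TAPS 8 /* assumed value for this sketch */

/* No const in the typedef, so "const subpel_kernel" below applies the
 * qualifier exactly once. */
typedef int16_t subpel_kernel[SUBPEL_TAPS];

struct subpix_fn_table {
  const subpel_kernel *filter_x;
  const subpel_kernel *filter_y;
};

/* Illustrative kernels only: a pass-through tap and a two-tap average. */
static const subpel_kernel sk_kernels[2] = {
  { 0, 0, 0, 128, 0, 0, 0, 0 },
  { 0, 0, 0, 64, 64, 0, 0, 0 }
};

/* Hypothetical helper that points both filters at the sample table. */
struct subpix_fn_table sk_make_table(void) {
  struct subpix_fn_table t;
  t.filter_x = sk_kernels;
  t.filter_y = sk_kernels;
  return t;
}
```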
Change-Id: Iae596fd13aadf36169faf00c68eabe9a32a9b156
This commit allows sub8x8 intra modes test in the rate-distortion
loop for hd sequences in speed 1 and 2.
For sequence y90n of hd set at 8000 kbps, speed 2 runtime goes
from 207s to 210s. For ped_1080p at 3000 kbps, speed 2 runtime goes
from 336s to 337s. Both are running with 300 frames.
This improves compression performance by 0.24% for stdhd and 0.32%
for hd.
Change-Id: I173ca38a6411565ae6cfadd184c42b2070c5de1f
The idea is to have the following names for each transform size:
vp9_idct4x4_add
vp9_idct4x4_1_add
vp9_idct4x4_10_add
vp9_idct4x4_16_add
vp9_idct8x8_add
vp9_idct8x8_1_add
vp9_idct8x8_10_add
vp9_idct8x8_64_add
etc for 16x16, 32x32
The actual list of renames in this patch:
vp9_idct_add_lossless -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
vp9_idct_add -> vp9_idct4x4_add
vp9_short_idct4x4_add -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
Speed 4 still does not give a big gain over speed 3.
This just cleans it up a little from the last patch and comments
out features that do not seem to be giving much benefit.
Change-Id: I5f366e6160e1dbe5dc45cf5eb90cc02712baa1b6
Allow selective masking of individual split modes rather than
just a single on / off flag.
For speed 2, this recovers the large speed loss seen for some derf
clips in change Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
and gives a small quality gain.
For speed 1, a 10% speed increase was observed locally on some derf
clips with minimal quality change.
Change-Id: If86191087b93cbc05351c26c60c7933e2149e485
Moving INTERPOLATIONFILTERTYPE enum and subpix_fn_table struct to
vp9_filter.h. Adding convenient typedef for subpel kernels.
Function vp9_setup_interp_filters() besides setting xd->subpix.filter_x &
xd->subpix.filter_y has a side effect of also setting scale factors. This
is not required inside decode_modes_b() because scale factors have been
already set by set_ref() calls. That's why replacing
vp9_setup_interp_filters() call with newly created vp9_get_filter_kernel()
call. The behavior of vp9_setup_interp_filters() is unchanged (it
is used from the encoder).
Change-Id: I3f36d3f7cd8d15195a6e2fafd1777cdaf9ecb847