Commit Graph

252 Commits

Author SHA1 Message Date
Dmitry Kovalev
12d57a9409 Removing redundant 'extern' keyword.
Change-Id: Ie51306689c0dc527a8aa12d3984389dd8f360dea
2013-09-24 15:13:09 -07:00
Yaowu Xu
014acfa2af fix integer overflow errors
Change-Id: I76f440a917832c02d7a727697b225bac66b99f56
2013-09-19 08:14:26 -07:00
Paul Wilkins
cb50dc7f33 Minor clean up.
Removed some unused code and minor cleanup
/ reordering.

Change-Id: I4083ae56aeb8edfe9b85aa2f42a16aa28d19da94
2013-09-16 13:45:20 +01:00
Jingning Han
e8a967d960 Merge "Adaptive motion search control" 2013-09-13 14:43:23 -07:00
Jingning Han
c4826c5941 Adaptive motion search control
This commit enables adaptive constraint on motion search range for
smaller partitions, given the motion vectors of collocated larger
partition as a candidate initial search point.

It makes speed 0 runtime of bus at CIF and 2000 kbps goes from
167s down to 162s (3% speed-up), at 0.01dB performance gains. In
the settings of speed 1, this makes the runtime goes from 33687 ms
to 32142 ms (4.5% speed-up), at 0.03dB performance gains.

Compression performance wise, it gains at speed 1:
derf  0.118%
yt    0.237%
hd    0.203%
stdhd 0.438%

Change-Id: Ic8b34c67810d9504a9579bef2825d3fa54b69454
2013-09-13 13:58:10 -07:00
Deb Mukherjee
0c3038234d Merge "Clean up of the search best filter speed feature" 2013-09-13 11:03:59 -07:00
Deb Mukherjee
b964646756 Clean up of the search best filter speed feature
Removes this speed feature since it is very slow and unlikely
to be used in practice. This cleanup removes a bunch of unnecessary
complications in the outer encode loop.

Change-Id: I3c66ef1ca924fbfad7dadff297c9e7f652d308a1
2013-09-11 15:16:36 -07:00
Deb Mukherjee
69fe840ec4 Changes in speed 2 settings
Propose some changes to the speed 2 settings to improve quality.
In particular, turns off the adjust_thresholds_by_speed feature
which improves results by 6%. Also removes the code for
adjust_thresholds_by_speed since it conflicts with the adaptive
rd thresh feature.

Overall, with this change speed 2 is -15.2% from speed 0 settings,
on derf, which is significantly better than -21.6% down before.

Change-Id: I6e90a563470979eb0c258ec32d6183ed7ce9a505
2013-09-11 10:54:07 -07:00
Yunqing Wang
939791a129 Modify encode breakout for static frames
Thank Paul for the suggestions. While turning on static-thresh
for static-image videos, a big jump on bitrate was seen. In this
patch, we detected static frames in the video using first-pass
stats. For different cases, disable encode breakout or reduce
encode breakout threshold to limit the skipping.

More modification need be done to break incorrect partition
picking pattern for static frames while skipping happens.

Change-Id: Ia25f47041af0f04e229c70a0185e12b0ffa6047f
2013-09-10 09:06:03 -07:00
Paul Wilkins
4f660cc018 Modified mode skip functionality.
A previous speed feature skipped modes not used in earlier
partitions but this not longer worked as intended following
changes to the partition coding order and in conjunction
with some other speed features (Especially speed 2 and above).

This modified mode skip feature sets a mask after the first X
modes have been tested in each partition depending on the
reference frame of the current best case.

This patch also makes some changes to the order modes are
tested to fit better with this skip functionality.

Initial testing suggests speed and rd hit count improvements
of up to 20% at speed 1. Quality results. (derf -1.9%, std hd  +0.23%).

Change-Id: Idd8efa656cbc0c28f06d09690984c1f18b1115e1
2013-09-10 13:30:10 +01:00
Ivan Maltz
01b35c3c16 API extensions and sample app for spacial scalable encoder
Sample app: vp9_spatial_scalable_encoder
vpx_codec_control extensions:
  VP9E_SET_SVC
  VP9E_SET_WIDTH, VP9E_SET_HEIGHT, VP9E_SET_LAYER
  VP9E_SET_MIN_Q, VP9E_SET_MAX_Q
expanded buffer size for vp9_convolve

modified setting of initial width in vp9_onyx_if.c so that layer size
can be set prior to initial encode

Default number of layers set to 3 (VPX_SS_DEFAULT_LAYERS)
Number of layers set explicitly in vpx_codec_enc_cfg.ss_number_layers

Change-Id: I2c7a6fe6d665113671337032f7ad032430ac4197
2013-09-09 15:57:56 -07:00
Paul Wilkins
1f4bf79d65 Added per pixel inter rd hit count stats
Added some code to output normalized rd hit count stats.
In effect this approximates to the average number of rd
operations/tests per pixel for the sequence.

The results are not quite accurate and I have not bothered
to account for partial SB64s at frame edges and for key frames
However they do give some idea of the number of modes /
prediction methods being tested for each pixel across the
different partition sizes. This indicates how much scope their
is for further gains either by reducing the number of partitions
examined or the modes per partition through heuristics.

Patch 3 moved place where count incremented so partial rd
tests that are aborted with INT_MAX return are also counted.

Example numbers for first 50 frames of Akiyo.
Speed 0 ~84.4 rd operations / pixel
Speed 1 ~28.8
Speed 2 ~11.9

Change-Id: Ib956e787e12f7fa8b12d3a1a2f6cda19a65a6cb8
2013-08-30 00:13:51 +01:00
Deb Mukherjee
b6dbf11ed5 Merge "Adds a speed feature for fast 1-loop forw updates" 2013-08-29 15:54:04 -07:00
Deb Mukherjee
e02dc84c1a Adds a speed feature for fast 1-loop forw updates
Incorporates a speed feature for fast forward updates of
coefficients. This feature takes 3 values:
0 - use standard 2-loop version
1 - use a 1-loop version
2 - use a 1-loop version with reduced updates

Results: derfraw300 +0.007% (on speed 0) at feature value = 1
                    -0.160% (on speed 0) at feature value = 2

There is substantial speed up at speeds 2 and above for low
resolution sequences where the entropy updates are a big part
of the overall computations.

Change-Id: Ie96fc50777088a5bd441288bca6111e43d03bcae
2013-08-28 10:56:52 -07:00
Dmitry Kovalev
7b95f9bf39 Renaming BLOCK_SIZE_TYPE to BLOCK_SIZE in the encoder.
Change-Id: I62bb07c377f947cb72fac68add7a6b199e42c6b9
2013-08-27 11:05:08 -07:00
Dmitry Kovalev
78e670fcf8 Merge "Renaming D27 to D207." 2013-08-27 10:03:57 -07:00
Paul Wilkins
aa823f8667 Merge "Changes to adaptive inter rd thresholds." 2013-08-26 12:48:11 -07:00
Paul Wilkins
642696b678 Merge "Limit Key frame Intra modes checks." 2013-08-26 12:34:56 -07:00
James Zern
c8ba8c513c cosmetics: strip 'VP9_' from defines in vp9 only code
Change-Id: I481d9bb2fa3ec72b6a83d5f04d545ad8013f295c
2013-08-23 19:16:49 -07:00
Dmitry Kovalev
50ee61db4c Renaming D27 to D207.
I've already renamed d27_predictor to d207_predictor but forgot about the
corresponding constant.

Change-Id: Id312aa80fc5b5a1ab8a709a33418a029552a6857
2013-08-23 17:33:48 -07:00
Paul Wilkins
aa5b67add0 Changes to adaptive inter rd thresholds.
Values now carried over frame to frame.
Change to algorithm for decreasing threshold after
a hit and to max threshold (now based on speed)

Removed some old commented out code relating to
VP8 adaptive thresholds.

The impact of these changes tested on Akiyo (50 frames)
and measured in terms of unit rd hits is as follows:

Speed 0 84.36 -> 84.67
Speed 1 29.48 -> 22.22
Speed 2 11.76 -> 8.21
Speed 3 12.32 -> 7.21

Encode speed impact is broadly in line with these.

Change-Id: I5b886efee3077a11553fa950d796fd6d00c8cb19
2013-08-23 16:18:45 +01:00
Paul Wilkins
f76f52df61 Limit Key frame Intra modes checks.
Most of the focus so far has been on inter frames.

At high speed settings the key frame is now taking a high %
of the cycles.

This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings

TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.

Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.

Change-Id: I7bde68aee04995f9d9beb13a1902143112e341e2
2013-08-23 16:10:30 +01:00
Deb Mukherjee
2ffe64ad5c Cleanup/enhancements of switchable filter search
Cleans up the switchable filter search logic. Also adds a
speed feature - a variance threshold - to disable filter search
if source variance is lower than this value.

Results: derfraw300
threshold = 16, psnr -0.238%, 4-5% speedup (tested on football)
threshold = 32, psnr -0.381%, 8-9% speedup (tested on football)
threshold = 64, psnr -0.611%, 12-13% speedup (tested on football)
threshold = 96, psnr -0.804%, 16-17% speedup (tested on football)

Based on these results, the threshold is chosen as 16 for speed 1,
32 for speed 2, 64 for speed 3 and 96 for speed 4.

Change-Id: Ib630d39192773b1983d3d349b97973768e170c04
2013-08-20 09:47:04 -07:00
Deb Mukherjee
24856b6abc Speed feature to skip split partition based on var
Adds a speed feature to disable split partition search based on a
given threshold on the source variance. A tighter threshold derived
from the threshold provided is used to also disable horizontal and
vertical partitions.

Results on derfraw300:
threshold = 16, psnr = -0.057%, speedup ~1% (football)
threshold = 32, psnr = -0.150%, speedup ~4-5% (football)
threshold = 64, psnr = -0.570%, speedup ~10-12% (football)

Results on stdhdraw250:
threshold = 32, psnr = -0.18%, speedup is somewhat more than derf
because of a larger number of smoother blocks at higher resolution.

Based on these results, a threshold of 32 is chosen for speed 1,
and a threshold of 64 is chosen for speeds 2 and above.

Change-Id: If08912fb6c67fd4242d12a0d094783a99f52f6c6
2013-08-15 10:01:45 -07:00
Paul Wilkins
5459f68d71 Trivial clean up.
Delete unused / commented out  variable references.

Change-Id: Iaf20c0c3744f89adb296d153b516b5ea41b4f3b4
2013-08-13 13:26:18 +01:00
Dmitry Kovalev
3c43ec206c Renaming BLOCK_SIZE_TYPES constant to BLOCK_SIZES.
There will be another change set to rename BLOCK_SIZE_TYPE enum to
BLOCK_SIZE.

Change-Id: I8d1dfc873d6186fa5e554262f5169e929978085e
2013-08-09 17:47:32 -07:00
Yaowu Xu
c7c9901845 added a speed feature on lpf level picking
Change-Id: Id578f8afdeab3702fc8386969f2d832d8f1b5420
2013-08-09 07:36:32 -07:00
Deb Mukherjee
2158909fc3 Merge "Adds a new subpel motion function" 2013-08-08 12:26:55 -07:00
Deb Mukherjee
1ba91a84ad Adds a new subpel motion function
Adds a new subpel motion estimation function that uses a 2-level
tree-structured decision tree to eliminate redundant computations.
It searches fewer points than iterative search (which can search
the same point multiple times) but has the same quality roughly.

This is made the default setting at speeds 0 and 1, while at
speed 2 and above only a 1-level search is used.

Also includes various cleanups for consistency and redundancy removal.

Results:
derf: +0.012% psnr
stdhd: +0.09% psnr
Speedup of about 2-3%

Change-Id: Iedde4866f5475586dea0f0ba4cb7428fba24eee9
2013-08-08 11:41:49 -07:00
Jingning Han
debb9c68c8 Use low precision 32x32fdct for encodemb in speed1
The low precision 32x32 fdct has all the intermediate steps within
16-bit depth, hence allowing faster SSE2 implementation, at the
expense of larger round-trip error. It was used in the rate-distortion
optimization search loop only.

Using the low precision version, in replace of the high precision one,
affects the compression performance by about 0.7% (derf, stdhd) at
speed 0. For speed 1, it makes derf set down by only 0.017%.

Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
2013-08-07 15:34:12 -07:00
Deb Mukherjee
71b43b0ff0 Clean ups of the subpel search functions
Removes some unused code and speed features, and organizes the
interfaces for fractional mv step functions for use in new speed
features to come.

In the process a new speed feature - number of iterations per
step during the subpel search - is exposed.

No change when this parameter is set as the original value of 3.

Results:
subpel_iters_per_step = 3: baseline
subpel_iters_per_step = 2: psnr -0.067%, 1% speedup
subpel_iters_per_step = 1: psnr -0.331%, 3-4% speedup

Change-Id: I2eba8a21f6461be8caf56af04a5337257a5693a8
2013-08-06 17:23:50 -07:00
Deb Mukherjee
15b5a6a2c7 Flexible support for various pattern searches
Adds a few pattern searches to achieve various tradeoffs
between motion estimation complexity and performance.
The search framework is unified across these searches so that a
common pattern search function is used for all. Besides it will
be easier to experiment with various patterns or combinations
thereof at different scales in the future.

The new pattern search is multi-scale and is capable of using
different patterns at different scales.

The new hex search uses 8 points at the smallest scale
and 6 points at other scales.
Two other pattern searches - big-diamond and square are
also added. Big diamond uses 4 points at the smallest scale and
8 points in diamond shape at the larger scales.
Square is very similar conceptually to the default n-step search
but is somewhat faster since it keeps only one survivor across
all scales.

Psnr/speed-up results on derf300:

hex: -1.6% psnr%, 6-8% speed-up
big-diamond: -0.96% psnr, 4-5% speedup
square: -0.93% psnr, 4-5% speedup

Change-Id: I02a7ef5193f762601e0994e2c99399a3535a43d2
2013-08-06 11:56:39 -07:00
Deb Mukherjee
33afddadb9 Merge "Add variance based mode/skipping" 2013-08-06 10:19:15 -07:00
Deb Mukherjee
8b3faccb9e Add variance based mode/skipping
Adds a speed feature to skip all intra modes other than
DC_PRED if the source variance is small. This feature is
made part of speed 1 and up.

Results on derf300: psnr -0.07%, speedup about 1-2%

Also uses the source variance to fine-tune the early
termination criteria when FLAG_EARLY_TERMINATE is on.
This feature is made part of speed 2 and up.

Results on derf300: psnr -0.52%, speedup about 5-7%

Change-Id: I59e38aa836557cfa5405ae706fc64815cbfe4232
2013-08-05 14:14:01 -07:00
Yunqing Wang
7965a6ea34 Comment out 2 unused speed features
use_min_partition_size and use_max_partition_size are not used
currently, and could be added back if needed later.

Change-Id: Ib22a9c06b064567a7c1d6d5445567ed77e0d3acc
2013-08-01 11:03:34 -07:00
Adrian Grange
fbd73648dd Merge "Cleanup typos, remove unnecessary lines, replace switch" 2013-07-30 12:59:46 -07:00
Adrian Grange
b30a06b930 Cleanup typos, remove unnecessary lines, replace switch
Removed unnecessary code lines, replaced switch with an if,
fixed spelling errors and formatting.

Change-Id: Ie48aa4604aa0ed48362ca359d792fb21b2ec1dc6
2013-07-30 12:10:32 -07:00
Dmitry Kovalev
730a34416f Renaming NB_TXFM_MODES constant to TX_MODES.
Change-Id: I10bf06e3a3d5271221ae6a42a36074d01d493039
2013-07-29 13:38:40 -07:00
Dmitry Kovalev
23391ea835 Renaming TX_SIZE_MAX_SB to TX_SIZES.
Change-Id: I6aa4191935aa93461a07c41b59fdae1eb5f5f107
2013-07-29 12:25:34 -07:00
Paul Wilkins
fe5e2a91bb Auto min and max partition size experiment.
Speed feature experiment to set an upper and lower
partition size limit based on what has been seen
in spatial neighbors.

This seems to gives quite reasonable speed gains in local
(10-15%) and when used with speed 0 the losses are small
(0.25% derf, 0.35% stdhd). However, for now I am only
enabling it on speed 1 as there may be clashes with the existing
temporal partition selection in speed 2.

Using a tighter min / max around the range derived from the
neighbors increases speed further but at the cost of a
bigger quality loss. However,  I think this spatial method could
be combined with data from either the last frame or a variance
method (or both) to refine the range of minimum and maximum
partition size. I.e. consider the min and max from spatial and
temporal neighbors and the variance recommendation.

Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
2013-07-26 18:30:49 +01:00
Paul Wilkins
cedd24ec61 Merge "Renaming of segment constants." 2013-07-23 08:16:12 -07:00
Paul Wilkins
7c134bc0cd Merge "Reworked the auto_mv_step_size speed feature" 2013-07-23 04:49:55 -07:00
Paul Wilkins
32042af14b Renaming of segment constants.
Renamed:
  MAX_MB_SEGMENTS to MAX_SEGMENTS
  MB_SEG_TREE_PROBS to SEG_TREE_PROBS

The minimum unit for segmentation in the segment map
is now 8x8 so it is misleading to use MB_ as macro-block
traditionally refers to a 16x16 region.

Change-Id: I0b55a6f0426bb46dd13435fcfa5bae0a30a7fa22
2013-07-23 12:09:04 +01:00
James Zern
76db4d599a Merge "VP[89]_COMMON: remove golden/altref frame counts" 2013-07-22 12:55:07 -07:00
Paul Wilkins
1d189d6464 Re-order mode search in rd.
Mode search order in rd loop changed to better reflect
observed hit counts.

Also some adjustment of the baseline mode rd thresholds
to reflect the order change and observed frequencies.

Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
2013-07-22 17:21:12 +01:00
Deb Mukherjee
302698fb12 Reworked the auto_mv_step_size speed feature
This patch modifies the auto_mv_step_size speed feature to
use a combination of the maximum magnitude mv from the last
inter frame, and the maximum magnitude mv for the two reference
mvs with the same reference. For arf frames, the max mav step
for the resolution is used.
The bounds therefore are slightly tighter. The feature is made
a speed 1 feature.

Rebased.

Results (when this feature is turned on over speed 0):
derfraw300: -0.046% psnr, about 5+% speedup
(tested on football: goes from 4m30.760s to 4m17.410s).

Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
2013-07-19 15:12:56 -07:00
Paul Wilkins
f3ed9f5523 Alignment of THR_MODES to vp9_mode_order[]
Change-Id: I4032dd0442043543954dcb3724df974b7cc7e515
2013-07-19 11:33:39 +01:00
James Zern
5f30a0c687 VP[89]_COMMON: remove golden/altref frame counts
these are only used in the encoder.
frames_since_golden / frames_till_alt_ref_frame -> VP[89]_COMP

Change-Id: Ie14a6f46987bced685ddb449b85dc261caba6dfe
2013-07-18 14:09:21 -07:00
Yunqing Wang
df90d58f4f Speed up motion estimation using small partitions' result(experiment)
Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
patition checkings if they are unlikely choices. We could use the
motion vector(MV) result as current partition's prediction MV, limit
the search range and reference frame.

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
2013-07-17 09:11:47 -07:00
Paul Wilkins
d66eab15dd Merge "Move uv intra mode selection in rd loop." 2013-07-17 05:19:26 -07:00
Paul Wilkins
2ee338ce3b Move uv intra mode selection in rd loop.
Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
2013-07-17 11:11:21 +01:00
James Zern
9581eb6e8a use consistent framerate naming
s/frame_rate/framerate/g

Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
2013-07-16 14:12:47 -07:00
Jingning Han
faff6ed0fb Skip duplicate block encoding in the rd loop
This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7% speed-up) at 0.04dB
         performance loss.

This operation is currently turned on in settings of speed 1.

Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
2013-07-15 11:08:58 -07:00
Dmitry Kovalev
cc662dd768 Adding struct tx_probs and struct tx_counts to cleanup the code.
Also removing unused declarations from vp9_entropymode.h file.

Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
2013-07-12 15:22:38 -07:00
Deb Mukherjee
94c481f9f1 Some minor cleanups for efficiency
Implements some of the helper functions more efficiently with
lookups rathers than branches. Modeling function is consolidated
to reduce some computations.

Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).

No bitstream or output change.

About 0.5% speedup

Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
2013-07-12 10:22:56 -07:00
Deb Mukherjee
53ff43adc3 Prunes out full-rd computation based on modeled rd
Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the  regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes if the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Resuts on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
2013-07-10 13:49:49 -07:00
Yaowu Xu
bed27a960a Add a feature to reduce chrome intra mode search
Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
2013-07-10 08:59:18 -07:00
Ronald S. Bultje
ed995afba1 Make frame-wide filter-type decision fully RD-based.
Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics), I'm
not quite sure why yet. Maybe interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
2013-07-08 16:22:37 -07:00
Deb Mukherjee
d9b62160a0 Implements several heuristics to prune mode search
Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions.
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
2013-07-08 12:17:12 -07:00
Paul Wilkins
72c5778ec5 Added two new skip experiments.
sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
2013-07-03 16:56:06 +01:00
Yaowu Xu
0d7b7c09cb Added a speed feature use_square_partition_only
This commit adds a speed feature where only squared partition are
evaluated in partition picking. Enable this feature in cpu-used 2
reduces encoding time by ~30%.

loss of compression:
-0.9% on cif set
-1.23% on stdhd

Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9
2013-07-02 16:40:15 -07:00
Deb Mukherjee
37501d687c Speed feature to binary search dir intramodes
This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.

Speedup: about 9-10%
Results: -0.05% only on derfraw300.

Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
2013-07-02 14:07:19 -07:00
Deb Mukherjee
8d3d2b76f3 Tx size selection enhancements
(1) Refines the modeling function and uses that to add some speed
features. Specifically, intead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and will remove a lot of complications in
the code

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform is
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder) .
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
2013-07-02 13:54:00 -07:00
Yunqing Wang
b12e060b55 Add speed feature to disable splitmv
Added a speed feature in speed 1 to disable splitmv for HD (>=720)
clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
loss. Encoding speedup is 36%.

(For reference: The test result on derf set showed 2% psnr loss
and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
enabled for small resolution videos.)

Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6
2013-07-02 10:02:34 -07:00
Paul Wilkins
b7cd01ed73 Revert "New motion threshold factor - speed feature."
This reverts commit 1377278180.
Also fixes a spelling mistake.

Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
2013-07-02 15:06:40 +01:00
Jim Bankoski
d4158283e7 use partitioning from last frame
This cl converts use partition from last frame to do the following:

if part is none,horz, vert -> try split
if part != none and one of the children is not split - try none


Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
Signed-off-by: Jim Bankoski <jimbankoski@google.com>
2013-07-01 18:18:50 -07:00
Yaowu Xu
ba3b2604f0 Merge "Quantize (64-bit only, for now) SSSE3 SIMD." 2013-07-01 15:58:57 -07:00
Ronald S. Bultje
7353ceab9d Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01 11:36:07 -07:00
Paul Wilkins
1377278180 New motion threshold factor - speed feature.
Added a speed feature that focuses only on thresholds
for new motion modes.

Moved sf->comp_inter_joint_search_thresh into speed
1.  This has ~+0.4% impact on quality at speed 0 as
our quality reference baseline.

Slight adjustment to baseline thresholds.

Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
2013-07-01 12:11:21 +01:00
Dmitry Kovalev
59070f6e3c Merge "Removing CONFIG_DEBUG checks on assertions." 2013-06-28 14:03:28 -07:00
Dmitry Kovalev
8e6ce6bb9e Removing CONFIG_DEBUG checks on assertions.
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.

Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
2013-06-28 10:36:20 -07:00
Ronald S. Bultje
af660715c0 Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
2013-06-28 10:28:49 -07:00
Yaowu Xu
1374a06bd8 Optimize partition search order
This commit change the partition search order to allow checking of
rectangular partition to be done after square partitions. It also
added a speed feature to skip rectangular partition check when
NONE is better than SPLIT in RD sense.

This feature roughly speed up encoder by 1.5X with loss on compression
-0.91% on cif set
-0.56% on stdhd set

Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
2013-06-28 07:13:54 -07:00
Paul Wilkins
05ffdf2625 Merge "Auto adapt step size feature." 2013-06-27 02:28:41 -07:00
Paul Wilkins
5bcf069c6b Merge "Change meaning of cpi->sf.first_step and rename." 2013-06-27 02:28:21 -07:00
Jingning Han
861cb06c67 Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.

Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
2013-06-26 17:17:21 -07:00
Paul Wilkins
9f3ab83486 Auto adapt step size feature.
Also tweaks to other features and experiments with
what is on and off at different speed settings.

Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
2013-06-26 19:48:39 +01:00
Paul Wilkins
e606cac046 Change meaning of cpi->sf.first_step and rename.
Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.

The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.

Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
2013-06-26 17:04:06 +01:00
Jim Bankoski
9f2a1ae23e adds force partitioning greater than or less than block size
adds a new speed feature to force partitioning to be greater than
or less than a certain size

Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
2013-06-20 09:51:42 -07:00
Jim Bankoski
18bdf708e7 adds a set partitioning to speed features
this feature lets you set a partitioning size to be used by the entire
frame.

Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
2013-06-20 09:50:44 -07:00
Jim Bankoski
476d73d294 partition by variance using var from last frame
This uses variance to split partition. Variance is calculated using
nearest mv,  always from last ref frame.

Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
2013-06-20 09:48:22 -07:00
Jim Bankoski
1f94b97694 convert all speed things to speed features
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
2013-06-20 09:42:44 -07:00
Jim Bankoski
0fad6a9d99 fix to set up new speed feature
This uses the speed feature functionality for code.

Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
2013-06-20 09:35:02 -07:00
Ronald S. Bultje
8a0808a145 Fix row tiling.
Change-Id: I57be4eeaea6e4402f6a0cc04f5c6b7a5d9aedf9b
2013-06-12 13:42:59 -04:00
Deb Mukherjee
17da2cab78 TX_SIZE contexts simplification.
Reduces TX_SIZE contexts to 2 for each kind. The code is
cleaner and there is hardly any performance difference with
more than two contexts.

Results: almost neutral

Change-Id: I17656bd6db76224ae2856adf882504560e7dbaa4
2013-06-08 12:32:26 -07:00
Deb Mukherjee
21401942b0 Coding tx-size selection by use of spatial context
Adds coding of transform size within a frame by use of context
of transform sizes selected in left and above blocks.

Also incorporates code for generating stats.

TODO: generate and incorporate new default stats

Change-Id: I6a7af099f6ad61d448521d9a51167aedaf638ed6
2013-06-07 16:07:58 -07:00
Deb Mukherjee
869a39ba60 Cleans up mbskip encoding
Refactors mbskip coding to be compatible with coding of the rest of
the symbols. Adds forward/backward adaptation and removes a lot of
the legacy code.

Results:
fast50: +1.6%
derfraw300: +0.317%

Change-Id: I395a2976d15af044d3b8ded5acfa45f6f065f980
2013-06-07 16:00:26 -07:00
Yaowu Xu
0bb6da3668 Merge "Remove two un-used entries in mode_lf_delta[]" into experimental 2013-06-07 10:10:45 -07:00
Yaowu Xu
b097a3ba82 Remove two un-used entries in mode_lf_delta[]
With the removal of i4X4 and SPLIT_MV modes, the two entries for the
modes are no longer used. This patch remove the coding of the deltas.

Change-Id: Iea4eb500404ebe9706159380a03b8eca542fb4c3
2013-06-07 09:24:09 -07:00
Deb Mukherjee
3ee1a21a42 Coding updates for tx-size selection
Changes to the coding of transform sizes, along with forward
and backward probability updates.

Results:
derf300: +0.241%

Context based coding of transform sizes will be in a separate
patch.

Change-Id: I97241d60a926f014fee2de21fa4446ca56495756
2013-06-07 08:54:00 -07:00
Ronald S. Bultje
6ef805eb9d Change ref frame coding.
Code intra/inter, then comp/single, then the ref frame selection.
Use contextualization for all steps. Don't code two past frames
in comp pred mode.

Change-Id: I4639a78cd5cccb283023265dbcc07898c3e7cf95
2013-06-06 17:28:09 -07:00
Ronald S. Bultje
ad34368786 New intra mode and partitioning probabilities.
Split partition probabilities between keyframes and non-keyframes,
since they are fairly different. Also have per-blocksize interframe
y intramode probabilities, since these vary heavily between different
blocksizes.

Lastly, replace default probabilities for partitioning and intra modes
with new ones generated from current codec. Replace counts with actual
probabilities also.

Change-Id: I77ca996e25e4a28e03bdbc542f27a3e64ca1234f
2013-06-06 10:45:30 -07:00
Paul Wilkins
c3316c2bc5 Rd thresholds change with block size.
Added structures to support independent rd thresholds
for different block sizes (and set experimental block
size correction factors).

Added structure to to allow dynamic adaptation of thresholds
per mode and per block size basis depending on how often
the mode/block size combination is seen (currently fixed factor).

Removed some unused variables.

TODO
- Adaptation of thresholds based on how often each mode chosen.
- The baseline mode values could also be adjusted based on
  the block size (e.g. for a particular intra mode use a low threshold
  for 4x4 prediction blocks but a relatively high value for 64x64.

Change-Id: Iddee65ff3324ee309815ae7c1c5a8584720e7568
2013-06-06 15:45:53 +01:00
Paul Wilkins
c880e02f97 Turn off compound inter search refinement for good quality.
Turn this feature off for some modes in  "good" quality.

Change-Id: I3f262d62cca8f01736b977af1465291e8be29f0a
2013-06-06 15:44:25 +01:00
Deb Mukherjee
30226a658f Cosmetic renaming VP9_MVREFS to VP9_INTER_MODES
NO bitstream change

Change-Id: I79f6146dac5fdd157051b6f8dc611c0b7b5e5f7f
2013-06-05 11:24:01 -07:00
Deb Mukherjee
83885235a7 Clean-ups on switchable interpolation and mv_ref
Adds backward adaptation and differential forward updates of switchable
interpolation filter probabilities. Also adds some cosmetic cleanups
and minor fixes on mv_ref probabilities.

derfraw300: +0.353% (with most coming from switchable interp changes)

Change-Id: Ie2718be73528c945fd0d80cfd63ca2d9cb3032de
2013-06-05 10:11:52 -07:00
Ronald S. Bultje
e9d68a5e36 Merge all various transform size data trackers into single variables.
Change-Id: I2dfc569106b29fbe4da20585a0e85e5e9ea6a4db
2013-05-31 09:18:59 -07:00
Ronald S. Bultje
b480d413e7 Minor cosmetic changes.
Change-Id: Ieb4a8c97bf1b1dfb993f40a9a3ef3bed5ae7d948
2013-05-30 20:58:53 -07:00
Ronald S. Bultje
310bc1030a Merge "Merge VP9_YMODES, VP9_UV_MODES, INTRA_MODE_COUNT and cousins." into experimental 2013-05-30 20:58:19 -07:00
Ronald S. Bultje
7d549870f7 Merge "Remove TX_SIZE_MAX_MB." into experimental 2013-05-30 20:58:16 -07:00