Commit Graph

69 Commits

Author SHA1 Message Date
Yaowu Xu
ee5d79995e Move computation up to frame level
This is to avoid redo the same calculation repeatly, and also allow
easier adjustments for further experiments.

This commit shall have no effect on quality/compression.

Change-Id: I4460acf5c808ff5518da18d21e002c5da58af857
2015-02-10 15:41:52 -08:00
Adrian Grange
527e073163 Remove elevate_newmv_thresh from SPEED_FEATURES (unused)
Change-Id: I78ef7f89586a329787f6bc4c58ec83af210989a3
2015-01-22 16:12:50 -08:00
Jingning Han
a8d8c0f633 Remove unused ONE_LOOP entry from speed feature
Change-Id: I56ead0ebc2491144c4e79e5859b05e126176702c
2014-12-03 09:17:08 -08:00
Jingning Han
8fe50191c6 Rework coeff probability model update for rtc coding
This commit reworks the ONE_LOOP_REDUCED coefficient probability
model update process. It allows model update for every coefficient
across the spectrum at a coarser resolution, instead of performing
precise update only for certain subset of probability models.

The overall runtime remains nearly same (<1% change) for speed -6.
The compression performance is improved by 7.5% in PSNR for speed
-5 and 4.57% for speed -6, respectively.

Change-Id: Ifb17136382ee7e39a9f34ff4a4f09a753125c8d1
2014-12-03 09:15:25 -08:00
Yunqing Wang
ad7586a9e1 vp9_ethread: move max/min partition size to mb struct
The max_partition_size and max_partition_size are set at the
beginning while setting speed features, and then adjusted at
SB level. Moving them to mb struct ensures there is a local
copy for each thread.

Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4
2014-11-20 09:24:50 -08:00
Adrian Grange
0d085ebc0a Prepare for dynamic frame resizing in the recode loop
Prepare for the introduction of frame-size change
logic into the recode loop.

Separated the speed dependent features into
separate static and dynamic parts, the latter being
those features that are dependent on the frame size.

Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
2014-11-13 11:41:20 -08:00
Yunqing Wang
aed48c786a Remove unused speed feature
Partition_check was unused and removed.

Change-Id: I15ec9162d86dc61f04c09229c498629878ed7155
2014-10-29 17:05:04 -07:00
Jingning Han
abb2fbb10e Remove deprecated constrain_copy_partitioning function
Its functionality has been replaced with choose_partitioning and
threshold based control on split mode check.

Change-Id: Ic9bb321df06b524f5c38ea5874dc6f6a8f93c5e3
2014-10-20 17:08:21 -07:00
Jingning Han
e62ce79e1a Remove deprecated use_lastframe_partitioning feature
This speed feature has been deprecated in both yt and rtc coding
modes. This commit removes the related operations.

Change-Id: I079c79c6adafe45581af2ebf8b98faebcface1ce
2014-10-20 17:03:38 -07:00
Jingning Han
e1111fba7e Remove unused VAR_BASED_FIXED_PARTITION flag
Change-Id: I4ce19b7cb1c45fed86e81ee785e787630020fb4f
2014-10-17 09:02:25 -07:00
Deb Mukherjee
3117830af3 Merge "Subpel search cleanups and enhancements" 2014-10-09 11:14:51 -07:00
Deb Mukherjee
d78dbff09a Subpel search cleanups and enhancements
- Some fixes to surface fit.
- Returns variance function as cost rather than sad in the
  pattern search and diamond search functions. Only
  vp9_pattern_search_sad function used in bigdia search
  uses sad as integer 1-away costs.
- Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.

Results:
derf [Speed 3]: About +0.036% in coding efficiency without any
discernible speed loss.
derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.

Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
2014-10-08 23:59:43 -07:00
Jim Bankoski
0ce51d823f experimental : partition using 1/8 x 1/8 image
The concept:

There's too much noise in source pixels for variance and at low bitrate
the reconstructed looks nothing like the source so we have problems
getting good partitionings with either.   This skirts the issue by using
a box blur scaled down version for variance calculations.  To compare
against source_var_ moved keyframe to be rd based like source_var.

Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
2014-10-07 16:36:14 -07:00
Yunqing Wang
b1b6fd85db Merge "Skip the partition search for still frames" 2014-09-30 11:59:05 -07:00
Deb Mukherjee
4e9c0d2ad4 Adds two new subpel search methods
One is a more aggressive version of the pruned subpel tree
search where only a single halfpel candidate is searched.
The search candidate is based on a surface fit result.
The other is a method to obtain the subpel position at one
shot based on the same surface fit.

The methods have not been deployed in any speed setting yet.

Change-Id: I34fef3f2e34f11396c9d1ba97f4be8c4ffca62d3
2014-09-29 12:51:20 -07:00
Yunqing Wang
1fcbf6ed56 Skip the partition search for still frames
This patch re-enabled the feature in Pengchong's patch
(commit 1286126073). Originally, it
was turned on while use_lastframe_partitioning > 0(not used anymore).
Now it was added as a feature, and turned on while speed >= 2.
As described in the original patch, this feature helps speed up the
slideshows in YouTube.

Change-Id: I1b0f18d65da1ee1c8d1e117dabba910c5207c471
2014-09-26 09:03:52 -07:00
Jingning Han
eee904c9b9 Adaptive mode search scheduling
This commit enables an adaptive mode search order scheduling scheme
in the rate-distortion optimization. It changes the compression
performance by -0.433% and -0.420% for derf and stdhd respectively.
It provides speed improvement for speed 3:

bus CIF 1000 kbps
24590 b/f, 35.513 dB, 7864 ms ->
24696 b/f, 35.491 dB, 7408 ms (6% speed-up)

stockholm 720p 1000 kbps
8983 b/f, 35.078 dB, 65698 ms ->
8962 b/f, 35.054 dB, 60298 ms (8%)

old_town_cross 720p 1000 kbps
11804 b/f, 35.666 dB, 62492 ms ->
11778 b/f, 35.609 dB, 56040 ms (10%)

blue_sky 1080p 1500 kbps
57173 b/f, 36.179 dB, 77879 ms ->
57199 b/f, 36.131 dB, 69821 ms (10%)

pedestrian_area 1080p 2000 kbps
74241 b/f, 41.105 dB, 144031 ms ->
74271 b/f, 41.091 dB, 133614 ms (8%)

Change-Id: Iaad28cbc99399030fc5f9951eb5aa7fa633f320e
2014-09-22 09:28:16 -07:00
Jingning Han
00fe92c22f Remove unused speed feature
The speed feature that skips compound inter prediction modes was
subsumed by other speed features and effectively was not in use.
This commit removes it.

Change-Id: I22b0c71a8ddd15d93b25d86fa63a1dce2ba6a1a9
2014-09-11 15:54:53 -07:00
Jingning Han
f9f0879756 Refactor to remove speed feature dependency on mode search order
This commit refactor the rate-distortion optimization search for
regular block sizes to remove the speed feature dependency on mode
search order.

Change-Id: Ied033ee484c2957e17baa7b6450b720fe7dd0e7d
2014-09-10 17:09:14 -07:00
Jingning Han
4282955ee1 Skip intra mode tests depending on inter residuals
This commit allows encoder to skip intra coding mode test, when
the known inter residual is less than the source variance. It
reduces the runtime of speed 3 for test clips:
bus cif 1000 kbps: 8587 ms -> 8260 ms, 3.8% speed-up
pedestrian 1080p 2000 kbps: 161381 ms -> 155241 ms, 3.7% speed-up.

The compression performance is down by
derf   -0.36%
stdhd  -0.25%

Change-Id: I75ce1e035b4da2153cb1ac14111d1a07c05a735d
2014-08-29 08:37:35 -07:00
Yunqing Wang
4d2c376923 Early termination in encoding partition search
In the partition search, the encoder checks all possible
partitionings in the superblock's partition search tree.
This patch proposed a set of criteria for partition search
early termination, which effectively decided whether or
not to terminate the search in current branch based on the
"skippable" result of the quantized transform coefficients.
The "skippable" information was gathered during the
partition mode search, and no overhead calculations were
introduced.

This patch gives significant encoding speed gains without
sacrificing the quality.

Borg test results:
1. At speed 1,
   stdhd set: psnr: +0.074%, ssim: +0.093%;
   derf set:  psnr: -0.024%, ssim: +0.011%;
2. At speed 2,
   stdhd set: psnr: +0.033%, ssim: +0.100%;
   derf set:  psnr: -0.062%, ssim: +0.003%;
3. At speed 3,
   stdhd set: psnr: +0.060%, ssim: +0.190%;
   derf set:  psnr: -0.064%, ssim: -0.002%;
4. At speed 4,
   stdhd set: psnr: +0.070%, ssim: +0.143%;
   derf set:  psnr: -0.104%, ssim: +0.039%;

The speedup ranges from several percent to 60+%.
                 speed1    speed2    speed3    speed4
(1080p, 100f):
old_town_cross:  48.2%     23.9%     20.8%     16.5%
park_joy:        11.4%     17.8%     29.4%     18.2%
pedestrian_area: 10.7%      4.0%      4.2%      2.4%
(720p, 200f):
mobcal:          68.1%     36.3%     34.4%     17.7%
parkrun:         15.8%     24.2%     37.1%     16.8%
shields:         45.1%     32.8%     30.1%      9.6%
(cif, 300f)
bus:              3.7%     10.4%     14.0%      7.9%
deadline:        13.6%     14.8%     12.6%     10.9%
mobile:           5.3%     11.5%     14.7%     10.7%

Change-Id: I246c38fb952ad762ce5e365711235b605f470a66
2014-08-28 11:27:28 -07:00
Deb Mukherjee
bb2a9abb1e Merge "Updates vp9_pattern search to return integer sads" 2014-08-28 09:38:56 -07:00
Deb Mukherjee
04b100b23e Updates vp9_pattern search to return integer sads
Updates the vp9_pattern_search function to return integer one-away
neighbors' sad values, for subsequent use in speeding up the
sub-pel search. Also, removes code for the do_refine option
which is not being used currently.
Updates the integer and subpel functions to pass in a 5-element
sad list for output or input.

A new pruned sub-pel search algorithm is implemented that uses
the sad returned from the integer pel search. But it is not
deployed yet.

Change-Id: Ifa9f5ad024b5b660570366d2bd900343e1891520
2014-08-28 06:49:58 -07:00
Yaowu Xu
1144fee3d5 add a new interp filter search strategy.
This commit addes a new strategy to reduce the search for optimal
interpolation filter type. The encoder counts and store how many each
filter type is selected and used for each of the reference frames.
A filter type that is rarely used for all three reference frames is
masked out to avoid computation.

The impact on compression is neglectible:
-0.02% on derf
+0.02% on stdhd

Encoding time is seen to reduce by 2~3%.

Change-Id: Ibafa92291b51185de40da513716222db4b230383
2014-08-26 09:05:04 -07:00
Yunqing Wang
4d98b50be5 Merge "Add early termination in transform size search" 2014-08-18 19:00:24 -07:00
Yunqing Wang
ba70f16011 Add early termination in transform size search
In the full-rd transform size search, we go through all transform
sizes to choose the one with best rd score. In this patch, an
early termination is added to stop the search once we see that the
smaller size won't give better rd score than the larger size. Also,
the search starts from largest transform size, then goes down to
smallest size.

A speed feature tx_size_search_breakout is added, which is turned off
at speed 0, and on for other speeds. The transform size search is
turned on at speed 1.

Borg test results:
1. At speed 1,
   derf set: psnr gain: 0.618%, ssim gain: 0.377%;
   stdhd set: psnr gain: 0.594%, ssim gain: 0.162%;
   No noticeable speed change.
3. At speed 2,
   derf set: psnr loss: 0.157%, ssim loss: 0.175%;
   stdhd set: psnr loss: 0.090%, ssim loss: 0.101%;
   speed gain: ~4%.

Change-Id: I22535cd2017b5e54f2a62bb6a38231aea4268b3f
2014-08-18 16:27:04 -07:00
Jingning Han
6a464eca05 Speed up mode search depending on relative ref frame position
This commit enables the encoder to record the location of the
center frame to generate alter reference frame. It then allows to
skip checking prediction modes of other reference frame types when
it comes to encode this frame.

The speed 3 runtime is reduced for the test sequences:
bus at CIF 1000 kbps, 9791 ms -> 9446 ms, i.e., 3.5% speed-up,
pedestrian at 1080p 2000 kbps, 184043 ms -> 175730 ms, i.e., 4.5%
speed-up.

No compression performance change observed.

Change-Id: Iacfde3bcc1445964e7a241f239bd6ea11cb94bd1
2014-08-18 16:06:54 -07:00
Pengchong Jin
997db6fc3f Merge "Add a speed feature to give the tighter search range" 2014-08-15 19:51:04 -07:00
Pengchong Jin
eca93642e2 Add a speed feature to give the tighter search range
Add a speed feature to give the tighter partition search
range. Before partition search, calculate the histogram
of the partition sizes of the left, above and previous
co-located blocks of the current block. If the variance of
observed partition sizes is small enough, adjust the search
range around the mean partition size, which will be tigher.

The feature is currently turned on at speed 2. Experiments on
sample youtube clips show on average the runtime is reduced
by 3-7%.

For hard stdhd clips:
park_joy_1080p @ 15000kbps:       509251 ms -> 491953 ms (3.3%)
pedestrian_area_1080p @ 2000kbps: 223941 ms -> 214226 ms (4.3%)

The PSNR performance is changed:
derf: -0.112%
yt:   -0.099%
hd:   -0.090%
stdhd:-0.102%

Change-Id: Ie205ec5325bf92ec5676c243e30ba9d0adca10f2
2014-08-15 16:14:20 -07:00
Yunqing Wang
28b1437d77 Remove a unused speed feature
Removed disable_split_var_thresh, which is not used anymore.

Change-Id: I50119b150442e1571157433b5effc6aae0dbe0fd
2014-08-15 14:10:27 -07:00
Jingning Han
0daadeb60c Enable motion field based mode seach skip
This commit allows the encoder to check the above and left neighbor
blocks' reference frames and motion vectors. If they are all
consistent, skip checking the NEARMV and ZEROMV modes. This is
enabled in speed 3. The coding performance is improved:

pedestrian area 1080p at 2000 kbps,
from  74773 b/f, 41.101 dB, 198064 ms
to    74795 b/f, 41.099 dB, 193078 ms

park joy 1080p at 15000 kbps,
from 290727 b/f, 30.640 dB, 609113 ms
to   290558 b/f, 30.630 dB, 592815 ms

Overall compression performance of speed 3 is changed
derf  -0.171%
stdhd -0.168%

Change-Id: I8d47dd543a5f90d7a1c583f74035b926b6704b95
2014-08-13 12:15:13 -07:00
Jingning Han
ca2dcb7fed Chessboard pattern partition search
This commit enables a chessboard pattern constrained partition
search for 720p and above resolutions. The scheme applies stricter
partition search to alternative blocks based on its above/left
neighboring blocks' partition range, as well as that of the
collocated blocks in the previous frame. It is currently turned
on at 16x16 block size level. The chessboard pattern is flipped
per coding frame.

The speed 3 runtime is reduced:
park_joy_1080p, 652832 ms -> 607738 ms (7% speed-up)
pedestrian_area_1080p, 215998 ms -> 200589 ms (8% speed-up)

The compression performance is changed:
hd     -0.223%
stdhd  -0.295%

Change-Id: I2d4d123ae89f7171562f618febb4d81789575b19
2014-07-30 10:32:41 -07:00
Jingning Han
54ad09586c Enable chessboard inter prediction filter type search
This commit enables a chessboard pattern prediction filter type
search scheme for rate-distortion optimization speed-up. For the
inferred motion vector modes, the encoder can re-use its above/left
neighbor blocks' prediction filter type and skip a full test on
all possible filter types. Such operation is turned on/off
alternatively in a chessboard manner.

It is turned on in speed 3. For test clip pedestrian 1080p, the
runtime is reduced from 231500 ms -> 221700 ms. The compression
performance is changed:
derf:  -0.147%
yt:    -0.134%
hd:    -0.079%
stdhd: -0.220%

Change-Id: I1912f278e7576c2dc632688e3ad7a257410c605a
2014-07-22 16:49:03 -07:00
Yaowu Xu
51c60a891e make default_interp_filter choice a speed feature
This commit changed the hard-coded DEFAULT_INTERP_FILTER to a speed
feature with the same default value: SWITCHABLE.

Change-Id: I7f54f40f1bd3f5277841d04b85db7a84e47313f1
2014-07-16 14:28:51 -07:00
Alex Converse
f60a1178c6 Cleanup motion search speed features.
* Replace max_step_search_steps with constant MAX_MVSEARCH_STEPS
* Fold (reduce_first_step_size + speed > 5) into reduce_first_step_size
  replacing uses of reduce_first_step_size that don't add the speed
  check with zero.

Change-Id: Iae46395dbf3eaca138bf4d18b838a9e364b5a198
2014-07-07 10:08:45 -07:00
Yaowu Xu
92a6db7928 Added a speed feature controlling a motion search parameter
This commit added a speed feature to control the step_param used in
full pixel motion search. The intention is to reduced the search
steps for high speed real time coding.

Change-Id: I21d2f0105c2b647783a6688615da7fcf2b6d670b
2014-07-02 09:30:43 -07:00
Yaowu Xu
82fd084b35 Merge "Re-design quantization process" 2014-07-01 19:04:01 -07:00
Jingning Han
9ac2f66320 Re-design quantization process
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.

The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.

Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
2014-07-01 17:00:07 -07:00
Yunqing Wang
f31ff029df Elevate NEWMV mode checking threshold in real time
The current threshold is knid of low, and in many cases NEWMV
mode is checked but not picked as the best mode. This patch
added a speed feature to increase NEWMV threshold, so that
less partition mode checking goes to check NEWMV. This feature
is enabled for speed 6 and 7.

Rtc set borg tests showed:
1. Speed 6, overall psnr: -0.088%, ssim: -1.339%;
   Average speedup on rtc set is 11.1%.
2. Speed 7, overall psnr: -0.505%, ssim: -2.320%
   Average speedup on rtc set is 12.9%.

Change-Id: I953b849eeb6e0d5a1f13eacba30c14204472c5be
2014-07-01 14:50:39 -07:00
Yunqing Wang
dee5782f93 Enable encode breakout in real time
For real time speed 7, once encode breakout is on(i.e. encoding
setting --static-thresh=1), a proper encode breakout threshold
is set to speed up the encoder.

Set --static-thresh=1, RTC set borg test showed a slight overall
psnr loss of 0.162%, but ssim gain of 0.287%. The average speedup
on RTC set is 6%, and for some clips, the speedup can be 10+%.

Change-Id: Id522d9ce779ff7c699936d13d0c47083de4afb85
2014-06-30 10:41:12 -07:00
Yunqing Wang
9d41313e4b Decide the partitioning threshold from the variance histogram
Before encoding a frame, calculate and store each 16x16 block's
variance of source difference between last and current frame.
Find partitioning threshold T for the frame from its variance
histogram, and then use T to make partition decisions.

Comparing with fixed 16x16 partitioning, rtc set test showed an
overall psnr gain of 3.242%, and ssim gain of 3.751%. The best
psnr gain is 8.653%.

The overall encoding speed didn't change much. It got faster for
some clips(for example, 12% speedup for vidyo1), and a little
slower for others.

Also, a minor modification was made in datarate unit test.

Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c
2014-06-30 09:36:23 -07:00
Yaowu Xu
d0cb273e04 Allow encoder to set lpf level to 0
As a way to speed-up rtc encoding at speed 7.

Change-Id: Ie36a010392cf7b741dc130df21a4e733622a75b7
2014-06-27 15:23:41 -07:00
Yunqing Wang
bccc785f63 Merge "Reuse inter prediction result in real-time speed 6" 2014-06-25 08:18:33 -07:00
Yunqing Wang
0aae100076 Reuse inter prediction result in real-time speed 6
In real-time speed 6, no partition search is done. The inter
prediction results got from picking mode can be reused in the
following encoding process. A speed feature reuse_inter_pred_sby
is added to only enable the resue in speed 6.

This patch doesn't change encoding result. RTC set tests showed
that the encoding speed gain is 2% - 5%.

Change-Id: I3884780f64ef95dd8be10562926542528713b92c
2014-06-24 12:46:33 -07:00
Paul Wilkins
8160a26fa0 Fix some bugs in multi-arf
Fix some bugs relating to the use of buffers
in the overlay frames.

Fix bug where a mid sequence overlay was
propagating large partition and transform sizes into
the subsequent frame because of :-
  sf->last_partitioning_redo_frequency  > 1 and
  sf->tx_size_search_method == USE_LARGESTALL

Change-Id: Ibf9ef39a5a5150f8cbdd2c9275abb0316c67873a
2014-06-24 13:07:48 +01:00
Dmitry Kovalev
4ff1a614f1 Adding MV_SPEED_FEATURES struct.
Moving all motion vector related speed parameters from SPEED_FEATURES to
MV_SPEED_FEATURES.

Change-Id: I3e9af0039c7162f8671878c5920bce3cb256a84e
2014-06-12 14:15:27 -07:00
Dmitry Kovalev
bc93f425d0 Removing two unused TX_SIZE_SEARCH_METHOD members.
Change-Id: I33a38bb9f46e7ef509bbbf0cfd7bc3ea5072d022
2014-06-10 11:08:30 -07:00
Dmitry Kovalev
22368479c0 Merge "Removing chessboard_index from SPEED_FEATURES." 2014-06-10 10:53:53 -07:00
Yunqing Wang
b04d766800 Use small transform size in non-rd real-time mode
In non-rd real-time mode, choosing smaller transform size in
encoding gives better video quality and good speed gain than
choosing larger transform size. This patch set tx size search
method to ALLOW_8X8, which is better than using 4x4 or other
larger sizes.

Borg tests on rtc set at speed 6 showed significant gain on quality.
PSNR gain: 11.034% and SSIM gain: 15.466%.

The speed gain is 5% - 12% for <720p clips, and 2% - 7% for
720p clips.

Change-Id: If4dc74ed2df359346b059f47fb73b4a0193ec548
2014-06-09 08:26:50 -07:00
Dmitry Kovalev
923c30a174 Removing chessboard_index from SPEED_FEATURES.
This is not a speed feature, adding inline function instead.

Change-Id: Ia48c41802eec9e92cf990339d724097279695c9a
2014-06-05 18:17:54 -07:00