615 Commits

Author SHA1 Message Date
Ronald S. Bultje
e20fcd9585 More optimizations for cost_coeffs().
4x4:    163 ->  123 cycles (33% faster)
8x8:    491 ->  399 cycles (23% faster)
16x16: 1889 -> 1763 cycles (7% faster)
32x32: 8311 -> 8180 cycles (1.6% faster)

Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps
goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.

Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f
2013-07-22 16:09:09 -07:00
Dmitry Kovalev
b2fc6fa969 Adding update_tx_counts function.
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).

Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
2013-07-22 14:57:43 -07:00
Yaowu Xu
fc186dcad6 fix a build error
Change-Id: I3b05687f439ff6a7c426d2c97a6c58c831fa51ac
2013-07-22 12:37:30 -07:00
Jingning Han
416f315e82 Merge "Skip buffer update in sub8x8 rd loop" 2013-07-22 12:08:22 -07:00
Jingning Han
a5a9f5f7f3 Merge "Optimize operation flow in sub8x8 rd loop" 2013-07-22 12:08:15 -07:00
Jingning Han
409e77f2d4 Optimize operation flow in sub8x8 rd loop
Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
the encoder to skip the forward transform, quantization, and coeff cost
estimation, in the sub8x8 rd optimization search, if the motion
vector(s) are of integer pixel value, and have been tested in the
previous prediction filter type rd loops of the same block.

This gives about 2% speed-up for bus_cif at 2000 kpbs, for speed 0.
Its efficacy depends how frequently the motion search will select an
integer motion vector.

Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
2013-07-22 10:40:33 -07:00
Paul Wilkins
1d189d6464 Re-order mode search in rd.
Mode search order in rd loop changed to better reflect
observed hit counts.

Also some adjustment of the baseline mode rd thresholds
to reflect the order change and observed frequencies.

Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
2013-07-22 17:21:12 +01:00
Dmitry Kovalev
39342db138 Merge "Consistent names for inter mode probabilities and encodings." 2013-07-20 22:40:51 -07:00
Dmitry Kovalev
f66821afbb Merge "Removing frame_type field from MACROBLOCKD struct." 2013-07-20 22:40:06 -07:00
Jingning Han
c725502bf3 Skip buffer update in sub8x8 rd loop
This commit allows the encoder to skip a few buffer update steps in
rd_pick_best_mbsegmentation, when early breakout has been triggered
in the rd_check_segment_txsize. It provides about 1% speed-up for
bus_cif at 2000 kbps, in the settings of speed 0.

Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9
2013-07-20 21:38:12 -07:00
Deb Mukherjee
302698fb12 Reworked the auto_mv_step_size speed feature
This patch modifies the auto_mv_step_size speed feature to
use a combination of the maximum magnitude mv from the last
inter frame, and the maximum magnitude mv for the two reference
mvs with the same reference. For arf frames, the max mav step
for the resolution is used.
The bounds therefore are slightly tighter. The feature is made
a speed 1 feature.

Rebased.

Results (when this feature is turned on over speed 0):
derfraw300: -0.046% psnr, about 5+% speedup
(tested on football: goes from 4m30.760s to 4m17.410s).

Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
2013-07-19 15:12:56 -07:00
Dmitry Kovalev
e71a4a77bb Merge "Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE)." 2013-07-19 12:14:32 -07:00
Dmitry Kovalev
97e96bc4e9 Removing frame_type field from MACROBLOCKD struct.
Change-Id: Ia4e83913251c1cdc7aa2abd64bf01ecb1a962119
2013-07-19 11:55:36 -07:00
Dmitry Kovalev
c0eb57406c Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE).
Moving TX_MODE enum to vp9_enums.h. Renaming txfm_mode variables to
tx_mode.

Change-Id: I459d1af6dd928ce7fccdf8ce30b6f1ca057bef92
2013-07-19 11:37:13 -07:00
Dmitry Kovalev
afe43d4089 Removing redundant VP9_COMMON* from function signatures.
Functions: vp9_get_pred_context_switchable_interp,
           vp9_get_pred_context_intra_inter,
           vp9_get_pred_context_single_ref_p1,
           vp9_get_pred_context_single_ref_p2.

Change-Id: I3d6fb8aee23c9062270768e1e6da416dd9bb8f96
2013-07-19 11:20:49 -07:00
Dmitry Kovalev
bc7acb134b Consistent names for inter mode probabilities and encodings.
Renaming vp9_sb_mv_ref_tree to vp9_inter_mode_tree, and
vp9_sb_mv_ref_encoding_array to vp9_inter_mode_encodings.

Change-Id: I0e91fbf81350d3ec5a2599064c74089b5d06133a
2013-07-19 10:40:04 -07:00
Yaowu Xu
37d901a47a Merge "Add best_rd breakout to keyframe partition selection also." 2013-07-18 17:50:39 -07:00
Yaowu Xu
67fb0679ee Merge "Merge scale_factors and scale_factors_uv." 2013-07-18 17:50:34 -07:00
Yaowu Xu
55b52e32da Merge "Do in-place UV intra mode selection." 2013-07-18 17:50:07 -07:00
Yaowu Xu
51972d1279 Merge "Change break statement in a 2d loop to a return statement." 2013-07-18 17:49:58 -07:00
Dmitry Kovalev
92f4198d52 Merge "Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT)." 2013-07-18 17:29:05 -07:00
Dmitry Kovalev
0b562b2d3d Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT).
Change-Id: Ide58a74d31ff948319445a6337d2c05e98720e34
2013-07-18 15:12:46 -07:00
Ronald S. Bultje
96e4db2660 Add best_rd breakout to keyframe partition selection also.
Change-Id: I96b8058f6dfecf8aa3e152cdcbfd7e10071fbbc9
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
5ebe503f04 Merge scale_factors and scale_factors_uv.
This prevents a duplicate memcpy of a 128-byte struct every time
set_scale_factors() is called (which is a lot), thus leading to a
decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
RD/partition loop.

Overall, this decreases encoding time of the first 50 frames of bus
@ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
overall speedup. We can likely get more gains by removing the copy
of the other struct (and replacing it with an indexing) as well.

Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
df4b4fab26 Do in-place UV intra mode selection.
This means we only do UV intra mode selection if we find any intra
mode to actually be useful at all; in addition, we only do UV intra
mode selection for the transform sizes that were selected, rather
than all sizes available in this partition.

First 50 frames of bus @ 1500kbps (speed 0) gains about 5% with this
change.

Change-Id: I7b461eb8b803247f57896c5a9505f745b55502b3
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
e54a5782b9 Change break statement in a 2d loop to a return statement.
The break statement only breaks out of the nested loop, not the
top-level loop, so it doesn't always work as intended. Changing it
to a return statement does what's intended.

Change-Id: I585419823b39a04ec8826b1c8a216099b1728ba7
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
2d4929e340 Remove motion vectors from PARTITION_INFO.
The same information already exists in union b_mode_info.

Change-Id: Iac5086b99a3c3cc270380138062bb693e58f9e6d
2013-07-18 14:10:52 -07:00
Ronald S. Bultje
9da67da04a Merge "Fix bug where we don't choose any mode in RD selection." 2013-07-18 12:47:50 -07:00
Ronald S. Bultje
247197d57b Fix bug where we don't choose any mode in RD selection.
This could happen during golden overlay frame coding from a previous
alt-ref frame if the special overlay code was triggered.

Change-Id: I3056d0c547cd26903b260ef93c94026e96bd9868
2013-07-18 12:13:15 -07:00
Ronald S. Bultje
4f5815290c Merge "Fix bug which skips zeromv even if near/nearest is not 0,0." 2013-07-18 10:06:51 -07:00
Ronald S. Bultje
deb7456058 Fix bug which skips zeromv even if near/nearest is not 0,0.
Change-Id: Id4f454831f3f11099f39c30246adeaa52857d08d
2013-07-18 09:35:19 -07:00
Jingning Han
ced3c20165 Use mv_check_bounds in sub8x8 rd loop
Make the use of mv_check_bounds consistent for mvs of both ref_frame[0]
and ref_frame[1].

Change-Id: I1ca24865cc7232ca9cbe5db566c53abad1592211
2013-07-17 17:13:51 -07:00
Ronald S. Bultje
facecd80da Merge "Add a best_yrd shortcut in splitmv mode search." 2013-07-17 16:11:13 -07:00
Ronald S. Bultje
056111c822 Merge "Skip redundant nearest/near/zero encodes in splitmv." 2013-07-17 16:10:51 -07:00
Ronald S. Bultje
0b1eba25b2 Merge "Skip nearest/near/zero redundant encodes." 2013-07-17 16:10:41 -07:00
Ronald S. Bultje
607424449c Merge "Best_rd breakout in rd partition search." 2013-07-17 16:10:22 -07:00
Yaowu Xu
6ac5b7db2c Merge "changed mode checking order" 2013-07-17 14:44:40 -07:00
Dmitry Kovalev
a7a1e96136 Merge changes Ieffea49e,Idf610746
* changes:
  Removing two unused arguments from vp9_inc_mv signature.
  Changing signature of vp9_get_pred_probs_tx_size.
2013-07-17 14:44:20 -07:00
Ronald S. Bultje
c6917528a5 Add a best_yrd shortcut in splitmv mode search.
Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min6.2 to 1min5.9, i.e. 0.5% faster overall.

Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683
2013-07-17 14:21:57 -07:00
Ronald S. Bultje
161c995658 Skip redundant nearest/near/zero encodes in splitmv.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
1min7.3 to 1min6.2, i.e. 1.7% faster overall.

Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987
2013-07-17 13:53:48 -07:00
Yaowu Xu
42facc292d changed mode checking order
Change-Id: Ic4c4b363ed840935e42f495f13ea5e601a56f1b2
2013-07-17 13:43:50 -07:00
Ronald S. Bultje
8fea880b6f Skip nearest/near/zero redundant encodes.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
to 1min7.3, i.e. 8% faster.

Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62
2013-07-17 11:33:15 -07:00
Ronald S. Bultje
9f427bfe98 Best_rd breakout in rd partition search.
About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
goes from 1min36 to 1min24. Results become slightly better (+0.2% on
derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
super_block_yrd(). Overall speed change (on derfraw300) is roughly
-13%. This can probably be improved further by caching best_yrd
between partition searches. Also, we might be able to get more
speedups by always doing PARTITION_NONE before PARTITIONS_SPLIT, not
just at the sb8x8 level.

Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
2013-07-17 09:56:46 -07:00
Ronald S. Bultje
83c7e13a6b Do a skip-block check for sub8x8 partitions also.
+0.2% SSIM and glbPSNR on derfraw300.

Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
2013-07-17 09:46:47 -07:00
Yunqing Wang
df90d58f4f Speed up motion estimation using small partitions' result(experiment)
Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
patition checkings if they are unlikely choices. We could use the
motion vector(MV) result as current partition's prediction MV, limit
the search range and reference frame.

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
2013-07-17 09:11:47 -07:00
Paul Wilkins
d66eab15dd Merge "Move uv intra mode selection in rd loop." 2013-07-17 05:19:26 -07:00
Paul Wilkins
154c34a3ee Merge "Limit transform sizes searched for uv intra." 2013-07-17 03:40:11 -07:00
Paul Wilkins
2ee338ce3b Move uv intra mode selection in rd loop.
Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
2013-07-17 11:11:21 +01:00
Paul Wilkins
6c667f0ffe Limit transform sizes searched for uv intra.
Apply limit if search_method == USE_LARGESTALL
to the range of UV tx sizes searched.

Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
2013-07-17 11:08:55 +01:00
Paul Wilkins
5f4722c75f Merge "Minor cleanup in code to fine uv tx_size." 2013-07-17 02:50:09 -07:00