Commit Graph

1418 Commits

Author SHA1 Message Date
Dmitry Kovalev
39342db138 Merge "Consistent names for inter mode probabilities and encodings." 2013-07-20 22:40:51 -07:00
Dmitry Kovalev
f66821afbb Merge "Removing frame_type field from MACROBLOCKD struct." 2013-07-20 22:40:06 -07:00
Dmitry Kovalev
2b089f149a Merge "Removing unused static arrays from vp9_reatectrl.c." 2013-07-20 22:39:33 -07:00
Jingning Han
c725502bf3 Skip buffer update in sub8x8 rd loop
This commit allows the encoder to skip a few buffer update steps in
rd_pick_best_mbsegmentation, when early breakout has been triggered
in the rd_check_segment_txsize. It provides about 1% speed-up for
bus_cif at 2000 kbps, in the settings of speed 0.

Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9
2013-07-20 21:38:12 -07:00
Yaowu Xu
ea284d6281 added checks to prevent rate/distortion overflow
At speed 2, due to the threshold scheme used, the rate and distortion
can be assigned the INT_MAX value. The patch adds checks to prevent
INT_MAX from being used in further calculation of RD scores. The patch
also changes the assertion in rd_use_partition() to mirror the similar
assertion in rd_pick_partition().

Change-Id: Idb52c543cc1e10abdf6e6a5d6e9cb535a42214dc
2013-07-19 17:52:50 -07:00
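The overflow guard described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name rd_cost_checked, the rd_mult weight, and the cost formula are all assumptions made for the example. The only idea taken from the commit message is that an INT_MAX sentinel must be detected before it enters any further arithmetic.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical RD combiner; the name, the weight, and the formula are
 * illustrative and not taken from the encoder. */
static int64_t rd_cost_checked(int rate, int64_t dist, int rd_mult) {
  assert(rd_mult > 0);
  /* INT_MAX marks "no valid value"; never feed it into the arithmetic. */
  if (rate == INT_MAX || dist == INT_MAX) return INT64_MAX;
  return (int64_t)rate * rd_mult + dist;
}
```

Returning a dedicated "worst possible" score keeps callers simple: an invalid candidate loses every comparison without any wrap-around.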
Dmitry Kovalev
7e703de729 Removing pre probabilities from FRAME_CONTEXT.
Using cm->frame_contexts[cm->frame_context_idx] as source of previous
probabilities.

Change-Id: Ie03778acf0e7bebdc3a1f6a51854d4a0712f24a1
2013-07-19 17:33:10 -07:00
Dmitry Kovalev
ee1771ebaa Moving all loop filter related variables into new struct.
Adding loopfilter struct with fields from MACROBLOCKD and VP9Common.
Eventually it will be moved to vp9_loopfilter.h for better code structure.

Change-Id: Iaf5fb71c33719cdfa1b991f671caf071be9ea035
2013-07-19 16:19:10 -07:00
Dmitry Kovalev
29f0f79317 Removing unused static arrays from vp9_reatectrl.c.
Removed arrays: kf_boost_seperation_adjustment,
                gf_adjust_table,
                gf_intra_usage_adjustment,
                gf_interval_table.

Change-Id: I62e400cb6e4d039787615169a3779e31ebf95893
2013-07-19 15:55:09 -07:00
Dmitry Kovalev
c3a56ee583 Merge "Moving Scale2Ration function from vp9_onyx.h to vp9_onyx_if.c." 2013-07-19 15:27:24 -07:00
Deb Mukherjee
302698fb12 Reworked the auto_mv_step_size speed feature
This patch modifies the auto_mv_step_size speed feature to
use a combination of the maximum magnitude mv from the last
inter frame, and the maximum magnitude mv for the two reference
mvs with the same reference. For arf frames, the max mv step
for the resolution is used.
The bounds therefore are slightly tighter. The feature is made
a speed 1 feature.

Rebased.

Results (when this feature is turned on over speed 0):
derfraw300: -0.046% psnr, about 5+% speedup
(tested on football: goes from 4m30.760s to 4m17.410s).

Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
2013-07-19 15:12:56 -07:00
Dmitry Kovalev
e71a4a77bb Merge "Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE)." 2013-07-19 12:14:32 -07:00
Dmitry Kovalev
97e96bc4e9 Removing frame_type field from MACROBLOCKD struct.
Change-Id: Ia4e83913251c1cdc7aa2abd64bf01ecb1a962119
2013-07-19 11:55:36 -07:00
Dmitry Kovalev
c0eb57406c Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE).
Moving TX_MODE enum to vp9_enums.h. Renaming txfm_mode variables to
tx_mode.

Change-Id: I459d1af6dd928ce7fccdf8ce30b6f1ca057bef92
2013-07-19 11:37:13 -07:00
Dmitry Kovalev
afe43d4089 Removing redundant VP9_COMMON* from function signatures.
Functions: vp9_get_pred_context_switchable_interp,
           vp9_get_pred_context_intra_inter,
           vp9_get_pred_context_single_ref_p1,
           vp9_get_pred_context_single_ref_p2.

Change-Id: I3d6fb8aee23c9062270768e1e6da416dd9bb8f96
2013-07-19 11:20:49 -07:00
Dmitry Kovalev
bc7acb134b Consistent names for inter mode probabilities and encodings.
Renaming vp9_sb_mv_ref_tree to vp9_inter_mode_tree, and
vp9_sb_mv_ref_encoding_array to vp9_inter_mode_encodings.

Change-Id: I0e91fbf81350d3ec5a2599064c74089b5d06133a
2013-07-19 10:40:04 -07:00
Paul Wilkins
f3ed9f5523 Alignment of THR_MODES to vp9_mode_order[]
Change-Id: I4032dd0442043543954dcb3724df974b7cc7e515
2013-07-19 11:33:39 +01:00
Dmitry Kovalev
13253d6121 Merge "Removing kf_{y, uv}_mode_prob arrays from VP9Common." 2013-07-19 01:00:46 -07:00
Ronald S. Bultje
e4686c589e Fix slightly quality drop caused at speed 1.
We would skip the rectangular blocks for sub8x8 partitions because
we would conclude that PARTITION_NONE was better than PARTITION_SPLIT,
however, that conclusion was made before we actually really tested
PARTITION_SPLIT.

Change-Id: I8fa91e59894badc1d8cee3ba8a49e40ae4c4a489
2013-07-18 17:52:08 -07:00
Yaowu Xu
37d901a47a Merge "Add best_rd breakout to keyframe partition selection also." 2013-07-18 17:50:39 -07:00
Yaowu Xu
67fb0679ee Merge "Merge scale_factors and scale_factors_uv." 2013-07-18 17:50:34 -07:00
Yaowu Xu
55b52e32da Merge "Do in-place UV intra mode selection." 2013-07-18 17:50:07 -07:00
Yaowu Xu
51972d1279 Merge "Change break statement in a 2d loop to a return statement." 2013-07-18 17:49:58 -07:00
Dmitry Kovalev
92f4198d52 Merge "Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT)." 2013-07-18 17:29:05 -07:00
Dmitry Kovalev
0b562b2d3d Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT).
Change-Id: Ide58a74d31ff948319445a6337d2c05e98720e34
2013-07-18 15:12:46 -07:00
Ronald S. Bultje
96e4db2660 Add best_rd breakout to keyframe partition selection also.
Change-Id: I96b8058f6dfecf8aa3e152cdcbfd7e10071fbbc9
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
5ebe503f04 Merge scale_factors and scale_factors_uv.
This prevents a duplicate memcpy of a 128-byte struct every time
set_scale_factors() is called (which is a lot), thus leading to a
decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
RD/partition loop.

Overall, this decreases encoding time of the first 50 frames of bus
@ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
overall speedup. We can likely get more gains by removing the copy
of the other struct (and replacing it with an indexing) as well.

Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
df4b4fab26 Do in-place UV intra mode selection.
This means we only do UV intra mode selection if we find any intra
mode to actually be useful at all; in addition, we only do UV intra
mode selection for the transform sizes that were selected, rather
than all sizes available in this partition.

First 50 frames of bus @ 1500kbps (speed 0) gains about 5% with this
change.

Change-Id: I7b461eb8b803247f57896c5a9505f745b55502b3
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
e54a5782b9 Change break statement in a 2d loop to a return statement.
The break statement only breaks out of the nested loop, not the
top-level loop, so it doesn't always work as intended. Changing it
to a return statement does what's intended.

Change-Id: I585419823b39a04ec8826b1c8a216099b1728ba7
2013-07-18 14:10:56 -07:00
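The break-versus-return issue from the commit above is easy to reproduce in isolation. The sketch below uses hypothetical function names and a flat grid, none of which come from the patch; it counts how many cells each variant visits so the difference is observable.

```c
#include <assert.h>
#include <stddef.h>

/* `break` leaves only the inner loop, so later rows are still scanned. */
static int find_with_break(const int *grid, size_t rows, size_t cols,
                           size_t *visited) {
  int found = 0;
  assert(grid != NULL && visited != NULL);
  *visited = 0;
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      ++*visited;
      if (grid[r * cols + c] != 0) {
        found = 1;
        break; /* exits the column loop only */
      }
    }
  }
  return found;
}

/* `return` leaves both loops at once, which is the intended behavior. */
static int find_with_return(const int *grid, size_t rows, size_t cols,
                            size_t *visited) {
  assert(grid != NULL && visited != NULL);
  *visited = 0;
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      ++*visited;
      if (grid[r * cols + c] != 0) return 1;
    }
  }
  return 0;
}
```

With a 2x3 grid whose only nonzero cell is at row 0, column 1, the break variant visits five cells while the return variant stops after two.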
Ronald S. Bultje
2d4929e340 Remove motion vectors from PARTITION_INFO.
The same information already exists in union b_mode_info.

Change-Id: Iac5086b99a3c3cc270380138062bb693e58f9e6d
2013-07-18 14:10:52 -07:00
James Zern
5f30a0c687 VP[89]_COMMON: remove golden/altref frame counts
these are only used in the encoder.
frames_since_golden / frames_till_alt_ref_frame -> VP[89]_COMP

Change-Id: Ie14a6f46987bced685ddb449b85dc261caba6dfe
2013-07-18 14:09:21 -07:00
Dmitry Kovalev
9f3c0e34a9 Moving Scale2Ration function from vp9_onyx.h to vp9_onyx_if.c.
Change-Id: Idfe2a850f72b38f519aea1aac1266d8c3aa813ee
2013-07-18 14:05:06 -07:00
Ronald S. Bultje
9da67da04a Merge "Fix bug where we don't choose any mode in RD selection." 2013-07-18 12:47:50 -07:00
Ronald S. Bultje
247197d57b Fix bug where we don't choose any mode in RD selection.
This could happen during golden overlay frame coding from a previous
alt-ref frame if the special overlay code was triggered.

Change-Id: I3056d0c547cd26903b260ef93c94026e96bd9868
2013-07-18 12:13:15 -07:00
Ronald S. Bultje
4f5815290c Merge "Fix bug which skips zeromv even if near/nearest is not 0,0." 2013-07-18 10:06:51 -07:00
Ronald S. Bultje
deb7456058 Fix bug which skips zeromv even if near/nearest is not 0,0.
Change-Id: Id4f454831f3f11099f39c30246adeaa52857d08d
2013-07-18 09:35:19 -07:00
Jingning Han
ced3c20165 Use mv_check_bounds in sub8x8 rd loop
Make the use of mv_check_bounds consistent for mvs of both ref_frame[0]
and ref_frame[1].

Change-Id: I1ca24865cc7232ca9cbe5db566c53abad1592211
2013-07-17 17:13:51 -07:00
Dmitry Kovalev
f9f453ec8d Removing kf_{y, uv}_mode_prob arrays from VP9Common.
These arrays have constant values (they are never updated). Removing two
corresponding memcpy calls. Making a little cleanup in vp9_entropymode.h
as well: removing redundant 'extern' keyword and moving all function
declarations to the end.

Change-Id: Ia16b38b46aec2e2500f5df29c40a297ae241dede
2013-07-17 16:50:52 -07:00
Ronald S. Bultje
facecd80da Merge "Add a best_yrd shortcut in splitmv mode search." 2013-07-17 16:11:13 -07:00
Ronald S. Bultje
056111c822 Merge "Skip redundant nearest/near/zero encodes in splitmv." 2013-07-17 16:10:51 -07:00
Ronald S. Bultje
0b1eba25b2 Merge "Skip nearest/near/zero redundant encodes." 2013-07-17 16:10:41 -07:00
Ronald S. Bultje
607424449c Merge "Best_rd breakout in rd partition search." 2013-07-17 16:10:22 -07:00
Yunqing Wang
3798db88e1 Remove unnecessary calling of vp9_init_quantizer()
vp9_init_quantizer() is called in vp9_create_compressor(), and
should not be called in vp9_set_speed_features().

Change-Id: Ic2f1f4b0531b9d46bb841d7e1d8da9812207dad6
2013-07-17 14:59:00 -07:00
Yaowu Xu
6ac5b7db2c Merge "changed mode checking order" 2013-07-17 14:44:40 -07:00
Dmitry Kovalev
a7a1e96136 Merge changes Ieffea49e,Idf610746
* changes:
  Removing two unused arguments from vp9_inc_mv signature.
  Changing signature of vp9_get_pred_probs_tx_size.
2013-07-17 14:44:20 -07:00
Ronald S. Bultje
c6917528a5 Add a best_yrd shortcut in splitmv mode search.
Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min6.2 to 1min5.9, i.e. 0.5% faster overall.

Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683
2013-07-17 14:21:57 -07:00
Ronald S. Bultje
161c995658 Skip redundant nearest/near/zero encodes in splitmv.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
1min7.3 to 1min6.2, i.e. 1.7% faster overall.

Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987
2013-07-17 13:53:48 -07:00
Yaowu Xu
42facc292d changed mode checking order
Change-Id: Ic4c4b363ed840935e42f495f13ea5e601a56f1b2
2013-07-17 13:43:50 -07:00
Ronald S. Bultje
8fea880b6f Skip nearest/near/zero redundant encodes.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
to 1min7.3, i.e. 8% faster.

Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62
2013-07-17 11:33:15 -07:00
Yunqing Wang
10e83b0717 Enable disable_splitmv feature for other speeds
Added disable_splitmv feature at other speed levels. For speed 3 or
above, always turn it on.

Change-Id: Ibb36f0a7ef12a34b4f8d0f9cb6193eab43b34360
2013-07-17 10:25:49 -07:00
Ronald S. Bultje
9f427bfe98 Best_rd breakout in rd partition search.
About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
goes from 1min36 to 1min24. Results become slightly better (+0.2% on
derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
super_block_yrd(). Overall speed change (on derfraw300) is roughly
-13%. This can probably be improved further by caching best_yrd
between partition searches. Also, we might be able to get more
speedups by always doing PARTITION_NONE before PARTITIONS_SPLIT, not
just at the sb8x8 level.

Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
2013-07-17 09:56:46 -07:00
Ronald S. Bultje
83c7e13a6b Do a skip-block check for sub8x8 partitions also.
+0.2% SSIM and glbPSNR on derfraw300.

Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
2013-07-17 09:46:47 -07:00
Yunqing Wang
df90d58f4f Speed up motion estimation using small partitions' result(experiment)
Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
partition checks if they are unlikely choices. We could use the
motion vector(MV) result as current partition's prediction MV, limit
the search range and reference frame.

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
2013-07-17 09:11:47 -07:00
Paul Wilkins
d66eab15dd Merge "Move uv intra mode selection in rd loop." 2013-07-17 05:19:26 -07:00
Paul Wilkins
154c34a3ee Merge "Limit transform sizes searched for uv intra." 2013-07-17 03:40:11 -07:00
Paul Wilkins
2ee338ce3b Move uv intra mode selection in rd loop.
Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
2013-07-17 11:11:21 +01:00
Paul Wilkins
6c667f0ffe Limit transform sizes searched for uv intra.
Apply limit if search_method == USE_LARGESTALL
to the range of UV tx sizes searched.

Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
2013-07-17 11:08:55 +01:00
Paul Wilkins
5f4722c75f Merge "Minor cleanup in code to fine uv tx_size." 2013-07-17 02:50:09 -07:00
Dmitry Kovalev
6638b6f63f Merge "Removing MV_GROUP_UPDATE define and corresponding code." 2013-07-16 21:09:00 -07:00
Jingning Han
0b58fa80a0 Merge "Skip redundant motion search in 4x4 level rd loop" 2013-07-16 20:54:25 -07:00
Jingning Han
a142d6fc93 Skip redundant motion search in 4x4 level rd loop
This commit makes the encoder perform motion search only once
per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
at 2000 kbps, the runtime goes from 253812ms -> 217817ms
(14% speed-up) for speed 0.

Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
2013-07-16 17:21:11 -07:00
Dmitry Kovalev
41ae3d02d4 Removing two unused arguments from vp9_inc_mv signature.
Change-Id: Ieffea49eb7a5e5092f21f8694c546aff69b07c6d
2013-07-16 17:01:08 -07:00
Dmitry Kovalev
5b65a71cdc Changing signature of vp9_get_pred_probs_tx_size.
Removing VP9_COMMON* argument and adding struct tx_probs* instead of
MACROBLOCKD*.

Change-Id: Idf61074631a90ec51eac22c8dcd977f44ac0757c
2013-07-16 16:34:54 -07:00
Dmitry Kovalev
3997da0d35 Removing MV_GROUP_UPDATE define and corresponding code.
Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63
2013-07-16 15:03:00 -07:00
Dmitry Kovalev
9482a0bf10 Cleaning up tile code.
Removing tile_rows and tile_columns from VP9Common, removing redundant
constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
vp9_get_tile_n_bits.

Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
2013-07-16 14:47:15 -07:00
James Zern
39ce4b13d5 Merge "use consistent framerate naming" 2013-07-16 14:22:52 -07:00
James Zern
9581eb6e8a use consistent framerate naming
s/frame_rate/framerate/g

Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
2013-07-16 14:12:47 -07:00
Dmitry Kovalev
5de96b3ce6 Merge "Rewriting vp9_set_pred_flag_{seg_id, mbskip}." 2013-07-16 13:34:42 -07:00
James Zern
5baa416b6c Merge "vp9: remove frames_{since,till}.. from MACROBLOCKD" 2013-07-16 13:00:14 -07:00
James Zern
3a7c2665d0 Merge "yv12config: remove YUV_TYPE" 2013-07-16 12:16:04 -07:00
Dmitry Kovalev
863138a2ad Rewriting vp9_set_pred_flag_{seg_id, mbskip}.
Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
mi_row and mi_col to functions explicitly instead of relying on
mb_to_right_edge and mb_to_bottom_edge.

Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
2013-07-16 10:44:48 -07:00
Paul Wilkins
30d2ea45ce Minor cleanup in code to fine uv tx_size.
Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
2013-07-16 18:27:33 +01:00
Jingning Han
dd97c62ab8 Merge "Skip inter-coded block reconstruction in rd loop" 2013-07-16 09:03:38 -07:00
Dmitry Kovalev
e8e7620a1f Merge "Removing and moving around constant definitions." 2013-07-16 00:52:53 -07:00
Yaowu Xu
c5b0cd8405 Merge "Change to extend full border only when needed" 2013-07-15 21:35:32 -07:00
Yaowu Xu
5b915ebd92 Change to extend full border only when needed
This is a short term optimization till we work out a decoder
implementation requiring no frame border extension.

Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f
2013-07-15 20:52:13 -07:00
Dmitry Kovalev
ca75f1255f Removing and moving around constant definitions.
Removing unused and duplicated constants, moving them from *.h to *.c
if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
2013-07-15 19:26:30 -07:00
Johann
6eae37f45c Merge "Remove print_nmvcounts" 2013-07-15 18:43:41 -07:00
Ronald S. Bultje
1ff94fea56 Inline vp9_quantize() in xform_quant().
Cycle times:
4x4:    151 to  131 cycles (15% faster)
8x8:    334 to  306 cycles (9% faster)
16x16: 1401 to 1368 cycles (2.5% faster)
32x32: 7403 to 7367 cycles (0.5% faster)

Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.

Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
2013-07-15 17:30:57 -07:00
Ronald S. Bultje
6fb418741f Inline xform_quant() in encode_block_intra().
Also inline some of the block calculations to assist the compiler to
not do silly things like calculating the same offset (or converting
between raster/transform block offset or block, mi and pixel unit)
many, many, many times.

Cycle times:
4x4:     584 ->   505 cycles (16% faster)
8x8:    1651 ->  1560 cycles (6% faster)
16x16:  7897 ->  7704 cycles (2.5% faster)
32x32: 16096 -> 15852 cycles (1.5% faster)

Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.

Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
2013-07-15 16:00:42 -07:00
Jingning Han
043e0f9dad Skip inter-coded block reconstruction in rd loop
Skip the inverse transform and reconstruction of inter-mode coded
blocks in the rate-distortion optimization loop, when skip_encode_sb
feature is turned on. This provides about 1% speed-up at speed 0,
and 1.5% speed-up at speed 1. No performance change in either setting.

Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007
2013-07-15 11:32:14 -07:00
Jingning Han
faff6ed0fb Skip duplicate block encoding in the rd loop
This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7% speed-up) at 0.04dB
         performance loss.

This operation is currently turned on in settings of speed 1.

Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
2013-07-15 11:08:58 -07:00
James Zern
dc1d2331f6 vp9: remove frames_{since,till}.. from MACROBLOCKD
frames_since_golden / frames_till_alt_ref_frame are unused.

Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b
2013-07-13 18:02:11 -07:00
Dmitry Kovalev
429070987a Using vp9_copy and vp9_zero instead of custom code.
Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b
2013-07-12 18:07:43 -07:00
Yaowu Xu
cdea4a7c66 Merge "Fix a build issue" 2013-07-12 16:17:22 -07:00
James Zern
4fc6c88e9c yv12config: remove YUV_TYPE
this was never fleshed out in the context of VP8, for which it was
added. for VP9 it has no meaning.

Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff
2013-07-12 15:25:48 -07:00
Dmitry Kovalev
cc662dd768 Adding struct tx_probs and struct tx_counts to cleanup the code.
Also removing unused declarations from vp9_entropymode.h file.

Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
2013-07-12 15:22:38 -07:00
Yaowu Xu
fb754b182f Fix a build issue
Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
2013-07-12 11:38:44 -07:00
James Zern
0195fb53cb vp9: consistent 'log2' variable naming
lg2 -> log2

Change-Id: I0602ddff49e42c9c40c29c084d04b7592b9f8edf
2013-07-12 11:37:43 -07:00
Deb Mukherjee
94c481f9f1 Some minor cleanups for efficiency
Implements some of the helper functions more efficiently with
lookups rather than branches. Modeling function is consolidated
to reduce some computations.

Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).

No bitstream or output change.

About 0.5% speedup

Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
2013-07-12 10:22:56 -07:00
Dmitry Kovalev
727631873d Merge "Removing redundant code mostly from vp9_pred_common.{h, c}." 2013-07-12 10:22:30 -07:00
Paul Wilkins
b8ddc9f0d3 Merge "Speed 2 feature adjustment." 2013-07-12 02:14:01 -07:00
Jingning Han
84c3ac0476 Merge "Remove unnecessary tx_type branch in encode_block" 2013-07-11 21:52:27 -07:00
Dmitry Kovalev
dd150e8ea9 Removing redundant code mostly from vp9_pred_common.{h, c}.
Removing redundant function arguments and curly braces.

Change-Id: I46e02561f33fe02e84a3b19756f03b9504bd6a1b
2013-07-11 18:39:10 -07:00
Johann
e6ab476dd4 Remove print_nmvcounts
For some reason iOS builds take a really long time to sort this
function out.

It's not used anywhere so remove it.

Change-Id: Ia5c8513a0d9c7eb32641cca58ca1c1113e2dd9f4
2013-07-11 17:22:03 -07:00
Ronald S. Bultje
ee09dd9949 Remove unused function block_error().
Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4
2013-07-11 17:14:03 -07:00
Dmitry Kovalev
8c05e59065 Calling is_inter_mode() instead of custom code.
Change-Id: Iccd4ab95ea51a6d57ed43947f2fd7ad92e8979cf
2013-07-11 14:14:47 -07:00
Dmitry Kovalev
c4ad3273c7 Moving segmentation related vars into separate struct.
Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
2013-07-11 11:57:57 -07:00
Dmitry Kovalev
f70c021d36 Merge "Adding write_compressed_header function." 2013-07-11 11:57:17 -07:00
Dmitry Kovalev
802e57535a Merge "Removing unused TOKENEXTRA arg from pick_sb_modes function." 2013-07-11 11:46:06 -07:00
Jingning Han
b9381b6faf Remove unnecessary tx_type branch in encode_block
The function encode_block is called only by inter-prediction modes,
hence removing the transform type branching there.

Change-Id: I34a3172e28ce2388835efd0f8781922211bff857
2013-07-11 09:11:35 -07:00
Scott LaVarnway
f2a6bcfb18 Eliminated prev_mip memsets/memcpys in encoder
This patch is in experimental but was not merged into master.

This patch swaps ptrs instead of copying and uses the
last show_frame flag instead of setting the entire buffer
to zero.

Change-Id: Ia0950466c8ba301a2a5bf917ff3d07bc1a2c2311
2013-07-11 10:47:28 -04:00
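The pointer swap mentioned in the commit above replaces a buffer copy with an exchange of two pointers. A minimal sketch, assuming a stand-in struct ModeInfoSketch and a helper swap_mode_info that are both hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a per-block mode info entry; the real struct is larger. */
typedef struct {
  int mode;
} ModeInfoSketch;

/* Exchange the "current" and "previous" buffers in O(1) instead of
 * copying every element from one array into the other. */
static void swap_mode_info(ModeInfoSketch **cur, ModeInfoSketch **prev) {
  assert(cur != NULL && prev != NULL);
  ModeInfoSketch *tmp = *cur;
  *cur = *prev;
  *prev = tmp;
}
```

The cost of the swap does not depend on the frame size, whereas a memcpy grows with the number of entries in the buffer.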
Paul Wilkins
5290eeab88 Speed 2 feature adjustment.
With sf->auto_mv_step_size on it is questionable
whether sf->reduce_first_step_size is worthwhile.
At speed 2 it was not having a big impact.

Even at speed 2 sf->optimize_coefficients = 0 is not
having a big speed impact so for now I have moved it
down into a higher speed setting.

Change-Id: I8a54de76d486ad37aabce76474889da2768b14c1
2013-07-11 13:59:12 +01:00
Jingning Han
aedc7c59b1 Merge "Fix tx_type bug in intra4x4 rd loop" 2013-07-10 20:13:25 -07:00
Ronald S. Bultje
c13e0bcb52 Remove unused fwalsh/fdct x86 SIMD implementations.
Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e
2013-07-10 18:22:51 -07:00
Dmitry Kovalev
544d8c3316 Removing unused TOKENEXTRA arg from pick_sb_modes function.
Change-Id: I0543e72fa092eef3976b65e16bb597197c364873
2013-07-10 15:57:28 -07:00
Jingning Han
18803f9cc4 Fix tx_type bug in intra4x4 rd loop
This commit fixed the mis-use of the tx_type for inverse transform
in intra4x4 rate-distortion optimization loop. It improves the
overall coding performance.

Change-Id: I7fe9953175b74890357dbcee33c138573766e980
2013-07-10 15:49:49 -07:00
Deb Mukherjee
7494bba66b Merge "Prunes out full-rd computation based on modeled rd" 2013-07-10 15:37:11 -07:00
Dmitry Kovalev
0ac5e4dd58 Adding write_compressed_header function.
Change-Id: Ic5257fa8278e9b6297de230e4fd26a1e23ad2bb7
2013-07-10 15:08:34 -07:00
Jim Bankoski
68ef7a6b8a configure with internal stats not working
Change-Id: I5dea4570cb05df27a522abf6e7b695998654284a
2013-07-10 15:07:53 -07:00
Jim Bankoski
865ca76604 Merge "remove warnings when NDEBUG is set" 2013-07-10 14:39:39 -07:00
Jim Bankoski
6591cf2f7e remove warnings when NDEBUG is set
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
2013-07-10 14:27:20 -07:00
Deb Mukherjee
53ff43adc3 Prunes out full-rd computation based on modeled rd
Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes if the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Results on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
2013-07-10 13:49:49 -07:00
Jingning Han
114423538f SSE2 16x16 ADST/DCT hybrid transform
This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2
operations. It reduces the runtime from 5433 cycles to 1621 cycles, at
no compression performance loss.

Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230
2013-07-10 12:14:53 -07:00
Dmitry Kovalev
417df1d42e Merge "Adding encode_tiles function to vp9_bitstream.c." 2013-07-10 11:43:50 -07:00
Yaowu Xu
e52eec490c Merge "Add a feature to reduce chrome intra mode search" 2013-07-10 11:35:47 -07:00
Ronald S. Bultje
b1df674a99 Remove memcpy() in handle_inter_mode() filter selection.
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83
2013-07-10 09:27:56 -07:00
Yaowu Xu
bed27a960a Add a feature to reduce chrome intra mode search
Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
2013-07-10 08:59:18 -07:00
Jim Bankoski
fb027a7658 removing case statements around prediction entropy coding
Removes SEG_ID
Removes MBSKIP
Removes SWITCHABLE_INTERP
Removes INTRA_INTER
Removes COMP_INTER_INTER
Removes COMP_REF_P
Removes SINGLE_REF_P1
Removes SINGLE_REF_P2
Removes TX_SIZE

Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b
2013-07-09 20:10:16 -07:00
Yaowu Xu
059f2929e9 Merge "Revert "Remove memcpy() in handle_inter_mode() filter selection."" 2013-07-09 20:10:06 -07:00
Yaowu Xu
205efbc153 Revert "Remove memcpy() in handle_inter_mode() filter selection."
This reverts commit fcf7998a47.

Change-Id: Ic6532223faec9f1483b78adb2e37b79c7b1a0efb
2013-07-09 17:42:10 -07:00
Dmitry Kovalev
d82f459d1a Adding encode_tiles function to vp9_bitstream.c.
Change-Id: Ie44824ec25fd8fdb25d7c8124a9b28c26d802029
2013-07-09 15:59:19 -07:00
John Koleszar
f0d9f10d24 Remove all asm offset files from VP9
The files are empty and unused.

Change-Id: Ieb4242d14273efdf24149bda33f9591540bba06a
2013-07-09 14:26:53 -07:00
Ronald S. Bultje
204d1b7058 Merge "Unbreak lossless." 2013-07-09 09:54:48 -07:00
Ronald S. Bultje
d8fa5d45cc Merge "Make intra prediction pointers RTCD-based." 2013-07-09 09:54:43 -07:00
Ronald S. Bultje
059c0ba5d4 Unbreak lossless.
Change-Id: I8130ec9b5371c65e885f245a5ac73840c23cb4a1
2013-07-09 09:46:37 -07:00
Dmitry Kovalev
c6c279aff0 Merge "Using mi_cols instead of mb_cols." 2013-07-08 20:09:19 -07:00
Dmitry Kovalev
1c65c580d6 Merge "Refactoring setup_pre_planes function." 2013-07-08 20:08:05 -07:00
Dmitry Kovalev
6254c8d780 Merge "Calling set_partition_seg_context() instead of code duplication." 2013-07-08 20:07:06 -07:00
Ronald S. Bultje
8350e7fe38 Make intra prediction pointers RTCD-based.
This probably has a mildly negative impact on performance, but will
(in future commits - or possibly merged with this one) allow SIMD
implementations of individual intra prediction functions. We may
perhaps want to consider having separate functions per txfm-size
also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for
each intra prediction mode), but I haven't played much with that
yet.

Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269
2013-07-08 17:25:51 -07:00
Ronald S. Bultje
a5062cc635 Don't call encode_sb() for the final of 4-split subpartitions.
The resulting reconstruction is never used, thus it just wastes CPU
cycles. Reduces encode time of first 50 frames of bus (speed 0) @
1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.

Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
8fde07a3ae Don't recalculate mv_ref costs for each block/partition.
Changes cost_mv_ref() into doing a LUT into pre-calculated cost
arrays instead. Encode time of first 50 frames of bus (speed 0)
@ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.

Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
5a73254918 Remove unnecessary memset(best_index, 0) from trellis/optimize.
First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to
2min11.6, i.e. 0.75% overall speedup.

Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
fcf7998a47 Remove memcpy() in handle_inter_mode() filter selection.
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
ed995afba1 Make frame-wide filter-type decision fully RD-based.
Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics), I'm
not quite sure why yet. Maybe interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
2013-07-08 16:22:37 -07:00
Dmitry Kovalev
b7559258a4 Using mi_cols instead of mb_cols.
Eliminating usage of mb-units, switching to mi-units. Adding
ALIGN_POWER_OF_TWO macro.

Change-Id: I2491c969f713207c062011878b57e4e531818607
2013-07-08 14:54:04 -07:00
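The commit above names an ALIGN_POWER_OF_TWO macro but does not show its definition. The sketch below is one common way to round a value up to a multiple of 2^n. The macro body, the _SKETCH suffixes, the 8-pixel mi unit, and the helper width_to_mi_cols are assumptions for illustration, not the definitions from the patch.

```c
#include <assert.h>

/* Assumed definition: round value up to the next multiple of 2^n. */
#define ALIGN_POWER_OF_TWO_SKETCH(value, n) \
  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))

/* Assumption for this example: one mi unit spans 8 pixels (2^3). */
#define MI_SIZE_LOG2_SKETCH 3

/* Convert a frame width in pixels to a column count in mi units. */
static int width_to_mi_cols(int width) {
  assert(width >= 0);
  return ALIGN_POWER_OF_TWO_SKETCH(width, MI_SIZE_LOG2_SKETCH) >>
         MI_SIZE_LOG2_SKETCH;
}
```

Under these assumptions a 352-pixel-wide frame maps to 44 mi columns, and so does a 350-pixel-wide frame, because the width is rounded up before the shift.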
Deb Mukherjee
d9b62160a0 Implements several heuristics to prune mode search
Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions.
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
2013-07-08 12:17:12 -07:00
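The commit above selects its heuristics "by bits from a flag". A minimal sketch of that pattern for the two compound-mode flags follows. The flag names come from the commit message, but the bit values, the function skip_compound_search, and its parameters are hypothetical.

```c
#include <assert.h>

/* Flag names are from the commit message; the bit positions are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH = 1 << 3
};

/* Decide whether a compound (two-reference) mode can be skipped.
 * best_ref is only meaningful when best_is_intra is 0. */
static int skip_compound_search(unsigned flags, int best_is_intra,
                                int best_ref, int ref0, int ref1) {
  assert(ref0 != ref1);
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && best_is_intra) return 1;
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && !best_is_intra &&
      best_ref != ref0 && best_ref != ref1)
    return 1;
  return 0;
}
```

Keeping each heuristic behind its own bit lets a speed setting enable any combination by OR-ing the flags together.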
Jingning Han
a38cf2658a Merge "Refactor SSE2 8x8 functional units" 2013-07-05 11:18:18 -07:00
Paul Wilkins
ef0ca2deaa Merge "Fix to comp_inter_joint_search_thresh feature." 2013-07-04 03:27:00 -07:00
Dmitry Kovalev
f72e072555 Refactoring setup_pre_planes function.
Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63
2013-07-03 17:42:01 -07:00
Dmitry Kovalev
2ce6b23473 Merge "Adding write_skip_coeff function." 2013-07-03 16:33:58 -07:00
Jingning Han
68172dbede Merge "Enable early termination in rd search" 2013-07-03 14:20:41 -07:00
Dmitry Kovalev
430bd0c94a Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE." 2013-07-03 14:16:02 -07:00
Dmitry Kovalev
dda1835dc6 Adding write_skip_coeff function.
Change-Id: I221126f22ab9067348eb0efb8a73b15a8f49c3fd
2013-07-03 13:23:47 -07:00
Jingning Han
2bd6fe08f8 Enable early termination in rd search
This commit allows the encoder to track the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it terminates the remaining
operations and continues to the next prediction mode test.

It reduces the runtime of bus at target bit-rate 2000 from 308 seconds
to 266 seconds, i.e., about 13% speed-up at no performance penalty.

Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
2013-07-03 12:54:18 -07:00
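The early termination described in the commit above amounts to a running sum with a breakout test. The sketch below assumes per-block costs are already available in an array, which is a simplification; the function accumulate_rd and its parameters are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum per-block rd costs, stopping as soon as the running total exceeds
 * best_rd. Returns INT64_MAX when the mode cannot beat the best so far;
 * *evaluated reports how many blocks were actually examined. */
static int64_t accumulate_rd(const int64_t *block_rd, size_t n,
                             int64_t best_rd, size_t *evaluated) {
  int64_t total = 0;
  assert(block_rd != NULL && evaluated != NULL);
  *evaluated = 0;
  for (size_t i = 0; i < n; ++i) {
    total += block_rd[i];
    ++*evaluated;
    if (total > best_rd) return INT64_MAX; /* early breakout */
  }
  return total;
}
```

For block costs 10, 20, 30, 40 and a best value of 25, the loop stops after the second block; with a best value of 1000 it visits all four and returns 100.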
Dmitry Kovalev
2ad62c9312 Calling set_partition_seg_context() instead of code duplication.
Change-Id: I65be6acc54c99688fd1f0c946cec3511514b8555
2013-07-03 11:15:58 -07:00
Dmitry Kovalev
5a21de8418 Replacing 64 / MI_SIZE with MI_BLOCK_SIZE.
Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
2013-07-03 10:54:50 -07:00
Dmitry Kovalev
60198a595d Merge "Adding write_selected_txfm_size function." 2013-07-03 10:33:55 -07:00
Jingning Han
2cb75c9607 Refactor SSE2 8x8 functional units
These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
hybrid transform coding.

Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
2013-07-03 10:11:59 -07:00
Ronald S. Bultje
61fe678f36 Merge "Use pmovmskb to skip quantize loops over empty coefficients." 2013-07-03 09:05:48 -07:00
Paul Wilkins
f58b44ad62 Fix to comp_inter_joint_search_thresh feature.
When this is 0 (BLOCK_SIZE_AB4X4) we want to do
the inter joint search for all sizes.

Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88
2013-07-03 16:58:34 +01:00